40 results for CLUSTER ANALYSIS
Abstract:
The Indian Ocean water that ends up in the Atlantic Ocean detaches from the Agulhas Current retroflection predominantly in the form of Agulhas rings and cyclones. Using numerical Lagrangian float trajectories in a high-resolution numerical ocean model, the fate of coherent structures near the Agulhas Current retroflection is investigated. It is shown that within the Agulhas Current, upstream of the retroflection, the spatial distributions of floats ending in the Atlantic Ocean and floats ending in the Indian Ocean are to a large extent similar. This indicates that Agulhas leakage occurs mostly through the detachment of Agulhas rings. After the floats detach from the Agulhas Current, the ambient water quickly loses its relative vorticity. The Agulhas rings thus seem to decay and lose much of their water in the Cape Basin. A cluster analysis reveals that most water in the Agulhas Current is within clusters of 180 km in diameter. Halfway across the Cape Basin there is an increase in the number of larger clusters with low relative vorticity, which carry the bulk of the Agulhas leakage transport through the Cape Basin. This upward cascade with respect to the length scales of the leakage, in combination with a power-law decay of the magnitude of relative vorticity, might indicate that the decay of Agulhas rings is somewhat comparable to the decay of two-dimensional turbulence.
Abstract:
This article illustrates the usefulness of applying bootstrap procedures to total factor productivity Malmquist indices, derived with data envelopment analysis (DEA), for a sample of 250 Polish farms during 1996-2000. The confidence intervals constructed as in Simar and Wilson suggest that the common portrayal of productivity decline in Polish agriculture may be misleading. However, a cluster analysis based on bootstrap confidence intervals reveals that important policy conclusions can be drawn regarding productivity enhancement.
Abstract:
Pollinators provide essential ecosystem services, and declines in some pollinator communities around the world have been reported. Understanding the fundamental components defining these communities is essential if conservation and restoration are to be successful. We examined the structure of plant-pollinator communities in a dynamic Mediterranean landscape, comprising a mosaic of post-fire regenerating habitats, and which is a recognized global hotspot for bee diversity. Each community was characterized by a highly skewed species abundance distribution, with a few dominant and many rare bee species, and was consistent with a log series model indicating that a few environmental factors govern the community. Floral community composition, the quantity and quality of forage resources present, and the geographic locality organized bee communities at various levels: (1) The overall structure of the bee community (116 species), as revealed through ordination, was dependent upon nectar resource diversity (defined as the variety of nectar volume-concentration combinations available), the ratio of pollen to nectar energy, floral diversity, floral abundance, and post-fire age. (2) Bee diversity, measured as species richness, was closely linked to floral diversity (especially of annuals), nectar resource diversity, and post-fire age of the habitat. (3) The abundance of the most common species was primarily related to post-fire age, grazing intensity, and nesting substrate availability. Ordination models based on age-characteristic post-fire floral community structure explained 39-50% of overall variation observed in bee community structure. Cluster analysis showed that all the communities shared a high degree of similarity in their species composition (27-59%); however, the geographical location of sites also contributed a smaller but significant component to bee community structure. 
We conclude that floral resources act in specific and previously unexplored ways to modulate the diversity of the local geographic species pool, with specific disturbance factors, superimposed upon these patterns, mainly affecting the dominant species.
Abstract:
Thymus is taxonomically a very complex genus with a high frequency of hybridisation and introgression among sympatric species. The variation in accumulation of leaf-surface flavonoids was investigated in 71 wild populations of Thymus from different putative hybrid swarm areas in Andalucia, Spain. Twenty-two flavones, five flavanones, two dihydroflavonols, a flavonol and two unknowns were detected by HPLC-DAD combined with LC-APCI-MS analysis. The majority of compounds were flavones with a luteolin-type substitution of the B-ring, in contrast to previous reports on Macedonian taxa, which predominantly accumulate flavones with apigenin-type substitution of the B-ring. Anatomical and morphometric studies, supported by cluster analysis, identified pure Thymus hyemalis and Thymus baeticus populations, and a large number of putative hybrids. Flavonoid variation was closely related to morphological variation in all populations and is suspected to be a result of genetic polymorphism. Principal component analysis identified the presence of species-specific and geographically linked chemotypes and putative hybrids with mixed morphological and chemical characteristics. Qualitative and quantitative flavonoid accumulation appears to be genetically regulated, while external factors play a secondary role. Flavonoid profiles can thus provide diagnostic markers for the taxonomy of Thymus and are also useful in detecting hybridising taxa. (C) 2007 Elsevier Ltd. All rights reserved.
Abstract:
Much prior research on the structure and performance of UK real estate portfolios has relied on aggregated measures for sector and region. For these groupings to have validity, the performance of individual properties within each group should be similar. This paper analyses a sample of 1,200 properties using multiple discriminant analysis and cluster analysis techniques. It is shown that conventional property type and spatial classifications do not capture the variation in return behaviour at the individual building level. The major feature is heterogeneity - but there may be distinctions between growth and income properties and between single and multi-let properties that could help refine portfolio structures.
Abstract:
Gene chips are finding extensive use in animal and plant science. Microarrays are generally of two kinds, cDNA or oligonucleotide. cDNA microarrays were developed at Stanford University, whereas oligonucleotide arrays were developed by Affymetrix. Arraying cDNA or oligonucleotide probes on a glass slide makes it possible to compare the gene expression levels of treated and control samples by labeling their mRNA with green (Cy3) and red (Cy5) dyes. The hybridized gene chip emits fluorescence whose intensity and colour can be measured. RNA labeling can be done directly or indirectly; the indirect method uses aminoallyl-modified dUTP instead of pre-labelled nucleotides. Hybridization is generally carried out in the smallest possible volume, and to ensure heteroduplex formation, tenfold more DNA is spotted on the slide than is present in the solution. Confocal or semi-confocal laser technologies coupled with a CCD camera are used for image acquisition. For standardization, housekeeping genes are used, or cDNAs that are not present in the treated or control samples are spotted on the gene chip. Moreover, statistical (image analysis) and cluster analysis software has been developed by Stanford University. Gene-chip technology has many applications, such as expression analysis, gene expression signatures (molecular phenotypes) and promoter regulatory element co-expression.
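The standardization step described above can be illustrated with a short Python sketch. The abstract does not say how intensities are normalized, so this shows one common approach under stated assumptions: spot intensities are already background-corrected, the treated sample is taken to be labelled with Cy5 and the control with Cy3, and the function name normalized_log_ratios and all gene names and values are hypothetical. Each gene's log2(Cy5/Cy3) ratio is shifted so that the median ratio of the housekeeping genes becomes 0, i.e. "no change".

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3, housekeeping):
    """Per-gene log2(Cy5/Cy3) ratios, centred on the housekeeping-gene median.

    cy5, cy3: dicts mapping gene name -> background-corrected intensity (assumed > 0).
    housekeeping: names of genes assumed to be unchanged between the two samples.
    Assumption (hypothetical setup): Cy5 = treated sample, Cy3 = control sample.
    """
    raw = {gene: math.log2(cy5[gene] / cy3[gene]) for gene in cy5}
    # Dye-bias correction: housekeeping genes should show no change, so shift
    # every ratio by their median log ratio.
    offset = median(raw[gene] for gene in housekeeping)
    return {gene: ratio - offset for gene, ratio in raw.items()}


# Hypothetical spot intensities for two target genes and two housekeeping genes.
cy5 = {"geneA": 800.0, "geneB": 200.0, "hk1": 400.0, "hk2": 400.0}
cy3 = {"geneA": 100.0, "geneB": 100.0, "hk1": 200.0, "hk2": 200.0}
ratios = normalized_log_ratios(cy5, cy3, housekeeping=["hk1", "hk2"])
```

In this toy input every spot on the red channel is twice as bright as expected (a dye bias), which the housekeeping genes expose; after correction geneA comes out at +2 (four-fold up) and geneB at 0. Median centring on control genes is only one option; intensity-dependent (loess) normalization is another common choice for two-colour arrays.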
Abstract:
Investments in direct real estate are inherently difficult to segment compared to other asset classes due to the complex and heterogeneous nature of the asset. The most common segmentation in real estate investment analysis relies on property sector and geographical region. In this paper, we compare the predictive power of existing industry classifications with a new type of segmentation using cluster analysis on a number of relevant property attributes including the equivalent yield and size of the property as well as information on lease terms, number of tenants and tenant concentration. The new segments are shown to be distinct and relatively stable over time. In a second stage of the analysis, we test whether the newly generated segments are able to better predict the resulting financial performance of the assets than the old dichotomous segments. Applying both discriminant and neural network analysis we find mixed evidence for this hypothesis. Overall, we conclude from our analysis that each of the two approaches to segmenting the market has its strengths and weaknesses so that both might be applied gainfully in real estate investment analysis and fund management.
Abstract:
Purpose - The role of affective states in consumer behaviour is well established. However, no study to date has empirically examined online affective states as a basis for constructing typologies of internet users and for assessing the invariance of clusters across national cultures. Design/methodology/approach - Four focus groups with internet users were carried out to adapt a set of affective states identified from the literature to the online environment. An online survey was then designed to collect data from internet users in four Western and four East Asian countries. Findings - Based on a cluster analysis, six cross-national market segments are identified and labelled "Positive Online Affectivists", "Offline Affectivists", "On/Off-line Negative Affectivists", "Online Affectivists", "Indistinguishable Affectivists", and "Negative Offline Affectivists". The resulting clusters discriminate on the basis of national culture, gender, working status and perceptions towards online brands. Practical implications - Marketers may use this typology to segment internet users in order to predict their perceptions towards online brands. Also, a standardised approach to e-marketing is not recommended on the basis of affective state-based segmentation. Originality/value - This is the first study proposing affective state-based typologies of internet users using comparable samples from four Western and four East Asian countries.
Abstract:
This study investigated 37 diverse sainfoin (Onobrychis viciifolia Scop.) accessions from the EU ‘HealthyHay’ germplasm collection for proanthocyanidin (PA) content and composition. Accessions displayed a wide range of differences: PA contents varied from 0.57 to 2.80 g/100 g sainfoin; the mean degree of polymerisation from 12 to 84; the proportion of prodelphinidin tannins from 53% to 95%, and the proportion of trans-flavanol units from 12% to 34%. A positive correlation was found between PA contents (thiolytic versus acid–butanol degradation; P < 0.001; R2 = 0.49). A negative correlation existed between PA content (thiolysis) and mDP (P < 0.05; R2 = −0.30), which suggested that accessions with high PA contents had smaller PA polymers. Cluster analysis revealed that European accessions clustered into two main groups: Western Europe and Eastern Europe/Asia. In addition, accessions from USA, Canada and Armenia tended to cluster together. Overall, there was broad agreement between tannin clusters and clusters that were based on morphological and agronomic characteristics.
Abstract:
In order to identify the factors influencing adoption of technologies promoted by the government to small-scale dairy farmers in the highlands of central Mexico, a field survey was conducted. A total of 115 farmers were grouped through cluster analysis (CA) and divided into three wealth status categories (high, medium and low) using wealth ranking. Chi-square analysis was used to examine the association of wealth status with technology adoption. Four groups of farms were differentiated in terms of farms’ dimensions, farmers’ education, sources of income, wealth status, herd management, monetary support by government and technological availability. Statistical differences (p < 0.05) were observed in the milk yield per herd per year among groups. Government organizations (GO) participated little in the promotion of the 17 technologies identified, six of which focused on crop or forage production and 11 of which were related to animal husbandry. Relatives and other farmers played an important role in knowledge diffusion and technology adoption. Although wealth status had a significant association (p < 0.05) with adoption, other factors, including the importance of the technology to farmers, the usefulness and productive benefits of innovations, and farmers’ knowledge of them, were also important. It is concluded that the analysis of the information per group and wealth status was useful for identifying suitable crop- or forage-related and animal husbandry technologies per group and wealth status of farmers. Therefore, the characterization of farmers could provide a useful starting point for the design and delivery of more appropriate and effective extension.
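The chi-square analysis of association between wealth status and technology adoption can be sketched as a Pearson chi-square test of independence on a contingency table. This is a generic version of the test, not the study's own computation: the abstract reports no counts, so the table below (rows: high, medium, low wealth; columns: adopted, not adopted) is entirely hypothetical, as is the function name chi_square_statistic.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows of observed counts (e.g. rows = wealth status,
    columns = adopted / not adopted).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if wealth status and adoption were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical counts, for illustration only.
table = [
    [20, 10],  # high wealth: adopted, not adopted
    [15, 15],  # medium wealth
    [10, 20],  # low wealth
]
chi2, dof = chi_square_statistic(table)
```

Here the statistic is 100/15 (about 6.67) with 2 degrees of freedom; since the 5% critical value for 2 degrees of freedom is 5.991, this hypothetical table would indicate a significant association at p < 0.05. In practice, scipy.stats.chi2_contingency returns the statistic, p-value, degrees of freedom and expected counts in one call.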
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
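The straightforward parallel formulation mentioned above can be sketched in Python by simulating each processing node as a list of local points. This is a minimal illustration, not the authors' code, and the function names are hypothetical: every node computes per-cluster coordinate sums and point counts over its own data, and a global reduction combines them before the centroids can be updated. That reduction is the step a single failed or slow link can stall.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))


def local_partial_sums(points, centroids):
    """Per-node step: per-cluster coordinate sums and counts over local data only."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = nearest(p, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def parallel_kmeans(partitions, centroids, iterations=10):
    """Simulated data-parallel K-Means; each list in partitions plays one node."""
    for _ in range(iterations):
        partials = [local_partial_sums(part, centroids) for part in partitions]
        # Global reduction (an all-reduce in a real message-passing code): every
        # node needs these totals before the next iteration can start.
        new_centroids = []
        for k, old in enumerate(centroids):
            count = sum(counts[k] for _, counts in partials)
            if count == 0:
                new_centroids.append(old)  # keep an empty cluster's centroid
                continue
            total = [sum(sums[k][d] for sums, _ in partials) for d in range(len(old))]
            new_centroids.append([t / count for t in total])
        centroids = new_centroids
    return centroids


# Two simulated nodes, each holding points from both clusters.
partitions = [
    [[0.0, 0.0], [10.0, 10.0]],  # node 0
    [[0.0, 1.0], [10.0, 11.0]],  # node 1
]
centroids = parallel_kmeans(partitions, [[1.0, 1.0], [9.0, 9.0]])
```

Because each node holds points from both clusters, neither can compute a correct centroid on its own; the result, [[0.0, 0.5], [10.0, 10.5]], only emerges after the reduction combines both nodes' partial sums.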
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
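The general idea of replacing the global reduction with epidemic (gossip) communication can be sketched as below. This is an illustrative assumption of how such an aggregation step might work, not the Epidemic K-Means protocol itself: randomly chosen pairs of nodes repeatedly replace their per-cluster sums and counts with their pairwise average. Pairwise averaging preserves the network-wide mean, so each node's ratio of averaged sum to averaged count converges to the global centroid (total sum over total count) without any node contacting all the others. The assignment step is omitted, and all names and values are hypothetical.

```python
import random


def gossip_average(values, rounds, rng):
    """Pairwise gossip: at each step two random nodes both adopt their mean vector.

    The network-wide mean of every coordinate is preserved at each step, and all
    nodes converge towards it without any global communication.
    """
    values = [list(v) for v in values]
    n = len(values)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)
        mean = [(a + b) / 2 for a, b in zip(values[i], values[j])]
        values[i], values[j] = mean, list(mean)
    return values


def epidemic_centroid_estimates(local_sums, local_counts, rounds=2000, seed=0):
    """Each node estimates the global centroids as (average sum) / (average count).

    local_sums[i][k]: coordinate sums of node i's points assigned to cluster k.
    local_counts[i][k]: number of node i's points assigned to cluster k.
    """
    rng = random.Random(seed)
    k_clusters = len(local_counts[0])
    dim = len(local_sums[0][0])
    # Pack each node's statistics into one vector so one gossip run averages all.
    packed = [[x for k in range(k_clusters) for x in local_sums[i][k]]
              + [float(c) for c in local_counts[i]]
              for i in range(len(local_counts))]
    estimates = []
    for vec in gossip_average(packed, rounds, rng):
        counts = vec[k_clusters * dim:]
        node_estimate = []
        for k in range(k_clusters):
            if counts[k] == 0:
                node_estimate.append(None)  # no information about cluster k yet
            else:
                node_estimate.append([vec[k * dim + d] / counts[k] for d in range(dim)])
        estimates.append(node_estimate)
    return estimates


# Hypothetical statistics for six 2-D points spread unevenly over three nodes.
local_sums = [
    [[2.0, 0.0], [0.0, 0.0]],     # node 0: points (0,0), (2,0) in cluster 0
    [[1.0, 3.0], [10.0, 10.0]],   # node 1: (1,3) in cluster 0, (10,10) in cluster 1
    [[0.0, 0.0], [26.0, 23.0]],   # node 2: (12,10), (14,13) in cluster 1
]
local_counts = [[2, 0], [1, 1], [0, 2]]
estimates = epidemic_centroid_estimates(local_sums, local_counts)
```

After enough rounds every node holds approximately the centralised centroids, (1, 1) and (12, 11); the number of gossip rounds controls how closely the local estimates approach them.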
Abstract:
In recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest as well as active research in the area that aims to develop new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically-inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NNs, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
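The non-uniform data distribution induced by multi-dimensional binary search trees (k-d trees) can be sketched as recursive median cuts on alternating axes, so that each processor receives a spatially compact block. This is a minimal sketch, assuming the number of processors is a power of two and there are at least as many points as processors; the function names are hypothetical, and the sketch does not reproduce the paper's k-means communication scheme.

```python
def kd_partition(points, n_parts, depth=0):
    """Split points into n_parts spatially compact blocks by recursive median cuts.

    Assumes n_parts is a power of two and len(points) >= n_parts. The split axis
    cycles through the dimensions, as in a k-d tree; each returned block would be
    assigned to one processor.
    """
    if n_parts == 1:
        return [points]
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    half = n_parts // 2
    return (kd_partition(ordered[:mid], half, depth + 1)
            + kd_partition(ordered[mid:], half, depth + 1))


def bounding_box(block):
    """Per-dimension (min, max) of a block's points."""
    return [(min(coords), max(coords)) for coords in zip(*block)]


# Hypothetical 2-D data split over four processors.
points = [(0, 0), (1, 5), (2, 1), (3, 6), (6, 0), (7, 5), (8, 1), (9, 6)]
blocks = kd_partition(points, 4)
boxes = [bounding_box(b) for b in blocks]
```

The first cut splits on x and the second on y, giving the blocks [(0, 0), (2, 1)], [(1, 5), (3, 6)], [(6, 0), (8, 1)] and [(7, 5), (9, 6)]. Each block's bounding box summarises the region a processor is responsible for; intuitively, such compact, non-overlapping regions are what allow a node to reason locally about which centroids its points can affect.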
Abstract:
The bewildering complexity of cortical microcircuits at the single cell level gives rise to surprisingly robust emergent activity patterns at the level of laminar and columnar local field potentials (LFPs) in response to targeted local stimuli. Here we report the results of our multivariate data-analytic approach based on simultaneous multi-site recordings using micro-electrode-array chips for investigation of the microcircuitry of rat somatosensory (barrel) cortex. We find high repeatability of stimulus-induced responses, and typical spatial distributions of LFP responses to stimuli in supragranular, granular, and infragranular layers, with the last forming a particularly distinct class. Population spikes appear to travel at about 33 cm/s from granular to infragranular layers. Responses within barrel-related columns have different profiles than those in neighbouring columns to the left or, interchangeably, to the right. Variations between slices occur, but can be minimized by strictly obeying controlled experimental protocols. Cluster analysis on normalized recordings indicates specific spatial distributions of time series reflecting the location of sources and sinks independent of the stimulus layer. Although the precise correspondences between single cell activity and LFPs are still far from clear, a sophisticated neuroinformatics approach in combination with multi-site LFP recordings in the standardized slice preparation is suitable for comparing normal conditions to genetically or pharmacologically altered situations based on real cortical microcircuitry.