873 resultados para agglomerative clustering


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine optimum  in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine optimum  in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Los mapas de vegetación son a menudo utilizados como proxis de una estratificación de hábitats para generar distribuciones geográficas contínuas de organismos a partir de datos discretos mediante modelos multi-variantes. Sin embargo, los mapas de vegetación suelen ser poco apropiados para ser directamente aplicados a este fin, pues sus categorías no se concibieron con la intención de corresponder a tipos de hábitat. En este artículo presentamos y aplicamos el método de Agrupamiento por Doble Criterio para generalizar un mapa de vegetación extraordinariamente detallado (350 clases) del Parque Natural del Montseny (Cataluña) en categorías que mantienen la coherencia tanto desde el punto de vista estructural (a través de una matriz de disimilaridad espectral calculada mediante una imágen del satélite SPOT-5) como en términos de vegetación (gracias a una matriz de disimilaridad calculada mediante propiedades de vegetación deducidas de la leyenda jerárquica del mapa). El método simplifica de 114 a 18 clases el 67% del área de estudio. Añadiendo otras agregaciones más triviales basadas exclusivamente en criterios de cubierta de suelo, el 73% del área de estudio pasa de 167 a 25 categorías. Como valor añadido, el método identifica el 10% de los polígonos originales como anómalos (a partir de comparar las propiedades espectrales de cada polígono con el resto de los de su clases), lo que implica cambios en la cubierta entre las fechas del soporte utilizado para generar el mapa original y la imagen de satélite, o errores en la producción de éste.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Flow in geophysical fluids is commonly summarized by coherent streams, for example conveyor belt flows in extratropical cyclones or jet streaks in the upper troposphere. Typically, parcel trajectories are calculated from the flow field and subjective thresholds are used to distinguish coherent streams of interest. This methodology contribution develops a more objective approach to distinguish coherent airstreams within extratropical cyclones. Agglomerative clustering is applied to trajectories along with a method to identify the optimal number of cluster classes. The methodology is applied to trajectories associated with the low-level jets of a well-studied extratropical cyclone. For computational efficiency, a constraint that trajectories must pass through these jet regions is applied prior to clustering; the partitioning into different airstreams is then performed by the agglomerative clustering. It is demonstrated that the methodology can identify the salient flow structures of cyclones: the warm and cold conveyor belts. A test focusing on the airstreams terminating at the tip of the bent-back front further demonstrates the success of the method in that it can distinguish fine-scale flow structure such as descending sting jet airstreams.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper describes a new approach to detect and track maritime objects in real time. The approach particularly addresses the highly dynamic maritime environment, panning cameras, target scale changes, and operates on both visible and thermal imagery. Object detection is based on agglomerative clustering of temporally stable features. Object extents are first determined based on persistence of detected features and their relative separation and motion attributes. An explicit cluster merging and splitting process handles object creation and separation. Stable object clus- ters are tracked frame-to-frame. The effectiveness of the approach is demonstrated on four challenging real-world public datasets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Differently from theoretical scale-free networks, most real networks present multi-scale behavior, with nodes structured in different types of functional groups and communities. While the majority of approaches for classification of nodes in a complex network has relied on local measurements of the topology/connectivity around each node, valuable information about node functionality can be obtained by concentric (or hierarchical) measurements. This paper extends previous methodologies based on concentric measurements, by studying the possibility of using agglomerative clustering methods, in order to obtain a set of functional groups of nodes, considering particular institutional collaboration network nodes, including various known communities (departments of the University of Sao Paulo). Among the interesting obtained findings, we emphasize the scale-free nature of the network obtained, as well as identification of different patterns of authorship emerging from different areas (e.g. human and exact sciences). Another interesting result concerns the relatively uniform distribution of hubs along concentric levels, contrariwise to the non-uniform pattern found in theoretical scale-free networks such as the BA model. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the current market system, power systems are operated at higher loads for economic reasons. Power system stability becomes a genuine concern in such operating conditions. In case of failure of any larger component, the system may become stressed. These events may start cascading failures, which may lead to blackouts. One of the main reasons of the major recorded blackout events has been the unavailability of system-wide information. Synchrophasor technology has the capability to provide system-wide real time information. Phasor Measurement Units (PMUs) are the basic building block of this technology, which provide the Global Positioning System (GPS) time-stamped voltage and current phasor values along with the frequency. It is being assumed that synchrophasor data of all the buses is available and thus the whole system is fully observable. This information can be used to initiate islanding or system separation to avoid blackouts. A system separation strategy using synchrophasor data has been developed to answer the three main aspects of system separation: (1) When to separate: One class support machines (OC-SVM) is primarily used for the anomaly detection. Here OC-SVM was used to detect wide area instability. OC-SVM has been tested on different stable and unstable cases and it is found that OC-SVM has the capability to detect the wide area instability and thus is capable to answer the question of “when the system should be separated”. (2) Where to separate: The agglomerative clustering technique was used to find the groups of coherent buses. The lines connecting different groups of coherent buses form the separation surface. The rate of change of the bus voltage phase angles has been used as the input to this technique. This technique has the potential to exactly identify the lines to be tripped for the system separation. (3) What to do after separation: Load shedding was performed approximately equal to the sum of power flows along the candidate system separation lines should be initiated before tripping these lines. Therefore it is recommended that load shedding should be initiated before tripping the lines for system separation.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The purpose of this paper is to explain the notion of clustering and a concrete clustering method- agglomerative hierarchical clustering algorithm. It shows how a data mining method like clustering can be applied to the analysis of stocks, traded on the Bulgarian Stock Exchange in order to identify similar temporal behavior of the traded stocks. This problem is solved with the aid of a data mining tool that is called XLMiner™ for Microsoft Excel Office.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Different types of water bodies, including lakes, streams, and coastal marine waters, are often susceptible to fecal contamination from a range of point and nonpoint sources, and have been evaluated using fecal indicator microorganisms. The most commonly used fecal indicator is Escherichia coli, but traditional cultivation methods do not allow discrimination of the source of pollution. The use of triplex PCR offers an approach that is fast and inexpensive, and here enabled the identification of phylogroups. The phylogenetic distribution of E. coli subgroups isolated from water samples revealed higher frequencies of subgroups A1 and B23 in rivers impacted by human pollution sources, while subgroups D1 and D2 were associated with pristine sites, and subgroup B1 with domesticated animal sources, suggesting their use as a first screening for pollution source identification. A simple classification is also proposed based on phylogenetic subgroup distribution using the w-clique metric, enabling differentiation of polluted and unpolluted sites.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the southern region of Mato Grosso do Sul state, Brazil, a foot-and-mouth disease (FMD) epidemic started in September 2005. A total of 33 outbreaks were detected and 33,741 FMD-susceptible animals were slaughtered and destroyed. There were no reports of FMD cases in other species than bovines. Based on the data of this epidemic, it was carried out an analysis using the K-function and it was observed spatial clustering of outbreaks within a range of 25km. This observation may be related to the dynamics of foot-and-mouth disease spread and to the measures undertaken to control the disease dissemination. The control measures were effective once the disease did not spread to farms more than 47 km apart from the initial outbreaks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multi-variate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by using the kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent efforts in the characterization of air-water flows properties have included some clustering process analysis. A cluster of bubbles is defined as a group of two or more bubbles, with a distinct separation from other bubbles before and after the cluster. The present paper compares the results of clustering processes two hydraulic structures. That is, a large-size dropshaft and a hydraulic jump in a rectangular horizontal channel. The comparison highlighted some significant differences in clustering production and structures. Both dropshaft and hydraulic jump flows are complex turbulent shear flows, and some clustering index may provide some measure of the bubble-turbulence interactions and associated energy dissipation.

Relevância:

20.00% 20.00%

Publicador: