869 resultados para height partition clustering


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Homogeneous temperature regions are necessary for use in hydrometeorological studies. The regions are often delineated by analysing statistics derived from time series of maximum, minimum or mean temperature, rather than attributes influencing temperature. This practice cannot yield meaningful regions in data-sparse areas. Further, independent validation of the delineated regions for homogeneity in temperature is not possible, as temperature records form the basis to arrive at the regions. To address these issues, a two-stage clustering approach is proposed in this study to delineate homogeneous temperature regions. First stage of the approach involves (1) determining correlation structure between observed temperature over the study area and possible predictors (large-scale atmospheric variables) influencing the temperature and (2) using the correlation structure as the basis to delineate sites in the study area into clusters. Second stage of the approach involves analysis on each of the clusters to (1) identify potential predictors (large-scale atmospheric variables) influencing temperature at sites in the cluster and (2) partition the cluster into homogeneous fuzzy temperature regions using the identified potential predictors. Application of the proposed approach to India yielded 28 homogeneous regions that were demonstrated to be effective when compared to an alternate set of 6 regions that were previously delineated over the study area. Intersite cross-correlations of monthly maximum and minimum temperatures in the existing regions were found to be weak and negative for several months, which is undesirable. This problem was not found in the case of regions delineated using the proposed approach. Utility of the proposed regions in arriving at estimates of potential evapotranspiration for ungauged locations in the study area is demonstrated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Elucidating the intricate relationship between brain structure and function, both in healthy and pathological conditions, is a key challenge for modern neuroscience. Recent progress in neuroimaging has helped advance our understanding of this important issue, with diffusion images providing information about structural connectivity (SC) and functional magnetic resonance imaging shedding light on resting state functional connectivity (rsFC). Here, we adopt a systems approach, relying on modular hierarchical clustering, to study together SC and rsFC datasets gathered independently from healthy human subjects. Our novel approach allows us to find a common skeleton shared by structure and function from which a new, optimal, brain partition can be extracted. We describe the emerging common structure-function modules (SFMs) in detail and compare them with commonly employed anatomical or functional parcellations. Our results underline the strong correspondence between brain structure and resting-state dynamics as well as the emerging coherent organization of the human brain.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes a novel protocol which uses the Internet Domain Name System (DNS) to partition Web clients into disjoint sets, each of which is associated with a single DNS server. We define an L-DNS cluster to be a grouping of Web Clients that use the same Local DNS server to resolve Internet host names. We identify such clusters in real-time using data obtained from a Web Server in conjunction with that server's Authoritative DNS―both instrumented with an implementation of our clustering algorithm. Using these clusters, we perform measurements from four distinct Internet locations. Our results show that L-DNS clustering enables a better estimation of proximity of a Web Client to a Web Server than previously proposed techniques. Thus, in a Content Distribution Network, a DNS-based scheme that redirects a request from a web client to one of many servers based on the client's name server coordinates (e.g., hops/latency/loss-rates between the client and servers) would perform better with our algorithm.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The need for the ability to cluster unknown data to better understand its relationship to know data is prevalent throughout science. Besides a better understanding of the data itself or learning about a new unknown object, cluster analysis can help with processing data, data standardization, and outlier detection. Most clustering algorithms are based on known features or expectations, such as the popular partition based, hierarchical, density-based, grid based, and model based algorithms. The choice of algorithm depends on many factors, including the type of data and the reason for clustering, nearly all rely on some known properties of the data being analyzed. Recently, Li et al. proposed a new universal similarity metric, this metric needs no prior knowledge about the object. Their similarity metric is based on the Kolmogorov Complexity of objects, the objects minimal description. While the Kolmogorov Complexity of an object is not computable, in "Clustering by Compression," Cilibrasi and Vitanyi use common compression algorithms to approximate the universal similarity metric and cluster objects with high success. Unfortunately, clustering using compression does not trivially extend to higher dimensions. Here we outline a method to adapt their procedure to images. We test these techniques on images of letters of the alphabet.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the electricity market liberalization, the distribution and retail companies are looking for better market strategies based on adequate information upon the consumption patterns of its electricity consumers. A fair insight on the consumers’ behavior will permit the definition of specific contract aspects based on the different consumption patterns. In order to form the different consumers’ classes, and find a set of representative consumption patterns we use electricity consumption data from a utility client’s database and two approaches: Two-step clustering algorithm and the WEACS approach based on evidence accumulation (EAC) for combining partitions in a clustering ensemble. While EAC uses a voting mechanism to produce a co-association matrix based on the pairwise associations obtained from N partitions and where each partition has equal weight in the combination process, the WEACS approach uses subsampling and weights differently the partitions. As a complementary step to the WEACS approach, we combine the partitions obtained in the WEACS approach with the ALL clustering ensemble construction method and we use the Ward Link algorithm to obtain the final data partition. The characterization of the obtained consumers’ clusters was performed using the C5.0 classification algorithm. Experiment results showed that the WEACS approach leads to better results than many other clustering approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The present research paper presents five different clustering methods to identify typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customer’s behaviour. The obtained knowledge can be used to support a decision tool, not only for utilities but also for consumers. Load profiles can be used by the utilities to identify the aspects that cause system load peaks and enable the development of specific contracts with their customers. The framework presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Naïvement perçu, le processus d’évolution est une succession d’événements de duplication et de mutations graduelles dans le génome qui mènent à des changements dans les fonctions et les interactions du protéome. La famille des hydrolases de guanosine triphosphate (GTPases) similaire à Ras constitue un bon modèle de travail afin de comprendre ce phénomène fondamental, car cette famille de protéines contient un nombre limité d’éléments qui diffèrent en fonctionnalité et en interactions. Globalement, nous désirons comprendre comment les mutations singulières au niveau des GTPases affectent la morphologie des cellules ainsi que leur degré d’impact sur les populations asynchrones. Mon travail de maîtrise vise à classifier de manière significative différents phénotypes de la levure Saccaromyces cerevisiae via l’analyse de plusieurs critères morphologiques de souches exprimant des GTPases mutées et natives. Notre approche à base de microscopie et d’analyses bioinformatique des images DIC (microscopie d’interférence différentielle de contraste) permet de distinguer les phénotypes propres aux cellules natives et aux mutants. L’emploi de cette méthode a permis une détection automatisée et une caractérisation des phénotypes mutants associés à la sur-expression de GTPases constitutivement actives. Les mutants de GTPases constitutivement actifs Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L et Rsr1 G12V ont été analysés avec succès. En effet, l’implémentation de différents algorithmes de partitionnement, permet d’analyser des données qui combinent les mesures morphologiques de population native et mutantes. Nos résultats démontrent que l’algorithme Fuzzy C-Means performe un partitionnement efficace des cellules natives ou mutantes, où les différents types de cellules sont classifiés en fonction de plusieurs facteurs de formes cellulaires obtenus à partir des images DIC. Cette analyse démontre que les mutations Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L et Rsr1 G12V induisent respectivement des phénotypes amorphe, allongé, rond et large qui sont représentés par des vecteurs de facteurs de forme distincts. Ces distinctions sont observées avec différentes proportions (morphologie mutante / morphologie native) dans les populations de mutants. Le développement de nouvelles méthodes automatisées d’analyse morphologique des cellules natives et mutantes s’avère extrêmement utile pour l’étude de la famille des GTPases ainsi que des résidus spécifiques qui dictent leurs fonctions et réseau d’interaction. Nous pouvons maintenant envisager de produire des mutants de GTPases qui inversent leur fonction en ciblant des résidus divergents. La substitution fonctionnelle est ensuite détectée au niveau morphologique grâce à notre nouvelle stratégie quantitative. Ce type d’analyse peut également être transposé à d’autres familles de protéines et contribuer de manière significative au domaine de la biologie évolutive.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clustering quality or validation indices allow the evaluation of the quality of clustering in order to support the selection of a specific partition or clustering structure in its natural unsupervised environment, where the real solution is unknown or not available. In this paper, we investigate the use of quality indices mostly based on the concepts of clusters` compactness and separation, for the evaluation of clustering results (partitions in particular). This work intends to offer a general perspective regarding the appropriate use of quality indices for the purpose of clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, key issues regarding the practical use of quality indices are addressed. A general methodological approach is presented which considers the identification of appropriate indices thresholds. This general approach is compared with the simple use of quality indices for evaluating a clustering solution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The intensity of regional specialization in specific activities, and conversely, the level of industrial concentration in specific locations, has been used as a complementary evidence for the existence and significance of externalities. Additionally, economists have mainly focused the debate on disentangling the sources of specialization and concentration processes according to three vectors: natural advantages, internal, and external scale economies. The arbitrariness of partitions plays a key role in capturing these effects, while the selection of the partition would have to reflect the actual characteristics of the economy. Thus, the identification of spatial boundaries to measure specialization becomes critical, since most likely the model will be adapted to different scales of distance, and be influenced by different types of externalities or economies of agglomeration, which are based on the mechanisms of interaction with particular requirements of spatial proximity. This work is based on the analysis of the spatial aspect of economic specialization supported by the manufacturing industry case. The main objective is to propose, for discrete and continuous space: i) a measure of global specialization; ii) a local disaggregation of the global measure; and iii) a spatial clustering method for the identification of specialized agglomerations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There are different ways to do cluster analysis of categorical data in the literature and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economical constraints. Main approaches for clustering are usually distinguished into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, basing on it, allocate units to the closest group. In clustering, one may be interested in the classification of similar objects into groups, and one may be interested in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? In order to answer, two approaches, namely a latent class model (mixture of multinomial distributions) and a partition around medoids one, are evaluated and compared by Adjusted Rand Index, Average Silhouette Width and Pearson-Gamma indexes in a fairly wide simulation study. Simulation outcomes are plotted in bi-dimensional graphs via Multidimensional Scaling; size of points is proportional to the number of points that overlap and different colours are used according to the cluster membership.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Map showing the whole of New Jersey and its borders with as well as part of Pennsylvania and New York. Map is drawn in black ink with green, pink, and yellow watercolors used to show features such as waterways, borders, and places of interest. Notes on map concern border disputes between New Jersey and New York.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Document concerns how certain lands around Casco Bay, York Co., Maine, were partitioned in accordance with an order of a writ.