795 resultados para hierarchical clustering


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

There is a family of well-known external clustering validity indexes to measure the degree of compatibility or similarity between two hard partitions of a given data set, including partitions with different numbers of categories. A unified, fully equivalent set-theoretic formulation for an important class of such indexes was derived and extended to the fuzzy domain in a previous work by the author [Campello, R.J.G.B., 2007. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Lett., 28, 833-841]. However, the proposed fuzzy set-theoretic formulation is not valid as a general approach for comparing two fuzzy partitions of data. Instead, it is an approach for comparing a fuzzy partition against a hard referential partition of the data into mutually disjoint categories. In this paper, generalized external indexes for comparing two data partitions with overlapping categories are introduced. These indexes can be used as general measures for comparing two partitions of the same data set into overlapping categories. An important issue that is seldom touched in the literature is also addressed in the paper, namely, how to compare two partitions of different subsamples of data. A number of pedagogical examples and three simulation experiments are presented and analyzed in details. A review of recent related work compiled from the literature is also provided. (c) 2010 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The algorithm proposed can deal with data sets presenting different types of clusters, without the need of expertise in cluster analysis. its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with multi-objective Clustering with automatic K-determination (MOCK). the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A conceptual problem that appears in different contexts of clustering analysis is that of measuring the degree of compatibility between two sequences of numbers. This problem is usually addressed by means of numerical indexes referred to as sequence correlation indexes. This paper elaborates on why some specific sequence correlation indexes may not be good choices depending on the application scenario in hand. A variant of the Product-Moment correlation coefficient and a weighted formulation for the Goodman-Kruskal and Kendall`s indexes are derived that may be more appropriate for some particular application scenarios. The proposed and existing indexes are analyzed from different perspectives, such as their sensitivity to the ranks and magnitudes of the sequences under evaluation, among other relevant aspects of the problem. The results help suggesting scenarios within the context of clustering analysis that are possibly more appropriate for the application of each index. (C) 2008 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e. repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and evolutionary algorithms under interest are provided. Examples with computational experiments and statistical analyses are also presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustering quality or validation indices allow the evaluation of the quality of clustering in order to support the selection of a specific partition or clustering structure in its natural unsupervised environment, where the real solution is unknown or not available. In this paper, we investigate the use of quality indices mostly based on the concepts of clusters` compactness and separation, for the evaluation of clustering results (partitions in particular). This work intends to offer a general perspective regarding the appropriate use of quality indices for the purpose of clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, key issues regarding the practical use of quality indices are addressed. A general methodological approach is presented which considers the identification of appropriate indices thresholds. This general approach is compared with the simple use of quality indices for evaluating a clustering solution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Point placement strategies aim at mapping data points represented in higher dimensions to bi-dimensional spaces and are frequently used to visualize relationships amongst data instances. They have been valuable tools for analysis and exploration of data sets of various kinds. Many conventional techniques, however, do not behave well when the number of dimensions is high, such as in the case of documents collections. Later approaches handle that shortcoming, but may cause too much clutter to allow flexible exploration to take place. In this work we present a novel hierarchical point placement technique that is capable of dealing with these problems. While good grouping and separation of data with high similarity is maintained without increasing computation cost, its hierarchical structure lends itself both to exploration in various levels of detail and to handling data in subsets, improving analysis capability and also allowing manipulation of larger data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Several popular Machine Learning techniques are originally designed for the solution of two-class problems. However, several classification problems have more than two classes. One approach to deal with multiclass problems using binary classifiers is to decompose the multiclass problem into multiple binary sub-problems disposed in a binary tree. This approach requires a binary partition of the classes for each node of the tree, which defines the tree structure. This paper presents two algorithms to determine the tree structure taking into account information collected from the used dataset. This approach allows the tree structure to be determined automatically for any multiclass dataset.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We study a symplectic chain with a non-local form of coupling by means of a standard map lattice where the interaction strength decreases with the lattice distance as a power-law, in Such a way that one can pass continuously from a local (nearest-neighbor) to a global (mean-field) type of coupling. We investigate the formation of map clusters, or spatially coherent structures generated by the system dynamics. Such clusters are found to be related to stickiness of chaotic phase-space trajectories near periodic island remnants, and also to the behavior of the diffusion coefficient. An approximate two-dimensional map is derived to explain some of the features of this connection. (C) 2008 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A continuous version of the hierarchical spherical model at dimension d=4 is investigated. Two limit distributions of the block spin variable X(gamma), normalized with exponents gamma = d + 2 and gamma=d at and above the critical temperature, are established. These results are proven by solving certain evolution equations corresponding to the renormalization group (RG) transformation of the O(N) hierarchical spin model of block size L(d) in the limit L down arrow 1 and N ->infinity. Starting far away from the stationary Gaussian fixed point the trajectories of these dynamical system pass through two different regimes with distinguishable crossover behavior. An interpretation of this trajectories is given by the geometric theory of functions which describe precisely the motion of the Lee-Yang zeroes. The large-N limit of RG transformation with L(d) fixed equal to 2, at the criticality, has recently been investigated in both weak and strong (coupling) regimes by Watanabe (J. Stat. Phys. 115:1669-1713, 2004) . Although our analysis deals only with N = infinity case, it complements various aspects of that work.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Differently from theoretical scale-free networks, most real networks present multi-scale behavior, with nodes structured in different types of functional groups and communities. While the majority of approaches for classification of nodes in a complex network has relied on local measurements of the topology/connectivity around each node, valuable information about node functionality can be obtained by concentric (or hierarchical) measurements. This paper extends previous methodologies based on concentric measurements, by studying the possibility of using agglomerative clustering methods, in order to obtain a set of functional groups of nodes, considering particular institutional collaboration network nodes, including various known communities (departments of the University of Sao Paulo). Among the interesting obtained findings, we emphasize the scale-free nature of the network obtained, as well as identification of different patterns of authorship emerging from different areas (e.g. human and exact sciences). Another interesting result concerns the relatively uniform distribution of hubs along concentric levels, contrariwise to the non-uniform pattern found in theoretical scale-free networks such as the BA model. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work, we propose a hierarchical extension of the polygonality index as the means to characterize geographical planar networks. By considering successive neighborhoods around each node, it is possible to obtain more complete information about the spatial order of the network at progressive spatial scales. The potential of the methodology is illustrated with respect to synthetic and real geographical networks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hierarchical assemblies of CaMoO4 (CM) nano-octahedrons were obtained by microwave-assisted hydrothemial synthesis at 120 degrees C for different times. These structures were structurally, morphologically and optically characterized by X-ray diffraction, micro-Raman spectroscopy, field-emission gun scanning electron microscopy, ultraviolet-visible absorption spectroscopy and photoluminescence measurements. First-principle calculations have been carried out to understand the structural and electronic order-disorder effects as a function of the particle/region size. Supercells of different dimensions were constructed to simulate the geometric distortions along both they and z planes of the scheelite structure. Based on these experimental results and with the help of detailed structural simulations, we were able to model the nature of the order-disorder in this important class of materials and discuss the consequent implications on its physical properties, in particular, the photoluminescence properties of CM nanocrystals.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Data mining is a relatively new field of research that its objective is to acquire knowledge from large amounts of data. In medical and health care areas, due to regulations and due to the availability of computers, a large amount of data is becoming available [27]. On the one hand, practitioners are expected to use all this data in their work but, at the same time, such a large amount of data cannot be processed by humans in a short time to make diagnosis, prognosis and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications to develop a tool that can help make rather accurate decisions. In this thesis, the goal is finding a pattern among patients who got pneumonia by clustering of lab data values which have been recorded every day. By this pattern we can generalize it to the patients who did not have been diagnosed by this disease whose lab values shows the same trend as pneumonia patients does. There are 10 tables which have been extracted from a big data base of a hospital in Jena for my work .In ICU (intensive care unit), COPRA system which is a patient management system has been used. All the tables and data stored in German Language database.