853 resultados para height partition clustering
Resumo:
A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
There is a family of well-known external clustering validity indexes to measure the degree of compatibility or similarity between two hard partitions of a given data set, including partitions with different numbers of categories. A unified, fully equivalent set-theoretic formulation for an important class of such indexes was derived and extended to the fuzzy domain in a previous work by the author [Campello, R.J.G.B., 2007. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Lett., 28, 833-841]. However, the proposed fuzzy set-theoretic formulation is not valid as a general approach for comparing two fuzzy partitions of data. Instead, it is an approach for comparing a fuzzy partition against a hard referential partition of the data into mutually disjoint categories. In this paper, generalized external indexes for comparing two data partitions with overlapping categories are introduced. These indexes can be used as general measures for comparing two partitions of the same data set into overlapping categories. An important issue that is seldom touched in the literature is also addressed in the paper, namely, how to compare two partitions of different subsamples of data. A number of pedagogical examples and three simulation experiments are presented and analyzed in details. A review of recent related work compiled from the literature is also provided. (c) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The algorithm proposed can deal with data sets presenting different types of clusters, without the need of expertise in cluster analysis. its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with multi-objective Clustering with automatic K-determination (MOCK). the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
A conceptual problem that appears in different contexts of clustering analysis is that of measuring the degree of compatibility between two sequences of numbers. This problem is usually addressed by means of numerical indexes referred to as sequence correlation indexes. This paper elaborates on why some specific sequence correlation indexes may not be good choices depending on the application scenario in hand. A variant of the Product-Moment correlation coefficient and a weighted formulation for the Goodman-Kruskal and Kendall`s indexes are derived that may be more appropriate for some particular application scenarios. The proposed and existing indexes are analyzed from different perspectives, such as their sensitivity to the ranks and magnitudes of the sequences under evaluation, among other relevant aspects of the problem. The results help suggesting scenarios within the context of clustering analysis that are possibly more appropriate for the application of each index. (C) 2008 Elsevier Inc. All rights reserved.
Resumo:
This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e. repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and evolutionary algorithms under interest are provided. Examples with computational experiments and statistical analyses are also presented.
Resumo:
Partition of Unity Implicits (PUI) has been recently introduced for surface reconstruction from point clouds. In this work, we propose a PUI method that employs a set of well-observed solutions in order to produce geometrically pleasant results without requiring time consuming or mathematically overloaded computations. One feature of our technique is the use of multivariate orthogonal polynomials in the least-squares approximation, which allows the recursive refinement of the local fittings in terms of the degree of the polynomial. However, since the use of high-order approximations based only on the number of available points is not reliable, we introduce the concept of coverage domain. In addition, the method relies on the use of an algebraically defined triangulation to handle two important tasks in PUI: the spatial decomposition and an adaptive polygonization. As the spatial subdivision is based on tetrahedra, the generated mesh may present poorly-shaped triangles that are improved in this work by means a specific vertex displacement technique. Furthermore, we also address sharp features and raw data treatment. A further contribution is based on the PUI locality property that leads to an intuitive scheme for improving or repairing the surface by means of editing local functions.
Resumo:
We study a symplectic chain with a non-local form of coupling by means of a standard map lattice where the interaction strength decreases with the lattice distance as a power-law, in Such a way that one can pass continuously from a local (nearest-neighbor) to a global (mean-field) type of coupling. We investigate the formation of map clusters, or spatially coherent structures generated by the system dynamics. Such clusters are found to be related to stickiness of chaotic phase-space trajectories near periodic island remnants, and also to the behavior of the diffusion coefficient. An approximate two-dimensional map is derived to explain some of the features of this connection. (C) 2008 Elsevier Ltd. All rights reserved.
Resumo:
We analyze by numerical simulations and mean-field approximations an asymmetric version of the stochastic sandpile model with height restriction in one dimension. Each site can have at most two particles. Single particles are inactive and do not move. Two particles occupying the same site are active and may hop to neighboring sites following an asymmetric rule. Jumps to the right or to the left occur with distinct probabilities. In the active state, there will be a net current of particles to the right or to the left. We have found that the critical behavior related to the transition from the active to the absorbing state is distinct from the symmetrical case, making the asymmetry a relevant field.
Resumo:
In 1983, Chvatal, Trotter and the two senior authors proved that for any Delta there exists a constant B such that, for any n, any 2-colouring of the edges of the complete graph K(N) with N >= Bn vertices yields a monochromatic copy of any graph H that has n vertices and maximum degree Delta. We prove that the complete graph may be replaced by a sparser graph G that has N vertices and O(N(2-1/Delta)log(1/Delta)N) edges, with N = [B`n] for some constant B` that depends only on Delta. Consequently, the so-called size-Ramsey number of any H with n vertices and maximum degree Delta is O(n(2-1/Delta)log(1/Delta)n) Our approach is based on random graphs; in fact, we show that the classical Erdos-Renyi random graph with the numerical parameters above satisfies a stronger partition property with high probability, namely, that any 2-colouring of its edges contains a monochromatic universal graph for the class of graphs on n vertices and maximum degree Delta. The main tool in our proof is the regularity method, adapted to a suitable sparse setting. The novel ingredient developed here is an embedding strategy that allows one to embed bounded degree graphs of linear order in certain pseudorandom graphs. Crucial to our proof is the fact that regularity is typically inherited at a scale that is much finer than the scale at which it is assumed. (C) 2011 Elsevier Inc. All rights reserved.
Resumo:
In the present work, a new approach for the determination of the partition coefficient in different interfaces based on the density function theory is proposed. Our results for log P(ow) considering a n-octanol/water interface for a large super cell for acetone -0.30 (-0.24) and methane 0.95 (0.78) are comparable with the experimental data given in parenthesis. We believe that these differences are mainly related to the absence of van der Walls interactions and the limited number of molecules considered in the super cell. The numerical deviations are smaller than that observed for interpolation based tools. As the proposed model is parameter free, it is not limited to the n-octanol/water interface.
Resumo:
Data mining is a relatively new field of research that its objective is to acquire knowledge from large amounts of data. In medical and health care areas, due to regulations and due to the availability of computers, a large amount of data is becoming available [27]. On the one hand, practitioners are expected to use all this data in their work but, at the same time, such a large amount of data cannot be processed by humans in a short time to make diagnosis, prognosis and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications to develop a tool that can help make rather accurate decisions. In this thesis, the goal is finding a pattern among patients who got pneumonia by clustering of lab data values which have been recorded every day. By this pattern we can generalize it to the patients who did not have been diagnosed by this disease whose lab values shows the same trend as pneumonia patients does. There are 10 tables which have been extracted from a big data base of a hospital in Jena for my work .In ICU (intensive care unit), COPRA system which is a patient management system has been used. All the tables and data stored in German Language database.
Resumo:
Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ∼2,000, ∼3,700 and ∼9,500 SNPs explained ∼21%, ∼24% and ∼29% of phenotypic variance. Furthermore, all common variants together captured 60% of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways and tissue types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/β-catenin and chondroitin sulfate-related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants.2014