869 results for height partition clustering


Relevance: 20.00%

Abstract:

Trypanosoma (Megatrypanum) theileri from cattle and trypanosomes of other artiodactyls form a clade of closely related species in analyses using ribosomal sequences. Analysis of polymorphic sequences of a larger number of trypanosomes from broader geographical origins is required to evaluate the clustering of isolates suggested by previous studies. Here, we determined the sequences of the spliced leader (SL) genes of 21 isolates from cattle and 2 from water buffalo from distant regions of Brazil. Analysis of SL gene repeats revealed that the 5S rRNA gene is inserted within the intergenic region. Phylogeographical patterns inferred using SL sequences showed at least 5 major genotypes of T. theileri distributed in 2 strongly divergent lineages. Lineage TthI comprises genotypes IA and IB from buffalo and cattle, respectively, from the Southeast and Central regions, whereas genotype IC is restricted to cattle from the Southern region. Lineage TthII includes cattle genotypes IIA, which is restricted to the North and Northeast, and IIB, found in the Centre, West, North and Northeast. PCR-RFLP of SL genes revealed valuable markers for genotyping T. theileri. The results of this study emphasize the genetic complexity and corroborate the geographical structuring of T. theileri genotypes found in cattle.

Relevance: 20.00%

Abstract:

We characterized 28 new isolates of Trypanosoma cruzi IIc (TCIIc) of mammals and triatomines from Northern to Southern Brazil, confirming the widespread distribution of this lineage. Phylogenetic analyses using cytochrome b and SSU rDNA sequences clearly separated TCIIc from TCIIa according to terrestrial and arboreal ecotopes of their preferential mammalian hosts and vectors. TCIIc was more closely related to TCIId/e, followed by TCIIa, and separated by large distances from TCIIb and TCI. Despite being indistinguishable by traditional genotyping and generally being assigned to Z3, we provide evidence that TCIIa from South America and TCIIa from North America correspond to independent lineages that circulate in distinct hosts and ecological niches. Armadillos, terrestrial didelphids and rodents, and domestic dogs were found infected by TCIIc in Brazil. We believe that, in Brazil, this is the first description of TCIIc from rodents and domestic dogs. Terrestrial triatomines of genera Panstrongylus and Triatoma were confirmed as vectors of TCIIc. Together, habitat, mammalian host and vector association corroborated the link between TCIIc and terrestrial transmission cycles/ecological niches. Analysis of ITS1 rDNA sequences disclosed clusters of TCIIc isolates in accordance with their geographic origin, independent of their host species. (C) 2009 Elsevier B.V. All rights reserved.

Relevance: 20.00%

Abstract:

This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.

Relevance: 20.00%

Abstract:

Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK, Multi-Objective Clustering with automatic K-determination, and MOCLE, Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms often contains a large number of partitions, making it difficult for an expert to analyze all of them manually. To deal with this problem, we present two selection strategies, both based on the corrected Rand index, to choose a subset of solutions. To test them, they are applied to the sets of solutions produced by MOCK and MOCLE for several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both proposed selection strategies are very effective. They can significantly reduce the number of solutions while keeping the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.
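
As an illustration of the measure named in this abstract, the sketch below computes the corrected (adjusted) Rand index of Hubert and Arabie between two hard partitions given as label lists. The helper `select_diverse` and its `threshold` parameter are hypothetical: they show one simple way such an index could be used to thin out a set of partitions, and they are not the two selection strategies proposed by the authors.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_a, labels_b):
    """Corrected (adjusted) Rand index between two hard partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no object pairs to compare
    # Pairs of objects placed together in both partitions, in A, and in B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


def select_diverse(partitions, threshold=0.9):
    """Hypothetical greedy filter: keep a partition only if its corrected
    Rand index to every partition already kept is below `threshold`."""
    kept = []
    for candidate in partitions:
        if all(corrected_rand(candidate, q) < threshold for q in kept):
            kept.append(candidate)
    return kept
```

The index does not depend on how clusters are named, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` count as identical partitions and the filter keeps only one of them.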

Relevance: 20.00%

Abstract:

A large amount of biological data has been produced in recent years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its own bias, making it more suitable for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover, it presents a clustering algorithm that solves this formulation using GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three other well-known algorithms. The proposed algorithm produced the best clustering results, as confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
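
The abstract does not state its mathematical formulation, so the sketch below only illustrates the general shape of GRASP: a greedy randomized construction that draws from a restricted candidate list, followed by local search, repeated over several iterations. The toy objective (a k-medoids cost on one-dimensional points) and all names and parameters (`cost`, `construct`, `local_search`, `grasp`, `alpha`) are assumptions made for illustration, not the authors' algorithm.

```python
import random


def cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (toy objective)."""
    return sum(min(abs(p - points[m]) for m in medoids) for p in points)


def construct(points, k, alpha, rng):
    """Greedy randomized construction using a restricted candidate list (RCL)."""
    medoids = []
    while len(medoids) < k:
        scored = sorted(
            (cost(points, medoids + [i]), i)
            for i in range(len(points)) if i not in medoids
        )
        best, worst = scored[0][0], scored[-1][0]
        limit = best + alpha * (worst - best)  # alpha=0 is purely greedy
        rcl = [i for c, i in scored if c <= limit]
        medoids.append(rng.choice(rcl))
    return medoids


def local_search(points, medoids):
    """Swap one medoid for a non-medoid while the cost strictly improves."""
    current = cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for i in range(len(points)):
                if i in medoids:
                    continue
                trial = medoids[:pos] + [i] + medoids[pos + 1:]
                trial_cost = cost(points, trial)
                if trial_cost < current:
                    medoids, current, improved = trial, trial_cost, True
    return medoids


def grasp(points, k, iterations=20, alpha=0.3, seed=0):
    """Repeat construction plus local search and keep the best solution."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        solution = local_search(points, construct(points, k, alpha, rng))
        solution_cost = cost(points, solution)
        if solution_cost < best_cost:
            best, best_cost = sorted(solution), solution_cost
    return best, best_cost
```

On three well-separated groups of points, `grasp(points, 3)` should return one medoid at the centre of each group.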

Relevance: 20.00%

Abstract:

There is a family of well-known external clustering validity indexes to measure the degree of compatibility or similarity between two hard partitions of a given data set, including partitions with different numbers of categories. A unified, fully equivalent set-theoretic formulation for an important class of such indexes was derived and extended to the fuzzy domain in a previous work by the author [Campello, R.J.G.B., 2007. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Lett., 28, 833-841]. However, the proposed fuzzy set-theoretic formulation is not valid as a general approach for comparing two fuzzy partitions of data. Instead, it is an approach for comparing a fuzzy partition against a hard referential partition of the data into mutually disjoint categories. In this paper, generalized external indexes for comparing two data partitions with overlapping categories are introduced. These indexes can be used as general measures for comparing two partitions of the same data set into overlapping categories. An important issue that is seldom addressed in the literature is also treated in the paper, namely, how to compare two partitions of different subsamples of data. A number of pedagogical examples and three simulation experiments are presented and analyzed in detail. A review of recent related work compiled from the literature is also provided. (c) 2010 Elsevier B.V. All rights reserved.
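
For readers unfamiliar with the hard-partition indexes this abstract starts from, the following sketch computes the classical pair counts and two members of that family, the Rand and Jaccard indexes. It covers only the standard hard (non-overlapping) case, not the generalized indexes introduced in the paper. The function names are illustrative, and the code assumes at least two objects and, for Jaccard, at least one pair grouped together in either partition.

```python
from itertools import combinations


def pair_counts(u, v):
    """Count object pairs by agreement between hard partitions u and v.

    a: together in both, b: together only in u,
    c: together only in v, d: separated in both.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]
        same_v = v[i] == v[j]
        if same_u and same_v:
            a += 1
        elif same_u:
            b += 1
        elif same_v:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(u, v):
    a, b, c, d = pair_counts(u, v)
    return (a + d) / (a + b + c + d)


def jaccard_index(u, v):
    a, b, c, _ = pair_counts(u, v)
    return a / (a + b + c)
```

Both indexes accept partitions with different numbers of categories, for example `u = [0, 0, 0, 1, 1, 1]` (two clusters) against `v = [0, 0, 1, 1, 2, 2]` (three clusters).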

Relevance: 20.00%

Abstract:

In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The proposed algorithm can deal with data sets presenting different types of clusters, without the need for expertise in cluster analysis. Its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with Multi-Objective Clustering with automatic K-determination (MOCK), the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.

Relevance: 20.00%

Abstract:

A conceptual problem that appears in different contexts of clustering analysis is that of measuring the degree of compatibility between two sequences of numbers. This problem is usually addressed by means of numerical indexes referred to as sequence correlation indexes. This paper elaborates on why some specific sequence correlation indexes may not be good choices depending on the application scenario at hand. A variant of the Product-Moment correlation coefficient and a weighted formulation for the Goodman-Kruskal and Kendall's indexes are derived that may be more appropriate for some particular application scenarios. The proposed and existing indexes are analyzed from different perspectives, such as their sensitivity to the ranks and magnitudes of the sequences under evaluation, among other relevant aspects of the problem. The results help to suggest scenarios within the context of clustering analysis that are possibly more appropriate for the application of each index. (C) 2008 Elsevier Inc. All rights reserved.
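
To make the rank-based indexes named in this abstract concrete, here is a minimal sketch of the standard, unweighted Goodman-Kruskal gamma and Kendall tau-a, both built from counts of concordant and discordant pairs. The weighted formulation and the Product-Moment variant derived in the paper are not reproduced here. The function names are illustrative, and `goodman_kruskal_gamma` assumes at least one pair that is not tied.

```python
from itertools import combinations


def concordance_counts(x, y):
    """Count concordant and discordant pairs; tied pairs are counted in neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return concordant, discordant


def goodman_kruskal_gamma(x, y):
    """(C - D) / (C + D): tied pairs are ignored entirely."""
    c, d = concordance_counts(x, y)
    return (c - d) / (c + d)


def kendall_tau_a(x, y):
    """(C - D) divided by the total number of pairs, ties included."""
    c, d = concordance_counts(x, y)
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)
```

The two indexes coincide when there are no ties and differ otherwise: for `x = [1, 2, 2, 3]` and `y = [1, 2, 3, 3]`, gamma is 1 while tau-a is 4/6, because two of the six pairs are tied. Both depend only on the ordering of the values, not on their magnitudes.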

Relevance: 20.00%

Abstract:

This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e. repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and evolutionary algorithms of interest are provided. Examples with computational experiments and statistical analyses are also presented.

Relevance: 20.00%

Abstract:

Partition of Unity Implicits (PUI) have recently been introduced for surface reconstruction from point clouds. In this work, we propose a PUI method that employs a set of well-observed solutions in order to produce geometrically pleasing results without requiring time-consuming or mathematically overloaded computations. One feature of our technique is the use of multivariate orthogonal polynomials in the least-squares approximation, which allows the recursive refinement of the local fittings in terms of the degree of the polynomial. However, since the use of high-order approximations based only on the number of available points is not reliable, we introduce the concept of coverage domain. In addition, the method relies on the use of an algebraically defined triangulation to handle two important tasks in PUI: the spatial decomposition and an adaptive polygonization. As the spatial subdivision is based on tetrahedra, the generated mesh may present poorly shaped triangles, which are improved in this work by means of a specific vertex displacement technique. Furthermore, we also address sharp features and raw data treatment. A further contribution is based on the PUI locality property, which leads to an intuitive scheme for improving or repairing the surface by means of editing local functions.
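
The partition-of-unity idea underlying PUI can be shown in one dimension: local approximations are blended with compactly supported weights that are normalized to sum to one at every point. The sketch below is only that blending step, under assumed names (`bump`, `pu_blend`) and an assumed weight function. It does not include the orthogonal-polynomial fitting, coverage domain, or triangulation described in the abstract.

```python
def bump(x, center, radius):
    """Compactly supported weight, positive only within `radius` of `center`."""
    t = abs(x - center) / radius
    return (1.0 - t * t) ** 2 if t < 1.0 else 0.0


def pu_blend(x, patches):
    """Blend local functions with normalized weights.

    `patches` is a list of (center, radius, local_function) tuples.
    """
    weights = [bump(x, center, radius) for center, radius, _ in patches]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x is not covered by any patch")
    blended = sum(w * f(x) for w, (_, _, f) in zip(weights, patches))
    return blended / total  # dividing by the total makes the weights sum to one
```

Because the normalized weights sum to one, the blend reproduces any function that all overlapping local fits agree on, and editing a single local function changes the result only inside that patch's support, which is the locality property the abstract refers to.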

Relevance: 20.00%

Abstract:

We study a symplectic chain with a non-local form of coupling by means of a standard map lattice where the interaction strength decreases with the lattice distance as a power law, in such a way that one can pass continuously from a local (nearest-neighbor) to a global (mean-field) type of coupling. We investigate the formation of map clusters, or spatially coherent structures generated by the system dynamics. Such clusters are found to be related to stickiness of chaotic phase-space trajectories near periodic island remnants, and also to the behavior of the diffusion coefficient. An approximate two-dimensional map is derived to explain some of the features of this connection. (C) 2008 Elsevier Ltd. All rights reserved.

Relevance: 20.00%

Abstract:

We analyze by numerical simulations and mean-field approximations an asymmetric version of the stochastic sandpile model with height restriction in one dimension. Each site can have at most two particles. Single particles are inactive and do not move. Two particles occupying the same site are active and may hop to neighboring sites following an asymmetric rule. Jumps to the right or to the left occur with distinct probabilities. In the active state, there will be a net current of particles to the right or to the left. We have found that the critical behavior related to the transition from the active to the absorbing state is distinct from the symmetrical case, making the asymmetry a relevant field.
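
The rules stated in this abstract can be sketched as a small Monte Carlo simulation. Several details are assumptions not given in the abstract: a periodic ring, random sequential updates, both particles of the chosen active site attempting to hop independently, and rejection of any hop onto a site that already holds two particles. The names `sandpile_step`, `run`, and `p_right` are illustrative.

```python
import random


def sandpile_step(heights, p_right, rng):
    """Update one randomly chosen active site; return False if none exists."""
    n = len(heights)
    active = [i for i, h in enumerate(heights) if h == 2]
    if not active:
        return False  # absorbing state: no site holds two particles
    i = rng.choice(active)
    for _ in range(2):  # each of the two particles attempts one hop
        step = 1 if rng.random() < p_right else -1
        target = (i + step) % n  # periodic boundary (assumption)
        if heights[target] < 2:  # height restriction: at most two per site
            heights[i] -= 1
            heights[target] += 1
    return True


def run(n_sites=50, n_particles=60, p_right=0.7, steps=2000, seed=0):
    """Place particles at random, then evolve for up to `steps` updates."""
    if n_particles > 2 * n_sites:
        raise ValueError("at most two particles fit on each site")
    rng = random.Random(seed)
    heights = [0] * n_sites
    placed = 0
    while placed < n_particles:
        i = rng.randrange(n_sites)
        if heights[i] < 2:
            heights[i] += 1
            placed += 1
    for _ in range(steps):
        if not sandpile_step(heights, p_right, rng):
            break
    return heights
```

Setting `p_right` to 0.5 gives the symmetric case under the same assumptions. The particle number is conserved, so the density `n_particles / n_sites` acts as the control parameter for the transition between the active and absorbing states.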

Relevance: 20.00%

Abstract:

In 1983, Chvátal, Trotter and the two senior authors proved that for any $\Delta$ there exists a constant $B$ such that, for any $n$, any 2-colouring of the edges of the complete graph $K_N$ with $N \ge Bn$ vertices yields a monochromatic copy of any graph $H$ that has $n$ vertices and maximum degree $\Delta$. We prove that the complete graph may be replaced by a sparser graph $G$ that has $N$ vertices and $O(N^{2-1/\Delta} \log^{1/\Delta} N)$ edges, with $N = \lceil B'n \rceil$ for some constant $B'$ that depends only on $\Delta$. Consequently, the so-called size-Ramsey number of any $H$ with $n$ vertices and maximum degree $\Delta$ is $O(n^{2-1/\Delta} \log^{1/\Delta} n)$. Our approach is based on random graphs; in fact, we show that the classical Erdős-Rényi random graph with the numerical parameters above satisfies a stronger partition property with high probability, namely, that any 2-colouring of its edges contains a monochromatic universal graph for the class of graphs on $n$ vertices and maximum degree $\Delta$. The main tool in our proof is the regularity method, adapted to a suitable sparse setting. The novel ingredient developed here is an embedding strategy that allows one to embed bounded degree graphs of linear order in certain pseudorandom graphs. Crucial to our proof is the fact that regularity is typically inherited at a scale that is much finer than the scale at which it is assumed. (C) 2011 Elsevier Inc. All rights reserved.

Relevance: 20.00%

Abstract:

In the present work, a new approach for the determination of the partition coefficient at different interfaces, based on density functional theory, is proposed. Our results for log P_ow at an n-octanol/water interface in a large supercell, -0.30 (-0.24) for acetone and 0.95 (0.78) for methane, are comparable with the experimental data given in parentheses. We believe that these differences are mainly related to the absence of van der Waals interactions and the limited number of molecules considered in the supercell. The numerical deviations are smaller than those observed for interpolation-based tools. As the proposed model is parameter free, it is not limited to the n-octanol/water interface.