5 results for Clustering a large document collection

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance:

30.00%

Publisher:

Abstract:

We characterized 28 new isolates of Trypanosoma cruzi IIc (TCIIc) of mammals and triatomines from Northern to Southern Brazil, confirming the widespread distribution of this lineage. Phylogenetic analyses using cytochrome b and SSU rDNA sequences clearly separated TCIIc from TCIIa according to terrestrial and arboreal ecotopes of their preferential mammalian hosts and vectors. TCIIc was more closely related to TCIId/e, followed by TCIIa, and separated by large distances from TCIIb and TCI. Despite being indistinguishable by traditional genotyping and generally being assigned to Z3, we provide evidence that TCIIa from South America and TCIIa from North America correspond to independent lineages that circulate in distinct hosts and ecological niches. Armadillos, terrestrial didelphids and rodents, and domestic dogs were found infected by TCIIc in Brazil. We believe that, in Brazil, this is the first description of TCIIc from rodents and domestic dogs. Terrestrial triatomines of genera Panstrongylus and Triatoma were confirmed as vectors of TCIIc. Together, habitat, mammalian host and vector association corroborated the link between TCIIc and terrestrial transmission cycles/ecological niches. Analysis of ITS1 rDNA sequences disclosed clusters of TCIIc isolates in accordance with their geographic origin, independent of their host species. (C) 2009 Elsevier B.V. All rights reserved.

Relevance:

30.00%

Publisher:

Abstract:

Autosomal recessive spastic paraplegia with thinning of corpus callosum (ARHSP-TCC) is a complex form of HSP initially described in Japan but subsequently reported to have a worldwide distribution, with a particularly high frequency in multiple families from the Mediterranean basin. We recently showed that ARHSP-TCC is commonly associated with mutations in SPG11/KIAA1840 on chromosome 15q. We have now screened a collection of new patients, mainly originating from Italy and Brazil, in order to further ascertain the spectrum of mutations in SPG11, broaden the range of ethnic origins of SPG11 patients, determine the relative frequency at the level of single countries (i.e., Italy), and establish whether there are one or more common mutations. In 25 index cases we identified 32 mutations; 22 are novel, including 9 nonsense, 3 small deletions, 4 insertions, 1 in/del, 1 small duplication, 1 missense, 2 splice-site, and, for the first time, a large genomic rearrangement. This brings the total number of SPG11 mutated patients in the SPATAX collection to 111 cases in 44 families and in 17 isolated cases, from 16 countries, all assessed using homogeneous clinical criteria. While expanding the spectrum of mutations in SPG11, this larger series also corroborated the notion that even within an apparently homogeneous population a molecular diagnosis cannot be achieved without full gene sequencing. (C) 2008 Wiley-Liss, Inc.

Relevance:

30.00%

Publisher:

Abstract:

This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.
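
As a rough illustration of the systematic idea mentioned in this abstract (try several cluster counts and keep the best-scoring one), the sketch below scores candidate partitions using nothing but a proximity matrix. It is not the paper's method: it uses crisp labels and the average silhouette width in place of a fuzzy validity index, and the names `silhouette` and `best_k` are hypothetical.

```python
def silhouette(dist, labels):
    """Average silhouette width from a distance matrix (needs >= 2 clusters)."""
    n = len(dist)
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the object's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist[i][j] for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def best_k(dist, candidates):
    """candidates maps a cluster count k to a label list; return the best k."""
    return max(candidates, key=lambda k: silhouette(dist, candidates[k]))
```

In practice the candidate labelings would come from running a relational clustering algorithm once per value of k; here they are simply passed in, so the scoring step stays independent of the clustering step.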

Relevance:

30.00%

Publisher:

Abstract:

A large amount of biological data has been produced in recent years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its own bias and is better suited to particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover, it presents a clustering algorithm that solves this formulation using GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three other well-known algorithms. The proposed algorithm produced the best clustering results, as confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
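
The general shape of GRASP can be sketched as follows: each iteration builds a solution greedily but with randomness (choosing from a restricted candidate list), improves it by local search, and keeps the best result seen. The abstract does not give the paper's formulation, so this sketch assumes a simple stand-in objective (total distance from each object to its nearest medoid); the function names and the `alpha`, `iterations`, and `seed` parameters are illustrative only.

```python
import random


def cost(dist, medoids):
    """Total distance from every object to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def construct(dist, k, alpha, rng):
    """Greedy randomized construction: add one medoid at a time from an RCL."""
    medoids = []
    while len(medoids) < k:
        scored = sorted((cost(dist, medoids + [c]), c)
                        for c in range(len(dist)) if c not in medoids)
        best, worst = scored[0][0], scored[-1][0]
        threshold = best + alpha * (worst - best)
        # Restricted candidate list: candidates close enough to the greedy best
        rcl = [c for value, c in scored if value <= threshold]
        medoids.append(rng.choice(rcl))
    return medoids


def local_search(dist, medoids):
    """Swap a medoid for a non-medoid for as long as a swap lowers the cost."""
    current = cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for c in range(len(dist)):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = cost(dist, trial)
                if trial_cost < current:
                    medoids, current, improved = trial, trial_cost, True
    return medoids, current


def grasp(dist, k, iterations=20, alpha=0.3, seed=0):
    """Repeat construction plus local search and keep the best solution."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(iterations):
        medoids, value = local_search(dist, construct(dist, k, alpha, rng))
        if value < best_cost:
            best_medoids, best_cost = sorted(medoids), value
    return best_medoids, best_cost
```

Setting `alpha` to 0 makes the construction purely greedy, while 1 makes it purely random; intermediate values trade solution quality against diversity across iterations.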

Relevance:

30.00%

Publisher:

Abstract:

The relationship between the structure and function of biological networks constitutes a fundamental issue in systems biology. In particular, the structure of protein-protein interaction networks is related to important biological functions. In this work, we investigated how the resilience of these networks is determined by their large-scale features. Four species are taken into account, namely the yeast Saccharomyces cerevisiae, the worm Caenorhabditis elegans, the fly Drosophila melanogaster, and Homo sapiens. We adopted two entropy-related measurements (degree entropy and dynamic entropy) in order to quantify the overall degree of robustness of these networks. We verified that while they exhibit similar structural variations under random node removal, they differ significantly when subjected to intentional attacks (hub removal). In fact, more complex species tended to exhibit more robust networks. More specifically, we quantified how six important measurements of network topology (namely clustering coefficient, average degree of neighbors, average shortest path length, diameter, assortativity coefficient, and slope of the power law degree distribution) correlated with the two entropy measurements. Our results revealed that the fraction of hubs and the average neighbor degree contribute significantly to the resilience of networks. In addition, the topological analysis of the removed hubs indicated that the presence of alternative paths between the proteins connected to hubs tends to reinforce resilience. This analysis helps to explain what underlies resilience in networks and can be applied to the development of protein network models.
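
To make the hub-removal experiment concrete, the sketch below computes a degree entropy before and after deleting the highest-degree node. It assumes that degree entropy means the Shannon entropy of the degree distribution, which the abstract does not spell out; the function names and the toy star network are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def degrees(nodes, edges):
    """Map each node to its degree in an undirected network."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def degree_entropy(nodes, edges):
    """Shannon entropy (natural log) of the degree distribution."""
    counts = Counter(degrees(nodes, edges).values())
    n = len(nodes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def hub(nodes, edges):
    """Highest-degree node; ties go to the earliest node in `nodes`."""
    degree = degrees(nodes, edges)
    return max(nodes, key=lambda v: degree[v])


def remove_node(nodes, edges, target):
    """Return the network without `target` and its incident edges."""
    return ([v for v in nodes if v != target],
            [(u, v) for u, v in edges if target not in (u, v)])


# Toy star network: node 0 is the hub, nodes 1 to 4 are leaves.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
before = degree_entropy(nodes, edges)
after = degree_entropy(*remove_node(nodes, edges, hub(nodes, edges)))
```

On the star network, removing the hub leaves four isolated nodes that all share degree 0, so the degree distribution collapses to a single value and the entropy drops to zero.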