70 resultados para Subtractive clustering

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the southern region of Mato Grosso do Sul state, Brazil, a foot-and-mouth disease (FMD) epidemic started in September 2005. A total of 33 outbreaks were detected and 33,741 FMD-susceptible animals were slaughtered and destroyed. There were no reports of FMD cases in other species than bovines. Based on the data of this epidemic, it was carried out an analysis using the K-function and it was observed spatial clustering of outbreaks within a range of 25km. This observation may be related to the dynamics of foot-and-mouth disease spread and to the measures undertaken to control the disease dissemination. The control measures were effective once the disease did not spread to farms more than 47 km apart from the initial outbreaks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Melanoma progression occurs through three major stages: radial growth phase (RGP), confined to the epidermis; vertical growth phase (VGP), when the tumor has invaded into the dermis; and metastasis. In this work, we used suppression subtractive hybridization (SSH) to investigate the molecular signature of melanoma progression, by comparing a group of metastatic cell lines with an RGP-like cell line showing characteristics of early neoplastic lesions including expression of the metastasis suppressor KISS1, lack of alpha v beta 3-integrin and low levels of RHOC. Methods: Two subtracted cDNA collections were obtained, one (RGP library) by subtracting the RGP cell line (WM1552C) cDNA from a cDNA pool from four metastatic cell lines (WM9, WM852, 1205Lu and WM1617), and the other (Met library) by the reverse subtraction. Clones were sequenced and annotated, and expression validation was done by Northern blot and RT-PCR. Gene Ontology annotation and searches in large-scale melanoma expression studies were done for the genes identified. Results: We identified 367 clones from the RGP library and 386 from the Met library, of which 351 and 368, respectively, match human mRNA sequences, representing 288 and 217 annotated genes. We confirmed the differential expression of all genes selected for validation. In the Met library, we found an enrichment of genes in the growth factors/receptor, adhesion and motility categories whereas in the RGP library, enriched categories were nucleotide biosynthesis, DNA packing/repair, and macromolecular/vesicular trafficking. Interestingly, 19% of the genes from the RGP library map to chromosome 1 against 4% of the ones from Met library. Conclusion: This study identifies two populations of genes differentially expressed between melanoma cell lines from two tumor stages and suggests that these sets of genes represent profiles of less aggressive versus metastatic melanomas. A search for expression profiles of melanoma in available expression study databases allowed us to point to a great potential of involvement in tumor progression for several of the genes identified here. A few sequences obtained here may also contribute to extend annotated mRNAs or to the identification of novel transcripts.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: During mating, insect males eject accessory gland proteins (Acps) into the female genital tract. These substances are known to affect female post-mating behavior and physiology. In addition, they may harm the female, e. g., in reducing its lifespan. This is interpreted as a consequence of sexual antagonistic co-evolution. Whereas sexual conflict abounds in non-social species, the peculiar life history of social insects (ants, bees, wasps) with lifelong pair-bonding and no re-mating aligns the reproductive interests of the sexes. Harming the female during mating would negatively affect male fitness and sexual antagonism is therefore not expected. Indeed, mating appears to increase female longevity in at least one ant species. Acps are presumed to play a role in this phenomenon, but the underlying mechanisms are unknown. In this study, we investigated genes, which are preferentially expressed in male accessory glands of the ant Leptothorax gredleri, to determine which proteins might be transferred in the seminal fluid. Results: By a suppression subtractive hybridization protocol we obtained 20 unique sequences (USs). Twelve had mutual best matches with genes predicted for Apis mellifera and Nasonia vitripennis. Functional information (Gene Ontology) was available only for seven of these, including intracellular signaling, energy-dependent transport and metabolic enzyme activities. The remaining eight USs did not match sequences from other species. Six genes were further analyzed by quantitative RT-PCR in three life cycle stages of male ants. A gene with carboxy-lyase activity and one of unpredicted function were significantly overexpressed in accessory glands of sexually mature males. Conclusions: Our study is the first one to investigate differential gene expression in ants in a context related to mating. Our findings indicate that male accessory glands of L. gredleri express a series of genes that are unique to this species, possibly representing novel genes, in addition to conserved ones for which functions can be predicted. Identifying differentially expressed genes might help to better understand molecular mechanisms involved in reproductive processes in eusocial Hymenoptera. While the novel genes could account for rapidly evolving ones driven by intra-sexual conflict between males, conserved genes imply that rather beneficial traits might get fixed by a process described as inter-sexual cooperation between males and females.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A graph clustering algorithm constructs groups of closely related parts and machines separately. After they are matched for the least intercell moves, a refining process runs on the initial cell formation to decrease the number of intercell moves. A simple modification of this main approach can deal with some practical constraints, such as the popular constraint of bounding the maximum number of machines in a cell. Our approach makes a big improvement in the computational time. More importantly, improvement is seen in the number of intercell moves when the computational results were compared with best known solutions from the literature. (C) 2009 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The fruit of banana undergoes several important physico-chemical changes during ripening. Analysis of gene expression would permit identification of important genes and regulatory elements involved in this process. Therefore, transcript profiling of preclimacteric and climacteric fruit was performed using differential display and Suppression subtractive hybridization. Our analyses resulted in the isolation of 12 differentially expressed cDNAs, which were confirmed by dot-blots and northern blots. Among the sequences identified were sequences homologous to plant aquaporins, adenine nucleotide translocator, immunophilin, legumin-like proteins, deoxyguanosine kinase and omega-3 fatty acid desaturase. Some of these cDNAs correspond to newly isolated genes involved in changes related to the respiratory climacteric, or stress-defense responses. Functional characterization of ripening-associated genes could provide information useful in controlling biochemical pathways that would have an impact on banana quality and shelf life. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Trypanosoma (Megatrypanum) theileri from cattle and trypanosomes of other artiodactyls form a clade of closely related species in analyses using ribosomal sequences. Analysis of polymorphic sequences of a larger number of trypanosomes from broader geographical origins is required to evaluate the Clustering of isolates as suggested by previous studies. Here, we determined the sequences of the spliced leader (SL) genes of 21 isolates from cattle and 2 from water buffalo from distant regions of Brazil. Analysis of SL gene repeats revealed that the 5S rRNA gene is inserted within the intergenic region. Phylogeographical patterns inferred using SL sequences showed at least 5 major genotypes of T. theileri distributed in 2 strongly divergent lineages. Lineage TthI comprises genotypes IA and IB from buffalo and cattle, respectively, from the Southeast and Central regions, whereas genotype IC is restricted to cattle from the Southern region. Lineage Tth II includes cattle genotypes IIA, which is restricted to the North and Northeast, and IIB, found in the Centre, West, North and Northeast. PCR-RFLP of SL genes revealed valuable markers for genotyping T. theileri. The results of this study emphasize the genetic complexity and corroborate the geographical structuring of T. theileri genotypes found in cattle.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We characterized 28 new isolates of Trypanosoma cruzi IIc (TCIIc) of mammals and triatomines from Northern to Southern Brazil, confirming the widespread distribution of this lineage. Phylogenetic analyses using cytochrome b and SSU rDNA sequences clearly separated TCIIc from TCIIa according to terrestrial and arboreal ecotopes of their preferential mammalian hosts and vectors. TCIIc was more closely related to TCIId/e, followed by TCIIa, and separated by large distances from TCIIb and TCI. Despite being indistinguishable by traditional genotyping and generally being assigned to Z3, we provide evidence that TCIIa from South America and TCIIa from North America correspond to independent lineages that circulate in distinct hosts and ecological niches. Armadillos, terrestrial didelphids and rodents, and domestic dogs were found infected by TCIIc in Brazil. We believe that, in Brazil, this is the first description of TCIIc from rodents and domestic dogs. Terrestrial triatomines of genera Panstrongylus and Triatoma were confirmed as vectors of TCIIc. Together, habitat, mammalian host and vector association corroborated the link between TCIIc and terrestrial transmission cycles/ecological niches. Analysis of ITS1 rDNA sequences disclosed clusters of TCIIc isolates in accordance with their geographic origin, independent of their host species. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK Multi-Objective Clustering with automatic K-determination and MOCLE-Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms can often contains a high number of partitions, becoming difficult for an expert to manually analyze all of them. In order to deal with this problem, we present two selection strategies, which are based on the corrected Rand, to choose a subset of solutions. To test them, they are applied to the set of solutions produced by MOCK and MOCLE in the context of several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analysis show that both versions of selection strategy proposed are very effective. They can significantly reduce the number of solutions and, at the same time, keep the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The algorithm proposed can deal with data sets presenting different types of clusters, without the need of expertise in cluster analysis. its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with multi-objective Clustering with automatic K-determination (MOCK). the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A conceptual problem that appears in different contexts of clustering analysis is that of measuring the degree of compatibility between two sequences of numbers. This problem is usually addressed by means of numerical indexes referred to as sequence correlation indexes. This paper elaborates on why some specific sequence correlation indexes may not be good choices depending on the application scenario in hand. A variant of the Product-Moment correlation coefficient and a weighted formulation for the Goodman-Kruskal and Kendall`s indexes are derived that may be more appropriate for some particular application scenarios. The proposed and existing indexes are analyzed from different perspectives, such as their sensitivity to the ranks and magnitudes of the sequences under evaluation, among other relevant aspects of the problem. The results help suggesting scenarios within the context of clustering analysis that are possibly more appropriate for the application of each index. (C) 2008 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e. repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and evolutionary algorithms under interest are provided. Examples with computational experiments and statistical analyses are also presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustering quality or validation indices allow the evaluation of the quality of clustering in order to support the selection of a specific partition or clustering structure in its natural unsupervised environment, where the real solution is unknown or not available. In this paper, we investigate the use of quality indices mostly based on the concepts of clusters` compactness and separation, for the evaluation of clustering results (partitions in particular). This work intends to offer a general perspective regarding the appropriate use of quality indices for the purpose of clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, key issues regarding the practical use of quality indices are addressed. A general methodological approach is presented which considers the identification of appropriate indices thresholds. This general approach is compared with the simple use of quality indices for evaluating a clustering solution.