48 results for microarray data classification

in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance:

100.00%

Abstract:

Due to the imprecise nature of biological experiments, biological data are often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
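
The abstract does not name the specific distance-based techniques it evaluates. As a minimal sketch of the general idea, the function below applies an edited-nearest-neighbour-style rule, which is an assumption chosen for illustration: an instance is kept only if its label matches the majority label of its k nearest neighbours. The name `knn_noise_filter` and the default `k=3` are hypothetical.

```python
import math
from collections import Counter

def knn_noise_filter(X, y, k=3):
    """Return indices of instances whose label matches the majority
    label among their k nearest neighbours (Euclidean distance).
    Illustrative rule only; not the method of the abstract above."""
    keep = []
    for i, xi in enumerate(X):
        # Distances from instance i to every other instance.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority == y[i]:
            keep.append(i)  # label agrees with its neighbourhood
    return keep
```

A classifier would then be trained only on the instances whose indices are returned, and its accuracy compared with that obtained on the unfiltered data.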

Relevance:

100.00%

Abstract:

In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The proposed algorithm can deal with data sets presenting different types of clusters, without the need for expertise in cluster analysis. Its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with Multi-Objective Clustering with automatic K-determination (MOCK), the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.
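
To make the notion of "alternative trade-offs among the objective functions" concrete, here is a small sketch of Pareto dominance over candidate partitions. It assumes each partition has already been scored on several validation measures and that lower is better for all of them; the function names are hypothetical, and the genetic algorithm and crossover operator from the abstract are not reproduced.

```python
def dominates(a, b):
    """True if score vector a is no worse than b on every objective
    and strictly better on at least one (all objectives minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(scores):
    """Return indices of the score vectors not dominated by any other."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]
```

The partitions on the returned front are mutually incomparable: improving one objective means worsening another.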

Relevance:

90.00%

Abstract:

Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK, Multi-Objective Clustering with automatic K-determination, and MOCLE, Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms can often contain a high number of partitions, making it difficult for an expert to manually analyze all of them. In order to deal with this problem, we present two selection strategies, which are based on the corrected Rand index, to choose a subset of solutions. To test them, they are applied to the set of solutions produced by MOCK and MOCLE in the context of several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both versions of the proposed selection strategy are very effective. They can significantly reduce the number of solutions and, at the same time, keep the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.
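
The corrected Rand index mentioned above measures the agreement between two partitions, adjusted for chance. The sketch below computes it from the standard pair-counting formula; the function name is hypothetical, and the two selection strategies themselves are not described in the abstract and are not reproduced here.

```python
from collections import Counter
from math import comb

def corrected_rand(labels_a, labels_b):
    """Corrected (adjusted) Rand index between two partitions given as
    equal-length label sequences. Assumes at least two objects."""
    n = len(labels_a)
    # Pairs of objects placed together in both partitions.
    together = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (together - expected) / (maximum - expected)
```

A value of 1 means the two partitions are identical up to relabelling, and values near 0 mean agreement no better than chance, so pairwise values can be used to judge how redundant two solutions are.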

Relevance:

80.00%

Abstract:

Xylella fastidiosa is a Gram-negative plant pathogen causing many economically important diseases, and analyses of completely sequenced X. fastidiosa genome strains allowed the identification of many prophage-like elements and possibly phage remnants, accounting for up to 15% of the genome composition. To better evaluate the recent evolution of the X. fastidiosa chromosome backbone among distinct pathovars, the number and location of prophage-like regions in two finished genomes (9a5c and Temecula1) and in two candidate molecules (Ann1 and Dixon) were assessed. Based on comparative best bidirectional hit analyses, the majority (51%) of the predicted genes in the X. fastidiosa prophage-like regions are related to structural phage genes belonging to the Siphoviridae family. An electron micrograph reveals the existence of putative viral particles with morphology similar to lambda phages in the bacterial cell in planta. Moreover, analysis of microarray data indicates that the 9a5c strain cultivated under stress conditions presents enhanced expression of phage anti-repressor genes, suggesting switches from the lysogenic to the lytic cycle of phages under stress-induced situations. Furthermore, virulence-associated proteins and toxins are found within these prophage-like elements, thus suggesting an important role in host adaptation. Finally, clustering analyses of phage integrase genes based on multiple alignment patterns reveal that they group in five lineages, all possessing a tyrosine recombinase catalytic domain and phylogenetically close to other integrases found in phages that are genetic mosaics and able to perform generalized and specialized transduction. Integration sites and tRNA association are also evidenced. In summary, we present comparative and experimental evidence supporting the association and contribution of phage activity to the differentiation of Xylella genomes.

Relevance:

80.00%

Abstract:

Background: Microarray techniques have become an important tool for the investigation of genetic relationships and the assignment of different phenotypes. Since microarrays are still very expensive, most experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance that reveal characteristic features of the processes driving the organism's adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The utilization of systems biology improves the biologists' ability to elucidate the mechanisms underlying cellular processes and to formulate new hypotheses.

Relevance:

80.00%

Abstract:

Background: Several studies in the literature describe measurement error in gene expression data, and several others address regulatory network models. However, only a small fraction combine measurement error with mathematical regulatory networks and show how to identify these networks under different rates of noise. Results: This article investigates the effects of measurement error on the estimation of the parameters in regulatory networks. Simulation studies indicate that, in both time series (dependent) and non-time series (independent) data, the measurement error strongly affects the estimated parameters of the regulatory network models, biasing them as predicted by the theory. Moreover, when testing the parameters of the regulatory network models, p-values computed by ignoring the measurement error are not reliable, since the rate of false positives is not controlled under the null hypothesis. In order to overcome these problems, we present an improved version of the Ordinary Least Squares estimator for independent (regression models) and dependent (autoregressive models) data when the variables are subject to noise. Moreover, measurement error estimation procedures for microarrays are also described. Simulation results also show that both corrected methods perform better than the standard ones (i.e., ignoring measurement error). The proposed methodologies are illustrated using microarray data from lung cancer patients and mouse liver time series data. Conclusions: Measurement error dangerously affects the identification of regulatory network models; thus, it must be reduced or taken into account in order to avoid erroneous conclusions. This could be one of the reasons for the high biological false positive rates identified in actual regulatory network models.
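
The bias "predicted by the theory" can be illustrated with the classical errors-in-variables result for a single predictor: when the predictor is observed with additive noise of known variance, the ordinary least squares slope shrinks toward zero, and subtracting the noise variance from the observed predictor variance undoes the shrinkage. The sketch below is that textbook correction under assumed simulation settings, not the estimator proposed in the article; all names and parameter values are hypothetical.

```python
import random

def _centered_sums(w, y):
    """Return (sum of cross-products, sum of squares of w) about the means."""
    n = len(w)
    mw, my = sum(w) / n, sum(y) / n
    swy = sum((a - mw) * (b - my) for a, b in zip(w, y))
    sww = sum((a - mw) ** 2 for a in w)
    return swy, sww

def ols_slope(w, y):
    """Standard OLS slope, ignoring measurement error in w."""
    swy, sww = _centered_sums(w, y)
    return swy / sww

def corrected_slope(w, y, noise_var):
    """Slope with the known measurement-error variance of w removed
    from the denominator (classical attenuation correction)."""
    swy, sww = _centered_sums(w, y)
    return swy / (sww - (len(w) - 1) * noise_var)

def simulate(n=20000, beta=2.0, noise_sd=1.0, seed=0):
    """Simulate y = beta * x + e, with x observed only as w = x + u."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [xi + rng.gauss(0.0, noise_sd) for xi in x]
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]
    return w, y
```

With unit variance for both the true predictor and the noise, the uncorrected slope is expected to be roughly half the true value, while the corrected slope should land close to it.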

Relevance:

80.00%

Abstract:

Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Especially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or groups of pixels in order to perform segmentation tasks. The first important contribution of this paper is the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications to the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model's limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining Sparse Code Shrinkage (SCS) for image denoising and the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied to segment images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.

Relevance:

80.00%

Abstract:

Moniliophthora perniciosa is a hemibiotrophic fungus that causes witches' broom disease (WBD) in cacao. Marked dimorphism characterizes this fungus, showing a monokaryotic or biotrophic phase that causes disease symptoms and a later dikaryotic or saprotrophic phase. A combined strategy of DNA microarray, expressed sequence tag, and real-time reverse-transcriptase polymerase chain reaction analyses was employed to analyze differences between these two fungal stages in vitro. In all, 1,131 putative genes were hybridized with cDNA from different phases, resulting in 189 differentially expressed genes, and 4,595 reads were clustered, producing 1,534 unigenes. The analysis of these genes, which represent approximately 21% of the total genes, indicates that the biotrophic-like phase undergoes carbon and nitrogen catabolite repression that correlates with the expression of phytopathogenicity genes. Moreover, downregulation of mitochondrial oxidative phosphorylation and the presence of a putative ngr1 of Saccharomyces cerevisiae could help explain its lower growth rate. In contrast, the saprotrophic mycelium expresses genes related to the metabolism of hexoses, ammonia, and oxidative phosphorylation, which could explain its faster growth. Antifungal toxins were upregulated and could prevent colonization by competing fungi. This work significantly contributes to our understanding of the molecular mechanisms of WBD and, to our knowledge, is the first to analyze differential gene expression of the different phases of a hemibiotrophic fungus.

Relevance:

80.00%

Abstract:

The Down syndrome (DS) immune phenotype is characterized by thymus hypotrophy, higher propensity to organ-specific autoimmune disorders, and higher susceptibility to infections, among other features. Considering that AIRE (autoimmune regulator) is located on 21q22.3, we analyzed protein and gene expression in surgically removed thymuses from 14 DS patients with congenital heart defects, who were compared with 42 age-matched controls with heart anomaly as an isolated malformation. Immunohistochemistry revealed 70.48 ± 49.59 AIRE-positive cells/mm² in DS versus 154.70 ± 61.16 AIRE-positive cells/mm² in controls (p < 0.0001), and quantitative PCR as well as DNA microarray data confirmed those results. The number of FOXP3-positive cells/mm² was equivalent in both groups. Thymus transcriptome analysis showed 407 genes significantly hypoexpressed in DS, most of which were related, according to network transcriptional analysis (FunNet), to cell division and to immunity. Immune response-related genes included those involved in 1) Ag processing and presentation (HLA-DQB1, HLA-DRB3, CD1A, CD1B, CD1C, ERAP) and 2) thymic T cell differentiation (IL2RG, RAG2, CD3D, CD3E, PRDX2, CDK6) and selection (SH2D1A, CD74). It is noteworthy that relevant AIRE-partner genes, such as TOP2A, LAMNB1, and NUP93, were found hypoexpressed in DNA microarrays and quantitative real-time PCR analyses. These findings on global thymic hypofunction in DS revealed molecular mechanisms underlying the DS immune phenotype and strongly suggest that DS immune abnormalities are present from early development, rather than being a consequence of precocious aging, as widely hypothesized. Thus, DS should be considered a non-monogenic primary immunodeficiency. The Journal of Immunology, 2011, 187: 3422-3430.

Relevance:

80.00%

Abstract:

Chagas disease, characterized by acute myocarditis and chronic cardiomyopathy, is caused by infection with the protozoan parasite Trypanosoma cruzi. We sought to identify genes altered during the development of parasite-induced cardiomyopathy. Microarrays containing 27,400 sequence-verified mouse cDNAs were used to analyze global gene expression changes in the myocardium of a murine model of chagasic cardiomyopathy. Changes in gene expression were determined as the acute stage of infection developed into the chronic stage. This analysis was performed on the hearts of male CD-1 mice infected with trypomastigotes of T. cruzi (Brazil strain). At each interval we compared infected and uninfected mice and confirmed the microarray data with dye reversal. We identified eight distinct categories of mRNAs that were differentially regulated during infection and identified dysregulation of several key genes. These data may provide insight into the pathogenesis of chagasic cardiomyopathy and provide new targets for intervention. (c) 2008 Elsevier Inc. All rights reserved.

Relevance:

80.00%

Abstract:

Objectives: To evaluate the gene expression profile of fibroblasts from affected and non-affected skin of systemic sclerosis (SSc) patients and from controls. Materials and methods: Labeled cDNA from fibroblast cultures from forearm (affected) and axillary (non-affected) skin from six diffuse SSc patients, from three normal controls, and from MOLT-4/HEp-2/normal fibroblasts (reference pool) was probed in microarrays generated with 4193 human cDNAs from the IMAGE Consortium. Microarray images were converted into numerical data, and gene expression was calculated as the ratio between fibroblast cDNA (Cy5) and reference pool cDNA (Cy3) data and analyzed with the R environment/Aroma, Cluster, Tree View, and SAM software. Differential expression was confirmed by real-time PCR for a set of selected genes. Results: Eighty-eight genes were up-regulated and 241 genes down-regulated in SSc fibroblasts. Gene expression correlation was strong between affected and non-affected fibroblast samples from the same patient (r>0.8), moderate among fibroblasts from all patients (r=0.72) and among fibroblasts from all controls (r=0.70), and modest among fibroblasts from patients and controls (r=0.55). The differential expression was confirmed by real-time PCR for all selected genes. Conclusions: Fibroblasts from affected and non-affected skin of SSc patients shared a similar abnormal gene expression profile, suggesting that the widespread molecular disturbance in SSc fibroblasts is more sensitive than histological and clinical alterations. Novel molecular elements potentially involved in SSc pathogenesis were identified.
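
The two quantities reported above, an expression ratio per gene and a correlation between samples, can be sketched as follows. The abstract states only that expression was the Cy5/Cy3 ratio; taking the base-2 logarithm and using the Pearson coefficient for r are assumptions made here for illustration, and the function names are hypothetical.

```python
import math

def log2_ratios(cy5, cy3):
    """Per-gene log2(Cy5 / Cy3); intensities are assumed positive."""
    return [math.log2(a / b) for a, b in zip(cy5, cy3)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

On the log scale, a twofold increase and a twofold decrease relative to the reference pool become +1 and -1, which keeps up- and down-regulation symmetric.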

Relevance:

80.00%

Abstract:

Urinary bladder cancer is the fourth most common malignancy in the Western world. Transitional cell carcinoma (TCC) is the most common subtype, accounting for about 90% of all bladder cancers. The TP53 gene plays an essential role in the regulation of the cell cycle and apoptosis and therefore contributes to cellular transformation and malignancy; however, little is known about the differential gene expression patterns in human tumors that present with the wild-type or mutated TP53 gene. Therefore, because gene profiling can provide new insights into the molecular biology of bladder cancer, the present study aimed to compare the molecular profiles of bladder cancer cell lines with different TP53 alleles, including the wild type (RT4) and two mutants (5637, with mutations in codons 280 and 72; and T24, a TP53 allele encoding an in-frame deletion of tyrosine 126). Unsupervised hierarchical clustering and gene networks were constructed based on data generated by cDNA microarrays using mRNA from the three cell lines. Differentially expressed genes related to the cell cycle, cell division, cell death, and cell proliferation were observed in the three cell lines. However, the cDNA microarray data did not cluster cell lines based on their TP53 allele. The gene profiles of the RT4 cells were more similar to those of T24 than to those of the 5637 cells. While the deregulation of both the cell cycle and the apoptotic pathways was particularly related to TCC, these alterations were not associated with the TP53 status.

Relevance:

80.00%

Abstract:

Gene expression profiling by cDNA microarrays during murine thymus ontogeny has contributed to dissecting the large-scale molecular genetics of T cell maturation. Gene profiling, although useful for characterizing the thymus developmental phases and identifying the differentially expressed genes, does not permit the determination of possible interactions between genes. In order to reconstruct genetic interactions, at the RNA level, within thymocyte differentiation, a pair of microarrays containing a total of 1,576 cDNA sequences derived from the IMAGE MTB library was applied to samples of developing thymuses (14-17 days of gestation). The data were analyzed using the GeneNetwork program. Genes that were previously identified as differentially expressed during thymus ontogeny showed their relationships with several other genes. The present method allowed the detection of gene nodes coding for proteins implicated in the calcium signaling pathway, such as Prrg2 and Stxbp3, and in protein transport toward the cell membrane, such as Gosr2. The results demonstrate the feasibility of reconstructing networks based on cDNA microarray gene expression determinations, contributing to a clearer understanding of the complex interactions between genes involved in thymus/thymocyte development.

Relevance:

80.00%

Abstract:

Because of the economic relevance of sugarcane and its high potential as a source of biofuel, it is important to understand how this crop will respond to the foreseen increase in atmospheric [CO2]. The effects of increased [CO2] on photosynthesis, development and carbohydrate metabolism were studied in sugarcane (Saccharum ssp.). Plants were grown at ambient (~370 ppm) and elevated (~720 ppm) [CO2] for 50 weeks in open-top chambers. At the end of this period, the plants grown under elevated CO2 showed an increase of about 30% in photosynthesis and 17% in height, and accumulated 40% more biomass in comparison with the plants grown at ambient [CO2]. These plants also had lower stomatal conductance and transpiration rates (-37 and -32%, respectively), and higher water-use efficiency (ca. 62%). cDNA microarray analyses revealed a differential expression of 35 genes in the leaves (14 repressed and 22 induced) under elevated CO2. The latter are mainly related to photosynthesis and development. Industrial productivity analysis showed an increase of about 29% in sucrose content. These data suggest that sugarcane crops increase productivity in higher [CO2], and that this might be related, as previously observed for maize and sorghum, to transient drought stress.

Relevance:

80.00%

Abstract:

Caulobacter crescentus sigma(E) belongs to the ECF (extracytoplasmic function) subfamily of RNA polymerase sigma factors, whose members regulate gene expression in response to distinct environmental stresses. Under physiological growth conditions, data indicate that sigma(E) is maintained at reduced levels due to the action of ChrR, a negative regulator of rpoE gene expression and function. However, once bacterial cells are exposed to cadmium, organic hydroperoxide, singlet oxygen or UV-A irradiation, transcription of rpoE is induced in a sigma(E)-dependent manner. Site-directed mutagenesis indicated that residue C188 in ChrR is critical for the cadmium response, while residues H140 and H142 are required for the bacterial response to organic hydroperoxide, singlet oxygen and UV-A. Global transcriptional analysis showed that sigma(E) regulates genes involved in protecting cells against oxidative damage. A combination of transcriptional start site identification and promoter prediction revealed that some of these genes contain a putative sigma(E)-dependent motif in their upstream regions. Furthermore, deletion of rpoE and two sigma(E)-dependent genes (cfaS and hsp20) impairs Caulobacter survival when singlet oxygen is constantly generated in the cells.