132 resultados para EXPRESSION DATA
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The algorithm proposed can deal with data sets presenting different types of clusters, without the need of expertise in cluster analysis. its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with multi-objective Clustering with automatic K-determination (MOCK). the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contaminations in laboratorial samples. It is the case of gene expression data, where the equipments and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
Resumo:
Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
Resumo:
Background: Microarray techniques have become an important tool to the investigation of genetic relationships and the assignment of different phenotypes. Since microarrays are still very expensive, most of the experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance that reveals characteristic features of the processes driving the organism adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The utilization of systems biology improves the biologists ability to elucidate the mechanisms underlying celular processes and to formulate new hypotheses.
Resumo:
Background: Without intensive selection, the majority of bovine oocytes submitted to in vitro embryo production (IVP) fail to develop to the blastocyst stage. This is attributed partly to their maturation status and competences. Using the Affymetrix GeneChip Bovine Genome Array, global mRNA expression analysis of immature (GV) and in vitro matured (IVM) bovine oocytes was carried out to characterize the transcriptome of bovine oocytes and then use a variety of approaches to determine whether the observed transcriptional changes during IVM was real or an artifact of the techniques used during analysis. Results: 8489 transcripts were detected across the two oocyte groups, of which similar to 25.0% (2117 transcripts) were differentially expressed (p < 0.001); corresponding to 589 over-expressed and 1528 under-expressed transcripts in the IVM oocytes compared to their immature counterparts. Over expression of transcripts by IVM oocytes is particularly interesting, therefore, a variety of approaches were employed to determine whether the observed transcriptional changes during IVM were real or an artifact of the techniques used during analysis, including the analysis of transcript abundance in oocytes in vitro matured in the presence of a-amanitin. Subsets of the differentially expressed genes were also validated by quantitative real-time PCR (qPCR) and the gene expression data was classified according to gene ontology and pathway enrichment. Numerous cell cycle linked (CDC2, CDK5, CDK8, HSPA2, MAPK14, TXNL4B), molecular transport (STX5, STX17, SEC22A, SEC22B), and differentiation (NACA) related genes were found to be among the several over-expressed transcripts in GV oocytes compared to the matured counterparts, while ANXA1, PLAU, STC1and LUM were among the over-expressed genes after oocyte maturation. Conclusion: Using sequential experiments, we have shown and confirmed transcriptional changes during oocyte maturation. This dataset provides a unique reference resource for studies concerned with the molecular mechanisms controlling oocyte meiotic maturation in cattle, addresses the existing conflicting issue of transcription during meiotic maturation and contributes to the global goal of improving assisted reproductive technology.
Resumo:
Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree k variation, decreasing its network recovery rate with the increase of k. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensible to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.
Resumo:
The expression of peripheral tissue antigens (PTAs) in the thymus by medullary thymic epithelial cells (mTECs) is essential for the central self-tolerance in the generation of the T cell repertoire. Due to heterogeneity of autoantigen representation, this phenomenon has been termed promiscuous gene expression (PGE), in which the autoimmune regulator (Aire) gene plays a key role as a transcription factor in part of these genes. Here we used a microarray strategy to access PGE in cultured murine CD80(+) 3.10 mTEC line. Hierarchical clustering of the data allowed observation that PTA genes were differentially expressed being possible to found their respective induced or repressed mRNAs. To further investigate the control of PGE, we tested the hypothesis that genes involved in this phenomenon might also be modulated by transcriptional network. We then reconstructed such network based on the microarray expression data, featuring the guanylate cyclase 2d (Gucy2d) gene as a main node. In such condition, we established 167 positive and negative interactions with downstream PTA genes. Silencing Aire by RNA interference, Gucy2d while down regulated established a larger number (355) of interactions with PTA genes. T- and G-boxes corresponding to AIRE protein binding sites located upstream to ATG codon of Gucy2d supports this effect. These findings provide evidence that Aire plays a role in association with Gucy2d, which is connected to Several PTA genes and establishes a cascade-like transcriptional control of promiscuous gene expression in mTEC cells. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Background The continued increase in tuberculosis (TB) rates and the appearance of extremely resistant Mycobacterium tuberculosis strains (XDR-TB) worldwide are some of the great problems of public health. In this context, DNA immunotherapy has been proposed as an effective alternative that could circumvent the limitations of conventional drugs. Nonetheless, the molecular events underlying these therapeutic effects are poorly understood. Methods We characterized the transcriptional signature of lungs from mice infected with M. tuberculosis and treated with heat shock protein 65 as a genetic vaccine (DNAhsp65) combining microarray and real-time polymerase chain reaction analysis. The gene expression data were correlated with the histopathological analysis of lungs. Results The differential modulation of a high number of genes allowed us to distinguish DNAhsp65-treated from nontreated animals (saline and vector-injected mice). Functional analysis of this group of genes suggests that DNAhsp65 therapy could not only boost the T helper (Th)1 immune response, but also could inhibit Th2 cytokines and regulate the intensity of inflammation through fine tuning of gene expression of various genes, including those of interleukin-17, lymphotoxin A, tumour necrosis factor-cl, interleukin-6, transforming growth factor-beta, inducible nitric oxide synthase and Foxp3. In addition, a large number of genes and expressed sequence tags previously unrelated to DNA-therapy were identified. All these findings were well correlated with the histopathological lesions presented in the lungs. Conclusions The effects of DNA therapy are reflected in gene expression modulation; therefore, the genes identified as differentially expressed could be considered as transcriptional biomarkers of DNAhsp65 immunotherapy against TB. The data have important implications for achieving a better understanding of gene-based therapies. Copyright (C) 2008 John Wiley & Sons, Ltd.
Resumo:
This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Non-coding RNAs (ncRNAs) were recently given much higher attention due to technical advances in sequencing which expanded the characterization of transcriptomes in different organisms. ncRNAs have different lengths (22 nt to >1, 000 nt) and mechanisms of action that essentially comprise a sophisticated gene expression regulation network. Recent publication of schistosome genomes and transcriptomes has increased the description and characterization of a large number of parasite genes. Here we review the number of predicted genes and the coverage of genomic bases in face of the public ESTs dataset available, including a critical appraisal of the evidence and characterization of ncRNAs in schistosomes. We show expression data for ncRNAs in Schistosoma mansoni. We analyze three different microarray experiment datasets: (1) adult worms' large-scale expression measurements; (2) differentially expressed S. mansoni genes regulated by a human cytokine (TNF-α) in a parasite culture; and (3) a stage-specific expression of ncRNAs. All these data point to ncRNAs involved in different biological processes and physiological responses that suggest functionality of these new players in the parasite's biology. Exploring this world is a challenge for the scientists under a new molecular perspective of host-parasite interactions and parasite development.
Resumo:
Macro- and microarrays are well-established technologies to determine gene functions through repeated measurements of transcript abundance. We constructed a chicken skeletal muscle-associated array based on a muscle-specific EST database, which was used to generate a tissue expression dataset of similar to 4500 chicken genes across 5 adult tissues (skeletal muscle, heart, liver, brain, and skin). Only a small number of ESTs were sufficiently well characterized by BLAST searches to determine their probable cellular functions. Evidence of a particular tissue-characteristic expression can be considered an indication that the transcript is likely to be functionally significant. The skeletal muscle macroarray platform was first used to search for evidence of tissue-specific expression, focusing on the biological function of genes/transcripts, since gene expression profiles generated across tissues were found to be reliable and consistent. Hierarchical clustering analysis revealed consistent clustering among genes assigned to 'developmental growth', such as the ontology genes and germ layers. Accuracy of the expression data was supported by comparing information from known transcripts and tissue from which the transcript was derived with macroarray data. Hybridization assays resulted in consistent tissue expression profile, which will be useful to dissect tissue-regulatory networks and to predict functions of novel genes identified after extensive sequencing of the genomes of model organisms. Screening our skeletal-muscle platform using 5 chicken adult tissues allowed us identifying 43 'tissue-specific' transcripts, and 112 co-expressed uncharacterized transcripts with 62 putative motifs. This platform also represents an important tool for functional investigation of novel genes; to determine expression pattern according to developmental stages; to evaluate differences in muscular growth potential between chicken lines, and to identify tissue-specific genes.
Resumo:
Background -: Sucrose content is a highly desirable trait in sugarcane as the worldwide demand for cost-effective biofuels surges. Sugarcane cultivars differ in their capacity to accumulate sucrose and breeding programs routinely perform crosses to identify genotypes able to produce more sucrose. Sucrose content in the mature internodes reach around 20% of the culms dry weight. Genotypes in the populations reflect their genetic program and may display contrasting growth, development, and physiology, all of which affect carbohydrate metabolism. Few studies have profiled gene expression related to sugarcane's sugar content. The identification of signal transduction components and transcription factors that might regulate sugar accumulation is highly desirable if we are to improve this characteristic of sugarcane plants. Results -: We have evaluated thirty genotypes that have different Brix (sugar) levels and identified genes differentially expressed in internodes using cDNA microarrays. These genes were compared to existing gene expression data for sugarcane plants subjected to diverse stress and hormone treatments. The comparisons revealed a strong overlap between the drought and sucrose-content datasets and a limited overlap with ABA signaling. Genes associated with sucrose content were extensively validated by qRT-PCR, which highlighted several protein kinases and transcription factors that are likely to be regulators of sucrose accumulation. The data also indicate that aquaporins, as well as lignin biosynthesis and cell wall metabolism genes, are strongly related to sucrose accumulation. Moreover, sucrose-associated genes were shown to be directly responsive to short term sucrose stimuli, confirming their role in sugar-related pathways. Conclusion -: Gene expression analysis of sugarcane populations contrasting for sucrose content indicated a possible overlap with drought and cell wall metabolism processes and suggested signaling and transcriptional regulators to be used as molecular markers in breeding programs. Transgenic research is necessary to further clarify the role of the genes and define targets useful for sugarcane improvement programs based on transgenic plants.
Resumo:
Background: The post-genomic era has brought new challenges regarding the understanding of the organization and function of the human genome. Many of these challenges are centered on the meaning of differential gene regulation under distinct biological conditions and can be performed by analyzing the Multiple Differential Expression (MDE) of genes associated with normal and abnormal biological processes. Currently MDE analyses are limited to usual methods of differential expression initially designed for paired analysis. Results: We proposed a web platform named ProbFAST for MDE analysis which uses Bayesian inference to identify key genes that are intuitively prioritized by means of probabilities. A simulated study revealed that our method gives a better performance when compared to other approaches and when applied to public expression data, we demonstrated its flexibility to obtain relevant genes biologically associated with normal and abnormal biological processes. Conclusions: ProbFAST is a free accessible web-based application that enables MDE analysis on a global scale. It offers an efficient methodological approach for MDE analysis of a set of genes that are turned on and off related to functional information during the evolution of a tumor or tissue differentiation. ProbFAST server can be accessed at http://gdm.fmrp.usp.br/probfast.
Resumo:
Background: Claudin-4 (CLDN4) is one of several proteins that act as molecular mediators of embryo implantation. Recently, we examined immunolabeling of leukemia inhibitory factor (LIF) in the endometrial tissue of 52 IVF patients, and found that LIF staining intensity was strongly correlated with successful pregnancy initiation. In the same set of patients, we have now examined endometrial CLDN4 expression, to see how expression intensity may vary with LIF. We examined CLDN4 in the luteal phase of the menstrual cycle, immediately preceding IVF treatment. Our aim was to compare expression of LIF and CLDN4 in the luteal phase, and document these patterns as putative biomarkers for pregnancy. Methods: Endometrial tissue was collected from women undergoing IVF. Endometrial biopsies were obtained during the luteal phase preceding IVF, and were then used for tissue microarray (TMA) immunolabeling of CLDN4. Previously published LIF expression data were then combined with CLDN4 expression data, to determine CLDN4/LIF expression patterns. Associations between successful pregnancy after IVF and combined CLDN4/LIF expression patterns were evaluated. Results: Four patterns of immunolabeling were observed in the endometrial samples: 16% showed weak CLDN4 and strong LIF (CLDN4(-)/LIF(+)); 20% showed strong CLDN4 and strong LIF (LIF(+)/CLDN4(+)); 28% showed strong CLDN4 and weak LIF (CLDN4(+)/LIF(-)); and 36% showed weak CLDN4 and weak LIF (CLDN4(-)/LIF(-)). Successful implantation after IVF was associated with CLDN4(-)/LIF(+)(p = 0.003). Patients showing this endometrial CLDN4(-)/LIF(+) immunolabeling were also 6 times more likely to achieve pregnancy than patients with endometrial CLDN4(+)/LIF(-) immunolabeling (p = 0.007). Conclusion: The combined immunolabeling expression of CLDN4(-)/LIF(+) in endometrial tissue is a potential biomarker for predicting successful pregnancy in IVF candidates.