44 results for microarray data classification
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Background: The search for enriched (also known as over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for system-level analysis. This procedure summarizes the information by focusing on classification schemes such as Gene Ontology and KEGG pathways instead of on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the latter has been used to deal with the ontology term enrichment problem. Results: BayGO implements a Bayesian approach to search for enriched terms from microarray data. The R source code is freely available at http://blasto.iq.usp.br/~tkoide/BayGO in three versions: Linux, which can be easily incorporated into pre-existing pipelines; Windows, to be controlled interactively; and a web tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses. Conclusion: The Bayesian model accounts for the fact that not all the genes from a given category may be observable in microarray data, owing to low-intensity signal, quality filters, genes that were not spotted, and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of working only with the common significance analysis.
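For context, the "common significance analysis" that the abstract contrasts with BayGO is often a one-sided hypergeometric test. The sketch below illustrates that baseline only. It is not BayGO's Bayesian model, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genome, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value for term enrichment.

    Returns P(X >= n_overlap), where X is the number of term-annotated genes
    among n_selected genes drawn without replacement from n_genome genes,
    n_term of which carry the ontology term. All names are illustrative.
    """
    total = comb(n_genome, n_selected)
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_genome - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Example: 10 genes on the array, 5 annotated with the term,
# 5 genes selected as differentially expressed, all 5 annotated.
p_value = hypergeom_enrichment_pvalue(10, 5, 5, 5)
```

A small p-value says the overlap is unlikely under random selection. It does not measure the strength of association, which is the distinction the abstract draws.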
Abstract:
Traditional supervised data classification considers only physical features (e.g., distance or similarity) of the input data. Here, this type of learning is called low level classification. On the other hand, the human (animal) brain performs both low and high orders of learning and readily identifies patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation is, here, referred to as high level classification. In this paper, we propose a hybrid classification technique that combines both types of learning. The low level term can be implemented by any classification technique, while the high level term is realized by the extraction of features of the underlying network constructed from the input data. Thus, the former classifies the test instances by their physical features or class topologies, while the latter measures the compliance of the test instances with the pattern formation of the data. Our study shows that the proposed technique not only can perform classification according to the pattern formation, but also can improve the performance of traditional classification techniques. Furthermore, as the complexity of the class configuration increases, for example through mixture among different classes, a larger portion of the high level term is required to obtain correct classification. This feature confirms that high level classification is especially important in complex classification situations. Finally, we show how the proposed technique can be employed in a real-world application, where it is capable of identifying variations and distortions of handwritten digit images. As a result, it improves the overall pattern recognition rate.
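The abstract above does not state how the two terms are combined. One plausible reading of "a larger portion of the high level term" is a convex mix of per-class scores. The sketch below assumes that form, with a hypothetical weight `lam` in [0, 1] and hypothetical score dictionaries standing in for the outputs of the low and high level classifiers.

```python
def hybrid_scores(low_level, high_level, lam):
    """Mix per-class scores: (1 - lam) * low + lam * high.

    low_level and high_level map class labels to scores. The convex
    combination and the name lam are assumptions made for illustration.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {
        label: (1.0 - lam) * low_level[label] + lam * high_level[label]
        for label in low_level
    }


def predict(low_level, high_level, lam):
    """Return the label with the largest mixed score."""
    scores = hybrid_scores(low_level, high_level, lam)
    return max(scores, key=scores.get)


# A test instance that lies close to class "a" physically
# but complies better with the pattern formation of class "b".
low = {"a": 0.6, "b": 0.4}
high = {"a": 0.1, "b": 0.9}
label_low_only = predict(low, high, 0.0)
label_mixed = predict(low, high, 0.5)
```

With `lam = 0` the decision is the traditional classifier's. Raising `lam` gives the pattern-compliance term more say.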
Abstract:
Background: With the development of DNA hybridization microarray technologies, it is now possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Due to technical biases, normalization of the intensity levels is a prerequisite for further statistical analyses. Therefore, choosing a suitable normalization approach can be critical and deserves judicious consideration. Results: Here, we considered three commonly used normalization approaches, namely Loess, Splines and Wavelets, and two non-parametric regression methods that have yet to be used for normalization, namely Kernel smoothing and Support Vector Regression. The results obtained were compared using artificial microarray data and benchmark studies. The results indicate that Support Vector Regression is the most robust to outliers and that Kernel smoothing is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets. Conclusion: In light of our results, Support Vector Regression is favored for microarray normalization because it estimates the normalization curve more robustly than the other methods.
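To make the idea of a normalization curve concrete, the sketch below fits an intensity-dependent trend and subtracts it. It uses Kernel smoothing only because that method fits in a few lines of standard-library Python. The abstract itself ranks it last and favors Support Vector Regression. The two-color log-ratio setup, the function names, and the bandwidth are assumptions, not details taken from the paper.

```python
import math


def kernel_smooth(x, y, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each x, with a Gaussian kernel."""
    fitted = []
    for x0 in x:
        weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return fitted


def normalize_two_color(red, green, bandwidth=1.0):
    """Subtract the intensity-dependent trend from the log-ratios.

    m is the log2 ratio and a is the mean log2 intensity of each spot;
    the normalized value is m minus the smoothed trend of m on a.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = kernel_smooth(a, m, bandwidth)
    return [mi - ti for mi, ti in zip(m, trend)]


# A pure dye bias (red is always twice green) should be removed entirely.
green = [100.0, 250.0, 900.0, 4000.0]
red = [2.0 * g for g in green]
normalized = normalize_two_color(red, green)
```

Swapping `kernel_smooth` for another regression method changes only how the trend is estimated. The subtraction step stays the same.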
Abstract:
Background: Smallpox is a lethal disease that was endemic in many parts of the world until it was eradicated by massive immunization. Due to its lethality, there are serious concerns about its use as a bioweapon. Here we analyze publicly available microarray data to further understand the survival of smallpox-infected macaques, using systems biology approaches. Our goal is to improve knowledge about the progression of this disease. Results: We used KEGG pathway annotations to define groups of genes (or modules), and subsequently compared them to macaque survival times. This technique provided additional insights into the host response to this disease, such as increased expression of the cytokines and ECM receptors in the individuals with higher survival times. These results suggest that these gene groups may influence an effective host response to smallpox. Conclusion: Macaques with higher survival times clearly express some specific pathways not previously identified using regular gene-by-gene approaches. Our work also shows how third-party analysis of public datasets can support new hypotheses on relevant biological problems.
Abstract:
A common interest in gene expression data analysis is to identify, from a large pool of candidate genes, the genes that present significant changes in expression levels between a treatment and a control biological condition. Usually, this is done with a test statistic and a cutoff value that separate differentially expressed genes from nondifferentially expressed ones. In this paper, we propose a Bayesian approach to identify differentially expressed genes by sequentially calculating credibility intervals from predictive densities, which are constructed using the sampled mean treatment effect from all genes under study, excluding the treatment effect of genes previously identified with statistical evidence for difference. We compare our Bayesian approach with the standard ones based on the t-test and modified t-tests via a simulation study, using small sample sizes, which are common in gene expression data analysis. The results provide evidence that the proposed approach performs better than the standard ones, especially for cases with mean differences and increases in treatment variance in relation to control variance. We also apply the methodologies to a well-known publicly available data set on the bacterium Escherichia coli.
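The "statistic plus cutoff" baseline mentioned above can be sketched with a Welch t statistic, which does not assume equal variances in the two conditions. This illustrates the standard approach only, not the proposed Bayesian method. The gene names, the data, and the cutoff of 4.0 are made up for the example.

```python
import math
from statistics import mean, variance


def welch_t(treatment, control):
    """Welch t statistic for one gene (unequal variances allowed)."""
    standard_error = math.sqrt(
        variance(treatment) / len(treatment) + variance(control) / len(control)
    )
    return (mean(treatment) - mean(control)) / standard_error


def flag_genes(expression, cutoff):
    """Return the genes whose |t| exceeds the cutoff.

    expression maps a gene name to a (treatment, control) pair of replicate
    lists. The cutoff is an arbitrary illustrative threshold, not a
    calibrated significance level.
    """
    return [
        gene
        for gene, (treatment, control) in expression.items()
        if abs(welch_t(treatment, control)) > cutoff
    ]


# Hypothetical log-expression values, three replicates per condition.
expression = {
    "geneA": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "geneB": ([2.0, 3.0, 4.0], [2.5, 3.0, 3.5]),
}
flagged = flag_genes(expression, cutoff=4.0)
```

With only three replicates the variance estimates are noisy, which is the small-sample setting the abstract's simulation study targets.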
Abstract:
Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
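A standard tool from compositional data analysis is the Aitchison distance, computed as the Euclidean distance between centered log-ratio (clr) transforms. The sketch below illustrates why such a distance suits count data on the simplex. It is not taken from Simcluster's source, and the pseudocount used to avoid log(0) is an assumption.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a vector of tag counts.

    The pseudocount keeps zero counts finite; 0.5 is an arbitrary choice.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    logs = [math.log(v / total) for v in shifted]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


def aitchison_distance(a, b, pseudocount=0.5):
    """Euclidean distance between the clr transforms of two count vectors."""
    clr_a = clr(a, pseudocount)
    clr_b = clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))


# Two libraries with the same relative abundances but different depths.
shallow = [1, 2, 3]
deep = [10, 20, 30]
same_composition = aitchison_distance(shallow, deep, pseudocount=0.0)
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(shallow, deep)))
```

The Aitchison distance between the two libraries is zero because only the proportions matter, while the plain Euclidean distance is large. That difference is the property of the simplex the abstract says Euclidean methods ignore.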
Abstract:
Background: Several mathematical and statistical methods have been proposed in the last few years to analyze microarray data. Most of those methods involve complicated formulas and software implementations that require advanced computer programming skills. Researchers from other areas may experience difficulties when attempting to use those methods in their research. Here we present a user-friendly toolbox which allows large-scale gene expression analysis to be carried out by biomedical researchers with limited programming skills. Results: Here, we introduce a user-friendly toolbox called GEDI (Gene Expression Data Interpreter), an extensible, open-source, and freely available tool that we believe will be useful to a wide range of laboratories, and to researchers with no background in Mathematics and Computer Science, allowing them to analyze their own data by applying both classical and advanced approaches developed and recently published by Fujita et al. Conclusion: GEDI is an integrated user-friendly viewer that combines the state-of-the-art SVR, DVAR and SVAR algorithms, previously developed by us. It facilitates the application of SVR, DVAR and SVAR beyond the mathematical formulas presented in the corresponding publications, and allows one to better understand the results by means of the available visualizations. Both running the statistical methods and visualizing the results are carried out within the graphical user interface, rendering these algorithms accessible to the broad community of researchers in Molecular Biology.
Abstract:
Background: Papaya (Carica papaya L.) is a commercially important crop that produces climacteric fruits with a soft and sweet pulp that contains a wide range of health-promoting phytochemicals. Despite its importance, little is known about transcriptional modifications during papaya fruit ripening and their control. In this study we report the analysis of the ripe papaya transcriptome using a cross-species (XSpecies) microarray technique based on the phylogenetic proximity between papaya and Arabidopsis thaliana. Results: Papaya transcriptome analyses resulted in the identification of 414 ripening-related genes, some of which had their expression validated by qPCR. The transcription profile was compared with those of ripening tomato and grape. There were many similarities between papaya and tomato, especially with respect to the expression of genes encoding proteins involved in primary metabolism, regulation of transcription, biotic and abiotic stress and cell wall metabolism. XSpecies microarray data indicated that transcription factors (TFs) of the MADS-box, NAC and AP2/ERF gene families were involved in the control of papaya ripening and revealed that cell wall-related gene expression in papaya had similarities to the expression profiles seen in Arabidopsis during hypocotyl development. Conclusion: The cross-species array experiment identified a ripening-related set of genes in papaya, allowing the comparison of transcription control between papaya and other fruit-bearing taxa during the ripening process.
Abstract:
Semi-supervised learning is a classification paradigm in which just a few labeled instances are available for the training process. To overcome this small amount of initial label information, the information provided by the unlabeled instances is also considered. In this paper, we propose a nature-inspired semi-supervised learning technique based on attraction forces. Instances are represented as points in a k-dimensional space, and the movement of data points is modeled as a dynamical system. As the system runs, data items with the same label cooperate with each other, and data items with different labels compete with each other to attract unlabeled points by applying a specific force function. In this way, all unlabeled data items can be classified when the system reaches its stable state. Stability analysis for the proposed dynamical system is performed and some heuristics are proposed for parameter setting. Simulation results show that the proposed technique achieves good classification results on artificial data sets and is comparable to well-known semi-supervised techniques on benchmark data sets.
Abstract:
Patients with type 2 diabetes mellitus (T2DM) exhibit insulin resistance associated with obesity and inflammatory response, besides an increased level of oxidative DNA damage as a consequence of the hyperglycemic condition and the generation of reactive oxygen species (ROS). In order to provide information on the mechanisms involved in the pathophysiology of T2DM, we analyzed the transcriptional expression patterns exhibited by peripheral blood mononuclear cells (PBMCs) from patients with T2DM compared to non-diabetic subjects, by investigating several biological processes: inflammatory and immune responses, responses to oxidative stress and hypoxia, fatty acid processing, and DNA repair. PBMCs were obtained from 20 T2DM patients and eight non-diabetic subjects. Total RNA was hybridized to Agilent whole human genome 4x44K one-color oligo-microarray. Microarray data were analyzed using the GeneSpring GX 11.0 software (Agilent). We used BRB-ArrayTools software (gene set analysis - GSA) to investigate significant gene sets and the Genomica tool to study a possible influence of clinical features on gene expression profiles. We showed that PBMCs from T2DM patients presented significant changes in gene expression, exhibiting 1320 differentially expressed genes compared to the control group. A great number of genes were involved in biological processes implicated in the pathogenesis of T2DM. Among the genes with high fold-change values, the up-regulated ones were associated with fatty acid metabolism and protection against lipid-induced oxidative stress, while the down-regulated ones were implicated in the suppression of pro-inflammatory cytokine production and DNA repair. Moreover, we identified two significant signaling pathways: adipocytokine, related to insulin resistance; and ceramide, related to oxidative stress and induction of apoptosis. In addition, expression profiles were not influenced by patient features, such as age, gender, obesity, pre/post-menopause age, neuropathy, glycemia, and HbA(1c) percentage. Hence, by studying expression profiles of PBMCs, we provided quantitative and qualitative differences and similarities between T2DM patients and non-diabetic individuals, contributing new perspectives for a better understanding of the disease. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Complex networks have been employed to model many real systems and as a modeling tool in a myriad of applications. In this paper, we apply the framework of complex networks to the problem of supervised classification in the word disambiguation task, which consists in deriving a function from the supervised (or labeled) training data of ambiguous words. Traditional supervised data classification takes into account only topological or physical features of the input data. On the other hand, the human (animal) brain performs both low- and high-level orders of learning and readily identifies patterns according to the semantic meaning of the input data. In this paper, we apply a hybrid technique which encompasses both types of learning in the field of word sense disambiguation and show that the high-level order of learning can improve the accuracy rate of the model. This evidence demonstrates that the internal structures formed by the words do present patterns that, generally, cannot be correctly unveiled by traditional techniques alone. Finally, we exhibit the behavior of the model for different weights of the low- and high-level classifiers by plotting decision boundaries. This study helps one to better understand the effectiveness of the model. Copyright (C) EPLA, 2012
Abstract:
Introduction: Ovarian adenocarcinoma is frequently detected at a late stage, when therapy efficacy is limited and death occurs in up to 50% of the cases. A potential novel treatment for this disease is a monoclonal antibody that recognizes the sodium-dependent phosphate transporter protein 2b (NaPi2b). Materials and Methods: To better understand the expression of this protein in different histologic types of ovarian carcinomas, we immunostained 50 tumor samples with anti-NaPi2b monoclonal antibody MX35 and, in parallel, we assessed the expression of the gene encoding NaPi2b (SLC34A2) by in silico analysis of microarray data. Results: Both approaches detected higher expression of NaPi2b (SLC34A2) in ovarian carcinoma than in normal tissue. Moreover, a comprehensive analysis indicates that SLC34A2 is the only one of the several phosphate transporter genes whose expression differentiates normal from carcinoma samples, suggesting it might play a major role in ovarian carcinomas. Immunohistochemical and mRNA expression data have also shown that 2 histologic subtypes of ovarian carcinoma express particularly high levels of NaPi2b: serous and clear cell adenocarcinomas. Serous adenocarcinomas are the most frequent, in contrast to clear cell carcinomas, which are rare and have a worse prognosis. Conclusion: This identification of subgroups of patients expressing NaPi2b may be important in selecting cohorts who should most likely be included in future clinical trials, as a humanized version of MX35 has recently been developed.
Abstract:
The search for molecular markers to improve diagnosis, individualize treatment and predict the behavior of tumors has been the focus of several studies. This study aimed to analyze the homeobox gene expression profile in oral squamous cell carcinoma (OSCC) as well as to investigate whether some of these genes are relevant molecular markers of prognosis and/or tumor aggressiveness. Homeobox gene expression levels were assessed by microarrays and qRT-PCR in OSCC tissues and adjacent non-cancerous matched tissues (margin), as well as in OSCC cell lines. Analysis of microarray data revealed the expression of 147 homeobox genes, including one set of six at least 2-fold up-regulated, and another set of 34 at least 2-fold down-regulated homeobox genes in OSCC. After qRT-PCR assays, the three most up-regulated homeobox genes (HOXA5, HOXD10 and HOXD11) revealed higher and statistically significant expression levels in OSCC samples when compared to margins. Patients presenting lower expression of HOXA5 had a poorer prognosis compared to those with higher expression (P=0.03). Additionally, HOXA5, HOXD10 and HOXD11 were also significantly up-regulated in OSCC cell lines when compared to normal oral keratinocytes. Results confirm the presence of three significantly up-regulated (>4-fold) homeobox genes (HOXA5, HOXD10 and HOXD11) in OSCC that may play a significant role in the pathogenesis of these tumors. Moreover, since lower levels of HOXA5 predict poor prognosis, this gene may be a novel candidate for the development of therapeutic strategies in OSCC.
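The "at least 2-fold" criterion in the abstract above can be read as a symmetric threshold on the log2 ratio of tumor to margin expression. The sketch below shows that reading. The function name and the example intensities are hypothetical, and the study's actual filtering pipeline is not described in the abstract.

```python
import math


def classify_fold_change(tumor, margin, threshold=2.0):
    """Label a gene by its tumor/margin expression ratio.

    A ratio of at least `threshold` is "up", a ratio of at most
    1/threshold is "down", and anything in between is "unchanged".
    Working in log2 makes the two directions symmetric.
    """
    log_ratio = math.log2(tumor / margin)
    cut = math.log2(threshold)
    if log_ratio >= cut:
        return "up"
    if log_ratio <= -cut:
        return "down"
    return "unchanged"


# Hypothetical (tumor, margin) intensities for three genes.
calls = {
    "gene1": classify_fold_change(8.0, 2.0),
    "gene2": classify_fold_change(1.0, 4.0),
    "gene3": classify_fold_change(3.0, 2.0),
}
```

Passing `threshold=4.0` would express the ">4-fold" condition reported for HOXA5, HOXD10 and HOXD11, except that this sketch treats the boundary as inclusive.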
Abstract:
Xylella fastidiosa inhabits the plant xylem, a nutrient-poor environment, so mechanisms to sense and respond to adverse environmental conditions are extremely important for bacterial survival in the plant host. Although the complete genome sequences of different Xylella strains have been determined, little is known about stress responses and gene regulation in these organisms. In this work, a DNA microarray was constructed containing 2,600 ORFs identified in the genome sequencing project of the Xylella fastidiosa 9a5c strain, and used to check global gene expression differences in the bacterium when infecting a symptomatic and a tolerant citrus tree. Different patterns of expression were found in each variety, suggesting that the bacteria respond differentially according to each plant xylem environment. The global gene expression profile was determined and several genes related to bacterial survival under stress conditions were found to be differentially expressed between varieties, suggesting the involvement of different strategies for adaptation to the environment. The expression patterns of some genes related to the heat shock response, toxin and detoxification processes, adaptation to atypical conditions, and repair systems, as well as some regulatory genes, are discussed in this paper. The DNA microarray proved to be a powerful technique for global transcriptome analyses. This is one of the first studies of Xylella fastidiosa gene expression in vivo, and it helped to increase insight into stress responses and possible bacterial survival mechanisms in the nutrient-poor environment of xylem vessels.
Abstract:
Schistosoma mansoni is responsible for schistosomiasis, a parasitic disease that affects 200 million people worldwide. Molecular mechanisms of host-parasite interaction are complex and involve crosstalk between host signals and parasite receptors. The TGF-beta signaling pathway has been shown to play an important role in S. mansoni development and embryogenesis. In particular, human (h) TGF-beta has been shown to bind to an S. mansoni receptor and transduce a signal that regulates the expression of a schistosome target gene. Here we describe 381 parasite genes whose expression levels are affected by in vitro treatment with hTGF-beta. Among these differentially expressed genes we highlight genes related to morphology, development and cell cycle that could mediate cytokine effects on the parasite. We confirm by qPCR the expression changes detected with microarrays for 5 out of 7 selected genes. We also highlight a set of non-coding RNAs transcribed from the same loci as protein-coding genes that are differentially expressed upon hTGF-beta treatment. These datasets offer potential targets to be explored in order to understand the molecular mechanisms behind the possible effects of hTGF-beta on parasite biology. (C) 2012 Elsevier B.V. All rights reserved.