890 results for Gene-expression Data
Abstract:
Background The continued increase in tuberculosis (TB) rates and the appearance of extremely resistant Mycobacterium tuberculosis strains (XDR-TB) worldwide are among the major problems of public health. In this context, DNA immunotherapy has been proposed as an effective alternative that could circumvent the limitations of conventional drugs. Nonetheless, the molecular events underlying these therapeutic effects are poorly understood. Methods We characterized the transcriptional signature of lungs from mice infected with M. tuberculosis and treated with heat shock protein 65 as a genetic vaccine (DNAhsp65), combining microarray and real-time polymerase chain reaction analysis. The gene expression data were correlated with the histopathological analysis of lungs. Results The differential modulation of a high number of genes allowed us to distinguish DNAhsp65-treated from nontreated animals (saline and vector-injected mice). Functional analysis of this group of genes suggests that DNAhsp65 therapy could not only boost the T helper (Th)1 immune response, but could also inhibit Th2 cytokines and regulate the intensity of inflammation through fine tuning of the expression of various genes, including those of interleukin-17, lymphotoxin A, tumour necrosis factor-alpha, interleukin-6, transforming growth factor-beta, inducible nitric oxide synthase and Foxp3. In addition, a large number of genes and expressed sequence tags previously unrelated to DNA therapy were identified. All these findings were well correlated with the histopathological lesions presented in the lungs. Conclusions The effects of DNA therapy are reflected in gene expression modulation; therefore, the genes identified as differentially expressed could be considered as transcriptional biomarkers of DNAhsp65 immunotherapy against TB. The data have important implications for achieving a better understanding of gene-based therapies. Copyright (C) 2008 John Wiley & Sons, Ltd.
Abstract:
Type 2 diabetes mellitus (T2DM) is a major disease affecting nearly 280 million people worldwide. Whilst the pathophysiological mechanisms leading to disease are poorly understood, dysfunction of the insulin-producing pancreatic beta-cells is a key event for disease development. Monitoring the gene expression profiles of pancreatic beta-cells under several genetic or chemical perturbations has shed light on genes and pathways involved in T2DM. The EuroDia database has been established to build a unique collection of gene expression measurements performed on beta-cells of three organisms, namely human, mouse and rat. The Gene Expression Data Analysis Interface (GEDAI) has been developed to support this database. The quality of each dataset is assessed by a series of quality control procedures to detect putative hybridization outliers. The system integrates a web interface to several standard analysis functions from R/Bioconductor to identify differentially expressed genes and pathways. It also allows the combination of multiple experiments performed on different array platforms of the same technology. The design of this system enables each user to rapidly design a custom analysis pipeline and thus produce their own list of genes and pathways. Raw and normalized data can be downloaded for each experiment. The flexible engine of this database (GEDAI) is currently used to handle gene expression data from several laboratory-run projects dealing with different organisms and platforms. Database URL: http://eurodia.vital-it.ch.
Abstract:
Background: The variety of DNA microarray formats and datasets presently available offers an unprecedented opportunity to perform insightful comparisons of heterogeneous data. Cross-species studies, in particular, have the power of identifying conserved, functionally important molecular processes. Validation of discoveries can now often be performed in readily available public data, which frequently requires cross-platform studies. Cross-platform and cross-species analyses require matching probes on different microarray formats. This can be achieved using the information in microarray annotations and additional molecular biology databases, such as orthology databases. Although annotations and other biological information are stored using modern database models (e.g. relational), they are very often distributed and shared as tables in text files, i.e. flat file databases. This common flat database format thus provides a simple and robust solution to flexibly integrate various sources of information and a basis for the combined analysis of heterogeneous gene expression profiles. Results: We provide annotationTools, a Bioconductor-compliant R package to annotate microarray experiments and integrate heterogeneous gene expression profiles using annotation and other molecular biology information available as flat file databases. First, annotationTools contains a specialized set of functions for mining this widely used database format in a systematic manner. It thus offers a straightforward solution for annotating microarray experiments.
Second, building on these basic functions and relying on the combination of information from several databases, it provides tools to easily perform cross-species analyses of gene expression data. Here, we present two example applications of annotationTools that are of direct relevance for the analysis of heterogeneous gene expression profiles, namely a cross-platform mapping of probes and a cross-species mapping of orthologous probes using different orthology databases. We also show how to perform an explorative comparison of disease-related transcriptional changes in human patients and in a genetic mouse model. Conclusion: The R package annotationTools provides a simple solution to handle microarray annotation and orthology tables, as well as other flat molecular biology databases. Thereby, it allows easy integration and analysis of heterogeneous microarray experiments across different technological platforms or species.
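The cross-species probe mapping described in this abstract can be pictured as a chain of lookups across flat-file tables: probe to gene, gene to ortholog, ortholog to probe on the other platform. The following is a minimal Python sketch of that idea only; it is not the annotationTools API (which is an R package), and every table, column name and identifier below is hypothetical.

```python
import csv
import io

# Hypothetical tab-separated flat files; all probe and gene IDs are invented.
HUMAN_ANNOT = "probe\tgene\nh_probe_1\tHGENE1\nh_probe_2\tHGENE2\n"
MOUSE_ANNOT = "probe\tgene\nm_probe_7\tMGENE1\nm_probe_8\tMGENE1\n"
ORTHOLOGS = "human_gene\tmouse_gene\nHGENE1\tMGENE1\n"


def load_table(text, key, value):
    """Read a tab-separated table into a dict mapping key -> list of values."""
    table = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        table.setdefault(row[key], []).append(row[value])
    return table


def map_probes(probes, probe_to_gene, gene_to_ortholog, gene_to_probe):
    """Follow probe -> gene -> ortholog -> probe; unmapped probes get []."""
    result = {}
    for probe in probes:
        targets = []
        for gene in probe_to_gene.get(probe, []):
            for ortholog in gene_to_ortholog.get(gene, []):
                targets.extend(gene_to_probe.get(ortholog, []))
        result[probe] = targets
    return result


human = load_table(HUMAN_ANNOT, "probe", "gene")
orth = load_table(ORTHOLOGS, "human_gene", "mouse_gene")
mouse = load_table(MOUSE_ANNOT, "gene", "probe")
mapping = map_probes(["h_probe_1", "h_probe_2"], human, orth, mouse)
```

Keeping one-to-many relations as lists makes it explicit when a probe has several orthologous counterparts or none, which a downstream comparison would need to handle.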
Abstract:
Background: Stem cells and their niches are studied in many systems, but mammalian germ stem cells (GSC) and their niches are still poorly understood. In rat testis, spermatogonia and undifferentiated Sertoli cells proliferate before puberty, but at puberty most spermatogonia enter spermatogenesis, and Sertoli cells differentiate to support this program. Thus, pre-pubertal spermatogonia might possess GSC potential and pre-pubertal Sertoli cells niche functions. We hypothesized that the different stem cell pools at pre-puberty and maturity provide a model for the identification of stem cell and niche-specific genes. We compared the transcript profiles of spermatogonia and Sertoli cells from pre-pubertal and pubertal rats and examined how these related to genes expressed in testicular cancers, which might originate from inappropriate communication between GSCs and Sertoli cells. Results: The pre-pubertal spermatogonia-specific gene set comprised known stem cell and spermatogonial stem cell (SSC) markers. Similarly, the pre-pubertal Sertoli cell-specific gene set comprised known niche gene transcripts. A large fraction of these specifically enriched transcripts encoded trans-membrane, extra-cellular, and secreted proteins, highlighting stem cell to niche communication. Comparing selective gene sets established in this study with published gene expression data of testicular cancers and their stroma, we identified sets of expressed genes shared, with statistical significance, between testicular tumors and pre-pubertal spermatogonia, and between tumor stroma and pre-pubertal Sertoli cells. Conclusions: Our data suggest that SSC and their niche specifically express complementary factors for cell communication and that the same factors might be implicated in the communication between tumor cells and their micro-environment in testicular cancer.
Abstract:
Eutherian mammals share a common ancestor that evolved into two main placental types, i.e., hemotrophic (e.g., human and mouse) and histiotrophic (e.g., farm animals), which differ in invasiveness. Pregnancies initiated with assisted reproductive techniques (ART) in farm animals are at increased risk of failure; these losses were associated with placental defects, perhaps due to altered gene expression. Developmentally regulated genes in the placenta seem highly phylogenetically conserved, whereas those expressed later in pregnancy are more species-specific. To elucidate differences between hemotrophic and epitheliochorial placentae, gene expression data were compiled from microarray studies of bovine placental tissues at various stages of pregnancy. Moreover, an in silico subtractive library was constructed based on homology of bovine genes to the database of zebrafish, a nonplacental vertebrate. In addition, lists of genes preferentially expressed in the placenta of humans and mice were collected using bioinformatics tools (Tissue-specific Gene Expression and Regulation [TiGER] for humans, and the tissue-specific genes database [TiSGeD] for mice and humans). Humans, mice, and cattle shared 93 genes expressed in their placentae. Most of these were related to immune function (based on analysis of gene ontology). Cattle and women shared expression of 23 genes, mostly related to hormonal activity, whereas mice and women shared 16 genes (primarily sexual differentiation and glycoprotein biology). Because the numbers of genes expressed by the placentae of cattle and of mice were similar (based on cluster analysis), we concluded that both cattle and mice were suitable models to study the biology of the human placenta. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
Background To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene product networks involved is required. In order to define and understand such molecular networks, some statistical methods are proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow needs to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). Due to this situation, which is rather frequent in Bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems. Results We have applied the Sparse Vector Autoregressive model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well known transcription factor targets.
Conclusion The proposed SVAR method is able to model gene regulatory networks in frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.
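To give a feel for what a sparse vector autoregressive fit looks like, the sketch below estimates a first-order model x_t ≈ A x_{t-1} with an L1 penalty, solved by plain proximal gradient descent (ISTA). This is an illustrative stand-in under stated assumptions, not the SVAR estimator or the false discovery rate test of the abstract: the penalty value, the lag order of one, the solver and the simulated five-gene network are all choices made here for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_var1(series, lam=0.1, n_iter=500):
    """L1-penalised VAR(1) fit by ISTA.

    series: array of shape (time, genes).
    Returns A with shape (genes, genes), where A[i, j] is the estimated
    influence of gene j at time t-1 on gene i at time t.
    """
    x_prev, x_next = series[:-1], series[1:]
    n = x_prev.shape[0]
    # Step size from the Lipschitz constant of the least-squares gradient.
    lipschitz = np.linalg.eigvalsh(x_prev.T @ x_prev / n).max()
    step = 1.0 / lipschitz
    coef = np.zeros((series.shape[1], series.shape[1]))  # coef equals A.T
    for _ in range(n_iter):
        grad = x_prev.T @ (x_prev @ coef - x_next) / n
        coef = soft_threshold(coef - step * grad, step * lam)
    return coef.T


# Simulated network (assumption): self-regulation plus one edge, gene 0 -> gene 1.
rng = np.random.default_rng(1)
true_a = 0.5 * np.eye(5)
true_a[1, 0] = 0.8
series = np.zeros((300, 5))
for t in range(1, 300):
    series[t] = true_a @ series[t - 1] + rng.normal(size=5)

est = sparse_var1(series)
off_diag = np.abs(est - np.diag(np.diag(est)))
strongest_edge = np.unravel_index(off_diag.argmax(), off_diag.shape)
```

A nonzero off-diagonal entry A[i, j] is read as a candidate directed edge from gene j to gene i; deciding which entries are significant would need a formal test such as the one the abstract describes, which this sketch does not implement.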
Abstract:
Induction of interferon-beta (IFN-beta) gene expression is a tightly regulated process, and a plethora of studies identified the signal transduction pathway TANK-binding kinase-1 (TBK-1)/IFN regulatory factor-3 (IRF-3) as essential to the induction of IFN-beta gene expression. Data regarding the role of p38 and JNK are rare, however. We investigated the contribution of these kinases to IFN-beta expression in human macrophages treated with poly(I:C), lipopolysaccharide (LPS), Sendai virus, or vesicular stomatitis virus (VSV). We found that all the stimuli induced IFN-beta mRNA, albeit to a different extent. Whereas LPS and VSV induced the phosphorylation of p38 and JNK, neither poly(I:C) nor Sendai virus led to the detection of phosphospecific signals. When inhibiting p38, a VSV-triggered IFN-beta mRNA response was inhibited, whereas inhibiting JNK suppressed an LPS-triggered response, but only when macrophages were primed with IFN-gamma. Neither poly(I:C)-induced nor Sendai virus-induced IFN-beta mRNA expression was affected when p38 and JNK were inhibited. Collectively, the data show that the contribution of p38 and JNK to the expression of IFN-beta occurs in a stimulation-specific manner in human macrophages.
Abstract:
The creation of atlases, or digital models where information from different subjects can be combined, is a field of increasing interest in biomedical imaging. When a single image does not contain enough information to appropriately describe the organism under study, it is then necessary to acquire images of several individuals, each of them containing complementary data with respect to the rest of the components in the cohort. This approach allows creating digital prototypes, ranging from anatomical atlases of human patients and organs, obtained for instance from Magnetic Resonance Imaging, to gene expression cartographies of embryo development, typically achieved from Light Microscopy. Within such context, in this PhD Thesis we propose, develop and validate new dedicated image processing methodologies that, based on image registration techniques, bring information from multiple individuals into alignment within a single digital atlas model. We also elaborate a dedicated software visualization platform to explore the resulting wealth of multi-dimensional data and novel analysis algorithms to automatically mine the generated resource in search of biological insights. In particular, this work focuses on gene expression data from developing zebrafish embryos imaged at the cellular resolution level with Two-Photon Laser Scanning Microscopy. Having quantitative measurements relating multiple gene expressions to cell position and their evolution in time is a fundamental prerequisite to understand embryogenesis multi-scale processes. However, the number of gene expressions that can be simultaneously stained in one acquisition is limited due to optical and labeling constraints. These limitations motivate the implementation of atlasing strategies that can recreate a virtual gene expression multiplex. The developed computational tools have been tested in two different scenarios.
The first one is the early zebrafish embryogenesis where the resulting atlas constitutes a link between the phenotype and the genotype at the cellular level. The second one is the late zebrafish brain where the resulting atlas allows studies relating gene expression to brain regionalization and neurogenesis. The proposed computational frameworks have been adapted to the requirements of both scenarios, such as the integration of partial views of the embryo into a whole embryo model with cellular resolution or the registration of anatomical traits with deformable transformation models non-dependent on any specific labeling. The software implementation of the atlas generation tool (Match-IT) and the visualization platform (Atlas-IT) together with the gene expression atlas resources developed in this Thesis are to be made freely available to the scientific community. Lastly, a novel proof-of-concept experiment integrates for the first time 3D gene expression atlas resources with cell lineages extracted from live embryos, opening up the door to correlate genetic and cellular spatio-temporal dynamics.
Abstract:
Analysis of previously published sets of DNA microarray gene expression data by singular value decomposition has uncovered underlying patterns or “characteristic modes” in their temporal profiles. These patterns contribute unequally to the structure of the expression profiles. Moreover, the essential features of a given set of expression profiles are captured using just a small number of characteristic modes. This leads to the striking conclusion that the transcriptional response of a genome is orchestrated in a few fundamental patterns of gene expression change. These patterns are both simple and robust, dominating the alterations in expression of genes throughout the genome. Moreover, the characteristic modes of gene expression change in response to environmental perturbations are similar in such distant organisms as yeast and human cells. This analysis reveals simple regularities in the seemingly complex transcriptional transitions of diverse cells to new states, and these provide insights into the operation of the underlying genetic networks.
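The idea that a few characteristic modes capture most of the structure in a set of expression profiles can be illustrated with a short singular value decomposition sketch. The data below are synthetic, and the per-gene centering and the choice of two modes are assumptions made for the example; the abstract does not state how the original analysis preprocessed its data.

```python
import numpy as np


def characteristic_modes(expr, n_modes=2):
    """Return the leading temporal modes and the variance fraction of each.

    expr: array of shape (genes, time points).
    """
    # Assumption: remove each gene's mean so modes describe change over time.
    centered = expr - expr.mean(axis=1, keepdims=True)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    fraction = singular_values**2 / np.sum(singular_values**2)
    return vt[:n_modes], fraction[:n_modes]


# Synthetic data: 200 genes built from two underlying temporal patterns plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 12)
patterns = np.vstack([np.sin(t), np.cos(t)])
weights = rng.normal(size=(200, 2))
expr = weights @ patterns + 0.05 * rng.normal(size=(200, 12))

modes, fraction = characteristic_modes(expr, n_modes=2)
```

Here the rows of `modes` are the temporal patterns shared across genes, and `fraction` indicates how unequally they contribute; on real data the number of modes worth keeping would be judged from how quickly these fractions fall off.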
Abstract:
We present statistical methods for analyzing replicated cDNA microarray expression data and report the results of a controlled experiment. The study was conducted to investigate inherent variability in gene expression data and the extent to which replication in an experiment produces more consistent and reliable findings. We introduce a statistical model to describe the probability that mRNA is contained in the target sample tissue, converted to probe, and ultimately detected on the slide. We also introduce a method to analyze the combined data from all replicates. Of the 288 genes considered in this controlled experiment, 32 would be expected to produce strong hybridization signals because of the known presence of repetitive sequences within them. Results based on individual replicates, however, show that there are 55, 36, and 58 highly expressed genes in replicates 1, 2, and 3, respectively. On the other hand, an analysis by using the combined data from all 3 replicates reveals that only 2 of the 288 genes are incorrectly classified as expressed. Our experiment shows that any single microarray output is subject to substantial variability. By pooling data from replicates, we can provide a more reliable analysis of gene expression data. Therefore, we conclude that designing experiments with replications will greatly reduce misclassification rates. We recommend that at least three replicates be used in designing experiments by using cDNA microarrays, particularly when gene expression data from single specimens are being analyzed.
Abstract:
This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).
Abstract:
Mood stabilising drugs such as lithium (LiCl) and valproic acid (VPA) are the first line agents for treating conditions such as Bipolar disorder and Epilepsy. However, these drugs have potential developmental effects that are not fully understood. This study explores the use of a simple human neurosphere-based in vitro model to characterise the pharmacological and toxicological effects of LiCl and VPA using gene expression changes linked to phenotypic alterations in cells. Treatment with VPA and LiCl resulted in the differential expression of 331 and 164 genes respectively. In the subset of VPA targeted genes, 114 were downregulated whilst 217 genes were upregulated. In the subset of LiCl targeted genes, 73 were downregulated and 91 were upregulated. Gene ontology (GO) term enrichment analysis was used to highlight the most relevant GO terms associated with a given gene list following toxin exposure. In addition, in order to phenotypically anchor the gene expression data, changes in the heterogeneity of cell subtype populations and cell cycle phase were monitored using flow cytometry. Whilst LiCl exposure did not significantly alter the proportion of cells expressing markers for stem cells/undifferentiated cells (Oct4, SSEA4), neurons (Neurofilament M), astrocytes (GFAP) or cell cycle phase, the drug caused a 1.4-fold increase in total cell number. In contrast, exposure to VPA resulted in significant upregulation of Oct4, SSEA, Neurofilament M and GFAP with significant decreases in both G2/M phase cells and cell number. This neurosphere model might provide the basis of a human-based cellular approach for the regulatory exploration of developmental impact of potential toxic chemicals.
Abstract:
Background: Expressed Sequence Tags (ESTs) are in general used to gain a first insight into gene activities from a species of interest. Subsequently, and typically based on a combination of EST and genome sequences, microarray-based expression analyses are performed for a variety of conditions. In some cases, a multitude of EST and microarray experiments are conducted for one species, covering different tissues, cell states, and cell types. Under these circumstances, the challenge arises to combine results derived from the different expression profiling strategies, with the goal to uncover novel information on the basis of the integrated datasets. Findings: Using our new analysis tool, MediPlEx (MEDIcago truncatula multiPLe EXpression analysis), expression data from EST experiments, oligonucleotide microarrays and Affymetrix GeneChips® can be combined and analyzed, leading to a novel approach to integrated transcriptome analysis. We have validated our tool via the identification of a set of well-characterized AM-specific and AM-induced marker genes, identified by MediPlEx on the basis of in silico and experimental gene expression profiles from roots colonized with AM fungi. Conclusions: MediPlEx offers an integrated analysis pipeline for different sets of expression data generated for the model legume Medicago truncatula. As expected, in silico and experimental gene expression data that cover the same biological condition correlate well. The collection of differentially expressed genes identified via MediPlEx provides a starting point for functional studies in plant mutants.
Abstract:
In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
Abstract:
High-throughput technologies are now used to generate more than one type of data from the same biological samples. To properly integrate such data, we propose using co-modules, which describe coherent patterns across paired data sets, and conceive several modular methods for their identification. We first test these methods using in silico data, demonstrating that the integrative scheme of our Ping-Pong Algorithm uncovers drug-gene associations more accurately when considering noisy or complex data. Second, we provide an extensive comparative study using the gene-expression and drug-response data from the NCI-60 cell lines. Using information from the DrugBank and the Connectivity Map databases, we show that the Ping-Pong Algorithm predicts drug-gene associations significantly better than other methods. Co-modules provide insights into possible mechanisms of action for a wide range of drugs and suggest new targets for therapy.