978 results for Gene Set Enrichment
Abstract:
Motivation: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose a number of extensions to GSEA that use different statistics to describe the association between genes and the phenotype of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets containing coordinated genes. We also address the problem of overlap among gene sets. Results: We applied our methods to data from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting gene sets using different statistics. We find that gender may have effects on gene expression in addition to the phenotype effects. Investigating overlap among interesting gene sets indicates that overlap could alter the interpretation of the significant results.
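As a rough illustration of the baseline that the abstract above extends, the following sketch computes an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score over a ranked gene list. It is not the authors' extended method; the function name `enrichment_score` and the toy gene names are hypothetical, and the unweighted step sizes are an assumption made for simplicity.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (illustrative sketch).

    ranked_genes: genes ordered from most to least associated with the phenotype.
    gene_set: collection of genes forming the set of interest.
    Returns the signed maximum deviation of the running sum from zero.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, down otherwise; the sum returns to zero at the end.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_heavy = enrichment_score(ranked, {"g1", "g2"})     # set members at the top
bottom_heavy = enrichment_score(ranked, {"g5", "g6"})  # set members at the bottom
```

A set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score. In practice, significance would be assessed by permutation, which this sketch omits.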
Abstract:
Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed in two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticized for ignoring biological knowledge regarding how genes work together. Recently a series of methods that do incorporate biological knowledge have been proposed. However, many of these methods seem overly complicated. Furthermore, the most popular method, Gene Set Enrichment Analysis (GSEA), is based on a statistical test known for its lack of sensitivity. In this paper we compare the performance of a simple alternative to GSEA. We find that this simple solution clearly outperforms GSEA. We demonstrate this with eight different microarray datasets.
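The abstract above does not name its simple alternative. Purely as an illustration of what a simple gene-set test can look like, the sketch below computes a mean-shift z-score from per-gene statistics. It assumes those statistics are approximately standard normal and independent under the null hypothesis; the function name `mean_shift_z` and the example values are hypothetical.

```python
from math import erfc, sqrt


def mean_shift_z(gene_stats, gene_set):
    """Mean-shift z-score for a gene set (illustrative sketch).

    gene_stats: dict mapping gene name to a per-gene statistic (e.g. a t-statistic),
                assumed roughly N(0, 1) and independent under the null.
    gene_set: collection of gene names; genes missing from gene_stats are ignored.
    Returns (z, two-sided normal p-value).
    """
    values = [gene_stats[g] for g in gene_set if g in gene_stats]
    if not values:
        raise ValueError("no gene in the set has a statistic")
    # The mean of n independent N(0, 1) values has variance 1/n, so scale by sqrt(n).
    z = sqrt(len(values)) * (sum(values) / len(values))
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


stats = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": -0.2}
z, p = mean_shift_z(stats, ["a", "b", "c", "d"])
```

Four genes that each shift by only one unit combine into z = 2, which shows how modest but coordinated changes can add up at the set level. Correlation between genes would violate the independence assumption and inflate the score.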
Abstract:
Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state-of-the-art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need for GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.
Abstract:
The recognition that colorectal cancer (CRC) is a heterogeneous disease in terms of clinical behaviour and response to therapy translates into an urgent need for robust molecular disease subclassifiers that can explain this heterogeneity beyond current parameters (MSI, KRAS, BRAF). Attempts to fill this gap are emerging. The Cancer Genome Atlas (TCGA) reported two main CRC groups, based on the incidence and spectrum of mutated genes, and another paper reported a subgroup defined by an EMT expression signature. We performed a prior-free analysis of CRC heterogeneity on 1113 CRC gene expression profiles and compared our findings with established molecular determinants and clinical, histopathological and survival data. Unsupervised clustering based on gene modules allowed us to distinguish at least five different gene expression CRC subtypes, which we call surface crypt-like, lower crypt-like, CIMP-H-like, mesenchymal and mixed. A gene set enrichment analysis combined with literature search of gene module members identified distinct biological motifs in different subtypes. The subtypes, which were not derived based on outcome, nonetheless showed differences in prognosis. Known gene copy number variations and mutations in key cancer-associated genes differed between subtypes, but the subtypes provided molecular information beyond that contained in these variables. Morphological features significantly differed between subtypes. The objective existence of the subtypes and their clinical and molecular characteristics were validated in an independent set of 720 CRC expression profiles. Our subtypes provide a novel perspective on the heterogeneity of CRC. The proposed subtypes should be further explored retrospectively on existing clinical trial datasets and, when sufficiently robust, be prospectively assessed for clinical relevance in terms of prognosis and treatment response predictive capacity.
Original microarray data were uploaded to the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under Accession Nos E-MTAB-990 and E-MTAB-1026. © 2013 Swiss Institute of Bioinformatics. Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.
Abstract:
The use of immunosuppressive drugs in transplanted patients is associated with the development of diabetes, possibly due to β-cell toxicity. To better understand the mechanisms leading to post-transplant diabetes, we investigated the actions of prolonged exposure of isolated human islets to therapeutic levels of tacrolimus (Tac) or cyclosporin A (CsA). Islets were isolated from the pancreas of multiorgan donors by enzymatic digestion and density gradient centrifugation. Functional, survival and molecular studies were then performed after 4 days of incubation with therapeutic concentrations of Tac or CsA. Glucose-induced insulin secretion was significantly decreased in Tac-exposed, but not in CsA-exposed, islets, which was associated with a reduction in the number of insulin granules as shown by electron microscopy. The percentage of apoptotic β-cells was higher in Tac-exposed than in CsA-exposed islets. Microarray experiments followed by Gene Set Enrichment Analysis revealed that gene expression was more markedly affected upon Tac treatment. In conclusion, Tac and CsA affect β-cell features differently, with several changes occurring at the molecular level.
Abstract:
Introduction. Genetic epidemiology is focused on the study of the genetic causes that determine health and disease in populations. To achieve this goal, a common strategy is to explore differences in genetic variability between diseased and nondiseased individuals. The usual markers of genetic variability are single nucleotide polymorphisms (SNPs), which are changes in just one base in the genome. The usual statistical approach in genetic epidemiology studies is a marginal analysis, where each SNP is analyzed separately for association with the phenotype. Motivation. It has been observed that, for common diseases, single-SNP analysis is not very powerful for detecting causal genetic variants. In this work, we consider Gene Set Analysis (GSA) as an alternative to standard marginal association approaches. GSA aims to assess the overall association of a set of genetic variants with a phenotype and has the potential to detect subtle effects of variants in a gene or a pathway that might be missed when assessed individually. Objective. We present a new optimized implementation of a pair of gene set analysis methodologies to analyze the individual evidence of SNPs in biological pathways. We perform a simulation study to explore the power of the proposed methodologies in a set of scenarios with different numbers of causal SNPs under different effect sizes. In addition, we compare the results with the usual single-SNP analysis method. Moreover, we show the advantage of using the proposed gene set approaches in the context of an Alzheimer's disease case-control study where we explore the Reelin signaling pathway.
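The abstract above does not specify which two gene set methodologies it implements. As a stand-in that shows how individual SNP evidence can be pooled at the pathway level, the sketch below applies Fisher's combined probability test to per-SNP p-values. It assumes the SNPs are independent, which linkage disequilibrium usually violates; the function name `fisher_combined_p` and the example p-values are hypothetical.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Fisher's combined probability test (illustrative sketch).

    Combines k independent p-values through X = -2 * sum(ln p), which follows a
    chi-square distribution with 2k degrees of freedom under the null.
    For even degrees of freedom the upper tail has a closed form:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    half_stat = -sum(log(p) for p in p_values)  # this is X / 2

    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


# Two SNPs in a hypothetical pathway, each only borderline on its own.
pathway_p = fisher_combined_p([0.05, 0.05])
```

Two SNPs with p = 0.05 each yield a combined p-value of about 0.017, illustrating how a set-level test can detect signal that single-SNP analysis treats as marginal.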
Abstract:
Glucose is the most important metabolic substrate of the retina, and maintenance of normoglycemia is an essential challenge for diabetic patients. Chronic, exaggerated glycemic excursions could lead to cardiovascular diseases, nephropathy, neuropathy and retinopathy. We recently showed that hypoglycemia induced retinal cell death in mice via caspase 3 activation and glutathione (GSH) decrease. Ex vivo experiments in 661W photoreceptor cells confirmed the low-glucose induction of death via superoxide production and activation of caspase 3, which was concomitant with a decrease of GSH content. We evaluate herein retinal gene expression 4 h and 48 h after insulin-induced hypoglycemia. Microarray analysis demonstrated clusters of genes whose expression was modified by hypoglycemia, and we discuss the potential implication of those genes in retinal cell death. In addition, by gene set enrichment analysis we identify three important pathways: lysosomal function, GSH metabolism and apoptotic pathways. We then tested the effect of recurrent hypoglycemia (three successive 4 h periods of hypoglycemia separated by 48 h of recovery) on retinal cell death. Interestingly, exposure to multiple hypoglycemic events prevented GSH decrease and retinal cell death, or adapted the retina to external stress by restoring GSH to a level comparable to the control situation. We hypothesize that the scavenger GSH is a key compound in this apoptotic process, and that maintaining a "normal" GSH level, as well as strict glycemic control, represents a therapeutic challenge in order to avoid side effects of diabetes, especially diabetic retinopathy.
Abstract:
The recently sequenced genome of the parasitic bacterium Mycoplasma genitalium contains only 468 identified protein-coding genes that have been dubbed a minimal gene complement [Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A., et al. (1995) Science 270, 397-403]. Although the M. genitalium gene complement is indeed the smallest among known cellular life forms, there is no evidence that it is the minimal self-sufficient gene set. To derive such a set, we compared the 468 predicted M. genitalium protein sequences with the 1703 protein sequences encoded by the other completely sequenced small bacterial genome, that of Haemophilus influenzae. M. genitalium and H. influenzae belong to two ancient bacterial lineages, i.e., Gram-positive and Gram-negative bacteria, respectively. Therefore, the genes that are conserved in these two bacteria are almost certainly essential for cellular function. It is this category of genes that is most likely to approximate the minimal gene set. We found that 240 M. genitalium genes have orthologs among the genes of H. influenzae. This collection of genes falls short of comprising the minimal set as some enzymes responsible for intermediate steps in essential pathways are missing. The apparent reason for this is the phenomenon that we call nonorthologous gene displacement when the same function is fulfilled by nonorthologous proteins in two organisms. We identified 22 nonorthologous displacements and supplemented the set of orthologs with the respective M. genitalium genes. After examining the resulting list of 262 genes for possible functional redundancy and for the presence of apparently parasite-specific genes, 6 genes were removed. We suggest that the remaining 256 genes are close to the minimal gene set that is necessary and sufficient to sustain the existence of a modern-type cell. 
Most of the proteins encoded by the genes from the minimal set have eukaryotic or archaeal homologs but seven key proteins of DNA replication do not. We speculate that the last common ancestor of the three primary kingdoms had an RNA genome. Possibilities are explored to further reduce the minimal set to model a primitive cell that might have existed at a very early stage of life evolution.
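The gene-count bookkeeping in the abstract above can be checked with a few lines of arithmetic. All counts are taken from the abstract; the variable names are illustrative only.

```python
# Counts reported in the abstract.
m_genitalium_genes = 468          # identified protein-coding genes in M. genitalium
orthologs_in_h_influenzae = 240   # M. genitalium genes with H. influenzae orthologs
nonorthologous_displacements = 22 # genes added to cover displaced functions
removed_genes = 6                 # redundant or apparently parasite-specific genes

# Orthologs plus displacement genes give the candidate list examined for redundancy.
candidate_set = orthologs_in_h_influenzae + nonorthologous_displacements

# Removing the redundant and parasite-specific genes gives the proposed minimal set.
minimal_set = candidate_set - removed_genes

# Share of the M. genitalium gene complement retained in the proposed minimal set.
retained_fraction = minimal_set / m_genitalium_genes

print(candidate_set, minimal_set, round(100 * retained_fraction, 1))
```

The totals match the 262 and 256 genes stated in the abstract, and the proposed minimal set corresponds to a little over half of the M. genitalium gene complement.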
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Host responses following exposure to Mycobacterium tuberculosis (TB) are complex and can significantly affect clinical outcome. These responses, which are largely mediated by complex immune mechanisms involving peripheral blood cells (PBCs) such as T-lymphocytes, NK cells and monocyte-derived macrophages, have not been fully characterized. We hypothesize that different clinical outcomes following TB exposure will be uniquely reflected in host gene expression profiles, and that expression profiling of PBCs can be used to discriminate between different TB infectious outcomes. In this study, microarray analysis was performed on PBCs from three TB groups (BCG-vaccinated, latent TB infection, and active TB infection) and a healthy control group. Supervised learning algorithms were used to identify signature genomic responses that differentiate among group samples. Gene Set Enrichment Analysis was used to determine sets of genes that were co-regulated. Multivariate permutation analysis (p < 0.01) gave 645 genes differentially expressed among the four groups, with both distinct and common patterns of gene expression observed for each group. A set of 127 probesets, representing 77 known genes, capable of accurately classifying samples into their respective groups was identified. In addition, 13 insulin-sensitive genes were found to be differentially regulated in all three TB infected groups, underscoring the functional association between the insulin signaling pathway and TB infection. Published by Elsevier Ltd.
Abstract:
Doctoral thesis in Health Sciences
Abstract:
To identify loci for age at menarche, we performed a meta-analysis of 32 genome-wide association studies in 87,802 women of European descent, with replication in up to 14,731 women. In addition to the known loci at LIN28B (P = 5.4 × 10⁻⁶⁰) and 9q31.2 (P = 2.2 × 10⁻³³), we identified 30 new menarche loci (all P < 5 × 10⁻⁸) and found suggestive evidence for a further 10 loci (P < 1.9 × 10⁻⁶). The new loci included four previously associated with body mass index (in or near FTO, SEC16B, TRA2B and TMEM18), three in or near other genes implicated in energy homeostasis (BSX, CRTC1 and MCHR2) and three in or near genes implicated in hormonal regulation (INHBA, PCSK2 and RXRG). Ingenuity and gene-set enrichment pathway analyses identified coenzyme A and fatty acid biosynthesis as biological processes related to menarche timing.
Abstract:
Most approaches aimed at finding genes involved in adaptive events have focused on the detection of outlier loci, which resulted in the discovery of individually "significant" genes with strong effects. However, a collection of small-effect mutations could have a large effect on a given biological pathway that includes many genes, and such a polygenic mode of adaptation has not been systematically investigated in humans. We propose here to provide evidence of polygenic selection by detecting signals of adaptation at the pathway or gene set level instead of analyzing single independent genes. Using a gene-set enrichment test to identify genome-wide signals of adaptation among human populations, we find that most pathways globally enriched for signals of positive selection are either directly or indirectly involved in immune response. We also find evidence for long-distance genotypic linkage disequilibrium, suggesting functional epistatic interactions between members of the same pathway. Our results show that past interactions with pathogens have elicited widespread and coordinated genomic responses, and suggest that adaptation to pathogens can be considered as a primary example of polygenic selection.
Abstract:
Mouse models are important tools to decipher the molecular mechanisms of mammary carcinogenesis and to mimic the respective human disease. Despite sharing common phenotypic and genetic features, the proper translation of murine models to human breast cancer remains a challenging task. In a previous study we showed that in the SV40 transgenic WAP-T mice an active Met-pathway and epithelial-mesenchymal characteristics distinguish low- and high-grade mammary carcinoma. To assign these murine tumors to corresponding human tumors, we here incorporated the analysis of expression of transcription factor (TF) coding genes and show that a more accurate interspecies translation can thereby be achieved. We describe a novel cross-species translation procedure and demonstrate that expression of TFs selected in an unsupervised manner, such as ELF5, HOXA5 and TFCP2L1, can clearly distinguish between the human molecular breast cancer subtypes or, as in the case of TFAP2B expression, between as yet unclassified subgroups. By integrating different levels of information such as histology, gene set enrichment, expression of differentiation markers and TFs, we conclude that tumors in WAP-T mice exhibit similarities to both human basal-like and non-basal-like subtypes. We furthermore suggest that the low- and high-grade WAP-T tumor phenotypes might arise from distinct cells of tumor origin. Our results underscore the importance of TFs as common cross-species denominators in the regulatory networks underlying mammary carcinogenesis.