13 resultados para Gene Set Enrichment
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.
Resumo:
Introduction. Genetic epidemiology is focused on the study of the genetic causes that determine health and diseases in populations. To achieve this goal a common strategy is to explore differences in genetic variability between diseased and nondiseased individuals. Usual markers of genetic variability are single nucleotide polymorphisms (SNPs) which are changes in just one base in the genome. The usual statistical approach in genetic epidemiology study is a marginal analysis, where each SNP is analyzed separately for association with the phenotype. Motivation. It has been observed, that for common diseases the single-SNP analysis is not very powerful for detecting genetic causing variants. In this work, we consider Gene Set Analysis (GSA) as an alternative to standard marginal association approaches. GSA aims to assess the overall association of a set of genetic variants with a phenotype and has the potential to detect subtle effects of variants in a gene or a pathway that might be missed when assessed individually. Objective. We present a new optimized implementation of a pair of gene set analysis methodologies for analyze the individual evidence of SNPs in biological pathways. We perform a simulation study for exploring the power of the proposed methodologies in a set of scenarios with different number of causal SNPs under different effect sizes. In addition, we compare the results with the usual single-SNP analysis method. Moreover, we show the advantage of using the proposed gene set approaches in the context of an Alzheimer disease case-control study where we explore the Reelin signal pathway.
Resumo:
Background: Aging results in a progressive loss of skeletal muscle, a condition known as sarcopenia. Mitochondrial DNA (mtDNA) mutations accumulate with aging in skeletal muscle and correlate with muscle loss, although no causal relationship has been established. Methodology/Principal Findings: We investigated the relationship between mtDNA mutations and sarcopenia at the gene expression and biochemical levels using a mouse model that expresses a proofreading-deficient version (D257A) of the mitochondrial DNA Polymerase c, resulting in increased spontaneous mtDNA mutation rates. Gene expression profiling of D257A mice followed by Parametric Analysis of Gene Set Enrichment (PAGE) indicates that the D257A mutation is associated with a profound downregulation of gene sets associated with mitochondrial function. At the biochemical level, sarcopenia in D257A mice is associated with a marked reduction (35–50%) in the content of electron transport chain (ETC) complexes I, III and IV, all of which are partly encoded by mtDNA. D257A mice display impaired mitochondrial bioenergetics associated with compromised state-3 respiration, lower ATP content and a resulting decrease in mitochondrial membrane potential (Dym). Surprisingly, mitochondrial dysfunction was not accompanied by an increase in mitochondrial reactive oxygen species (ROS) production or oxidative damage. Conclusions/Significance: These findings demonstrate that mutations in mtDNA can be causal in sarcopenia by affecting the assembly of functional ETC complexes, the lack of which provokes a decrease in oxidative phosphorylation, without an increase in oxidative stress, and ultimately, skeletal muscle apoptosis and sarcopenia.
Resumo:
The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT–PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.
Resumo:
Background: Cells have the ability to respond and adapt to environmental changes through activation of stress-activated protein kinases (SAPKs). Although p38 SAPK signalling is known to participate in the regulation of gene expression little is known on the molecular mechanisms used by this SAPK to regulate stress-responsive genes and the overall set of genes regulated by p38 in response to different stimuli.Results: Here, we report a whole genome expression analyses on mouse embryonic fibroblasts (MEFs) treated with three different p38 SAPK activating-stimuli, namely osmostress, the cytokine TNFα and the protein synthesis inhibitor anisomycin. We have found that the activation kinetics of p38α SAPK in response to these insults is different and also leads to a complex gene pattern response specific for a given stress with a restricted set of overlapping genes. In addition, we have analysed the contribution of p38α the major p38 family member present in MEFs, to the overall stress-induced transcriptional response by using both a chemical inhibitor (SB203580) and p38α deficient (p38α-/-) MEFs. We show here that p38 SAPK dependency ranged between 60% and 88% depending on the treatments and that there is a very good overlap between the inhibitor treatment and the ko cells. Furthermore, we have found that the dependency of SAPK varies depending on the time the cells are subjected to osmostress. Conclusions: Our genome-wide transcriptional analyses shows a selective response to specific stimuli and a restricted common response of up to 20% of the stress up-regulated early genes that involves an important set of transcription factors, which might be critical for either cell adaptation or preparation for continuous extra-cellular changes. Interestingly, up to 85% of the up-regulated genes are under the transcriptional control of p38 SAPK. Thus, activation of p38 SAPK is critical to elicit the early gene expression program required for cell adaptation to stress.
Resumo:
One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
Resumo:
The completion of the sequencing of the mouse genome promises to help predict human genes with greater accuracy. While current ab initio gene prediction programs are remarkably sensitive (i.e., they predict at least a fragment of most genes), their specificity is often low, predicting a large number of false-positive genes in the human genome. Sequence conservation at the protein level with the mouse genome can help eliminate some of those false positives. Here we describe SGP2, a gene prediction program that combines ab initio gene prediction with TBLASTX searches between two genome sequences to provide both sensitive and specific gene predictions. The accuracy of SGP2 when used to predict genes by comparing the human and mouse genomes is assessed on a number of data sets, including single-gene data sets, the highly curated human chromosome 22 predictions, and entire genome predictions from ENSEMBL. Results indicate that SGP2 outperforms purely ab initio gene prediction methods. Results also indicate that SGP2 works about as well with 3x shotgun data as it does with fully assembled genomes. SGP2 provides a high enough specificity that its predictions can be experimentally verified at a reasonable cost. SGP2 was used to generate a complete set of gene predictions on both the human and mouse by comparing the genomes of these two species. Our results suggest that another few thousand human and mouse genes currently not in ENSEMBL are worth verifying experimentally.
Resumo:
Malaria in pregnancy forms a substantial part of the worldwide burden of malaria, with an estimated annual death toll of up to 200,000 infants, as well as increased maternal morbidity and mortality. Studies of genetic susceptibility to malaria have so far focused on infant malaria, with only a few studies investigating the genetic basis of placental malaria, focusing only on a limited number of candidate genes. The aim of this study therefore was to identify novel host genetic factors involved in placental malaria infection. To this end we carried out a nested case-control study on 180 Mozambican pregnant women with placental malaria infection, and 180 controls within an intervention trial of malaria prevention. We genotyped 880 SNPs in a set of 64 functionally related genes involved in glycosylation and innate immunity. A SNP located in the gene FUT9, rs3811070, was significantly associated with placental malaria infection (OR = 2.31, permutation p-value = 0.028). Haplotypic analysis revealed a similarly strong association of a common haplotype of four SNPs including rs3811070. FUT9 codes for a fucosyl-transferase that is catalyzing the last step in the biosynthesis of the Lewis-x antigen, which forms part of the Lewis blood group-related antigens. These results therefore suggest an involvement of this antigen in the pathogenesis of placental malaria infection.
Resumo:
AbstractBACKGROUND: Scientists have been trying to understand the molecular mechanisms of diseases to design preventive and therapeutic strategies for a long time. For some diseases, it has become evident that it is not enough to obtain a catalogue of the disease-related genes but to uncover how disruptions of molecular networks in the cell give rise to disease phenotypes. Moreover, with the unprecedented wealth of information available, even obtaining such catalogue is extremely difficult.PRINCIPAL FINDINGS: We developed a comprehensive gene-disease association database by integrating associations from several sources that cover different biomedical aspects of diseases. In particular, we focus on the current knowledge of human genetic diseases including mendelian, complex and environmental diseases. To assess the concept of modularity of human diseases, we performed a systematic study of the emergent properties of human gene-disease networks by means of network topology and functional annotation analysis. The results indicate a highly shared genetic origin of human diseases and show that for most diseases, including mendelian, complex and environmental diseases, functional modules exist. Moreover, a core set of biological pathways is found to be associated with most human diseases. We obtained similar results when studying clusters of diseases, suggesting that related diseases might arise due to dysfunction of common biological processes in the cell.CONCLUSIONS: For the first time, we include mendelian, complex and environmental diseases in an integrated gene-disease association database and show that the concept of modularity applies for all of them. We furthermore provide a functional analysis of disease-related modules providing important new biological insights, which might not be discovered when considering each of the gene-disease association repositories independently. Hence, we present a suitable framework for the study of how genetic and environmental factors, such as drugs, contribute to diseases.AVAILABILITY: The gene-disease networks used in this study and part of the analysis are available at http://ibi.imim.es/DisGeNET/DisGeNETweb.html#Download
Resumo:
Assessing the contribution of promoters and coding sequences to gene evolution is an important step toward discovering the major genetic determinants of human evolution. Many specific examples have revealed the evolutionary importance of cis-regulatory regions. However, the relative contribution of regulatory and coding regions to the evolutionary process and whether systemic factors differentially influence their evolution remains unclear. To address these questions, we carried out an analysis at the genome scale to identify signatures of positive selection in human proximal promoters. Next, we examined whether genes with positively selected promoters (Prom+ genes) show systemic differences with respect to a set of genes with positively selected protein-coding regions (Cod+ genes). We found that the number of genes in each set was not significantly different (8.1% and 8.5%, respectively). Furthermore, a functional analysis showed that, in both cases, positive selection affects almost all biological processes and only a few genes of each group are located in enriched categories, indicating that promoters and coding regions are not evolutionarily specialized with respect to gene function. On the other hand, we show that the topology of the human protein network has a different influence on the molecular evolution of proximal promoters and coding regions. Notably, Prom+ genes have an unexpectedly high centrality when compared with a reference distribution (P = 0.008, for Eigenvalue centrality). Moreover, the frequency of Prom+ genes increases from the periphery to the center of the protein network (P = 0.02, for the logistic regression coefficient). This means that gene centrality does not constrain the evolution of proximal promoters, unlike the case with coding regions, and further indicates that the evolution of proximal promoters is more efficient in the center of the protein network than in the periphery. These results show that proximal promoters have had a systemic contribution to human evolution by increasing the participation of central genes in the evolutionary process.
Resumo:
The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest, a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. 13 SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factor for this tumor. A 9 SNP-signature was detected by BTL. Here we report, for the first time, a set of SNP in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk.
Resumo:
Background: Understanding the relationship between gene expression changes, enzyme activity shifts, and the corresponding physiological adaptive response of organisms to environmental cues is crucial in explaining how cells cope with stress. For example, adaptation of yeast to heat shock involves a characteristic profile of changes to the expression levels of genes coding for enzymes of the glycolytic pathway and some of its branches. The experimental determination of changes in gene expression profiles provides a descriptive picture of the adaptive response to stress. However, it does not explain why a particular profile is selected for any given response. Results: We used mathematical models and analysis of in silico gene expression profiles (GEPs) to understand how changes in gene expression correlate to an efficient response of yeast cells to heat shock. An exhaustive set of GEPs, matched with the corresponding set of enzyme activities, was simulated and analyzed. The effectiveness of each profile in the response to heat shock was evaluated according to relevant physiological and functional criteria. The small subset of GEPs that lead to effective physiological responses after heat shock was identified as the result of the tuning of several evolutionary criteria. The experimentally observed transcriptional changes in response to heat shock belong to this set and can be explained by quantitative design principles at the physiological level that ultimately constrain changes in gene expression. Conclusion: Our theoretical approach suggests a method for understanding the combined effect of changes in the expression of multiple genes on the activity of metabolic pathways, and consequently on the adaptation of cellular metabolism to heat shock. This method identifies quantitative design principles that facilitate understating the response of the cell to stress.
Resumo:
BACKGROUND: The bacterial flagellum is the most important organelle of motility in bacteria and plays a key role in many bacterial lifestyles, including virulence. The flagellum also provides a paradigm of how hierarchical gene regulation, intricate protein-protein interactions and controlled protein secretion can result in the assembly of a complex multi-protein structure tightly orchestrated in time and space. As if to stress its importance, plants and animals produce receptors specifically dedicated to the recognition of flagella. Aside from motility, the flagellum also moonlights as an adhesion and has been adapted by humans as a tool for peptide display. Flagellar sequence variation constitutes a marker with widespread potential uses for studies of population genetics and phylogeny of bacterial species. RESULTS: We sequenced the complete flagellin gene (flaA) in 18 different species and subspecies of Aeromonas. Sequences ranged in size from 870 (A. allosaccharophila) to 921 nucleotides (A. popoffii). The multiple alignment displayed 924 sites, 66 of which presented alignment gaps. The phylogenetic tree revealed the existence of two groups of species exhibiting different FlaA flagellins (FlaA1 and FlaA2). Maximum likelihood models of codon substitution were used to analyze flaA sequences. Likelihood ratio tests suggested a low variation in selective pressure among lineages, with an omega ratio of less than 1 indicating the presence of purifying selection in almost all cases. Only one site under potential diversifying selection was identified (isoleucine in position 179). However, 17 amino acid positions were inferred as sites that are likely to be under positive selection using the branch-site model. Ancestral reconstruction revealed that these 17 amino acids were among the amino acid changes detected in the ancestral sequence. CONCLUSION: The models applied to our set of sequences allowed us to determine the possible evolutionary pathway followed by the flaA gene in Aeromonas, suggesting that this gene have probably been evolving independently in the two groups of Aeromonas species since the divergence of a distant common ancestor after one or several episodes of positive selection. REVIEWERS: This article was reviewed by Alexey Kondrashov, John Logsdon and Olivier Tenaillon (nominated by Laurence D Hurst).