274 resultados para GENOME PROJECT
Resumo:
BACKGROUND: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. RESULTS: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. CONCLUSION: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
Resumo:
The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (approximately 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.
Resumo:
BACKGROUND AND PURPOSE: Beyond the Framingham Stroke Risk Score, prediction of future stroke may improve with a genetic risk score (GRS) based on single-nucleotide polymorphisms associated with stroke and its risk factors. METHODS: The study includes 4 population-based cohorts with 2047 first incident strokes from 22,720 initially stroke-free European origin participants aged ≥55 years, who were followed for up to 20 years. GRSs were constructed with 324 single-nucleotide polymorphisms implicated in stroke and 9 risk factors. The association of the GRS to first incident stroke was tested using Cox regression; the GRS predictive properties were assessed with area under the curve statistics comparing the GRS with age and sex, Framingham Stroke Risk Score models, and reclassification statistics. These analyses were performed per cohort and in a meta-analysis of pooled data. Replication was sought in a case-control study of ischemic stroke. RESULTS: In the meta-analysis, adding the GRS to the Framingham Stroke Risk Score, age and sex model resulted in a significant improvement in discrimination (all stroke: Δjoint area under the curve=0.016, P=2.3×10(-6); ischemic stroke: Δjoint area under the curve=0.021, P=3.7×10(-7)), although the overall area under the curve remained low. In all the studies, there was a highly significantly improved net reclassification index (P<10(-4)). CONCLUSIONS: The single-nucleotide polymorphisms associated with stroke and its risk factors result only in a small improvement in prediction of future stroke compared with the classical epidemiological risk factors for stroke.
Resumo:
Body fat distribution, particularly centralized obesity, is associated with metabolic risk above and beyond total adiposity. We performed genome-wide association of abdominal adipose depots quantified using computed tomography (CT) to uncover novel loci for body fat distribution among participants of European ancestry. Subcutaneous and visceral fat were quantified in 5,560 women and 4,997 men from 4 population-based studies. Genome-wide genotyping was performed using standard arrays and imputed to ~2.5 million Hapmap SNPs. Each study performed a genome-wide association analysis of subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), VAT adjusted for body mass index, and VAT/SAT ratio (a metric of the propensity to store fat viscerally as compared to subcutaneously) in the overall sample and in women and men separately. A weighted z-score meta-analysis was conducted. For the VAT/SAT ratio, our most significant p-value was rs11118316 at LYPLAL1 gene (p = 3.1 × 10E-09), previously identified in association with waist-hip ratio. For SAT, the most significant SNP was in the FTO gene (p = 5.9 × 10E-08). Given the known gender differences in body fat distribution, we performed sex-specific analyses. Our most significant finding was for VAT in women, rs1659258 near THNSL2 (p = 1.6 × 10-08), but not men (p = 0.75). Validation of this SNP in the GIANT consortium data demonstrated a similar sex-specific pattern, with observed significance in women (p = 0.006) but not men (p = 0.24) for BMI and waist circumference (p = 0.04 [women], p = 0.49 [men]). Finally, we interrogated our data for the 14 recently published loci for body fat distribution (measured by waist-hip ratio adjusted for BMI); associations were observed at 7 of these loci. In contrast, we observed associations at only 7/32 loci previously identified in association with BMI; the majority of overlap was observed with SAT. Genome-wide association for visceral and subcutaneous fat revealed a SNP for VAT in women. More refined phenotypes for body composition and fat distribution can detect new loci not previously uncovered in large-scale GWAS of anthropometric traits.
Resumo:
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Resumo:
Genetic variants influence the risk to develop certain diseases or give rise to differences in drug response. Recent progresses in cost-effective, high-throughput genome-wide techniques, such as microarrays measuring Single Nucleotide Polymorphisms (SNPs), have facilitated genotyping of large clinical and population cohorts. Combining the massive genotypic data with measurements of phenotypic traits allows for the determination of genetic differences that explain, at least in part, the phenotypic variations within a population. So far, models combining the most significant variants can only explain a small fraction of the variance, indicating the limitations of current models. In particular, researchers have only begun to address the possibility of interactions between genotypes and the environment. Elucidating the contributions of such interactions is a difficult task because of the large number of genetic as well as possible environmental factors.In this thesis, I worked on several projects within this context. My first and main project was the identification of possible SNP-environment interactions, where the phenotypes were serum lipid levels of patients from the Swiss HIV Cohort Study (SHCS) treated with antiretroviral therapy. Here the genotypes consisted of a limited set of SNPs in candidate genes relevant for lipid transport and metabolism. The environmental variables were the specific combinations of drugs given to each patient over the treatment period. My work explored bioinformatic and statistical approaches to relate patients' lipid responses to these SNPs, drugs and, importantly, their interactions. The goal of this project was to improve our understanding and to explore the possibility of predicting dyslipidemia, a well-known adverse drug reaction of antiretroviral therapy. Specifically, I quantified how much of the variance in lipid profiles could be explained by the host genetic variants, the administered drugs and SNP-drug interactions and assessed the predictive power of these features on lipid responses. Using cross-validation stratified by patients, we could not validate our hypothesis that models that select a subset of SNP-drug interactions in a principled way have better predictive power than the control models using "random" subsets. Nevertheless, all models tested containing SNP and/or drug terms, exhibited significant predictive power (as compared to a random predictor) and explained a sizable proportion of variance, in the patient stratified cross-validation context. Importantly, the model containing stepwise selected SNP terms showed higher capacity to predict triglyceride levels than a model containing randomly selected SNPs. Dyslipidemia is a complex trait for which many factors remain to be discovered, thus missing from the data, and possibly explaining the limitations of our analysis. In particular, the interactions of drugs with SNPs selected from the set of candidate genes likely have small effect sizes which we were unable to detect in a sample of the present size (<800 patients).In the second part of my thesis, I performed genome-wide association studies within the Cohorte Lausannoise (CoLaus). I have been involved in several international projects to identify SNPs that are associated with various traits, such as serum calcium, body mass index, two-hour glucose levels, as well as metabolic syndrome and its components. These phenotypes are all related to major human health issues, such as cardiovascular disease. I applied statistical methods to detect new variants associated with these phenotypes, contributing to the identification of new genetic loci that may lead to new insights into the genetic basis of these traits. This kind of research will lead to a better understanding of the mechanisms underlying these pathologies, a better evaluation of disease risk, the identification of new therapeutic leads and may ultimately lead to the realization of "personalized" medicine.
Resumo:
The R package EasyStrata facilitates the evaluation and visualization of stratified genome-wide association meta-analyses (GWAMAs) results. It provides (i) statistical methods to test and account for between-strata difference as a means to tackle gene-strata interaction effects and (ii) extended graphical features tailored for stratified GWAMA results. The software provides further features also suitable for general GWAMAs including functions to annotate, exclude or highlight specific loci in plots or to extract independent subsets of loci from genome-wide datasets. It is freely available and includes a user-friendly scripting interface that simplifies data handling and allows for combining statistical and graphical functions in a flexible fashion. AVAILABILITY: EasyStrata is available for free (under the GNU General Public License v3) from our Web site www.genepi-regensburg.de/easystrata and from the CRAN R package repository cran.r-project.org/web/packages/EasyStrata/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Resumo:
Plants have the ability to use the composition of incident light as a cue to adapt development and growth to their environment. Arabidopsis thaliana as well as many crops are best adapted to sunny habitats. When subjected to shade, these plants exhibit a variety of physiological responses collectively called shade avoidance syndrome (SAS). It includes increased growth of hypocotyl and petioles, decreased growth rate of cotyledons and reduced branching and crop yield. These responses are mainly mediated by phytochrome photoreceptors, which exist either in an active, far-red light (FR) absorbing or an inactive, red light (R) absorbing isoform. In direct sunlight, the R to FR light (R/FR) ratio is high and converts the phytochromes into their physiologically active state. The phytochromes interact with downstream transcription factors such as PHYTOCHROME INTERACTING FACTOR (PIF), which are subsequently degraded. Light filtered through a canopy is strongly depleted in R, which result in a low R/FR ratio and renders the phytochromes inactive. Protein levels of downstream transcription factors are stabilized, which initiates the expression of shade-induced genes such as HFR1, PIL1 or ATHB-2. In my thesis, I investigated transcriptional responses mediated by the SAS in whole Arabidopsis seedlings. Using microarray and chromatin immunoprecipitation data, we identified genome-wide PIF4 and PIF5 dependent shade regulated gene as well as putative direct target genes of PIF5. This revealed evidence for a direct regulatory link between phytochrome signaling and the growth promoting phytohormone auxin (IAA) at the level of biosynthesis, transport and signaling. Subsequently, it was shown, that free-IAA levels are upregulated in response to shade. It is assumed that shade-induced auxin production takes predominantly place in cotyledons of seedlings. This implies, that IAA is subsequently transported basipetally to the hypocotyl and enhances elongation growth. The importance of auxin transport for growth responses has been established by chemical and genetic approaches. To gain a better understanding of spatio-temporal transcriptional regulation of shade-induce auxin, I generated in a second project, an organ specific high throughput data focusing on cotyledon and hypocotyl of young Arabidopsis seedlings. Interestingly, both organs show an opposite growth regulation by shade. I first investigated the spatio-transcriptional regulation of auxin re- sponsive gene, in order to determine how broad gene expression pattern can be explained by the hypothesized movement of auxin from cotyledons to hypocotyls in shade. The analysis suggests, that several genes are indeed regulated according to our prediction and others are regulated in a more complex manner. In addition, analysis of gene families of auxin biosynthetic and transport components, lead to the identification of essential family members for shade-induced growth re- sponses, which were subsequently experimentally confirmed. Finally, the analysis of expression pattern identified several candidate genes, which possibly explain aspects of the opposite growth response of the different organs.
Resumo:
Next-generation sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the single-nucleotide polymorphisms (SNPs) reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, and -DQB1). We take advantage of the availability of HLA Sanger sequencing of 930 of the 1092 1000G samples and use this as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect and that allele frequencies are estimated with an error greater than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias toward overestimation of reference allele frequency for the 1000G data, indicating mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites that have poor allele frequency estimates and discuss the outcomes of including those sites in different kinds of analyses. Because the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using of NGS data at other genomic regions of high diversity.
Resumo:
The first extensive catalog of structural human variation was recently released. It showed that large stretches of genomic DNA that vary considerably in copy number were extremely abundant. Thus it is conceivable that they play a major role in functional variation. Consistently, genomic insertions and deletions were shown to contribute to phenotypic differences by modifying not only the expression levels of genes within the aneuploid segments but also of normal copy-number neighboring genes. In this report, we review the possible mechanisms behind this latter effect.
Resumo:
The mutualistic symbiosis involving Glomeromycota, a distinctive phylum of early diverging Fungi, is widely hypothesized to have promoted the evolution of land plants during the middle Paleozoic. These arbuscular mycorrhizal fungi (AMF) perform vital functions in the phosphorus cycle that are fundamental to sustainable crop plant productivity. The unusual biological features of AMF have long fascinated evolutionary biologists. The coenocytic hyphae host a community of hundreds of nuclei and reproduce clonally through large multinucleated spores. It has been suggested that the AMF maintain a stable assemblage of several different genomes during the life cycle, but this genomic organization has been questioned. Here we introduce the 153-Mb haploid genome of Rhizophagus irregularis and its repertoire of 28,232 genes. The observed low level of genome polymorphism (0.43 SNP per kb) is not consistent with the occurrence of multiple, highly diverged genomes. The expansion of mating-related genes suggests the existence of cryptic sex-related processes. A comparison of gene categories confirms that R. irregularis is close to the Mucoromycotina. The AMF obligate biotrophy is not explained by genome erosion or any related loss of metabolic complexity in central metabolism, but is marked by a lack of genes encoding plant cell wall-degrading enzymes and of genes involved in toxin and thiamine synthesis. A battery of mycorrhiza-induced secreted proteins is expressed in symbiotic tissues. The present comprehensive repertoire of R. irregularis genes provides a basis for future research on symbiosis-related mechanisms in Glomeromycota.
Resumo:
Metabolic traits are molecular phenotypes that can drive clinical phenotypes and may predict disease progression. Here, we report results from a metabolome- and genome-wide association study on (1)H-NMR urine metabolic profiles. The study was conducted within an untargeted approach, employing a novel method for compound identification. From our discovery cohort of 835 Caucasian individuals who participated in the CoLaus study, we identified 139 suggestively significant (P<5×10(-8)) and independent associations between single nucleotide polymorphisms (SNP) and metabolome features. Fifty-six of these associations replicated in the TasteSensomics cohort, comprising 601 individuals from São Paulo of vastly diverse ethnic background. They correspond to eleven gene-metabolite associations, six of which had been previously identified in the urine metabolome and three in the serum metabolome. Our key novel findings are the associations of two SNPs with NMR spectral signatures pointing to fucose (rs492602, P = 6.9×10(-44)) and lysine (rs8101881, P = 1.2×10(-33)), respectively. Fine-mapping of the first locus pinpointed the FUT2 gene, which encodes a fucosyltransferase enzyme and has previously been associated with Crohn's disease. This implicates fucose as a potential prognostic disease marker, for which there is already published evidence from a mouse model. The second SNP lies within the SLC7A9 gene, rare mutations of which have been linked to severe kidney damage. The replication of previous associations and our new discoveries demonstrate the potential of untargeted metabolomics GWAS to robustly identify molecular disease markers.
Resumo:
BACKGROUND & AIMS: Hepatitis C virus (HCV) induces chronic infection in 50% to 80% of infected persons; approximately 50% of these do not respond to therapy. We performed a genome-wide association study to screen for host genetic determinants of HCV persistence and response to therapy. METHODS: The analysis included 1362 individuals: 1015 with chronic hepatitis C and 347 who spontaneously cleared the virus (448 were coinfected with human immunodeficiency virus [HIV]). Responses to pegylated interferon alfa and ribavirin were assessed in 465 individuals. Associations between more than 500,000 single nucleotide polymorphisms (SNPs) and outcomes were assessed by multivariate logistic regression. RESULTS: Chronic hepatitis C was associated with SNPs in the IL28B locus, which encodes the antiviral cytokine interferon lambda. The rs8099917 minor allele was associated with progression to chronic HCV infection (odds ratio [OR], 2.31; 95% confidence interval [CI], 1.74-3.06; P = 6.07 x 10(-9)). The association was observed in HCV mono-infected (OR, 2.49; 95% CI, 1.64-3.79; P = 1.96 x 10(-5)) and HCV/HIV coinfected individuals (OR, 2.16; 95% CI, 1.47-3.18; P = 8.24 x 10(-5)). rs8099917 was also associated with failure to respond to therapy (OR, 5.19; 95% CI, 2.90-9.30; P = 3.11 x 10(-8)), with the strongest effects in patients with HCV genotype 1 or 4. This risk allele was identified in 24% of individuals with spontaneous HCV clearance, 32% of chronically infected patients who responded to therapy, and 58% who did not respond (P = 3.2 x 10(-10)). Resequencing of IL28B identified distinct haplotypes that were associated with the clinical phenotype. CONCLUSIONS: The association of the IL28B locus with natural and treatment-associated control of HCV indicates the importance of innate immunity and interferon lambda in the pathogenesis of HCV infection.
Resumo:
Cerebrospinal fluid amyloid-beta 1-42 (Aβ1-42) and phosphorylated Tau at position 181 (pTau181) are biomarkers of Alzheimer's disease (AD). We performed an analysis and meta-analysis of genome-wide association study data on Aβ1-42 and pTau181 in AD dementia patients followed by independent replication. An association was found between Aβ1-42 level and a single-nucleotide polymorphism in SUCLG2 (rs62256378) (P = 2.5×10(-12)). An interaction between APOE genotype and rs62256378 was detected (P = 9.5 × 10(-5)), with the strongest effect being observed in APOE-ε4 noncarriers. Clinically, rs62256378 was associated with rate of cognitive decline in AD dementia patients (P = 3.1 × 10(-3)). Functional microglia experiments showed that SUCLG2 was involved in clearance of Aβ1-42.