29 resultados para genome wide complex trait analysis
em Duke University
Resumo:
Alzheimer's disease is a complex and progressive neurodegenerative disease leading to loss of memory, cognitive impairment, and ultimately death. To date, six large-scale genome-wide association studies have been conducted to identify SNPs that influence disease predisposition. These studies have confirmed the well-known APOE epsilon4 risk allele, identified a novel variant that influences disease risk within the APOE epsilon4 population, found a SNP that modifies the age of disease onset, as well as reported the first sex-linked susceptibility variant. Here we report a genome-wide scan of Alzheimer's disease in a set of 331 cases and 368 controls, extending analyses for the first time to include assessments of copy number variation. In this analysis, no new SNPs show genome-wide significance. We also screened for effects of copy number variation, and while nothing was significant, a duplication in CHRNA7 appears interesting enough to warrant further investigation.
Resumo:
OBJECTIVES: Identification of patient subpopulations susceptible to develop myocardial infarction (MI) or, conversely, those displaying either intrinsic cardioprotective phenotypes or highly responsive to protective interventions remain high-priority knowledge gaps. We sought to identify novel common genetic variants associated with perioperative MI in patients undergoing coronary artery bypass grafting using genome-wide association methodology. SETTING: 107 secondary and tertiary cardiac surgery centres across the USA. PARTICIPANTS: We conducted a stage I genome-wide association study (GWAS) in 1433 ethnically diverse patients of both genders (112 cases/1321 controls) from the Genetics of Myocardial Adverse Outcomes and Graft Failure (GeneMAGIC) study, and a stage II analysis in an expanded population of 2055 patients (225 cases/1830 controls) combined from the GeneMAGIC and Duke Perioperative Genetics and Safety Outcomes (PEGASUS) studies. Patients undergoing primary non-emergent coronary bypass grafting were included. PRIMARY AND SECONDARY OUTCOME MEASURES: The primary outcome variable was perioperative MI, defined as creatine kinase MB isoenzyme (CK-MB) values ≥10× upper limit of normal during the first postoperative day, and not attributable to preoperative MI. Secondary outcomes included postoperative CK-MB as a quantitative trait, or a dichotomised phenotype based on extreme quartiles of the CK-MB distribution. RESULTS: Following quality control and adjustment for clinical covariates, we identified 521 single nucleotide polymorphisms in the stage I GWAS analysis. Among these, 8 common variants in 3 genes or intergenic regions met p<10(-5) in stage II. A secondary analysis using CK-MB as a quantitative trait (minimum p=1.26×10(-3) for rs609418), or a dichotomised phenotype based on extreme CK-MB values (minimum p=7.72×10(-6) for rs4834703) supported these findings. Pathway analysis revealed that genes harbouring top-scoring variants cluster in pathways of biological relevance to extracellular matrix remodelling, endoplasmic reticulum-to-Golgi transport and inflammation. CONCLUSIONS: Using a two-stage GWAS and pathway analysis, we identified and prioritised several potential susceptibility loci for perioperative MI.
Resumo:
There is great interindividual variability in HIV-1 viral setpoint after seroconversion, some of which is known to be due to genetic differences among infected individuals. Here, our focus is on determining, genome-wide, the contribution of variable gene expression to viral control, and to relate it to genomic DNA polymorphism. RNA was extracted from purified CD4+ T-cells from 137 HIV-1 seroconverters, 16 elite controllers, and 3 healthy blood donors. Expression levels of more than 48,000 mRNA transcripts were assessed by the Human-6 v3 Expression BeadChips (Illumina). Genome-wide SNP data was generated from genomic DNA using the HumanHap550 Genotyping BeadChip (Illumina). We observed two distinct profiles with 260 genes differentially expressed depending on HIV-1 viral load. There was significant upregulation of expression of interferon stimulated genes with increasing viral load, including genes of the intrinsic antiretroviral defense. Upon successful antiretroviral treatment, the transcriptome profile of previously viremic individuals reverted to a pattern comparable to that of elite controllers and of uninfected individuals. Genome-wide evaluation of cis-acting SNPs identified genetic variants modulating expression of 190 genes. Those were compared to the genes whose expression was found associated with viral load: expression of one interferon stimulated gene, OAS1, was found to be regulated by a SNP (rs3177979, p = 4.9E-12); however, we could not detect an independent association of the SNP with viral setpoint. Thus, this study represents an attempt to integrate genome-wide SNP signals with genome-wide expression profiles in the search for biological correlates of HIV-1 control. It underscores the paradox of the association between increasing levels of viral load and greater expression of antiviral defense pathways. It also shows that elite controllers do not have a fully distinctive mRNA expression pattern in CD4+ T cells. Overall, changes in global RNA expression reflect responses to viral replication rather than a mechanism that might explain viral control.
Resumo:
BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association to a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
Resumo:
Genome-wide association studies (GWAS) have now identified at least 2,000 common variants that appear associated with common diseases or related traits (http://www.genome.gov/gwastudies), hundreds of which have been convincingly replicated. It is generally thought that the associated markers reflect the effect of a nearby common (minor allele frequency >0.05) causal site, which is associated with the marker, leading to extensive resequencing efforts to find causal sites. We propose as an alternative explanation that variants much less common than the associated one may create "synthetic associations" by occurring, stochastically, more often in association with one of the alleles at the common site versus the other allele. Although synthetic associations are an obvious theoretical possibility, they have never been systematically explored as a possible explanation for GWAS findings. Here, we use simple computer simulations to show the conditions under which such synthetic associations will arise and how they may be recognized. We show that they are not only possible, but inevitable, and that under simple but reasonable genetic models, they are likely to account for or contribute to many of the recently identified signals reported in genome-wide association studies. We also illustrate the behavior of synthetic associations in real datasets by showing that rare causal mutations responsible for both hearing loss and sickle cell anemia create genome-wide significant synthetic associations, in the latter case extending over a 2.5-Mb interval encompassing scores of "blocks" of associated variants. In conclusion, uncommon or rare genetic variants can easily create synthetic associations that are credited to common variants, and this possibility requires careful consideration in the interpretation and follow up of GWAS signals.
Resumo:
To investigate the underlying mechanisms of T2D pathogenesis, we looked for diabetes susceptibility genes that increase the risk of type 2 diabetes (T2D) in a Han Chinese population. A two-stage genome-wide association (GWA) study was conducted, in which 995 patients and 894 controls were genotyped using the Illumina HumanHap550-Duo BeadChip for the first genome scan stage. This was further replicated in 1,803 patients and 1,473 controls in stage 2. We found two loci not previously associated with diabetes susceptibility in and around the genes protein tyrosine phosphatase receptor type D (PTPRD) (P = 8.54x10(-10); odds ratio [OR] = 1.57; 95% confidence interval [CI] = 1.36-1.82), and serine racemase (SRR) (P = 3.06x10(-9); OR = 1.28; 95% CI = 1.18-1.39). We also confirmed that variants in KCNQ1 were associated with T2D risk, with the strongest signal at rs2237895 (P = 9.65x10(-10); OR = 1.29, 95% CI = 1.19-1.40). By identifying two novel genetic susceptibility loci in a Han Chinese population and confirming the involvement of KCNQ1, which was previously reported to be associated with T2D in Japanese and European descent populations, our results may lead to a better understanding of differences in the molecular pathogenesis of T2D among various populations.
Resumo:
Lipoprotein-associated phospholipase A(2) (Lp-PLA(2)) is an emerging risk factor and therapeutic target for cardiovascular disease. The activity and mass of this enzyme are heritable traits, but major genetic determinants have not been explored in a systematic, genome-wide fashion. We carried out a genome-wide association study of Lp-PLA(2) activity and mass in 6,668 Caucasian subjects from the population-based Framingham Heart Study. Clinical data and genotypes from the Affymetrix 550K SNP array were obtained from the open-access Framingham SHARe project. Each polymorphism that passed quality control was tested for associations with Lp-PLA(2) activity and mass using linear mixed models implemented in the R statistical package, accounting for familial correlations, and controlling for age, sex, smoking, lipid-lowering-medication use, and cohort. For Lp-PLA(2) activity, polymorphisms at four independent loci reached genome-wide significance, including the APOE/APOC1 region on chromosome 19 (p = 6 x 10(-24)); CELSR2/PSRC1 on chromosome 1 (p = 3 x 10(-15)); SCARB1 on chromosome 12 (p = 1x10(-8)) and ZNF259/BUD13 in the APOA5/APOA1 gene region on chromosome 11 (p = 4 x 10(-8)). All of these remained significant after accounting for associations with LDL cholesterol, HDL cholesterol, or triglycerides. For Lp-PLA(2) mass, 12 SNPs achieved genome-wide significance, all clustering in a region on chromosome 6p12.3 near the PLA2G7 gene. Our analyses demonstrate that genetic polymorphisms may contribute to inter-individual variation in Lp-PLA(2) activity and mass.
Resumo:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.
Resumo:
Extensive departures from balanced gene dose in aneuploids are highly deleterious. However, we know very little about the relationship between gene copy number and expression in aneuploid cells. We determined copy number and transcript abundance (expression) genome-wide in Drosophila S2 cells by DNA-Seq and RNA-Seq. We found that S2 cells are aneuploid for >43 Mb of the genome, primarily in the range of one to five copies, and show a male genotype ( approximately two X chromosomes and four sets of autosomes, or 2X;4A). Both X chromosomes and autosomes showed expression dosage compensation. X chromosome expression was elevated in a fixed-fold manner regardless of actual gene dose. In engineering terms, the system "anticipates" the perturbation caused by X dose, rather than responding to an error caused by the perturbation. This feed-forward regulation resulted in precise dosage compensation only when X dose was half of the autosome dose. Insufficient compensation occurred at lower X chromosome dose and excessive expression occurred at higher doses. RNAi knockdown of the Male Specific Lethal complex abolished feed-forward regulation. Both autosome and X chromosome genes show Male Specific Lethal-independent compensation that fits a first order dose-response curve. Our data indicate that expression dosage compensation dampens the effect of altered DNA copy number genome-wide. For the X chromosome, compensation includes fixed and dose-dependent components.
Resumo:
In many bacteria, there is a genome-wide bias towards co-orientation of replication and transcription, with essential and/or highly-expressed genes further enriched co-directionally. We previously found that reversing this bias in the bacterium Bacillus subtilis slows replication elongation, and we proposed that this effect contributes to the evolutionary pressure selecting the transcription-replication co-orientation bias. This selection might have been based purely on selection for speedy replication; alternatively, the slowed replication might actually represent an average of individual replication-disruption events, each of which is counter-selected independently because genome integrity is selected. To differentiate these possibilities and define the precise forces driving this aspect of genome organization, we generated new strains with inversions either over approximately 1/4 of the chromosome or at ribosomal RNA (rRNA) operons. Applying mathematical analysis to genomic microarray snapshots, we found that replication rates vary dramatically within the inverted genome. Replication is moderately impeded throughout the inverted region, which results in a small but significant competitive disadvantage in minimal medium. Importantly, replication is strongly obstructed at inverted rRNA loci in rich medium. This obstruction results in disruption of DNA replication, activation of DNA damage responses, loss of genome integrity, and cell death. Our results strongly suggest that preservation of genome integrity drives the evolution of co-orientation of replication and transcription, a conserved feature of genome organization.
Resumo:
This is a crucial transition time for human genetics in general, and for HIV host genetics in particular. After years of equivocal results from candidate gene analyses, several genome-wide association studies have been published that looked at plasma viral load or disease progression. Results from other studies that used various large-scale approaches (siRNA screens, transcriptome or proteome analysis, comparative genomics) have also shed new light on retroviral pathogenesis. However, most of the inter-individual variability in response to HIV-1 infection remains to be explained: genome resequencing and systems biology approaches are now required to progress toward a better understanding of the complex interactions between HIV-1 and its human host.
Resumo:
Meta-analyses of genome-wide association studies (GWAS) have demonstrated that the same genetic variants can be associated with multiple diseases and other complex traits. We present software called CPAG (Cross-Phenotype Analysis of GWAS) to look for similarities between 700 traits, build trees with informative clusters, and highlight underlying pathways. Clusters are consistent with pre-defined groups and literature-based validation but also reveal novel connections. We report similarity between plasma palmitoleic acid and Crohn's disease and find that specific fatty acids exacerbate enterocolitis in zebrafish. CPAG will become increasingly powerful as more genetic variants are uncovered, leading to a deeper understanding of complex traits. CPAG is freely available at www.sourceforge.net/projects/CPAG/.
Resumo:
Genome-wide association studies (GWASs) have characterized 13 loci associated with melanoma, which only account for a small part of melanoma risk. To identify new genes with too small an effect to be detected individually but which collectively influence melanoma risk and/or show interactive effects, we used a two-step analysis strategy including pathway analysis of genome-wide SNP data, in a first step, and epistasis analysis within significant pathways, in a second step. Pathway analysis, using the gene-set enrichment analysis (GSEA) approach and the gene ontology (GO) database, was applied to the outcomes of MELARISK (3,976 subjects) and MDACC (2,827 subjects) GWASs. Cross-gene SNP-SNP interaction analysis within melanoma-associated GOs was performed using the INTERSNP software. Five GO categories were significantly enriched in genes associated with melanoma (false discovery rate ≤ 5% in both studies): response to light stimulus, regulation of mitotic cell cycle, induction of programmed cell death, cytokine activity and oxidative phosphorylation. Epistasis analysis, within each of the five significant GOs, showed significant evidence for interaction for one SNP pair at TERF1 and AFAP1L2 loci (pmeta-int = 2.0 × 10(-7) , which met both the pathway and overall multiple-testing corrected thresholds that are equal to 9.8 × 10(-7) and 2.0 × 10(-7) , respectively) and suggestive evidence for another pair involving correlated SNPs at the same loci (pmeta-int = 3.6 × 10(-6) ). This interaction has important biological relevance given the key role of TERF1 in telomere biology and the reported physical interaction between TERF1 and AFAP1L2 proteins. This finding brings a novel piece of evidence for the emerging role of telomere dysfunction into melanoma development.
Resumo:
Thermodynamic stability measurements on proteins and protein-ligand complexes can offer insights not only into the fundamental properties of protein folding reactions and protein functions, but also into the development of protein-directed therapeutic agents to combat disease. Conventional calorimetric or spectroscopic approaches for measuring protein stability typically require large amounts of purified protein. This requirement has precluded their use in proteomic applications. Stability of Proteins from Rates of Oxidation (SPROX) is a recently developed mass spectrometry-based approach for proteome-wide thermodynamic stability analysis. Since the proteomic coverage of SPROX is fundamentally limited by the detection of methionine-containing peptides, the use of tryptophan-containing peptides was investigated in this dissertation. A new SPROX-like protocol was developed that measured protein folding free energies using the denaturant dependence of the rate at which globally protected tryptophan and methionine residues are modified with dimethyl (2-hydroxyl-5-nitrobenzyl) sulfonium bromide and hydrogen peroxide, respectively. This so-called Hybrid protocol was applied to proteins in yeast and MCF-7 cell lysates and achieved a ~50% increase in proteomic coverage compared to probing only methionine-containing peptides. Subsequently, the Hybrid protocol was successfully utilized to identify and quantify both known and novel protein-ligand interactions in cell lysates. The ligands under study included the well-known Hsp90 inhibitor geldanamycin and the less well-understood omeprazole sulfide that inhibits liver-stage malaria. In addition to protein-small molecule interactions, protein-protein interactions involving Puf6 were investigated using the SPROX technique in comparative thermodynamic analyses performed on wild-type and Puf6-deletion yeast strains. A total of 39 proteins were detected as Puf6 targets and 36 of these targets were previously unknown to interact with Puf6. Finally, to facilitate the SPROX/Hybrid data analysis process and minimize human errors, a Bayesian algorithm was developed for transition midpoint assignment. In summary, the work in this dissertation expanded the scope of SPROX and evaluated the use of SPROX/Hybrid protocols for characterizing protein-ligand interactions in complex biological mixtures.
Resumo:
Mitotic genome instability can occur during the repair of double-strand breaks (DSBs) in DNA, which arise from endogenous and exogenous sources. Studying the mechanisms of DNA repair in the budding yeast, Saccharomyces cerevisiae has shown that Homologous Recombination (HR) is a vital repair mechanism for DSBs. HR can result in a crossover event, in which the broken molecule reciprocally exchanges information with a homologous repair template. The current model of double-strand break repair (DSBR) also allows for a tract of information to non-reciprocally transfer from the template molecule to the broken molecule. These “gene conversion” events can vary in size and can occur in conjunction with a crossover event or in isolation. The frequency and size of gene conversions in isolation and gene conversions associated with crossing over has been a source of debate due to the variation in systems used to detect gene conversions and the context in which the gene conversions are measured.
In Chapter 2, I use an unbiased system that measures the frequency and size of gene conversion events, as well as the association of gene conversion events with crossing over between homologs in diploid yeast. We show mitotic gene conversions occur at a rate of 1.3x10-6 per cell division, are either large (median 54.0kb) or small (median 6.4kb), and are associated with crossing over 43% of the time.
DSBs can arise from endogenous cellular processes such as replication and transcription. Two important RNA/DNA hybrids are involved in replication and transcription: R-loops, which form when an RNA transcript base pairs with the DNA template and displaces the non-template DNA strand, and ribonucleotides embedded into DNA (rNMPs), which arise when replicative polymerase errors insert ribonucleotide instead of deoxyribonucleotide triphosphates. RNaseH1 (encoded by RNH1) and RNaseH2 (whose catalytic subunit is encoded by RNH201) both recognize and degrade the RNA in within R-loops while RNaseH2 alone recognizes, nicks, and initiates removal of rNMPs embedded into DNA. Due to their redundant abilities to act on RNA:DNA hybrids, aberrant removal of rNMPs from DNA has been thought to lead to genome instability in an rnh201Δ background.
In Chapter 3, I characterize (1) non-selective genome-wide homologous recombination events and (2) crossing over on chromosome IV in mutants defective in RNaseH1, RNaseH2, or RNaseH1 and RNaseH2. Using a mutant DNA polymerase that incorporates 4-fold fewer rNMPs than wild type, I demonstrate that the primary recombinogenic lesion in the RNaseH2-defective genome is not rNMPs, but rather R-loops. This work suggests different in-vivo roles for RNaseH1 and RNaseH2 in resolving R-loops in yeast and is consistent with R-loops, not rNMPs, being the the likely source of pathology in Aicardi-Goutières Syndrome patients defective in RNaseH2.