31 results for Genome-wide Search
Abstract:
There is great interindividual variability in HIV-1 viral setpoint after seroconversion, some of which is known to be due to genetic differences among infected individuals. Here, we focus on determining, genome-wide, the contribution of variable gene expression to viral control and on relating it to genomic DNA polymorphism. RNA was extracted from purified CD4+ T cells from 137 HIV-1 seroconverters, 16 elite controllers, and 3 healthy blood donors. Expression levels of more than 48,000 mRNA transcripts were assessed using Human-6 v3 Expression BeadChips (Illumina). Genome-wide SNP data were generated from genomic DNA using the HumanHap550 Genotyping BeadChip (Illumina). We observed two distinct profiles, with 260 genes differentially expressed depending on HIV-1 viral load. Expression of interferon-stimulated genes, including genes of the intrinsic antiretroviral defense, was significantly upregulated with increasing viral load. Upon successful antiretroviral treatment, the transcriptome profile of previously viremic individuals reverted to a pattern comparable to that of elite controllers and uninfected individuals. Genome-wide evaluation of cis-acting SNPs identified genetic variants modulating the expression of 190 genes. These were compared with the genes whose expression was associated with viral load: expression of one interferon-stimulated gene, OAS1, was found to be regulated by a SNP (rs3177979, p = 4.9E-12); however, we could not detect an independent association of this SNP with viral setpoint. Thus, this study represents an attempt to integrate genome-wide SNP signals with genome-wide expression profiles in the search for biological correlates of HIV-1 control. It underscores the paradox of the association between increasing viral load and greater expression of antiviral defense pathways. It also shows that elite controllers do not have a fully distinctive mRNA expression pattern in CD4+ T cells.
Overall, changes in global RNA expression reflect responses to viral replication rather than a mechanism that might explain viral control.
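The cis-acting SNP analysis in the abstract above amounts to testing, for each SNP-gene pair, whether expression changes with the number of copies of an allele. Below is a minimal sketch of one such test, assuming an additive 0/1/2 dosage coding and ordinary least squares with no covariates. The function name `eqtl_test` and the toy data are hypothetical. The study's actual pipeline (covariates, cis-window definition, multiple-testing correction) is not described in the abstract and is not reproduced here.

```python
import math


def eqtl_test(dosage, expression):
    """Regress expression on allele dosage (0, 1, 2); return (slope, t statistic).

    Hypothetical helper: additive model, no covariates, ordinary least squares.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need one expression value per genotype and at least 3 samples")
    mean_g = sum(dosage) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosage, expression))
    slope = sxy / sxx                      # change in expression per extra allele copy
    intercept = mean_e - slope * mean_g
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)    # standard error of the slope, n - 2 df
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t_stat


# Toy data: expression rises by about one unit per copy of the allele
slope, t_stat = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
print(f"slope={slope:.2f}, t={t_stat:.1f}")
```

A p-value would then come from a t distribution with n - 2 degrees of freedom. A genome-wide analysis must also account for the very large number of SNP-gene pairs tested.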
Abstract:
Genome-wide association studies (GWAS) have now identified at least 2,000 common variants that appear to be associated with common diseases or related traits (http://www.genome.gov/gwastudies), hundreds of which have been convincingly replicated. It is generally thought that each associated marker reflects the effect of a nearby common (minor allele frequency >0.05) causal site that is associated with the marker, and this view has motivated extensive resequencing efforts to find causal sites. We propose, as an alternative explanation, that variants much less common than the associated one may create "synthetic associations" by occurring, stochastically, more often with one allele at the common site than with the other. Although synthetic associations are an obvious theoretical possibility, they have never been systematically explored as a possible explanation for GWAS findings. Here, we use simple computer simulations to show the conditions under which such synthetic associations will arise and how they may be recognized. We show that they are not only possible but inevitable, and that under simple but reasonable genetic models, they are likely to account for or contribute to many of the recently identified signals reported in genome-wide association studies. We also illustrate the behavior of synthetic associations in real datasets by showing that rare causal mutations responsible for both hearing loss and sickle cell anemia create genome-wide significant synthetic associations, in the latter case extending over a 2.5-Mb interval encompassing scores of "blocks" of associated variants. In conclusion, uncommon or rare genetic variants can easily create synthetic associations that are credited to common variants, and this possibility requires careful consideration in the interpretation and follow-up of GWAS signals.
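The synthetic-association mechanism in the abstract above can be illustrated with a toy simulation. This is not the authors' simulation. All parameter values and the function name are hypothetical. To mimic a rare causal variant that happened to arise on a haplotype carrying one allele of a common marker, the sketch places the rare variant only on marker-bearing haplotypes. Disease risk depends solely on carrying the rare variant, yet the common marker shows an allelic association.

```python
import random


def simulate_synthetic_association(n_people=100_000, marker_freq=0.3,
                                   rare_freq_on_marker=0.02,
                                   baseline_risk=0.01, carrier_risk=0.2,
                                   seed=1):
    """Return (case_marker_freq, control_marker_freq, allelic_chi2).

    Hypothetical toy model: the common marker has no effect on risk; only the
    rare causal variant does, and it sits exclusively on marker haplotypes.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # is_case -> [marker alleles, other alleles]
    for _ in range(n_people):
        carrier = False
        marker_alleles = 0
        for _hap in range(2):
            has_marker = rng.random() < marker_freq
            # Rare causal variant occurs only on marker-bearing haplotypes
            if has_marker and rng.random() < rare_freq_on_marker:
                carrier = True
            marker_alleles += has_marker
        risk = carrier_risk if carrier else baseline_risk
        is_case = rng.random() < risk
        counts[is_case][0] += marker_alleles
        counts[is_case][1] += 2 - marker_alleles
    a, b = counts[True]    # case marker / non-marker alleles
    c, d = counts[False]   # control marker / non-marker alleles
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return a / (a + b), c / (c + d), chi2


case_f, ctrl_f, chi2 = simulate_synthetic_association()
print(f"marker freq: cases {case_f:.3f}, controls {ctrl_f:.3f}, chi2 = {chi2:.1f}")
```

In this sketch, the association at the common marker disappears if `rare_freq_on_marker` is set to 0. That outcome is the hallmark of a signal that is entirely synthetic.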
Abstract:
To investigate the mechanisms underlying type 2 diabetes (T2D) pathogenesis, we searched for susceptibility genes that increase the risk of T2D in a Han Chinese population. We conducted a two-stage genome-wide association (GWA) study: in the first (genome scan) stage, 995 patients and 894 controls were genotyped using the Illumina HumanHap550-Duo BeadChip, and in stage 2, findings were replicated in 1,803 patients and 1,473 controls. We found two loci not previously associated with diabetes susceptibility, in and around the genes protein tyrosine phosphatase receptor type D (PTPRD) (P = 8.54x10(-10); odds ratio [OR] = 1.57; 95% confidence interval [CI] = 1.36-1.82) and serine racemase (SRR) (P = 3.06x10(-9); OR = 1.28; 95% CI = 1.18-1.39). We also confirmed that variants in KCNQ1 were associated with T2D risk, with the strongest signal at rs2237895 (P = 9.65x10(-10); OR = 1.29; 95% CI = 1.19-1.40). These results identify two novel genetic susceptibility loci in a Han Chinese population and confirm the involvement of KCNQ1, which was previously reported to be associated with T2D in Japanese and European-descent populations; they may therefore lead to a better understanding of differences in the molecular pathogenesis of T2D among populations.
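The odds ratios and 95% confidence intervals above are the kind of summary that an allelic 2x2 analysis produces. Below is a minimal sketch of the standard Wald interval on the log scale, using hypothetical allele counts. The study may instead have used logistic regression with covariates, so this is not presented as its method.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, level=0.95):
    """Allelic odds ratio with a Wald confidence interval on the log scale."""
    if min(case_alt, case_ref, ctrl_alt, ctrl_ref) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    or_ = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts: risk/other alleles in cases, then in controls
print(odds_ratio_ci(60, 40, 40, 60))
```

Because the interval is symmetric on the log scale, the OR is the geometric mean of its bounds. This gives a quick consistency check on reported values. For example, for PTPRD, sqrt(1.36 x 1.82) is approximately 1.57.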
Abstract:
Lipoprotein-associated phospholipase A(2) (Lp-PLA(2)) is an emerging risk factor and therapeutic target for cardiovascular disease. The activity and mass of this enzyme are heritable traits, but major genetic determinants have not been explored in a systematic, genome-wide fashion. We carried out a genome-wide association study of Lp-PLA(2) activity and mass in 6,668 Caucasian subjects from the population-based Framingham Heart Study. Clinical data and genotypes from the Affymetrix 550K SNP array were obtained from the open-access Framingham SHARe project. Each polymorphism that passed quality control was tested for associations with Lp-PLA(2) activity and mass using linear mixed models implemented in the R statistical package, accounting for familial correlations and controlling for age, sex, smoking, lipid-lowering-medication use, and cohort. For Lp-PLA(2) activity, polymorphisms at four independent loci reached genome-wide significance, including the APOE/APOC1 region on chromosome 19 (p = 6 x 10(-24)); CELSR2/PSRC1 on chromosome 1 (p = 3 x 10(-15)); SCARB1 on chromosome 12 (p = 1 x 10(-8)); and ZNF259/BUD13 in the APOA5/APOA1 gene region on chromosome 11 (p = 4 x 10(-8)). All of these remained significant after accounting for associations with LDL cholesterol, HDL cholesterol, or triglycerides. For Lp-PLA(2) mass, 12 SNPs achieved genome-wide significance, all clustering in a region on chromosome 6p12.3 near the PLA2G7 gene. Our analyses demonstrate that genetic polymorphisms may contribute to inter-individual variation in Lp-PLA(2) activity and mass.
Abstract:
Alzheimer's disease is a complex and progressive neurodegenerative disease leading to loss of memory, cognitive impairment, and ultimately death. To date, six large-scale genome-wide association studies have been conducted to identify SNPs that influence disease predisposition. These studies have confirmed the well-known APOE epsilon4 risk allele, identified a novel variant that influences disease risk within the APOE epsilon4 population, found a SNP that modifies the age of disease onset, and reported the first sex-linked susceptibility variant. Here we report a genome-wide scan of Alzheimer's disease in a set of 331 cases and 368 controls, extending analyses for the first time to include assessments of copy number variation. In this analysis, no new SNPs showed genome-wide significance. We also screened for effects of copy number variation; although none reached significance, a duplication in CHRNA7 appears interesting enough to warrant further investigation.
Abstract:
OBJECTIVES: Identifying patient subpopulations that are susceptible to developing myocardial infarction (MI) or, conversely, that display intrinsic cardioprotective phenotypes or are highly responsive to protective interventions remains a high-priority knowledge gap. We sought to identify novel common genetic variants associated with perioperative MI in patients undergoing coronary artery bypass grafting using genome-wide association methodology. SETTING: 107 secondary and tertiary cardiac surgery centres across the USA. PARTICIPANTS: We conducted a stage I genome-wide association study (GWAS) in 1433 ethnically diverse patients of both genders (112 cases/1321 controls) from the Genetics of Myocardial Adverse Outcomes and Graft Failure (GeneMAGIC) study, and a stage II analysis in an expanded population of 2055 patients (225 cases/1830 controls) combined from the GeneMAGIC and Duke Perioperative Genetics and Safety Outcomes (PEGASUS) studies. Patients undergoing primary non-emergent coronary bypass grafting were included. PRIMARY AND SECONDARY OUTCOME MEASURES: The primary outcome variable was perioperative MI, defined as creatine kinase MB isoenzyme (CK-MB) values ≥10× upper limit of normal during the first postoperative day and not attributable to preoperative MI. Secondary outcomes included postoperative CK-MB as a quantitative trait or as a dichotomised phenotype based on extreme quartiles of the CK-MB distribution. RESULTS: Following quality control and adjustment for clinical covariates, we identified 521 single nucleotide polymorphisms in the stage I GWAS analysis. Among these, 8 common variants in 3 genes or intergenic regions met p<10(-5) in stage II. A secondary analysis using CK-MB as a quantitative trait (minimum p=1.26×10(-3) for rs609418), or as a dichotomised phenotype based on extreme CK-MB values (minimum p=7.72×10(-6) for rs4834703), supported these findings.
Pathway analysis revealed that genes harbouring top-scoring variants cluster in pathways of biological relevance to extracellular matrix remodelling, endoplasmic reticulum-to-Golgi transport and inflammation. CONCLUSIONS: Using a two-stage GWAS and pathway analysis, we identified and prioritised several potential susceptibility loci for perioperative MI.
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been developed to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often provides only partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference of the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has mainly been used to identify DNase hypersensitive sites, which tend to be open-chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach uses several effective strategies to extract nucleosome positioning signals from noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
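The Bayes-factor idea in the abstract above can be sketched for a single 147-bp window of DNase I cut counts. This is not the dissertation's scoring method. The profile shape (a quadratic dip toward the dyad multiplied by a roughly 10-bp helical oscillation), its parameters, and the function names are all hypothetical. Counts are modeled as Poisson under both hypotheses. When each hypothesis has fixed parameters, the Bayes factor reduces to a likelihood ratio, and the log k! terms cancel.

```python
import math

NUC_LEN = 147  # nucleosome core footprint in bp


def nucleosome_profile(min_frac=0.3, amp=0.3, period=10.4):
    """Hypothetical relative cut rate across a nucleosome: quadratic dip, ~10-bp oscillation."""
    center = (NUC_LEN - 1) / 2
    profile = []
    for i in range(NUC_LEN):
        quad = min_frac + (1 - min_frac) * ((i - center) / center) ** 2
        osc = 1 + amp * math.cos(2 * math.pi * (i - center) / period)
        profile.append(quad * osc)  # always > 0 for these parameter ranges
    return profile


def log_bayes_factor(cuts, background_rate, profile):
    """log BF of 'nucleosome here' vs 'background only', both as point hypotheses (Poisson)."""
    if len(cuts) != len(profile):
        raise ValueError("window length must match profile")
    log_bf = 0.0
    for k, rel in zip(cuts, profile):
        lam1 = background_rate * rel   # expected cuts if a nucleosome occupies the window
        lam0 = background_rate         # expected cuts under uniform background
        log_bf += k * math.log(lam1 / lam0) - (lam1 - lam0)
    return log_bf


profile = nucleosome_profile()
window = [round(5 * r) for r in profile]  # idealized counts following the nucleosome pattern
print(log_bayes_factor(window, 5.0, profile))
```

Sliding such a score along the genome and keeping local maxima is one way to turn it into nucleosome calls. The published method's handling of noise and parameter uncertainty is not reproduced here.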
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
In many bacteria, there is a genome-wide bias towards co-orientation of replication and transcription, with essential and/or highly-expressed genes further enriched co-directionally. We previously found that reversing this bias in the bacterium Bacillus subtilis slows replication elongation, and we proposed that this effect contributes to the evolutionary pressure selecting the transcription-replication co-orientation bias. This selection might have been based purely on selection for speedy replication; alternatively, the slowed replication might actually represent an average of individual replication-disruption events, each of which is counter-selected independently because genome integrity is selected. To differentiate these possibilities and define the precise forces driving this aspect of genome organization, we generated new strains with inversions either over approximately 1/4 of the chromosome or at ribosomal RNA (rRNA) operons. Applying mathematical analysis to genomic microarray snapshots, we found that replication rates vary dramatically within the inverted genome. Replication is moderately impeded throughout the inverted region, which results in a small but significant competitive disadvantage in minimal medium. Importantly, replication is strongly obstructed at inverted rRNA loci in rich medium. This obstruction results in disruption of DNA replication, activation of DNA damage responses, loss of genome integrity, and cell death. Our results strongly suggest that preservation of genome integrity drives the evolution of co-orientation of replication and transcription, a conserved feature of genome organization.
Abstract:
cERMIT is a computationally efficient motif discovery tool based on analyzing genome-wide quantitative regulatory evidence. Instead of pre-selecting promising candidate sequences, it utilizes information across all sequence regions to search for high-scoring motifs. We apply cERMIT on a range of direct binding and overexpression datasets; it substantially outperforms state-of-the-art approaches on curated ChIP-chip datasets, and easily scales to current mammalian ChIP-seq experiments with data on thousands of non-coding regions.
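The abstract above describes scoring motifs against quantitative evidence from all regions rather than against a pre-selected subset. The toy sketch below illustrates that general idea only and is not cERMIT's algorithm. Each k-mer is scored by how far the mean evidence of the regions containing it exceeds the genome-wide mean, in a z-like form. The scoring function, the names, and the data are assumptions.

```python
from statistics import mean, pstdev


def kmer_scores(sequences, evidence, k=6):
    """Hypothetical score: standardized excess evidence of regions containing each k-mer."""
    if len(sequences) != len(evidence):
        raise ValueError("one evidence value per sequence")
    overall = mean(evidence)               # uses every region, none pre-selected
    spread = pstdev(evidence) or 1.0
    hits = {}
    for seq, ev in zip(sequences, evidence):
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}  # count each k-mer once per region
        for kmer in seen:
            hits.setdefault(kmer, []).append(ev)
    # Mean difference scaled by sqrt(count), so recurrent k-mers outrank one-off ones
    return {kmer: (mean(evs) - overall) / spread * len(evs) ** 0.5
            for kmer, evs in hits.items()}


seqs = ["TTCACGTGTT", "AACACGTGAA", "GGGGGGGGGG", "TTTTTTTTTT"]
signal = [5.0, 4.0, 0.1, 0.2]  # e.g. hypothetical ChIP enrichment per region
scores = kmer_scores(seqs, signal)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3])
```

A real tool would also need to handle degenerate motifs, reverse complements, and background composition. These are omitted here.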
Abstract:
The complete and faithful duplication of the genome is essential to ensure normal cell division and organismal development. Eukaryotic DNA replication is initiated at multiple sites, termed origins of replication, that are activated at different times throughout S phase. The replication timing program is regulated by the S-phase checkpoint, which signals replicative stress and promotes its repair. Eukaryotic DNA is packaged with histones into chromatin; thus, DNA-templated processes, including replication, are modulated by the local chromatin environment, such as post-translational modifications (PTMs) of histones.
One such epigenetic mark, methylation of lysine 20 on histone H4 (H4K20), has been linked to chromatin compaction, transcription, DNA repair, and DNA replication. H4K20 can be mono-, di-, or trimethylated. Monomethylation of H4K20 (H4K20me1) is mediated by the cell cycle-regulated histone methyltransferase PR-Set7, and subsequent di- and trimethylation is catalyzed by Suv4-20. Prior studies have shown that PR-Set7 depletion in mammalian cells results in defective S-phase progression and the accumulation of DNA damage, which may be partially attributed to defects in origin selection and activation. Meanwhile, overexpression of mammalian PR-Set7 recruits components of the pre-replication complex (pre-RC) onto chromatin and licenses replication origins for re-replication. However, these studies were limited to a handful of mammalian origins, and it remains unclear how PR-Set7 affects the replication program on a genomic scale. Finally, because the methylation substrates of PR-Set7 include both histone (H4K20) and non-histone targets, it is necessary to directly test the role of H4K20 methylation in PR-Set7-regulated phenotypes.
I employed genetic, cytological, and genomic approaches to better understand the role of H4K20 methylation in regulating DNA replication and genome stability in Drosophila melanogaster cells. Depletion of Drosophila PR-Set7 by RNAi in cultured Kc167 cells led to an ATR-dependent cell cycle arrest with nearly 4N DNA content and the accumulation of DNA damage, indicating a defect in completing S phase. The cells arrested in the second S phase following PR-Set7 downregulation, suggesting an epigenetic effect coupled to the dilution of the histone modification over multiple cell cycles. To directly test the role of H4K20 methylation in maintaining genome integrity, I collaborated with the Duronio Lab and observed spontaneous DNA damage in the wing imaginal discs of third-instar mutant larvae carrying an alanine substitution at H4K20 (H4K20A), which cannot be methylated. This confirmed that H4K20 is a bona fide target of PR-Set7 in maintaining genome integrity.
One possible source of DNA damage upon loss of PR-Set7 is reduced origin activity. I used BrdU-seq to profile the genome-wide origin activation pattern. However, I found that deregulating H4K20 methylation states by manipulating the H4K20 methyltransferases PR-Set7 and Suv4-20 had no impact on origin activation throughout the genome. I then mapped the genomic distribution of DNA damage upon PR-Set7 depletion. Surprisingly, ChIP-seq of the DNA damage marker γ-H2A.v localized the damage to late-replicating euchromatic regions of the Drosophila genome, and the γ-H2A.v signal was uniformly distributed across the entire late-replication domain, implying stochastic replication fork collapse within late-replicating regions. Together, these data suggest that PR-Set7-mediated monomethylation of H4K20 is critical for maintaining the genomic integrity of late-replicating domains, presumably by stabilizing late replication forks.
In addition to investigating the function of H4K20me, I used immunofluorescence to characterize the cell cycle-regulated chromatin loading of the Mcm2-7 complex, the DNA helicase that licenses replication origins, using H4K20me1 levels as a proxy for cell cycle stage. Together with chromatin spindown data (Powell et al. 2015), we showed continuous loading of Mcm2-7 during G1 and its progressive removal from chromatin through S phase.
Abstract:
We performed a whole-genome association study of human immunodeficiency virus type 1 (HIV-1) set point among a cohort of African Americans (n = 515), and an intronic single-nucleotide polymorphism (SNP) in the HLA-B gene showed one of the strongest associations. We use a subset of patients to demonstrate that this SNP reflects the effect of the HLA-B*5703 allele, which shows a genome-wide statistically significant association with viral load set point (P = 5.6 x 10(-10)). These analyses therefore confirm a member of the HLA-B*57 group of alleles as the most important common variant that influences viral load variation in African Americans, which is consistent with what has been observed for individuals of European ancestry, among whom the most important common variant is HLA-B*5701.
Abstract:
The autosomal recessive kidney disease nephronophthisis (NPHP) constitutes the most frequent genetic cause of terminal renal failure in the first 3 decades of life. Ten causative genes (NPHP1-NPHP9 and NPHP11), whose products localize to the primary cilia-centrosome complex, support the unifying concept that cystic kidney diseases are "ciliopathies". Using genome-wide homozygosity mapping, we report here what we believe to be a new locus (NPHP-like 1 [NPHPL1]) for an NPHP-like nephropathy. In 2 families with an NPHP-like phenotype, we detected homozygous frameshift and splice-site mutations, respectively, in the X-prolyl aminopeptidase 3 (XPNPEP3) gene. In contrast to all known NPHP proteins, XPNPEP3 localizes to mitochondria of renal cells. However, in vivo analyses also revealed a likely cilia-related function; suppression of zebrafish xpnpep3 phenocopied the developmental phenotypes of ciliopathy morphants, and this effect was rescued by human XPNPEP3 that was devoid of a mitochondrial localization signal. Consistent with a role for XPNPEP3 in ciliary function, several ciliary cystogenic proteins were found to be XPNPEP3 substrates, for which resistance to N-terminal proline cleavage resulted in attenuated protein function in vivo in zebrafish. Our data highlight an emerging link between mitochondria and ciliary dysfunction, and suggest that further understanding the enzymatic activity and substrates of XPNPEP3 will illuminate novel cystogenic pathways.
Abstract:
Extensive departures from balanced gene dose in aneuploids are highly deleterious. However, we know very little about the relationship between gene copy number and expression in aneuploid cells. We determined copy number and transcript abundance (expression) genome-wide in Drosophila S2 cells by DNA-Seq and RNA-Seq. We found that S2 cells are aneuploid for >43 Mb of the genome, primarily in the range of one to five copies, and show a male genotype (approximately two X chromosomes and four sets of autosomes, or 2X;4A). Both X chromosomes and autosomes showed expression dosage compensation. X chromosome expression was elevated in a fixed-fold manner regardless of actual gene dose. In engineering terms, the system "anticipates" the perturbation caused by X dose, rather than responding to an error caused by the perturbation. This feed-forward regulation resulted in precise dosage compensation only when X dose was half of the autosome dose. Insufficient compensation occurred at lower X chromosome dose and excessive expression occurred at higher doses. RNAi knockdown of the Male Specific Lethal complex abolished feed-forward regulation. Both autosome and X chromosome genes show Male Specific Lethal-independent compensation that fits a first order dose-response curve. Our data indicate that expression dosage compensation dampens the effect of altered DNA copy number genome-wide. For the X chromosome, compensation includes fixed and dose-dependent components.
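The feed-forward argument in the abstract above reduces to simple arithmetic. Assume, as a simplification, that per-gene expression scales linearly with copy number and that X-linked genes receive a fixed fold boost. Then the X:autosome expression ratio is the fold multiplied by the X:A dose ratio. Balance (ratio 1) at an X dose of half the autosome dose implies a fold of 2. Any other dose then over- or under-compensates. The function name and the linearity assumption are hypothetical, and the sketch ignores the MSL-independent, dose-dependent component the authors also report.

```python
def x_to_autosome_expression(x_copies, autosome_sets, x_fold=2.0):
    """X:A expression ratio under a fixed-fold (feed-forward) X boost.

    Hypothetical simplification: expression is proportional to copy number
    before compensation, and X genes are multiplied by x_fold regardless of dose.
    """
    return x_fold * x_copies / autosome_sets


for genotype, (x, a) in {"1X;2A": (1, 2), "2X;4A (S2)": (2, 4),
                         "1X;4A": (1, 4), "3X;4A": (3, 4)}.items():
    print(f"{genotype}: X:A expression = {x_to_autosome_expression(x, a):.2f}")
```

Under this model, both the normal male (1X;2A) and the S2 karyotype (2X;4A) give a ratio of 1.0. Lower X dose gives insufficient compensation, and higher X dose gives excess expression, matching the qualitative pattern described in the abstract.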
Abstract:
To extend the understanding of host genetic determinants of HIV-1 control, we performed a genome-wide association study in a cohort of 2,554 infected Caucasian subjects. The study was powered to detect common genetic variants explaining down to 1.3% of the variability in viral load at set point. We provide overwhelming confirmation of three associations previously reported in a genome-wide study and show further independent effects of both common and rare variants in the Major Histocompatibility Complex region (MHC). We also examined the polymorphisms reported in previous candidate gene studies and fail to support a role for any variant outside of the MHC or the chemokine receptor cluster on chromosome 3. In addition, we evaluated functional variants, copy-number polymorphisms, epistatic interactions, and biological pathways. This study thus represents a comprehensive assessment of common human genetic variation in HIV-1 control in Caucasians.