20 results for SNPs
at Duke University
Abstract:
The population structure of an organism reflects its evolutionary history and influences its evolutionary trajectory. It constrains the combination of genetic diversity and reveals patterns of past gene flow. Understanding it is a prerequisite for detecting genomic regions under selection, predicting the effect of population disturbances, or modeling gene flow. This paper examines the detailed global population structure of Arabidopsis thaliana. Using a set of 5,707 plants collected from around the globe and genotyped at 149 SNPs, we show that while A. thaliana as a species self-fertilizes 97% of the time, there is considerable variation among local groups. This level of outcrossing greatly limits observed heterozygosity but is sufficient to generate considerable local haplotypic diversity. We also find that in its native Eurasian range A. thaliana exhibits continuous isolation by distance at every geographic scale without natural breaks corresponding to classical notions of populations. By contrast, in North America, where it exists as an exotic species, A. thaliana exhibits little or no population structure at a continental scale but local isolation by distance that extends hundreds of km. This suggests a pattern for the development of isolation by distance that can establish itself shortly after an organism fills a new habitat range. It also raises questions about the general applicability of many standard population genetics models. Any model based on discrete clusters of interchangeable individuals will be an uneasy fit to organisms like A. thaliana which exhibit continuous isolation by distance on many scales.
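To make the link between a 97% selfing rate and low observed heterozygosity concrete, here is a minimal sketch. It assumes the textbook mixed-mating equilibrium, in which the inbreeding coefficient is F = s / (2 - s) and heterozygosity falls to (1 - F) times its Hardy-Weinberg value; the abstract does not say that the authors used this model. The function names and the allele frequency of 0.5 are illustrative assumptions.

```python
def equilibrium_inbreeding(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient F = s / (2 - s) under mixed mating."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def expected_heterozygosity(allele_freq: float, selfing_rate: float) -> float:
    """Expected heterozygote frequency at a biallelic SNP: 2pq(1 - F)."""
    f = equilibrium_inbreeding(selfing_rate)
    return 2.0 * allele_freq * (1.0 - allele_freq) * (1.0 - f)


# Hypothetical illustration: a SNP at allele frequency 0.5 (an assumed value)
# with the species-wide selfing rate of 0.97 quoted in the abstract.
f_selfing = equilibrium_inbreeding(0.97)
het_selfing = expected_heterozygosity(0.5, 0.97)
het_outcrossing = expected_heterozygosity(0.5, 0.0)
print(f"F = {f_selfing:.3f}")
print(f"heterozygosity with selfing = {het_selfing:.3f}, "
      f"with random mating = {het_outcrossing:.3f}")
```

Under these assumptions, even 3% outcrossing leaves a small but nonzero heterozygote frequency, which is consistent with the abstract's remark that outcrossing limits heterozygosity while still generating local haplotypic diversity.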
Abstract:
Lipoprotein-associated phospholipase A2 (Lp-PLA2) is an emerging risk factor and therapeutic target for cardiovascular disease. The activity and mass of this enzyme are heritable traits, but major genetic determinants have not been explored in a systematic, genome-wide fashion. We carried out a genome-wide association study of Lp-PLA2 activity and mass in 6,668 Caucasian subjects from the population-based Framingham Heart Study. Clinical data and genotypes from the Affymetrix 550K SNP array were obtained from the open-access Framingham SHARe project. Each polymorphism that passed quality control was tested for associations with Lp-PLA2 activity and mass using linear mixed models implemented in the R statistical package, accounting for familial correlations, and controlling for age, sex, smoking, lipid-lowering-medication use, and cohort. For Lp-PLA2 activity, polymorphisms at four independent loci reached genome-wide significance, including the APOE/APOC1 region on chromosome 19 (p = 6 × 10^-24); CELSR2/PSRC1 on chromosome 1 (p = 3 × 10^-15); SCARB1 on chromosome 12 (p = 1 × 10^-8) and ZNF259/BUD13 in the APOA5/APOA1 gene region on chromosome 11 (p = 4 × 10^-8). All of these remained significant after accounting for associations with LDL cholesterol, HDL cholesterol, or triglycerides. For Lp-PLA2 mass, 12 SNPs achieved genome-wide significance, all clustering in a region on chromosome 6p12.3 near the PLA2G7 gene. Our analyses demonstrate that genetic polymorphisms may contribute to inter-individual variation in Lp-PLA2 activity and mass.
Abstract:
We present a novel strategy that uses high-throughput methods of isolating and mapping C. elegans mutants susceptible to pathogen infection. We show that C. elegans mutants that exhibit an enhanced pathogen accumulation (epa) phenotype can be rapidly identified and isolated using a sorting system that allows automation of the analysis, sorting, and dispensing of C. elegans by measuring fluorescent bacteria inside the animals. Furthermore, we validate the use of Amplifluor as a new single nucleotide polymorphism (SNP) mapping technique in C. elegans. We show that a set of 9 SNPs allows the linkage of C. elegans mutants to a 5-8 megabase sub-chromosomal region.
Abstract:
BACKGROUND: We previously identified a panel of genes associated with outcome of ovarian cancer. The purpose of the current study was to assess whether variants in these genes correlated with ovarian cancer risk. METHODS AND FINDINGS: Women with and without invasive ovarian cancer (749 cases, 1,041 controls) were genotyped at 136 single nucleotide polymorphisms (SNPs) within 13 candidate genes. Risk was estimated for each SNP and for overall variation within each gene. At the gene-level, variation within MSL1 (male-specific lethal-1 homolog) was associated with risk of serous cancer (p = 0.03); haplotypes within PRPF31 (PRP31 pre-mRNA processing factor 31 homolog) were associated with risk of invasive disease (p = 0.03). MSL1 rs7211770 was associated with decreased risk of serous disease (OR 0.81, 95% CI 0.66-0.98; p = 0.03). SNPs in MFSD7, BTN3A3, ZNF200, PTPRS, and CCND1A were inversely associated with risk (p<0.05), and there was increased risk at HEXIM1 rs1053578 (p = 0.04, OR 1.40, 95% CI 1.02-1.91). CONCLUSIONS: Tumor studies can reveal novel genes worthy of follow-up for cancer susceptibility. Here, we found that inherited markers in the gene encoding MSL1, part of a complex that modifies the histone H4, may decrease risk of invasive serous ovarian cancer.
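As a reading aid for the odds ratios and intervals quoted above, the sketch below back-calculates an approximate standard error, z statistic, and two-sided p-value from an OR and its 95% CI. It assumes a Wald-type interval that is symmetric on the log-odds scale; the abstract does not state how the intervals were computed, and the function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> tuple[float, float, float]:
    """Return (standard error of log-OR, z statistic, two-sided p-value).

    Assumes the interval is exp(log(OR) +/- z_crit * SE), i.e. a Wald interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value


# Values quoted in the abstract for MSL1 rs7211770 and HEXIM1 rs1053578.
for label, (or_, lo, hi) in {
    "MSL1 rs7211770": (0.81, 0.66, 0.98),
    "HEXIM1 rs1053578": (1.40, 1.02, 1.91),
}.items():
    se, z, p = wald_from_ci(or_, lo, hi)
    print(f"{label}: SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Because the published OR and CI are rounded, the recovered p-values are only approximate and should not be expected to match the reported ones exactly.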
Abstract:
There is great interindividual variability in HIV-1 viral setpoint after seroconversion, some of which is known to be due to genetic differences among infected individuals. Here, our focus is on determining, genome-wide, the contribution of variable gene expression to viral control, and to relate it to genomic DNA polymorphism. RNA was extracted from purified CD4+ T-cells from 137 HIV-1 seroconverters, 16 elite controllers, and 3 healthy blood donors. Expression levels of more than 48,000 mRNA transcripts were assessed by the Human-6 v3 Expression BeadChips (Illumina). Genome-wide SNP data was generated from genomic DNA using the HumanHap550 Genotyping BeadChip (Illumina). We observed two distinct profiles with 260 genes differentially expressed depending on HIV-1 viral load. There was significant upregulation of expression of interferon stimulated genes with increasing viral load, including genes of the intrinsic antiretroviral defense. Upon successful antiretroviral treatment, the transcriptome profile of previously viremic individuals reverted to a pattern comparable to that of elite controllers and of uninfected individuals. Genome-wide evaluation of cis-acting SNPs identified genetic variants modulating expression of 190 genes. Those were compared to the genes whose expression was found associated with viral load: expression of one interferon stimulated gene, OAS1, was found to be regulated by a SNP (rs3177979, p = 4.9E-12); however, we could not detect an independent association of the SNP with viral setpoint. Thus, this study represents an attempt to integrate genome-wide SNP signals with genome-wide expression profiles in the search for biological correlates of HIV-1 control. It underscores the paradox of the association between increasing levels of viral load and greater expression of antiviral defense pathways. It also shows that elite controllers do not have a fully distinctive mRNA expression pattern in CD4+ T cells. 
Overall, changes in global RNA expression reflect responses to viral replication rather than a mechanism that might explain viral control.
Abstract:
Alzheimer's disease is a complex and progressive neurodegenerative disease leading to loss of memory, cognitive impairment, and ultimately death. To date, six large-scale genome-wide association studies have been conducted to identify SNPs that influence disease predisposition. These studies have confirmed the well-known APOE epsilon4 risk allele, identified a novel variant that influences disease risk within the APOE epsilon4 population, found a SNP that modifies the age of disease onset, and reported the first sex-linked susceptibility variant. Here we report a genome-wide scan of Alzheimer's disease in a set of 331 cases and 368 controls, extending analyses for the first time to include assessments of copy number variation. In this analysis, no new SNPs show genome-wide significance. We also screened for effects of copy number variation, and while nothing was significant, a duplication in CHRNA7 appears interesting enough to warrant further investigation.
Abstract:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
Abstract:
BACKGROUND: We have previously shown that a functional polymorphism of the UGT2B15 gene (rs1902023) was associated with increased risk of prostate cancer (PC). Novel functional polymorphisms of the UGT2B17 and UGT2B15 genes have been recently characterized by in vitro assays but have not been evaluated in epidemiologic studies. METHODS: Fifteen functional SNPs of the UGT2B17 and UGT2B15 genes, including cis-acting UGT2B gene SNPs, were genotyped in African American and Caucasian men (233 PC cases and 342 controls). Regression models were used to analyze the association between SNPs and PC risk. RESULTS: After adjusting for race, age and BMI, we found that six UGT2B15 SNPs (rs4148269, rs3100, rs9994887, rs13112099, rs7686914 and rs7696472) were associated with an increased risk of PC in log-additive models (p < 0.05). A SNP cis-acting on UGT2B17 and UGT2B15 expression (rs17147338) was also associated with increased risk of prostate cancer (OR = 1.65, 95% CI = 1.00-2.70); while a stronger association among men with high Gleason sum was observed for SNPs rs4148269 and rs3100. CONCLUSIONS: Although small sample size limits inference, we report novel associations between UGT2B15 and UGT2B17 variants and PC risk. These associations with PC risk in men with high Gleason sum, more frequently found in African American men, support the relevance of genetic differences in the androgen metabolism pathway, which could explain, in part, the high incidence of PC among African American men. Larger studies are required.
Abstract:
BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. 
Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association to a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
Association between DNA damage response and repair genes and risk of invasive serous ovarian cancer.
Abstract:
BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using case-control data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based, case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. Statistical analysis was two staged: a screen using marginal Bayes factors (BFs) for 484 SNPs and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and genotypes of other SNPs and allowed for uncertainty in the genetic parameterizations of the SNPs and number of associated SNPs. Six SNPs had Bayes factors greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median OR(odds ratio)(per allele) = 0.66; 95% credible interval (CI) = 0.44-1.00) and rs6005835 (median OR(per allele) = 0.69; 95% CI = 0.53-0.91) in CHEK2, rs2078486 (median OR(per allele) = 1.65; 95% CI = 1.21-2.25) and rs12951053 (median OR(per allele) = 1.65; 95% CI = 1.20-2.26) in TP53, rs411697 (median OR (rare homozygote) = 0.53; 95% CI = 0.35 - 0.79) in BACH1 and rs10131 (median OR( rare homozygote) = not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in LD with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.
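To illustrate how a Bayes factor such as the "greater than 10" screen above translates into a posterior probability of association, here is a minimal sketch of the standard identity posterior odds = Bayes factor × prior odds. The prior probabilities used are hypothetical; the abstract does not report the priors used in the study, and the function name is illustrative.

```python
def posterior_probability(bayes_factor: float, prior_prob: float) -> float:
    """Posterior probability of association from a Bayes factor and a prior.

    Uses posterior odds = Bayes factor * prior odds, then converts odds
    back to a probability.
    """
    if bayes_factor < 0.0:
        raise ValueError("bayes_factor must be non-negative")
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical priors (assumed for illustration, not taken from the study).
for prior in (0.5, 0.1, 0.01):
    post = posterior_probability(10.0, prior)
    print(f"prior = {prior:.2f}, BF = 10 -> posterior = {post:.3f}")
```

The same Bayes factor yields very different posterior probabilities depending on the prior, which is why the prior on SNP association matters for interpreting results reported on this scale.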
Abstract:
During mitotic cell cycles, DNA is exposed to many types of endogenous and exogenous damaging agents that can cause double-strand breaks (DSBs). In S. cerevisiae, DSBs are primarily repaired by mitotic recombination, which can lead to loss of heterozygosity (LOH). Genetic recombination can happen in both meiosis and mitosis. While the genome-wide distribution of meiotic recombination events has been intensively studied, mitotic recombination events had not been mapped in an unbiased way throughout the genome until recently. Methods for selecting mitotic crossovers and mapping the positions of crossovers have recently been developed in our lab. Our current approach uses a diploid yeast strain that is heterozygous for about 55,000 SNPs, and employs SNP microarrays to map LOH events throughout the genome. These methods allow us to examine selected crossovers and unselected mitotic recombination events (crossover, noncrossover, and break-induced replication, BIR) at about 1 kb resolution across the genome. Using this method, we generated maps of spontaneous and UV-induced LOH events. In this study, we explore machine learning and variable selection techniques to build a predictive model for where the LOH events occur in the genome.
We simulated control tracts, drawn at random from the yeast genome, that resemble the LOH tracts in tract length and in location relative to single-nucleotide polymorphism positions. We then extracted roughly 1,100 features, such as base composition, histone modifications, and the presence of tandem repeats, and trained classifiers to distinguish control tracts from LOH tracts. We identified features with good predictive value. We also found that, with the current repertoire of features, prediction is generally better for spontaneous LOH events than for UV-induced LOH events.
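The control-tract idea can be sketched as follows: for each observed LOH tract, draw a random tract of the same length somewhere on the chromosome. This is a simplified illustration and not the authors' procedure; in particular it matches tract length only and omits the matching to SNP positions described above. The function name, the coordinate convention (0-based start, exclusive end), and the toy lengths are all assumptions.

```python
import random


def simulate_control_tracts(tract_lengths: list[int], chrom_length: int,
                            seed: int = 0) -> list[tuple[int, int]]:
    """Draw one length-matched random tract per observed tract length.

    Returns (start, end) pairs with 0 <= start and end <= chrom_length.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    controls = []
    for length in tract_lengths:
        if length <= 0 or length > chrom_length:
            raise ValueError("tract length must be in 1..chrom_length")
        # randint is inclusive at both ends, so the tract always fits.
        start = rng.randint(0, chrom_length - length)
        controls.append((start, start + length))
    return controls


# Toy example: three hypothetical tract lengths on a hypothetical 230 kb chromosome.
observed_lengths = [1_200, 8_500, 40_000]
for start, end in simulate_control_tracts(observed_lengths, 230_000):
    print(start, end, end - start)
```

Features computed on such control tracts would then serve as the negative class when training a classifier against the real LOH tracts.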
Abstract:
BACKGROUND: Telomere-related genes play an important role in carcinogenesis and progression of prostate cancer (PCa). It is not fully understood whether genetic variations in telomere-related genes are associated with development and progression in PCa patients. METHODS: Six potentially functional single-nucleotide polymorphisms (SNPs) of three key telomere-related genes were evaluated in 1015 PCa cases and 1052 cancer-free controls, to test their associations with risk of PCa. Among 426 PCa patients who underwent radical prostatectomy (RP), the prognostic significance of the studied SNPs on biochemical recurrence (BCR) was also assessed using the Kaplan-Meier analysis and Cox proportional hazards regression model. The relative telomere lengths (RTLs) were measured in peripheral blood leukocytes using real-time PCR in the RP patients. RESULTS: TEP1 rs1760904 AG/AA genotypes were significantly associated with a decreased risk of PCa (odds ratio (OR): 0.77, 95% confidence interval (CI): 0.64-0.93, P=0.005) compared with the GG genotype. By using median RTL as a cutoff level, RP patients with TEP1 rs1760904 AG/AA genotypes tended to have a longer RTL than those with the GG genotype (OR: 1.55, 95% CI: 1.04-2.30, P=0.031). A significant interaction between TEP1 rs1713418 and age in modifying PCa risk was observed (P=0.005). After adjustment for clinicopathologic risk factors, the presence of heterozygotes or rare homozygotes of TEP1 rs1760904 and TNKS2 rs1539042 were associated with BCR in the RP cohorts (hazard ratio: 0.53, 95% CI: 0.36-0.79, P=0.002 and hazard ratio: 1.67, 95% CI: 1.07-2.48, P=0.017, respectively). CONCLUSIONS: These data suggest that genetic variations in the TEP1 gene may be biomarkers for risk of PCa and BCR after RP.
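For readers unfamiliar with the Kaplan-Meier analysis mentioned above, the following is a minimal sketch of the product-limit estimator, S(t) = Π (1 - d_i / n_i) over event times up to t, where d_i is the number of events and n_i the number still at risk. The follow-up times and event flags are toy values invented for illustration; they are not data from the study.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. biochemical recurrence) occurred, 0 if censored
    Returns (time, survival probability) at each time where an event occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve


# Toy follow-up data (months) for six hypothetical patients.
toy_times = [6, 12, 12, 20, 30, 36]
toy_events = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t = {t}: S(t) = {s:.3f}")
```

In practice, curves estimated this way for each genotype group would be compared, and a Cox proportional hazards model would be used to adjust for clinicopathologic covariates.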
Abstract:
CD133 is one of the most common stem cell markers, and functional single nucleotide polymorphisms (SNPs) of CD133 may modulate its gene functions and thus cancer risk and patient survival. We hypothesized that potentially functional CD133 SNPs are associated with gastric cancer (GC) risk and survival. To test this hypothesis, we conducted a case-control study of 371 GC patients and 313 cancer-free controls frequency-matched by age, sex, and ethnicity. We genotyped four selected, potentially functional CD133 SNPs (rs2240688A>C, rs7686732C>G, rs10022537T>A, and rs3130C>T) and used logistic regression analysis for associations of these SNPs with GC risk and Cox hazards regression analysis for survival. We found that compared with the miRNA binding site rs2240688 AA genotype, AC + CC genotypes were associated with significantly increased GC risk (adjusted OR = 1.52, 95% CI = 1.09-2.13); for another miRNA binding site rs3130C>T SNP, the TT genotype was associated with significantly reduced GC risk (adjusted OR = 0.68, 95% CI = 0.48-0.97), compared with CC + CT genotypes. In all patients, the risk rs3130 TT variant genotype was significantly associated with overall survival (OS) (adjusted P(trend) = 0.016 and 0.007 under additive and recessive models, respectively). These findings suggest that these two CD133 miRNA binding site variants, rs2240688 and rs3130, may be potential biomarkers for genetic susceptibility to GC and possible predictors for survival in GC patients but require further validation by larger studies.
Abstract:
BACKGROUND: The Notch signaling pathway is constitutively activated in human cutaneous melanoma to promote growth and aggressive metastatic potential of primary melanoma cells. Therefore, genetic variants in Notch pathway genes may affect the prognosis of cutaneous melanoma patients. METHODS: We identified 6,256 SNPs in 48 Notch genes in 858 cutaneous melanoma patients included in a previously published cutaneous melanoma genome-wide association study dataset. Multivariate and stepwise Cox proportional hazards regression and false-positive report probability corrections were performed to evaluate associations between putative functional SNPs and cutaneous melanoma disease-specific survival. A receiver operating characteristic curve was constructed, and the area under the curve was used to assess the classification performance of the model. RESULTS: Four putative functional SNPs of Notch pathway genes had independent and joint predictive roles in survival of cutaneous melanoma patients. The most significant variant was NCOR2 rs2342924 T>C (adjusted HR, 2.71; 95% confidence interval, 1.73-4.23; Ptrend = 9.62 × 10^-7), followed by NCSTN rs1124379 G>A, NCOR2 rs10846684 G>A, and MAML2 rs7953425 G>A (Ptrend = 0.005, 0.005, and 0.013, respectively). The receiver operating characteristic analysis revealed that the area under the curve was significantly increased after adding the combined unfavorable genotype score to the model containing the known clinicopathologic factors. CONCLUSIONS: Our results suggest that SNPs in Notch pathway genes may be predictors of cutaneous melanoma disease-specific survival. IMPACT: Our discovery offers a translational potential for using genetic variants in Notch pathway genes as a genotype score of biomarkers for developing an improved prognostic assessment and personalized management of cutaneous melanoma patients.
Abstract:
Genome-wide association studies (GWASs) have characterized 13 loci associated with melanoma, which account for only a small part of melanoma risk. To identify new genes with too small an effect to be detected individually but which collectively influence melanoma risk and/or show interactive effects, we used a two-step analysis strategy: pathway analysis of genome-wide SNP data in a first step, and epistasis analysis within significant pathways in a second step. Pathway analysis, using the gene-set enrichment analysis (GSEA) approach and the gene ontology (GO) database, was applied to the outcomes of MELARISK (3,976 subjects) and MDACC (2,827 subjects) GWASs. Cross-gene SNP-SNP interaction analysis within melanoma-associated GOs was performed using the INTERSNP software. Five GO categories were significantly enriched in genes associated with melanoma (false discovery rate ≤ 5% in both studies): response to light stimulus, regulation of mitotic cell cycle, induction of programmed cell death, cytokine activity and oxidative phosphorylation. Epistasis analysis, within each of the five significant GOs, showed significant evidence for interaction for one SNP pair at TERF1 and AFAP1L2 loci (pmeta-int = 2.0 × 10^-7, which met both the pathway and overall multiple-testing corrected thresholds that are equal to 9.8 × 10^-7 and 2.0 × 10^-7, respectively) and suggestive evidence for another pair involving correlated SNPs at the same loci (pmeta-int = 3.6 × 10^-6). This interaction has important biological relevance given the key role of TERF1 in telomere biology and the reported physical interaction between TERF1 and AFAP1L2 proteins. This finding provides new evidence for the emerging role of telomere dysfunction in melanoma development.