28 results for Association mining
at Duke University
Abstract:
BACKGROUND: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. RESULTS: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene and isoform specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which recognized a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors; with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. CONCLUSIONS: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. 
The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform specific mRNA changes directly associated with cancer survival.
Abstract:
BACKGROUND: Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, that are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. RESULTS: We have developed a CUDA based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. CONCLUSIONS: permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
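The permutation resampling described in this abstract can be illustrated with a small CPU-only sketch. This is not permGPU's code or API; the function name `permutation_p_value`, its parameters, and the choice of a difference-in-means statistic are assumptions made for illustration. Each shuffle is independent of the others, which is why the problem is embarrassingly parallel and maps well onto a GPU.

```python
import random
from statistics import mean


def permutation_p_value(expression, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    expression: expression values for one probe, one per sample.
    labels: binary trait (0 or 1), one per sample.
    Hypothetical helper for illustration; not part of permGPU.
    """
    rng = random.Random(seed)

    def abs_mean_difference(current_labels):
        group1 = [x for x, g in zip(expression, current_labels) if g == 1]
        group0 = [x for x, g in zip(expression, current_labels) if g == 0]
        return abs(mean(group1) - mean(group0))

    observed = abs_mean_difference(labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any real trait/expression link.
        rng.shuffle(shuffled)
        if abs_mean_difference(shuffled) >= observed:
            exceed += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

In a genome-wide setting the same loop would be repeated for every probe, so the total work grows as probes times permutations; that product is what motivates a parallel implementation.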
Abstract:
BACKGROUND: Several studies have noted that genetic variants of SCARB1, a lipoprotein receptor involved in reverse cholesterol transport, are associated with serum lipid levels in a sex-dependent fashion. However, the mechanism underlying this gene by sex interaction has not been explored. METHODS: We utilized both epidemiological and molecular methods to study how estrogen and gene variants interact to influence SCARB1 expression and lipid levels. Interaction between 35 SCARB1 haplotype-tagged polymorphisms and endogenous estradiol levels was assessed in 498 postmenopausal Caucasian women from the population-based Rancho Bernardo Study. We further examined associated variants with overall and SCARB1 splice variant (SR-BI and SR-BII) expression in 91 human liver tissues using quantitative real-time PCR. RESULTS: Several variants on a haplotype block spanning intron 11 to intron 12 of SCARB1 showed significant gene by estradiol interaction affecting serum lipid levels, the strongest for rs838895 with HDL-cholesterol (p=9.2x10(-4)) and triglycerides (p=1.3x10(-3)) and the triglyceride:HDL cholesterol ratio (p=2.7x10(-4)). These same variants were associated with expression of the SR-BI isoform in a sex-specific fashion, with the strongest association found among liver tissue from 52 young women<45 years old (p=0.002). CONCLUSIONS: Estrogen and SCARB1 genotype may act synergistically to regulate expression of SCARB1 isoforms and impact serum levels of HDL cholesterol and triglycerides. This work highlights the importance of considering sex-dependent effects of gene variants on serum lipid levels.
Abstract:
To investigate the underlying mechanisms of T2D pathogenesis, we looked for diabetes susceptibility genes that increase the risk of type 2 diabetes (T2D) in a Han Chinese population. A two-stage genome-wide association (GWA) study was conducted, in which 995 patients and 894 controls were genotyped using the Illumina HumanHap550-Duo BeadChip for the first genome scan stage. This was further replicated in 1,803 patients and 1,473 controls in stage 2. We found two loci not previously associated with diabetes susceptibility in and around the genes protein tyrosine phosphatase receptor type D (PTPRD) (P = 8.54x10(-10); odds ratio [OR] = 1.57; 95% confidence interval [CI] = 1.36-1.82), and serine racemase (SRR) (P = 3.06x10(-9); OR = 1.28; 95% CI = 1.18-1.39). We also confirmed that variants in KCNQ1 were associated with T2D risk, with the strongest signal at rs2237895 (P = 9.65x10(-10); OR = 1.29, 95% CI = 1.19-1.40). By identifying two novel genetic susceptibility loci in a Han Chinese population and confirming the involvement of KCNQ1, which was previously reported to be associated with T2D in Japanese and European descent populations, our results may lead to a better understanding of differences in the molecular pathogenesis of T2D among various populations.
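For readers unfamiliar with the odds ratios and confidence intervals quoted in this abstract, the following sketch shows how an allelic odds ratio and a Wald 95% interval are obtained from a 2x2 table of allele counts. The function name `odds_ratio_ci` and the counts in the usage note are hypothetical; the study's own estimates may well come from a regression model rather than from this simple table calculation.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.959964):
    """Allelic odds ratio with a Wald confidence interval.

    Arguments are allele counts (risk allele vs. reference allele)
    in cases and controls. z = 1.959964 gives a 95% interval.
    Hypothetical helper for illustration only.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, `odds_ratio_ci(20, 10, 10, 20)` uses made-up counts and yields an odds ratio of 4 with an interval that brackets it; an interval that excludes 1 corresponds to a nominally significant association.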
Abstract:
Lipoprotein-associated phospholipase A(2) (Lp-PLA(2)) is an emerging risk factor and therapeutic target for cardiovascular disease. The activity and mass of this enzyme are heritable traits, but major genetic determinants have not been explored in a systematic, genome-wide fashion. We carried out a genome-wide association study of Lp-PLA(2) activity and mass in 6,668 Caucasian subjects from the population-based Framingham Heart Study. Clinical data and genotypes from the Affymetrix 550K SNP array were obtained from the open-access Framingham SHARe project. Each polymorphism that passed quality control was tested for associations with Lp-PLA(2) activity and mass using linear mixed models implemented in the R statistical package, accounting for familial correlations, and controlling for age, sex, smoking, lipid-lowering-medication use, and cohort. For Lp-PLA(2) activity, polymorphisms at four independent loci reached genome-wide significance, including the APOE/APOC1 region on chromosome 19 (p = 6 x 10(-24)); CELSR2/PSRC1 on chromosome 1 (p = 3 x 10(-15)); SCARB1 on chromosome 12 (p = 1x10(-8)) and ZNF259/BUD13 in the APOA5/APOA1 gene region on chromosome 11 (p = 4 x 10(-8)). All of these remained significant after accounting for associations with LDL cholesterol, HDL cholesterol, or triglycerides. For Lp-PLA(2) mass, 12 SNPs achieved genome-wide significance, all clustering in a region on chromosome 6p12.3 near the PLA2G7 gene. Our analyses demonstrate that genetic polymorphisms may contribute to inter-individual variation in Lp-PLA(2) activity and mass.
Abstract:
Complex diseases will have multiple functional sites, and it will be invaluable to understand the cross-locus interaction in terms of linkage disequilibrium (LD) between those sites (epistasis) in addition to the haplotype-LD effects. We investigated the statistical properties of a class of matrix-based statistics to assess this epistasis. These statistical methods include two LD contrast tests (Zaykin et al., 2006) and partial least squares regression (Wang et al., 2008). To estimate Type 1 error rates and power, we simulated multiple two-variant disease models using the SIMLA software package. SIMLA allows for the joint action of up to two disease genes in the simulated data with all possible multiplicative interaction effects between them. Our goal was to detect an interaction between multiple disease-causing variants by means of their linkage disequilibrium (LD) patterns with other markers. We measured the effects that marginal disease effect size, haplotype LD, disease prevalence and minor allele frequency have on cross-locus interaction (epistasis). In the setting of strong allele effects and strong interaction, the correlation between the two disease genes was weak (r=0.2). In a complex system with multiple correlations (both marginal and interaction), it was difficult to determine the source of a significant result. Despite these complications, the partial least squares and modified LD contrast methods maintained adequate power to detect the epistatic effects; however, in many of the analyses we could not separate interaction from a strong marginal effect. While we did not exhaust the entire parameter space of possible models, we do provide guidance on the effects that population parameters have on cross-locus interaction.
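The correlation r between two loci that this abstract reports is the standard LD correlation coefficient, which can be computed from haplotype and allele frequencies. The sketch below uses the textbook definition, D = p_AB - p_A p_B and r = D / sqrt(p_A (1 - p_A) p_B (1 - p_B)); the function name `ld_correlation` is hypothetical, and this is not the LD contrast test or the partial least squares method evaluated in the study.

```python
import math


def ld_correlation(p_ab, p_a, p_b):
    """LD correlation r between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: marginal frequencies of allele A and allele B.
    Hypothetical helper for illustration only.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d / denom
```

When the two alleles always travel together (for instance `ld_correlation(0.5, 0.5, 0.5)`), r is 1; when the haplotype frequency equals the product of the allele frequencies, r is 0.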
Abstract:
OBJECTIVE: To examine the associations between attention-deficit/hyperactivity disorder (ADHD) symptoms, obesity and hypertension in young adults in a large population-based cohort. DESIGN, SETTING AND PARTICIPANTS: The study population consisted of 15,197 respondents from the National Longitudinal Study of Adolescent Health, a nationally representative sample of adolescents followed from 1995 to 2009 in the United States. Multinomial logistic and logistic models examined the odds of overweight, obesity and hypertension in adulthood in relation to retrospectively reported ADHD symptoms. Latent curve modeling was used to assess the association between symptoms and naturally occurring changes in body mass index (BMI) from adolescence to adulthood. RESULTS: Linear association was identified between the number of inattentive (IN) and hyperactive/impulsive (HI) symptoms and waist circumference, BMI, diastolic blood pressure and systolic blood pressure (all P-values for trend <0.05). Controlling for demographic variables, physical activity, alcohol use, smoking and depressive symptoms, those with three or more HI or IN symptoms had the highest odds of obesity (HI 3+, odds ratio (OR)=1.50, 95% confidence interval (CI) = 1.22-2.83; IN 3+, OR = 1.21, 95% CI = 1.02-1.44) compared with those with no HI or IN symptoms. HI symptoms at the 3+ level were significantly associated with a higher OR of hypertension (HI 3+, OR = 1.24, 95% CI = 1.01-1.51; HI continuous, OR = 1.04, 95% CI = 1.00-1.09), but associations were nonsignificant when models were adjusted for BMI. Latent growth modeling results indicated that compared with those reporting no HI or IN symptoms, those reporting 3 or more symptoms had higher initial levels of BMI during adolescence. Only HI symptoms were associated with change in BMI. 
CONCLUSION: Self-reported ADHD symptoms were associated with adult BMI and change in BMI from adolescence to adulthood, providing further evidence of a link between ADHD symptoms and obesity.
Abstract:
BACKGROUND: Molecular tools may provide insight into cardiovascular risk. We assessed whether metabolites discriminate coronary artery disease (CAD) and predict risk of cardiovascular events. METHODS AND RESULTS: We performed mass-spectrometry-based profiling of 69 metabolites in subjects from the CATHGEN biorepository. To evaluate discriminative capabilities of metabolites for CAD, 2 groups were profiled: 174 CAD cases and 174 sex/race-matched controls ("initial"), and 140 CAD cases and 140 controls ("replication"). To evaluate the capability of metabolites to predict cardiovascular events, cases were combined ("event" group); of these, 74 experienced death/myocardial infarction during follow-up. A third independent group was profiled ("event-replication" group; n=63 cases with cardiovascular events, 66 controls). Analysis included principal-components analysis, linear regression, and Cox proportional hazards. Two principal components analysis-derived factors were associated with CAD: 1 comprising branched-chain amino acid metabolites (factor 4, initial P=0.002, replication P=0.01), and 1 comprising urea cycle metabolites (factor 9, initial P=0.0004, replication P=0.01). In multivariable regression, these factors were independently associated with CAD in initial (factor 4, odds ratio [OR], 1.36; 95% CI, 1.06 to 1.74; P=0.02; factor 9, OR, 0.67; 95% CI, 0.52 to 0.87; P=0.003) and replication (factor 4, OR, 1.43; 95% CI, 1.07 to 1.91; P=0.02; factor 9, OR, 0.66; 95% CI, 0.48 to 0.91; P=0.01) groups. A factor composed of dicarboxylacylcarnitines predicted death/myocardial infarction (event group hazard ratio 2.17; 95% CI, 1.23 to 3.84; P=0.007) and was associated with cardiovascular events in the event-replication group (OR, 1.52; 95% CI, 1.08 to 2.14; P=0.01). CONCLUSIONS: Metabolite profiles are associated with CAD and subsequent cardiovascular events.
Abstract:
Mountaintop mining (MTM) is the primary procedure for surface coal exploration within the central Appalachian region of the eastern United States, and it is known to contaminate streams in local watersheds. In this study, we measured the chemical and isotopic compositions of water samples from MTM-impacted tributaries and streams in the Mud River watershed in West Virginia. We systematically document the isotopic compositions of three major constituents: sulfur isotopes in sulfate (δ(34)SSO4), carbon isotopes in dissolved inorganic carbon (δ(13)CDIC), and strontium isotopes ((87)Sr/(86)Sr). The data show that δ(34)SSO4, δ(13)CDIC, Sr/Ca, and (87)Sr/(86)Sr measured in saline- and selenium-rich MTM impacted tributaries are distinguishable from those of the surface water upstream of mining impacts. These tracers can therefore be used to delineate and quantify the impact of MTM in watersheds. High Sr/Ca and low (87)Sr/(86)Sr characterize tributaries that originated from active MTM areas, while tributaries from reclaimed MTM areas had low Sr/Ca and high (87)Sr/(86)Sr. Leaching experiments of rocks from the watershed show that pyrite oxidation and carbonate dissolution control the solute chemistry with distinct (87)Sr/(86)Sr ratios characterizing different rock sources. We propose that MTM operations that access the deeper Kanawha Formation generate residual mined rocks in valley fills from which effluents with distinctive (87)Sr/(86)Sr and Sr/Ca imprints affect the quality of the Appalachian watersheds.
Abstract:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
Abstract:
BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. 
Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association to a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
Association between DNA damage response and repair genes and risk of invasive serous ovarian cancer.
Abstract:
BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using case-control data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based, case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. Statistical analysis was two staged: a screen using marginal Bayes factors (BFs) for 484 SNPs and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and genotypes of other SNPs and allowed for uncertainty in the genetic parameterizations of the SNPs and number of associated SNPs. Six SNPs had Bayes factors greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median OR(odds ratio)(per allele) = 0.66; 95% credible interval (CI) = 0.44-1.00) and rs6005835 (median OR(per allele) = 0.69; 95% CI = 0.53-0.91) in CHEK2, rs2078486 (median OR(per allele) = 1.65; 95% CI = 1.21-2.25) and rs12951053 (median OR(per allele) = 1.65; 95% CI = 1.20-2.26) in TP53, rs411697 (median OR (rare homozygote) = 0.53; 95% CI = 0.35 - 0.79) in BACH1 and rs10131 (median OR( rare homozygote) = not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in LD with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.
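To make the Bayes factor threshold in this abstract concrete, the sketch below converts a Bayes factor and a prior probability of association into a posterior probability using the identity posterior odds = Bayes factor x prior odds. The function name `posterior_probability` and the prior values in the usage note are assumptions for illustration; they are not the priors used in the study.

```python
def posterior_probability(bayes_factor, prior_probability):
    """Posterior probability of association from a Bayes factor.

    bayes_factor: evidence for association relative to no association.
    prior_probability: prior probability of association, strictly
    between 0 and 1. Hypothetical helper for illustration only.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must be strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)
```

With an even prior of 0.5, a Bayes factor of 10 corresponds to a posterior probability of 10/11, about 0.91. With a more skeptical prior the same Bayes factor yields a much smaller posterior, which is why the prior on SNP inclusion matters when many SNPs are screened.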
Abstract:
BACKGROUND: Ipsilateral hindfoot arthrodesis in combination with total ankle replacement (TAR) may diminish functional outcome and prosthesis survivorship compared to isolated TAR. We compared the outcome of isolated TAR to outcomes of TAR with ipsilateral hindfoot arthrodesis. METHODS: In a consecutive series of 404 primary TARs in 396 patients, 70 patients (17.3%) had a hindfoot fusion before, after, or at the time of TAR; the majority had either an isolated subtalar arthrodesis (n = 43, 62%) or triple arthrodesis (n = 15, 21%). The remaining 334 isolated TARs served as the control group. Mean patient follow-up was 3.2 years (range, 24-72 months). RESULTS: The SF-36 total, AOFAS Hindfoot-Ankle pain subscale, Foot and Ankle Disability Index, and Short Musculoskeletal Function Assessment scores were significantly improved from preoperative measures, with no significant differences between the hindfoot arthrodesis and control groups. The AOFAS Hindfoot-Ankle total, function, and alignment scores were significantly improved for both groups, albeit the control group demonstrated significantly higher scores in all 3 scales. Furthermore, the control group demonstrated a significantly greater improvement in VAS pain score compared to the hindfoot arthrodesis group. Walking speed, sit-to-stand time, and 4-square step test time were significantly improved for both groups at each postoperative time point; however, the hindfoot arthrodesis group completed these tests significantly slower than the control group. There was no significant difference in terms of talar component subsidence between the fusion (2.6 mm) and control groups (2.0 mm). The failure rate in the hindfoot fusion group (10.0%) was significantly higher than that in the control group (2.4%; p < 0.05). CONCLUSION: To our knowledge, this study represents the first series evaluating the clinical outcome of TARs performed with and without hindfoot fusion using implants available in the United States. 
At follow-up of 3.2 years, TAR performed with ipsilateral hindfoot arthrodesis resulted in significant improvements in pain and functional outcome; in contrast to prior studies, however, overall outcome was inferior to that of isolated TAR. LEVEL OF EVIDENCE: Level II, prospective comparative series.
Abstract:
Selenium (Se) is a micronutrient necessary for the function of a variety of important enzymes; Se also exhibits a narrow range in concentrations between essentiality and toxicity. Oviparous vertebrates such as birds and fish are especially sensitive to Se toxicity, which causes reproductive impairment and defects in embryo development. Selenium occurs naturally in the Earth's crust, but it can be mobilized by a variety of anthropogenic activities, including agricultural practices, coal burning, and mining.
Mountaintop removal/valley fill (MTR/VF) coal mining is a form of surface mining found throughout central Appalachia in the United States that involves blasting off the tops of mountains to access underlying coal seams. Spoil rock from the mountain is placed into adjacent valleys, forming valley fills, which bury stream headwaters and negatively impact surface water quality. This research focused on the biological impacts of Se leached from MTR/VF coal mining operations located around the Mud River, West Virginia.
In order to assess the status of Se in a lotic (flowing) system such as the Mud River, surface water, insects, and fish samples including creek chub (Semotilus atromaculatus) and green sunfish (Lepomis cyanellus) were collected from a mining impacted site as well as from a reference site not impacted by mining. Analysis of samples from the mined site showed increased conductivity and Se in the surface waters compared to the reference site in addition to increased concentrations of Se in insects and fish. Histological analysis of mined site fish gills showed a lack of normal parasites, suggesting parasite populations may be disrupted due to poor water quality. X-ray absorption near edge spectroscopy techniques were used to determine the speciation of Se in insect and creek chub samples. Insects contained approximately 40-50% inorganic Se (selenate and selenite) and 50-60% organic Se (Se-methionine and Se-cystine) while fish tissues contained lower proportions of inorganic Se than insects, instead having higher proportions of organic Se in the forms of methyl-Se-cysteine, Se-cystine, and Se-methionine.
Otoliths, calcified inner ear structures, were also collected from Mud River creek chubs and green sunfish and analyzed for Se content using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). Significant differences were found between the two species of fish, based on the concentrations of otolith Se. Green sunfish otoliths from all sites contained background or low concentrations of otolith Se (< 1 µg/g) that were not significantly different between mined and unmined sites. In contrast, creek chub otoliths from the historically mined site contained much higher (≥ 5 µg/g, up to approximately 68 µg/g) concentrations of Se than for the same species in the unmined site or for the green sunfish. Otolith Se concentrations were related to muscle Se concentrations for creek chubs (R² = 0.54, p = 0.0002 for the last 20% of the otolith Se versus muscle Se) while no relationship was observed for green sunfish.
Additional experiments using biofilms grown in the Mud River showed increased Se in mined site biofilms compared to the reference site. When we fed fathead minnows (Pimephales promelas) on these biofilms in the laboratory, they accumulated higher concentrations of Se in liver and ovary tissues compared to fathead minnows fed on reference site biofilms. No differences in Se accumulation were found in muscle from either treatment group. Biofilms were also centrifuged and separated into filamentous green algae and the remaining diatom fraction. The majority of Se was found in the diatom fraction, with only about one third of total biofilm Se concentration present in the filamentous green algae fraction.
Finally, zebrafish (Danio rerio) embryos were exposed to aqueous Se in the form of selenate, selenite, and L-selenomethionine in an attempt to determine if oxidative stress plays a role in selenium embryo toxicity. Selenate and selenite exposure did not induce embryo deformities (lordosis and craniofacial malformation). L-selenomethionine, however, induced significantly higher deformity rates at 100 µg/L compared to controls. Antioxidant rescue of L-selenomethionine-induced deformities was attempted in embryos using N-acetylcysteine (NAC). Pretreatment with NAC significantly reduced deformities in the zebrafish embryos secondarily treated with L-selenomethionine, suggesting that oxidative stress may play a role in Se toxicity. Selenite exposure also induced a 6.6-fold increase in glutathione-S-transferase pi class 2 gene expression, which is involved in xenobiotic transformation. No changes in gene expression were observed for selenate or L-selenomethionine-exposed embryos.
The findings in this dissertation contribute to the understanding of how Se bioaccumulates in a lotic system and is transferred through a simulated foodweb in addition to further exploring oxidative stress as a potential mechanism for Se-induced embryo toxicity. Future studies should continue to pursue the role of oxidative stress and other mechanisms in Se toxicity and the biotransformation of Se in aquatic ecosystems.