886 results for Association Studies
Abstract:
Elevated concentrations of albumin in the urine (albuminuria) are a hallmark of diabetic kidney disease and are associated with increased risk of end-stage renal disease and cardiovascular events. To gain insight into the pathophysiological mechanisms underlying albuminuria, we conducted meta-analyses of genome-wide association studies and independent replication in up to 5,825 individuals of European ancestry with diabetes mellitus and up to 46,061 without diabetes, followed by functional studies. Known associations of variants in CUBN, encoding cubilin, with the urinary albumin-to-creatinine ratio (UACR) were confirmed in the overall sample (p = 2.4 × 10⁻¹⁰). Gene-by-diabetes interactions were detected and confirmed for variants in HS6ST1 and near RAB38/CTSC. SNPs at these loci demonstrated a genetic effect on UACR in individuals with but not without diabetes. The change in average UACR per minor allele was 21% for HS6ST1 and 13% for RAB38/CTSC (p = 6.3 × 10⁻⁷ and 5.8 × 10⁻⁷, respectively). Experiments using streptozotocin-treated diabetic Rab38 knockout and control rats showed higher urinary albumin concentrations and reduced amounts of megalin and cubilin at the proximal tubule cell surface in Rab38 knockout vs. control rats. Relative expression of RAB38 was higher in tubuli of patients with diabetic kidney disease compared to controls. The loci identified here confirm known and highlight novel pathways influencing albuminuria.
Abstract:
With hundreds of single nucleotide polymorphisms (SNPs) in a candidate gene and millions of SNPs across the genome, selecting an informative subset of SNPs to maximize the ability to detect genotype-phenotype association is of great interest and importance. In addition, with a large number of SNPs, analytic methods are needed that allow investigators to control the false positive rate resulting from large numbers of SNP genotype-phenotype analyses. This dissertation uses simulated data to explore methods for selecting SNPs for genotype-phenotype association studies. I examined the pattern of linkage disequilibrium (LD) across a candidate gene region and used this pattern to aid in localizing a disease-influencing mutation. The results indicate that the r² measure of linkage disequilibrium is preferred over the common D′ measure for use in genotype-phenotype association studies. Using step-wise linear regression, the best predictor of the quantitative trait was not usually the single functional mutation. Rather, it was a SNP that was in high linkage disequilibrium with the functional mutation. Next, I compared three strategies for selecting SNPs for application to phenotype association studies: based on measures of linkage disequilibrium, based on a measure of haplotype diversity, and random selection. The results demonstrate that SNPs selected based on maximum haplotype diversity are more informative and yield higher power than randomly selected SNPs or SNPs selected based on low pair-wise LD. The data also indicate that for genes with small contribution to the phenotype, it is more prudent for investigators to increase their sample size than to continuously increase the number of SNPs in order to improve statistical power. When typing large numbers of SNPs, researchers are faced with the challenge of utilizing an appropriate statistical method that controls the type I error rate while maintaining adequate power. 
We show that an empirical genotype-based multi-locus global test that uses permutation testing to investigate the null distribution of the maximum test statistic maintains a desired overall type I error rate while not overly sacrificing statistical power. The results also show that when the penetrance model is simple, the multi-locus global test performs as well as or better than the haplotype analysis. However, for more complex models, haplotype analyses offer advantages. The results of this dissertation will be of utility to human geneticists designing large-scale multi-locus genotype-phenotype association studies.
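The contrast between r² and D′ drawn in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook definitions of D, D′, and r² for two biallelic loci; the function name `ld_measures` and the example frequencies are illustrative assumptions, not values from the dissertation.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: a rare allele B (10%) that always sits on the
# common allele A (50%) haplotype background.
d, d_prime, r2 = ld_measures(p_ab=0.10, p_a=0.50, p_b=0.10)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

In this made-up case D′ reaches its maximum while r² stays low, because the allele frequencies differ. That is one reason r² may track the power of a marker to stand in for a functional variant more closely than D′.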
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and will give false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing, and false discovery rates, the joint effects of several genes, each with a weak effect, may go undetected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations. In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to look at the relationship between each SNP and disease from another point of view. 
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search of small subsets (one, two, or three SNPs) based on the best 100 composite 2-SNPs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter because of overfitting from observing more complex subset states. Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome and that its ability to detect the target status is superior to that of traditional LDA in the study. From our results we can see that the best test HMSS for predicting CVD, stroke, CAD, and psoriasis through sIB is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4. 
A further genome-wide association study using the chi-square test shows that no significant SNPs are detected at the cut-off level of 9.09451E-08 in the Framingham Heart Study of CVD. The WTCCC study results detect only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value of 1.11E-07. Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and SNPs with good discriminant power are not necessarily causal markers for the disease.
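The abstract above argues that HMSS is usable on imbalanced data where plain accuracy is misleading. A minimal sketch of that point follows, assuming HMSS is the ordinary harmonic mean 2·Se·Sp / (Se + Sp); the function names and the toy labels are hypothetical and are not taken from the thesis.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def hmss(true_labels, predicted_labels, positive=1):
    """Harmonic mean of sensitivity and specificity (assumed definition)."""
    sens, spec = sensitivity_specificity(true_labels, predicted_labels, positive)
    if sens + spec == 0:
        return 0.0
    return 2 * sens * spec / (sens + spec)


def accuracy(true_labels, predicted_labels):
    """Fraction of labels predicted correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Toy imbalanced data: 10 cases (1) and 90 controls (0).
truth = [1] * 10 + [0] * 90
# A degenerate classifier that calls everyone a control.
all_controls = [0] * 100
print(accuracy(truth, all_controls), hmss(truth, all_controls))
```

On this toy data the degenerate classifier scores 0.9 on accuracy but 0.0 on HMSS, since it never identifies a case. This is the behavior the abstract relies on when preferring HMSS without resampling the dataset.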
Abstract:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases. This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, for developing novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions. In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare. Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in the Framingham Heart Study. 
This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as a unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.
Abstract:
The aim of this study was to determine whether identity-by-descent (IBD) information for affected sib pairs (ASPs) can be used to select a sample of cases for a genetic case-control study which will provide more power for detecting association with loci in a known linkage region. By modeling the expected frequency of the disease allele in ASPs showing IBD sharing of 0, 1, or 2 alleles, and considering additive, recessive, and dominant disease models, we show that cases selected from IBD 2 families are best for this purpose, followed by those selected from IBD 1 families; least useful are cases selected from IBD 0 families.
Abstract:
Background: Esophageal adenocarcinoma (EA) is one of the fastest rising cancers in western countries. Barrett’s Esophagus (BE) is the premalignant precursor of EA. However, only a subset of BE patients develop EA, which complicates the clinical management in the absence of valid predictors. Genetic risk factors for BE and EA are incompletely understood. This study aimed to identify novel genetic risk factors for BE and EA. Methods: Within an international consortium of groups involved in the genetics of BE/EA, we performed the first meta-analysis of all genome-wide association studies (GWAS) available, involving 6,167 BE patients, 4,112 EA patients, and 17,159 representative controls, all of European ancestry, genotyped on Illumina high-density SNP-arrays, collected from four separate studies within North America, Europe, and Australia. Meta-analysis was conducted using the fixed-effects inverse variance-weighting approach. We used the standard genome-wide significance threshold of 5×10⁻⁸ for this study. We also conducted an association analysis following reweighting of loci using an approach that investigates annotation enrichment among the genome-wide significant loci. The entire GWAS data set was also analyzed using bioinformatics approaches including functional annotation databases as well as gene-based and pathway-based methods in order to identify pathophysiologically relevant cellular pathways. Findings: We identified eight new associated risk loci for BE and EA, within or near the CFTR (rs17451754, P=4·8×10⁻¹⁰), MSRA (rs17749155, P=5·2×10⁻¹⁰), BLK (rs10108511, P=2·1×10⁻⁹), KHDRBS2 (rs62423175, P=3·0×10⁻⁹), TPPP/CEP72 (rs9918259, P=3·2×10⁻⁹), TMOD1 (rs7852462, P=1·5×10⁻⁸), SATB2 (rs139606545, P=2·0×10⁻⁸), and HTR3C/ABCC5 genes (rs9823696, P=1·6×10⁻⁸). A further novel risk locus at LPA (rs12207195, posterior probability=0·925) was identified after re-weighting using significantly enriched annotations. This study thereby doubled the number of known risk loci. 
The strongest disease pathways identified (P<10⁻⁶) belong to muscle cell differentiation and to mesenchyme development/differentiation, which fit with current pathophysiological BE/EA concepts. To our knowledge, this study identified for the first time an EA-specific association (rs9823696, P=1·6×10⁻⁸) near HTR3C/ABCC5 which is independent of BE development (P=0·45). Interpretation: The identified disease loci and pathways reveal new insights into the etiology of BE and EA. Furthermore, the EA-specific association at HTR3C/ABCC5 may constitute a novel genetic marker for the prediction of transition from BE to EA. Mutations in CFTR, one of the new risk loci identified in this study, cause cystic fibrosis (CF), the most common recessive disorder in Europeans. Gastroesophageal reflux (GER) belongs to the phenotypic CF-spectrum and represents the main risk factor for BE/EA. Thus, the CFTR locus may trigger a common GER-mediated pathophysiology.
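The fixed-effects inverse variance-weighting approach named in the Methods above can be sketched for a single SNP as follows. This is a generic textbook version, not the consortium's pipeline: the function name `fixed_effects_meta` and the per-study effect sizes and standard errors are hypothetical, and the two-sided p-value assumes a normal approximation.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Combine per-study effect estimates for one SNP.

    Each study is weighted by the inverse of its variance (1 / SE^2).
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p


# Hypothetical log-odds-ratio estimates from four studies of one SNP.
betas = [0.12, 0.09, 0.15, 0.11]
ses = [0.03, 0.04, 0.05, 0.035]
beta, se, z, p = fixed_effects_meta(betas, ses)

GENOME_WIDE_THRESHOLD = 5e-8  # threshold stated in the abstract
print(f"beta = {beta:.4f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
print("genome-wide significant:", p < GENOME_WIDE_THRESHOLD)
```

Because the weights are inverse variances, larger and more precise studies dominate the pooled estimate, and the pooled standard error is always smaller than that of any single contributing study.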
Abstract:
1. Genomewide association studies (GWAS) enable detailed dissections of the genetic basis for organisms' ability to adapt to a changing environment. In long-term studies of natural populations, individuals are often marked at one point in their life and then repeatedly recaptured. It is therefore essential that a method for GWAS includes the process of repeated sampling. In a GWAS, the effects of thousands of single-nucleotide polymorphisms (SNPs) need to be fitted and any model development is constrained by the computational requirements. A method is therefore required that can fit a highly hierarchical model and at the same time is computationally fast enough to be useful. 2. Our method fits fixed SNP effects in a linear mixed model that can include both random polygenic effects and permanent environmental effects. In this way, the model can correct for population structure and model repeated measures. The covariance structure of the linear mixed model is first estimated and subsequently used in a generalized least squares setting to fit the SNP effects. The method was evaluated in a simulation study based on observed genotypes from a long-term study of collared flycatchers in Sweden. 3. The method we present here was successful in estimating permanent environmental effects from simulated repeated measures data. Additionally, we found that especially for variable phenotypes having large variation between years, the repeated measurements model has a substantial increase in power compared to a model using average phenotypes as a response. 4. The method is available in the R package RepeatABEL. It increases the power in GWAS having repeated measures, especially for long-term studies of natural populations, and the R implementation is expected to facilitate modelling of longitudinal data for studies of both animal and human populations.
Abstract:
Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model, or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equi-distant genotype scores, Kruskal-Wallis test merely tests global differences in trait values associated with the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations when only a few trait values are available in a rare genotype category (disbalance), or when the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of either dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range preserving confidence intervals as an alternative to Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control family-wise error rate in the presence of non-normality and variance heterogeneity contrary to the standard parametric approaches. 
We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.
Abstract:
Phagocytosis of bacteria by specialized blood cells, known as hemocytes, is a vital component of Drosophila cellular immunity. To identify novel genes that mediate the cellular response to bacteria, we conducted three separate genetic screens using the Drosophila Genetic Reference Panel (DGRP). Adult DGRP lines were tested for the ability of their hemocytes to phagocytose the Gram-positive bacteria Staphylococcus aureus or the Gram-negative bacteria Escherichia coli. The DGRP lines were also screened for the ability of their hemocytes to clear S. aureus infection through the process of phagosome maturation. Genome-wide association analyses were performed to identify potentially relevant single nucleotide polymorphisms (SNPs) associated with the cellular immune phenotypes. The S. aureus phagosome maturation screen identified SNPs near or in 528 candidate genes, many of which have no known role in immunity. Three genes, dpr10, fred, and CG42673, were identified whose loss-of-function in blood cells significantly impaired the innate immune response to S. aureus. The DGRP S. aureus screens identified variants in the gene, Ataxin 2 Binding Protein-1 (A2bp1) as important for the cellular immune response to S. aureus. A2bp1 belongs to the highly conserved Fox-1 family of RNA-binding proteins. Genetic studies revealed that A2bp1 transcript levels must be tightly controlled for hemocytes to successfully phagocytose S. aureus. The transcriptome of infected and uninfected hemocytes from wild type and A2bp1 mutant flies was analyzed and it was found that A2bp1 negatively regulates the expression of the Immunoglobulin-superfamily member Down syndrome adhesion molecule 4 (Dscam4). Silencing of A2bp1 and Dscam4 in hemocytes rescues the fly’s immune response to S. aureus indicating that Dscam4 negatively regulates S. aureus phagocytosis. 
Overall, we present an examination of the cellular immune response to bacteria with the aim of identifying and characterizing roles for novel mediators of innate immunity in Drosophila. By screening a panel of lines in which all genetic variants are known, we successfully identified a large set of candidate genes that could provide a basis for future studies of Drosophila cellular immunity. Finally, we describe a novel, immune-specific role for the highly conserved Fox-1 family member, A2bp1.
Abstract:
The high quality of protected designation of origin (PDO) dry-cured pork products depends largely on the chemical and physical parameters of the fresh meat and their variation during the production process of the final product. The discovery of the mechanisms that regulate the variability of these parameters has been aided by the swine reference genome in conjunction with genetic analysis methods. This thesis can contribute to the discovery of genetic mechanisms that regulate the variability of some quality parameters of fresh meat for PDO dry-cured pork production. The first study is of gene expression and showed that, between low and high glycolytic potential (GP) samples of Semimembranosus muscle of Italian Large White (ILW) pigs in early postmortem, all but one of the differentially expressed genes were overexpressed in low GP samples. These genes were involved in ATP biosynthesis processes, calcium homeostasis, and lipid metabolism, and included the potential master regulator gene Peroxisome Proliferator-Activated Receptor Alpha (PPARA). The second is a study in commercial hybrid pigs to evaluate correlations between carcass and fresh ham traits, including carcass and fresh ham lean meat percentages, the former being a potential predictor of the latter. In addition, a genome-wide association study allowed the identification of chromosome-wide associations with phenotypic traits for 19 SNPs, and genome-wide associations for 14 SNPs for ferrochelatase activity. The latter could be a determinant of color variation in nitrite-free dry-cured ham. The third study showed gene expression differences in the Longissimus thoracis muscle of ILW pigs fed diets with extruded linseed (a source of polyunsaturated fatty acids) and vitamin E and selenium (diet three) or natural (diet four) antioxidants. 
Diet three promoted a more rapid and massive immune system response, possibly determined by improvement in muscle tissue function, while diet four promoted oxidative stability and increased the anti-inflammatory potential of muscle tissue.
Abstract:
The domestication and selection processes in pigs and rabbits have resulted in the constitution of multiple breeds with broad phenotypic diversity. Population genomics analysis and genome-wide association study analysis can be utilized to gain insights into the ancestral origins, genetic diversity, and the presence of lethal mutations across these diverse breeds. In this thesis, we analysed the dataset obtained from three Italian pig breeds to detect deleterious alleles. We screened the dataset for genetic markers showing homozygous deficiency using two approaches: a single-marker approach and a haplotype-based approach. Moreover, genome-wide association study analyses were performed to detect genetic markers associated with pigs' reproductive traits. In rabbits, we investigated the application of an SNP bead chip to detect signatures of selection using different methods. This analysis was implemented for the first time in different fancy and meat rabbit breeds. Multiple approaches were utilized for the detection of signatures of selection, including Fst analysis, ROH analysis, PCAdapt analysis, and haplotype-based analysis. The analysis in pigs identified five putative deleterious SNPs and nine putative deleterious haplotypes in the analysed Italian pig breeds. The genomic regions of the detected putative deleterious markers harbor loss-of-function variants such as frameshift, start-lost, and splice-donor variants. Those variants are close to important candidate genes such as IGF2BP1, ADGRL4, and HGF. In rabbits, multiple genomic regions were found to carry signatures of selection. These genomic regions harbor candidate genes associated with coat color phenotype (MC1R, TYR, and ASIP), hair structure (LIPH), and body size (HMGA2 and COL2A1). 
The described results in rabbits and pigs could be used to improve breeding programs by excluding carriers of the deleterious genetic markers and by incorporating candidate genes for coat color, body size, and meat production in rabbit breeding programs to enhance desired traits.
Abstract:
Association studies between ADIPOR1 genetic variants and predisposition to type 2 diabetes (DM2) have provided contradictory results. We determined whether two single nucleotide polymorphisms (SNP c.-8503G>A and SNP c.10225C>G) in regulatory regions of ADIPOR1 were associated with DM2 in 567 Brazilian individuals of European (EA; N = 443) or African (AfA; N = 124) ancestry from rural (quilombo remnants; N = 439) and urban (N = 567) areas. We detected a significant effect of ethnicity on the distribution of the allelic frequencies of both SNPs in these populations (EA: -8503A = 0.27; AfA: -8503A = 0.16; P = 0.001 and EA: 10225G = 0.35; AfA: 10225G = 0.51; P < 0.001). Neither polymorphism was associated with DM2 in the case-control study in the EA (SNP c.-8503G>A: DM2 group -8503A = 0.26; control group -8503A = 0.30; P = 0.14/SNP 10225C>G: DM2 group 10225G = 0.37; control group 10225G = 0.32; P = 0.40) or AfA populations (SNP c.-8503G>A: DM2 group -8503A = 0.16; control group -8503A = 0.15; P = 0.34/SNP 10225C>G: DM2 group 10225G = 0.51; control group 10225G = 0.52; P = 0.50). Similarly, neither polymorphism was associated with metabolic/anthropometric risk factors for DM2 in any of the three populations, except for HDL cholesterol, which was significantly higher in AfA heterozygotes (GC = 53.75 ± 17.26 mg/dL) than in homozygotes. We conclude that ADIPOR1 polymorphisms are unlikely to be major risk factors for DM2 or for metabolic/anthropometric measurements that represent risk factors for DM2 in populations of European and African ancestries.
Abstract:
Background: Genome-wide association studies (GWAS) are becoming the approach of choice to identify genetic determinants of complex phenotypes and common diseases. The astonishing amount of generated data and the use of distinct genotyping platforms with variable genomic coverage are still analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotypic structure of the population of interest to infer a poorly genotyped or missing marker, and are considered a near-zero-cost approach to allow the comparison and combination of data generated in different studies. Several reports stated that imputed markers have an overall acceptable accuracy, but no published report has performed a pairwise comparison of imputed and empirical association statistics of a complete set of GWAS markers. Results: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10⁻⁵ for type 2 Diabetes Mellitus and compared them with results obtained based on empirical allelic frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers, and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers. Conclusions: Our results suggest that association statistics from imputed markers showing a specific MAF (minor allele frequency) range, located in weak linkage disequilibrium blocks, or strongly deviating from local patterns of association are prone to have inflated false positive association signals. 
The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
Abstract:
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation involved in susceptibility to etiologically complex disease. We conducted a GWAS to identify common genetic variation involved in susceptibility to upper aero-digestive tract (UADT) cancers. Genome-wide genotyping was carried out using the Illumina HumanHap300 beadchips in 2,091 UADT cancer cases and 3,513 controls from two large European multi-centre UADT cancer studies, as well as 4,821 generic controls. The 19 top-ranked variants were investigated further in an additional 6,514 UADT cancer cases and 7,892 controls of European descent from an additional 13 UADT cancer studies participating in the INHANCE consortium. Five common variants presented evidence for significant association in the combined analysis (p ≤ 5 × 10⁻⁷). Two novel variants were identified, a 4q21 variant (rs1494961, p = 1 × 10⁻⁸) located near DNA repair related genes HEL308 and FAM175A (or Abraxas) and a 12q24 variant (rs4767364, p = 2 × 10⁻⁸) located in an extended linkage disequilibrium region that contains multiple genes including the aldehyde dehydrogenase 2 (ALDH2) gene. Three remaining variants are located in the ADH gene cluster and were identified previously in a candidate gene study involving some of these samples. The association between these three variants and UADT cancers was independently replicated in 5,092 UADT cancer cases and 6,794 controls from the non-overlapping samples presented here (rs1573496-ADH7, p = 5 × 10⁻⁸; rs1229984-ADH1B, p = 7 × 10⁻⁹; and rs698-ADH1C, p = 0.02). These results implicate two variants at 4q21 and 12q24 and further highlight three ADH variants in UADT cancer susceptibility.