886 results for Association Studies
Abstract:
A linkage between obesity-related phenotypes and the 2p21-23 locus has been reported previously. The urocortin (UCN) gene resides at this interval, and its protein decreases appetite behavior, suggesting that UCN may be a candidate gene for susceptibility to obesity. We localized the UCN gene by radiation hybrid mapping, and the surrounding markers were genotyped in a collection of French families. Evidence for linkage was shown between the marker D2S165 and leptin levels (LOD score, 1.34; P = 0.006) and between D2S2247 and the z-score of body mass index (LOD score, 1.829; P = 0.0019). The gene was screened for SNPs in 96 obese patients. Four new variants were established. Two single nucleotide polymorphisms were located in the promoter (-535 A-->G, -286 G-->A), one in intron 1 (+31 C-->G), and one in the 3'-untranslated region (+34 C-->T). Association studies in cohorts of 722 unrelated obese and 381 control subjects and transmission disequilibrium tests, performed for the two frequent promoter polymorphisms, in 120 families (894 individuals) showed that no association was present between these variants and obesity, obesity-related phenotypes, and diabetes. Thus, our analyses of the genetic variations of the UCN gene suggest that, at least in French Caucasians, they do not represent a major cause of obesity.
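The LOD scores and P values quoted above can be related by a standard conversion. The sketch below assumes that 2·ln(10)·LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom, which is a common convention for one-parameter linkage tests. The abstract does not say which conversion the authors used, so this is an illustration and not their method; the function name `lod_to_p` is hypothetical.

```python
import math


def lod_to_p(lod: float) -> float:
    """Approximate one-sided P value for a LOD score.

    Assumption (not stated in the abstract): 2*ln(10)*LOD is distributed
    as a 50:50 mixture of 0 and chi-square with 1 degree of freedom.
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    tail = math.erfc(math.sqrt(chi2 / 2.0))
    # Halve the tail because of the 50:50 mixture (one-sided test)
    return 0.5 * tail


# The two LOD scores reported in the abstract
for lod in (1.34, 1.829):
    print(f"LOD {lod}: P ~ {lod_to_p(lod):.4f}")
```

Under this assumption the two LOD scores give values close to the reported P = 0.006 and P = 0.0019.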
Abstract:
Genome-wide association studies (GWAS) are conducted with the promise to discover novel genetic variants associated with diverse traits. For most traits, associated markers individually explain just a modest fraction of the phenotypic variation, but their number can well be in the hundreds. We developed a maximum likelihood method that allows us to infer the distribution of associated variants even when many of them were missed by chance. Compared to previous approaches, the novelty of our method is that it (a) does not require having an independent (unbiased) estimate of the effect sizes; (b) makes use of the complete distribution of P-values while allowing for the false discovery rate; (c) takes into account allelic heterogeneity and the SNP pruning strategy. We applied our method to the latest GWAS meta-analysis results of the GIANT consortium. It revealed that while the explained variance of genome-wide (GW) significant SNPs is around 1% for waist-hip ratio (WHR), the observed P-values provide evidence for the existence of variants explaining 10% (CI=[8.5-11.5%]) of the phenotypic variance in total. Similarly, the total explained variance likely to exist for height is estimated to be 29% (CI=[28-30%]), three times higher than what the observed GW significant SNPs give rise to. This methodology also enables us to predict the benefit of future GWA studies that aim to reveal more associated genetic markers via increased sample size.
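To make "explained variance" concrete, the sketch below uses the textbook approximation for one biallelic SNP under an additive model: with a standardized phenotype and Hardy-Weinberg equilibrium, the fraction of variance explained is 2·p·(1−p)·β². Summing over SNPs assumes they are independent (no linkage disequilibrium). This is not the maximum likelihood method described in the abstract, and the function names and input values are hypothetical.

```python
from typing import Iterable, Tuple


def snp_explained_variance(maf: float, beta: float) -> float:
    """Variance explained by one SNP.

    Assumptions: additive model, phenotype standardized to unit variance,
    Hardy-Weinberg equilibrium. maf is the allele frequency p and beta is
    the per-allele effect in phenotype standard deviations.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def total_explained_variance(snps: Iterable[Tuple[float, float]]) -> float:
    """Sum the per-SNP contributions, assuming independent SNPs."""
    return sum(snp_explained_variance(p, b) for p, b in snps)


# Hypothetical (allele frequency, effect size) pairs, for illustration only
example_snps = [(0.30, 0.05), (0.10, 0.08), (0.45, 0.03)]
print(f"Total explained variance: {total_explained_variance(example_snps):.5f}")
```

Because each contribution scales with β², many small-effect variants are needed to reach the totals discussed above, which is why variants missed by chance matter for the overall estimate.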
Abstract:
Genetic variants influence the risk of developing certain diseases or give rise to differences in drug response. Recent progress in cost-effective, high-throughput genome-wide techniques, such as microarrays measuring Single Nucleotide Polymorphisms (SNPs), has facilitated genotyping of large clinical and population cohorts. Combining the massive genotypic data with measurements of phenotypic traits allows for the determination of genetic differences that explain, at least in part, the phenotypic variations within a population. So far, models combining the most significant variants can only explain a small fraction of the variance, indicating the limitations of current models. In particular, researchers have only begun to address the possibility of interactions between genotypes and the environment. Elucidating the contributions of such interactions is a difficult task because of the large number of genetic as well as possible environmental factors. In this thesis, I worked on several projects within this context. My first and main project was the identification of possible SNP-environment interactions, where the phenotypes were serum lipid levels of patients from the Swiss HIV Cohort Study (SHCS) treated with antiretroviral therapy. Here the genotypes consisted of a limited set of SNPs in candidate genes relevant for lipid transport and metabolism. The environmental variables were the specific combinations of drugs given to each patient over the treatment period. My work explored bioinformatic and statistical approaches to relate patients' lipid responses to these SNPs, drugs and, importantly, their interactions. The goal of this project was to improve our understanding and to explore the possibility of predicting dyslipidemia, a well-known adverse drug reaction of antiretroviral therapy. 
Specifically, I quantified how much of the variance in lipid profiles could be explained by the host genetic variants, the administered drugs and SNP-drug interactions and assessed the predictive power of these features on lipid responses. Using cross-validation stratified by patients, we could not validate our hypothesis that models that select a subset of SNP-drug interactions in a principled way have better predictive power than the control models using "random" subsets. Nevertheless, all tested models containing SNP and/or drug terms exhibited significant predictive power (as compared to a random predictor) and explained a sizable proportion of variance in the patient-stratified cross-validation context. Importantly, the model containing stepwise selected SNP terms showed higher capacity to predict triglyceride levels than a model containing randomly selected SNPs. Dyslipidemia is a complex trait for which many factors remain to be discovered and are thus missing from the data, which possibly explains the limitations of our analysis. In particular, the interactions of drugs with SNPs selected from the set of candidate genes likely have small effect sizes which we were unable to detect in a sample of the present size (<800 patients). In the second part of my thesis, I performed genome-wide association studies within the Cohorte Lausannoise (CoLaus). I have been involved in several international projects to identify SNPs that are associated with various traits, such as serum calcium, body mass index, two-hour glucose levels, as well as metabolic syndrome and its components. These phenotypes are all related to major human health issues, such as cardiovascular disease. I applied statistical methods to detect new variants associated with these phenotypes, contributing to the identification of new genetic loci that may lead to new insights into the genetic basis of these traits. 
This kind of research will lead to a better understanding of the mechanisms underlying these pathologies, a better evaluation of disease risk, the identification of new therapeutic leads and may ultimately lead to the realization of "personalized" medicine.
Abstract:
Introduction. Genetic epidemiology is focused on the study of the genetic causes that determine health and diseases in populations. To achieve this goal a common strategy is to explore differences in genetic variability between diseased and nondiseased individuals. Usual markers of genetic variability are single nucleotide polymorphisms (SNPs), which are changes in just one base in the genome. The usual statistical approach in genetic epidemiology studies is a marginal analysis, where each SNP is analyzed separately for association with the phenotype. Motivation. It has been observed that for common diseases the single-SNP analysis is not very powerful for detecting causal genetic variants. In this work, we consider Gene Set Analysis (GSA) as an alternative to standard marginal association approaches. GSA aims to assess the overall association of a set of genetic variants with a phenotype and has the potential to detect subtle effects of variants in a gene or a pathway that might be missed when assessed individually. Objective. We present a new optimized implementation of a pair of gene set analysis methodologies for analyzing the individual evidence of SNPs in biological pathways. We perform a simulation study to explore the power of the proposed methodologies in a set of scenarios with different numbers of causal SNPs under different effect sizes. In addition, we compare the results with the usual single-SNP analysis method. Moreover, we show the advantage of using the proposed gene set approaches in the context of an Alzheimer disease case-control study where we explore the Reelin signaling pathway.
Abstract:
Background: Differences in the distribution of genotypes between individuals of the same ethnicity are an important confounding factor commonly undervalued in typical association studies conducted in radiogenomics. Objective: To evaluate the genotypic distribution of SNPs in a wide set of Spanish prostate cancer patients to determine the homogeneity of the population and to disclose potential bias. Design, Setting, and Participants: A total of 601 prostate cancer patients from Andalusia, the Basque Country, the Canary Islands and Catalonia were genotyped for 10 SNPs located in 6 different genes associated with DNA repair: XRCC1 (rs25487, rs25489, rs1799782), ERCC2 (rs13181), ERCC1 (rs11615), LIG4 (rs1805388, rs1805386), ATM (rs17503908, rs1800057) and P53 (rs1042522). The SNP genotyping was performed in a Biotrove OpenArray NT Cycler. Outcome Measurements and Statistical Analysis: Comparisons of genotypic and allelic frequencies among populations, as well as haplotype analyses, were performed using the web-based environment SNPator. Principal component analysis was performed using the SnpMatrix and XSnpMatrix classes and methods implemented as an R package. Unsupervised hierarchical clustering of SNPs was performed using MultiExperiment Viewer. Results and Limitations: We observed that the genotype distribution of 4 out of 10 SNPs was statistically different among the studied populations, with the greatest differences between Andalusia and Catalonia. These observations were confirmed in cluster analysis, principal component analysis and in the differential distribution of haplotypes among the populations. Because tumor characteristics have not been taken into account, it is possible that some polymorphisms may influence tumor characteristics in the same way that they may pose a risk factor for other disease characteristics. 
Conclusion: Differences in the distribution of genotypes within different populations of the same ethnicity could be an important confounding factor responsible for the lack of validation of SNPs associated with radiation-induced toxicity, especially when extensive meta-analyses with subjects from different countries are carried out.
Abstract:
The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes.
Abstract:
Elevated concentrations of albumin in the urine, albuminuria, are a hallmark of diabetic kidney disease and are associated with an increased risk for end-stage renal disease and cardiovascular events. To gain insight into the pathophysiological mechanisms underlying albuminuria, we conducted meta-analyses of genome-wide association studies and independent replication in up to 5,825 individuals of European ancestry with diabetes and up to 46,061 without diabetes, followed by functional studies. Known associations of variants in CUBN, encoding cubilin, with the urinary albumin-to-creatinine ratio (UACR) were confirmed in the overall sample (P = 2.4 × 10^-10). Gene-by-diabetes interactions were detected and confirmed for variants in HS6ST1 and near RAB38/CTSC. Single nucleotide polymorphisms at these loci demonstrated a genetic effect on UACR in individuals with but not without diabetes. The change in the average UACR per minor allele was 21% for HS6ST1 (P = 6.3 × 10^-7) and 13% for RAB38/CTSC (P = 5.8 × 10^-7). Experiments using streptozotocin-induced diabetic Rab38 knockout and control rats showed higher urinary albumin concentrations and reduced amounts of megalin and cubilin at the proximal tubule cell surface in Rab38 knockout versus control rats. Relative expression of RAB38 was higher in tubuli of patients with diabetic kidney disease compared with control subjects. The loci identified here confirm known pathways and highlight novel pathways influencing albuminuria.
Abstract:
Previous genetic association studies have overlooked the potential for biased results when analyzing different population structures in ethnically diverse populations. The purpose of the present study was to quantify this bias in two-locus association studies conducted on an admixed urban population. We studied the genetic structure distribution of angiotensin-converting enzyme insertion/deletion (ACE I/D) and angiotensinogen methionine/threonine (M/T) polymorphisms in 382 subjects from three subgroups in a highly admixed urban population. Group I included 150 white subjects; group II, 142 mulatto subjects; and group III, 90 black subjects. We conducted sample size simulation studies using these data in different genetic models of gene action and interaction and used genetic distance calculation algorithms to help determine the population structure for the studied loci. Our results showed a statistically different population structure distribution of both ACE I/D (P = 0.02, OR = 1.56, 95% CI = 1.05-2.33 for the D allele, white versus black subgroup) and angiotensinogen M/T polymorphism (P = 0.007, OR = 1.71, 95% CI = 1.14-2.58 for the T allele, white versus black subgroup). Different sample sizes are predicted to be determinant of the power to detect a given genotypic association with a particular phenotype when conducting two-locus association studies in admixed populations. In addition, the postulated genetic model is also a major determinant of the power to detect any association in a given sample size. The present simulation study helped to demonstrate the complex interrelation among ethnicity, power of the association, and the postulated genetic model of action of a particular allele in the context of clustering studies. This information is essential for the correct planning and interpretation of future association studies conducted on this population.
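Odds ratios with 95% confidence intervals, such as those quoted above for the D and T alleles, are typically computed from a 2×2 table of allele counts. The sketch below uses the standard Woolf (log-based) interval. The abstract does not report the underlying counts or name the interval method, so the counts here are hypothetical and the function name `odds_ratio_ci` is an assumption.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    Table layout (allele counts):
        group 1: a carriers of the allele, b non-carriers
        group 2: c carriers of the allele, d non-carriers
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, NOT taken from the study
or_value, lo, hi = odds_ratio_ci(a=120, b=80, c=90, d=110)
print(f"OR = {or_value:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as both reported intervals do, corresponds to a difference in allele frequency between the subgroups at roughly the 5% significance level.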
Abstract:
High levels of von Willebrand factor (vWF) have been associated with cardiovascular disease. The A allele of the -1185A/G polymorphism in the 5'-regulatory region of the vWF gene was associated with the highest plasma vWF levels in a normal population. To examine the association between -1185A/G polymorphism and coronary artery disease (CAD), 173 Brazilian Caucasian subjects submitted to coronary angiography were studied. Of these, 57 (33%) had normal coronary arteries (control group) and 116 (67%) had CAD (patient group). Plasma vWF levels were higher in patients (145 U/dl) than in controls (130 U/dl), but the differences were significant only for O blood group subjects. Polymerase chain reaction amplification of the 864-bp vWF promoter region followed by AccII restriction digestion was used to identify the -1185A/G genotypes. The -1185A allele frequency was 43.1% in patients and 44.7% in controls. Allele and genotype frequencies were not significantly different between patients and controls. No association was observed between the -1185A/G genotypes and plasma vWF levels in patients or controls. These results suggest that -1185A/G polymorphism is not an independent risk factor for CAD.
Abstract:
Individual circadian clocks entrain differently to environmental cycles (zeitgebers, e.g., light and darkness), earlier or later within the day, leading to different chronotypes. In human populations, the distribution of chronotypes forms a bell-shaped curve, with the extreme early and late types (larks and owls, respectively) at its ends. Human chronotype, which can be assessed by the timing of an individual's sleep-wake cycle, is partly influenced by genetic factors, as known from animal experimentation. Here, we review population genetic studies which have used a questionnaire probing individual daily timing preference for associations with polymorphisms in clock genes. We discuss their inherent limitations and suggest an alternative approach combining a short questionnaire (Munich ChronoType Questionnaire, MCTQ), which assesses chronotype in a quantitative manner, with a genome-wide analysis (GWA). The advantages of these methods in comparison to assessing time-of-day preferences and single nucleotide polymorphism genotyping are discussed. In the future, global studies of chronotype using the MCTQ and GWA may also contribute to understanding the influence of seasons, latitude (e.g., different photoperiods), and climate on allele frequencies and chronotype distribution in different populations.
Abstract:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing studies (NGS) will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to have the ability to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated to not only be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers, helping to ensure that the algorithms will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. 
It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
Abstract:
Pharmacovigilance, the monitoring of adverse events (AEs), is an integral part of the clinical evaluation of a new drug. Until recently, attempts to relate the incidence of AEs to putative causes have been restricted to the evaluation of simple demographic and environmental factors. The advent of large-scale genotyping, however, provides an opportunity to look for associations between AEs and genetic markers, such as single nucleotide polymorphisms (SNPs). It is envisaged that a very large number of SNPs, possibly over 500 000, will be used in pharmacovigilance in an attempt to identify any genetic difference between patients who have experienced an AE and those who have not. We propose a sequential genome-wide association test for analysing AEs as they arise, allowing evidence-based decision-making at the earliest opportunity. This gives us the capability of quickly establishing whether there is a group of patients at high risk of an AE based upon their DNA. Our method provides a valid test which takes account of linkage disequilibrium and allows for the sequential nature of the procedure. The method is more powerful than using a correction, such as Šidák, that assumes that the tests are independent. Copyright © 2006 John Wiley & Sons, Ltd.
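For reference, the Šidák correction mentioned above sets the per-test significance level to 1 − (1 − α)^(1/m) for m independent tests, which is slightly less strict than the Bonferroni level α/m. The sketch below computes both for the 500 000 SNPs envisaged in the abstract. It illustrates only these two standard corrections, not the sequential test the authors propose, and the function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


def sidak_threshold(alpha: float, m: int) -> float:
    """Per-test significance level under the Sidak correction.

    Equals 1 - (1 - alpha) ** (1 / m), which assumes independent tests.
    Computed with log1p/expm1 to avoid rounding error when m is large.
    """
    return -math.expm1(math.log1p(-alpha) / m)


alpha = 0.05      # family-wise error rate (a conventional choice)
m = 500_000       # number of SNPs envisaged in the abstract

print(f"Bonferroni: {bonferroni_threshold(alpha, m):.4e}")
print(f"Sidak:      {sidak_threshold(alpha, m):.4e}")
```

Both thresholds treat the tests as independent, so they ignore the correlation between nearby SNPs that linkage disequilibrium introduces. A procedure that models this correlation can keep the same family-wise error rate with a less stringent per-test level.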
Abstract:
We introduce a procedure for association based analysis of nuclear families that allows for dichotomous and more general measurements of phenotype and inclusion of covariate information. Standard generalized linear models are used to relate phenotype and its predictors. Our test procedure, based on the likelihood ratio, unifies the estimation of all parameters through the likelihood itself and yields maximum likelihood estimates of the genetic relative risk and interaction parameters. Our method has advantages in modelling the covariate and gene-covariate interaction terms over recently proposed conditional score tests that include covariate information via a two-stage modelling approach. We apply our method in a study of human systemic lupus erythematosus and the C-reactive protein that includes sex as a covariate.
Abstract:
Genome-wide association studies (GWAS) have been widely used in genetic dissection of complex traits. However, common methods are all based on a fixed-SNP-effect mixed linear model (MLM) and single marker analysis, such as efficient mixed model analysis (EMMA). These methods require Bonferroni correction for multiple tests, which is often too conservative when the number of markers is extremely large. To address this concern, we proposed a random-SNP-effect MLM (RMLM) and a multi-locus RMLM (MRMLM) for GWAS. The RMLM simply treats the SNP-effect as random, but it allows a modified Bonferroni correction to be used to calculate the threshold p value for significance tests. The MRMLM is a multi-locus model including markers selected from the RMLM method with a less stringent selection criterion. Due to the multi-locus nature, no multiple test correction is needed. Simulation studies show that the MRMLM is more powerful in QTN detection and more accurate in QTN effect estimation than the RMLM, which in turn is more powerful and accurate than the EMMA. To demonstrate the new methods, we analyzed six flowering time related traits in Arabidopsis thaliana and detected more genes than previously reported using the EMMA. Therefore, the MRMLM provides an alternative for multi-locus GWAS.