977 results for MULTILOCUS GENOTYPE DATA
Abstract:
Molluscs are a diverse animal phylum with a formidable fossil record. Although there is little doubt about the monophyly of the eight extant classes, relationships between these groups are controversial. We analysed a comprehensive multilocus molecular data set for molluscs, the first to include multiple species from all classes, including five monoplacophorans in both extant families. Our analyses of five markers resolve two major clades: the first includes gastropods and bivalves sister to Serialia (monoplacophorans and chitons), and the second comprises scaphopods sister to aplacophorans and cephalopods. Traditional groupings such as Testaria, Aculifera, and Conchifera are rejected by our data with significant Approximately Unbiased (AU) test values. A new molecular clock indicates that molluscs had a terminal Precambrian origin with rapid divergence of all eight extant classes in the Cambrian. The recovery of Serialia as a derived, Late Cambrian clade is potentially in line with the stratigraphic chronology of morphologically heterogeneous early mollusc fossils. Serialia is in conflict with traditional molluscan classifications and recent phylogenomic data. Yet our hypothesis, like others based on molecular data, implies frequent molluscan shell and body transformations by heterochronic shifts in development and multiple convergent adaptations, leading to the variable shells and body plans in extant lineages.
Abstract:
Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (h²g) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that, in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h²g from imputed SNPs (5.1× enrichment; p = 3.7 × 10⁻¹⁷) and 38% (SE = 4%) of h²g from genotyped SNPs (1.6× enrichment; p = 1.0 × 10⁻⁴). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of h²g despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.
Abstract:
Background: The nature and underlying mechanisms of an inverse association between adult height and the risk of coronary artery disease (CAD) are unclear.
Methods: We used a genetic approach to investigate the association between height and CAD, using 180 height-associated genetic variants. We tested the association between a change in genetically determined height of 1 SD (6.5 cm) with the risk of CAD in 65,066 cases and 128,383 controls. Using individual-level genotype data from 18,249 persons, we also examined the risk of CAD associated with the presence of various numbers of height-associated alleles. To identify putative mechanisms, we analyzed whether genetically determined height was associated with known cardiovascular risk factors and performed a pathway analysis of the height-associated genes.
Results: We observed a relative increase of 13.5% (95% confidence interval [CI], 5.4 to 22.1; P<0.001) in the risk of CAD per 1-SD decrease in genetically determined height. There was a graded relationship between the presence of an increased number of height-raising variants and a reduced risk of CAD (odds ratio for height quartile 4 versus quartile 1, 0.74; 95% CI, 0.68 to 0.84; P<0.001). Of the 12 risk factors that we studied, we observed significant associations only with levels of low-density lipoprotein cholesterol and triglycerides (accounting for approximately 30% of the association). We identified several overlapping pathways involving genes associated with both development and atherosclerosis.
Conclusions: There is a primary association between a genetically determined shorter height and an increased risk of CAD, a link that is partly explained by the association between shorter height and an adverse lipid profile. Shared biologic processes that determine achieved height and the development of atherosclerosis may explain some of the association. (Funded by the British Heart Foundation and others.)
Abstract:
The predominantly selfing slug species Arion (Carinarion) fasciatus, A. (C.) silvaticus and A. (C.) circumscriptus are native in Europe and have been introduced into North America, where each species consists of a single, homozygous multilocus genotype (strain), as defined by starch gel electrophoresis (SGE) of allozymes. In Europe, the “one strain per species” hypothesis does not hold since polyacrylamide gel electrophoresis (PAGE) of allozymes uncovered 46 strains divided over the three species. However, electrophoretic techniques may differ in their ability to detect allozyme variation. Therefore, several Carinarion populations from both continents were screened by applying the two techniques simultaneously on the same individual slugs and enzyme loci. SGE and PAGE yielded exactly the same results, so that the different degree of variation in North American and European populations cannot be attributed to differences in resolving power between SGE and PAGE. We found four A. (C.) silvaticus strains in North America indicating that in this region the “one strain per species” hypothesis also cannot be maintained. Hence, the discrepancies between previous electrophoretic studies on Carinarion are most likely due to sampling artefacts and possible founder effects.
Abstract:
Developments in high-throughput genotyping provide an opportunity to explore the application of marker technology in distinctness, uniformity and stability (DUS) testing of new varieties. We have used a large set of molecular markers to assess the feasibility of a UPOV Model 2 approach: “Calibration of threshold levels for molecular characteristics against the minimum distance in traditional characteristics”. We have examined 431 winter and spring barley varieties, with data from UK DUS trials comprising 28 characteristics, together with genotype data from 3072 SNP markers. Intervarietal distances were calculated, and we found higher correlations between molecular and morphological distances than have been previously reported. When varieties were grouped by kinship, phenotypic and genotypic distances of these groups correlated well. We estimated the minimum marker numbers required and showed that there was a ceiling beyond which the correlations do not improve. To investigate the possibility of breaking through this ceiling, we attempted genomic prediction of phenotypes from genotypes, and higher correlations were achieved. We tested distinctness decisions made using either morphological or genotypic distances and found poor correspondence between the two methods.
Abstract:
To investigate morphological and genomic differences between cutting and racing lines of Quarter Horses, 120 racing and 68 cutting animals of both sexes, registered at the Brazilian Association of Quarter Horse Breeders, were used. Blood samples were collected, and the following physical traits were measured: weight; height at withers; body length; length of the shank, pastern, rump, head, and neck; and chest, shank, and hoof circumference. For analysis of genomic differences, 54,602 single-nucleotide polymorphisms (SNPs) were genotyped using the Equine SNP50 BeadChip, and the quality of the individual and SNP genotype data was evaluated. The fixation index, FST, was used to identify genome regions that were altered in the lines by selection. The results showed significant differences between the lines in all physical traits. Quality control led to the exclusion of four cutting animals with a call rate of <0.95. After filtering, 12,544, 13,815, and 13,370 SNPs were excluded for the whole population (n = 184), the 120 racing animals, and the 64 cutting animals, respectively. The number of informative polymorphisms detected in each line and in the whole population indicated that the Equine SNP50 BeadChip can be used in genetic studies of Quarter Horses. The fixation index, FST, identified 2,558 genome regions that may have been modified by divergent selection. © 2013 Elsevier Inc.
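The abstract above does not say which FST estimator was used. As a minimal sketch, assuming Wright's heterozygosity-based definition FST = (HT − HS) / HT for a single biallelic SNP and two lines, the per-SNP calculation could look like the following. The function name, the sample-size weighting, and the example allele frequencies are illustrative assumptions and are not taken from the study.

```python
def fst_two_lines(p1: float, p2: float, n1: int = 1, n2: int = 1) -> float:
    """Wright's F_ST for one biallelic SNP (illustrative sketch).

    p1, p2: frequency of the same allele in line 1 and line 2.
    n1, n2: sample sizes used to weight the two lines (assumption).
    """
    w1 = n1 / (n1 + n2)
    w2 = 1.0 - w1
    # Pooled allele frequency across both lines.
    p_bar = w1 * p1 + w2 * p2
    # Expected heterozygosity in the pooled population (H_T).
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Weighted mean expected heterozygosity within lines (H_S).
    h_s = w1 * 2.0 * p1 * (1.0 - p1) + w2 * 2.0 * p2 * (1.0 - p2)
    if h_t == 0.0:
        # Monomorphic SNP: no variation to partition; 0.0 is a convention here.
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, weighted by the post-QC sample sizes
# mentioned in the abstract (120 racing, 64 cutting animals).
example = fst_two_lines(0.2, 0.8, n1=120, n2=64)
```

Values near 0 indicate similar allele frequencies in the two lines, and values near 1 indicate near-fixation of different alleles.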
Abstract:
An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem.
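To illustrate how HMM algorithms can fill in missing genotypes, here is a minimal sketch of the forward-backward algorithm. It assumes a backcross (two genotype states, coded 0 and 1), known recombination fractions between adjacent markers, and a constant genotyping error rate. These assumptions, and all names in the code, are illustrative and are not specified in the abstract.

```python
def genotype_posteriors(obs, rec_fracs, error=0.01):
    """Posterior genotype probabilities at each marker (backcross sketch).

    obs:       observed genotypes per marker: 0, 1, or None if missing.
    rec_fracs: recombination fractions between adjacent markers,
               length len(obs) - 1.
    error:     assumed genotyping error rate.
    Returns a list of [P(g=0 | data), P(g=1 | data)] per marker.
    """
    n = len(obs)
    if n == 0 or len(rec_fracs) != n - 1:
        raise ValueError("need at least one marker and n - 1 recombination fractions")
    states = (0, 1)

    def emit(o, g):
        # A missing observation carries no information about the genotype.
        if o is None:
            return 1.0
        return 1.0 - error if o == g else error

    def trans(r, a, b):
        # The genotype changes between markers only if a recombination occurs.
        return 1.0 - r if a == b else r

    # Forward pass: joint probability of data up to marker i and state g.
    fwd = [[0.5 * emit(obs[0], g) for g in states]]
    for i in range(1, n):
        r = rec_fracs[i - 1]
        fwd.append([
            emit(obs[i], g) * sum(fwd[i - 1][h] * trans(r, h, g) for h in states)
            for g in states
        ])

    # Backward pass: probability of data after marker i given state g.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        r = rec_fracs[i]
        bwd[i] = [
            sum(trans(r, g, h) * emit(obs[i + 1], h) * bwd[i + 1][h] for h in states)
            for g in states
        ]

    # Combine and normalise to obtain posteriors at each marker.
    post = []
    for i in range(n):
        unnorm = [fwd[i][g] * bwd[i][g] for g in states]
        total = sum(unnorm)
        post.append([u / total for u in unnorm])
    return post
```

The same recursion applies to positions between markers by treating them as markers with a missing observation. For long chromosomes a practical implementation would rescale or work in log space to avoid numerical underflow, which this sketch omits.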
Abstract:
Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the linkage disequilibrium (LD) structure in the genotype data, and accommodates missing genotypes via haplotype-based imputation. We also derive an efficient algorithm to simulate case-parent trios where genetic risk is determined via epistatic interactions.
Abstract:
The developmental processes and functions of an organism are controlled by the genes and the proteins that are derived from these genes. The identification of key genes and the reconstruction of gene networks can provide a model to help us understand the regulatory mechanisms for the initiation and progression of biological processes or functional abnormalities (e.g. diseases) in living organisms. In this dissertation, I have developed statistical methods to identify the genes and transcription factors (TFs) involved in biological processes, constructed their regulatory networks, and also evaluated some existing association methods to find robust methods for coexpression analyses. Two kinds of data sets were used for this work: genotype data and gene expression microarray data. On the basis of these data sets, this dissertation has two major parts, together forming six chapters. The first part deals with developing association methods for rare variants using genotype data (chapters 4 and 5). The second part deals with developing and/or evaluating statistical methods to identify genes and TFs involved in biological processes, and construction of their regulatory networks using gene expression data (chapters 2, 3, and 6). For the first part, I have developed two methods to find the groupwise association of rare variants with given diseases or traits. The first method is based on kernel machine learning and can be applied to both quantitative and qualitative traits. Simulation results showed that the proposed method has improved power over the existing weighted sum method (WS) in most settings. The second method uses multiple phenotypes to select a few top significant genes. It then finds the association of each gene with each phenotype while controlling for population stratification by adjusting the data for ancestry using principal components. This method was applied to GAW 17 data and was able to find several disease risk genes.
For the second part, I have worked on three problems. The first problem involved the evaluation of eight gene association methods. A comprehensive comparison of these methods, with further analysis, demonstrates their distinct and common performance characteristics. For the second problem, an algorithm named the bottom-up graphical Gaussian model was developed to identify the TFs that regulate pathway genes and reconstruct their hierarchical regulatory networks. This algorithm has produced very significant results, and it is the first report to produce such hierarchical networks for these pathways. The third problem dealt with developing another algorithm, called the top-down graphical Gaussian model, that identifies the network governed by a specific TF. The network produced by the algorithm is proven to be of very high accuracy.
Abstract:
The Renin-Angiotensin system (RAS) regulates blood pressure through its effects on vascular tone, renal hemodynamics, and renal sodium and fluid balance. The genes encoding the four major components of the RAS, angiotensinogen, renin, angiotensin I-converting enzyme (ACE), and angiotensin II receptor type 1 (AT1), have been investigated as candidate genes in the pathogenesis of essential hypertension. However, studies have primarily focused on small samples of diseased individuals, and, therefore, have provided little information about the determinants of interindividual variation in blood pressure (BP) in the general population. Using data from a large population-based sample from Rochester, MN, I have evaluated the contribution of variation in the region of the RAS genes to interindividual variation in systolic, diastolic, and mean arterial pressure in the population-at-large. Marker genotype data from four polymorphisms located within or very near these genes were first collected on 3,974 individuals from 583 randomly ascertained three-generation pedigrees. Haseman-Elston regression and variance component methods of linkage analysis were then carried out to estimate the proportion of interindividual variance in BP attributable to the effects of variation at these four measured loci. A significant effect of the ACE locus on interindividual variation in mean arterial pressure (MAP) was detected in a sample of siblings belonging to the youngest generation. After allowing for measured covariates, this effect accounted for 15-25% of the interindividual variance in MAP, and was even greater in a subset with a positive family history of hypertension. When gender-specific analyses were carried out, this effect was significant in males but not in females. Extended pedigree analyses also provided evidence for an effect of the ACE locus on interindividual variation in MAP, but no difference between males and females was observed.
Circumstantial evidence suggests that the ACE gene itself may be responsible for the observed effects on BP, although the possibility that other genes in the region may be at play cannot be excluded. No definitive evidence for an effect of the renin, angiotensinogen, or AT1 loci on interindividual variation in BP was obtained in this study, suggesting that the impact of these genes on BP may not be great in the Caucasian population-at-large. However, this does not preclude a larger effect of these genes in some subsets of individuals, especially among those with clinically manifest hypertension or coronary heart disease, or in other populations.
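For readers unfamiliar with the Haseman-Elston regression mentioned above, the following is a minimal sketch of its classic form: the squared trait difference of each sibling pair is regressed on the pair's estimated proportion of alleles shared identical by descent (IBD) at the marker, and a significantly negative slope is taken as evidence of linkage. The function name and the toy values are illustrative assumptions. The abstract does not describe the exact variant or covariate adjustments that were used.

```python
def haseman_elston_slope(ibd_props, squared_diffs):
    """Ordinary least-squares fit of squared sib-pair trait differences
    on IBD sharing proportions (classic Haseman-Elston form, sketch only).

    ibd_props:     estimated proportion of alleles shared IBD per sib pair.
    squared_diffs: squared trait difference per sib pair.
    Returns (slope, intercept).
    """
    n = len(ibd_props)
    if n != len(squared_diffs) or n < 2:
        raise ValueError("need two equal-length inputs with at least two pairs")
    mean_x = sum(ibd_props) / n
    mean_y = sum(squared_diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_props)
    if sxx == 0.0:
        raise ValueError("IBD proportions must vary across pairs")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ibd_props, squared_diffs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (hypothetical): pairs sharing more alleles IBD differ less in trait value.
slope, intercept = haseman_elston_slope([0.0, 0.5, 1.0], [4.0, 2.0, 0.0])
```

A formal analysis would add a one-sided test of the slope and account for correlated pairs within families, both of which are omitted here.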
Abstract:
The recent development of a goat SNP genotyping microarray enables genome-wide association studies in this important livestock species. We investigated the genetic basis of the black and brown coat colour in Valais Blacknecked and Coppernecked goats. A genome-wide association analysis using goat SNP50 BeadChip genotypes of 22 cases and 23 controls allowed us to map the locus for the brown coat colour to goat chromosome 8. The TYRP1 gene is located within the associated chromosomal region, and TYRP1 variants cause similar coat colour phenotypes in different species. We thus considered TYRP1 as a strong positional and functional candidate. We resequenced the caprine TYRP1 gene by Sanger and Illumina sequencing and identified two non-synonymous variants, p.Ile478Thr and p.Gly496Asp, that might have a functional impact on the TYRP1 protein. However, based on the obtained pedigree and genotype data, the brown coat colour in these goats is not due to a single recessive loss-of-function allele. Surprisingly, the genotype distribution and the pedigree data suggest that the (496) Asp allele might possibly act in a dominant manner. The (496) Asp allele was present in 77 of 81 investigated Coppernecked goats and did not occur in black goats. This strongly suggests heterogeneity underlying the brown coat colour in Coppernecked goats. Functional experiments or targeted matings will be required to verify the unexpected preliminary findings.
Abstract:
The MFG test is a family-based association test that detects genetic effects contributing to disease in offspring, including offspring allelic effects, maternal allelic effects and MFG incompatibility effects. Like many other family-based association tests, it assumes that the offspring survival and the offspring-parent genotypes are conditionally independent provided the offspring is affected. However, when the putative disease-increasing locus can affect another competing phenotype, for example, offspring viability, the conditional independence assumption fails and these tests could lead to incorrect conclusions regarding the role of the gene in disease. We propose the v-MFG test to adjust for the genetic effects on one phenotype, e.g., viability, when testing the effects of that locus on another phenotype, e.g., disease. Using genotype data from nuclear families containing parents and at least one affected offspring, the v-MFG test models the distribution of family genotypes conditional on offspring phenotypes. It simultaneously estimates genetic effects on two phenotypes, viability and disease. Simulations show that the v-MFG test produces accurate genetic effect estimates on disease as well as on viability under several different scenarios. It generates accurate type-I error rates and provides adequate power with moderate sample sizes to detect genetic effects on disease risk when viability is reduced. We demonstrate the v-MFG test with HLA-DRB1 data from study participants with rheumatoid arthritis (RA) and their parents, and show that it successfully detects an MFG incompatibility effect on RA while simultaneously adjusting for a possible viability loss.
Abstract:
Acknowledgements: This study was funded by a BBSRC studentship (MAW) and NERC grants NE/H00775X/1 and NE/D000602/1 (SBP). The authors are grateful to Mario Röder and Keliya Bai for fieldwork assistance, and all estate owners, factors and keepers for access to field sites, most particularly MJ Taylor and Mike Nisbet (Airlie), Neil Brown (Allargue), RR Gledson and David Scrimgeour (Delnadamph), Andrew Salvesen and John Hay (Dinnet), Stuart Young and Derek Calder (Edinglassie), Kirsty Donald and David Busfield (Glen Dye), Neil Hogbin and Ab Taylor (Glen Muick), Alistair Mitchell (Glenlivet), Simon Blackett, Jim Davidson and Liam Donald (Invercauld), Richard Cooke and Fred Taylor† (Invermark), Shaila Rao and Christopher Murphy (Mar Lodge), and Ralph Peters and Philip Astor (Tillypronie). Data accessibility: Genotype data (DataDryad: doi:10.5061/dryad.4t7jk); metadata (information on sampling sites, phenotypes and medication regimen) (DataDryad: doi:10.5061/dryad.4t7jk).
Abstract:
Vascular cognitive impairment (VCI), including its severe form, vascular dementia (VaD), is the second most common form of dementia. The genetic etiology of sporadic VCI remains largely unknown. We previously conducted a systematic review and meta-analysis of all published genetic association studies of sporadic VCI prior to 6 July 2012, which demonstrated that APOE (ɛ4, ɛ2) and MTHFR (rs1801133) variants were associated with susceptibility for VCI. De novo genotyping was conducted in a new, independent, relatively large collaborative European cohort of VaD (nmax = 549) and elderly non-demented samples (nmax = 552). Where available, genotype data derived from Illumina's 610-quad array for 1210 GERAD1 control samples were also included in analyses of genes examined. Associations were tested using the Cochran-Armitage trend test: MTHFR rs1801133 (OR = 1.36, 95% CI 1.16-1.58, p < 0.0001), APOE rs7412 (OR = 0.62, 95% CI 0.42-0.90, p = 0.01), and APOE rs429358 (OR = 1.59, 95% CI 1.17-2.16, p = 0.003). Association was also observed with APOE epsilon alleles: ɛ4 (OR = 1.85, 95% CI 1.35-2.52, p < 0.0001) and ɛ2 (OR = 0.67, 95% CI 0.46-0.98, p = 0.03). Logistic regression and Bonferroni correction in a subgroup of the cohort, adjusted for gender, age, and population, maintained the association of APOE rs429358 and the ɛ4 allele.
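As an illustration of the Cochran-Armitage trend test named above, the sketch below computes the test statistic for a 2 × 3 table of genotype counts in cases and controls, assuming additive scores (0, 1, 2 copies of the allele) and a two-sided normal approximation. The function name and the genotype counts are hypothetical and do not come from the study.

```python
import math


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for genotype counts (illustrative sketch).

    cases, controls: counts per genotype category, in the same order as scores.
    scores:          trend weights; (0, 1, 2) assumes an additive model.
    Returns (z, two_sided_p) using the normal approximation.
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")
    totals = [a + b for a, b in zip(cases, controls)]
    n = sum(totals)
    p_case = sum(cases) / n  # overall proportion of cases
    # Score-weighted excess of cases over expectation in each category.
    u = sum(w * (a - t * p_case) for w, a, t in zip(scores, cases, totals))
    # Variance of u under the null hypothesis of no trend.
    sum_w2 = sum(t * w * w for w, t in zip(scores, totals))
    sum_w = sum(t * w for w, t in zip(scores, totals))
    var_u = p_case * (1.0 - p_case) * (sum_w2 - sum_w * sum_w / n)
    if var_u <= 0.0:
        raise ValueError("variance is zero; the table is degenerate")
    z = u / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical genotype counts for (0, 1, 2) copies of an allele.
z, p = cochran_armitage_trend(cases=[10, 20, 30], controls=[30, 20, 10])
```

A positive z indicates that the proportion of cases rises with the number of allele copies.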
Abstract:
The identification of subjects at high risk for Alzheimer’s disease is important for prognosis and early intervention. We investigated the polygenic architecture of Alzheimer’s disease and the accuracy of Alzheimer’s disease prediction models, including and excluding the polygenic component in the model. This study used genotype data from the powerful dataset comprising 17 008 cases and 37 154 controls obtained from the International Genomics of Alzheimer’s Project (IGAP). Polygenic score analysis tested whether the alleles identified to associate with disease in one sample set were significantly enriched in the cases relative to the controls in an independent sample. The disease prediction accuracy was investigated in a subset of the IGAP data, a sample of 3049 cases and 1554 controls (for whom APOE genotype data were available), by means of sensitivity, specificity, area under the receiver operating characteristic curve (AUC) and positive and negative predictive values. We observed significant evidence for a polygenic component enriched in Alzheimer’s disease (P = 4.9 × 10⁻²⁶). This enrichment remained significant after APOE and other genome-wide associated regions were excluded (P = 3.4 × 10⁻¹⁹). The best prediction accuracy, AUC = 78.2% (95% confidence interval 77–80%), was achieved by a logistic regression model with APOE, the polygenic score, sex and age as predictors. In conclusion, Alzheimer’s disease has a significant polygenic component, which has predictive utility for Alzheimer’s disease risk and could be a valuable research tool complementing experimental designs, including preventative clinical trials, stem cell selection and high/low risk clinical studies. In modelling a range of sample disease prevalences, we found that polygenic scores almost double case prediction from chance, with increased prediction at polygenic extremes.
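To make the two quantities in this abstract concrete, the sketch below shows a polygenic score computed as a weighted sum of allele dosages and an AUC computed as the probability that a randomly chosen case scores higher than a randomly chosen control. The function names, weights, and scores are hypothetical. The study's actual model was a logistic regression that also included APOE, sex and age, which this sketch does not reproduce.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 per SNP).

    weights are per-SNP effect sizes, e.g. log odds ratios estimated
    in an independent discovery sample (assumption).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have equal length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of case-control pairs in which the case has the
    higher score, counting ties as half.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical example: three SNPs for one person, then toy score sets.
score = polygenic_score([0, 1, 2], [0.10, 0.20, -0.05])
toy_auc = auc([3.0, 2.0], [1.0, 2.0])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic in sample size, so a rank-based calculation would be preferable for cohorts of the size described.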