892 results for genome wide complex trait analysis
Abstract:
Characterizing population genetic variation and structure is a useful tool for research in human genetics, and population isolates are of particular interest. The aim of the present study was to characterize the genetic structure of Xavante Indians and compare it with that of other populations. The Xavante, an indigenous population living in the Brazilian Central Plateau, are one of the largest native groups in Brazil. A subset of 53 unrelated subjects was selected from an initial sample of 300 Xavante Indians. Using 86,197 markers, the Xavante were compared with all populations of the HapMap Phase III and HGDP-CEPH projects and with a Southeast Brazilian population sample to establish their population structure. Principal components analysis showed that the Xavante Indians cluster on the Amerindian axis near other populations of known Amerindian ancestry, such as the Karitiana, Pima, Surui and Maya, and a low degree of genetic admixture was observed. This is consistent with historical records of population bottlenecks and cultural isolation. By calculating pairwise F-st statistics, we characterized the genetic differentiation between Xavante Indians and representative populations from the HapMap and HGDP-CEPH projects. We found that the genetic differentiation between Xavante Indians and populations of Amerindian, Asian, European, and African ancestry increased progressively. Our results indicate that the Xavante are a population that has remained genetically isolated over the past decades and can offer advantages for genome-wide mapping studies of inherited disorders.
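The pairwise F-st comparison described in the abstract above can be illustrated with a minimal sketch. The abstract does not say which estimator the authors used, so this assumes Wright's heterozygosity-based definition for biallelic markers, equal weighting of the two populations, and no sample-size correction. The function name `fst_two_populations` and the example frequencies are hypothetical.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Wright's F_ST for two populations from per-marker allele frequencies.

    Assumption: biallelic markers, equal population weights, no correction
    for sample size. Markers are combined as a ratio of sums,
    sum(H_T - H_S) / sum(H_T).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same markers")
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2.0
        # Expected heterozygosity in the pooled (total) population.
        h_t = 2.0 * p_bar * (1.0 - p_bar)
        # Mean expected heterozygosity within the two subpopulations.
        h_s = (2.0 * p_a * (1.0 - p_a) + 2.0 * p_b * (1.0 - p_b)) / 2.0
        numerator += h_t - h_s
        denominator += h_t
    # Markers that are monomorphic in both populations carry no information.
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical allele frequencies at three markers.
    close_pair = fst_two_populations([0.40, 0.55, 0.20], [0.45, 0.50, 0.25])
    distant_pair = fst_two_populations([0.40, 0.55, 0.20], [0.90, 0.10, 0.70])
    print(close_pair, distant_pair)
```

Larger values indicate stronger differentiation, which is the sense in which differentiation "increased progressively" across the comparison populations.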
Abstract:
Membrane proteins are a large and important class of proteins. They are responsible for several of the key functions in a living cell, e.g. transport of nutrients and ions, cell-cell signaling, and cell-cell adhesion. Despite their importance, it has not been possible to study their structure and organization in much detail because of the difficulty of obtaining 3D structures. In this thesis, theoretical studies of membrane protein sequences and structures have been carried out by analyzing existing experimental data. The data come from several sources, including sequence databases, genome sequencing projects, and 3D structures. Prediction of the membrane-spanning regions by hydrophobicity analysis is a key technique used in several of the studies. A novel method for this is also presented and compared with other methods. The primary questions addressed in the thesis are: What properties are common to all membrane proteins? What is the overall architecture of a membrane protein? What properties govern the integration into the membrane? How many membrane proteins are there and how are they distributed in different organisms? Several of the findings have now been backed up by experiments. An analysis of the large family of G-protein coupled receptors pinpoints differences in length and amino acid composition of loops between proteins with and without a signal peptide, and also differences between extra- and intracellular loops. Known 3D structures of membrane proteins have been studied in terms of hydrophobicity, distribution of secondary structure and amino acid types, position-specific residue variability, and differences between loops and membrane-spanning regions. An analysis of several fully and partially sequenced genomes from eukaryotes, prokaryotes, and archaea has been carried out. 
Several differences in the membrane protein content between organisms were found, the most important being the total number of membrane proteins and the distribution of membrane proteins with a given number of transmembrane segments. Of the properties that were found to be similar in all organisms, the most obvious is the bias in the distribution of positive charges between the extra- and intracellular loops. Finally, an analysis of homologues to membrane proteins with known topology uncovered two related, multi-spanning proteins with opposite predicted orientations. The predicted topologies were verified experimentally, providing a first example of "divergent topology evolution".
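The hydrophobicity analysis mentioned in the abstract above can be sketched with a sliding-window hydropathy profile. This is not the novel method presented in the thesis. It assumes the widely used Kyte-Doolittle hydropathy scale, a 19-residue window, and a threshold of 1.6, all of which are illustrative choices. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    # Raises KeyError on non-standard residue codes.
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window by one residue: add the new, drop the oldest.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_segments(sequence, window=19, threshold=1.6):
    """Residue ranges (0-based, end exclusive) covered by runs of
    consecutive windows whose mean hydropathy reaches the threshold."""
    segments = []
    start = None
    end = None
    for i, value in enumerate(hydropathy_profile(sequence, window)):
        if value >= threshold:
            if start is None:
                start = i
            end = i + window
        elif start is not None:
            segments.append((start, end))
            start = None
    if start is not None:
        segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a hydrophobic stretch flanked by charged residues.
    toy = "D" * 30 + "L" * 20 + "D" * 30
    print(candidate_segments(toy))
```

Reported ranges are wider than the hydrophobic core because each qualifying window contributes its full length. A real predictor would trim the ends and also use features such as the charge bias between loops described in the abstract.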
Abstract:
We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10(-8)). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.
Abstract:
Outcrossing Arabidopsis species that diverged from their inbreeding relative Arabidopsis thaliana 5 million years ago and display a biogeographical pattern of interspecific sympatry vs intraspecific allopatry provide an ideal model for studying the impacts of gene introgression and polyploidization on species diversification. Flow cytometry analyses detected ploidy polymorphisms of 2x and 4x in Arabidopsis lyrata ssp. kamchatica of Taiwan. Genomic divergence between species/subspecies was estimated based on 98 randomly chosen nuclear genes. Multilocus analyses revealed a mosaic genome in diploid A. l. kamchatica composed of Arabidopsis halleri-like and A. lyrata-like alleles. Coalescent analyses suggest that the segregation of ancestral polymorphisms alone cannot explain the high inconsistency between gene trees across loci, and that gene introgression via diploid A. l. kamchatica likely distorts the molecular phylogenies of Arabidopsis species. However, not all genes migrated across species freely. Gene ontology analyses suggested that some nonmigrating genes were constrained by natural selection. High levels of estimated ancestral polymorphisms between A. halleri and A. lyrata suggest that gene flow between these species has not completely ceased since their initial isolation. Polymorphism data of extant populations also imply recent gene flow between the species. Our study reveals that interspecific gene flow affects genome evolution in Arabidopsis.
Abstract:
Attention-deficit/hyperactivity disorder (ADHD) is a common, highly heritable neurodevelopmental disorder. Genetic loci have not yet been identified by genome-wide association studies. Rare copy number variations (CNVs), such as chromosomal deletions or duplications, have been implicated in ADHD and other neurodevelopmental disorders. To identify rare (frequency ≤1%) CNVs that increase the risk of ADHD, we performed a whole-genome CNV analysis based on 489 young ADHD patients and 1285 adult population-based controls and identified one significantly associated CNV region. In tests for a global burden of large (>500 kb) rare CNVs, we observed a nonsignificant (P=0.271) 1.126-fold enriched rate of subjects carrying at least one such CNV in the group of ADHD cases. Locus-specific tests of association were used to assess if there were more rare CNVs in cases compared with controls. Detected CNVs, which were significantly enriched in the ADHD group, were validated by quantitative (q)PCR. Findings were replicated in an independent sample of 386 young patients with ADHD and 781 young population-based healthy controls. We identified rare CNVs within the parkinson protein 2 gene (PARK2) with a significantly higher prevalence in ADHD patients than in controls (P=2.8 × 10(-4) after empirical correction for genome-wide testing). In total, the PARK2 locus (chr 6: 162 659 756-162 767 019) harboured three deletions and nine duplications in the ADHD patients and two deletions and two duplications in the controls. By qPCR analysis, we validated 11 of the 12 CNVs in ADHD patients (P=1.2 × 10(-3) after empirical correction for genome-wide testing). In the replication sample, CNVs at the PARK2 locus were found in four additional ADHD patients and one additional control (P=4.3 × 10(-2)). Our results suggest that copy number variants at the PARK2 locus contribute to the genetic susceptibility of ADHD. 
Mutations and CNVs in PARK2 are known to be associated with Parkinson disease. Molecular Psychiatry advance online publication, 20 November 2012; doi:10.1038/mp.2012.161.
Abstract:
We report the identification of quantitative trait loci (QTL) affecting carcass composition, carcass length, fat deposition and lean meat content using a genome scan across 462 animals from a combined intercross and backcross between Hampshire and Landrace pigs. Data were analysed using multiple linear regression fitting additive and dominance effects. This model was compared with a model including a parent-of-origin effect to spot evidence of imprinting. Several precisely defined muscle phenotypes were measured in order to dissect body composition in more detail. Three significant QTL were detected in the study at the 1% genome-wide level, and twelve significant QTL were detected at the 5% genome-wide level. These QTL comprise loci affecting fat deposition and lean meat content on SSC1, 4, 9, 10, 13 and 16, a locus on SSC2 affecting the ratio between weight of meat and bone in back and weight of meat and bone in ham and two loci affecting carcass length on SSC12 and 17. The well-defined phenotypes in this study enabled us to detect QTL for sizes of individual muscles and to obtain information of relevance for the description of the complexity underlying other carcass traits.
Abstract:
Humans and dogs are both affected by the allergic skin disease atopic dermatitis (AD), caused by an interaction between genetic and environmental factors. The German shepherd dog (GSD) is a high-risk breed for canine AD (CAD). In this study, we used a Swedish cohort of GSDs as a model for human AD. Serum IgA levels are known to be lower in GSDs compared to other breeds. We detected significantly lower IgA levels in the CAD cases compared to controls (p = 1.1 × 10(-5)) in our study population. We also detected a separation within the GSD cohort, where dogs could be grouped into two different subpopulations. Disease prevalence differed significantly between the subpopulations, contributing to population stratification (λ = 1.3), which was successfully corrected for using a mixed model approach. A genome-wide association analysis of CAD was performed (n cases = 91, n controls = 88). IgA levels were included in the model, due to the high correlation between CAD and low IgA levels. In addition, we detected a correlation between IgA levels and the age at the time of sampling (corr = 0.42, p = 3.0 × 10(-9)); thus, age was included in the model. A genome-wide significant association was detected on chromosome 27 (p_raw = 3.1 × 10(-7), p_genome = 0.03). The total associated region was defined as a ~1.5-Mb-long haplotype including eight genes. Through targeted re-sequencing and additional genotyping of a subset of identified SNPs, we defined 11 smaller haplotype blocks within the associated region. Two blocks showed the strongest association to CAD. The ~209-kb region, defined by the two blocks, harbors only the PKP2 gene, encoding Plakophilin 2 expressed in the desmosomes and important for skin structure. Our results may yield further insight into the genetics behind both canine and human AD.
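The stratification measure λ quoted in the abstract above is conventionally the genomic inflation factor: the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median expected under the null (about 0.455). The sketch below shows only that definition. It does not reproduce the mixed model correction the authors applied, and the function name and input values are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor lambda from 1-df chi-square statistics.

    A value near 1.0 suggests little inflation; values well above 1.0
    point to stratification or other confounding.
    """
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical statistics whose median is inflated by 30%.
    stats = [0.05, 0.30, 0.4549364 * 1.3, 1.20, 4.10]
    print(round(genomic_inflation(stats), 2))
```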
Abstract:
Intense selective pressures applied over short evolutionary time have resulted in homogeneity within, but substantial variation among, horse breeds. Utilizing this population structure, 744 individuals from 33 breeds, and a 54,000 SNP genotyping array, breed-specific targets of selection were identified using an F(ST)-based statistic calculated in 500-kb windows across the genome. A 5.5-Mb region of ECA18, in which the myostatin (MSTN) gene was centered, contained the highest signature of selection in both the Paint and Quarter Horse. Gene sequencing and histological analysis of gluteal muscle biopsies showed a promoter variant and intronic SNP of MSTN were each significantly associated with higher Type 2B and lower Type 1 muscle fiber proportions in the Quarter Horse, demonstrating a functional consequence of selection at this locus. Signatures of selection on ECA23 in all gaited breeds in the sample led to the identification of a shared, 186-kb haplotype including two doublesex related mab transcription factor genes (DMRT2 and 3). The recent identification of a DMRT3 mutation within this haplotype, which appears necessary for the ability to perform alternative gaits, provides further evidence for selection at this locus. Finally, putative loci for the determination of size were identified in the draft breeds and the Miniature horse on ECA11, as well as when signatures of selection surrounding candidate genes at other loci were examined. This work provides further evidence of the importance of MSTN in racing breeds, provides strong evidence for selection upon gait and size, and illustrates the potential for population-based techniques to find genomic regions driving important phenotypes in the modern horse.
Abstract:
During the development of the somatic genome from the Paramecium germline genome the bulk of the copies of ∼45 000 unique, internal eliminated sequences (IESs) are deleted. IES targeting is facilitated by two small RNA (sRNA) classes: scnRNAs, which relay epigenetic information from the parental nucleus to the developing nucleus, and iesRNAs, which are produced and used in the developing nucleus. Why only certain IESs require sRNAs for their removal has been enigmatic. By analyzing the silencing effects of three genes: PGM (responsible for DNA excision), DCL2/3 (scnRNA production) and DCL5 (iesRNA production), we identify key properties required for IES elimination. Based on these results, we propose that, depending on the exact combination of their lengths and end bases, some IESs are less efficiently recognized or excised and have a greater requirement for targeting by scnRNAs and iesRNAs. We suggest that the variation in IES retention following silencing of DCL2/3 is not primarily due to scnRNA density, which is comparatively uniform relative to IES retention, but rather the genetic properties of IESs. Taken together, our analyses demonstrate that in Paramecium the underlying genetic properties of developmentally deleted DNA sequences are essential in determining the sensitivity of these sequences to epigenetic control.
Abstract:
The molecular analysis of genes influencing human height has been notoriously difficult. Genome-wide association studies (GWAS) for height in humans based on tens of thousands to hundreds of thousands of samples so far revealed ∼200 loci for human height explaining only 20% of the heritability. In domestic animals isolated populations with a greatly reduced genetic heterogeneity facilitate a more efficient analysis of complex traits. We performed a genome-wide association study on 1,077 Franches-Montagnes (FM) horses using ∼40,000 SNPs. Our study revealed two QTL for height at withers on chromosomes 3 and 9. The association signal on chromosome 3 is close to the LCORL/NCAPG genes. The association signal on chromosome 9 is close to the ZFAT gene. Both loci have already been shown to influence height in humans. Interestingly, there are very large intergenic regions at the association signals. The two detected QTL together explain ∼18.2% of the heritable variation of height in horses. However, another large fraction of the variance for height in horses results from ECA 1 (11.0%), although the association analysis did not reveal significantly associated SNPs on this chromosome. The QTL region on ECA 3 associated with height at withers was also significantly associated with wither height, conformation of legs, ventral border of mandible, correctness of gaits, and expression of the head. The region on ECA 9 associated with height at withers was also associated with wither height, length of croup and length of back. In addition to these two QTL regions on ECA 3 and ECA 9 we detected another QTL on ECA 6 for correctness of gaits. Our study highlights the value of domestic animal populations for the genetic analysis of complex traits.
Abstract:
To identify novel quantitative trait loci (QTL) within horses, we performed genome-wide association studies (GWAS) based on sequence-level genotypes for conformation and performance traits in the Franches-Montagnes (FM) horse breed. Sequence-level genotypes of FM horses were derived by re-sequencing 30 key founders and imputing 50K data of genotyped horses. In total, we included 1077 FM horses genotyped for ~4 million SNPs and their respective de-regressed breeding values of the traits in the analysis. Based on this dataset, we identified a total of 14 QTL associated with 18 conformation traits and one performance trait. Therefore, our results suggest that the application of sequence-derived genotypes increases the power to identify novel QTL which were not identified previously based on 50K SNP chip data.
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and will give false-positive results. Although this problem can be dealt with effectively through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations. In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to look at the relationship between each SNP and disease from another point of view. 
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search of small subsets of one, two or three SNPs, based on the best 100 composite 2-SNP subsets, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states. Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study. From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4. 
A further genome-wide association study using the chi-square test shows that no significant SNPs are detected at the cut-off level of 9.09451E-08 in the Framingham heart study of CVD. In the WTCCC study, only two significant SNPs associated with CAD can be detected. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease in the chi-square test at the cut-off value of 1.11E-07. Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and that SNPs with good discriminant power are not necessarily causal markers for the disease.
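The HMSS criterion used in the abstract above, the harmonic mean of sensitivity and specificity, can be sketched as follows. The coding of cases as 1 and controls as 0, the function name `hmss`, and the toy labels are assumptions made for illustration. The example shows why the criterion suits imbalanced data: a classifier that labels everyone a control scores high accuracy but an HMSS of zero.

```python
def hmss(y_true, y_pred):
    """Harmonic mean of sensitivity and specificity.

    Assumption: labels are coded 1 for cases and 0 for controls.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if sensitivity + specificity == 0:
        return 0.0
    return 2.0 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(y_true, y_pred):
    """Plain classification accuracy, for comparison."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy imbalanced sample: 10 cases, 90 controls.
    truth = [1] * 10 + [0] * 90
    always_control = [0] * 100
    print(accuracy(truth, always_control))  # high despite missing every case
    print(hmss(truth, always_control))      # zero, exposing the failure
```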
Abstract:
Numerous studies have been carried out to better understand the genetic predisposition to cardiovascular disease. Although it is widely believed that multifactorial diseases such as cardiovascular disease result from the effects of many genes that act alone or interact with other genes, most genetic studies have focused on identifying cardiovascular disease susceptibility genes and usually ignore the effects of gene-gene interactions in the analysis. The current study applies a novel linkage disequilibrium based statistic for testing interactions between two linked loci using data from a genome-wide study of cardiovascular disease. A total of 53,394 single nucleotide polymorphisms (SNPs) are tested for pair-wise interactions, and 8,644 interactions are found to be significant with p-values less than 3.5 × 10(-11). Results indicate that known cardiovascular disease susceptibility genes tend not to have many significant interactions. One SNP in the CACNG1 (calcium channel, voltage-dependent, gamma subunit 1) gene and one SNP in the IL3RA (interleukin 3 receptor, alpha) gene are found to have the most significant pair-wise interactions. Findings from the current study should be replicated in other independent cohorts to eliminate potential false positive results.
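The scale of the pair-wise testing in the abstract above can be checked with a short calculation. The abstract does not state how its significance threshold was derived. The sketch below assumes that every unordered pair of the 53,394 SNPs was tested and that a Bonferroni correction at a family-wise level of 0.05 was applied. Under those assumptions the result is close to the reported 3.5 × 10(-11), but this is an inference, not a statement from the source.

```python
def count_unordered_pairs(n_items):
    """Number of distinct unordered pairs among n_items."""
    return n_items * (n_items - 1) // 2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    n_snps = 53394   # from the abstract
    alpha = 0.05     # assumed family-wise error rate
    n_tests = count_unordered_pairs(n_snps)
    print(n_tests)
    print(bonferroni_threshold(alpha, n_tests))
```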
Abstract:
Pathway-based genome-wide association study evolved from pathway analysis for microarray gene expression and is under rapid development as a complement to single-SNP based genome-wide association study. However, it faces new challenges, such as the summarization of SNP statistics into pathway statistics. The current study applies the ridge-regularized Kernel Sliced Inverse Regression (KSIR) to achieve dimension reduction and compares this method with two other widely used methods: the minimal-p-value (minP) approach, which assigns the best test statistic of all SNPs in each pathway as the statistic of the pathway, and the principal component analysis (PCA) method, which uses PCA to calculate the principal components of each pathway. Comparison of the three methods using simulated datasets consisting of 500 cases, 500 controls and 100 SNPs demonstrated that the KSIR method outperformed the other two methods in terms of causal pathway ranking and statistical power. The PCA method showed performance similar to that of the minP method. The KSIR method also performed better than the other two methods in analyzing a real dataset, the WTCCC Ulcerative Colitis dataset, consisting of 1762 cases and 3773 controls as the discovery cohort and 591 cases and 1639 controls as the replication cohort. Several immune and non-immune pathways relevant to ulcerative colitis were identified by these methods. Results from the current study provide a reference for further methodology development and identify novel pathways that may be of importance to the development of ulcerative colitis.
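The minP summarization described in the abstract above, in which a pathway takes the best statistic among its SNPs, can be sketched as follows. This illustrates only the minP baseline, not KSIR or the PCA method. The function names, SNP identifiers, pathway names and p-values are all hypothetical.

```python
def minp_pathway_scores(snp_pvalues, pathways):
    """Assign each pathway the smallest p-value among its member SNPs.

    snp_pvalues: dict mapping SNP id -> single-SNP association p-value
    pathways:    dict mapping pathway name -> iterable of SNP ids
    Pathways with no tested SNPs are left out of the result.
    """
    scores = {}
    for name, snps in pathways.items():
        pvalues = [snp_pvalues[snp] for snp in snps if snp in snp_pvalues]
        if pvalues:
            scores[name] = min(pvalues)
    return scores


def rank_pathways(scores):
    """Pathway names ordered from strongest to weakest minP signal."""
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical single-SNP results and pathway membership.
    pvalues = {"snp1": 0.20, "snp2": 0.0004, "snp3": 0.03, "snp4": 0.51}
    membership = {
        "pathway_A": ["snp1", "snp2"],
        "pathway_B": ["snp3", "snp4"],
        "pathway_C": ["snp9"],  # no tested SNPs, so it is skipped
    }
    scores = minp_pathway_scores(pvalues, membership)
    print(scores)
    print(rank_pathways(scores))
```

Because a pathway with more SNPs has more chances of containing a small p-value by chance, a raw minimum tends to favor larger pathways, so some calibration (for example by permutation) is usually needed before pathways are compared.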