939 results for genome wide complex trait analysis
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and will give false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to examine the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets of one, two or three SNPs, based on the best 100 composite 2-SNPs, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used with imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study.
From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study using the chi-square test shows that no significant SNPs are detected at the cut-off level of 9.09451E-08 in the Framingham heart study of CVD. The WTCCC study results reveal only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease by chi-square test at the cut-off value of 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and that SNPs with good discriminant power are not necessarily causal markers for the disease.
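The HMSS criterion named in this abstract can be illustrated with a minimal Python sketch. The function name `hmss` and the confusion-matrix counts are hypothetical and are not taken from the study; the sketch only shows why a harmonic mean of sensitivity and specificity behaves differently from plain accuracy on imbalanced data.

```python
def hmss(tp: int, fn: int, tn: int, fp: int) -> float:
    """Harmonic mean of sensitivity and specificity (illustrative helper)."""
    sensitivity = tp / (tp + fn)  # fraction of cases correctly classified
    specificity = tn / (tn + fp)  # fraction of controls correctly classified
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Overall classification accuracy."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced data: 10 cases, 90 controls.
# A classifier that labels everyone "control" looks accurate but detects no cases.
all_control = dict(tp=0, fn=10, tn=90, fp=0)
print(accuracy(**all_control))  # high accuracy
print(hmss(**all_control))      # HMSS is zero because sensitivity is zero
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier cannot score well by favoring the majority class, which is consistent with the abstract's remark that HMSS can be used on imbalanced data without resampling.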
Abstract:
Numerous studies have been carried out to better understand the genetic predisposition to cardiovascular disease. Although it is widely believed that multifactorial diseases such as cardiovascular disease result from the effects of many genes, which work alone or interact with other genes, most genetic studies have focused on identifying cardiovascular disease susceptibility genes and usually ignore the effects of gene-gene interactions in the analysis. The current study applies a novel linkage disequilibrium based statistic for testing interactions between two linked loci using data from a genome-wide study of cardiovascular disease. A total of 53,394 single nucleotide polymorphisms (SNPs) are tested for pair-wise interactions, and 8,644 interactions are found to be significant with p-values less than 3.5×10⁻¹¹. Results indicate that known cardiovascular disease susceptibility genes tend not to have many significant interactions. One SNP in the CACNG1 (calcium channel, voltage-dependent, gamma subunit 1) gene and one SNP in the IL3RA (interleukin 3 receptor, alpha) gene are found to have the most significant pair-wise interactions. Findings from the current study should be replicated in other independent cohorts to eliminate potential false-positive results.
Abstract:
Pathway-based genome-wide association study evolved from pathway analysis for microarray gene expression and is under rapid development as a complement to single-SNP-based genome-wide association study. However, it faces new challenges, such as the summarization of SNP statistics into pathway statistics. The current study applies the ridge-regularized Kernel Sliced Inverse Regression (KSIR) to achieve dimension reduction and compares this method with two other widely used methods: the minimal-p-value (minP) approach, which assigns the best test statistic of all SNPs in each pathway as the statistic of the pathway, and the principal component analysis (PCA) method, which uses PCA to calculate the principal components of each pathway. Comparison of the three methods using simulated datasets consisting of 500 cases, 500 controls and 100 SNPs demonstrated that the KSIR method outperformed the other two methods in terms of causal pathway ranking and statistical power. The PCA method showed performance similar to that of the minP method. The KSIR method also performed better than the other two methods in analyzing a real dataset, the WTCCC Ulcerative Colitis dataset, consisting of 1762 cases and 3773 controls as the discovery cohort and 591 cases and 1639 controls as the replication cohort. Several immune and non-immune pathways relevant to ulcerative colitis were identified by these methods. Results from the current study provide a reference for further methodology development and identify novel pathways that may be of importance to the development of ulcerative colitis.
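The minP summarization described in this abstract (taking the best SNP statistic in a pathway as the pathway statistic) can be sketched in a few lines of Python. The function name, SNP identifiers, pathway names and p-values below are all hypothetical, and the sketch makes no attempt to reproduce the study's actual pipeline.

```python
def minp_pathway_scores(snp_pvalues, pathways):
    """Assign each pathway the smallest p-value among its member SNPs.

    snp_pvalues: dict mapping SNP id -> single-SNP association p-value
    pathways:    dict mapping pathway name -> list of SNP ids in that pathway
    """
    scores = {}
    for name, snps in pathways.items():
        pvals = [snp_pvalues[s] for s in snps if s in snp_pvalues]
        if pvals:  # skip pathways with no tested SNPs
            scores[name] = min(pvals)
    return scores


# Hypothetical single-SNP results and pathway membership.
snp_pvalues = {"snp1": 0.20, "snp2": 0.003, "snp3": 0.04, "snp4": 0.60}
pathways = {
    "pathway_A": ["snp1", "snp2"],
    "pathway_B": ["snp3", "snp4"],
    "pathway_C": ["snp9"],  # no tested SNPs, so it gets no score
}

scores = minp_pathway_scores(snp_pvalues, pathways)
ranked = sorted(scores, key=scores.get)  # most significant pathway first
print(ranked)
```

A raw minimum tends to favor pathways containing more SNPs, so in practice the minimum is usually calibrated (for example by permutation) before pathways are compared; that step is omitted here.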
Abstract:
We have developed high-density DNA microarrays of yeast ORFs. These microarrays can monitor hybridization to ORFs for applications such as quantitative differential gene expression analysis and screening for sequence polymorphisms. Automated scripts retrieved sequence information from public databases to locate predicted ORFs and select appropriate primers for amplification. The primers were used to amplify yeast ORFs in 96-well plates, and the resulting products were arrayed using an automated microarraying device. Arrays containing up to 2,479 yeast ORFs were printed on a single slide. The hybridization of fluorescently labeled samples to the array was detected and quantitated with a laser confocal scanning microscope. Applications of the microarrays are shown for genetic and gene expression analysis at the whole genome level.
Abstract:
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data efficiently groups together genes of known similar function, and we find a similar tendency in human data. Thus, patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not currently available.
Abstract:
Neuropathological and brain imaging studies suggest that schizophrenia may result from neurodevelopmental defects. Cytoarchitectural studies indicate cellular abnormalities suggestive of a disruption in neuronal connectivity in schizophrenia, particularly in the dorsolateral prefrontal cortex. Yet, the molecular mechanisms underlying these findings remain unclear. To identify molecular substrates associated with schizophrenia, DNA microarray analysis was used to assay gene expression levels in postmortem dorsolateral prefrontal cortex of schizophrenic and control patients. Genes determined to have altered expression levels in schizophrenics relative to controls are involved in a number of biological processes, including synaptic plasticity, neuronal development, neurotransmission, and signal transduction. Most notable was the differential expression of myelination-related genes suggesting a disruption in oligodendrocyte function in schizophrenia.
Abstract:
Background: Eosinophils are granulocytic white blood cells implicated in asthma and atopic disease. The degree of eosinophilia in the blood of patients with asthma correlates with the severity of asthmatic symptoms. Quantitative trait loci (QTL) linkage analysis of eosinophil count may be a more powerful strategy for mapping genes involved in asthma than linkage analysis using affected relative pairs. Objective: To identify QTLs responsible for variation in eosinophil count in adolescent twins. Methods: We measured eosinophil count longitudinally in 738 pairs of twins at 12, 14, and 16 years of age. We typed 757 highly polymorphic microsatellite markers at an average spacing of approximately 5 centimorgans across the genome. We then used multipoint variance components linkage analysis to test for linkage between marker loci and eosinophil concentrations at each age across the genome. Results: We found highly significant linkage on chromosome 2q33 in 12-year-old twins (logarithm of the odds = 4.6; P = .000002) and suggestive evidence of linkage in the same region in 14-year-olds (logarithm of the odds = 1.0; P = .016). We also found suggestive evidence of linkage at other areas of the genome, including regions on chromosomes 2, 3, 4, 8, 9, 11, 12, 17, 20, and 22. Conclusion: A QTL for eosinophil count is present on chromosome 2q33. This QTL might represent a gene involved in asthma pathophysiology.
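The relation between the logarithm of the odds (LOD) scores and the P values quoted in this abstract can be checked with a short Python sketch. It assumes the convention commonly used for variance components linkage tests, in which 2·ln(10)·LOD is referred to a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom; the abstract does not state which convention the authors used, so this is an assumption, and `lod_to_p` is a hypothetical helper name.

```python
import math


def lod_to_p(lod: float) -> float:
    """Convert a LOD score to a one-sided p-value.

    Assumes the test statistic 2*ln(10)*LOD follows a 50:50 mixture of
    a point mass at 0 and a 1-df chi-square under the null (an assumption).
    """
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    tail = math.erfc(math.sqrt(chi2 / 2.0))
    return 0.5 * tail


for lod in (4.6, 1.0):
    print(f"LOD = {lod}: p = {lod_to_p(lod):.2g}")
```

Under this assumption the two LOD scores reported above map to values that round to the P values given in the abstract.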
Abstract:
The SOX family of transcription factors is found throughout the animal kingdom and is important in a variety of developmental contexts. Genome analysis has identified 20 Sox genes in human and mouse, which can be subdivided into 8 groups, based on sequence comparison and intron-exon structure. Most of the SOX groups identified in mammals are represented by a single SOX sequence in invertebrate model organisms, suggesting a duplication and divergence mechanism has operated during vertebrate evolution. We have now analysed the Sox gene complement in the pufferfish, Fugu rubripes, in order to shed further light on the diversity and origins of the Sox gene family. Major differences were found between the Sox family in Fugu and those in humans and mice. In particular, Fugu does not have orthologues of Sry, Sox15 and Sox30, which appear to be specific to mammals, while Sox19, found in Fugu and zebrafish but absent in mammals, seems to be specific to fishes. Six mammalian Sox genes are represented by two copies each in Fugu, indicating a large-scale gene duplication in the fish lineage. These findings point to recent Sox gene loss, duplication and divergence occurring during the evolution of tetrapod and teleost lineages, and provide further evidence for large-scale segmental or a whole-genome duplication occurring early in the radiation of teleosts. (C) 2004 Elsevier B.V. All rights reserved.
Abstract:
Mammalian promoters can be separated into two classes: conserved TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.
Abstract:
In a genome-wide RNA-mediated interference screen for genes required in membrane traffic - including endocytic uptake, recycling from endosomes to the plasma membrane, and secretion - we identified 168 candidate endocytosis regulators and 100 candidate secretion regulators. Many of these candidates are highly conserved among metazoans but have not been previously implicated in these processes. Among the positives from the screen, we identified PAR-3, PAR-6, PKC-3 and CDC-42, proteins that are well known for their importance in the generation of embryonic and epithelial-cell polarity. Further analysis showed that endocytic transport in Caenorhabditis elegans coelomocytes and human HeLa cells was also compromised after perturbation of CDC-42/Cdc42 or PAR-6/Par6 function, indicating a general requirement for these proteins in regulating endocytic traffic. Consistent with these results, we found that tagged CDC-42/Cdc42 is enriched on recycling endosomes in C. elegans and mammalian cells, suggesting a direct function in the regulation of transport.
Abstract:
Our sleep timing preference, or chronotype, is a manifestation of our internal biological clock. Variation in chronotype has been linked to sleep disorders, cognitive and physical performance, and chronic disease. Here we perform a genome-wide association study of self-reported chronotype within the UK Biobank cohort (n=100,420). We identify 12 new genetic loci that implicate known components of the circadian clock machinery and point to previously unstudied genetic variants and candidate genes that might modulate core circadian rhythms or light-sensing pathways. Pathway analyses highlight central nervous and ocular systems and fear-response-related processes. Genetic correlation analysis suggests chronotype shares underlying genetic pathways with schizophrenia, educational attainment and possibly BMI. Further, Mendelian randomization suggests that evening chronotype relates to higher educational attainment. These results not only expand our knowledge of the circadian system in humans but also expose the influence of circadian characteristics over human health and life-history variables such as educational attainment.
Abstract:
We thank the High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics and the Wellcome Trust Sanger Institute for the generation of the sequencing data. This work was funded by Wellcome Trust grant 090532/Z/09/Z (J.F.). Primary phenotyping of the mice was supported by the Mary Lyon Centre and Mammalian Genetics Unit (Medical Research Council, UK Hub grant G0900747 91070 and Medical Research Council, UK grant MC U142684172). D.A.B acknowledges support from NIH R01AR056280. The sleep work was supported by the state of Vaud (Switzerland) and the Swiss National Science Foundation (SNF 14694 and 136201 to P.F.). The ECG work was supported by the Netherlands CardioVascular Research Initiative (Dutch Heart Foundation, Dutch Federation of University Medical Centres, the Netherlands Organization for Health Research and Development, and the Royal Netherlands Academy of Sciences) PREDICT project, InterUniversity Cardiology Institute of the Netherlands (ICIN; 061.02; C.A.R., C.R.B). Na Cai is supported by the Agency of Science, Technology and Research (A*STAR) Graduate Academy. The authors wish to acknowledge excellent technical assistance from: Ayako Kurioka, Leo Swadling, Catherine de Lara, James Ussher, Rachel Townsend, Sima Lionikaite, Ausra S. Lionikiene, Rianne Wolswinkel and Inge van der Made. We would like to thank Thomas M Keane and Anthony G Doran for their help in annotating variants and adding the FVB/NJ strain to the Mouse Genomes Project.
Abstract:
Background: Esophageal adenocarcinoma (EA) is one of the fastest rising cancers in western countries. Barrett’s Esophagus (BE) is the premalignant precursor of EA. However, only a subset of BE patients develop EA, which complicates clinical management in the absence of valid predictors. Genetic risk factors for BE and EA are incompletely understood. This study aimed to identify novel genetic risk factors for BE and EA. Methods: Within an international consortium of groups involved in the genetics of BE/EA, we performed the first meta-analysis of all genome-wide association studies (GWAS) available, involving 6,167 BE patients, 4,112 EA patients, and 17,159 representative controls, all of European ancestry, genotyped on Illumina high-density SNP arrays, collected from four separate studies within North America, Europe, and Australia. Meta-analysis was conducted using the fixed-effects inverse variance-weighting approach. We used the standard genome-wide significance threshold of 5×10⁻⁸ for this study. We also conducted an association analysis following reweighting of loci using an approach that investigates annotation enrichment among the genome-wide significant loci. The entire GWAS data set was also analyzed using bioinformatics approaches, including functional annotation databases as well as gene-based and pathway-based methods, in order to identify pathophysiologically relevant cellular pathways. Findings: We identified eight new associated risk loci for BE and EA, within or near the CFTR (rs17451754, P=4·8×10⁻¹⁰), MSRA (rs17749155, P=5·2×10⁻¹⁰), BLK (rs10108511, P=2·1×10⁻⁹), KHDRBS2 (rs62423175, P=3·0×10⁻⁹), TPPP/CEP72 (rs9918259, P=3·2×10⁻⁹), TMOD1 (rs7852462, P=1·5×10⁻⁸), SATB2 (rs139606545, P=2·0×10⁻⁸), and HTR3C/ABCC5 genes (rs9823696, P=1·6×10⁻⁸). A further novel risk locus at LPA (rs12207195, posterior probability=0·925) was identified after re-weighting using significantly enriched annotations. This study thereby doubled the number of known risk loci.
The strongest disease pathways identified (P<10⁻⁶) belong to muscle cell differentiation and to mesenchyme development/differentiation, which fit with current pathophysiological BE/EA concepts. To our knowledge, this study identified for the first time an EA-specific association (rs9823696, P=1·6×10⁻⁸) near HTR3C/ABCC5 which is independent of BE development (P=0·45). Interpretation: The identified disease loci and pathways reveal new insights into the etiology of BE and EA. Furthermore, the EA-specific association at HTR3C/ABCC5 may constitute a novel genetic marker for the prediction of transition from BE to EA. Mutations in CFTR, one of the new risk loci identified in this study, cause cystic fibrosis (CF), the most common recessive disorder in Europeans. Gastroesophageal reflux (GER) belongs to the phenotypic CF spectrum and represents the main risk factor for BE/EA. Thus, the CFTR locus may trigger a common GER-mediated pathophysiology.
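The fixed-effects inverse variance-weighting approach mentioned in the Methods can be sketched as follows. This is a generic textbook form of the estimator for one SNP, not the consortium's code; the function name and the per-study effect sizes and standard errors are hypothetical, and only the 5×10⁻⁸ threshold comes from the abstract.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors of those estimates
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    total_w = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled_beta / pooled_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical log odds ratios and standard errors from four studies.
betas = [0.12, 0.15, 0.10, 0.14]
ses = [0.04, 0.05, 0.03, 0.06]

beta, se, z, p = fixed_effects_meta(betas, ses)
GENOME_WIDE_THRESHOLD = 5e-8  # threshold quoted in the abstract
print(f"pooled beta = {beta:.4f}, se = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
print("genome-wide significant:", p < GENOME_WIDE_THRESHOLD)
```

Studies with smaller standard errors receive more weight, so the pooled estimate is pulled toward the most precise studies and its standard error is smaller than that of any single study.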
Abstract:
Strawberry (Fragaria × ananassa) is an important soft fruit but is easily infected by pathogens. Anthracnose and gray mold are two of the most destructive diseases of strawberry and lead to serious fruit rot. The first chapter introduces strawberry anthracnose caused by Colletotrichum acutatum and reports the infection strategy, disease cycle and management of C. acutatum on strawberry. Likewise, the second chapter summarizes the infection strategy of Botrytis cinerea and the defense responses of strawberry. It is already known that white unripe strawberry fruits are more resistant to C. acutatum than red ripe fruits. During the interaction between strawberry white/red fruit and C. acutatum, a mannose-binding lectin gene, FaMBL1, was found to be the most up-regulated gene and was induced exclusively in white fruit. FaMBL1 belongs to the G-type lectin family, which has important roles in plant development and defense processes. To gain insight into the role of FaMBL1, genome-wide identification of the G-type lectin gene family was carried out in Fragaria vesca, and the results are shown in chapter 3. G-type lectin genes make up a large family in F. vesca. Active expression upon biotic/abiotic stresses suggests a potential role of G-lectin genes in strawberry defenses. Hence, stable transgenic strawberry plants overexpressing the FaMBL1 gene were generated. Transformed strawberry plants were screened and identified. The results are shown in chapter 4: the content of the disease-related phytohormone jasmonic acid was found to be decreased in overexpressing lines compared with wild type (WT). Petioles of overexpressing lines inoculated with C. fioriniae had lower disease incidence than WT. Leaves of overexpressing lines challenged with B. cinerea showed remarkably smaller lesion diameters compared with WT. The chitinase 2-1 (FaChi2-1) showed higher expression in overexpressing lines than in WT during the interaction with B. cinerea, which could be related to the lower susceptibility of overexpressing lines.
Abstract:
Background: Genome-wide association studies (GWAS) are becoming the approach of choice to identify genetic determinants of complex phenotypes and common diseases. The astonishing amount of generated data and the use of distinct genotyping platforms with variable genomic coverage are still analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotypic structure of the population of interest to infer a badly genotyped or missing marker, and are considered a near-zero-cost approach to allow the comparison and combination of data generated in different studies. Several reports have stated that imputed markers have an overall acceptable accuracy, but no published report has performed a pairwise comparison of imputed and empirical association statistics for a complete set of GWAS markers. Results: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10⁻⁵ for type 2 Diabetes Mellitus and compared them with results obtained based on empirical allelic frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers. Conclusions: Our results suggest that association statistics from imputed markers that fall in specific minor allele frequency (MAF) ranges, are located in weak linkage disequilibrium blocks or deviate strongly from local patterns of association are prone to inflated false-positive association signals.
The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.