939 results for genome wide complex trait analysis


Relevance: 100.00% 100.00%

Abstract:

Background: The methylotrophic, Crabtree-negative yeast Pichia pastoris is widely used as a heterologous protein production host. Strong inducible promoters derived from methanol utilization genes or constitutive glycolytic promoters are typically used to drive gene expression. Notably, genes involved in methanol utilization are repressed not only by glucose but also by glycerol. This unusual regulatory behavior prompted us to study the regulation of carbon substrate utilization in different bioprocess conditions on a genome-wide scale.

Results: We performed microarray analysis on the total mRNA population as well as mRNA that had been fractionated according to ribosome occupancy. Translationally quiescent mRNAs were defined as being associated with single ribosomes (monosomes) and highly translated mRNAs with multiple ribosomes (polysomes). We found that despite their lower growth rates, global translation was most active in methanol-grown P. pastoris cells, followed by excess glycerol- or glucose-grown cells. Transcript-specific translational responses were found to be minimal, while extensive transcriptional regulation was observed for cells grown on different carbon sources. Due to their respiratory metabolism, cells grown in excess glucose or glycerol had very similar expression profiles. Genes subject to glucose repression were mainly involved in the metabolism of alternative carbon sources, including the control of glycerol uptake and metabolism. Peroxisomal and methanol utilization genes were confirmed to be subject to carbon substrate repression in excess glucose or glycerol, but were found to be strongly de-repressed in glucose-limiting conditions (as are often applied in fed-batch cultivations) in addition to induction by methanol.

Conclusions: P. pastoris cells grown in excess glycerol or glucose have similar transcript profiles, in contrast to S. cerevisiae cells, in which the transcriptional response to these carbon sources is very different. The main response to different growth conditions in P. pastoris is transcriptional; translational regulation was not transcript-specific. The high proportion of mRNAs associated with polysomes in methanol-grown cells is a major finding of this study; it reveals that high productivity during methanol induction is directly linked to the growth condition and not only to promoter strength.

Relevance: 100.00% 100.00%

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source.

Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.

The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.

The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.

The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.

The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.

The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.

Relevance: 100.00% 100.00%

Abstract:

Background - Specific language impairment (SLI) is a common neurodevelopmental disorder, observed in 5–10 % of children. Family and twin studies suggest a strong genetic component, but relatively few candidate genes have been reported to date. A recent genome-wide association study (GWAS) described the first statistically significant association specifically for an SLI cohort between a missense variant (rs4280164) in the NOP9 gene and language-related phenotypes under a parent-of-origin model. Replication of these findings is particularly challenging because parental DNA must be available. Methods - We used two independent family-based cohorts characterised with reading- and language-related traits: a longitudinal cohort (n = 106 informative families) including children with language and reading difficulties and a nuclear family cohort (n = 264 families) selected for dyslexia. Results - We observed association with language-related measures when modelling for parent-of-origin effects at the NOP9 locus in both cohorts: minimum P = 0.001 for phonological awareness with a paternal effect in the first cohort and minimum P = 0.0004 for irregular word reading with a maternal effect in the second cohort. Allelic and parental trends were not consistent when compared to the original study. Conclusions - A parent-of-origin effect at this locus was detected in both cohorts, albeit with different trends. These findings contribute to interpreting the original GWAS report and support further investigations of the NOP9 locus and its role in language-related traits. A systematic evaluation of parent-of-origin effects in genetic association studies has the potential to reveal novel mechanisms underlying complex traits.

Relevance: 100.00% 100.00%

Abstract:

A previous genome-wide association study (GWAS) of more than 100,000 individuals identified molecular-genetic predictors of educational attainment. We undertook in-depth life-course investigation of the polygenic score derived from this GWAS using the four-decade Dunedin Study (N = 918). There were five main findings. First, polygenic scores predicted adult economic outcomes even after accounting for educational attainments. Second, genes and environments were correlated: Children with higher polygenic scores were born into better-off homes. Third, children's polygenic scores predicted their adult outcomes even when analyses accounted for their social-class origins; social-mobility analysis showed that children with higher polygenic scores were more upwardly mobile than children with lower scores. Fourth, polygenic scores predicted behavior across the life course, from early acquisition of speech and reading skills through geographic mobility and mate choice and on to financial planning for retirement. Fifth, polygenic-score associations were mediated by psychological characteristics, including intelligence, self-control, and interpersonal skill. Effect sizes were small. Factors connecting DNA sequence with life outcomes may provide targets for interventions to promote population-wide positive development.

Relevance: 100.00% 100.00%

Abstract:

The advent of next-generation sequencing, now nearing a decade in age, has enabled, among other capabilities, measurement of genome-wide sequence features at unprecedented scale and resolution.

In this dissertation, I describe work to understand the genetic underpinnings of non-Hodgkin’s lymphoma through exploration of the epigenetics of its cell of origin, initial characterization and interpretation of driver mutations, and finally, a larger-scale, population-level study that incorporates mutation interpretation with clinical outcome.

In the first research chapter, I describe genomic characteristics of lymphomas through the lens of their cells of origin. Just as many other cancers, such as breast cancer or lung cancer, are categorized based on their cell of origin, lymphoma subtypes can be examined through the context of their normal B Cells of origin, Naïve, Germinal Center, and post-Germinal Center. By applying integrative analysis of the epigenetics of normal B Cells of origin through chromatin-immunoprecipitation sequencing, we find that differences in normal B Cell subtypes are reflected in the mutational landscapes of the cancers that arise from them, namely Mantle Cell, Burkitt, and Diffuse Large B-Cell Lymphoma.

In the next research chapter, I describe our first endeavor into understanding the genetic heterogeneity of Diffuse Large B Cell Lymphoma, the most common form of non-Hodgkin’s lymphoma, which affects 100,000 patients in the world. Through whole-genome sequencing of 1 case as well as whole-exome sequencing of 94 cases, we characterize the most recurrent genetic features of DLBCL and lay the groundwork for a larger study.

In the last research chapter, I describe work to characterize and interpret the whole exomes of 1001 cases of DLBCL in the largest single-cancer study to date. This highly-powered study enabled sub-gene, gene-level, and gene-network level understanding of driver mutations within DLBCL. Moreover, matched genomic and clinical data enabled the connection of these driver mutations to clinical features such as treatment response or overall survival. As sequencing costs continue to drop, whole-exome sequencing will become a routine clinical assay, and another diagnostic dimension in addition to existing methods such as histology. However, to unlock the full utility of sequencing data, we must be able to interpret it. This study undertakes a first step in developing the understanding necessary to uncover the genomic signals of DLBCL hidden within its exomes. However, beyond the scope of this one disease, the experimental and analytical methods can be readily applied to other cancer sequencing studies.

Thus, this dissertation leverages next-generation sequencing analysis to understand the genetic underpinnings of lymphoma, both by examining its normal cells of origin as well as through a large-scale study to sensitively identify recurrently mutated genes and their relationship to clinical outcome.

Relevance: 100.00% 100.00%

Abstract:

BACKGROUND: Multiple recent genome-wide association studies (GWAS) have identified a single nucleotide polymorphism (SNP), rs10771399, at 12p11 that is associated with breast cancer risk.

METHOD: We performed a fine-scale mapping study of a 700 kb region including 441 genotyped and more than 1300 imputed genetic variants in 48,155 cases and 43,612 controls of European descent, 6269 cases and 6624 controls of East Asian descent and 1116 cases and 932 controls of African descent in the Breast Cancer Association Consortium (BCAC; http://bcac.ccge.medschl.cam.ac.uk/ ), and in 15,252 BRCA1 mutation carriers in the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA). Stepwise regression analyses were performed to identify independent association signals. Data from the Encyclopedia of DNA Elements project (ENCODE) and the Cancer Genome Atlas (TCGA) were used for functional annotation.

RESULTS: Analysis of data from European descendants found evidence for four independent association signals at 12p11, represented by rs7297051 (odds ratio (OR) = 1.09, 95 % confidence interval (CI) = 1.06-1.12; P = 3 × 10^-9), rs805510 (OR = 1.08, 95 % CI = 1.04-1.12; P = 2 × 10^-5), and rs1871152 (OR = 1.04, 95 % CI = 1.02-1.06; P = 2 × 10^-4) identified in the general populations, and rs113824616 (P = 7 × 10^-5) identified in the meta-analysis of BCAC ER-negative cases and BRCA1 mutation carriers. SNPs rs7297051, rs805510 and rs113824616 were also associated with breast cancer risk at P < 0.05 in East Asians, but none of the associations were statistically significant in African descendants. Multiple candidate functional variants are located in putative enhancer sequences. Chromatin interaction data suggested that PTHLH was the likely target gene of these enhancers. Of the six variants with the strongest evidence of potential functionality, rs11049453 was statistically significantly associated with the expression of PTHLH and its nearby gene CCDC91 at P < 0.05.

CONCLUSION: This study identified four independent association signals at 12p11 and revealed potentially functional variants, providing additional insights into the underlying biological mechanism(s) for the association observed between variants at 12p11 and breast cancer risk.

Relevance: 100.00% 100.00%

Abstract:

Diabetes is the leading cause of end stage renal disease. Despite evidence for a substantial heritability of diabetic kidney disease, efforts to identify genetic susceptibility variants have had limited success. We extended previous efforts in three dimensions, examining a more comprehensive set of genetic variants in larger numbers of subjects with type 1 diabetes characterized for a wider range of cross-sectional diabetic kidney disease phenotypes. In 2,843 subjects, we estimated that the heritability of diabetic kidney disease was 35% (p = 6 × 10^-3). Genome-wide association analysis and replication in 12,540 individuals identified no single variants reaching stringent levels of significance and, despite excellent power, provided little independent confirmation of previously published associated variants. Whole exome sequencing in 997 subjects failed to identify any large-effect coding alleles of lower frequency influencing the risk of diabetic kidney disease. However, sets of alleles increasing body mass index (p = 2.2 × 10^-5) and the risk of type 2 diabetes (p = 6.1 × 10^-4) were associated with the risk of diabetic kidney disease. We also found genome-wide genetic correlation between diabetic kidney disease and failure at smoking cessation (p = 1.1 × 10^-4). Pathway analysis implicated ascorbate and aldarate metabolism (p = 9 × 10^-6), and pentose and glucuronate interconversions (p = 3 × 10^-6) in pathogenesis of diabetic kidney disease. These data provide further evidence for the role of genetic factors influencing diabetic kidney disease in those with type 1 diabetes and highlight some key pathways that may be responsible. Altogether these results reveal important biology behind the major cause of kidney disease.

Relevance: 100.00% 100.00%

Abstract:

Schistosomiasis is a neglected tropical disease that affects more than 200 million people worldwide. The main disease-causing agents, Schistosoma japonicum, S. mansoni and S. haematobium, are blood flukes that have complex life cycles involving a snail intermediate host. In Asia, S. japonicum causes hepatointestinal disease (schistosomiasis japonica) and is challenging to control due to a broad distribution of its snail hosts and range of animal reservoir hosts. In China, extensive efforts have been underway to control this parasite, but genetic variability in S. japonicum populations could represent an obstacle to eliminating schistosomiasis japonica. Although a draft genome sequence is available for S. japonicum, there has been no previous study of molecular variation in this parasite on a genome-wide scale. In this study, we conducted the first deep genomic exploration of seven S. japonicum populations from mainland China, constructed phylogenies using mitochondrial and nuclear genomic data sets, and established considerable variation between some of the populations in genes inferred to be linked to key cellular processes and/or pathogen-host interactions. Based on the findings from this study, we propose that verifying intraspecific conservation in vaccine or drug target candidates is an important first step toward developing effective vaccines and chemotherapies against schistosomiasis.

Relevance: 100.00% 100.00%

Abstract:

Schistosomiasis is a significant cause of human morbidity and mortality. We performed a genome-wide transcriptional survey of liver biopsies obtained from Chinese patients with chronic schistosomiasis only, or chronic schistosomiasis with a current or past history of viral hepatitis B. Both disease groups were compared with patients with no prior history or indicators of any liver disease. Analysis showed, in the main, downregulation of gene expression, particularly of genes involved in signal transduction via EIF2 and mTOR signalling and of genes associated with cellular remodelling. Genes in immune-associated pathways were also generally downregulated. However, a set of three genes associated with granulocytes (MMP7, CLDN7 and CXCL6) was upregulated. Differential gene profiles unique to schistosomiasis included the gene Granulin, which was decreased despite being generally considered a marker for liver disease, and IGBP2, which is associated with increased liver size and was the most upregulated gene in schistosomiasis-only patients, all of whom presented with hepatomegaly. The unique features of gene expression, in conjunction with previous reports in the murine model of the cellular composition of granulomas, granuloma formation and recovery, provide an increased understanding of the molecular immunopathology and general physiological processes underlying hepatic schistosomiasis.

Relevance: 100.00% 100.00%

Abstract:

The identification of subjects at high risk for Alzheimer’s disease is important for prognosis and early intervention. We investigated the polygenic architecture of Alzheimer’s disease and the accuracy of Alzheimer’s disease prediction models, including and excluding the polygenic component in the model. This study used genotype data from the powerful dataset comprising 17 008 cases and 37 154 controls obtained from the International Genomics of Alzheimer’s Project (IGAP). Polygenic score analysis tested whether the alleles identified to associate with disease in one sample set were significantly enriched in the cases relative to the controls in an independent sample. The disease prediction accuracy was investigated in a subset of the IGAP data, a sample of 3049 cases and 1554 controls (for whom APOE genotype data were available) by means of sensitivity, specificity, area under the receiver operating characteristic curve (AUC) and positive and negative predictive values. We observed significant evidence for a polygenic component enriched in Alzheimer’s disease (P = 4.9 × 10^-26). This enrichment remained significant after APOE and other genome-wide associated regions were excluded (P = 3.4 × 10^-19). The best prediction accuracy, AUC = 78.2% (95% confidence interval 77–80%), was achieved by a logistic regression model with APOE, the polygenic score, sex and age as predictors. In conclusion, Alzheimer’s disease has a significant polygenic component, which has predictive utility for Alzheimer’s disease risk and could be a valuable research tool complementing experimental designs, including preventative clinical trials, stem cell selection and high/low risk clinical studies. In modelling a range of sample disease prevalences, we found that polygenic scores almost double case prediction from chance, with increased prediction at polygenic extremes.
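
The accuracy measures named in this abstract (AUC and positive and negative predictive values at a given disease prevalence) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names are hypothetical, and the numbers in the usage note are made-up inputs chosen only to show how predictive values shift with prevalence.

```python
def auc(case_scores, control_scores):
    """Rank-based AUC: the probability that a randomly chosen case
    has a higher score than a randomly chosen control."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical sensitivity/specificity, evaluated at several prevalences.
    for prevalence in (0.05, 0.17, 0.50):
        ppv, npv = predictive_values(0.80, 0.70, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, the positive predictive value falls as prevalence falls, which is one reason a prediction model is usually evaluated across a range of assumed prevalences rather than at a single value.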

Relevance: 100.00% 100.00%

Abstract:

Genome-wide association studies (GWAS) have identified several risk variants for late-onset Alzheimer's disease (LOAD)1, 2. These common variants have replicable but small effects on LOAD risk and generally do not have obvious functional effects. Low-frequency coding variants, not detected by GWAS, are predicted to include functional variants with larger effects on risk. To identify low-frequency coding variants with large effects on LOAD risk, we carried out whole-exome sequencing (WES) in 14 large LOAD families and follow-up analyses of the candidate variants in several large LOAD case–control data sets. A rare variant in PLD3 (phospholipase D3; Val232Met) segregated with disease status in two independent families and doubled risk for Alzheimer’s disease in seven independent case–control series with a total of more than 11,000 cases and controls of European descent. Gene-based burden analyses in 4,387 cases and controls of European descent and 302 African American cases and controls, with complete sequence data for PLD3, reveal that several variants in this gene increase risk for Alzheimer’s disease in both populations. PLD3 is highly expressed in brain regions that are vulnerable to Alzheimer’s disease pathology, including hippocampus and cortex, and is expressed at significantly lower levels in neurons from Alzheimer’s disease brains compared to control brains. Overexpression of PLD3 leads to a significant decrease in intracellular amyloid-β precursor protein (APP) and extracellular Aβ42 and Aβ40 (the 42- and 40-residue isoforms of the amyloid-β peptide), and knockdown of PLD3 leads to a significant increase in extracellular Aβ42 and Aβ40. Together, our genetic and functional data indicate that carriers of PLD3 coding variants have a twofold increased risk for LOAD and that PLD3 influences APP processing. 
This study provides an example of how densely affected families may help to identify rare variants with large effects on risk for disease or other complex traits.

Relevance: 100.00% 100.00%

Abstract:

Although epidemiological studies suggest that type 2 diabetes mellitus (T2DM) increases the risk of late-onset Alzheimer's disease (LOAD), the biological basis of this relationship is not well understood. The aim of this study was to examine the genetic comorbidity between the 2 disorders and to investigate whether genetic liability to T2DM, estimated by a genotype risk score based on T2DM-associated loci, is associated with increased risk of LOAD. This study was performed in 2 stages. In stage 1, we combined genotypes for the top 15 T2DM-associated polymorphisms drawn from approximately 3000 individuals (1349 cases and 1351 control subjects) with extracted and/or imputed data from 6 genome-wide studies (>10,000 individuals; 4507 cases, 2183 controls, 4989 population controls) to form a genotype risk score and examined whether this was associated with increased LOAD risk in a combined meta-analysis. In stage 2, we investigated the association of LOAD with an expanded T2DM score made of 45 well-established variants drawn from the 6 genome-wide studies. Results were combined in a meta-analysis. Neither the stage 1 nor the stage 2 T2DM risk score was associated with LOAD risk (odds ratio = 0.988; 95% confidence interval, 0.972-1.004; p = 0.144 and odds ratio = 0.993; 95% confidence interval, 0.983-1.003; p = 0.149 per allele, respectively). Contrary to expectation, genotype risk scores based on established T2DM candidates were not associated with increased risk of LOAD. The observed epidemiological associations between T2DM and LOAD could therefore be a consequence of secondary disease processes, pleiotropic mechanisms, and/or common environmental risk factors. Future work should focus on well-characterized longitudinal cohorts with extensive phenotypic and genetic data relevant to both LOAD and T2DM.
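
As a rough illustration of the quantities in this abstract, the sketch below shows one common way to form a genotype risk score (a sum of risk-allele counts, optionally weighted) and how a two-sided p-value can be back-calculated from a reported odds ratio and its 95% confidence interval under a normal approximation on the log-odds scale. Both function names are hypothetical, and the abstract does not say whether the authors weighted their score or how they computed their p-values, so this should be read as an assumption-laden sketch rather than their method.

```python
import math


def genotype_risk_score(risk_allele_counts, weights=None):
    """Sum risk-allele counts (0, 1 or 2 per variant), optionally weighted,
    for example by the log odds ratio of each variant."""
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("weights and allele counts must have equal length")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


def p_value_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p-value from an odds ratio and its 95% CI.

    Assumes the interval is symmetric on the log scale, so the standard
    error is the log-interval width divided by 2 * z_crit.
    """
    std_err = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical individual carrying 0, 1 and 2 risk alleles at three loci.
    print(genotype_risk_score([0, 1, 2]))
    # Stage 1 estimate as reported in the abstract.
    print(p_value_from_or_ci(0.988, 0.972, 1.004))
```

Applied to the stage 1 estimate, the back-calculation lands close to the reported p = 0.144. Because published odds ratios and interval bounds are rounded, such reconstructions are only approximate and can drift noticeably when the interval is narrow.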

Relevance: 100.00% 100.00%

Abstract:

Glaucoma is a heterogeneous group of diseases characterized by apoptosis of retinal ganglion cells and progressive degeneration of the optic nerve. It is the leading cause of irreversible blindness, affecting about 60 million people worldwide. Its most common form is open-angle glaucoma (OAG), a polygenic disorder caused mainly by a genetic predisposition interacting with other risk factors such as age and elevated intraocular pressure (IOP). OAG is a complex genetic disease, although some severe forms are autosomal dominant. Seventeen loci have been linked to the disease and accepted by the Human Genome Organisation (HUGO), and five genes have been identified at these loci (MYOC, OPTN, WDR36, NTF4, ASB10). Recently, genome-wide association studies have identified more than 20 common risk factors with relatively small effects. For more than 50 years, our team has studied 749 members of the large French-Canadian family CA, in which the MYOC K423E mutation causes an autosomal dominant form of OAG with a highly variable age of onset. First, it was shown that this variability in the age of onset of ocular hypertension has a substantial genetic component caused by at least one modifier gene. This modifier interacts with the primary mutation and alters the severity of glaucoma in carriers of MYOC K423E. A candidate modifier gene, WDR36, was genotyped in two large families, CA and BV. Carriers of non-synonymous WDR36 variants together with MYOC K423E in family CA showed a tendency to develop the disease at a younger age. A data mining tool was developed to represent known information about the disease and to facilitate the prioritization of candidate genes. This tool was successfully applied to bipolar depression and to glaucoma.
The next stage of the project is to complete a genome scan of family CA and to sequence the loci in order to identify the variants that modify glaucoma. Eventually, these variants will make it possible to identify individuals whose glaucoma is likely to be more aggressive.

Relevance: 100.00% 100.00%

Abstract:

We analyzed genome-wide association studies (GWASs), including data from 71,638 individuals from four ancestries, for estimated glomerular filtration rate (eGFR), a measure of kidney function used to define chronic kidney disease (CKD). We identified 20 loci attaining genome-wide-significant evidence of association (p < 5 × 10^-8) with kidney function and highlighted that allelic effects on eGFR at lead SNPs are homogeneous across ancestries. We leveraged differences in the pattern of linkage disequilibrium between diverse populations to fine-map the 20 loci through construction of "credible sets" of variants driving eGFR association signals. Credible variants at the 20 eGFR loci were enriched for DNase I hypersensitivity sites (DHSs) in human kidney cells. DHS credible variants were expression quantitative trait loci for NFATC1 and RGS14 (at the SLC34A1 locus) in multiple tissues. Loss-of-function mutations in ancestral orthologs of both genes in Drosophila melanogaster were associated with altered sensitivity to salt stress. Renal mRNA expression of Nfatc1 and Rgs14 in a salt-sensitive mouse model was also reduced after exposure to a high-salt diet or induced CKD. Our study (1) demonstrates the utility of trans-ethnic fine mapping through integration of GWASs involving diverse populations with genomic annotation from relevant tissues to define molecular mechanisms by which association signals exert their effect and (2) suggests that salt sensitivity might be an important marker for biological processes that affect kidney function and CKD in humans.
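
The "credible sets" mentioned in this abstract are commonly built by converting per-variant Bayes factors into posterior probabilities and then keeping the top-ranked variants until a target coverage is reached. The sketch below follows that generic recipe under the assumption of a single causal variant per locus with equal prior weight on every variant; the function name, variant identifiers and Bayes factors are hypothetical, and the abstract does not specify the exact procedure the authors used.

```python
def credible_set(bayes_factors, coverage=0.99):
    """Return the smallest list of variants whose posterior probabilities
    sum to at least `coverage`.

    bayes_factors: dict mapping variant ID -> Bayes factor for association.
    Assumes one causal variant per locus and a flat prior across variants,
    so each posterior is the Bayes factor divided by the sum of all of them.
    """
    total = sum(bayes_factors.values())
    posteriors = {variant: bf / total for variant, bf in bayes_factors.items()}
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)

    selected = []
    cumulative = 0.0
    for variant, posterior in ranked:
        selected.append(variant)
        cumulative += posterior
        if cumulative >= coverage:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical Bayes factors for three variants at one locus.
    example = {"variant_a": 6.0, "variant_b": 3.0, "variant_c": 1.0}
    print(credible_set(example, coverage=0.80))
```

Smaller credible sets indicate finer resolution. Differences in linkage disequilibrium between populations, as the abstract describes, are one way that resolution can improve, because variants that are tightly correlated in one ancestry may be separable in another.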