939 results for genome wide complex trait analysis
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model or, when trait values are not normally distributed, using the Kruskal-Wallis test. Linear regression models assume an additive mode of inheritance via equidistant genotype scores, whereas the Kruskal-Wallis test merely tests for global differences in trait values among the three genotype groups. Both approaches therefore have suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests perform poorly in two common situations: when only a few trait values are available in a rare genotype category (imbalance), and when the values in the three genotype categories have unequal variances (variance heterogeneity). We propose a maximum test based on a Marcus-type multiple contrast test for relative effect sizes. This test allows mode-specific testing of a dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. We also discuss a related all-pairwise comparisons contrast test with range-preserving confidence intervals as an alternative to the Kruskal-Wallis heterogeneity test. Applying the proposed maximum test to the Bogalusa Heart Study dataset, we observed a marked increase in power to detect association, particularly for rare genotypes. Our simulation study also showed that, unlike the standard parametric approaches, the proposed non-parametric tests control the family-wise error rate in the presence of non-normality and variance heterogeneity.
We provide a publicly available R package, nparcomp, that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.
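The contrast structure behind such a maximum test can be illustrated with a small Python sketch. It is a simplified stand-in, not the nparcomp implementation and not necessarily the authors' exact method. It assumes weighted relative effects computed from pooled mid-ranks, p_i = (mean rank of group i − 1/2)/N; it assumes the three contrast rows shown below for the dominant, additive and recessive modes, with genotype groups ordered common homozygote, heterozygote, rare homozygote; and it replaces the multivariate-t reference distribution of the published test with a label-permutation distribution. Because permutation treats the groups as exchangeable, this sketch does not inherit the robustness to variance heterogeneity claimed above. All names (`CONTRASTS`, `relative_effects`, `max_contrast_test`) are hypothetical.

```python
import random
from itertools import chain

# Assumed contrast coefficients; groups ordered (common hom., het., rare hom.).
CONTRASTS = {
    "dominant": (-1.0, 0.5, 0.5),
    "additive": (-1.0, 0.0, 1.0),
    "recessive": (-0.5, -0.5, 1.0),
}


def midranks(values):
    """1-based ranks of the pooled sample; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def relative_effects(groups):
    """Weighted relative effects p_i = (mean rank of group i - 1/2) / N."""
    pooled = list(chain.from_iterable(groups))
    ranks = midranks(pooled)
    n, start, effects = len(pooled), 0, []
    for g in groups:
        effects.append((sum(ranks[start:start + len(g)]) / len(g) - 0.5) / n)
        start += len(g)
    return effects


def contrast_estimates(groups):
    """Contrast of relative effects for each inheritance mode."""
    p = relative_effects(groups)
    return {m: sum(c * pi for c, pi in zip(coef, p)) for m, coef in CONTRASTS.items()}


def max_contrast_test(groups, n_perm=999, seed=0):
    """Max |T| over modes; contrast SDs and the p-value come from label permutations."""
    observed = contrast_estimates(groups)
    pooled = list(chain.from_iterable(groups))
    sizes = [len(g) for g in groups]
    rng = random.Random(seed)
    perm_est = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_groups, start = [], 0
        for s in sizes:
            perm_groups.append(pooled[start:start + s])
            start += s
        perm_est.append(contrast_estimates(perm_groups))
    # Permutation standard deviation of each contrast (its null mean is 0).
    sd = {m: (sum(e[m] ** 2 for e in perm_est) / n_perm) ** 0.5 for m in CONTRASTS}
    t_obs = {m: observed[m] / sd[m] for m in CONTRASTS}
    best = max(t_obs, key=lambda m: abs(t_obs[m]))
    hits = sum(
        max(abs(e[m] / sd[m]) for m in CONTRASTS) >= abs(t_obs[best]) for e in perm_est
    )
    return best, t_obs, (hits + 1) / (n_perm + 1)


# Hypothetical trait values for the three genotype groups (rare group is small).
aa_common = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.05]
ab_het = [1.1, 0.95, 1.2, 1.0, 0.85]
bb_rare = [2.1, 2.4, 1.9]
mode, t_stats, p_value = max_contrast_test([aa_common, ab_het, bb_rare])
print(mode, round(p_value, 3))
```

Standardizing each contrast before taking the maximum matters here: unstandardized, the additive contrast (which compares the two extreme groups) would always be the largest whenever relative effects increase monotonically across genotypes, so the selected mode would carry no information.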
Abstract:
Cancer and cardiovascular diseases are the leading causes of death worldwide. Caused by systemic genetic and molecular disruptions in cells, these disorders are the manifestation of a profound disturbance of normal cellular homeostasis. People suffering from or at high risk for these disorders need early diagnosis and personalized therapeutic intervention. Successful implementation of such clinical measures can significantly improve global health. However, the development of effective therapies is hindered by the difficulty of identifying the genetic and molecular determinants of disease onset; and where therapies already exist, the main challenge is to identify the molecular determinants that drive resistance to them. Thanks to progress in sequencing technologies, access to large genome-wide biological datasets now extends far beyond a few experimental labs to the global research community. This unprecedented availability of data has revolutionized the capabilities of computational researchers, enabling them to collaboratively address long-standing problems from many different perspectives. In this spirit, this thesis tackles two major public-health challenges using data-driven approaches. Numerous association studies have been conducted to identify genomic variants that determine disease. However, their clinical utility remains limited because they cannot distinguish causal variants from merely associated ones. In this thesis, we first propose a simple scheme that improves association studies in a supervised fashion and demonstrate its applicability in identifying genomic regulatory variants associated with hypertension. Next, we propose a coupled Bayesian regression approach, eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combinations of regulatory genomic variants that explain gene expression variance.
On human heart data, eQTeL not only explains a significantly greater proportion of expression variance in samples but also predicts gene expression more accurately than other methods. Using simulations, we demonstrate that eQTeL accurately detects causal regulatory SNPs, particularly those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their targets. eQTeL SNPs capture a substantial proportion of the genetic determinants of expression variance, and we estimate that 58% of these SNPs are putatively causal. So far, the challenge of identifying molecular determinants of cancer resistance could be addressed only through labor-intensive and costly experimental studies, and for experimental drugs such studies are infeasible. Here we take a fundamentally different, data-driven approach to understanding the evolving landscape of emerging resistance. We introduce a novel class of genetic interactions in cancer, termed synthetic rescues (SR): a functional interaction between two genes in which a change in the activity of one vulnerable gene (which may be the target of a cancer drug) is lethal, but a subsequent alteration in the activity of its partner rescuer gene restores cell viability. Next, we describe a comprehensive computational framework, termed INCISOR, for identifying SRs underlying cancer resistance. Applying INCISOR to mine The Cancer Genome Atlas (TCGA), a large collection of cancer patient data, we identified the first pan-cancer SR networks, composed of interactions common to many cancer types. We experimentally tested and validated a subset of these interactions involving the master regulator gene mTOR. We find that rescuer genes become increasingly activated as breast cancer progresses, indicating pervasive ongoing rescue processes.
We show that SRs can successfully predict patients' survival and response to the majority of current cancer drugs and, importantly, the emergence of drug resistance from the initial tumor biopsy. Our analysis suggests a potential new strategy for enhancing the effectiveness of existing cancer therapies by targeting their rescuer genes to counteract resistance. The thesis provides statistical frameworks that can harness ever-increasing high-throughput genomic data to determine the molecular underpinnings of hypertension, cardiovascular disease and cancer resistance. We uncover novel mechanistic insights that can advance early disease prevention and personalized therapeutics. Our analyses shed light on the fundamental biology of gene regulation and interaction, and open up avenues for translational applications in risk prediction and therapeutics.
Abstract:
We show for the first time that, upon injection into the cytoplasm of the oocyte, fluorescein-labeled spliceosomal snRNAs, in the context of functional snRNPs, are targeted to elongating pre-mRNAs. This finding provides a novel assay with which to dissect the mechanism by which snRNPs are targeted to nascent pre-mRNA transcripts. The system offers two immediate advantages. First, it allows us to investigate the mechanisms used to recruit snRNPs as recruitment actually occurs within the cell nucleus. Second, it allows a genome-wide analysis of snRNP recruitment to nascent transcripts, so the conclusions drawn from these studies do not depend on the sequence of any particular promoter or pre-mRNA. Indeed, this assay led us to a highly unexpected discovery: contrary to the current paradigm, co-transcriptional recruitment of splicing snRNPs to nascent transcripts is not contingent on their role in splicing in vivo. Based on these and other data, we propose a two-step recruitment-loading model in which snRNPs are first recruited to pre-mRNA transcripts and only then loaded directly onto cis-acting sequences on the nascent pre-mRNA. While studying snRNP trafficking, we made a further discovery: lampbrush chromosomes can be visualized by light microscopy in vivo, and their architecture is identical to that of chromosomes in formaldehyde-treated nuclear spread preparations. Importantly, we now have the first system with which to examine the dynamic interactions of macromolecules with specific RNA polymerase II transcription units in the live nucleus.
Abstract:
Erratum in: Low-frequency and common genetic variation in ischemic stroke: The METASTROKE collaboration. [Neurology. 2016]
Abstract:
BACKGROUND: More than 80% of all terrestrial plant species establish an arbuscular mycorrhiza (AM) symbiosis with Glomeromycota fungi. This plant-microbe interaction primarily improves phosphate uptake, but also supports nitrogen, mineral, and water acquisition. During the pre-contact stage, the AM symbiosis is controlled by an exchange of diffusible factors from either partner. Among other factors, fungal signals were identified as a mix of sulfated and non-sulfated lipochitooligosaccharides (LCOs), which are structurally related to the rhizobial nodulation (Nod)-factor LCOs that induce the formation of nitrogen-fixing root nodules in legumes. LCO signals are transduced via a common symbiotic signaling pathway (CSSP) that activates a group of GRAS transcription factors (TFs). Using complex gene expression fingerprints as molecular phenotypes, this study primarily aimed to shed light on the importance of the GRAS TFs NSP1 and RAM1 for LCO-activated gene expression during pre-symbiotic signaling. RESULTS: We investigated genome-wide transcriptional responses in 5-day-old primary roots of the Medicago truncatula wild type and four symbiotic mutants to a 6 h challenge with LCO signals supplied at 10⁻⁷/10⁻⁸ M. We show that during the pre-symbiotic stage, gene expression activated by sulfated Myc-, non-sulfated Myc- and Nod-LCOs depends almost exclusively on the LysM receptor kinase NFP and is largely controlled by the CSSP, although responses independent of this pathway exist. Our results show that downstream of the CSSP, gene activation by Myc-LCOs supplied at 10⁻⁷/10⁻⁸ M strictly required both GRAS transcription factors, RAM1 and NSP1, whereas genes co-activated or specifically activated by Nod-LCOs displayed a preferential dependence on NSP1.
RAM1, a central regulator of root colonization by AM fungi, controlled genes activated by non-sulfated Myc-LCOs during the pre-symbiotic stage that are also up-regulated in areas of early physical contact, e.g. hyphopodia and infecting hyphae, linking responses to externally applied LCOs with early root colonization. CONCLUSIONS: Since both RAM1 and NSP1 were essential for pre-symbiotic transcriptional reprogramming by Myc-LCOs, we propose that, downstream of the CSSP, these GRAS transcription factors act synergistically to transduce the diffusible signals that pre-announce the presence of symbiotic fungi.
Abstract:
Knowing a cell's transcriptome is a fundamental prerequisite for analyzing its response to the environment. Microarrays have revolutionized this field because they provide an overview of gene expression under any environmental condition on a genome-wide scale. The technique consists of hybridising a previously labelled nucleic acid sample to a probe (which may be made of cDNA, oligonucleotides or PCR products) anchored to a solid surface (glass, plastic, silicon, etc.), yielding a grid of spots that, after image analysis, reveals which genes are being expressed. However, this is only possible if information on the species' genome is available. Different kinds of expression microarrays exist, depending on the nature of the probe and the method used to synthesise it. This poster covers two of them. Spotted microarrays, for which the probe is synthesised before being fixed to the array, allow two targets to be analysed simultaneously. They are easy to customize but lack high reproducibility and sensitivity. Oligonucleotide microarrays are characterized by synthesis of the probe directly on the array. In this case the probes are invariably oligonucleotides complementary to a small fraction of the gene they represent on the array. Their application is somewhat more restricted, but they are more reproducible. Next-generation sequencing (NGS) approaches to transcriptome studies now deliver a large volume of information in a short time and require less prior information on the target organism than microarrays do, but their high cost limits their use. The versatility of microarrays, together with their lower cost compared with other techniques, makes them an attractive resource for applications that require less complexity.
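For two-channel spotted arrays, where two labelled samples are hybridised to the same spots, a common way to summarise each spot is the log-ratio M = log2(R/G) and the average log-intensity A = (log2 R + log2 G)/2. The Python sketch below assumes background-corrected red and green intensities and applies a simple global median normalisation, which itself assumes that most genes are not differentially expressed. The function names and intensities are hypothetical; this is an illustration, not a substitute for a dedicated microarray analysis package.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return per-spot (M, A): M = log2(R/G), A = (log2 R + log2 G) / 2."""
    pairs = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("background-corrected intensities must be positive")
        pairs.append((math.log2(r / g), 0.5 * (math.log2(r) + math.log2(g))))
    return pairs


def median_normalise(ma_pairs):
    """Shift M so its median is 0 (assumes most genes are not differentially expressed)."""
    centre = median(m for m, _ in ma_pairs)
    return [(m - centre, a) for m, a in ma_pairs]


# Hypothetical intensities for three spots (red = sample 1, green = sample 2).
spots = ma_values([2000.0, 500.0, 300.0], [1000.0, 500.0, 1200.0])
for m, a in median_normalise(spots):
    print(f"M={m:+.2f}  A={a:.2f}")
```

A positive M indicates higher signal in the red-labelled sample and a negative M higher signal in the green-labelled one; A separates low-intensity spots, whose ratios are noisier, from bright ones.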
Integrative genomic, epigenetic and metabolomic characterization of beef from grass-fed Angus steers
Abstract:
Beef is a main component of the American diet and still represents the principal source of protein in many parts of the world. The meat market is currently undergoing an important transformation: consumers are increasingly switching from traditional beef to grass-fed beef, perceiving products from grass-fed animals as more natural and healthy. However, the actual differences between these two production systems remain unclear in several respects. This dissertation provides information from closely related animals, to reduce confounding factors, in order to clarify several unresolved differences between grain-fed and grass-fed beef. First, we examined the growth curve, important economic traits and carcass quality characteristics over four consecutive years in grain-fed and grass-fed animals, generating valuable information for management decisions and economic evaluation of grass-fed cattle operations. Second, we performed the first integrated transcriptomic and metabolomic analysis of grass-fed beef, detecting alterations in glucose metabolism, differences in free fatty acid and carnitine-conjugated lipid levels, and altered β-oxidation. The results suggest that grass-finished beef may benefit consumer health through its lower total fat content and better lipid profile compared with grain-fed beef. Regarding animal welfare, grass-fed animals may also experience less stress than grain-fed individuals. Finally, we compared genome-wide DNA methylation of grass-fed and grain-fed beef using methyl-CpG binding domain sequencing (MBD-Seq), identifying 60 differentially methylated regions (DMRs). Most DMRs were located inside or upstream of genes and showed increased methylation in grass-fed individuals, implying a global increase in DNA methylation in this group.
Interestingly, chromosome 14, which has been associated with large effects on average daily gain (ADG), marbling, back fat, ribeye area and hot carcass weight in beef cattle, harboured the largest number of DMRs (12/60). Pathway analysis identified the skeletal and muscular system as the top-ranked physiological system and function, and placed carbohydrate metabolism, lipid metabolism and tissue morphology among the highest-ranked networks. Therefore, although we recognize some limitations and acknowledge that further examination is required, this project provides the first integrative genomic, epigenetic and metabolomic characterization of beef produced under a grass-fed regimen.
Abstract:
Master's dissertation, Biomedical Sciences, Department of Biomedical Sciences and Medicine, Universidade do Algarve, 2014
Abstract:
Master's dissertation, Oncobiology: Molecular Mechanisms of Cancer, Department of Biomedical Sciences and Medicine, Universidade do Algarve, 2015
Abstract:
Master's dissertation, Quality in Analyses - Erasmus Mundus, Faculty of Sciences and Technology, Universidade do Algarve, 2015
Abstract:
Estimates of effective population size in the Holstein cattle breed have usually been low despite the large number of animals in the breed. Effective population size is inversely related to the rates at which coancestry and inbreeding increase, and these rates have been high as a consequence of intense and accurate selection. Traditionally, coancestry and inbreeding coefficients have been calculated from pedigree data. However, the availability of genome-wide single nucleotide polymorphism (SNP) data has increased interest in calculating these coefficients from molecular data to improve their accuracy. In this study, genomic estimates of coancestry, inbreeding and effective population size were obtained for the Spanish Holstein population and compared with pedigree-based estimates. A total of 11,135 animals genotyped with the Illumina BovineSNP50 BeadChip were available for the study. After applying filtering criteria, the final genomic dataset included 36,693 autosomal SNPs and 10,569 animals. The pedigree of these genotyped animals included 31,203 animals, representing only the last five generations in order to homogenise the amount of pedigree information across animals. Genomic estimates were obtained from identity-by-descent segments (coancestry) or runs of homozygosity (inbreeding). The results indicate that the percentage of variance in pedigree-based estimates explained by genomic estimates was higher for coancestry than for inbreeding. Estimates of effective population size obtained from genome-wide and pedigree information were consistent and ranged from about 66 to 79. These low values emphasize the need to control the rate of increase of coancestry and inbreeding in Holstein selection programmes.
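The inverse relation between effective population size and the rate of inbreeding can be made concrete with the standard textbook relation Ne = 1/(2ΔF), where ΔF = (F_t − F_{t−1})/(1 − F_{t−1}) is the per-generation increase in inbreeding (or coancestry). The Python sketch below also computes a runs-of-homozygosity inbreeding coefficient as the fraction of the autosomal genome covered by ROH. It is a sketch under those assumptions, not necessarily the exact estimator used in the study; the function names and all input values are hypothetical.

```python
def f_roh(roh_lengths_bp, autosome_length_bp):
    """Genomic inbreeding: fraction of the autosomal genome covered by runs of homozygosity."""
    if autosome_length_bp <= 0:
        raise ValueError("autosome length must be positive")
    return sum(roh_lengths_bp) / autosome_length_bp


def delta_f(f_prev, f_curr):
    """Per-generation rate of inbreeding, scaled by the heterozygosity still available."""
    return (f_curr - f_prev) / (1.0 - f_prev)


def effective_size(rate):
    """Effective population size from the per-generation rate: Ne = 1 / (2 * rate)."""
    if rate <= 0:
        raise ValueError("rate of inbreeding must be positive")
    return 1.0 / (2.0 * rate)


# Hypothetical mean ROH lengths (bp) in two successive generations.
f_parents = f_roh([40e6, 35e6], 2.5e9)    # 0.03
f_offspring = f_roh([48e6, 45e6], 2.5e9)  # 0.0372
rate = delta_f(f_parents, f_offspring)
print(round(effective_size(rate), 1))     # prints 67.4
```

With these definitions, a ΔF of 0.0075 per generation corresponds to Ne ≈ 67, which lies inside the 66 to 79 range reported above.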
Abstract:
Inflammatory bowel diseases (IBDs, [MIM 266600]) are characterized by chronic inflammation of the gastrointestinal tract. The two main forms are Crohn's disease (CD) and ulcerative colitis (UC). IBDs are thought to result from a defect in the immune system and in the intestinal epithelium. The latter forms a physical and biochemical barrier that separates our immune system from the commensal and pathogenic microorganisms of the intestinal microflora. A defect in the intestinal epithelial barrier could therefore lead to a sustained immune response against our intestinal microflora. Genome-wide association studies (GWAS) have identified 201 IBD susceptibility regions. Among these, the 1q32 region, associated with CD (p < 2×10⁻¹¹) and UC (p < 6×10⁻⁷), contains 4 genes, including C1orf106, which encodes a protein of unknown function. Re-sequencing of the 1q32 region identified a rare C1orf106 variant, Y333F (MAF < 1%), associated with IBD (p = 0.009). We showed that substituting tyrosine 333 with phenylalanine appears to affect C1orf106 protein stability, as demonstrated when protein synthesis was inhibited with cycloheximide. We determined that C1orf106 is expressed in the colon and the small intestine. Moreover, its expression increases as Caco-2 epithelial cells differentiate into a polarized intestinal epithelium. Its expression profile matches the cell and tissue types affected in IBD. In addition, C1orf106 partially co-localizes with the tight junction marker ZO-1. However, its staining exactly matches that of the adherens junction marker E-cadherin. Tight and adherens junctions are located on the apical side of the intercellular junction, and both are involved in establishing the epithelial barrier.
We therefore tested the impact of C1orf106 on intestinal epithelial permeability. We observed increased epithelial permeability in an intestinal epithelium formed by Caco-2 cells under-expressing C1orf106. Our results suggest that C1orf106 may be the causal gene in the 1q32 region.
Abstract:
Mycobacteria of the Mycobacterium tuberculosis complex (MTBC) greatly affect humans and animals worldwide. The life cycle of mycobacteria is complex, and the mechanisms underlying pathogen infection and survival in host cells are not fully understood. Recently, comparative genomics analyses have provided new insights into the evolution and adaptation of the MTBC to survive inside the host. However, most of this information has been obtained using M. tuberculosis rather than other members of the MTBC such as M. bovis and M. caprae. In this study, the genomes of three M. bovis (MB1, MB3, MB4) and one M. caprae (MB2) field isolates with different lesion score, prevalence and host distribution phenotypes were sequenced. Genome sequence information was used for whole-genome and protein-targeted comparative genomics analyses, with the aim of finding correlates of phenotypic variation with potential implications for tuberculosis (TB) disease risk assessment and control. At the whole-genome level, the results of the first comparative genomics study of field isolates of M. bovis, including M. caprae, showed that, as previously reported for M. tuberculosis, sequential chromosomal nucleotide substitutions were the main driver of M. bovis genome evolution. The phylogenetic analysis provided strong support for the M. bovis/M. caprae clade, while also supporting M. caprae as a separate species. Comparison of the MB1 and MB4 isolates revealed differences in genome sequence, including in gene families that are important for bacterial infection and transmission, highlighting functionally relevant differences between isolates otherwise classified under the same spoligotype.
Strategic protein-targeted analysis of the ESX or type VII secretion system, proteins linking stress response with lipid metabolism, host T cell epitopes of mycobacteria, antigens and a peptidoglycan assembly protein identified new genetic markers and candidate vaccine antigens that warrant further study to develop tools for evaluating the risk of TB disease caused by M. bovis/M. caprae and for TB control in humans and animals.