891 results for genome-wide association
Abstract:
The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha), defined as "Prakriti". To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to the three Prakritis. We found that 52 SNPs (p ≤ 1 × 10^-5) differed significantly between Prakritis, without any confounding effect of stratification, after 10^6 permutations. Principal component analysis (PCA) of these SNPs classified the 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which demonstrates the power of these SNPs for categorization. We further validated our finding with 297 Indian population samples of known ancestry. Subsequently, we found that PGM1 correlates with the phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India's traditional medicine has a genetic basis, and that its Prakriti-based practice, in vogue for many centuries, resonates with personalized medicine.
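The permutation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single SNP coded as minor-allele counts (0, 1 or 2) and compares two groups by their mean allele count; the function name, the two-group simplification and the test statistic are illustrative assumptions, not the authors' pipeline, which compared three Prakritis over 10^6 permutations.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean allele count.

    group_a, group_b: lists of allele counts (0, 1 or 2) for one SNP.
    The statistic and the defaults are assumptions made for illustration.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled genotypes breaks any link to the group labels.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

The smallest p-value this estimator can return is 1 / (n_permutations + 1), which is why a threshold near 10^-5 needs on the order of the 10^6 permutations reported in the abstract.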
Abstract:
Colorectal cancer (CRC) is one of the most frequent neoplasms and an important cause of mortality in the developed world. Mendelian syndromes account for about 5% of the total burden of CRC, with Lynch syndrome and familial adenomatous polyposis being the most common forms. Lynch syndrome tumors develop mainly as a consequence of defective DNA mismatch repair associated with germline mutations in MLH1, MSH2, MSH6 and PMS2. A significant proportion of the variants identified by screening these genes are missense or noncoding changes without a clear pathogenic consequence, and they are designated "variants of uncertain significance"; the c.1852_1853delinsGC (p.K618A) variant in the MLH1 gene is a clear example. The present study assessed this variant as a low-penetrance risk variant for CRC by performing a case-control study within a large cohort from the COGENT consortium-COST Action BM1206, comprising 18,723 individuals (8,055 colorectal cancer cases and 10,668 controls), and a case-only genotype-phenotype correlation with several clinical and pathological characteristics, restricted to the Epicolon cohort. Our results showed no involvement of this variant as a low-penetrance variant in genetic susceptibility to colorectal cancer and no association with any clinical or pathological characteristic, including family history of this neoplasm or of Lynch syndrome.
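A case-control comparison of this kind is commonly summarized as an odds ratio with a confidence interval. The sketch below assumes a simple 2×2 carrier table and a Woolf (log-scale) 95% interval; the abstract does not report carrier counts or name the statistical test used, so the function and any numbers passed to it are hypothetical.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 carrier table.

    All four counts are inputs supplied by the caller; none come from the
    abstract, which gives only the total numbers of cases and controls.
    """
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

An interval that contains 1 is consistent with the kind of null result the abstract describes, although the study's actual analysis may have adjusted for covariates that this sketch ignores.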
Abstract:
We have made a complete set of painting probes for the domestic horse by degenerate oligonucleotide-primed PCR amplification of flow-sorted horse chromosomes. The horse probes, together with a full set of those available for human, were hybridized onto metaphase chromosomes of human, horse and mule. Based on the hybridization results, we have generated genome-wide comparative chromosome maps involving the domestic horse, donkey and human. These maps define the overall distribution and boundaries of evolutionarily conserved chromosomal segments in the three genomes. Our results shed further light on the karyotypic relationships among these species and, in particular, the chromosomal rearrangements that underlie hybrid sterility and the occasional fertility of mules.
Abstract:
Cytosine methylation is important for transposon silencing and epigenetic regulation of endogenous genes, although the extent to which this DNA modification functions to regulate the genome is still unknown. Here we report the first comprehensive DNA methylation map of an entire genome, at 35 base pair resolution, using the flowering plant Arabidopsis thaliana as a model. We find that pericentromeric heterochromatin, repetitive sequences, and regions producing small interfering RNAs are heavily methylated. Unexpectedly, over one-third of expressed genes contain methylation within transcribed regions, whereas only approximately 5% of genes show methylation within promoter regions. Interestingly, genes methylated in transcribed regions are highly expressed and constitutively active, whereas promoter-methylated genes show a greater degree of tissue-specific expression. Whole-genome tiling-array transcriptional profiling of DNA methyltransferase null mutants identified hundreds of genes and intergenic noncoding RNAs with altered expression levels, many of which may be epigenetically controlled by DNA methylation.
Abstract:
Advances in genome technology have facilitated a new understanding of the historical and genetic processes crucial to rapid phenotypic evolution under domestication(1,2). To understand the process of dog diversification better, we conducted an extensive genome-wide survey of more than 48,000 single nucleotide polymorphisms in dogs and their wild progenitor, the grey wolf. Here we show that dog breeds share a higher proportion of multi-locus haplotypes unique to grey wolves from the Middle East, indicating that they are a dominant source of genetic diversity for dogs rather than wolves from east Asia, as suggested by mitochondrial DNA sequence data(3). Furthermore, we find a surprising correspondence between genetic and phenotypic/functional breed groupings but there are exceptions that suggest phenotypic diversification depended in part on the repeated crossing of individuals with novel phenotypes. Our results show that Middle Eastern wolves were a critical source of genome diversity, although interbreeding with local wolf populations clearly occurred elsewhere in the early history of specific lineages. More recently, the evolution of modern dog breeds seems to have been an iterative process that drew on a limited genetic toolkit to create remarkable phenotypic diversity.
Abstract:
Using a combined computational program, we identified 50 potential microRNAs (miRNAs) in Giardia lamblia, one of the most primitive unicellular eukaryotes. These miRNAs are unique to G. lamblia, and no homologues have been found in other organisms.
Abstract:
Background: Cytochrome P450 monooxygenases play key roles in the metabolism of a wide variety of substrates, and they are closely associated with endocellular physiological processes and with detoxification metabolism under environmental exposure. To date, however, none has been systematically characterized in the phylum Ciliophora. Tetrahymena thermophila possesses many advantages as a eukaryotic model organism and exhibits rapid and sensitive responses to xenobiotics, making it an ideal model system for studying the evolutionary and functional diversity of the P450 monooxygenase gene family. Results: A total of 44 putative functional cytochrome P450 genes were identified and classified into 13 families and 21 sub-families according to standard nomenclature. The conserved intron-exon organization and the scaffold localization of tandem repeats within each P450 family clade suggested that the enlargement of the T. thermophila P450 families probably resulted from recent, separate, small duplication events. Gene expression patterns of all T. thermophila P450s during three important cell physiological stages (vegetative growth, starvation and conjugation) were analyzed based on EST and microarray data, and three main categories of expression patterns were postulated. Evolutionary analyses, including codon usage preference, site-specific selection and gene-expression evolution patterns, indicated remarkable divergence among the T. thermophila P450 genes. Conclusion: The characterization, expression and evolutionary analysis of T. thermophila P450 monooxygenase genes in the current study provide useful information for understanding the characteristics and diversity of the P450 genes in the Ciliophora, and provide a baseline for functional analyses of individual P450 isoforms in this model ciliate species.
Abstract:
Background: Serine/threonine kinases (STKs) have been found in an increasing number of prokaryotes, where they play important roles in signal transduction that supplement the well-known role of two-component systems. Cyanobacteria are photoautotrophic prokaryotes able to grow in a wide range of ecological environments, and their signal transduction systems are important in adaptation to the environment. Sequence information from several cyanobacterial genomes offers a unique opportunity to conduct a comprehensive comparative analysis of this kinase family. In this study, we extracted information regarding Ser/Thr kinases from 21 species of sequenced cyanobacteria and investigated their diversity, conservation, domain structure, and evolution. Results: 286 putative STK homologues were identified. STKs are absent in four Prochlorococcus strains and one marine Synechococcus strain and abundant in filamentous nitrogen-fixing cyanobacteria. Motifs and invariant amino acids typical of eukaryotic STKs were well conserved in these proteins, and six more cyanobacteria- or bacteria-specific conserved residues were found. These STK proteins were classified into three major families according to their domain structures. Fourteen types and a total of 131 additional domains were identified, some of which are reported to participate in the recognition of signals or substrates. Cyanobacterial STKs show rather complicated phylogenetic relationships that correspond poorly with phylogenies based on 16S rRNA and with those based on additional domains. Conclusion: The number of STK genes in different cyanobacteria is the result of the genome size, ecophysiology, and physiological properties of the organism. Similar conserved motifs and amino acids indicate that cyanobacterial STKs use a catalytic mechanism similar to that of eukaryotic STKs. Gene gain and loss is significant during STK evolution, along with domain shuffling and insertion. This study has established an overall framework of sequence-structure-function interactions for the STK gene family, which may facilitate further studies of the role of STKs in various organisms.
Genome-wide analysis of restriction-modification system in unicellular and filamentous cyanobacteria
Abstract:
Cyanobacteria are an ancient group of gram-negative bacteria whose genome sizes vary widely, from 1.6 to 9.1 Mb. Here, we first retrieved all the putative restriction-modification (RM) genes in the draft genome of Spirulina and then performed a range of comparative and bioinformatic analyses on RM genes from unicellular and filamentous cyanobacterial genomes. We identified 6 gene clusters containing putative Type I RMs and 11 putative Type II RMs or solitary methyltransferases (MTases). RT-PCR analysis reveals that 6 of 18 MTases are not expressed in Spirulina, whereas one hsdM gene, with a mutated cognate hsdS, was found to be expressed. Our results indicate that the number of RM genes in filamentous cyanobacteria is significantly higher than in unicellular species, and this expansion of RM systems in filamentous cyanobacteria may be related to their wide range of ecological tolerance. Furthermore, a coevolutionary pattern is found between hsdM and hsdR, with a large number of site pairs positively or negatively correlated, indicating the functional importance of these pairing interactions between their tertiary structures. No evidence of positive selection is found for the majority of RMs, e.g., the hsdM, hsdS, hsdR, and Type II restriction endonuclease gene families, while a group of MTases exhibits a remarkable signature of adaptive evolution. The sites and genes identified here as having been under positive selection provide targets for further structural and functional research.
Abstract:
BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. 
Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association to a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
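The idea of turning functional annotations into a prior and combining it with association evidence can be sketched as follows. The logistic link, the annotation names, and the use of a Bayes factor as the evidence summary are assumptions made for illustration; the abstract states only that the model is a binary regression with penalized coefficients and does not give its exact form.

```python
import math


def prior_probability(annotations, coefficients, intercept):
    """Prior probability of association from a logistic model (assumed link).

    annotations: dict mapping annotation name -> value for one SNP.
    coefficients: dict mapping annotation name -> fitted weight (hypothetical).
    """
    score = intercept + sum(
        coefficients[name] * value for name, value in annotations.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def posterior_probability(prior, bayes_factor):
    """Update a prior with a Bayes factor: posterior odds = prior odds * BF."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)
```

Under this sketch, two SNPs with identical association evidence can end up with different posterior probabilities when their annotations differ, which is the mechanism by which an informative prior can improve power.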
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
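A toy version of the fitting strategy described in this paragraph (a correlation-based objective explored by a Markov chain) might look like the sketch below. The one-parameter occupancy formula, the Metropolis-style acceptance rule, and all names are illustrative assumptions; the dissertation's actual thermodynamic model and sampler are not specified in the abstract.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def predicted_occupancy(affinities, concentration):
    """Assumed single-site binding curve: c*w / (1 + c*w) per position."""
    return [concentration * w / (1.0 + concentration * w) for w in affinities]


def fit_concentration(affinities, observed, n_steps=500, step=0.3,
                      temperature=0.01, seed=0):
    """Random-walk search over log(concentration) that favors high correlation.

    Returns (best concentration, best correlation, starting correlation).
    """
    rng = random.Random(seed)
    log_c = 0.0
    current = pearson(predicted_occupancy(affinities, math.exp(log_c)), observed)
    initial = current
    best_log_c, best = log_c, current
    for _ in range(n_steps):
        proposal = log_c + rng.gauss(0.0, step)
        score = pearson(
            predicted_occupancy(affinities, math.exp(proposal)), observed
        )
        # Always accept improvements; accept worse moves with small probability.
        if score >= current or rng.random() < math.exp((score - current) / temperature):
            log_c, current = proposal, score
            if current > best:
                best_log_c, best = log_c, current
    return math.exp(best_log_c), best, initial
```

Because correlation ignores scale and offset, an objective of this kind constrains the shape of the predicted profile rather than its absolute level, which is one reason additional information such as measured protein concentrations can be useful as a prior.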
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
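The scoring idea in this paragraph can be illustrated with a deliberately simplified sketch: each candidate window is scored by the log ratio of its likelihood under a "nucleosome" model (few DNase I cuts) and a "background" model (more cuts). The Poisson rates and function names are assumptions, and with fixed rates this is a likelihood ratio between two point hypotheses; the quadratic and oscillatory digestion pattern that the dissertation models explicitly is not reproduced here.

```python
import math


def log_bayes_factor(cut_counts, rate_nucleosome, rate_background):
    """Log likelihood ratio of nucleosome vs. background for one window.

    Each position is modeled as Poisson with a fixed rate (an assumption);
    the log-factorial terms are identical in both models and cancel.
    """
    log_ratio = math.log(rate_nucleosome) - math.log(rate_background)
    return sum(
        x * log_ratio - (rate_nucleosome - rate_background) for x in cut_counts
    )


def score_positions(cut_counts, window, rate_nucleosome, rate_background):
    """Score every window start along the track; higher means more nucleosome-like."""
    return [
        log_bayes_factor(cut_counts[i:i + window], rate_nucleosome, rate_background)
        for i in range(len(cut_counts) - window + 1)
    ]
```

Summing evidence over all positions in a window is a small-scale analogue of jointly modeling data points across the nucleosome body rather than judging each position separately.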
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
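As a rough illustration of how several data types can feed one hidden-state model, the sketch below runs forward-backward inference in a two-state chain (unbound and bound) in which each data track contributes an independent emission term. Binary tracks, fixed parameters, and all names are simplifying assumptions; the dissertation's SSM and its expectation-maximization procedure for learning parameters are not reproduced.

```python
def posterior_states(tracks, emit, trans, init):
    """Posterior state probabilities at each position via forward-backward.

    tracks: list of equal-length 0/1 sequences, one per data type.
    emit[k][s]: assumed probability that track k shows a 1 in state s.
    trans[r][s]: transition probability from state r to state s.
    init[s]: initial state probability.
    """
    n = len(tracks[0])
    n_states = len(init)

    def emission(state, t):
        # Tracks are treated as conditionally independent given the state.
        p = 1.0
        for k, track in enumerate(tracks):
            p1 = emit[k][state]
            p *= p1 if track[t] == 1 else 1.0 - p1
        return p

    # Forward pass with per-position normalization to avoid underflow.
    alpha = []
    for t in range(n):
        if t == 0:
            row = [init[s] * emission(s, 0) for s in range(n_states)]
        else:
            prev = alpha[-1]
            row = [
                emission(s, t) * sum(prev[r] * trans[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        total = sum(row)
        alpha.append([v / total for v in row])

    # Backward pass, also normalized.
    beta = [[1.0] * n_states for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [
            sum(trans[s][r] * emission(r, t + 1) * beta[t + 1][r]
                for r in range(n_states))
            for s in range(n_states)
        ]
        total = sum(row)
        beta[t] = [v / total for v in row]

    posterior = []
    for t in range(n):
        row = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        total = sum(row)
        posterior.append([v / total for v in row])
    return posterior
```

Calling the function with one track and then with two gives a simple way to see how an additional data type changes the inferred binding posterior, in the spirit of the incremental comparison described above.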
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.
Abstract:
The TET enzymes convert methylcytosine to the newly discovered base hydroxymethylcytosine. While recent reports suggest that TETs may play a role in response to oxidative stress, this role remains uncertain, and results lack in vivo models. Here we show a global decrease of hydroxymethylcytosine in cells treated with buthionine sulfoximine, and in mice depleted for the major antioxidant enzymes GPx1 and 2. Furthermore, genome-wide profiling revealed differentially hydroxymethylated regions in coding genes, and intriguingly in microRNA genes, both involved in response to oxidative stress. These results thus suggest a profound effect of in vivo oxidative stress on the global hydroxymethylome.