931 resultados para wide genome sequencing
Resumo:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.
Resumo:
The use of microsatellite markers in large-scale genetic studies is limited by its low throughput and high cost and labor requirements. Here, we provide a panel of 45 multiplex PCRs for fast and cost-efficient genome-wide fluorescence-based microsatellite analysis in grapevine. The developed multiplex PCRs panel (with up to 15-plex) enables the scoring of 270 loci covering all the grapevine genome (9 to 20 loci/chromosome) using only 45 PCRs and sequencer runs. The 45 multiplex PCRs were validated using a diverse grapevine collection of 207 accessions, selected to represent most of the cultivated Vitis vinifera genetic diversity. Particular attention was paid to quality control throughout the whole process (assay replication, null allele detection, ease of scoring). Genetic diversity summary statistics and features of electrophoretic profiles for each studied marker are provided, as are the genotypes of 25 common cultivars that could be used as references in other studies.
Resumo:
The pufferfish Fugu rubripes has a genome ≈7.5 times smaller than that of mammals but with a similar number of genes. Although conserved synteny has been demonstrated between pufferfish and mammals across some regions of the genome, there is some controversy as to what extent Fugu will be a useful model for the human genome, e.g., [Gilley, J., Armes, N. & Fried, M. (1997) Nature (London) 385, 305–306]. We report extensive conservation of synteny between a 1.5-Mb region of human chromosome 11 and <100 kb of the Fugu genome in three overlapping cosmids. Our findings support the idea that the majority of DNA in the region of human chromosome 11p13 is intergenic. Comparative analysis of three unrelated genes with quite different roles, WT1, RCN1, and PAX6, has revealed differences in their structural evolution. Whereas the human WT1 gene can generate 16 protein isoforms via a combination of alternative splicing, RNA editing, and alternative start site usage, our data predict that Fugu WT1 is capable of generating only two isoforms. This raises the question of the extent to which the evolution of WT1 isoforms is related to the evolution of the mammalian genitourinary system. In addition, this region of the Fugu genome shows a much greater overall compaction than usual but with significant noncoding homology observed at the PAX6 locus, implying that comparative genomics has identified regulatory elements associated with this gene.
Resumo:
ETS transcription factors play important roles in hematopoiesis, angiogenesis, and organogenesis during murine development. The ETS genes also have a role in neoplasia, for example in Ewing’s sarcomas and retrovirally induced cancers. The ETS genes encode transcription factors that bind to specific DNA sequences and activate transcription of various cellular and viral genes. To isolate novel ETS target genes, we used two approaches. In the first approach, we isolated genes by the RNA differential display technique. Previously, we have shown that the overexpression of ETS1 and ETS2 genes effects transformation of NIH 3T3 cells and specific transformants produce high levels of the ETS proteins. To isolate ETS1 and ETS2 responsive genes in these transformed cells, we prepared RNA from ETS1, ETS2 transformants, and normal NIH 3T3 cell lines and converted it into cDNA. This cDNA was amplified by PCR and displayed on sequencing gels. The differentially displayed bands were subcloned into plasmid vectors. By Northern blot analysis, several clones showed differential patterns of mRNA expression in the NIH 3T3-, ETS1-, and ETS2-expressing cell lines. Sixteen clones were analyzed by DNA sequence analysis, and 13 of them appeared to be unique because their DNA sequences did not match with any of the known genes present in the gene bank. Three known genes were found to be identical to the CArG box binding factor, phospholipase A2-activating protein, and early growth response 1 (Egr1) genes. In the second approach, to isolate ETS target promoters directly, we performed ETS1 binding with MboI-cleaved genomic DNA in the presence of a specific mAb followed by whole genome PCR. The immune complex-bound ETS binding sites containing DNA fragments were amplified and subcloned into pBluescript and subjected to DNA sequence and computer analysis. We found that, of a large number of clones isolated, 43 represented unique sequences not previously identified. Three clones turned out to contain regulatory sequences derived from human serglycin, preproapolipoprotein C II, and Egr1 genes. The ETS binding sites derived from these three regulatory sequences showed specific binding with recombinant ETS proteins. Of interest, Egr1 was identified by both of these techniques, suggesting strongly that it is indeed an ETS target gene.
Resumo:
A loxP-transposon retrofitting strategy for generating large nested deletions from one end of the insert DNA in bacterial artificial chromosomes and P1 artificial chromosomes was described recently [Chatterjee, P. K. & Coren, J. S. (1997) Nucleic Acids Res. 25, 2205–2212]. In this report, we combine this procedure with direct sequencing of nested-deletion templates by using primers located in the transposon end to illustrate its value for position-specific single-nucleotide polymorphism (SNP) discovery from chosen regions of large insert clones. A simple ampicillin sensitivity screen was developed to facilitate identification and recovery of deletion clones free of transduced transposon plasmid. This directed approach requires minimal DNA sequencing, and no in vitro subclone library generation; positionally oriented SNPs are a consequence of the method. The procedure is used to discover new SNPs as well as physically map those identified from random subcloned libraries or sequence databases. The deletion templates, positioned SNPs, and markers are also used to orient large insert clones into a contig. The deletion clone can serve as a ready resource for future functional genomic studies because each carries a mammalian cell-specific antibiotic resistance gene from the transposon. Furthermore, the technique should be especially applicable to the analysis of genomes for which a full genome sequence or radiation hybrid cell lines are unavailable.
Resumo:
Multiple-complete-digest mapping is a DNA mapping technique based on complete-restriction-digest fingerprints of a set of clones that provides highly redundant coverage of the mapping target. The maps assembled from these fingerprints order both the clones and the restriction fragments. Maps are coordinated across three enzymes in the examples presented. Starting with yeast artificial chromosome contigs from the 7q31.3 and 7p14 regions of the human genome, we have produced cosmid-based maps spanning more than one million base pairs. Each yeast artificial chromosome is first subcloned into cosmids at a redundancy of ×15–30. Complete-digest fragments are electrophoresed on agarose gels, poststained, and imaged on a fluorescent scanner. Aberrant clones that are not representative of the underlying genome are rejected in the map construction process. Almost every restriction fragment is ordered, allowing selection of minimal tiling paths with clone-to-clone overlaps of only a few thousand base pairs. These maps demonstrate the practicality of applying the experimental and software-based steps in multiple-complete-digest mapping to a target of significant size and complexity. We present evidence that the maps are sufficiently accurate to validate both the clones selected for sequencing and the sequence assemblies obtained once these clones have been sequenced by a “shotgun” method.
Resumo:
Nonsyndromic clefting of the lip and palate in humans has a highly complex etiology, with both multiple genetic loci and exposure to teratogens influencing susceptibility. Previous studies using mouse models have examined only very small portions of the genome. Here we report the findings of a genome-wide search for susceptibility genes for teratogen-induced clefting in the AXB and BXA set of recombinant inbred mouse strains. We compare results obtained using phenytoin (which induces cleft lip) and 6-aminonicotinamide (which induces cleft palate). We use a new statistical approach based on logistic regression suitable for these categorical data to identify several chromosomal regions as possible locations of clefting susceptibility loci, and we review candidate genes located within each region. Because cleft lip and cleft palate do not frequently co-aggregate in human families and because these structures arise semi-independently during development, these disorders are usually considered to be distinct in etiology. Our data, however, implicate several of the same chromosomal regions for both forms of clefting when teratogen-induced. Furthermore, different parental strain alleles are usually associated with clefting of the lip versus that of the palate (i.e., allelic heterogeneity). Because several other chromosomal regions are associated with only one form of clefting, locus heterogeneity also appears to be involved. Our findings in this mouse model suggest several priority areas for evaluation in human epidemiological studies.
Resumo:
The function of many of the uncharacterized open reading frames discovered by genomic sequencing can be determined at the level of expressed gene products, the proteome. However, identifying the cognate gene from minute amounts of protein has been one of the major problems in molecular biology. Using yeast as an example, we demonstrate here that mass spectrometric protein identification is a general solution to this problem given a completely sequenced genome. As a first screen, our strategy uses automated laser desorption ionization mass spectrometry of the peptide mixtures produced by in-gel tryptic digestion of a protein. Up to 90% of proteins are identified by searching sequence data bases by lists of peptide masses obtained with high accuracy. The remaining proteins are identified by partially sequencing several peptides of the unseparated mixture by nanoelectrospray tandem mass spectrometry followed by data base searching with multiple peptide sequence tags. In blind trials, the method led to unambiguous identification in all cases. In the largest individual protein identification project to date, a total of 150 gel spots—many of them at subpicomole amounts—were successfully analyzed, greatly enlarging a yeast two-dimensional gel data base. More than 32 proteins were novel and matched to previously uncharacterized open reading frames in the yeast genome. This study establishes that mass spectrometry provides the required throughput, the certainty of identification, and the general applicability to serve as the method of choice to connect genome and proteome.
Resumo:
Hepatitis δ virus (HDV) replicates its circular RNA genome via a rolling circle mechanism. During this process, cis-acting ribozymes cleave adjacent upstream sequences and thereby resolve replication intermediates to unit-length RNA. The subsequent ligation of these 5′OH and 2′,3′-cyclic phosphate termini to form circular RNA is an essential step in the life cycle of the virus. Here we present evidence for the involvement of a host activity in the ligation of HDV RNA. We used both HDV and hammerhead ribozymes to generate a panel of HDV and non-HDV RNA substrates that bear 5′ hydroxyl and 2′,3′- cyclic phosphate termini. We found that ligation of these substrates occurred in host cells, but not in vitro or in Escherichia coli. The host-specific ligation activity was capable of joining RNA in both bimolecular and intramolecular reactions and functioned in a sequence-independent manner. We conclude that mammalian cells contain a default pathway that efficiently circularizes ribozyme processed RNAs. This pathway could be exploited in the delivery of stable antisense and decoy RNA to the nucleus.
Resumo:
The Mouse Genome Database (MGD) is the community database resource for the laboratory mouse, a key model organism for interpreting the human genome and for understanding human biology and disease (http://www.informatics.jax.org). MGD provides standard nomenclature and consensus map positions for mouse genes and genetic markers; it provides a curated set of mammalian homology records, user-defined chromosomal maps, experimental data sets and the definitive mouse ‘gene to sequence’ reference set for the research community. The integration and standardization of these data sets facilitates the transition between mouse DNA sequence, gene and phenotype annotations. A recent focus on allele and phenotype representations enhances the ability of MGD to organize and present data for exploring the relationship between genotype and phenotype. This link between the genome and the biology of the mouse is especially important as phenotype information grows from large mutagenesis projects and genotype information grows from large-scale sequencing projects.
Resumo:
Upon the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at genome-www.stanford.edu/Saccharomyces/.
Resumo:
The ARKdb genome databases provide comprehensive public repositories for genome mapping data from farmed species and other animals (http://www.thearkdb.org) providing a resource similar in function to that offered by GDB or MGD for human or mouse genome mapping data, respectively. Because we have attempted to build a generic mapping database, the system has wide utility, particularly for those species for which development of a specific resource would be prohibitive. The ARKdb genome database model has been implemented for 10 species to date. These are pig, chicken, sheep, cattle, horse, deer, tilapia, cat, turkey and salmon. Access to the ARKdb databases is effected via the World Wide Web using the ARKdb browser and Anubis map viewer. The information stored includes details of loci, maps, experimental methods and the source references. Links to other information sources such as PubMed and EMBL/GenBank are provided. Responsibility for data entry and curation is shared amongst scientists active in genome research in the species of interest. Mirror sites in the United States are maintained in addition to the central genome server at Roslin.
Resumo:
GOBASE (http://megasun.bch.umontreal.ca/gobase/) is a network-accessible biological database, which is unique in bringing together diverse biological data on organelles with taxonomically broad coverage, and in furnishing data that have been exhaustively verified and completed by experts. So far, we have focused on mitochondrial data: GOBASE contains all published nucleotide and protein sequences encoded by mitochondrial genomes, selected RNA secondary structures of mitochondria-encoded molecules, genetic maps of completely sequenced genomes, taxonomic information for all species whose sequences are present in the database and organismal descriptions of key protistan eukaryotes. All of these data have been integrated and organized in a formal database structure to allow sophisticated biological queries using terms that are inherent in biological concepts. Most importantly, data have been validated, completed, corrected and standardized, a prerequisite of meaningful analysis. In addition, where critical data are lacking, such as genetic maps and RNA secondary structures, they are generated by the GOBASE team and collaborators, and added to the database. The database is implemented in a relational database management system, but features an object-oriented view of the biological data through a Web/Genera-generated World Wide Web interface. Finally, we have developed software for database curation (i.e. data updates, validation and correction), which will be described in some detail in this paper.
Resumo:
The Medicago Genome Initiative (MGI) is a database of EST sequences of the model legume Medicago truncatula. The database is available to the public and has resulted from a collaborative research effort between the Samuel Roberts Noble Foundation and the National Center for Genome Resources to investigate the genome of M.truncatula. MGI is part of the greater integrated Medicago functional genomics program at the Noble Foundation (http://www.noble .org), which is taking a global approach in studying the genetic and biochemical events associated with the growth, development and environmental interactions of this model legume. Our approach will include: large-scale EST sequencing, gene expression profiling, the generation of M.truncatula activation-tagged and promoter trap insertion mutants, high-throughput metabolic profiling, and proteome studies. These multidisciplinary information pools will be interfaced with one another to provide scientists with an integrated, holistic set of tools to address fundamental questions pertaining to legume biology. The public interface to the MGI database can be accessed at http://www.ncgr.org/research/mgi.
Resumo:
We used genome-wide expression analysis to explore how gene expression in Saccharomyces cerevisiae is remodeled in response to various changes in extracellular environment, including changes in temperature, oxidation, nutrients, pH, and osmolarity. The results demonstrate that more than half of the genome is involved in various responses to environmental change and identify the global set of genes induced and repressed by each condition. These data implicate a substantial number of previously uncharacterized genes in these responses and reveal a signature common to environmental responses that involves ∼10% of yeast genes. The results of expression analysis with MSN2/MSN4 mutants support the model that the Msn2/Msn4 activators induce the common response to environmental change. These results provide a global description of the transcriptional response to environmental change and extend our understanding of the role of activators in effecting this response.