964 results for comparative genomics
Adaptive Mechanisms of an Estuarine Synechococcus based on Genomics, Transcriptomics, and Proteomics
Abstract:
Picocyanobacteria are important phytoplankton and primary producers in the ocean. Although extensive work has been conducted on picocyanobacteria (i.e. Synechococcus and Prochlorococcus) in coastal and oceanic waters, little is known about those found in estuaries like the Chesapeake Bay. Synechococcus CB0101, an estuarine isolate, is more tolerant to shifts in temperature, salinity, and metal toxicity than the coastal and oceanic Synechococcus strains WH7803 and WH7805. Further, CB0101 has a greater sensitivity to high light intensity, likely due to its adaptation to low light environments. A complete, annotated genome sequence of CB0101 was produced to explore its genetic capacity and to serve as a basis for further molecular analysis. Comparative genomics between CB0101, WH7803, and WH7805 shows that CB0101 contains more genes involved in regulation, sensing, and stress response. At the transcript and protein level, CB0101 regulates its metabolic pathways, transport systems, and sensing mechanisms when nitrate and phosphate are limited. Zinc toxicity led to oxidative stress and a global downregulation of photosystems and the translation machinery. From the stress response studies, seven chromosomal toxin-antitoxin (TA) genes were identified in CB0101, which led to the discovery of TA genes in several marine Synechococcus strains. The activation of the relB2/relE1 TA system allows CB0101 to arrest its growth under stressful conditions, but the growth arrest is reversible once the stressful environment dissipates. The genome of CB0101 contains a relatively large number of genomic island (GI) genes compared to known marine Synechococcus genomes. Interestingly, a massive shutdown of GI genes (255 out of 343) occurred after CB0101 was infected by a lytic phage. On the other hand, phage-encoded host-like proteins (hli, psbA, ThyX) were highly expressed upon phage infection.
This research provides new evidence that estuarine Synechococcus like CB0101 have inherited unique genetic machinery, which allows them to be versatile in the estuarine environment.
Abstract:
The last few years have seen dramatic advances in genomics, including the discovery of a large number of non-coding and antisense transcripts. This has revolutionised our understanding of multifaceted transcript structures found within gene loci and their roles in the regulation of development, neurogenesis and other complex processes. The recent and continuing surge of knowledge has prompted researchers to reassess and further dissect gene loci. The ghrelin gene (GHRL) gives rise to preproghrelin, which in turn produces ghrelin, a 28 amino acid peptide hormone that acts via the ghrelin receptor (growth hormone secretagogue receptor/GHSR 1a). Ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, and cancer development. A truncated receptor splice variant, GHSR 1b, does not bind ghrelin, but dimerises with GHSR 1a, and may act as a dominant negative receptor. The gene products of ghrelin and its receptor are frequently overexpressed in human cancer. While it is well known that the ghrelin axis (ghrelin and its receptor) plays a range of important functional roles, little is known about the molecular structure and regulation of the ghrelin gene (GHRL) and ghrelin receptor gene (GHSR). This thesis reports the re-annotation of the ghrelin gene, discovery of alternative 5' exons and transcription start sites, as well as the description of a number of novel splice variants, including isoforms with a putative signal peptide. We also describe the discovery and characterisation of a ghrelin antisense gene (GHRLOS), and the discovery and expression of a ghrelin receptor (growth hormone secretagogue receptor/GHSR) antisense gene (GHSR-OS). We have identified numerous ghrelin-derived transcripts, including variants with extended 5' untranslated regions and putative secreted obestatin and C-ghrelin transcripts.
These transcripts initiate from novel first exons, exon -1, exon 0 and a 5' extended exon 1, with multiple transcription start sites. We used comparative genomics to identify, and RT-PCR to experimentally verify, that the proximal exon 0 and 5' extended exon 1 are transcribed in the mouse ghrelin gene, which suggests the mouse and human proximal first exon architecture is conserved. We have identified numerous novel antisense transcripts in the ghrelin locus. A candidate non-coding endogenous natural antisense gene (GHRLOS) was cloned and demonstrates very low expression levels in the stomach and high levels in the thymus, testis and brain - all major tissues of non-coding RNA expression. Next, we examined if transcription occurs in the antisense orientation to the ghrelin receptor gene, GHSR. A novel gene (GHSR-OS) on the opposite strand of intron 1 of the GHSR gene was identified and characterised using strand-specific RT-PCR and rapid amplification of cDNA ends (RACE). GHSR-OS is differentially expressed and a candidate non-coding RNA gene. In summary, this study has characterised the ghrelin and ghrelin receptor loci and demonstrated natural antisense transcripts to ghrelin and its receptor. Our preliminary work shows that the ghrelin axis generates a broad and complex transcriptional repertoire. This study provides the basis for detailed functional studies of the ghrelin and GHSR loci, and future studies will be needed to further unravel the function and the diagnostic and therapeutic potential of the ghrelin axis.
Abstract:
Understanding the complexities that are involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are also a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein coding genetic paradigm. Latent Variable Models are well suited for the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this oft thought useless vestige remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for the association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression.
This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation to the Bayesian multiple changepoint model on aligned DNA sequences for more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function such as GC content and type of mutation into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.
Abstract:
BLAST Atlas is a visual analysis system for comparative genomics that supports genome-wide gene characterisation, functional assignment and function-based browsing of one or more chromosomes. Inspired by applications such as the WorldWide Telescope, Bing Maps 3D and Google Earth, BLAST Atlas uses novel three-dimensional gene and function views that provide a highly interactive and intuitive way for scientists to navigate, query and compare gene annotations. The system can be used for gene identification and functional assignment or as a function-based multiple genome comparison tool which complements existing position based comparison and alignment viewers.
Abstract:
Following the completion of the draft Human Genome in 2001, genomic sequence data is becoming available at an accelerating rate, fueled by advances in sequencing and computational technology. Meanwhile, large collections of astronomical and geospatial data have allowed the creation of virtual observatories, accessible throughout the world and requiring only commodity hardware. Through a combination of advances in data management, data mining and visualization, this infrastructure enables the development of new scientific and educational applications as diverse as galaxy classification and real-time tracking of earthquakes and volcanic plumes. In the present paper, we describe steps taken along a similar path towards a virtual observatory for genomes – an immersive three-dimensional visual navigation and query system for comparative genomic data.
Abstract:
MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large scale comparative genomics, often involving operations and data relationships which deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters (binding sites for regulatory proteins) across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
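The pattern described in this abstract, independent per-region searches followed by a unifying consensus step, can be sketched in plain Python. This is only an illustration and not the authors' Hadoop workflow: the function names, the toy sequences, and the use of an exact seed match as a stand-in for a real promoter-finding tool are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Tuple


def map_find_sites(sequence: str, seed: str, width: int) -> Iterator[Tuple[str, str]]:
    """Map step (hypothetical): emit (seed, window) for every full-width
    window of the region that starts with the seed."""
    start = sequence.find(seed)
    while start != -1:
        window = sequence[start:start + width]
        if len(window) == width:
            yield seed, window
        start = sequence.find(seed, start + 1)


def reduce_consensus(windows: List[str]) -> str:
    """Reduce step (hypothetical): take the most common base in each column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*windows))


def run_job(regions: Dict[str, str], seed: str, width: int) -> Dict[str, str]:
    """Run map over each region independently, group by key, then reduce."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for sequence in regions.values():
        for key, window in map_find_sites(sequence, seed, width):
            grouped[key].append(window)
    return {key: reduce_consensus(windows) for key, windows in grouped.items()}


if __name__ == "__main__":
    # Toy gene regions invented for illustration only.
    toy_regions = {
        "geneA": "GGTATAATGC",
        "geneB": "CCTATAAAGG",
        "geneC": "TTTATAATCC",
    }
    print(run_job(toy_regions, seed="TATA", width=6))
```

Each region is scanned independently of the others, which is what makes the map step easy to parallelise. Only the consensus step needs to see all the hits together.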
Abstract:
Bats account for one-fifth of mammalian species, are the only mammals with powered flight, and are among the few animals that echolocate. The insect-eating Brandt’s bat (Myotis brandtii) is the longest-lived bat species known to date (lifespan exceeds 40 years) and, at 4–8 g adult body weight, is the most extreme mammal with regard to disparity between body mass and longevity. Here we report sequencing and analysis of the Brandt’s bat genome and transcriptome, which suggest adaptations consistent with echolocation and hibernation, as well as altered metabolism, reproduction and visual function. Unique sequence changes in growth hormone and insulin-like growth factor 1 receptors are also observed. The data suggest that an altered growth hormone/insulin-like growth factor 1 axis, which may be common to other long-lived bat species, together with adaptations such as hibernation and low reproductive rate, contribute to the exceptional lifespan of the Brandt’s bat.
Abstract:
Background: Tuberculosis remains one of the largest killer infectious diseases, warranting the identification of newer targets and drugs. Identification and validation of appropriate targets for designing drugs are critical steps in drug discovery, which are at present major bottlenecks. A majority of drugs in current clinical use for many diseases have been designed without the knowledge of the targets, perhaps because standard methodologies to identify such targets in a high-throughput fashion do not really exist. With different kinds of 'omics' data that are now available, computational approaches can be powerful means of obtaining short-lists of possible targets for further experimental validation. Results: We report a comprehensive in silico target identification pipeline, targetTB, for Mycobacterium tuberculosis. The pipeline incorporates a network analysis of the protein-protein interactome, a flux balance analysis of the reactome, experimentally derived phenotype essentiality data, sequence analyses and a structural assessment of targetability, using novel algorithms recently developed by us. Using flux balance analysis and network analysis, proteins critical for survival of M. tuberculosis are first identified, followed by comparative genomics with the host, finally incorporating a novel structural analysis of the binding sites to assess the feasibility of a protein as a target. Further analyses include correlation with expression data and non-similarity to gut flora proteins as well as 'anti-targets' in the host, leading to the identification of 451 high-confidence targets. Through phylogenetic profiling against 228 pathogen genomes, shortlisted targets have been further explored to identify broad-spectrum antibiotic targets, while also identifying those specific to tuberculosis. Targets that address mycobacterial persistence and drug resistance mechanisms are also analysed.
Conclusion: The pipeline developed provides a rational schema for drug target identification that is likely to have a high rate of success, which is expected to save enormous amounts of money, resources and time in the drug discovery process. A thorough comparison with previously suggested targets in the literature demonstrates the usefulness of the integrated approach used in our study, highlighting the importance of systems-level analyses in particular. The method has the potential to be used as a general strategy for target identification and validation and hence significantly impact most drug discovery programmes.
Abstract:
Using an established genetic map, a single gene conditioning covered smut resistance, Ruh.7H, was mapped to the telomere region of chromosome 7HS in an Alexis/Sloop doubled haploid barley population. The closest marker to Ruh.7H, abg704, was 7.5 cM away. Thirteen loci on the distal end of 7HS with potential to contain single nucleotide polymorphisms (SNPs) were identified by applying a comparative genomics approach using rice sequence data. Of these, one locus produced polymorphic co-dominant bands of different size while two further loci contained SNPs that were identified using the recently developed high resolution melting (HRM) technique. Two of these markers flanked Ruh.7H with the proximal marker located 3.8 cM and the distal marker 2.7 cM away. This is the first report on the application of the HRM technique to SNP detection and to rapid scoring of known cleaved amplified polymorphic sequence (CAPS) markers in plants. This simple, precise post-PCR technique should find widespread use in the fine-mapping of genetic regions of interest in complex cereal and other plant genomes.
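To give a sense of what the reported flanking distances mean for marker-assisted selection, the sketch below converts map distance to recombination fraction. The abstract does not say which mapping function the study used, so Haldane's function (which assumes no crossover interference) is an assumption here, and the function names are illustrative.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Convert a map distance in centimorgans to a recombination fraction
    using Haldane's mapping function, r = 0.5 * (1 - exp(-2d)), d in Morgans."""
    distance_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def both_intervals_recombinant(proximal_cm: float, distal_cm: float) -> float:
    """Probability that recombination separates the gene from BOTH flanking
    markers, assuming the two intervals recombine independently."""
    return (haldane_recombination_fraction(proximal_cm)
            * haldane_recombination_fraction(distal_cm))


if __name__ == "__main__":
    # Flanking distances reported in the abstract: 3.8 cM and 2.7 cM.
    print(f"proximal r = {haldane_recombination_fraction(3.8):.4f}")
    print(f"distal r   = {haldane_recombination_fraction(2.7):.4f}")
    print(f"both       = {both_intervals_recombinant(3.8, 2.7):.5f}")
```

Under these assumptions, selecting on both flanking markers would misclassify the resistance gene only when recombination occurs in both intervals. That is far rarer than recombination with either marker alone.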
Abstract:
The goal of this research is to understand the function of allelic variation of genes underpinning the stay-green drought adaptation trait in sorghum in order to enhance yield in water-limited environments. Stay-green, a delayed leaf senescence phenotype in sorghum, is primarily an emergent consequence of the improved balance between the supply and demand of water. Positional and functional fine-mapping of candidate genes associated with stay-green in sorghum is the focus of an international research partnership between Australian (UQ/DAFFQ) and US (Texas A&M University) scientists. Stay-green was initially mapped to four chromosomal regions (Stg1, Stg2, Stg3, and Stg4) by a number of research groups in the US and Australia. Physiological dissection of near-isolines containing single introgressions of Stg QTL (Stg1-4) indicates that these QTL reduce water demand before flowering by constricting the size of the canopy, thereby increasing water availability during grain filling and, ultimately, grain yield. Stg and root angle QTL are also co-located and, together with crop water use data, suggest a role for roots in the stay-green phenomenon. Candidate genes have been identified in Stg1-4, including genes from the PIN family of auxin efflux carriers in Stg1 and Stg2, with 10 of 11 PIN genes in sorghum co-locating with Stg QTL. Modified gene expression in some of these PIN candidates in the stay-green compared with the senescent types has been found in preliminary RNA expression profiling studies. Further proof-of-function studies are underway, including comparative genomics, SNP analysis to assess diversity at candidate genes, reverse genetics and transformation.
Abstract:
Takifugu rubripes is a teleost fish widely used in comparative genomics to better understand the human system, owing to its similarity to human in both the number and the structure of its genes. In this work we survey the fugu genome and, using sensitive computational approaches, identify the repertoire of putative protein kinases and classify them into groups and subfamilies. The fugu genome encodes 519 protein kinase-like sequences, a number of putative protein kinases closely comparable to that of human. However, in spite of the similarities to human kinases at the group level, there are differences at the subfamily level, as noted in the case of the KIS and DYRK subfamilies, which contribute to differences specific to the adaptation of the organism. Also, a unique domain combination of the galectin domain and the YkA domain suggests alternate mechanisms for immune response and binding to lipoproteins. Lastly, an overall similarity with the MAPK pathway of humans suggests its importance for understanding signaling mechanisms in humans. Overall, the fugu serves as a good model organism for understanding the roles of human kinases as far as kinases such as LRRK and IRAK and their associated pathways are concerned.