974 results for comparative genomics


Relevance:

70.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

70.00%

Publisher:

Abstract:

The Xylella fastidiosa comparative genomic database is a scientific resource that aims to provide a user-friendly interface for accessing high-quality, manually curated genomic annotation and comparative sequence analysis, as well as for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe a database and tools for exploring the biology of this important plant pathogen. The hallmarks of this database are the high-quality genomic annotation, the functional and comparative genomic analysis, and the identification and mapping of prophage-like elements. It is available at http://www.xylella.lncc.br.

Relevance:

70.00%

Publisher:

Abstract:

Comparative genomics offers unparalleled opportunities to integrate historically distinct disciplines, to link disparate biological kingdoms, and to bridge basic and applied science. Cross-species, cross-genera, and cross-kingdom comparisons are proving key to understanding how genes are structured, how gene structure relates to gene function, and how changes in DNA have given rise to the biological diversity on the planet. The application of genomics to the study of crop species offers special opportunities for innovative approaches for combining sequence information with the vast reservoirs of historical information associated with crops and their evolution. The grasses provide a particularly well developed system for the development of tools to facilitate comparative genetic interpretation among members of a diverse and evolutionarily successful family. Rice provides advantages for genomic sequencing because of its small genome and its diploid nature, whereas each of the other grasses provides complementary genetic information that will help extract meaning from the sequence data. Because of the importance of the cereals to the human food chain, developments in this area can lead directly to opportunities for improving the health and productivity of our food systems and for promoting the sustainable use of natural resources.

Relevance:

70.00%

Publisher:

Abstract:

Improvements in genomic technology, both in the increased speed and reduced cost of sequencing, have expanded the appreciation of the abundance of human genetic variation. However, the sheer amount of variation, as well as the varying type and genomic content of variation, poses a challenge in understanding the clinical consequence of a single mutation. This work uses several methodologies to interpret the observed variation in the human genome, and presents novel strategies for the prediction of allele pathogenicity.

Using the zebrafish model system as an in vivo assay of allele function, we identified a novel driver of Bardet-Biedl Syndrome (BBS) in CEP76. A combination of targeted sequencing of 785 cilia-associated genes in a cohort of BBS patients and subsequent in vivo functional assays recapitulating the human phenotype gave strong evidence for the role of CEP76 mutations in the pathology of an affected family. This portion of the work demonstrated the necessity of functional testing in validating disease-associated mutations, and added to the catalogue of known BBS disease genes.

Further study into the role of copy-number variations (CNVs) in a cohort of BBS patients showed the significant contribution of CNVs to disease pathology. Using high-density array comparative genomic hybridization (aCGH), we were able to identify pathogenic CNVs as small as several hundred bp. Dissection of constituent genes and in vivo experiments investigating epistatic interactions between affected genes allowed for an appreciation of several paradigms by which CNVs can contribute to disease. This study revealed that the contribution of CNVs to disease in BBS patients is much higher than previously expected, and demonstrated the necessity of considering CNV contribution in future (and retrospective) investigations of human genetic disease.

Finally, we used a combination of comparative genomics and in vivo complementation assays to identify second-site compensatory modification of pathogenic alleles. These pathogenic alleles, which are found compensated in other species (termed compensated pathogenic deviations [CPDs]), represent a significant fraction (3 to 10%) of human disease-associated alleles. In silico pathogenicity prediction algorithms, a valuable method of allele prioritization, often misrepresent these alleles as benign, leading to omission of possibly informative variants in studies of human genetic disease. We created a mathematical model that was able to predict CPDs and putative compensatory sites, and functionally showed in vivo that second-site mutation can mitigate the pathogenicity of disease alleles. Additionally, we made publicly available an in silico module for the prediction of CPDs and modifier sites.

These studies have advanced the ability to interpret the pathogenicity of multiple types of human variation, and have made tools available for others to do the same.
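
The definition of a compensated pathogenic deviation given in the abstract above (a human disease allele that appears as the normal residue in another species) can be illustrated with a minimal Python sketch. This is not the authors' mathematical model or their published module; the function name `find_cpd_candidates`, the toy sequences, and the assumption that all protein sequences are already aligned to the same length are hypothetical and chosen only for illustration.

```python
def find_cpd_candidates(human_seq, ortholog_seqs, variants):
    """Flag human disease variants whose mutant residue is the residue
    observed at the same aligned position in at least one ortholog.

    human_seq     -- aligned human protein sequence (assumed pre-aligned)
    ortholog_seqs -- dict of species name -> aligned sequence, same length
    variants      -- list of (1-based position, reference aa, mutant aa)
    """
    hits = []
    for pos, ref, alt in variants:
        idx = pos - 1
        if human_seq[idx] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        # Species in which the human "pathogenic" residue is the observed one.
        species = [name for name, seq in ortholog_seqs.items() if seq[idx] == alt]
        if species:
            hits.append((pos, ref, alt, species))
    return hits


# Toy data, invented for this example only.
human = "MKTAYIAK"
orthologs = {"mouse": "MKTAYIAK", "zebrafish": "MKSAYIAK"}
variants = [(3, "T", "S"), (5, "Y", "C")]

candidates = find_cpd_candidates(human, orthologs, variants)
```

Here only the first variant is flagged, because the zebrafish sequence carries the mutant residue at position 3. A check like this identifies candidates only; predicting the compensatory second site requires the kind of modelling and in vivo testing the abstract describes.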

Relevance:

70.00%

Publisher:

Abstract:

The rumen is home to a diverse population of microorganisms encompassing all three domains of life: Bacteria, Archaea, and Eukarya. Viruses have also been documented to be present in large numbers; however, little is currently known about their role in the dynamics of the rumen ecosystem. This research aimed to use a comparative genomics approach to assess the potential evolutionary mechanisms at work in the rumen environment. We proposed to do this by first assessing the diversity and potential for horizontal gene transfer (HGT) of multiple strains of the cellulolytic rumen bacterium Ruminococcus flavefaciens, and then by conducting a survey of the rumen viral metagenome (virome) and comparing the virome and microbiome sequences to ascertain whether genetic information was shared between these populations. We hypothesized that bacteriophages play an integral role in the community dynamics of the rumen, as well as driving the evolution of the rumen microbiome through HGT. In our analysis of the Ruminococcus flavefaciens genomes, several mobile elements and clustered regularly interspaced short palindromic repeat (CRISPR) sequences were detected, both of which indicate interactions with bacteriophages. The rumen virome sequences revealed a great deal of diversity in the viral populations. Additionally, the microbial and viral populations appeared to be closely associated; the dominant viral types were those that infect the dominant microbial phyla. The correlation between the distribution of taxa in the microbiome and virome sequences, as well as the presence of CRISPR loci in the R. flavefaciens genomes, suggested that there is a “kill-the-winner” community dynamic between the viral and microbial populations in the rumen. Additionally, upon comparison of the rumen microbiome and rumen virome sequences, we found many sequence similarities between these populations, indicating a potential for phage-mediated HGT.
These results suggest that the phages represent a gene pool in the rumen that could potentially contain genes that are important for adaptation and survival in the rumen environment, as well as serving as a molecular ‘fingerprint’ of the rumen ecosystem.

Relevance:

70.00%

Publisher:

Abstract:

Picocyanobacteria are important phytoplankton and primary producers in the ocean. Although extensive work has been conducted on picocyanobacteria (i.e. Synechococcus and Prochlorococcus) in coastal and oceanic waters, little is known about those found in estuaries like the Chesapeake Bay. Synechococcus CB0101, an estuarine isolate, is more tolerant to shifts in temperature, salinity, and metal toxicity than the coastal and oceanic Synechococcus strains WH7803 and WH7805. Further, CB0101 has a greater sensitivity to high light intensity, likely due to its adaptation to low-light environments. A complete and annotated genome sequence of CB0101 was generated to explore its genetic capacity and to serve as a basis for further molecular analysis. Comparative genomics between CB0101, WH7803, and WH7805 shows that CB0101 contains more genes involved in regulation, sensing, and stress response. At the transcript and protein level, CB0101 regulates its metabolic pathways, transport systems, and sensing mechanisms when nitrate and phosphate are limited. Zinc toxicity led to oxidative stress and a global downregulation of photosystems and the translation machinery. From the stress response studies, seven chromosomal toxin-antitoxin (TA) genes were identified in CB0101, which led to the discovery of TA genes in several marine Synechococcus strains. The activation of the relB2/relE1 TA system allows CB0101 to arrest its growth under stressful conditions, but the growth arrest is reversible once the stressful environment dissipates. The genome of CB0101 contains a relatively large number of genomic island (GI) genes compared to known marine Synechococcus genomes. Interestingly, a massive shutdown of GI genes (255 out of 343) occurred after CB0101 was infected by a lytic phage. On the other hand, phage-encoded host-like proteins (hli, psbA, ThyX) were highly expressed upon phage infection.
This research provides new evidence that estuarine Synechococcus like CB0101 have inherited unique genetic machinery, which allows them to be versatile in the estuarine environment.

Relevance:

60.00%

Publisher:

Abstract:

The last few years have seen dramatic advances in genomics, including the discovery of a large number of non-coding and antisense transcripts. This has revolutionised our understanding of multifaceted transcript structures found within gene loci and their roles in the regulation of development, neurogenesis and other complex processes. The recent and continuing surge of knowledge has prompted researchers to reassess and further dissect gene loci. The ghrelin gene (GHRL) gives rise to preproghrelin, which in turn produces ghrelin, a 28 amino acid peptide hormone that acts via the ghrelin receptor (growth hormone secretagogue receptor/GHSR 1a). Ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, and cancer development. A truncated receptor splice variant, GHSR 1b, does not bind ghrelin, but dimerises with GHSR 1a, and may act as a dominant negative receptor. The gene products of ghrelin and its receptor are frequently overexpressed in human cancer. While it is well known that the ghrelin axis (ghrelin and its receptor) plays a range of important functional roles, little is known about the molecular structure and regulation of the ghrelin gene (GHRL) and ghrelin receptor gene (GHSR). This thesis reports the re-annotation of the ghrelin gene, discovery of alternative 5' exons and transcription start sites, as well as the description of a number of novel splice variants, including isoforms with a putative signal peptide. We also describe the discovery and characterisation of a ghrelin antisense gene (GHRLOS), and the discovery and expression of a ghrelin receptor (growth hormone secretagogue receptor/GHSR) antisense gene (GHSR-OS). We have identified numerous ghrelin-derived transcripts, including variants with extended 5' untranslated regions and putative secreted obestatin and C-ghrelin transcripts.
These transcripts initiate from novel first exons, exon -1, exon 0 and a 5' extended exon 1, with multiple transcription start sites. We used comparative genomics to identify, and RT-PCR to experimentally verify, that the proximal exon 0 and 5' extended exon 1 are transcribed in the mouse ghrelin gene, which suggests the mouse and human proximal first exon architecture is conserved. We have identified numerous novel antisense transcripts in the ghrelin locus. A candidate non-coding endogenous natural antisense gene (GHRLOS) was cloned and demonstrates very low expression levels in the stomach and high levels in the thymus, testis and brain, all major tissues of non-coding RNA expression. Next, we examined whether transcription occurs in the antisense orientation to the ghrelin receptor gene, GHSR. A novel gene (GHSR-OS) on the opposite strand of intron 1 of the GHSR gene was identified and characterised using strand-specific RT-PCR and rapid amplification of cDNA ends (RACE). GHSR-OS is differentially expressed and a candidate non-coding RNA gene. In summary, this study has characterised the ghrelin and ghrelin receptor loci and demonstrated natural antisense transcripts to ghrelin and its receptor. Our preliminary work shows that the ghrelin axis generates a broad and complex transcriptional repertoire. This study provides the basis for detailed functional studies of the ghrelin and GHSR loci, and future studies will be needed to further unravel the function and the diagnostic and therapeutic potential of the ghrelin axis.

Relevance:

60.00%

Publisher:

Abstract:

Understanding the complexities that are involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are also a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein coding genetic paradigm. Latent Variable Models are well suited for the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this oft thought useless vestige remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for the association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression.
This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation to the Bayesian multiple changepoint model on aligned DNA sequences for more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function, such as GC content and type of mutation, into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.

Relevance:

60.00%

Publisher:

Abstract:

BLAST Atlas is a visual analysis system for comparative genomics that supports genome-wide gene characterisation, functional assignment and function-based browsing of one or more chromosomes. Inspired by applications such as the WorldWide Telescope, Bing Maps 3D and Google Earth, BLAST Atlas uses novel three-dimensional gene and function views that provide a highly interactive and intuitive way for scientists to navigate, query and compare gene annotations. The system can be used for gene identification and functional assignment or as a function-based multiple genome comparison tool which complements existing position based comparison and alignment viewers.

Relevance:

60.00%

Publisher:

Abstract:

Following the completion of the draft Human Genome in 2001, genomic sequence data is becoming available at an accelerating rate, fueled by advances in sequencing and computational technology. Meanwhile, large collections of astronomical and geospatial data have allowed the creation of virtual observatories, accessible throughout the world and requiring only commodity hardware. Through a combination of advances in data management, data mining and visualization, this infrastructure enables the development of new scientific and educational applications as diverse as galaxy classification and real-time tracking of earthquakes and volcanic plumes. In the present paper, we describe steps taken along a similar path towards a virtual observatory for genomes – an immersive three-dimensional visual navigation and query system for comparative genomic data.

Relevance:

60.00%

Publisher:

Abstract:

MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large-scale comparative genomics, often involving operations and data relationships which deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters (binding sites for regulatory proteins) across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
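
The map-then-unify pattern described in this abstract can be sketched in plain Python. This is an in-memory toy, not Hadoop code and not the authors' workflow: the function names (`map_site`, `reduce_counts`, `consensus`), the example binding sites, and the assumption that all sites are already aligned and of equal length are hypothetical.

```python
from collections import Counter, defaultdict


def map_site(site):
    """Map step: emit ((position, base), 1) for each base of one site."""
    for pos, base in enumerate(site):
        yield (pos, base), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each (position, base) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def consensus(totals):
    """Unifying step: pick the most frequent base at each position."""
    by_pos = defaultdict(Counter)
    for (pos, base), count in totals.items():
        by_pos[pos][base] = count
    return "".join(by_pos[p].most_common(1)[0][0] for p in sorted(by_pos))


# Toy binding sites, invented for this example only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]

# Each site can be mapped independently; only the reduce needs all pairs.
pairs = [kv for site in sites for kv in map_site(site)]
result = consensus(reduce_counts(pairs))
```

The map calls are independent per site, which is the part a framework can parallelise freely. The consensus step needs counts from every site at once, which is one example of the cross-record dependency the abstract says deviates from the classical structure.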

Relevance:

60.00%

Publisher:

Abstract:

Bats account for one-fifth of mammalian species, are the only mammals with powered flight, and are among the few animals that echolocate. The insect-eating Brandt’s bat (Myotis brandtii) is the longest-lived bat species known to date (lifespan exceeds 40 years) and, at 4–8 g adult body weight, is the most extreme mammal with regard to disparity between body mass and longevity. Here we report sequencing and analysis of the Brandt’s bat genome and transcriptome, which suggest adaptations consistent with echolocation and hibernation, as well as altered metabolism, reproduction and visual function. Unique sequence changes in growth hormone and insulin-like growth factor 1 receptors are also observed. The data suggest that an altered growth hormone/insulin-like growth factor 1 axis, which may be common to other long-lived bat species, together with adaptations such as hibernation and low reproductive rate, contribute to the exceptional lifespan of the Brandt’s bat.