935 results for Complete Genome Sequence
Abstract:
The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the expression subclusters individually. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners.
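The probabilistic segmentation idea can be illustrated with a small sketch. It assumes a sequence is generated by concatenating words drawn independently from a dictionary, so the likelihood of a sequence is the sum, over every way of splitting it into dictionary words, of the product of the word probabilities. The function name `segmentation_likelihood` and the toy dictionary are hypothetical, and dictionary construction and significance testing are left out. This is not the authors' implementation.

```python
def segmentation_likelihood(seq, word_probs):
    """Sum over all segmentations of seq into dictionary words
    of the product of the word probabilities (forward recursion)."""
    max_len = max(len(word) for word in word_probs)
    # z[i] holds the total probability of all segmentations of seq[:i].
    z = [0.0] * (len(seq) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for i in range(1, len(seq) + 1):
        for length in range(1, min(max_len, i) + 1):
            word = seq[i - length:i]
            p = word_probs.get(word)
            if p is not None:
                # Extend every segmentation of the shorter prefix by this word.
                z[i] += p * z[i - length]
    return z[len(seq)]


# Hypothetical toy dictionary: four single letters plus one longer motif.
toy_dictionary = {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2, "TATA": 0.2}
likelihood = segmentation_likelihood("GTATAC", toy_dictionary)
unsegmentable = segmentation_likelihood("N", toy_dictionary)
```

In this toy case two segmentations contribute: six single letters, and `G` + `TATA` + `C`. A sequence containing a symbol outside the dictionary has likelihood zero.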
Abstract:
The extremely halophilic archaeon Halobacterium sp. NRC-1 can grow phototrophically by means of light-driven proton pumping by bacteriorhodopsin in the purple membrane. Here, we show by genetic analysis of the wild type, and of insertion and double-frameshift mutants of Bat, that this transcriptional regulator coordinates synthesis of a structural protein and a chromophore for purple membrane biogenesis in response to both light and oxygen. Analysis of the complete Halobacterium sp. NRC-1 genome sequence showed that the regulatory site, upstream activator sequence (UAS), the putative binding site for Bat upstream of the bacterio-opsin gene (bop), is also present upstream of the other Bat-regulated genes. The transcription regulator Bat contains a photoresponsive cGMP-binding (GAF) domain, and a bacterial AraC type helix–turn–helix DNA binding motif. We also provide evidence for involvement of the PAS/PAC domain of Bat in redox-sensing activity by genetic analysis of a purple membrane overproducer. Five additional Bat-like putative regulatory genes were found, which together are likely to be responsible for orchestrating the complex response of this archaeon to light and oxygen. Similarities of the bop-like UAS and transcription factors in diverse organisms, including a plant and a γ-proteobacterium, suggest an ancient origin for this regulon capable of coordinating light and oxygen responses in the three major branches of the evolutionary tree of life. Finally, sensitivity of four of five regulon genes to DNA supercoiling is demonstrated and correlated with the presence of alternating purine–pyrimidine sequences (RY boxes) near the regulated promoters.
Abstract:
Complete genome sequences are providing a framework to allow the investigation of biological processes by the use of comprehensive approaches. Genome analysis also is having a dramatic impact on medicine through its identification of genes and mutations involved in disease and the elucidation of entire microbial gene sets. Studies of the sequences of model organisms, such as that of the nematode worm Caenorhabditis elegans, are providing extraordinary insights into development and differentiation that aid the study of these processes in humans. The field of functional genomics seeks to devise and apply technologies that take advantage of the growing body of sequence information to analyze the full complement of genes and proteins encoded by an organism.
Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications.
Abstract:
A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.
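Grouping proteins into clusters from pairwise similarity can be sketched as follows. The abstract does not state the clustering rule, so single linkage (two proteins share a cluster whenever a chain of significant pairwise hits connects them) is an assumption here, and the protein identifiers and the function name `cluster_by_similarity` are hypothetical.

```python
def cluster_by_similarity(proteins, similar_pairs):
    """Single-linkage clustering with a union-find structure.
    Returns only clusters with at least two members."""
    parent = {p: p for p in proteins}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), set()).add(p)
    return [members for members in clusters.values() if len(members) > 1]


# Hypothetical input: p1-p2 and p2-p3 are significant hits; p4 and p5 have none.
proteins = ["p1", "p2", "p3", "p4", "p5"]
similar_pairs = [("p1", "p2"), ("p2", "p3")]
paralog_clusters = cluster_by_similarity(proteins, similar_pairs)
```

Here `p1` and `p3` end up in the same cluster even though they are not directly paired, which is the defining behaviour of single linkage. A stricter rule would split such chains.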
Abstract:
This article investigates the expression patterns of 160 genes that are expressed during early mouse development. The cDNAs were isolated from 7.5 d postcoitum (dpc) endoderm, a region that comprises visceral endoderm (VE), definitive endoderm, and the node, tissues that are required for the initial steps of axial specification and tissue patterning in the mouse. To avoid examining the same gene more than once, and to exclude potentially ubiquitously expressed housekeeping genes, cDNA sequence was derived from 1978 clones of the Endoderm library. These yielded 1440 distinct cDNAs, of which 123 proved to be novel in the mouse. In situ hybridization analysis was carried out on 160 of the cDNAs, and of these, 29 (18%) proved to have restricted expression patterns.
Abstract:
The function of the prion protein gene (PRNP) and its normal product PrPC is elusive. We used comparative genomics as a strategy to understand the normal function of PRNP. As the reliability of comparisons increases with the number of species and increased evolutionary distance, we isolated and sequenced a 66.5 kb BAC containing the PRNP gene from a distantly related mammal, the model Australian marsupial Macropus eugenii (tammar wallaby). Marsupials are separated from eutherians such as human and mouse by roughly 180 million years of independent evolution. We found that tammar PRNP, like human PRNP, has two exons. Prion proteins encoded by the PRNP of the tammar wallaby and of a distantly related marsupial, Monodelphis domestica (Brazilian opossum), contain proximal PrP repeats with a distinct, marsupial-specific composition and a variable number. Comparisons of tammar wallaby PRNP with PRNPs from human, mouse, bovine and ovine allowed us to identify non-coding gene regions conserved across the marsupial-eutherian evolutionary distance, which are candidates for regulatory regions. In the PRNP 3' UTR we found a conserved signal for nuclear-specific polyadenylation and the putative cytoplasmic polyadenylation element (CPE), indicating that post-transcriptional control of PRNP mRNA activity is important. Phylogenetic footprinting revealed conserved potential binding sites for the MZF-1 transcription factor in both upstream promoter and intron/intron 1, and for the MEF2, MyT1, Oct-1 and NFAT transcription factors in the intron(s). The presence of a conserved NFAT-binding site and CPE indicates involvement of PrPC in signal transduction and synaptic plasticity. (c) 2004 Elsevier B.V. All rights reserved.
Abstract:
The southern cattle tick, Boophilus microplus (Canestrini), causes annual economic losses in the hundreds of millions of dollars to cattle producers throughout the world, and ranks as the most economically important tick from a global perspective. Control failures attributable to the development of pesticide resistance have become commonplace, and novel control technologies are needed. The availability of the genome sequence will facilitate the development of these new technologies, and we are proposing sequencing to a 4-6X draft coverage. Many existing biological resources are available to facilitate a genome sequencing project, including several inbred laboratory tick strains, a database of approximately 45,000 expressed sequence tags compiled into a B. microplus Gene Index, a bacterial artificial chromosome (BAC) library, an established B. microplus cell line, and genomic DNA suitable for library synthesis. Collaborative projects are underway to map BACs and cDNAs to specific chromosomes and to sequence selected BAC clones. When completed, the genome sequences from the cow, B. microplus, and the B. microplus-borne pathogens Babesia bovis and Anaplasma marginale will enhance studies of host-vector-pathogen systems. Genes involved in the regeneration of amputated tick limbs and transitions through developmental stages are largely unknown. Studies of these and other interesting biological questions will be advanced by tick genome sequence data. Comparative genomics offers the prospect of new insight into many, perhaps all, aspects of the biology of ticks and the pathogens they transmit to farm animals and people. The B. microplus genome sequence will fill a major gap in comparative genomics: a sequence from the Metastriata lineage of ticks. The purpose of the article is to synergize interest in and provide rationales for sequencing the genome of B. microplus and for publicizing currently available genomic resources for this tick.
Abstract:
In late summer 1999, an outbreak of human encephalitis occurred in the northeastern United States that was concurrent with extensive mortality in crows (Corvus species) as well as the deaths of several exotic birds at a zoological park in the same area. Complete genome sequencing of a flavivirus isolated from the brain of a dead Chilean flamingo (Phoenicopterus chilensis), together with partial sequence analysis of envelope glycoprotein (E-glycoprotein) genes amplified from several other species including mosquitoes and two fatal human cases, revealed that West Nile (WN) virus circulated in natural transmission cycles and was responsible for the human disease. Antigenic mapping with E-glycoprotein-specific monoclonal antibodies and E-glycoprotein phylogenetic analysis confirmed these viruses as WN. This North American WN virus was most closely related to a WN virus isolated from a dead goose in Israel in 1998.
Abstract:
Photographs from the February 1997 Bermuda meeting. Courtesy of Gert-Jan van Ommen.
Abstract:
This thesis describes two newly sequenced B. longum subsp. longum genomes and subsequent comparative analysis with publicly available B. longum subsp. longum, B. longum subsp. infantis and B. longum subsp. suis genomes (Chapter 2). The acquired data revealed a closed pan-genome for this bifidobacterial species and furthermore facilitated the definition of the B. longum core genome. The comparative analysis also highlights differences in the potential metabolic abilities of all three sub-species. Interestingly, phylogenetic analysis of the B. longum core genome indicated the existence of a novel B. longum subspecies. Characterisation of restriction-modification systems from two B. longum subsp. longum strains is described in Chapter 3. These defence mechanisms limit the uptake of genetic material, which was successfully demonstrated for some of the identified systems. When these systems were by-passed by methylation of DNA prior to the transformation procedure, the resulting transformation efficiency of both B. longum subsp. longum strains was increased to a level that allowed for the generation of mutants via homologous recombination. Arabinoxylan metabolism by B. longum subsp. longum NCIMB 8809 was investigated in Chapter 4 of this thesis. Transcriptome analysis allowed the identification of a number of genes involved in the degradation, uptake and utilisation of arabinoxylan. Biochemical analysis revealed that three of the identified genes encode arabinofuranosidase activity. Phenotypic assessment of a number of insertion mutants in genes identified by the transcriptome analysis revealed the essential role of two of these enzymes in arabinoxylan metabolism, and a third enzyme in the metabolism of debranched arabinan. Furthermore, this investigation revealed that B. longum subsp. longum NCIMB 8809 does not completely degrade arabinoxylan, but utilises the arabinose substitutions only, while leaving the xylan backbone untouched. Finally, Chapter 5 outlines that B. longum subsp. longum NCIMB 8809 is capable of removing ferulic and p-coumaric acid substitutions that originate from arabinoxylan. Analysis of the genome sequence led to the identification of a candidate gene for this activity, which was subsequently cloned and expressed in E. coli. Biochemical analysis revealed that the enzyme, designated here as FaeA, is indeed capable of releasing both ferulic and p-coumaric acid from arabinoxylan. Furthermore, it is shown that a derivative of B. longum subsp. longum NCIMB 8809 carrying an insertion mutation in faeA had lost the ability to release ferulic and p-coumaric acid from arabinoxylan, and that growth of this mutant strain is negatively affected when cultivated on growth-limiting levels of arabinoxylan.
Abstract:
The Bifidobacterium longum subsp. longum 35624™ strain (formerly named Bifidobacterium longum subsp. infantis) is a well described probiotic with clinical efficacy in Irritable Bowel Syndrome clinical trials and induces immunoregulatory effects in mice and in humans. This paper presents (a) the genome sequence of the organism allowing the assignment to its correct subspeciation longum; (b) a comparative genome assessment with other B. longum strains and (c) the molecular structure of the 35624 exopolysaccharide (EPS624). Comparative genome analysis of the 35624 strain with other B. longum strains determined that the sub-speciation of the strain is longum and revealed the presence of a 35624-specific gene cluster, predicted to encode the biosynthetic machinery for EPS624. Following isolation and acid treatment of the EPS, its chemical structure was determined using gas and liquid chromatography for sugar constituent and linkage analysis, electrospray and matrix assisted laser desorption ionization mass spectrometry for sequencing and NMR. The EPS consists of a branched hexasaccharide repeating unit containing two galactose and two glucose moieties, galacturonic acid and the unusual sugar 6-deoxy-L-talose. These data demonstrate that the B. longum 35624 strain has specific genetic features, one of which leads to the generation of a characteristic exopolysaccharide.
Adaptive Mechanisms of an Estuarine Synechococcus based on Genomics, Transcriptomics, and Proteomics
Abstract:
Picocyanobacteria are important phytoplankton and primary producers in the ocean. Although extensive work has been conducted on picocyanobacteria (i.e. Synechococcus and Prochlorococcus) in coastal and oceanic waters, little is known about those found in estuaries like the Chesapeake Bay. Synechococcus CB0101, an estuarine isolate, is more tolerant to shifts in temperature, salinity, and metal toxicity than coastal and oceanic Synechococcus strains, WH7803 and WH7805. Further, CB0101 has a greater sensitivity to high light intensity, likely due to its adaptation to low light environments. A complete and annotated genome sequence of CB0101 was generated to explore its genetic capacity and to serve as a basis for further molecular analysis. Comparative genomics between CB0101, WH7803, and WH7805 shows that CB0101 contains more genes involved in regulation, sensing, and stress response. At the transcript and protein level, CB0101 regulates its metabolic pathways, transport systems, and sensing mechanisms when nitrate and phosphate are limited. Zinc toxicity led to oxidative stress and a global downregulation of photosystems and the translation machinery. From the stress response studies, seven chromosomal toxin-antitoxin (TA) genes were identified in CB0101, which led to the discovery of TA genes in several marine Synechococcus strains. The activation of the relB2/relE1 TA system allows CB0101 to arrest its growth under stressful conditions, but the growth arrest is reversible once the stressful environment dissipates. The genome of CB0101 contains a relatively large number of genomic island (GI) genes compared to known marine Synechococcus genomes. Interestingly, a massive shutdown (255 out of 343) of GI genes occurred after CB0101 was infected by a lytic phage. On the other hand, phage-encoded host-like proteins (hli, psbA, ThyX) were highly expressed upon phage infection. This research provides new evidence that estuarine Synechococcus like CB0101 have inherited unique genetic machinery, which allows them to be versatile in the estuarine environment.
Abstract:
Some factors complicate comparisons between linkage maps from different studies. This problem can be resolved if measures of precision, such as confidence intervals and frequency distributions, are associated with markers. We examined the precision of distances and ordering of microsatellite markers in the consensus linkage maps of chromosomes 1, 3 and 4 from two F2 reciprocal Brazilian chicken populations, using bootstrap sampling. Single and consensus maps were constructed. The consensus map was compared with the International Consensus Linkage Map and with the whole genome sequence. Some loci showed segregation distortion and missing data, but this did not affect the analyses negatively. Several inversions and position shifts were detected, based on 95% confidence intervals and frequency distributions of loci. Some discrepancies in distances between loci and in ordering were due to chance, whereas others could be attributed to other effects, including reciprocal crosses, sampling error of the founder animals from the two populations, F2 population structure, number of and distance between microsatellite markers, number of informative meioses, loci segregation patterns, and sex. In the Brazilian consensus GGA1, locus LEI1038 was in a position closer to the true genome sequence than in the International Consensus Map, whereas for GGA3 and GGA4, no such differences were found. Extending these analyses to the remaining chromosomes should facilitate comparisons and the integration of several available genetic maps, allowing meta-analyses for map construction and quantitative trait loci (QTL) mapping. The precision of the estimates of QTL positions and their effects would be increased with such information.
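A percentile bootstrap confidence interval of the kind mentioned above can be sketched generically. The study bootstraps map distances and marker order. This simplified example instead resamples informative meioses between two markers, coded 1 for recombinant and 0 for non-recombinant, and reports an interval for the recombination fraction. The data, function names, number of replicates, and seed are all assumptions made for illustration.

```python
import random


def bootstrap_ci(observations, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(observations)
    # Resample with replacement n_boot times and sort the estimates.
    estimates = sorted(
        statistic([observations[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def recombination_fraction(meioses):
    """Share of informative meioses that are recombinant."""
    return sum(meioses) / len(meioses)


# Hypothetical data: 20 recombinant and 80 non-recombinant meioses.
meioses = [1] * 20 + [0] * 80
point_estimate = recombination_fraction(meioses)
lower, upper = bootstrap_ci(meioses, recombination_fraction)
```

Two maps that place a marker differently can then be compared by checking whether the intervals overlap, rather than by comparing point estimates alone.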
Abstract:
Despite the wide distribution of transposable elements (TEs) in mammalian genomes, part of their evolutionary significance remains to be discovered. Today there is a substantial amount of evidence showing that TEs are involved in the generation of new exons in different species. In the present study, we searched 22,805 genes and reported the occurrence of TE-cassettes in coding sequences of 542 cow genes using the RepeatMasker program. Despite the significant number (542) of genes with TE insertions in exons, only 14 (2.6%) of them were translated into protein, which we characterized as chimeric genes. From these chimeric genes, only the FAST kinase domains 3 (FASTKD3) gene, present on chromosome BTA 20, is a functional gene and showed evidence of the exaptation event. The genome sequence analysis showed that the last exon coding sequence of bovine FASTKD3 is approximately 85% similar to the ART2A retrotransposon sequence. In addition, comparison among FASTKD3 proteins shows that the last exon is very divergent from those of Homo sapiens, Pan troglodytes and Canis familiaris. We suggest that the gene structure of bovine FASTKD3 gene could have originated by several ectopic recombinations between TE copies. Additionally, the absence of TE sequences in all other species analyzed suggests that the TE insertion is clade-specific, mainly in the ruminant lineage.
Abstract:
Background: Schistosoma mansoni is the major causative agent of schistosomiasis. The parasite takes advantage of host signals to complete its development in the human body. Tumor necrosis factor-alpha (TNF-alpha) is a human cytokine involved in skin inflammatory responses, and although its effect on the adult parasite's metabolism and egg-laying process has been previously described, a comprehensive assessment of the TNF-alpha pathway and its downstream molecular effects is lacking. Methodology/Principal Findings: In the present work we describe a possible TNF-alpha receptor (TNFR) homolog gene in S. mansoni (SmTNFR). SmTNFR encodes a complete receptor sequence composed of 599 amino acids, and contains four cysteine-rich domains as described for TNFR members. Real-time RT-PCR experiments revealed that the highest SmTNFR expression level is in cercariae, 3.5 (+/- 0.7) times higher than in adult worms. Downstream members of the known human TNF-alpha pathway were identified by an in silico analysis, revealing a possible TNF-alpha signaling pathway in the parasite. In order to simulate the parasite's exposure to the human cytokine during penetration of the skin, schistosomula were exposed to human TNF-alpha just 3 h after cercariae-to-schistosomula in vitro transformation, and large-scale gene expression measurements were performed with microarrays. A total of 548 genes with significantly altered expression were detected, when compared to control parasites. In addition, treatment of adult worms with TNF-alpha caused a significantly altered expression of 1857 genes. Interestingly, the set of genes altered in adults is different from that of schistosomula, with 58 genes in common, representing 3% of altered genes in adults and 11% in 3 h-old early schistosomula. Conclusions/Significance: We describe the possible molecular elements and targets involved in the effect of human TNF-alpha on S. mansoni, highlighting the mechanism by which recently transformed schistosomula may sense and respond to this host mediator at the site of cercarial penetration into the skin.