31 results for complete genome
in National Center for Biotechnology Information - NCBI
Abstract:
Bipolar mood disorder (BP) is a debilitating syndrome characterized by episodes of mania and depression. We designed a multistage study to detect all major loci predisposing to severe BP (termed BP-I) in two pedigrees drawn from the Central Valley of Costa Rica, where the population is largely descended from a few founders in the 16th–18th centuries. We considered only individuals with BP-I as affected and screened the genome for linkage with 473 microsatellite markers. We used a model for linkage analysis that incorporated a high phenocopy rate and a conservative estimate of penetrance. Our goal in this study was not to establish definitive linkage but rather to detect all regions possibly harboring major genes for BP-I in these pedigrees. To facilitate this aim, we evaluated the degree to which markers that were informative in our data set provided coverage of each genome region; we estimate that at least 94% of the genome has been covered, at a predesignated threshold determined through prior linkage simulation analyses. We report here the results of our genome screen for BP-I loci and indicate several regions that merit further study, including segments in 18q, 18p, and 11p, in which suggestive lod scores were observed for two or more contiguous markers. Isolated lod scores that exceeded our thresholds in one or both families also occurred on chromosomes 1, 2, 3, 4, 5, 7, 13, 15, 16, and 17. Interesting regions highlighted in this genome screen will be followed up using linkage disequilibrium (LD) methods.
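As a rough illustration of what a lod score measures, the following minimal sketch computes a two-point lod score for the simplest textbook case: phase-known meioses with full penetrance and no phenocopies. This is not the model used in the study, which incorporated a high phenocopy rate and a conservative penetrance estimate; the function name and parameters are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known meioses (simplified assumption).

    Compares the likelihood of the data at recombination fraction `theta`
    with the likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # log10 likelihood at the tested recombination fraction
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    # log10 likelihood under no linkage (theta = 0.5)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # One recombinant in ten informative meioses, tested at theta = 0.1
    print(round(lod_score(1, 10, 0.1), 3))
```

Positive values favor linkage at the tested recombination fraction; at theta = 0.5 the score is zero by construction.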
Abstract:
The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living α-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus.
Abstract:
The 1,852,442-bp sequence of an M1 strain of Streptococcus pyogenes, a Gram-positive pathogen, has been determined and contains 1,752 predicted protein-encoding genes. Approximately one-third of these genes have no identifiable function, with the remainder falling into previously characterized categories of known microbial function. Consistent with the observation that S. pyogenes is responsible for a wider variety of human disease than any other bacterial species, more than 40 putative virulence-associated genes have been identified. Additional genes have been identified that encode proteins likely associated with microbial “molecular mimicry” of host characteristics and involved in rheumatic fever or acute glomerulonephritis. The complete or partial sequence of four different bacteriophage genomes is also present, with each containing genes for one or more previously undiscovered superantigen-like proteins. These prophage-associated genes encode at least six potential virulence factors, emphasizing the importance of bacteriophages in horizontal gene transfer and a possible mechanism for generating new strains with increased pathogenic potential.
Abstract:
The genome of the crenarchaeon Sulfolobus solfataricus P2 contains 2,992,245 bp on a single chromosome and encodes 2,977 proteins and many RNAs. One-third of the encoded proteins have no detectable homologs in other sequenced genomes. Moreover, 40% appear to be archaeal-specific, and only 12% and 2.3% are shared exclusively with bacteria and eukarya, respectively. The genome shows a high level of plasticity with 200 diverse insertion sequence elements, many putative nonautonomous mobile elements, and evidence of integrase-mediated insertion events. There are also long clusters of regularly spaced tandem repeats. Different transfer systems are used for the uptake of inorganic and organic solutes, and a wealth of intracellular and extracellular proteases, sugar, and sulfur metabolizing enzymes are encoded, as well as enzymes of the central metabolic pathways and motility proteins. The major metabolic electron carrier is not NADH as in bacteria and eukarya but probably ferredoxin. The essential components required for DNA replication, DNA repair and recombination, the cell cycle, transcriptional initiation and translation, but not DNA folding, show a strong eukaryal character with many archaeal-specific features. The results illustrate major differences between crenarchaea and euryarchaea, especially for their DNA replication mechanism and cell cycle processes and their translational apparatus.
Abstract:
We present here the complete genome sequence of a common avian clone of Pasteurella multocida, Pm70. The genome of Pm70 is a single circular chromosome 2,257,487 base pairs in length and contains 2,014 predicted coding regions, 6 ribosomal RNA operons, and 57 tRNAs. Genome-scale evolutionary analyses based on pairwise comparisons of 1,197 orthologous sequences between P. multocida, Haemophilus influenzae, and Escherichia coli suggest that P. multocida and H. influenzae diverged ≈270 million years ago and the γ subdivision of the proteobacteria radiated about 680 million years ago. Two previously undescribed open reading frames, accounting for ≈1% of the genome, encode large proteins with homology to the virulence-associated filamentous hemagglutinin of Bordetella pertussis. Consistent with the critical role of iron in the survival of many microbial pathogens, in silico and whole-genome microarray analyses identified more than 50 Pm70 genes with a potential role in iron acquisition and metabolism. Overall, the complete genomic sequence and preliminary functional analyses provide a foundation for future research into the mechanisms of pathogenesis and host specificity of this important multispecies pathogen.
Abstract:
The determination of complete genome sequences provides us with an opportunity to describe and analyze evolution at the comprehensive level of genomes. Here we compare nine genomes with respect to their protein coding genes at two levels: (i) we compare genomes as “bags of genes” and measure the fraction of orthologs shared between genomes and (ii) we quantify correlations between genes with respect to their relative positions in genomes. Distances between the genomes are related to their divergence times, measured as the number of amino acid substitutions per site in a set of 34 orthologous genes that are shared among all the genomes compared. We establish a hierarchy of rates at which genomes have changed during evolution. Protein sequence identity is the most conserved, followed by the complement of genes within the genome. Next is the degree of conservation of the order of genes, whereas gene regulation appears to evolve at the highest rate. Finally, we show that some genomes are more highly organized than others: they show a higher degree of the clustering of genes that have orthologs in other genomes.
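The two levels of comparison described in this abstract can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: genes are represented by hypothetical ortholog-group identifiers, the shared fraction is normalized by the smaller genome, and gene order is treated as linear (circular chromosomes and strand are ignored).

```python
def shared_ortholog_fraction(genes_a: set[str], genes_b: set[str]) -> float:
    """'Bag of genes' view: fraction of the smaller genome's ortholog
    groups that are also present in the other genome."""
    if not genes_a or not genes_b:
        raise ValueError("both genomes must contain at least one gene")
    shared = len(genes_a & genes_b)
    return shared / min(len(genes_a), len(genes_b))


def conserved_adjacency_fraction(order_a: list[str], order_b: list[str]) -> float:
    """Gene-order view: fraction of neighboring gene pairs in genome A
    that are also neighbors (in either orientation) in genome B."""
    pairs_a = {frozenset(p) for p in zip(order_a, order_a[1:])}
    pairs_b = {frozenset(p) for p in zip(order_b, order_b[1:])}
    if not pairs_a:
        raise ValueError("genome A needs at least two genes")
    return len(pairs_a & pairs_b) / len(pairs_a)


if __name__ == "__main__":
    # Hypothetical ortholog-group labels, listed in chromosomal order
    genome_a = ["og1", "og2", "og3", "og4"]
    genome_b = ["og2", "og1", "og4", "og3", "og5"]
    print(shared_ortholog_fraction(set(genome_a), set(genome_b)))
    print(conserved_adjacency_fraction(genome_a, genome_b))
```

Comparing the two numbers across genome pairs of increasing divergence is one simple way to see gene content and gene order decay at different rates.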
Abstract:
Bacillus subtilis strain ATCC6633 has been identified as a producer of mycosubtilin, a potent antifungal peptide antibiotic. Mycosubtilin, which belongs to the iturin family of lipopeptide antibiotics, is characterized by a β-amino fatty acid moiety linked to the circular heptapeptide Asn-Tyr-Asn-Gln-Pro-Ser-Asn, with the second, third, and sixth positions present in the D-configuration. The gene cluster from B. subtilis ATCC6633 specifying the biosynthesis of mycosubtilin was identified. The putative operon spans 38 kb and consists of four ORFs, designated fenF, mycA, mycB, and mycC, with strong homologies to the family of peptide synthetases. Biochemical characterization showed that MycB specifically adenylates tyrosine, as expected for mycosubtilin synthetase, and insertional mutagenesis of the operon resulted in a mycosubtilin-negative phenotype. The mycosubtilin synthetase reveals features unique for peptide synthetases as well as for fatty acid synthases: (i) The mycosubtilin synthase subunit A (MycA) combines functional domains derived from peptide synthetases, amino transferases, and fatty acid synthases. MycA represents the first example of a natural hybrid between these enzyme families. (ii) The organization of the synthetase subunits deviates from that commonly found in peptide synthetases. On the basis of the described characteristics of the mycosubtilin synthetase, we present a model for the biosynthesis of iturin lipopeptide antibiotics. Comparison of the sequences flanking the mycosubtilin operon of B. subtilis ATCC6633 with the complete genome sequence of B. subtilis strain 168 indicates that the fengycin and mycosubtilin lipopeptide synthetase operons are exchanged between the two B. subtilis strains.
Abstract:
A recent study of the divergence times of the major groups of organisms as gauged by amino acid sequence comparison has been expanded and the data have been reanalyzed with a distance measure that corrects for both constraints on amino acid interchange and variation in substitution rate at different sites. Beyond that, the availability of complete genome sequences for several eubacteria and an archaebacterium has had a great impact on the interpretation of certain aspects of the data. Thus, the majority of the archaebacterial sequences are not consistent with currently accepted views of the Tree of Life which cluster the archaebacteria with eukaryotes. Instead, they are either outliers or mixed in with eubacterial orthologs. The simplest resolution of the problem is to postulate that many of these sequences were carried into eukaryotes by early eubacterial endosymbionts about 2 billion years ago, only very shortly after or even coincident with the divergence of eukaryotes and archaebacteria. The strong resemblances of these same enzymes among the major eubacterial groups suggest that the cyanobacteria and Gram-positive and Gram-negative eubacteria also diverged at about this same time, whereas the much greater differences between archaebacterial and eubacterial sequences indicate these two groups may have diverged between 3 and 4 billion years ago.
Abstract:
An increasing number of proteins with weak sequence similarity have been found to assume a similar three-dimensional fold and often have similar or related biochemical or biophysical functions. We propose a method for detecting the fold similarity between two proteins with low sequence similarity based on their amino acid properties alone. The method, the proximity correlation matrix (PCM) method, is built on the observation that the physical properties of neighboring amino acid residues in sequence at structurally equivalent positions of two proteins of similar fold are often correlated even when amino acid sequences are different. Hydrophobicity is shown to be the most strongly correlated property for all protein fold classes. The PCM method was tested on 420 proteins belonging to 64 different known folds, each having at least three proteins with little sequence similarity. The method was able to detect fold similarities for 40% of the 420 sequences. Compared with sequence comparison and several fold-recognition methods, the method demonstrates good performance in detecting fold similarities among proteins with low sequence identity. Applied to the complete genome of Methanococcus jannaschii, the method recognized the folds for 22 hypothetical proteins.
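The observation underlying this approach, that hydrophobicity patterns can correlate even when the residues themselves differ, can be illustrated with a small sketch. This is not the PCM method itself: it simply smooths the Kyte-Doolittle hydropathy values over a sliding window and computes a Pearson correlation between two equal-length, already aligned segments. The window size, function names, and example peptides are assumptions made for illustration.

```python
import math

# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def smoothed_profile(seq: str, window: int = 3) -> list[float]:
    """Average hydropathy over each window of neighboring residues."""
    values = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def hydrophobicity_correlation(seq_a: str, seq_b: str, window: int = 3) -> float:
    """Correlate smoothed hydropathy profiles of two aligned segments."""
    return pearson(smoothed_profile(seq_a, window),
                   smoothed_profile(seq_b, window))


if __name__ == "__main__":
    # Two made-up peptides with no identical positions but a similar
    # hydrophobic-then-polar pattern
    print(hydrophobicity_correlation("ILVAKDEE", "VIALRENQ"))
```

The two example peptides share no residue at any position, yet their smoothed profiles correlate strongly, which is the kind of signal a property-based fold comparison can exploit.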
Abstract:
The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the expression subclusters individually. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners.
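The idea of a probabilistic segmentation model can be sketched as follows. Assuming a sequence is generated by concatenating words drawn independently from a dictionary with fixed probabilities, the likelihood of the sequence is the sum over all ways of cutting it into dictionary words, which a simple dynamic program computes. This is only a minimal sketch of that one step: how MobyDick fits word probabilities and decides which longer words to add is not shown, and the function name and toy dictionary are hypothetical.

```python
def sequence_likelihood(seq: str, dictionary: dict[str, float]) -> float:
    """Total probability of `seq`, summed over every segmentation into
    dictionary words, each word drawn independently with its probability."""
    if not dictionary:
        raise ValueError("dictionary must not be empty")
    max_len = max(len(word) for word in dictionary)
    # z[i] holds the summed probability of all segmentations of seq[:i]
    z = [0.0] * (len(seq) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(seq) + 1):
        for length in range(1, min(max_len, end) + 1):
            word = seq[end - length:end]
            prob = dictionary.get(word)
            if prob:
                # extend every segmentation of the shorter prefix by `word`
                z[end] += prob * z[end - length]
    return z[-1]


if __name__ == "__main__":
    # Toy dictionary: two single letters plus one longer "motif"
    toy_dictionary = {"A": 0.4, "T": 0.4, "AT": 0.2}
    print(sequence_likelihood("AT", toy_dictionary))
    print(sequence_likelihood("ATA", toy_dictionary))
```

In the toy case, "AT" can be read either as the two letters A and T or as the single word AT, and both readings contribute to the total. A longer word earns its place in the dictionary when including it explains the data better than its shorter parts do.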
Abstract:
Complete genome sequences are providing a framework to allow the investigation of biological processes by the use of comprehensive approaches. Genome analysis also is having a dramatic impact on medicine through its identification of genes and mutations involved in disease and the elucidation of entire microbial gene sets. Studies of the sequences of model organisms, such as that of the nematode worm Caenorhabditis elegans, are providing extraordinary insights into development and differentiation that aid the study of these processes in humans. The field of functional genomics seeks to devise and apply technologies that take advantage of the growing body of sequence information to analyze the full complement of genes and proteins encoded by an organism.
Abstract:
Symbiotic associations with microorganisms are pivotal in many insects. Yet, the functional roles of obligate symbionts have been difficult to study because it has not been possible to cultivate these organisms in vitro. The medically important tsetse fly (Diptera: Glossinidae) relies on its obligate endosymbiont, Wigglesworthia glossinidia, a member of the Enterobacteriaceae, closely related to Escherichia coli, for fertility and possibly nutrition. We show here that the intracellular Wigglesworthia has a reduced genome size smaller than 770 kb. In an attempt to understand the composition of its genome, we used the gene arrays developed for E. coli. We were able to identify 650 orthologous genes in Wigglesworthia corresponding to ≈85% of its genome. The arrays were also applied for expression analysis using Wigglesworthia cDNA and 61 gene products were detected, presumably coding for some of its most abundant products. Overall, genes involved in cell processes, DNA replication, transcription, and translation were found largely retained in the small genome of Wigglesworthia. In addition, genes coding for transport proteins, chaperones, biosynthesis of cofactors, and some amino acids were found to comprise a significant portion, suggesting an important role for these proteins in its symbiotic life. Based on its expression profile, we predict that Wigglesworthia may be a facultative anaerobic organism that utilizes ammonia as its major source of nitrogen. We present an application of E. coli gene arrays to obtain broad genome information for a closely related organism in the absence of complete genome sequence data.
Abstract:
The structural proteins of the cytoplasmic intermediate filaments (IFs) arise in the nematode Caenorhabditis elegans from eight reported genes and an additional three genes now identified in the complete genome. With the use of double-stranded RNA interference (RNAi) for all 11 C. elegans genes encoding cytoplasmic IF proteins, we observe phenotypes for the five genes A1, A2, A3, B1, and C2. These range from embryonic lethality (B1) and embryonic/larval lethality (A3) to larval lethality (A1 and A2) and a mild dumpy phenotype of adults (C2). Phenotypes A2 and A3 involve displaced body muscles and paralysis. They probably arise by reduction of hypodermal IFs that participate in the transmission of force from the muscle cells to the cuticle. The B1 phenotype has multiple morphogenetic defects, and the A1 phenotype is arrested at the L1 stage. Thus, at least four IF genes are essential for C. elegans development. Their RNAi phenotypes are lethal defects due to silencing of single IF genes. In contrast to C. elegans, no IF genes have been identified in the complete Drosophila genome, posing the question of how Drosophila can compensate for the lack of these proteins, which are essential in mammals and C. elegans. We speculate that the lack of IF proteins in Drosophila can be viewed as cytoskeletal alteration in which, for instance, stable microtubules, often arranged as bundles, substitute for cytoplasmic IFs.
Abstract:
Despite more than a century of debate, the evolutionary position of turtles (Testudines) relative to other amniotes (reptiles, birds, and mammals) remains uncertain. One of the major impediments to resolving this important evolutionary problem is the highly distinctive and enigmatic morphology of turtles that led to their traditional placement apart from diapsid reptiles as sole descendants of presumably primitive anapsid reptiles. To address this question, the complete (16,787-bp) mitochondrial genome sequence of the African side-necked turtle (Pelomedusa subrufa) was determined. This molecule contains several unusual features: a (TA)n microsatellite in the control region, the absence of an origin of replication for the light strand in the WANCY region of five tRNA genes, an unusually long noncoding region separating the ND5 and ND6 genes, an overlap between ATPase 6 and COIII genes, and the existence of extra nucleotides in ND3 and ND4L putative ORFs. Phylogenetic analyses of the complete mitochondrial genome sequences supported the placement of turtles as the sister group of an alligator and chicken (Archosauria) clade. This result clearly rejects the Haematothermia hypothesis (a sister-group relationship between mammals and birds), as well as rejecting the placement of turtles as the most basal living amniotes. Moreover, evidence from both complete mitochondrial rRNA genes supports a sister-group relationship of turtles to Archosauria to the exclusion of Lepidosauria (tuatara, snakes, and lizards). These results challenge the classic view of turtles as the only survivors of primary anapsid reptiles and imply that turtles might have secondarily lost their skull fenestration.
Abstract:
The pufferfish Fugu rubripes has a genome ≈7.5 times smaller than that of mammals but with a similar number of genes. Although conserved synteny has been demonstrated between pufferfish and mammals across some regions of the genome, there is some controversy as to what extent Fugu will be a useful model for the human genome, e.g., [Gilley, J., Armes, N. & Fried, M. (1997) Nature (London) 385, 305–306]. We report extensive conservation of synteny between a 1.5-Mb region of human chromosome 11 and <100 kb of the Fugu genome in three overlapping cosmids. Our findings support the idea that the majority of DNA in the region of human chromosome 11p13 is intergenic. Comparative analysis of three unrelated genes with quite different roles, WT1, RCN1, and PAX6, has revealed differences in their structural evolution. Whereas the human WT1 gene can generate 16 protein isoforms via a combination of alternative splicing, RNA editing, and alternative start site usage, our data predict that Fugu WT1 is capable of generating only two isoforms. This raises the question of the extent to which the evolution of WT1 isoforms is related to the evolution of the mammalian genitourinary system. In addition, this region of the Fugu genome shows a much greater overall compaction than usual but with significant noncoding homology observed at the PAX6 locus, implying that comparative genomics has identified regulatory elements associated with this gene.