999 results for codon usage
Abstract:
This study examines the relationship between Avian coronavirus IBV and its chicken host from the codon usage point of view, based on fifty-nine non-redundant IBV S1 sequences (nt 1-507) from strains detected worldwide and on chicken tissue-specific protein gene sequences from IBV-replicating sites. The effective number of codons (ENC) values ranged from 36 to 47.8, indicating a high-to-moderate codon usage bias. The highest IBV codon adaptation index (CAI) value was 0.7, indicating that the synonymous codon usage of the virus is distant from that of the host. The ENC x GC3 % curve indicates that both mutational pressure and natural selection are the driving forces on the codon usage pattern in S1. The low CAI values agree with a low S protein expression. Considering that the S protein is a determinant for attachment and neutralization, this could be a further mechanism, besides mRNA transcription attenuation, for a low expression of this protein, leading to immune camouflage.
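An ENC versus GC3 plot of the kind mentioned in this abstract is usually read against a null-expectation curve. The sketch below assumes the standard curve from Wright (1990), ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3 fraction; the abstract does not say which formulation the authors used, and the GC3 values in the example are hypothetical, since the abstract reports none. Points that fall below the curve are commonly taken to suggest that forces other than mutational pressure, such as selection, also act on codon usage.

```python
def expected_enc(gc3: float) -> float:
    """Expected ENC if codon usage were shaped only by GC3 composition.

    Uses Wright's (1990) null curve; gc3 is a fraction in [0, 1].
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def below_curve(observed_enc: float, gc3: float) -> bool:
    """True when the observed ENC lies below the null-expectation curve."""
    return observed_enc < expected_enc(gc3)


if __name__ == "__main__":
    # Hypothetical GC3 fractions, chosen only to illustrate the curve.
    for gc3 in (0.2, 0.35, 0.5):
        print(f"GC3 = {gc3:.2f}  expected ENC = {expected_enc(gc3):.1f}")
```

For instance, `below_curve(36.0, 0.5)` compares the lowest ENC reported in the abstract with the curve at an assumed GC3 of 0.5.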
Abstract:
Different codons encoding the same amino acid are not used equally in protein-coding sequences. In bacteria, there is a bias towards codons with high translation rates. This bias is most pronounced in highly expressed proteins, but a recent study of synthetic GFP-coding sequences did not find a correlation between codon usage and GFP expression, suggesting that such correlation in natural sequences is not a simple property of translational mechanisms. Here, we investigate the effect of evolutionary forces on codon usage. The relation between codon bias and protein abundance is quantitatively analyzed based on the hypothesis that codon bias evolved to ensure the efficient usage of ribosomes, a precious commodity for fast growing cells. An explicit fitness landscape is formulated based on bacterial growth laws to relate protein abundance and ribosomal load. The model leads to a quantitative relation between codon bias and protein abundance, which accounts for a substantial part of the observed bias for E. coli. Moreover, by providing an evolutionary link, the ribosome load model resolves the apparent conflict between the observed relation of protein abundance and codon bias in natural sequences and the lack of such dependence in a synthetic gfp library. Finally, we show that the relation between codon usage and protein abundance can be used to predict protein abundance from genomic sequence data alone without adjustable parameters.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
At present a complete mtDNA sequence has been reported for only two hymenopterans, the Old World honey bee, Apis mellifera and the sawfly Perga condei. Among the bee group, the tribe Meliponini (stingless bees) has some distinction due to its Pantropical distribution, great number of species and large importance as main pollinators in several ecosystems, including the Brazilian rain forest. However few molecular studies have been conducted on this group of bees and few sequence data from mitochondrial genomes have been described. In this project, we PCR amplified and sequenced 78% of the mitochondrial genome of the stingless bee Melipona bicolor (Apidae, Meliponini). The sequenced region contains all of the 13 mitochondrial protein-coding genes, 18 of 22 tRNA genes, and both rRNA genes (one of them was partially sequenced). We also report the genome organization (gene content and order), gene translation, genetic code, and other molecular features, such as base frequencies, codon usage, gene initiation and termination. We compare these characteristics of M. bicolor to those of the mitochondrial genome of A. mellifera and other insects. A highly biased A+T content is a typical characteristic of the A. mellifera mitochondrial genome and it was even more extreme in that of M. bicolor. Length and compositional differences between M. bicolor and A. mellifera genes were detected and the gene order was compared. Eleven tRNA gene translocations were observed between these two species. This latter finding was surprising, considering the taxonomic proximity of these two bee tribes. The tRNA Lys gene translocation was investigated within Meliponini and showed high conservation across the Pantropical range of the tribe.
Abstract:
We present here the sequence of the mitochondrial genome of the basidiomycete phytopathogenic hemibiotrophic fungus Moniliophthora perniciosa, causal agent of Witches' Broom Disease in Theobroma cacao. The DNA is a circular molecule of 109103 base pairs, with 31.9 % GC, and is the largest sequenced so far. This size is due essentially to the presence of numerous non-conserved hypothetical ORFs. It contains the 14 genes coding for proteins involved in oxidative phosphorylation, the two rRNA genes, one ORF coding for a ribosomal protein (rps3), and a set of 26 tRNA genes that recognize codons for all amino acids. Seven homing endonucleases are located inside introns. Except for atp8, all conserved known genes are in the same orientation. Phylogenetic analysis based on the cox genes agrees with the commonly accepted fungal taxonomy. An uncommon feature of this mitochondrial genome is the presence of a region that contains a set of four relatively small, nested, inverted repeats enclosing two genes coding for polymerases with an invertron-type structure and three conserved hypothetical genes, interpreted as the stable integration of a mitochondrial linear plasmid. The integration of this plasmid seems to be a recent evolutionary event that could have implications in fungal biology. This sequence is available under GenBank accession number AY376688. (c) 2008 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Abstract:
Complete sequences were obtained for the coding portions of the mitochondrial (mt) genomes of Schistosoma mansoni (NMRI strain, Puerto Rico; 14415 bp), S. japonicum (Anhui strain, China; 14085 bp) and S. mekongi (Khong Island, Laos; 14072 bp). Each comprises 36 genes: 12 protein-encoding genes (cox1-3, nad1-6, nad4L, atp6 and cob); two ribosomal RNAs, rrnL (large subunit rRNA or 16S) and rrnS (small subunit rRNA or 12S); as well as 22 transfer RNA (tRNA) genes. The atp8 gene is absent. A large segment (9.6 kb) of the coding region (comprising 14 tRNAs, eight complete and two incomplete protein-encoding genes) for S. malayensis (Baling, Malaysian Peninsula) was also obtained. Each genome also possesses a long non-coding region that is divided into two parts (a small and a large non-coding region, the latter not fully sequenced in any species) by one or more tRNAs. The protein-encoding genes are similar in size, composition and codon usage in all species except for cox1 in S. mansoni (609 aa) and cox2 in S. mekongi (219 aa), both of which are longer than homologues in other species. An unexpected finding in all the Schistosoma species was the presence of a leucine zipper motif in the nad4L gene. The gene order in S. mansoni is strikingly different from that seen in the S. japonicum group and other flatworms. There is a high level of identity (87-94% at both the nucleotide and amino acid levels) for all protein-encoding genes of S. mekongi and S. malayensis. The identity between genes of these two species and those of S. japonicum is less (56-83% for amino acids and 73-79% for nucleotides). The identity between the genes of S. mansoni and the Asian schistosomes is far less (33-66% for amino acids and 54-68% for nucleotides), an observation consistent with the known phylogenetic distance between S. mansoni and the other species. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
Among the largest resources for biological sequence data is the large amount of expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts but for technical reasons they often contain sequencing errors. Therefore, when analyzing EST sequences computationally, such errors must be taken into account. Earlier attempts to model error prone coding regions have shown good performance in detecting and predicting these while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon usage bias based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.
Abstract:
Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content are largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.
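GC3, the quantity this abstract analyzes, is the fraction of G or C bases at third-codon positions. The following is a minimal sketch of that calculation, assuming an in-frame coding sequence that starts at the first base of a codon. The function name is hypothetical, and this simple version counts every complete codon; some analyses instead exclude Met, Trp and stop codons, and the abstract does not say which convention the authors followed.

```python
def gc3_fraction(cds: str) -> float:
    """Fraction of G/C at third-codon positions of an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored.
    Ambiguous bases such as N are counted as non-GC.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3      # drop an incomplete final codon
    thirds = seq[2:usable:3]              # every third base, starting at index 2
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Toy sequence: codons ATG, GCC, AAA have third bases G, C, A.
    print(gc3_fraction("ATGGCCAAA"))
```

Applying the function to each gene of a genome gives the per-gene GC3 distribution whose spread reflects the heterogeneity discussed above.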
Abstract:
MOTIVATION: Lateral gene transfer is a major mechanism contributing to bacterial genome dynamics and pathovar emergence via pathogenicity island (PAI) spreading. However, since few of these genomic exchanges are experimentally reproducible, it is difficult to establish evolutionary scenarios for the successive PAI transmissions between bacterial genera. Methods initially developed at the gene and/or nucleotide level for genomics, i.e. comparisons of concatenated sequences, ortholog frequency, gene order or dinucleotide usage, were combined and applied here to homologous PAIs: we call this approach comparative PAI genometrics. RESULTS: YAPI, a Yersinia PAI, and related islands were compared to measure evolutionary relationships between related modules. Through use of our genometric approach designed for tracking codon usage adaptation and gene phylogeny, an ancient inter-genus PAI transfer was oriented for the first time by characterizing the genomic environment in which the ancestral island emerged and its subsequent transfers to other bacterial genera.
Abstract:
Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.
Abstract:
The nucleotide sequence of a genomic DNA fragment previously thought, by genetic criteria, to contain the dihydrofolate reductase gene (DFR1) of Saccharomyces cerevisiae was determined. This DNA fragment of 1784 base pairs contains a large open reading frame from position 800 to 1432, which encodes an enzyme with a predicted molecular weight of 24,229.8 Daltons. Analysis of the amino acid sequence of this protein revealed that the yeast polypeptide contained 211 amino acids, compared to the 186 residues commonly found in the polypeptides of other eukaryotes. The difference in size of the gene product can be attributed mainly to an insert in the yeast gene. Within this region, several consensus sequences required for processing of yeast nuclear and class II mitochondrial introns were identified, but appear not to be sufficient for RNA splicing. The primary structure of the yeast DHFR protein has considerable sequence homology with analogous polypeptides from other organisms, especially in the consensus residues involved in cofactor and/or inhibitor binding. Analysis of the nucleotide sequence also revealed the presence of a number of canonical sequences identified in yeast as having some function in the regulation of gene expression. These include UAS elements (TGACTC) required for the amino acid general control response, and "TATA" boxes, as well as several consensus sequences thought to be required for transcriptional termination and polyadenylation. Analysis of the codon usage of the yeast DFR1 coding region revealed a codon bias index of 0.0083. This value, very close to zero, suggests that the gene is expressed at a relatively low level under normal physiological conditions. The information concerning the organization of DFR1 was used to construct a variety of fusions of its 5' regulatory region with the coding region of the lacZ gene of E. coli.
Some of these fused genes encoded a fusion product that was expressed in E. coli and/or in yeast under the control of the 5' regulatory elements of DFR1. Further studies with these fusion constructions revealed that the beta-galactosidase activity encoded on multicopy plasmids was stimulated transiently by prior exposure of yeast host cells to UV light. This suggests that the yeast DFR1 gene is induced by UV light and may imply a novel function of the DHFR protein in the cellular responses to DNA damage. Another novel feature of yeast DHFR was revealed during preliminary studies of a diploid strain containing a heterozygous DFR1 null allele. The strain was constructed by insertion of a URA3 gene within the coding region of DFR1. Sporulation of this diploid revealed that meiotic products segregated 2:0 for uracil prototrophy when spore clones were germinated on medium supplemented with 5-formyltetrahydrofolate (folinic acid). This finding suggests that, in addition to its catalytic activity, the DFR1 gene product may play some role in the anabolism of folinic acid. Alternatively, this result may indicate that Ura+ haploid segregants were inviable and suggest that the enzyme has an essential cellular function in this species.
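The abstract reports a codon bias index near zero but does not give the formula. The sketch below assumes the commonly cited form attributed to Bennetzen and Hall, CBI = (N_pref - N_rand) / (N_tot - N_rand), where N_pref counts preferred codons, N_rand is the count expected if synonymous codons were used equally, and N_tot counts codons of the amino acids that have a preferred codon. The preferred set in the example is a toy placeholder, not the actual yeast set, and restricting N_tot to amino acids with a preferred codon is an assumption of this sketch.

```python
from collections import Counter

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))
DEGENERACY = Counter(AMINO)  # number of synonymous codons per amino acid


def codon_bias_index(cds: str, preferred: set) -> float:
    """CBI for an in-frame coding sequence, given a set of preferred DNA codons."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    pref_per_aa = Counter(CODON_TABLE[c] for c in preferred)
    n_tot, n_pref, n_rand = 0, 0, 0.0
    for codon in codons:
        aa = CODON_TABLE.get(codon)
        if aa not in pref_per_aa:
            continue  # amino acid without a preferred codon: not scored
        n_tot += 1
        n_pref += codon in preferred
        n_rand += pref_per_aa[aa] / DEGENERACY[aa]
    if n_tot == n_rand:
        raise ValueError("no scorable codons in sequence")
    return (n_pref - n_rand) / (n_tot - n_rand)


if __name__ == "__main__":
    toy_preferred = {"AAG"}  # hypothetical: one preferred lysine codon
    print(codon_bias_index("AAGAAGAAGAAA", toy_preferred))
```

Under this definition a value of 0 corresponds to equal use of synonymous codons and 1 to exclusive use of preferred codons, which is consistent with the abstract reading 0.0083 as a sign of low expression.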
Abstract:
We characterized four eEF1A genes in the alternative rhabditid nematode model organism Oscheius tipulae. This is twice the copy number of eEF1A genes in C. elegans, C. briggsae, and, probably, many other free-living and parasitic nematodes. The introns show features remarkably different from those of other metazoan eEF1A genes. Most of the introns in the eEF1A genes are specific to O. tipulae and are not shared with any of the other genes described in metazoans. Most of the introns are phase 0 (inserted between two codons), and few are inserted in protosplice sites (introns inserted between the nucleotide sequence A/CAG and G/A). Two of these phase 0 introns are conserved in sequence in two or more of the four eEF1A gene copies, and are inserted in the same position in the genes. Neither of these characteristics has been detected in any of the nematode eEF1A genes characterized to date. The coding sequences were also compared with other eEF1A cDNAs from 11 different nematodes to determine the variability of these genes within the phylum Nematoda. Parsimony and distance trees yielded similar topologies, which were similar to those created using other molecular markers. The presence of more than one copy of the eEF1A gene with nearly identical coding regions makes it difficult to define the orthologous cDNAs. As shown by our data on O. tipulae, careful and extensive examination of intron positions in the eEF1A gene across the phylum is necessary to define their potential for use as valid phylogenetic markers.
Abstract:
A total of 3,631 expressed sequence tags (ESTs) were established from two size-selected cDNA libraries made from the tetrasporophytic phase of the agarophytic red alga Gracilaria tenuistipitata. The average sizes of the inserts in the two libraries were 1,600 bp and 600 bp, with an average length of the edited sequences of 850 bp. Clustering gave 2,387 assembled sequences with a redundancy of 53%. Of the ESTs, 65% had significant matches to sequences deposited in public databases, 11% to proteins without known function, and 35% were novel. The most represented ESTs were a Na/K-transporting ATPase, a hedgehog-like protein, a glycine dehydrogenase and an actin. Most of the identified genes were involved in primary metabolism and housekeeping. The largest functional group was thus genes involved in metabolism with 14% of the ESTs; other large functional categories included energy, transcription, and protein synthesis and destination. The codon usage was examined using a subset of the data, and the codon bias was found to be limited with all codon combinations used.
Abstract:
Characterization of Human Respiratory Syncytial Virus (HRSV) protein interactions with host cell components is crucial to devise antiviral strategies. Viral nucleoprotein, phosphoprotein and matrix protein genes were optimized for human codon usage and cloned into expression vectors. HEK-293T cells were transfected with these vectors, viral proteins were immunoprecipitated, and co-immunoprecipitated cellular proteins were identified through mass spectrometry. Cell proteins identified with higher confidence scores were probed in the immunoprecipitation using specific antibodies. The results indicate that nucleoprotein interacts with arginine methyl-transferase, methylosome protein and Hsp70. Phosphoprotein interacts with Hsp70 and tropomyosin, and matrix with tropomyosin and nucleophosmin. Additionally, we performed immunoprecipitation of these cellular proteins in cells infected with HRSV, followed by detection of co-immunoprecipitated viral proteins. The results indicate that these interactions also occur in the context of viral infection, and their potential contribution to an HRSV replication model is discussed.