934 results for Codon bias
Abstract:
Different codons encoding the same amino acid are not used equally in protein-coding sequences. In bacteria, there is a bias towards codons with high translation rates. This bias is most pronounced in highly expressed proteins, but a recent study of synthetic GFP-coding sequences did not find a correlation between codon usage and GFP expression, suggesting that such correlation in natural sequences is not a simple property of translational mechanisms. Here, we investigate the effect of evolutionary forces on codon usage. The relation between codon bias and protein abundance is quantitatively analyzed based on the hypothesis that codon bias evolved to ensure the efficient usage of ribosomes, a precious commodity for fast growing cells. An explicit fitness landscape is formulated based on bacterial growth laws to relate protein abundance and ribosomal load. The model leads to a quantitative relation between codon bias and protein abundance, which accounts for a substantial part of the observed bias for E. coli. Moreover, by providing an evolutionary link, the ribosome load model resolves the apparent conflict between the observed relation of protein abundance and codon bias in natural sequences and the lack of such dependence in a synthetic gfp library. Finally, we show that the relation between codon usage and protein abundance can be used to predict protein abundance from genomic sequence data alone without adjustable parameters.
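The opening observation, that synonymous codons are not used equally, can be made concrete with a small calculation. The following is a minimal sketch, not the model described in the abstract: it only tabulates, for each codon in a coding sequence, its share among the codons that encode the same amino acid. The function name `synonymous_fractions` and the example sequence are hypothetical; the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fractions(cds):
    """Return {codon: share of that codon among its synonymous codons}."""
    cds = cds.upper()
    # Split into in-frame codons, ignoring a trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {c: n / per_amino_acid[CODON_TABLE[c]] for c, n in counts.items()}


# Four leucine codons: three CTG and one CTA.
example = synonymous_fractions("CTGCTGCTGCTA")
```

With equal usage, each of the six leucine codons would have a share of about 0.17; here CTG takes 0.75 and CTA 0.25, which is the kind of skew the abstract refers to.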
Abstract:
The informational properties of biological systems are the subject of much debate and research. I present a general argument in favor of the existence and central importance of information in organisms, followed by a case study of the genetic code (specifically, codon bias) and the translation system from the perspective of information. The codon biases of 831 Bacteria and Archaea are analyzed and modeled as points in a 64-dimensional statistical space. The major results are that (1) codon bias evolution does not follow canonical patterns, and (2) the use of coding space in organisms is a subset of the total possible coding space. These findings imply that codon bias is a unique adaptive mechanism that owes its existence to organisms' use of information in representing genes, and that there is a particularly biological character to the resulting biased coding and information use.
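One way to picture "points in a 64-dimensional statistical space" is to represent each genome by the relative frequencies of its 64 codons. The sketch below makes that assumption; the abstract does not say how the vectors were built or compared, so the Euclidean distance and the names `codon_vector` and `codon_distance` are purely illustrative.

```python
import math
from itertools import product

# Fixed ordering of the 64 codons, so every vector uses the same axes.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_vector(cds):
    """Map a coding sequence to a 64-element vector of codon frequencies."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases
            counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


def codon_distance(u, v):
    """Euclidean distance between two codon-frequency vectors (an assumed metric)."""
    return math.dist(u, v)


a = codon_vector("ATGAAAAAGCTG")
b = codon_vector("ATGAAGAAGCTA")
separation = codon_distance(a, b)
```

In practice the input would be the concatenated coding sequences of a whole genome rather than a short string.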
Abstract:
The psbA gene of the chloroplast genome has a codon usage that is unusual for plant chloroplast genes. In the present study the evolutionary status of this codon usage is tested by reconstructing putative ancestral psbA sequences to determine the pattern of change in codon bias during angiosperm divergence. It is shown that the codon biases of the ancestral genes are much stronger than all extant flowering plant psbA genes. This is related to previous work that demonstrated a significant increase in synonymous substitution in psbA relative to other chloroplast genes. It is suggested, based on the two lines of evidence, that the codon bias of this gene currently is not being maintained by selection. Rather, the atypical codon bias simply may be a remnant of an ancestral codon bias that now is being degraded by the mutation bias of the chloroplast genome, in other words, that the psbA gene is not at equilibrium. A model for the evolution of selective pressure on the codon usage of plant chloroplast genes is discussed.
Abstract:
We first review what is known about patterns of codon usage bias in Drosophila and make the following points: (i) Drosophila genes are as biased or more biased than those in microorganisms. (ii) The level of bias of genes and even the particular pattern of codon bias can remain phylogenetically invariant for very long periods of evolution. (iii) However, some genes, even very tightly linked genes, can change very greatly in codon bias across species. (iv) Generally G and especially C are favored at synonymous sites in biased genes. (v) With the exception of aspartic acid, all amino acids contribute significantly and about equally to the codon usage bias of a gene. (vi) While most individual amino acids that can use G or C at synonymous sites display a preference for C, there are exceptions: valine and leucine, which prefer G. (vii) Finally, smaller genes tend to be more biased than longer genes. We then examine possible causes of these patterns and discount mutation bias on three bases: there is little evidence of regional mutation bias in Drosophila, mutation bias is likely toward A+T (the opposite of codon usage bias), and not all amino acids display the preference for the same nucleotide in the wobble position. Two lines of evidence support a selection hypothesis based on tRNA pools: highly biased genes tend to be highly and/or rapidly expressed, and the preferred codons in highly biased genes optimally bind the most abundant isoaccepting tRNAs. Finally, we examine the effect of bias on DNA evolution and confirm that genes with high codon usage bias have lower rates of synonymous substitution between species than do genes with low codon usage bias. Surprisingly, we find that genes with higher codon usage bias display higher levels of intraspecific synonymous polymorphism. This may be due to opposing effects of recombination.
Abstract:
Complexity of biological function relies on large networks of interacting molecules. However, the evolutionary properties of these networks are not fully understood. It has been shown that selective pressures depend on the position of genes in the network. We have previously shown that in the Drosophila insulin/target of rapamycin (TOR) signal transduction pathway there is a correlation between the pathway position and the strength of purifying selection, with the downstream genes being most constrained. In this study, we investigated the evolutionary dynamics of this well-characterized pathway in vertebrates. More specifically, we determined the impact of natural selection on the evolution of 72 genes of this pathway. We found that in vertebrates there is a similar gradient of selective constraint in the insulin/TOR pathway to that found in Drosophila. This feature is neither the result of a polarity in the impact of positive selection nor of a series of factors affecting selective constraint levels (gene expression level and breadth, codon bias, protein length, and connectivity). We also found that pathway genes encoding physically interacting proteins tend to evolve under similar selective constraints. The results indicate that the architecture of the vertebrate insulin/TOR pathway constrains the molecular evolution of its components. Therefore, the polarity detected in Drosophila is neither specific nor incidental of this genus. Hence, although the underlying biological mechanisms remain unclear, these may be similar in both vertebrates and Drosophila.
Abstract:
The nucleotide sequence of a genomic DNA fragment thought previously to contain the dihydrofolate reductase gene (DFR1) of Saccharomyces cerevisiae by genetic criteria was determined. This DNA fragment of 1784 base pairs contains a large open reading frame from position 800 to 1432, which encodes an enzyme with a predicted molecular weight of 24,229.8 Daltons. Analysis of the amino acid sequence of this protein revealed that the yeast polypeptide contained 211 amino acids, compared to the 186 residues commonly found in the polypeptides of other eukaryotes. The difference in size of the gene product can be attributed mainly to an insert in the yeast gene. Within this region, several consensus sequences required for processing of yeast nuclear and class II mitochondrial introns were identified, but appear not to be sufficient for RNA splicing. The primary structure of the yeast DHFR protein has considerable sequence homology with analogous polypeptides from other organisms, especially in the consensus residues involved in cofactor and/or inhibitor binding. Analysis of the nucleotide sequence also revealed the presence of a number of canonical sequences identified in yeast as having some function in the regulation of gene expression. These include UAS elements (TGACTC) required for the amino acid general control response, and "TATA" boxes as well as several consensus sequences thought to be required for transcriptional termination and polyadenylation. Analysis of the codon usage of the yeast DFR1 coding region revealed a codon bias index of 0.0083. This value, very close to zero, suggests that the gene is expressed at a relatively low level under normal physiological conditions. Information concerning the organization of DFR1 was used to construct a variety of fusions of its 5' regulatory region with the coding region of the lacZ gene of E. coli.
Some of these fused genes encoded a fusion product that was expressed in E. coli and/or in yeast under the control of the 5' regulatory elements of DFR1. Further studies with these fusion constructions revealed that the beta-galactosidase activity encoded on multicopy plasmids was stimulated transiently by prior exposure of yeast host cells to UV light. This suggests that the yeast DFR1 gene is induced by UV light and may imply a novel function of DHFR protein in the cellular responses to DNA damage. Another novel feature of yeast DHFR was revealed during preliminary studies of a diploid strain containing a heterozygous DFR1 null allele. The strain was constructed by insertion of a URA3 gene within the coding region of DFR1. Sporulation of this diploid revealed that meiotic products segregated 2:0 for uracil prototrophy when spore clones were germinated on medium supplemented with 5-formyltetrahydrofolate (folinic acid). This finding suggests that, in addition to its catalytic activity, the DFR1 gene product may play some role in the anabolism of folinic acid. Alternatively, this result may indicate that Ura+ haploid segregants were inviable and suggest that the enzyme has an essential cellular function in this species.
Abstract:
Porcine S100A12 is a member of the S100 proteins, a family of small acidic calcium-binding proteins characterized by the presence of two EF-hand motifs. These proteins are involved in many cellular events such as the regulation of protein phosphorylation, enzymatic activity, protein-protein interaction, Ca(2+) homeostasis, inflammatory processes and intermediate filament polymerization. In addition, members of this family bind Zn(2+) or Ca(2+) with cooperative effect on binding. In this study, the gene sequence encoding porcine S100A12 was obtained by the synthetic gene approach using E. coli codon bias. Additionally, we report a thermodynamic study of the recombinant S100A12 using circular dichroism, fluorescence and isothermal titration calorimetry. The results of urea- and temperature-induced unfolding and refolding processes indicated a reversible two-state process. Also, the ANS fluorescence studies showed that in the presence of divalent ions the protein exposes hydrophobic sites which could facilitate the interaction with other proteins and trigger the physiological responses. (c) 2008 Elsevier B.V. All rights reserved.
Abstract:
A total of 3,631 expressed sequence tags (ESTs) were established from two size-selected cDNA libraries made from the tetrasporophytic phase of the agarophytic red alga Gracilaria tenuistipitata. The average sizes of the inserts in the two libraries were 1,600 bp and 600 bp, with an average length of the edited sequences of 850 bp. Clustering gave 2,387 assembled sequences with a redundancy of 53%. Of the ESTs, 65% had significant matches to sequences deposited in public databases, 11% to proteins without known function, and 35% were novel. The most represented ESTs were a Na/K-transporting ATPase, a hedgehog-like protein, a glycine dehydrogenase and an actin. Most of the identified genes were involved in primary metabolism and housekeeping. The largest functional group was thus genes involved in metabolism with 14% of the ESTs; other large functional categories included energy, transcription, and protein synthesis and destination. The codon usage was examined using a subset of the data, and the codon bias was found to be limited with all codon combinations used.
Abstract:
From the late 1980s, the automation of sequencing techniques and the spread of computers gave rise to a flourishing number of new molecular structures and sequences and to a proliferation of new databases in which to store them. Three computational approaches are presented here that analyse the massive amount of publicly available data in order to answer important biological questions. The first strategy studies the incorrect assignment of the first AUG codon in a messenger RNA (mRNA), due to the incomplete determination of its 5' end sequence. An extension of the mRNA 5' coding region was identified in 477 human loci, out of all known human mRNAs analysed, using an automated expressed sequence tag (EST)-based approach. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for GNB2L1, QARS and TDP2, and the consequences for functional studies are discussed. The second approach analyses the codon bias, the phenomenon in which distinct synonymous codons are used with different frequencies, and, following integration with a gene expression profile, estimates the total number of codons present across all the expressed mRNAs (named here "codonome value") in a given biological condition. Systematic analyses across different pathological and normal human tissues and multiple species show a surprisingly tight correlation between the codon bias and the codonome bias. The third approach is useful for studying the expression of genes implicated in human autism spectrum disorder (ASD). ASD-implicated genes sharing microRNA response elements (MREs) for the same microRNA are co-expressed in brain samples from healthy and ASD-affected individuals. The different expression of a recently identified long non-coding RNA which has four MREs for the same microRNA could disrupt the equilibrium in this network, but further analyses and experiments are needed.
Abstract:
Our recent demonstration that many eukaryotic mRNAs contain sequences complementary to rRNA led to the hypothesis that these sequences might mediate specific interactions between mRNAs and ribosomes and thereby affect translation. In the present experiments, the ability of complementary sequences to bind to rRNA was investigated by using photochemical cross-linking. RNA probes with perfect complementarity to 18S or 28S rRNA were shown to cross-link specifically to the corresponding rRNA within intact ribosomal subunits. Similar results were obtained by using probes based on natural mRNA sequences with varying degrees of complementarity to the 18S rRNA. RNase H cleavage localized four such probes to complementary regions of the 18S rRNA. The effects of complementarity on translation were assessed by using the mRNA encoding ribosomal protein S15. This mRNA contains a sequence within its coding region that is complementary to the 18S rRNA at 20 of 22 nucleotides. RNA from an S15-luciferase fusion construct was translated in a cell-free lysate and compared with the translation of four related constructs that were mutated to decrease complementarity to the 18S rRNA. These mutations did not alter the amino acid sequence or the codon bias. A correlation between complementarity and translation was observed; constructs with less complementarity increased the amount of translation up to 54%. These findings raised the possibility that direct base-pairing of particular mRNAs to rRNAs within ribosomes may function as a mechanism of translational control.
Abstract:
We have identified a class of proteins that bind single-stranded telomeric DNA and are required for the nuclear organization of telomeres and/or telomere-associated proteins. Rlf6p was identified by its sequence similarity to Gbp1p, a single-stranded telomeric DNA-binding protein from Chlamydomonas reinhardtii. Rlf6p and Gbp1p bind yeast single-stranded G-strand telomeric DNA. Both proteins include at least two RNA recognition motifs, which are found in many proteins that interact with single-stranded nucleic acids. Disruption of RLF6 alters the distribution of repressor/activator protein 1 (Rap1p), a telomere-associated protein. In wild-type yeast cells, Rap1p localizes to a small number of perinuclear spots, while in rlf6 cells Rap1p appears diffuse and nuclear. Interestingly, telomere position effect and telomere length control, which require RAP1, are unaffected by rlf6 mutations, demonstrating that Rap1p localization can be uncoupled from other Rap1p-dependent telomere functions. In addition, expression of Chlamydomonas GBP1 restores perinuclear, punctate Rap1p localization in rlf6 mutant cells. The functional complementation of a fungal gene by an algal gene suggests that Rlf6p and Gbp1p are members of a conserved class of single-stranded telomeric DNA-binding proteins that influence nuclear organization. Furthermore, it demonstrates that, despite their unusual codon bias, C. reinhardtii genes can be efficiently translated in Saccharomyces cerevisiae cells.
Abstract:
Saturation mutagenesis is a powerful tool in modern protein engineering, which permits key residues within a protein to be targeted in order to potentially enhance specific functionalities. However, the creation of large libraries using conventional saturation mutagenesis with degenerate codons (NNN or NNK/S) has inherent redundancy and consequent disparities in codon representation. Therefore, both chemical (trinucleotide phosphoramidites) and biological methods (sequential, enzymatic single codon additions) of non-degenerate saturation mutagenesis have been developed in order to combat these issues and so improve library quality. Large libraries with multiple saturated positions can be limited by the method used to screen them. Although they are the traditional screening method of choice, cell-dependent methods, such as phage display, are limited by the need for transformation. A number of cell-free screening methods, such as CIS display, which link the screened phenotype with the encoded genotype, have the capability of screening libraries with up to 10^14 members. This thesis describes the further development of ProxiMAX technology to reduce library codon bias and its integration with CIS display to screen the resulting library. Synthetic MAX oligonucleotides are ligated to an acceptor base sequence, amplified, and digested, subsequently adding a randomised codon to the acceptor, which forms an iterative cycle using the digested product of the previous cycle as the base sequence for the next. Initial use of ProxiMAX highlighted areas of the process where changes could be implemented in order to improve the codon representation in the final library. The refined process was used to construct a monomeric anti-NGF peptide library, based on two proprietary dimeric peptides (Isogenica) that bind NGF. The resulting library showed greatly improved codon representation that equated to a theoretical diversity of ~69%.
The library was subsequently screened using CIS display and the discovered peptides assessed for NGF-TrkA inhibition by ELISA. Despite binding to TrkA, these peptides showed lower levels of inhibition of the NGF-TrkA interaction than the parental dimeric peptides, highlighting the importance of dimerization for inhibition of NGF-TrkA binding.
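The redundancy attributed to degenerate codons can be checked by enumeration. The sketch below is only an illustration of why NNK libraries are uneven, not part of the ProxiMAX method: it expands NNK (N = any base, K = G or T) under the standard genetic code and counts how many codons map to each amino acid. The name `nnk_counts` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# NNK: any base at positions 1 and 2, G or T at position 3.
NNK_CODONS = ["".join(c) for c in product("ACGT", "ACGT", "GT")]

# Number of NNK codons encoding each amino acid (or stop).
nnk_counts = Counter(CODON_TABLE[c] for c in NNK_CODONS)

for amino_acid, n in sorted(nnk_counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(amino_acid, n)
```

The 32 NNK codons cover all 20 amino acids plus one stop codon (TAG), but leucine, serine and arginine each receive three codons while twelve amino acids, including methionine and tryptophan, receive only one, so an NNK position is three times more likely to encode leucine than methionine.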
Abstract:
Ribosome profiling (Ribo-seq), a promising technology for exploring ribosome decoding rates, is characterized by the presence of infrequent high peaks in ribosome footprint density and by long alignment gaps. Here, to reduce the impact of data heterogeneity we introduce a simple normalization method, Ribo-seq Unit Step Transformation (RUST). RUST is robust and outperforms other normalization techniques in the presence of heterogeneous noise. We illustrate how RUST can be used for identifying mRNA sequence features that affect ribosome footprint densities globally. We show that a few parameters extracted with RUST are sufficient for predicting experimental densities with high accuracy. Importantly the application of RUST to 30 publicly available Ribo-seq data sets revealed a substantial variation in sequence determinants of ribosome footprint frequencies, questioning the reliability of Ribo-seq as an accurate representation of local ribosome densities without prior quality control. This emphasizes our incomplete understanding of how protocol parameters affect ribosome footprint densities.
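The abstract does not spell out the transformation itself. The sketch below assumes that a "unit step" means replacing each position's footprint count with 1 if it exceeds the mean count for that coding region and with 0 otherwise; this rule, the function name `unit_step_transform` and the example counts are assumptions for illustration, not details taken from the abstract.

```python
def unit_step_transform(densities):
    """Binarize per-codon footprint counts against the gene's own mean.

    Assumed rule: 1 where the count is strictly above the mean, else 0.
    """
    if not densities:
        return []
    mean = sum(densities) / len(densities)
    return [1 if d > mean else 0 for d in densities]


# A profile dominated by one very high peak (mean = 56 / 7 = 8).
profile = [0, 0, 1, 50, 2, 0, 3]
binary = unit_step_transform(profile)  # [0, 0, 0, 1, 0, 0, 0]
```

Under this assumption a peak of 50 and a peak of 5,000 contribute equally after the transform, which is one way a step function would limit the influence of the infrequent high peaks mentioned above.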
Abstract:
We analyzed the codon usage bias of eight open reading frames (ORFs) across up to 79 human papillomavirus (HPV) genotypes from three distinct phylogenetic groups. All eight ORFs across HPV genotypes show a strong codon usage bias, amongst degenerately encoded amino acids, toward 18 codons mainly with T at the 3rd position. For all 18 degenerately encoded amino acids, codon preferences amongst human and animal PV ORFs are significantly different from those averaged across mammalian genes. Across the HPV types, the L2 ORFs show the highest codon usage bias (73.2 +/- 1.6%) and the E4 ORFs the lowest (51.1 +/- 0.5%), reflecting a similar bias in codon 3rd position A + T content (L2: 76.1 +/- 4.2%; E4: 58.6 +/- 4.5%). The E4 ORF, uniquely amongst the HPV ORFs, is G + C rich, while the other ORFs are A + T rich. Codon usage bias correlates positively with A + T content at the codon 3rd position in the E2, E6, L1 and L2 ORFs, but negatively in the E4 ORFs. A general conservation of preferred codon usage across human and non-human PV genotypes, whether or not they originate from the same supergroup, together with the observed difference between the preferred codon usage for HPV ORFs and for genes of the cells they infect, suggests that specific codon usage bias and A + T content variation may somehow increase the replicational fitness of HPVs in mammalian epithelial cells, and have practical implications for gene therapy of HPV infection. (C) 2003 Elsevier B.V. All rights reserved.
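The A + T content at the codon 3rd position quoted above is simple to compute for any ORF. The following is a minimal sketch under the assumption that the ORF is supplied in frame as a DNA string; the function name `at3_percent` and the example sequence are hypothetical and not taken from the study.

```python
def at3_percent(orf):
    """Percentage of codons whose third base is A or T."""
    orf = orf.upper()
    # Third base of every complete in-frame codon.
    third_bases = [orf[i + 2] for i in range(0, len(orf) - 2, 3)]
    if not third_bases:
        return 0.0
    at_count = sum(base in "AT" for base in third_bases)
    return 100.0 * at_count / len(third_bases)


# Codons ATG, AAA, TTT, GGC have third bases G, A, T, C.
value = at3_percent("ATGAAATTTGGC")  # 50.0
```

Applied per ORF and averaged across genotypes, a measure of this kind would give figures comparable in form to the L2 and E4 percentages reported in the abstract.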