956 results for CODON USAGE
Abstract:
Porphyra haitanensis T. J. Chang et B. F. Zheng (Bangiales, Rhodophyta) is cultivated in China and widely consumed in Asia. To gain more insight into its physiological and biochemical properties, we generated 5318 expressed sequence tags (ESTs) from the sporophyte of P. haitanensis. Assembly into a nonredundant set yielded 2535 sequences, of which only 32.2% (816) showed some similarity to published sequences (Nr and KOG). Functional classification of these ESTs revealed that most of the transcripts were related to conserved metabolic processes, and that P. haitanensis most likely possesses cyanide-resistant respiration and a C4-like carbon-fixation pathway, neither of which has been reported in a rhodophyte before. Twenty-eight percent of the nonredundant gene clusters showed significant similarity to those from P. yezoensis Ueda sporophytes, and 16 genes up-regulated in P. yezoensis sporophytes were also abundantly expressed in P. haitanensis. Codon usage analysis suggested that P. haitanensis may have been exposed to high GC pressure during its evolution. These findings represent the most extensive collection of ESTs from P. haitanensis to date, and all ESTs in this study have been submitted to GenBank (accession nos. DN604790-DN608469, EG016226-EG018540).
Abstract:
Arthrospira (Spirulina) (Setchell & Gardner) is an important cyanobacterium, not only for its nutritional potential but also for its distinctive biological characteristics. An unbiased fosmid library of Arthrospira maxima FACHB438 containing 4300 clones was constructed. Insert sizes ranged from 15.5 to 48.9 kb, with an average of 37.6 kb, and the recombination frequency was 100%. The library therefore represents 29.9 genome equivalents, given an Arthrospira genome size of 5.4 Mb. A total of 719 clones were randomly selected from the library, yielding 602 usable sequences that comprise 307,547 bases and cover 5.70% of the genome. Codon usage in A. maxima was not strongly biased. GC content at the first codon position (46.9%) was higher than at the second (39.8%) and third (45.5%) positions, and the GC content of the genome was 43.6%. Of the 602 sequences, 287 (47.7%) showed high similarity to known genes, 63 (10.5%) to hypothetical genes, and the remaining 252 (41.8%) had no significant similarity. The assigned genes were classified into 22 categories according to their biological roles. Remarkably, 25 sequences (4.2%) encode reverse transcriptase, indicating that the RT gene may be present in multiple copies in the A. maxima genome and may play an important role in its evolutionary history and metabolic regulation. In addition, sequences encoding the ATP-binding cassette transport system and the two-component signal transduction system were the second and third most frequent, respectively. These genomic features provide some clues to the mechanisms by which this organism adapts to high bicarbonate concentrations and a high-pH environment.
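Two of the numbers above can be reproduced with simple arithmetic: genome equivalents are total cloned DNA divided by genome size (4300 × 37.6 kb / 5.4 Mb ≈ 29.9), and GC content per codon position is counted over in-frame codons. The sketch below assumes the input to the GC function is an in-frame coding sequence; the function names are illustrative, not from the study.

```python
def library_coverage(n_clones, mean_insert_kb, genome_mb):
    """Genome equivalents: total cloned DNA divided by genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000)


def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("need at least one complete codon")
    gc = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            gc[i % 3] += 1
    return tuple(count / n_codons for count in gc)


print(round(library_coverage(4300, 37.6, 5.4), 1))  # 29.9
print(gc_by_codon_position("GCAATGTTC"))
```

Pooling the per-position counts over all coding sequences, rather than averaging per-gene fractions, weights each codon equally; the abstract does not say which convention was used.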
Abstract:
BACKGROUND: While effective population size (Ne) and life history traits such as generation time are known to affect substitution rates, their potential effects on the evolution of base composition are less well understood. In mammals, GC content increases with decreasing body mass, consistent with recombination-associated GC-biased gene conversion (gBGC) having a stronger impact in smaller-bodied lineages. However, shifts in chromosomal architecture and recombination landscapes between species may complicate the interpretation of these results. In birds, interchromosomal rearrangements are rare and the recombination landscape is conserved, suggesting that this group is well suited to assess the impact of life history on base composition. RESULTS: Employing data from 45 newly and 3 previously sequenced avian genomes covering a broad range of taxa, we found that lineages with large populations and short generations exhibit higher GC content. The effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage. Consistent with recombination driving base composition, GC content and heterogeneity were positively correlated with the rate of recombination. Moreover, we observed ongoing increases in GC in the majority of lineages. CONCLUSIONS: Our results provide evidence that gBGC may drive patterns of nucleotide composition in avian genomes and are consistent with more effective gBGC in large populations and a greater number of meioses per unit time, that is, a shorter generation time. Thus, in accord with theoretical predictions, base composition evolution is substantially modulated by species life history.
Abstract:
One of the greatest scientific advances of the twentieth century was the development of technology that allows genomes to be sequenced on a large scale. However, the information produced by sequencing does not, by itself, explain a genome's primary structure, evolution, or function. To that end, fields such as molecular biology, genetics, and bioinformatics are used to study the various properties and workings of genomes. In this work we are particularly interested in understanding in detail how the genome is decoded at the ribosome, and in extracting general rules from analysis of the genome's primary structure, namely codon context and codon distribution. These rules are poorly studied and understood, and it is not known whether they can be obtained with statistical and bioinformatic tools. Traditional methods for studying codon distribution and codon context in the genome do not provide the tools needed to study these properties at genome scale. Count tables of codon distributions, as well as absolute metrics, are currently available in databases, and several applications for characterizing gene sequences are also available. However, other statistical approaches and other methods of data visualization were clearly missing. In the present work, mathematical and computational methods were developed for codon context analysis and for identifying regions where codon repeats occur. New forms of data visualization were also developed to support interpretation of the results. The statistical tools included in the model, such as clustering, residual analysis, and codon adaptation indices, proved important for characterizing the coding sequences of several genomes.
The ultimate goal is for the information obtained to make it possible to identify the general rules that govern codon context in any genome.
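Codon context analysis with residuals can be sketched as a contingency table of adjacent codon pairs (5' codon × 3' codon), scored with adjusted residuals, (O − E) / sqrt(E (1 − r/N)(1 − c/N)), where r and c are the row and column totals. This is a minimal sketch, assuming adjusted residuals are the residual statistic of interest; the thesis may use a different normalization, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def codon_pairs(cds):
    """Adjacent (5' codon, 3' codon) pairs in an in-frame coding sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return list(zip(codons, codons[1:]))


def adjusted_residuals(sequences):
    """Adjusted residual for every (5' codon, 3' codon) cell with observed marginals."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(codon_pairs(seq.upper()))
    n = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), count in pairs.items():
        left[a] += count   # row totals
        right[b] += count  # column totals
    residuals = {}
    for a in left:
        for b in right:
            expected = left[a] * right[b] / n
            variance = expected * (1 - left[a] / n) * (1 - right[b] / n)
            if variance > 0:
                residuals[(a, b)] = (pairs[(a, b)] - expected) / sqrt(variance)
    return residuals


res = adjusted_residuals(["ATGAAAATGAAA", "ATGAAAATGAAA"])
print(round(res[("ATG", "AAA")], 3))  # 2.449
```

Large positive residuals flag codon pairs used more often than their individual frequencies predict; those residual vectors are what a clustering step would then group.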
Abstract:
Thesis (Ph.D.)--University of Washington, 2014
Abstract:
Among the largest resources of biological sequence data are the expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors, which must be taken into account when EST sequences are analyzed computationally. Earlier models of error-prone coding regions performed well at detecting and predicting these regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon-usage-bias-based error correction into a single hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the earlier approach's performance in detecting coding sequences.
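The core signal that such models exploit, codon usage frequencies discriminating coding frames, can be illustrated without an HMM. The sketch below is a deliberately simplified stand-in, not the integrated HMM described above: it scores each forward frame by the log-odds of its codons under a codon usage table versus a uniform 1/64 background. The pseudo-count and the toy usage table are assumptions for illustration only.

```python
from math import log


def frame_scores(seq, codon_freq, pseudo=1e-4):
    """Log-odds score of each forward reading frame against a uniform codon background."""
    seq = seq.upper()
    background = 1 / 64
    scores = []
    for frame in range(3):
        score = 0.0
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            # The pseudo-count keeps unseen codons from producing log(0).
            score += log((codon_freq.get(codon, 0.0) + pseudo) / background)
        scores.append(score)
    return scores


def best_frame(seq, codon_freq):
    """Index (0, 1 or 2) of the highest-scoring forward frame."""
    scores = frame_scores(seq, codon_freq)
    return max(range(3), key=lambda f: scores[f])


toy_usage = {"ATG": 0.3, "GCT": 0.3, "AAA": 0.3}  # hypothetical frequencies
print(best_frame("ATGGCTAAAATGGCT", toy_usage))  # 0
```

An HMM generalizes this by letting the frame change along the sequence (for example at an insertion or deletion error), which a fixed per-frame score cannot capture.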
Abstract:
The nucleotide sequence of a genomic DNA fragment previously thought, on genetic criteria, to contain the dihydrofolate reductase gene (DFR1) of Saccharomyces cerevisiae was determined. This 1,784-bp fragment contains a large open reading frame from position 800 to 1432, which encodes an enzyme with a predicted molecular weight of 24,229.8 Da. Analysis of the amino acid sequence showed that the yeast polypeptide contains 211 amino acids, compared with the 186 residues commonly found in the corresponding polypeptides of other eukaryotes. The size difference is attributable mainly to an insert in the yeast gene. Within this region, several consensus sequences required for processing of yeast nuclear and class II mitochondrial introns were identified, but they appear insufficient for RNA splicing. The primary structure of the yeast DHFR protein shows considerable sequence homology with analogous polypeptides from other organisms, especially at the consensus residues involved in cofactor and/or inhibitor binding. The nucleotide sequence also contains a number of canonical sequences identified in yeast as having a role in the regulation of gene expression. These include UAS elements (TGACTC) required for the amino acid general control response, "TATA" boxes, and several consensus sequences thought to be required for transcriptional termination and polyadenylation. Codon usage analysis of the DFR1 coding region gave a codon bias index of 0.0083; this value, very close to zero, suggests that the gene is expressed at a relatively low level under normal physiological conditions. Information on the organization of DFR1 was used to construct a variety of fusions of its 5' regulatory region with the coding region of the E. coli lacZ gene.
Some of these fusion genes encoded a fusion product that was expressed in E. coli and/or in yeast under the control of the DFR1 5' regulatory elements. Further studies with these constructs showed that the beta-galactosidase activity encoded on multicopy plasmids was transiently stimulated by prior exposure of the yeast host cells to UV light. This suggests that the yeast DFR1 gene is induced by UV light and may imply a novel function of the DHFR protein in cellular responses to DNA damage. Another novel feature of yeast DHFR emerged from preliminary studies of a diploid strain heterozygous for a DFR1 null allele, constructed by inserting a URA3 gene within the DFR1 coding region. On sporulation of this diploid, meiotic products segregated 2:0 for uracil prototrophy when spore clones were germinated on medium supplemented with 5-formyltetrahydrofolate (folinic acid). This finding suggests that, in addition to its catalytic activity, the DFR1 gene product may play some role in folinic acid anabolism. Alternatively, this result may indicate that Ura+ haploid segregants were inviable, suggesting that the enzyme has an essential cellular function in this species.
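The codon bias index reported for DFR1 is commonly defined as CBI = (N_pfr − N_rand) / (N_tot − N_rand), where N_tot counts codons for the amino acids considered, N_pfr counts preferred codons, and N_rand is the number of preferred codons expected if synonymous codons were chosen at random. A CBI of 0 therefore means random synonymous choice and 1 means exclusive use of preferred codons. The sketch below uses a small hypothetical preferred-codon table, not the published yeast set.

```python
from collections import Counter

# Hypothetical, illustrative table: amino acid -> (synonymous codons, "preferred" codons).
# This is NOT the published yeast preferred-codon set; substitute a real one for actual use.
TOY_TABLE = {
    "Ala": (("GCT", "GCC", "GCA", "GCG"), {"GCT", "GCC"}),
    "Gly": (("GGT", "GGC", "GGA", "GGG"), {"GGT"}),
    "Lys": (("AAA", "AAG"), {"AAG"}),
}


def codon_bias_index(cds, table=TOY_TABLE):
    """CBI = (N_pfr - N_rand) / (N_tot - N_rand), restricted to the amino acids in `table`."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    n_tot = n_pfr = 0
    n_rand = 0.0
    for synonyms, preferred in table.values():
        n_aa = sum(counts[c] for c in synonyms)
        n_tot += n_aa
        n_pfr += sum(counts[c] for c in preferred)
        # Expected preferred-codon count under random synonymous choice.
        n_rand += n_aa * len(preferred) / len(synonyms)
    if n_tot - n_rand == 0:
        raise ValueError("CBI undefined: no informative codons")
    return (n_pfr - n_rand) / (n_tot - n_rand)


print(codon_bias_index("GCTGGTAAG"))                 # 1.0, only preferred codons
print(codon_bias_index("GCTGCAGGTGGAGGCGGGAAAAAG"))  # 0.0, as expected by chance
```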
Abstract:
We characterized four eEF1A genes in the alternative rhabditid nematode model organism Oscheius tipulae. This is twice the copy number of eEF1A genes in C. elegans, C. briggsae, and, probably, many other free-living and parasitic nematodes. The introns show features remarkably different from those of other metazoan eEF1A genes. Most of the introns in the eEF1A genes are specific to O. tipulae and are not shared with any of the other genes described in metazoans. Most of the introns are phase 0 (inserted between two codons), and few are inserted in protosplice sites (sites where the intron lies between the exon sequences A/CAG and G/A). Two of these phase 0 introns are conserved in sequence in two or more of the four eEF1A gene copies and are inserted at the same position in the genes. Neither of these characteristics has been detected in any of the nematode eEF1A genes characterized to date. The coding sequences were also compared with other eEF1A cDNAs from 11 different nematodes to determine the variability of these genes within the phylum Nematoda. Parsimony and distance trees yielded similar topologies, which were also similar to those obtained with other molecular markers. The presence of more than one copy of the eEF1A gene with nearly identical coding regions makes it difficult to define the orthologous cDNAs. As shown by our data on O. tipulae, careful and extensive examination of intron positions in the eEF1A gene across the phylum is necessary to assess its potential as a valid phylogenetic marker.
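Intron phase follows directly from where the intron sits in the coding sequence: if `offset` coding nucleotides lie upstream of the intron, its phase is `offset % 3`, so phase 0 means it falls between two codons. A protosplice site, as described above, has (A/C)AG immediately upstream and G or A immediately downstream. The sketch below assumes `cds` is the spliced coding sequence starting at the first nucleotide of the start codon; the function names are illustrative.

```python
import re

# (A or C) A G | (G or A): the intron would be inserted at the "|".
PROTOSPLICE = re.compile(r"[AC]AG[GA]")


def intron_phase(offset):
    """Phase of an intron inserted after `offset` nucleotides of coding sequence."""
    return offset % 3


def is_protosplice(cds, offset):
    """True if the exon context around the insertion point matches (A/C)AG|(G/A)."""
    if offset < 3 or offset >= len(cds):
        return False
    return PROTOSPLICE.fullmatch(cds[offset - 3:offset + 1].upper()) is not None


cds = "ATGCAGGCTAAA"
print(intron_phase(6), is_protosplice(cds, 6))  # 0 True
print(intron_phase(4), is_protosplice(cds, 4))  # 1 False
```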
Abstract:
A total of 3,631 expressed sequence tags (ESTs) were obtained from two size-selected cDNA libraries made from the tetrasporophytic phase of the agarophytic red alga Gracilaria tenuistipitata. The average insert sizes of the two libraries were 1,600 bp and 600 bp, and the edited sequences averaged 850 bp in length. Clustering gave 2,387 assembled sequences with a redundancy of 53%. Of the ESTs, 65% had significant matches to sequences deposited in public databases, 11% matched proteins of unknown function, and 35% were novel. The most represented ESTs were a Na/K-transporting ATPase, a hedgehog-like protein, a glycine dehydrogenase and an actin. Most of the identified genes were involved in primary metabolism and housekeeping. The largest functional group was thus genes involved in metabolism, with 14% of the ESTs; other large functional categories included energy, transcription, and protein synthesis and destination. Codon usage was examined using a subset of the data, and codon bias was found to be limited, with all codons in use.
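A standard way to express "limited codon bias" is relative synonymous codon usage (RSCU): each codon's count divided by the mean count of its synonymous family, so RSCU = 1 means no preference and RSCU = 0 means the codon is unused. The sketch below uses the standard genetic code and treats stop codons as one family; the abstract does not state which measure the authors used, so this is an illustrative choice.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def rscu(seq):
    """RSCU for every codon whose amino acid occurs in `seq` (in-frame sequence)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for group in families.values():
        total = sum(counts[c] for c in group)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        mean = total / len(group)
        for c in group:
            values[c] = counts[c] / mean
    return values


print(rscu("AAAAAAAAG"))  # Lys only: AAA ~1.33, AAG ~0.67
```

In these terms, "all codons in use" corresponds to no sense codon having an RSCU of 0 in the pooled data.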
Abstract:
Characterization of Human Respiratory Syncytial Virus (HRSV) protein interactions with host cell components is crucial for devising antiviral strategies. Viral nucleoprotein, phosphoprotein and matrix protein genes were optimized for human codon usage and cloned into expression vectors. HEK-293T cells were transfected with these vectors, viral proteins were immunoprecipitated, and co-immunoprecipitated cellular proteins were identified by mass spectrometry. Cellular proteins identified with higher confidence scores were then probed in the immunoprecipitates using specific antibodies. The results indicate that the nucleoprotein interacts with arginine methyltransferase, methylosome protein and Hsp70; the phosphoprotein interacts with Hsp70 and tropomyosin; and the matrix protein interacts with tropomyosin and nucleophosmin. Additionally, we immunoprecipitated these cellular proteins from HRSV-infected cells and detected the co-immunoprecipitated viral proteins. The results indicate that these interactions also occur in the context of viral infection, and their potential contribution to a model of HRSV replication is discussed.
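The abstract does not say how the genes were optimized for human codon usage. The simplest approach, shown here only as an illustration, back-translates the protein using the most frequent codon for each amino acid in a host usage table. Practical optimizers usually weigh additional factors such as GC content and repeats. The usage numbers below are hypothetical placeholders, not real human frequencies.

```python
def optimize(protein, usage):
    """Back-translate `protein` using the most frequent codon per amino acid in `usage`."""
    best = {aa: max(codons, key=codons.get) for aa, codons in usage.items()}
    try:
        return "".join(best[aa] for aa in protein)
    except KeyError as err:
        raise ValueError(f"no codon usage data for residue {err.args[0]!r}") from None


# Hypothetical usage table: amino acid -> {codon: relative frequency}.
TOY_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "L": {"CTG": 0.4, "TTA": 0.1, "CTC": 0.2},
}

print(optimize("MKL", TOY_USAGE))  # ATGAAGCTG
```

Always picking the single top codon can produce highly repetitive sequence, which is one reason this one-codon-per-amino-acid method is only a starting point.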
Abstract:
On the basis of the sequence of the mitochondrial genome in the flowering plant Arabidopsis thaliana, RNA editing events were systematically investigated in the corresponding RNA population. A total of 456 C-to-U conversions, and no U-to-C conversions, were identified, all of them in mRNAs: 441 in ORFs, 8 in introns, and 7 in leader and trailer sequences. No RNA editing was seen in any of the rRNAs or in several tRNAs investigated for potential mismatch corrections. RNA editing affects individual coding regions at frequencies ranging from 0 to 18.9% of the codons. The predominance of RNA editing events at the first two codon positions is not related to translational decoding, because it is not correlated with codon usage. As a general effect, RNA editing increases the hydrophobicity of the encoded mitochondrial proteins. Concerning the selection of RNA editing sites, little significant nucleotide preference is observed in their vicinity in comparison with unedited C residues. This sequence bias is, per se, not sufficient to specify individual C nucleotides in the total RNA population of Arabidopsis mitochondria.
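Assigning each edit site to codon position 1, 2 or 3 only requires its distance from the start of the ORF. The sketch below assumes 1-based genomic coordinates on the coding strand with the ORF starting at `orf_start`; the coordinates in the example are made up.

```python
from collections import Counter


def codon_position_counts(edit_sites, orf_start, orf_end):
    """Tally edit sites (1-based coordinates) by codon position 1, 2 or 3.

    Sites outside the ORF [orf_start, orf_end] are ignored.
    """
    tally = Counter()
    for site in edit_sites:
        if orf_start <= site <= orf_end:
            tally[(site - orf_start) % 3 + 1] += 1
    return tally


# Hypothetical ORF spanning 101..400 and a handful of edit sites.
print(codon_position_counts([101, 102, 105, 109, 50], 101, 400))
```

Comparing such a tally with the codon usage of the same ORFs is one way to check, as the abstract does, whether the position bias tracks codon usage.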
Abstract:
Comparisons of codon frequencies of genes to several gene classes are used to characterize highly expressed and alien genes on the Synechocystis PCC6803 genome. The primary gene classes include the ensemble of all genes (average gene), ribosomal protein (RP) genes, translation processing factors (TF) and genes encoding chaperone/degradation proteins (CH). A gene is predicted highly expressed (PHX) if its codon usage is close to that of the RP/TF/CH standards but strongly deviant from the average gene. Putative alien (PA) genes are those for which codon usage is significantly different from all four classes of gene standards. In Synechocystis, 380 genes were identified as PHX. The genes with the highest predicted expression levels include many that encode proteins vital for photosynthesis. Nearly all of the genes of the RP/TF/CH gene classes are PHX. The principal glycolysis enzymes, which may also function in CO2 fixation, are PHX, while none of the genes encoding TCA cycle enzymes are PHX. The PA genes are mostly of unknown function or encode transposases. Several PA genes encode polypeptides that function in lipopolysaccharide biosynthesis. Both PHX and PA genes often form significant clusters (operons). The proteins encoded by PHX and PA genes are described with respect to functional classifications, their organization in the genome and their stoichiometry in multi-subunit complexes.
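The abstract does not give the formula used to compare a gene's codon frequencies with a gene class. A sketch in the spirit of the B(g|h) codon-bias measure follows, assuming that kind of distance: for each amino acid, sum the absolute differences between the gene's and the reference class's synonymous codon frequencies, then weight by the amino acid's frequency in the gene. The exact normalization in the original work may differ, and the names here are illustrative.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def codon_counts(seqs):
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    return counts


def codon_bias(gene, reference_genes):
    """B(g|h)-style distance from one gene to a reference gene class (0 = identical usage)."""
    g, h = codon_counts([gene]), codon_counts(reference_genes)
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":  # stop codons excluded
            families.setdefault(aa, []).append(codon)
    n_gene = sum(g[c] for fam in families.values() for c in fam)
    total = 0.0
    for fam in families.values():
        g_aa, h_aa = sum(g[c] for c in fam), sum(h[c] for c in fam)
        if g_aa == 0 or h_aa == 0:
            continue  # amino acid missing from gene or reference: no contribution
        diff = sum(abs(g[c] / g_aa - h[c] / h_aa) for c in fam)
        total += (g_aa / n_gene) * diff
    return total


print(codon_bias("AAAAAA", ["AAGAAG"]))  # 2.0, maximal difference for Lys
```

Computing this distance from a gene to the RP, TF and CH classes and to the average gene gives the comparisons the PHX and PA definitions above rely on; the abstract does not give its thresholds, so none are assumed here.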
Abstract:
Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
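The combinatorial core of exon assembly can be illustrated by a simpler relative of spliced alignment, exon chaining: given scored candidate exons, choose a left-to-right, non-overlapping chain with maximal total score. This is not the spliced alignment algorithm itself, which additionally aligns each candidate assembly against the related protein, but it shows how dynamic programming avoids enumerating all assemblies. Coordinates and scores in the example are hypothetical.

```python
from bisect import bisect_right


def best_exon_chain(exons):
    """Maximal-score chain of non-overlapping candidate exons.

    exons: list of (start, end, score) with inclusive coordinates.
    Returns (total_score, chain) with the chain in left-to-right order.
    """
    exons = sorted(exons, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    best = [0.0] * (len(exons) + 1)  # best[i]: optimum over the first i exons
    take = [False] * (len(exons) + 1)
    for i, (start, _end, score) in enumerate(exons, 1):
        # Number of earlier exons that end before this one starts.
        j = bisect_right(ends, start - 1, 0, i - 1)
        if best[j] + score > best[i - 1]:
            best[i], take[i] = best[j] + score, True
        else:
            best[i] = best[i - 1]
    chain, i = [], len(exons)
    while i > 0:  # trace back through the chosen exons
        if take[i]:
            chain.append(exons[i - 1])
            i = bisect_right(ends, exons[i - 1][0] - 1, 0, i - 1)
        else:
            i -= 1
    return best[-1], chain[::-1]


candidates = [(0, 10, 5), (5, 20, 8), (12, 30, 6), (25, 40, 4)]
print(best_exon_chain(candidates))  # (12.0, [(5, 20, 8), (25, 40, 4)])
```

Sorting by end coordinate plus binary search keeps this at O(n log n) for n candidate exons.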
Abstract:
With global heavy metal contamination increasing, plants that can process heavy metals might provide efficient and ecologically sound approaches to sequestration and removal. Mercuric ion reductase, MerA, converts toxic Hg2+ to the less toxic, relatively inert metallic mercury (Hg0). The bacterial merA sequence is rich in CpG dinucleotides and has a highly skewed codon usage, both of which are particularly unfavorable to efficient expression in plants. We constructed a mutagenized merA sequence, merApe9, modifying the flanking region and 9% of the coding region, and placed this sequence under the control of plant regulatory elements. Transgenic Arabidopsis thaliana seeds expressing merApe9 germinated, and the seedlings grew, flowered, and set seed on medium containing HgCl2 at concentrations of 25-100 microM (5-20 ppm), levels toxic to several controls. Transgenic merApe9 seedlings evolved considerable amounts of Hg0 relative to control plants. The rate of mercury evolution and the level of resistance were proportional to the steady-state mRNA level, confirming that resistance was due to expression of the MerApe9 enzyme. Plants and bacteria expressing merApe9 were also resistant to toxic levels of Au3+. These and other data suggest that there are potentially viable molecular genetic approaches to the phytoremediation of metal ion pollution.
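CpG richness is often quantified as the observed/expected CpG ratio, (CpG count × sequence length) / (C count × G count), where values near 1 mean CpG occurs as often as base composition predicts. The abstract does not state which measure was used for merA, so this common definition is an assumption for illustration.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


print(cpg_obs_exp("CCGG"))  # 1.0
print(cpg_obs_exp("ACGT"))  # 4.0
```

Comparing this ratio before and after recoding is one simple check that a modified sequence such as merApe9 has reduced CpG content.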
Abstract:
The codon usage of a hybrid bacterial gene encoding a thermostable (1,3-1,4)-beta-glucanase was modified to match that of the barley (1,3-1,4)-beta-glucanase isoenzyme EII gene. Both the modified and unmodified bacterial genes were fused to a DNA segment encoding the barley high-pI alpha-amylase signal peptide, downstream of the barley (1,3-1,4)-beta-glucanase isoenzyme EII gene promoter. When introduced into barley aleurone protoplasts, the bacterial gene with adapted codon usage directed synthesis of heat-stable (1,3-1,4)-beta-glucanase, whereas activity of the heterologous enzyme was not detectable when protoplasts were transfected with the unmodified gene. In a different expression plasmid, the codon-modified bacterial gene was cloned downstream of the barley high-pI alpha-amylase gene promoter and signal peptide coding region. This expression cassette was introduced into immature barley embryos together with plasmids carrying the bar and uidA genes. Green, fertile plants were regenerated, and approximately 75% of grains harvested from primary transformants synthesized thermostable (1,3-1,4)-beta-glucanase during germination. All three transgenes were detected in 17 progeny from a homozygous T1 plant.
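How closely a recoded gene matches a host's codon preferences can be quantified with the codon adaptation index (CAI): the geometric mean, over a gene's codons, of each codon's relative adaptiveness w (its count in a reference set divided by the count of the most-used synonymous codon). The sketch below assumes a reference set such as barley genes, excludes Met, Trp and stop codons, and gives absent reference codons a pseudo-count of 0.5 so that w is never zero; it is illustrative and not the procedure used in the study.

```python
from collections import Counter
from math import exp, log

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def relative_adaptiveness(reference_seqs):
    """w for each codon: count / count of the most-used synonymous codon in the reference."""
    counts = Counter()
    for s in reference_seqs:
        s = s.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    families = {}
    for codon, aa in CODE.items():
        if aa not in "*MW":  # skip stops and single-codon amino acids
            families.setdefault(aa, []).append(codon)
    w = {}
    for fam in families.values():
        adjusted = {c: counts[c] or 0.5 for c in fam}  # pseudo-count for absent codons
        top = max(adjusted.values())
        for c in fam:
            w[c] = adjusted[c] / top
    return w


def cai(seq, w):
    """Geometric mean of w over the informative codons of an in-frame sequence."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    logs = [log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no informative codons")
    return exp(sum(logs) / len(logs))


w = relative_adaptiveness(["GCTGCTGCTGCC"])  # hypothetical reference
print(cai("GCTGCT", w))  # 1.0
```

A CAI closer to 1 for the modified gene than for the unmodified one would be consistent with successful adaptation to the reference codon usage.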