969 results for coding sequence
Abstract:
Background: Simple Sequence Repeats (SSRs) are widely used in population genetic studies, but their classical development is costly and time-consuming. The ever-increasing DNA datasets generated by high-throughput techniques offer an inexpensive alternative for SSR discovery. Expressed Sequence Tags (ESTs) have been widely used as an SSR source for plants of economic relevance, but their application to non-model species is still modest. Methods: Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information, NCBI) for SSR development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also searched two model genera with fully annotated genomes for EST-SSRs, Arabidopsis and Oryza, and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera, which were mined for SSRs and their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by blasting the sequences with SSRs against the Oryza sativa and Arabidopsis thaliana reference genomes with the Basic Local Alignment Search Tool (BLAST) on the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea. Results: We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identified a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types, but the abundance of the various types of repeat differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions, while dinucleotide repeats were largely associated with untranslated regions. 
Our results from the empirical test revealed considerable amplification success and transferability between congenerics. Conclusions: The present work represents the first large-scale study developing SSRs by utilizing publicly accessible EST databases in threatened plants. Here we provide a very large number of ready-to-test EST-SSRs (17 102) for 193 genera. The cross-species transferability suggests that the number of possible target species would be large. Since trinucleotide repeats are abundant and mainly linked to exons, they might be useful in evolutionary and conservation studies. Altogether, our study strongly supports the use of EST databases as an extremely affordable and fast alternative for SSR development in threatened plants.
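The SSR mining step described in this abstract can be illustrated with a minimal Python sketch. It is not QDD, the tool the authors used; it only shows how perfect dinucleotide and trinucleotide repeats might be located in an EST sequence with regular expressions. The minimum repeat counts and the example sequence are assumptions made for illustration, since the abstract does not state the thresholds applied.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
# The abstract does not give the values used by the authors.
MIN_REPEATS = {2: 6, 3: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect di- and trinucleotide repeats."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures a motif; \1{n-1,} requires n or more copies in total.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # ignore homopolymer runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Made-up sequence containing (AG)7 and (GAA)5.
example = "GGCTT" + "AG" * 7 + "CCGTC" + "GAA" * 5 + "TTC"
ssrs = find_ssrs(example)
for start, motif, copies in ssrs:
    print(f"({motif}){copies} at 0-based position {start}")
```

A real pipeline would also handle imperfect repeats, redundancy between ESTs, and primer design in the flanking regions, none of which this sketch attempts.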
Abstract:
The Schwalbenberg II loess-paleosol sequence (LPS) denotes a key site for Marine Isotope Stage 3 (MIS 3) in Western Europe owing to eight succeeding cambisols, which primarily constitute the Ahrgau Subformation. Therefore, this LPS qualifies as a test candidate for the potential of temporally high-resolution geochemical data obtained by X-ray fluorescence (XRF) scanning of discrete samples, providing a fast and non-destructive tool for determining the element composition. The geochemical data are first contextualized with existing proxy data such as magnetic susceptibility (MS) and organic carbon (Corg) and then aggregated to element log ratios characteristic of weathering intensity [LOG (Ca/Sr), LOG (Rb/Sr), LOG (Ba/Sr), LOG (Rb/K)] and dust provenance [LOG (Ti/Zr), LOG (Ti/Al), LOG (Si/Al)]. Generally, the interpretation of rock magnetic particles is challenging in western Europe, where not only magnetic enhancement but also depletion plays a role. Our data indicate MS depletion induced by leaching and topsoil erosion at the Schwalbenberg II LPS. Besides weathering, LOG (Ca/Sr) is susceptible to secondary calcification. LOG (Rb/Sr) and LOG (Ba/Sr) are likewise shown to be influenced by calcification dynamics. Consequently, LOG (Rb/K) seems to be the most suitable weathering index, identifying the Sinzig Soils S1 and S2 as the most pronounced paleosols for this site. Sinzig Soil S3 is enclosed by gelic gleysols and, in contrast to S1 and S2, only initially weathered, pointing to colder climate conditions. The Remagen Soils are also characterized by subtle to moderate positive excursions in the weathering indices. Comparing the Schwalbenberg II LPS with the nearby Eifel Lake Sediment Archive (ELSA) and other more distant German, Austrian and Czech LPS while discussing time and climate as limiting factors for pedogenesis, we suggest that the lithologically determined paleosols are in-situ soil formations. 
The provenance indices document a Zr-enrichment at the transition from the Ahrgau to the Hesbaye Subformation. This is explained by a conceptual model incorporating multiple sediment recycling and sorting effects in eolian and fluvial domains.
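The element log ratios named in this abstract can be computed as in the short Python sketch below. Two things are assumptions: the logarithm base (base 10 is used here, the abstract only writes "LOG") and the sample values, which are invented numbers rather than measurements from the Schwalbenberg II section.

```python
import math

# Element pairs taken from the abstract.
WEATHERING = [("Ca", "Sr"), ("Rb", "Sr"), ("Ba", "Sr"), ("Rb", "K")]
PROVENANCE = [("Ti", "Zr"), ("Ti", "Al"), ("Si", "Al")]

def element_log_ratios(sample, pairs=WEATHERING + PROVENANCE):
    """Return {'LOG(A/B)': log10(A/B)} for one sample of element intensities."""
    return {f"LOG({a}/{b})": math.log10(sample[a] / sample[b]) for a, b in pairs}

# Hypothetical XRF intensities for a single discrete sample (arbitrary units).
sample = {"Ca": 52000.0, "Sr": 260.0, "Rb": 100.0, "Ba": 410.0,
          "K": 1000.0, "Ti": 4200.0, "Zr": 350.0, "Al": 2100.0, "Si": 21000.0}

ratios = element_log_ratios(sample)
for name, value in ratios.items():
    print(f"{name} = {value:.3f}")
```

Using the logarithm of a ratio makes the index symmetric: swapping numerator and denominator only flips the sign, which is one common reason for preferring log ratios over raw ratios in scanning data.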
Abstract:
A novel scheme for depth sequence compression, based on a perceptual coding algorithm, is proposed. A depth sequence describes the object position in the 3D scene and is used, in Free Viewpoint Video, for the generation of synthetic video sequences. In perceptual video coding, the characteristics of the human visual system are exploited to improve compression efficiency. As depth sequences are never shown, perceptual video coding assessed directly on them is not effective. The proposed algorithm is based on a novel perceptual rate-distortion optimization process, assessed on the perceptual distortion of the rendered views generated from the encoded depth sequences. The experimental results show the effectiveness of the proposed method, which obtains a considerable improvement in the perceptual quality of the rendered views.
Abstract:
Fractionation of the abundant small ribonucleoproteins (RNPs) of the trypanosomatid Leptomonas collosoma revealed the existence of a group of unidentified small RNPs that were shown to fractionate differently than the well-characterized trans-spliceosomal RNPs. One of these RNAs, an 80-nt RNA, did not possess a trimethylguanosine (TMG) cap structure but did possess a 5′ phosphate terminus and an invariant consensus U5 snRNA loop 1. The gene coding for the RNA was cloned, and the coding region showed 55% sequence identity to the recently described U5 homologue of Trypanosoma brucei [Dungan, J. D., Watkins, K. P. & Agabian, N. (1996) EMBO J. 15, 4016–4029]. The L. collosoma U5 homologue exists in multiple forms of RNP complexes, a 10S monoparticle, and two subgroups of 18S particles that either contain or lack the U4 and U6 small nuclear RNAs, suggesting the existence of a U4/U6⋅U5 tri-small nuclear RNP complex. In contrast to T. brucei U5 RNA (62 nt), the L. collosoma homologue is longer (80 nt) and possesses a second stem–loop. Like the trypanosome U3, U6, and 7SL RNA genes, a tRNA gene coding for tRNACys was found 98 nt upstream to the U5 gene. A potential for base pair interaction between U5 and SL RNA in the 5′ splice site region (positions −1 and +1) and downstream from it is proposed. The presence of a U5-like RNA in trypanosomes suggests that the most essential small nuclear RNPs are ubiquitous for both cis- and trans-splicing, yet even among the trypanosomatids the U5 RNA is highly divergent.
Abstract:
The influenza C virus CM2 protein is a small glycosylated integral membrane protein (115 residues) that spans the membrane once and contains a cleavable signal sequence at its N terminus. The coding region for CM2 (CM2 ORF) is located at the C terminus of the 342-amino acid (aa) ORF of a colinear mRNA transcript derived from influenza C virus RNA segment 6. Splicing of the colinear transcript introduces a translational stop codon into the ORF and the spliced mRNA encodes the viral matrix protein (CM1) (242 aa). The mechanism of CM2 translation was investigated by using in vitro and in vivo translation of RNA transcripts. It was found that the colinear mRNA derived from influenza C virus RNA segment 6 serves as the mRNA for CM2. Furthermore, CM2 translation does not depend on any of the three in-frame methionine residues located at the beginning of CM2 ORF. Rather, CM2 is a proteolytic cleavage product of the p42 protein product encoded by the colinear mRNA: a cleavage event that involves the recognition and cleavage of an internal signal peptide presumably by signal peptidase resident in the endoplasmic reticulum. Alteration of the predicted signal peptidase cleavage site by mutagenesis blocked generation of CM2. The other polypeptide species resulting from the cleavage of p42, designated p31, contains the CM1 coding region and an additional C-terminal 17 aa (formerly the CM2 signal peptide). Protein p31, in comparison to CM1, displays characteristics of an integral membrane protein.
Abstract:
The intensely studied MHC has become the paradigm for understanding the architectural evolution of vertebrate multigene families. The 4-Mb human MHC (also known as the HLA complex) encodes genes critically involved in the immune response, graft rejection, and disease susceptibility. Here we report the continuous 1,796,938-bp genomic sequence of the HLA class I region, linking genes between MICB and HLA-F. A total of 127 genes or potentially coding sequences were recognized within the analyzed sequence, establishing a high gene density of one per every 14.1 kb. The identification of 758 microsatellites provides tools for high-resolution mapping of HLA class I-associated disease genes. Most importantly, we establish that the repeated duplication and subsequent diversification of a minimal building block, MIC-HCGIX-3.8–1-P5-HCGIV-HLA class I-HCGII, engendered the present-day MHC. That the currently nonessential HLA-F and MICE genes have acted as progenitors to today’s immune-competent HLA-ABC and MICA/B genes provides experimental evidence for evolution by “birth and death,” which has general relevance to our understanding of the evolutionary forces driving vertebrate multigene families.
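The gene density quoted in this abstract follows directly from the reported sequence length and gene count, as the small Python check below shows. The microsatellite spacing is not stated in the abstract; it is derived here from the same figures purely for illustration.

```python
REGION_BP = 1_796_938      # length of the sequenced HLA class I region
GENES = 127                # genes or potentially coding sequences
MICROSATELLITES = 758      # microsatellites identified

def kb_per_feature(region_bp, count):
    """Average spacing between features, in kilobases."""
    return region_bp / count / 1000

gene_spacing_kb = kb_per_feature(REGION_BP, GENES)
ssr_spacing_kb = kb_per_feature(REGION_BP, MICROSATELLITES)

print(f"{gene_spacing_kb:.1f} kb per gene")            # matches the 14.1 kb in the abstract
print(f"{ssr_spacing_kb:.1f} kb per microsatellite")   # derived value, not from the abstract
```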
Abstract:
The cell matrix adhesion regulator (CMAR) gene has been suggested to encode a signal transduction molecule influencing cell adhesion to collagen and, through this, to be possibly involved in tumor suppression. The originally reported CMAR cDNA was 464 bp long with a tyrosine phosphorylation site at the extreme 3′ end, which mutagenesis studies had shown to be central to the function of this gene. Since the discovery of a 4-bp insertion polymorphism within the originally reported coding region, further sequence information has been obtained. The cDNA has been extended 5′ by ≈2 kb, revealing a 559-bp region showing strong homology to the proposed 5′ untranslated sequence of a murine protein kinase receptor family member, variant in kinase (vik). CMAR genomic sequencing has shown the presence of an intron, the intron/exon boundary lying within this region of homology. An RNA transcript for CMAR of ≈2.5 kb has also been identified. The data suggest complex mechanisms for control of expression of two closely associated genes, CMAR and the vik-associated sequence.
Abstract:
The genome of the Kaposi sarcoma-associated herpesvirus (KSHV or HHV8) was mapped with cosmid and phage genomic libraries from the BC-1 cell line. Its nucleotide sequence was determined except for a 3-kb region at the right end of the genome that was refractory to cloning. The BC-1 KSHV genome consists of a 140.5-kb-long unique coding region flanked by multiple G+C-rich 801-bp terminal repeat sequences. A genomic duplication that apparently arose in the parental tumor is present in this cell culture-derived strain. At least 81 ORFs, including 66 with homology to herpesvirus saimiri ORFs, and 5 internal repeat regions are present in the long unique region. The virus encodes homologs to complement-binding proteins, three cytokines (two macrophage inflammatory proteins and interleukin 6), dihydrofolate reductase, bcl-2, interferon regulatory factors, interleukin 8 receptor, neural cell adhesion molecule-like adhesin, and a D-type cyclin, as well as viral structural and metabolic proteins. Terminal repeat analysis of virus DNA from a KS lesion suggests a monoclonal expansion of KSHV in the KS tumor.
Abstract:
Gephyrin is essential for both the postsynaptic localization of inhibitory neurotransmitter receptors in the central nervous system and the biosynthesis of the molybdenum cofactor (Moco) in different peripheral organs. Several alternatively spliced gephyrin transcripts have been identified in rat brain that differ in their 5′ coding regions. Here, we describe gephyrin splice variants that are differentially expressed in non-neuronal tissues and different regions of the adult mouse brain. Analysis of the murine gephyrin gene indicates a highly mosaic organization, with eight of its 29 exons corresponding to the alternatively spliced regions identified by cDNA sequencing. The N- and C-terminal domains of gephyrin encoded by exons 3–7 and 16–29, respectively, display sequence similarities to bacterial, invertebrate, and plant proteins involved in Moco biosynthesis, whereas the central exons 8, 13, and 14 encode motifs that may mediate oligomerization and tubulin binding. Our data are consistent with gephyrin having evolved from a Moco biosynthetic protein by insertion of protein interaction sequences.
Abstract:
We present here the complete genome sequence of a common avian clone of Pasteurella multocida, Pm70. The genome of Pm70 is a single circular chromosome 2,257,487 base pairs in length and contains 2,014 predicted coding regions, 6 ribosomal RNA operons, and 57 tRNAs. Genome-scale evolutionary analyses based on pairwise comparisons of 1,197 orthologous sequences between P. multocida, Haemophilus influenzae, and Escherichia coli suggest that P. multocida and H. influenzae diverged ≈270 million years ago and the γ subdivision of the proteobacteria radiated about 680 million years ago. Two previously undescribed open reading frames, accounting for ≈1% of the genome, encode large proteins with homology to the virulence-associated filamentous hemagglutinin of Bordetella pertussis. Consistent with the critical role of iron in the survival of many microbial pathogens, in silico and whole-genome microarray analyses identified more than 50 Pm70 genes with a potential role in iron acquisition and metabolism. Overall, the complete genomic sequence and preliminary functional analyses provide a foundation for future research into the mechanisms of pathogenesis and host specificity of this important multispecies pathogen.
Abstract:
β-Galactosidases (EC 3.2.1.23) constitute a widespread family of enzymes characterized by their ability to hydrolyze terminal, nonreducing β-d-galactosyl residues from β-d-galactosides. Several β-galactosidases, sometimes referred to as exo-galactanases, have been purified from plants and shown to possess in vitro activity against extracted cell wall material via the release of galactose from wall polymers containing β(1→4)-d-galactan. Although β-galactosidase II, a protein present in tomato (Lycopersicon esculentum Mill.) fruit during ripening and capable of degrading tomato fruit galactan, has been purified, cloning of the corresponding gene has been elusive. We report here the cloning of a cDNA, pTomβgal 4 (accession no. AF020390), corresponding to β-galactosidase II, and show that its corresponding gene is expressed during fruit ripening. Northern-blot analysis revealed that the β-galactosidase II gene transcript was detectable at the breaker stage of ripeness, maximum at the turning stage, and present at decreasing levels during the later stages of normal tomato fruit ripening. At the turning stage of ripeness, the transcript was present in all fruit tissues and was highest in the outermost tissues (including the peel). Confirmation that pTomβgal 4 codes for β-galactosidase II was derived from matching protein and deduced amino acid sequences. Furthermore, analysis of the deduced amino acid sequence of pTomβgal 4 suggested a high probability for secretion based on the presence of a hydrophobic leader sequence, a leader-sequence cleavage site, and three possible N-glycosylation sites. The predicted molecular mass and isoelectric point of the pTomβgal 4-encoded mature protein were similar to those reported for the purified β-galactosidase II protein from tomato fruit.
Abstract:
In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
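The segment-to-segment idea in this abstract can be sketched for two sequences in Python. This is a deliberately simplified illustration, not the authors' algorithm: segments are restricted to exact gap-free matches, each segment is weighted by its length rather than by a statistical significance score, and the function names and the `min_len` parameter are hypothetical. What it does show is that an alignment can be assembled from mutually consistent segments without any gap penalty, with unaligned residues simply left over.

```python
def matching_segments(a, b, min_len=3):
    """Gap-free segment pairs (i, j, length): maximal identical runs on one diagonal."""
    segments = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Start a run only where it cannot be extended to the left.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                segments.append((i, j, k))
    return segments

def best_chain(segments):
    """Highest-weight set of consistent (collinear, non-overlapping) segments."""
    segs = sorted(segments, key=lambda s: (s[0] + s[2], s[1] + s[2]))
    best = []  # best[n] = (score, chain) for the best chain ending in segs[n]
    for n, (i, j, k) in enumerate(segs):
        score, chain = k, [(i, j, k)]
        for m in range(n):
            pi, pj, pk = segs[m]
            # The predecessor must end before this segment starts in both sequences.
            if pi + pk <= i and pj + pk <= j and best[m][0] + k > score:
                score, chain = best[m][0] + k, best[m][1] + [(i, j, k)]
        best.append((score, chain))
    return max(best, key=lambda t: t[0], default=(0, []))[1]

a = "TTGACCAAAGGCTA"
b = "GACCGGCT"
segments = matching_segments(a, b)
chain = best_chain(segments)
for i, j, k in chain:
    print(f"a[{i}:{i + k}] = b[{j}:{j + k}] = {a[i:i + k]}")
```

In the example, the residues of `a` between the two selected segments belong to no segment and therefore appear as a gap in `b`, without any explicit gap cost having been specified.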
Abstract:
We had earlier identified the pcnB locus as the gene for the major Escherichia coli poly(A) polymerase (PAP I). In this report, we describe the disruption and identification of a candidate gene for a second poly(A) polymerase (PAP II) by an experimental strategy which was based on the assumption that the viability of E. coli depends on the presence of either PAP I or PAP II. The coding region thus identified is the open reading frame f310, located at about 87 min on the E. coli chromosome. The following lines of evidence support f310 as the gene for PAP II: (i) the deduced peptide encoded by f310 has a molecular weight of 36,300, similar to the molecular weight of 35,000 estimated by gel filtration of PAP II; (ii) the deduced f310 product is a relatively hydrophobic polypeptide with a pI of 9.4, consistent with the properties of partially purified PAP II; (iii) overexpression of f310 leads to the formation of inclusion bodies whose solubilization and renaturation yields poly(A) polymerase activity that corresponds to a 35-kDa protein as shown by enzyme blotting; and (iv) expression of a f310 fusion construct with hexahistidine at the N-terminus of the coding region allowed purification of a poly(A) polymerase fraction whose major component is a 36-kDa protein. E. coli PAP II has no significant sequence homology either to PAP I or to the viral and eukaryotic poly(A) polymerases, suggesting that the bacterial poly(A) polymerases have evolved independently. An interesting feature of the PAP II sequence is the presence of sets of two paired cysteine and histidine residues that resemble the RNA binding motifs seen in some other proteins.
Abstract:
Few promoters are active at high levels in all cells. Of these, the majority direct the synthesis of structural RNAs transcribed by RNA polymerases I or III and are not accessible for the expression of proteins. The exceptions are the promoters of the small nuclear RNAs (snRNAs) transcribed by RNA polymerase II. Although snRNA biosynthesis is unique and thought not to be compatible with synthesis of functional mRNA, we have tested these promoters for their ability to express functional mRNAs. We have used the murine U1a and U1b snRNA gene promoters to express the Escherichia coli lacZ gene and the human alpha-globin gene from either episomal or integrated templates by transfection or infection of a variety of mammalian cell types. Equivalent expression of beta-galactosidase was obtained from < 250 nucleotides of 5'-flanking sequence containing the complete promoter of either U1 snRNA gene or from the 750-nt cytomegalovirus promoter and enhancer regions. The mRNA was accurately initiated at the U1 start site, efficiently spliced and polyadenylylated, and localized to polyribosomes. Recombinant adenovirus containing the U1b-lacZ chimeric gene transduced and expressed beta-galactosidase efficiently in human 293 cells and airway epithelial cells in culture. Viral vectors containing U1 snRNA promoters may be an attractive alternative to vectors containing viral promoters for persistent high-level expression of therapeutic genes or proteins.
Abstract:
The phenomenon of RNA editing has been found to occur in chloroplasts of several angiosperm plants. Comparative analysis of the entire nucleotide sequence of a gymnosperm [Pinus thunbergii (black pine)] chloroplast genome allowed us to predict several potential editing sites in its transcripts. Forty-nine such sites from 14 genes/ORFs were analyzed by sequencing both cDNAs from the transcripts and the corresponding chloroplast DNA regions, and 26 RNA editing sites were identified in the transcripts from 12 genes/ORFs, indicating that chloroplast RNA editing is not restricted to angiosperms but occurs in the gymnosperm, too. All the RNA editing events are C-to-U conversions; however, many new codon substitutions and creation of stop codons that have not so far been reported in angiosperm chloroplasts were observed. The most striking is that two editing events result in the creation of an initiation and a stop codon within a single transcript, leading to the formation of a new reading frame of 33 codons. The predicted product is highly homologous to that deduced from the ycf7 gene (ORF31), which is conserved in the chloroplast genomes of many other plant species.
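The comparison of cDNA with the corresponding chloroplast DNA described in this abstract can be illustrated with a short Python sketch. It assumes that both sequences are given as the coding strand, are already aligned without insertions or deletions, and start in reading frame 0; the function name and the example sequences are invented for illustration and are not taken from the Pinus thunbergii data.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def editing_sites(dna, cdna):
    """List C-to-U differences between genomic DNA and cDNA with their codon context.

    Returns tuples (1-based position, genomic codon, edited codon, effect).
    """
    if len(dna) != len(cdna):
        raise ValueError("sequences must be aligned and of equal length")
    genomic = dna.upper().replace("T", "U")
    transcript = cdna.upper().replace("T", "U")
    sites = []
    for pos, (g, t) in enumerate(zip(genomic, transcript)):
        if g == "C" and t == "U":
            start = pos - pos % 3  # first base of the codon containing this site
            before = genomic[start:start + 3]
            after = transcript[start:start + 3]
            if after == "AUG":
                effect = "creates AUG (potential initiation codon)"
            elif after in STOP_CODONS:
                effect = "creates stop codon"
            else:
                effect = "codon substitution"
            sites.append((pos + 1, before, after, effect))
    return sites

# Made-up example: ACG->AUG, UCA->UUA and CAA->UAA.
dna = "ACGCCATCACAATTT"
cdna = "ATGCCATTATAATTT"
sites = editing_sites(dna, cdna)
for position, before, after, effect in sites:
    print(f"position {position}: {before} -> {after} ({effect})")
```

Differences other than C-to-U are ignored by this sketch; in practice they would need to be checked as possible sequencing errors or polymorphisms.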