948 results for pictorial sequences
Abstract:
While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes and as a resource for comparative sequence analysis.
Abstract:
rSNP_Guide is a novel curated database system for analysis of transcription factor (TF) binding to target sequences in regulatory gene regions altered by mutations. It accumulates experimental data on naturally occurring site variants in regulatory gene regions and site-directed mutations. This database system also contains the web tools for SNP analysis, i.e., active applet applying weight matrices to predict the regulatory site candidates altered by a mutation. The current version of the rSNP_Guide is supplemented by six sub-databases: (i) rSNP_DB, on DNA–protein interaction caused by mutation; (ii) SYSTEM, on experimental systems; (iii) rSNP_BIB, on citations to original publications; (iv) SAMPLES, on experimentally identified sequences of known regulatory sites; (v) MATRIX, on weight matrices of known TF sites; (vi) rSNP_Report, on characteristic examples of successful rSNP_Tools implementation. These databases are useful for the analysis of natural SNPs and site-directed mutations. The databases are available through the Web, http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/.
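The weight-matrix step mentioned above can be illustrated with a minimal sketch. It assumes a log-odds position weight matrix built from a count table; the 4-position counts, the uniform background, the pseudocount, and all function names are hypothetical and are not taken from rSNP_Guide or its MATRIX sub-database.

```python
import math

# Hypothetical count matrix for an illustrative 4-bp TF site (not rSNP_Guide data).
COUNTS = {
    "A": [8, 1, 0, 7],
    "C": [1, 0, 9, 1],
    "G": [0, 8, 1, 1],
    "T": [1, 1, 0, 1],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    length = len(next(iter(counts.values())))
    matrix = []
    for pos in range(length):
        total = sum(counts[base][pos] for base in "ACGT") + 4 * pseudocount
        matrix.append({
            base: math.log2((counts[base][pos] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def score(site, matrix):
    """Sum the per-position log-odds for a site of the same length as the matrix."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal matrix length")
    return sum(column[base] for column, base in zip(matrix, site.upper()))


def mutation_effect(reference, variant, matrix):
    """Score change caused by the variant; negative values suggest a weakened site."""
    return score(variant, matrix) - score(reference, matrix)


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    print(mutation_effect("AGCA", "AGTA", pwm))
```

A negative difference flags a variant that moves the site away from the matrix consensus; any threshold for calling a site "altered" would have to come from the real matrices and is not assumed here.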
Abstract:
The Homeodomain Resource is an annotated collection of non-redundant protein sequences, three-dimensional structures and genomic information for the homeodomain protein family. Release 3.0 contains 795 full-length homeodomain-containing sequences, 32 experimentally-derived structures and 143 homeobox loci implicated in human genetic disorders. Entries are fully hyperlinked to facilitate easy retrieval of the original records from source databases. A simple search engine with a graphical user interface is provided to query the component databases and assemble customized data sets. A new feature for this release is the addition of DNA recognition sites for all human homeodomain proteins described in the literature. The Homeodomain Resource is freely available through the World Wide Web at http://genome.nhgri.nih.gov/homeodomain.
Abstract:
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
Abstract:
When many protein sequences are available for estimating the time of divergence between two species, it is customary to estimate the time for each protein separately and then use the average for all proteins as the final estimate. However, it can be shown that this estimate generally has an upward bias, and that an unbiased estimate is obtained by using distances based on concatenated sequences. We have shown that two concatenation-based distances, i.e., average gamma distance weighted with sequence length (d2) and multiprotein gamma distance (d3), generally give more satisfactory results than other concatenation-based distances. Using these two distance measures for 104 protein sequences, we estimated the time of divergence between mice and rats to be approximately 33 million years ago. Similarly, the time of divergence between humans and rodents was estimated to be approximately 96 million years ago. We also investigated the dependency of time estimates on statistical methods and various assumptions made by using sequence data from eubacteria, protists, plants, fungi, and animals. Our best estimates of the times of divergence between eubacteria and eukaryotes, between protists and other eukaryotes, and between plants, fungi, and animals were 3, 1.7, and 1.3 billion years ago, respectively. However, estimates of ancient divergence times are subject to a substantial amount of error caused by uncertainty of the molecular clock, horizontal gene transfer, errors in sequence alignments, etc.
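The length-weighted average gamma distance (d2) described above can be sketched as follows. This is an illustration under stated assumptions: it uses the standard gamma distance for amino acid sequences, d = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing sites; the shape parameter a = 2.0, the example numbers, and the function names are hypothetical and are not the values or code used in the study.

```python
def gamma_distance(p, a=2.0):
    """Gamma distance for a proportion p of differing sites (a is an assumed shape parameter)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return a * ((1 - p) ** (-1 / a) - 1)


def length_weighted_distance(proteins, a=2.0):
    """Average the per-protein gamma distances, weighting each by its sequence length.

    proteins is a list of (length, p) pairs, one per aligned protein.
    """
    total_length = sum(length for length, _ in proteins)
    weighted = sum(length * gamma_distance(p, a) for length, p in proteins)
    return weighted / total_length


def divergence_time(distance, rate):
    """Molecular-clock conversion t = d / (2r); rate must come from an external calibration."""
    return distance / (2 * rate)


if __name__ == "__main__":
    # Hypothetical alignment summaries: (number of sites, proportion of differences).
    example = [(100, 0.36), (300, 0.0)]
    print(length_weighted_distance(example))
```

Weighting by length keeps short proteins, whose distance estimates are noisier, from contributing as much as long ones, which is the motivation the abstract gives for preferring concatenation-based distances over a plain average of per-protein estimates.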
Abstract:
The product of the herpes simplex virus type 1 UL28 gene is essential for cleavage of concatemeric viral DNA into genome-length units and packaging of this DNA into viral procapsids. To address the role of UL28 in this process, purified UL28 protein was assayed for the ability to recognize conserved herpesvirus DNA packaging sequences. We report that DNA fragments containing the pac1 DNA packaging motif can be induced by heat treatment to adopt novel DNA conformations that migrate faster than the corresponding duplex in nondenaturing gels. Surprisingly, these novel DNA structures are high-affinity substrates for UL28 protein binding, whereas double-stranded DNA of identical sequence composition is not recognized by UL28 protein. We demonstrate that only one strand of the pac1 motif is responsible for the formation of novel DNA structures that are bound tightly and specifically by UL28 protein. To determine the relevance of the observed UL28 protein–pac1 interaction to the cleavage and packaging process, we have analyzed the binding affinity of UL28 protein for pac1 mutants previously shown to be deficient in cleavage and packaging in vivo. Each of the pac1 mutants exhibited a decrease in DNA binding by UL28 protein that correlated directly with the reported reduction in cleavage and packaging efficiency, thereby supporting a role for the UL28 protein–pac1 interaction in vivo. These data therefore suggest that the formation of novel DNA structures by the pac1 motif confers added specificity on recognition of DNA packaging sequences by the UL28-encoded component of the herpesvirus cleavage and packaging machinery.
Abstract:
Three different base paired stems form between U2 and U6 snRNA over the course of the mRNA splicing reaction (helices I, II and III). One possible function of U2/U6 helix II is to facilitate subsequent U2/U6 helix I and III interactions, which participate directly in catalysis. Using an in vitro trans-splicing assay, we investigated the function of sequences located just upstream from the branch site (BS). We find that these upstream sequences are essential for stable binding of U2 to the branch region, and for U2/U6 helix II formation, but not for initial U2/BS pairing. We also show that non-functional upstream sequences cause U2 snRNA stem–loop IIa to be exposed to dimethylsulfate modification, perhaps reflecting a U2 snRNA conformational change and/or loss of SF3b proteins. Our data suggest that initial binding of U2 snRNP to the BS region must be stabilized by an interaction with upstream sequences before U2/U6 helix II can form or U2 stem–loop IIa can participate in spliceosome assembly.
Abstract:
The analysis of a human thyroid serial analysis of gene expression (SAGE) library shows the presence of an abundant SAGE tag corresponding to the mRNA of thyroglobulin (TG). Additional, less abundant tags are present that cannot be linked to any other known gene, but show considerable homology to the wild-type TG tag. To determine whether these tags represent TG mRNA molecules with alternative cleavage, 3′-RACE clones were sequenced. The results show that the three putative TG SAGE tags can be attributed to TG transcripts and reflect the use of alternative polyadenylation cleavage sites downstream of a single polyadenylation signal in vivo. By screening more than 300 000 sequences corresponding to human, mouse and rat transcripts for this phenomenon we show that a considerable percentage of mRNA transcripts (44% human, 22% mouse and 22% rat) show cleavage site heterogeneity. When analyzing SAGE-generated expression data, this phenomenon should be considered, since, according to our calculations, 2.8% of human transcripts show two or more different SAGE tags corresponding to a single gene because of alternative cleavage site selection. Both experimental and in silico data show that the selection of the specific cleavage site for poly(A) addition using a given polyadenylation signal is more variable than was previously thought.
Abstract:
Drosophila Armadillo and its mammalian homologue β-catenin are scaffolding proteins involved in the assembly of multiprotein complexes with diverse biological roles. They mediate adherens junction assembly, thus determining tissue architecture, and also transduce Wnt/Wingless intercellular signals, which regulate embryonic cell fates and, if inappropriately activated, contribute to tumorigenesis. To learn more about Armadillo/β-catenin's scaffolding function, we examined in detail its interaction with one of its protein targets, cadherin. We utilized two assay systems: the yeast two-hybrid system to study cadherin binding in the absence of Armadillo/β-catenin's other protein partners, and mammalian cells where interactions were assessed in their presence. We found that segments of the cadherin cytoplasmic tail as small as 23 amino acids bind Armadillo or β-catenin in yeast, whereas a slightly longer region is required for binding in mammalian cells. We used mutagenesis to identify critical amino acids required for cadherin interaction with Armadillo/β-catenin. Expression of such short cadherin sequences in mammalian cells did not affect adherens junctions but effectively inhibited β-catenin–mediated signaling. This suggests that the interaction between β-catenin and T cell factor family transcription factors is a sensitive target for disruption, making the use of analogues of these cadherin derivatives a potentially useful means to suppress tumor progression.
Abstract:
Narrow spectrum antimicrobial activity has been designed to reduce the expression of two essential genes, one coding for the protein subunit of RNase P (C5 protein) and one for gyrase (gyrase A). In both cases, external guide sequences (EGS) have been designed to complex with either mRNA. Using the EGS technology, the level of microbial viability is reduced to less than 10% of the wild-type strain. The EGSs are additive when used together and depend on the number of nucleotides paired when attacking gyrase A mRNA. In the case of gyrase A, three nucleotides unpaired out of a 15-mer EGS still favor complete inhibition by the EGS but five unpaired nucleotides do not.
Abstract:
The terminal regions (last 20 kb) of Saccharomyces cerevisiae chromosomes universally contain blocks of precise sequence similarity to other chromosome terminal regions. The left and right terminal regions are distinct in the sense that the sequence similarities between them are reverse complements. Direct sequence similarity occurs between the left terminal regions and also between the right terminal regions, but not between any left ends and right ends. With minor exceptions the relationships range from 80% to 100% match within blocks. The regions of similarity are composites of familiar and unfamiliar repeated sequences as well as what could be considered “single-copy” (or better “two-copy”) sequences. All terminal regions were compared with all other chromosomes, forward and reverse complement, and 768 comparisons are diagrammed. It appears there has been an extensive history of sequence exchange or copying between terminal regions. The subtelomeric sequences fall into two classes. Seventeen of the chromosome ends terminate with the Y′ repeat, while 15 end with the 800-nt “X2” repeats just adjacent to the telomerase simple repeats. The just-subterminal repeats are very similar to each other except that chromosome 1 right end is more divergent.
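The distinction drawn above between direct similarity and reverse-complement similarity can be made concrete with a small sketch. It assumes two pre-aligned, equal-length blocks and an ungapped percent-identity measure; the function names and the toy sequence are hypothetical, and the original comparisons were presumably made with alignment software rather than this simplified check.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_orientation(block_a, block_b):
    """Report whether block_b matches block_a better directly or as a reverse complement."""
    direct = percent_identity(block_a, block_b)
    reverse = percent_identity(block_a, reverse_complement(block_b))
    if reverse > direct:
        return ("reverse complement", reverse)
    return ("direct", direct)


if __name__ == "__main__":
    left_end = "AACGTTTG"                     # toy block, not yeast sequence
    right_end = reverse_complement(left_end)  # mimics a left-end/right-end pair
    print(best_orientation(left_end, right_end))
```

Under the pattern described in the abstract, a left-end block compared with a right-end block would be expected to score highest in the reverse-complement orientation, while two left ends (or two right ends) would score highest directly.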
Abstract:
Simple phylogenetic tests were applied to a large data set of nucleotide sequences from two nuclear genes and a region of the mitochondrial genome of Trypanosoma cruzi, the agent of Chagas' disease. Incongruent gene genealogies manifest genetic exchange among distantly related lineages of T. cruzi. Two widely distributed isoenzyme types of T. cruzi are hybrids, their genetic composition being the likely result of genetic exchange between two distantly related lineages. The data show that the reference strain for the T. cruzi genome project (CL Brener) is a hybrid. Well-supported gene genealogies show that mitochondrial and nuclear gene sequences from T. cruzi cluster, respectively, in three or four distinct clades that do not fully correspond to the two previously defined major lineages of T. cruzi. There is clear genetic differentiation among the major groups of sequences, but genetic diversity within each major group is low. We estimate that the major extant lineages of T. cruzi have diverged during the Miocene or early Pliocene (3–16 million years ago).
Abstract:
Amino-terminal signal sequences target nascent secretory and membrane proteins to the endoplasmic reticulum for translocation. Subsequent interactions between the signal sequence and components of the translocation machinery at the endoplasmic reticulum are thought to be important for the productive engagement of the translocon by the ribosome-nascent chain complex. However, it is not clear whether all signal sequences carry out these posttargeting steps identically, or if there are differences in the interactions directed by one signal sequence versus another. In this study, we find substantial differences in the ability of signal sequences from different substrates to mediate closure of the ribosome–translocon junction early in translocation. We also show that these differences in some cases necessitate functional coordination between the signal sequence and mature domain for faithful translocation. Accordingly, the translocation of some proteins is sensitive to replacement of their signal sequences. In a particularly dramatic example, the topology of the prion protein was found to depend highly on the choice of signal sequence used to direct its translocation. Taken together, our results reveal an unanticipated degree of substrate-specific functionality encoded in N-terminal signal sequences.
Abstract:
Rearrangements between tandem sequence homologies of various lengths are a major source of genomic change and can be deleterious to the organism. These rearrangements can result in either deletion or duplication of genetic material flanked by direct sequence repeats. Molecular genetic analysis of repetitive sequence instability in Escherichia coli has provided several clues to the underlying mechanisms of these rearrangements. We present evidence for three mechanisms of RecA-independent sequence rearrangements: simple replication slippage, sister-chromosome exchange-associated slippage, and single-strand annealing. We discuss the constraints of these mechanisms and contrast their properties with RecA-dependent homologous recombination. Replication plays a critical role in the two slipped misalignment mechanisms, and difficulties in replication appear to trigger rearrangements via all these mechanisms.