132 results for sequence similarity searches

in National Center for Biotechnology Information - NCBI


Relevance: 100.00%

Abstract:

Self-incompatibility in Brassica is controlled by a single multi-allelic locus (S locus), which contains at least two highly polymorphic genes expressed in the stigma: an S glycoprotein gene (SLG) and an S receptor kinase gene (SRK). The putative ligand-binding domain of SRK exhibits high homology to the secretory protein SLG, and it is believed that SLG and SRK form an active receptor kinase complex with a self-pollen ligand, which leads to the rejection of self-pollen. Here, we report 31 novel SLG sequences of Brassica oleracea and Brassica campestris. Sequence comparisons of a large number of SLG alleles and SLG-related genes revealed the following points. (i) The striking sequence similarity observed in an inter-specific comparison (95.6% identity between SLG14 of B. oleracea and SLG25 of B. campestris in deduced amino acid sequence) suggests that SLG diversification predates speciation. (ii) A perfect match of the sequences in hypervariable regions, which are thought to determine S specificity, in an intra-specific comparison (SLG8 and SLG46 of B. campestris), together with the observation that the hypervariable regions of SLG and SRK of the same S haplotype were not necessarily highly similar, suggests that SLG and SRK bind different sites of the pollen ligand and that they together determine S specificity. (iii) Comparison of the hypervariable regions of SLG alleles suggests that intragenic recombination, together with point mutations, has contributed to the generation of the high level of sequence variation in SLG alleles. Models for the evolution of SLG/SRK are presented.

Relevance: 100.00%

Abstract:

We have previously shown that both a centromere (CEN) and a replication origin are necessary for plasmid maintenance in the yeast Yarrowia lipolytica (Vernis et al., 1997). Because of this requirement, only a small number of centromere-proximal replication origins have been isolated from Yarrowia. We used a CEN-based plasmid to obtain noncentromeric origins, and several new fragments, some unique and some repetitive sequences, were isolated. Some of them were analyzed by two-dimensional gel electrophoresis and correspond to actual sites of initiation (ORI) on the chromosome. We observed that a 125-bp fragment is sufficient for a functional ORI on plasmid, and that chromosomal origins moved to ectopic sites on the chromosome continue to act as initiation sites. These Yarrowia origins share an 8-bp motif, which is not essential for origin function on plasmids. The Yarrowia origins do not display any obvious common structural features, like bent DNA or DNA unwinding elements, generally present at or near eukaryotic replication origins. Y. lipolytica origins thus share features of those in the unicellular Saccharomyces cerevisiae and in multicellular eukaryotes: they are discrete and short genetic elements without sequence similarity.

Relevance: 100.00%

Abstract:

A whole genome cattle-hamster radiation hybrid cell panel was used to construct a map of 54 markers located on bovine chromosome 5 (BTA5). Of the 54 markers, 34 are microsatellites selected from the cattle linkage map and 20 are genes. Among the 20 mapped genes, 10 are new assignments that were made by using the comparative mapping by annotation and sequence similarity strategy. A LOD-3 radiation hybrid framework map consisting of 21 markers was constructed. The relatively low retention frequency of markers on this chromosome (19%) prevented unambiguous ordering of the other 33 markers. The length of the map is 398.7 cR, corresponding to a ratio of ≈2.8 cR5,000/cM. Type I genes were binned for comparison of gene order among cattle, humans, and mice. Multiple internal rearrangements within conserved syntenic groups were apparent upon comparison of gene order on BTA5 and HSA12 and HSA22. A similarly high number of rearrangements were observed between BTA5 and MMU6, MMU10, and MMU15. The detailed comparative map of BTA5 should facilitate identification of genes affecting economically important traits that have been mapped to this chromosome and should contribute to our understanding of mammalian chromosome evolution.

Relevance: 100.00%

Abstract:

One gene locus on chromosome I in Saccharomyces cerevisiae encodes a protein (YAB5_YEAST; accession no. P31378) with local sequence similarity to the DNA repair glycosylase endonuclease III from Escherichia coli. We have analyzed the function of this gene, now assigned NTG1 (endonuclease three-like glycosylase 1), by cloning, mutant analysis, and gene expression in E. coli. Targeted gene disruption of NTG1 produces a mutant that is sensitive to H2O2 and menadione, indicating that NTG1 is required for repair of oxidative DNA damage in vivo. Northern blot analysis and expression studies of an NTG1-lacZ gene fusion showed that NTG1 is induced by cell exposure to different DNA damaging agents, particularly menadione, and hence belongs to the DNA damage-inducible regulon in S. cerevisiae. When expressed in E. coli, the NTG1 gene product cleaves plasmid DNA damaged by osmium tetroxide, indicating specificity for thymine glycols in DNA, as is the case for EndoIII. However, NTG1 also releases formamidopyrimidines from DNA with high efficiency and hence represents a glycosylase with a novel range of substrate recognition. Sequences similar to NTG1 from other eukaryotes, including Caenorhabditis elegans, Schizosaccharomyces pombe, and mammals, have recently been entered in GenBank, suggesting the universal presence of NTG1-like genes in higher organisms. S. cerevisiae NTG1 does not have the [4Fe-4S] cluster DNA binding domain characteristic of the other members of this family.

Relevance: 100.00%

Abstract:

Adenosine kinase catalyzes the phosphorylation of adenosine to AMP and hence is a potentially important regulator of extracellular adenosine concentrations. Despite extensive characterization of the kinetic properties of the enzyme, its primary structure has never been elucidated. Full-length cDNA clones encoding catalytically active adenosine kinase were obtained from lymphocyte, placental, and liver cDNA libraries. Corresponding mRNA species of 1.3 and 1.8 kb were noted on Northern blots of all tissues examined and were attributable to alternative polyadenylylation sites at the 3' end of the gene. The encoded protein consists of 345 amino acids with a calculated molecular size of 38.7 kDa and does not show any sequence similarity to other well-characterized mammalian nucleoside kinases, setting it apart from this family of structurally and functionally related proteins. In contrast, two regions were identified with significant sequence identity to microbial ribokinase and fructokinases and a bacterial inosine/guanosine kinase. Thus, adenosine kinase is a structurally distinct mammalian nucleoside kinase that appears to be akin to sugar kinases of microbial origin.

Relevance: 100.00%

Abstract:

A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.

Relevance: 100.00%

Abstract:

The K homology (KH) module is a widespread RNA-binding motif that has been detected by sequence similarity searches in such proteins as heterogeneous nuclear ribonucleoprotein K (hnRNP K) and ribosomal protein S3. Analysis of spatial structures of KH domains in hnRNP K and S3 reveals that they are topologically dissimilar and thus belong to different protein folds. Thus KH motif proteins provide a rare example of protein domains that share significant sequence similarity in the motif regions but possess globally distinct structures. The two distinct topologies might have arisen from an ancestral KH motif protein by N- and C-terminal extensions, or one of the existing topologies may have evolved from the other by extension, displacement and deletion. C-terminal extension (deletion) requires β-sheet rearrangement through the insertion (removal) of a β-strand in a manner similar to that observed in the serine protease inhibitors (serpins). The current analysis offers a new look at how proteins can change fold in the course of evolution.

Relevance: 100.00%

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
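
The first step described above, scoring the best ungapped alignment on every diagonal of the alignment matrix, can be illustrated with a small serial sketch. This is not the ParAlign implementation: it uses no SIMD instructions, and the function name `diagonal_scores` and the match/mismatch values are assumptions chosen for illustration. It only shows what "optimal ungapped score per diagonal" means, using a running maximum-segment sum along each diagonal.

```python
def diagonal_scores(query, target, match=2, mismatch=-1):
    """Best ungapped local score on each diagonal of the alignment matrix.

    A diagonal is identified by d = j - i, where i indexes `query` and
    j indexes `target`. The scoring values are illustrative assumptions.
    """
    scores = {}
    for d in range(-(len(query) - 1), len(target)):
        best = running = 0
        i = max(0, -d)          # first query position on this diagonal
        j = i + d               # matching target position
        while i < len(query) and j < len(target):
            step = match if query[i] == target[j] else mismatch
            # Reset to zero when the running segment score turns negative,
            # so the best contiguous segment on the diagonal is kept.
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
        scores[d] = best
    return scores


if __name__ == "__main__":
    for diagonal, score in sorted(diagonal_scores("ACGT", "TTACGT").items()):
        print(diagonal, score)
```

In a SIMD setting, many of these per-diagonal computations would be carried out in parallel lanes; the sketch keeps them sequential for clarity.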

Relevance: 100.00%

Abstract:

Cosmids from the 1A3–1A10 region of the complete miniset were individually subcloned by using the vector M13 mp18. Sequences of each cosmid were assembled from about 400 DNA fragments generated from the ends of these phage subclones and merged into one 189-kb contig. About 160 ORFs identified by the CodonUse program were subjected to similarity searches. The biological functions of 80 ORFs could be assigned reliably by using the WIT and Magpie genome investigation tools. Eighty percent of these recognizable ORFs were organized in functional clusters, which simplified assignment decisions and increased the strength of the predictions. A set of 26 genes for cobalamin biosynthesis, genes for polyhydroxyalkanoic acid metabolism, DNA replication and recombination, and DNA gyrase were among those identified. Most of the ORFs lacking significant similarity with reference databases also were grouped. There are two large clusters of these ORFs, one located between 45 and 67 kb of the map, and the other between 150 and 183 kb. Nine of the loosely identified ORFs (of 15) of the first of these clusters match ORFs from phages or transposons. The other cluster also has four ORFs of possible phage origin.

Relevance: 90.00%

Abstract:

An increasing number of proteins with weak sequence similarity have been found to assume a similar three-dimensional fold and often have similar or related biochemical or biophysical functions. We propose a method for detecting the fold similarity between two proteins with low sequence similarity based on their amino acid properties alone. The method, the proximity correlation matrix (PCM) method, is built on the observation that the physical properties of neighboring amino acid residues in sequence at structurally equivalent positions of two proteins of similar fold are often correlated even when the amino acid sequences are different. Hydrophobicity is shown to be the most strongly correlated property for all protein fold classes. The PCM method was tested on 420 proteins belonging to 64 different known folds, each having at least three proteins with little sequence similarity. The method was able to detect fold similarities for 40% of the 420 sequences. Compared with sequence comparison and several fold-recognition methods, the method demonstrates good performance in detecting fold similarities among proteins with low sequence identity. Applied to the complete genome of Methanococcus jannaschii, the method recognized the folds for 22 hypothetical proteins.

Relevance: 90.00%

Abstract:

Pairwise sequence comparison methods have been assessed using proteins whose relationships are known reliably from their structures and functions, as described in the scop database [Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia C. (1995) J. Mol. Biol. 247, 536–540]. The evaluation tested the programs blast [Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). J. Mol. Biol. 215, 403–410], wu-blast2 [Altschul, S. F. & Gish, W. (1996) Methods Enzymol. 266, 460–480], fasta [Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444–2448], and ssearch [Smith, T. F. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195–197] and their scoring schemes. The error rate of all algorithms is greatly reduced by using statistical scores to evaluate matches rather than percentage identity or raw scores. The E-value statistical scores of ssearch and fasta are reliable: the number of false positives found in our tests agrees well with the scores reported. However, the P-values reported by blast and wu-blast2 exaggerate significance by orders of magnitude. ssearch, fasta ktup = 1, and wu-blast2 perform best, and they are capable of detecting almost all relationships between proteins whose sequence identities are >30%. For more distantly related proteins, they do much less well; only one-half of the relationships between proteins with 20–30% identity are found. Because many homologs have low sequence similarity, most distant relationships cannot be detected by any pairwise comparison method; however, those which are identified may be used with confidence.
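
For readers unfamiliar with the Smith–Waterman method that ssearch is based on, the following is a minimal sketch of the local alignment score computation. It is not the ssearch program: it assumes a simple match/mismatch scheme and a linear gap penalty (real tools typically use substitution matrices and affine gaps), and the function name and parameter values are hypothetical. It returns only a raw score, which, as the abstract stresses, is a weaker basis for judging matches than a statistical score such as an E-value.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Raw Smith-Waterman local alignment score of sequences a and b.

    Uses a linear gap penalty and a flat match/mismatch score; both are
    simplifying assumptions made for this illustration.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGTT", "ACTT"))
```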

Relevance: 90.00%

Abstract:

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

Relevance: 90.00%

Abstract:

The terminal regions (last 20 kb) of Saccharomyces cerevisiae chromosomes universally contain blocks of precise sequence similarity to other chromosome terminal regions. The left and right terminal regions are distinct in the sense that the sequence similarities between them are reverse complements. Direct sequence similarity occurs between the left terminal regions and also between the right terminal regions, but not between any left ends and right ends. With minor exceptions the relationships range from 80% to 100% match within blocks. The regions of similarity are composites of familiar and unfamiliar repeated sequences as well as what could be considered “single-copy” (or better “two-copy”) sequences. All terminal regions were compared with all other chromosomes, forward and reverse complement, and 768 comparisons are diagrammed. It appears there has been an extensive history of sequence exchange or copying between terminal regions. The subtelomeric sequences fall into two classes. Seventeen of the chromosome ends terminate with the Y′ repeat, while 15 end with the 800-nt “X2” repeats just adjacent to the telomerase simple repeats. The just-subterminal repeats are very similar to each other except that chromosome 1 right end is more divergent.
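
The distinction drawn above between direct similarity and reverse-complement similarity can be made concrete with a short sketch. The helper names `reverse_complement` and `percent_match` are hypothetical, and the comparison assumes two already-aligned blocks of equal length; it is an illustration of the concept, not the procedure used in the study.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_match(a, b):
    """Percentage of identical positions between two equal-length blocks."""
    if len(a) != len(b) or not a:
        raise ValueError("blocks must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    left_block = "AACGTTGC"                       # made-up example block
    right_block = reverse_complement(left_block)  # same block on the other strand
    # Compared directly, the blocks look different; compared after taking the
    # reverse complement of one of them, they match completely.
    print(percent_match(left_block, right_block))
    print(percent_match(left_block, reverse_complement(right_block)))
```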

Relevance: 90.00%

Abstract:

Several recent reports indicate that mobile elements are frequently found in and flanking many wild-type plant genes. To determine the extent of this association, we performed computer-based systematic searches to identify mobile elements in the genes of two "model" plants, Oryza sativa (domesticated rice) and Arabidopsis thaliana. Whereas 32 common sequences belonging to nine putative mobile element families were found in the noncoding regions of rice genes, none were found in Arabidopsis genes. Five of the nine families (Gaijin, Castaway, Ditto, Wanderer, and Explorer) are first described in this report, while the other four were described previously (Tourist, Stowaway, p-SINE1, and Amy/LTP). Sequence similarity, structural similarity, and documentation of past mobility strongly suggest that many of the rice common sequences are bona fide mobile elements. Members of four of the new rice mobile element families are similar in some respects to members of the previously identified inverted-repeat element families, Tourist and Stowaway. Together these elements are the most prevalent type of transposons found in the rice genes surveyed and form a unique collection of inverted-repeat transposons we refer to as miniature inverted-repeat transposable elements or MITEs. The sequence and structure of MITEs are clearly distinct from short or long interspersed nuclear elements (SINEs or LINEs), the most common transposable elements associated with mammalian nuclear genes. Mobile elements, therefore, are associated with both animal and plant genes, but the identity of these elements is strikingly different.

Relevance: 90.00%

Abstract:

Molecular mimicry, normally defined by the level of primary-sequence similarities between self and foreign antigens, has been considered a key element in the pathogenesis of autoimmunity. Here we describe an example of molecular mimicry between two overlapping peptides within a single self-antigen, both of which are recognized by the same human self-reactive T-cell clone. Two intervening peptides did not stimulate the T-cell clone, even though they share nine amino acids with the stimulatory peptides. Molecular modeling of major histocompatibility complex class II-peptide complexes suggests that both of the recognized peptides generate similar antigenic surfaces, although these are composed of different sets of amino acids. The molecular modeling of a peptide shifted one residue from the stimulatory peptide, which was recognized in the context of the same HLA molecule by another T-cell clone, generated a completely different antigenic surface. Functional studies using truncated peptides confirmed that the anchor residues of the two "mimicking" epitopes in the HLA groove differ. Our results show, for two natural epitopes, how molecular mimicry can occur and suggest that studies of potential antigenic surfaces, rather than sequence similarity, are necessary for analyzing suspected peptide mimicry.