8 results for Pattern-search methods
in National Center for Biotechnology Information - NCBI
Abstract:
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
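The search step described in this abstract can be illustrated with a minimal Viterbi sketch. It is not the paper's recognizer. As a simplifying assumption, outputs are attached to states rather than to transitions, and the two states, three symbols, and all probabilities are hypothetical toy values. The function returns the single most probable state path for an observed index string.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its joint probability)."""
    # score[t][s]: best log-probability of any path ending in state s at time t
    score = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] + _log(trans_p[r][s]))
            cur[s] = (prev[best_prev] + _log(trans_p[best_prev][s])
                      + _log(emit_p[s][obs]))
            ptr[s] = best_prev
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, math.exp(score[-1][last])


# Hypothetical toy model: two hidden states, three acoustic index symbols.
STATES = ["A", "B"]
START_P = {"A": 0.6, "B": 0.4}
TRANS_P = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT_P = {"A": {"x": 0.5, "y": 0.4, "z": 0.1},
          "B": {"x": 0.1, "y": 0.3, "z": 0.6}}

path, prob = viterbi(["x", "y", "z"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

Working in log-probabilities keeps long index strings from underflowing. A real recognizer would also fold the language model probability into the path score.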
Abstract:
SBASE 8.0 is the eighth release of the SBASE library of protein domain sequences that contains 294 898 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to most major sequence databases and sequence pattern collections. The entries are clustered into over 2005 statistically validated domain groups (SBASE-A) and 595 non-validated groups (SBASE-B), provided with several WWW-based search and browsing facilities for online use. A domain-search facility was developed, based on non-parametric pattern recognition methods, including artificial neural networks. SBASE 8.0 is freely available by anonymous ‘ftp’ file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.
Abstract:
Telomerase, a ribonucleoprotein complex, adds hexameric repeats called “telomeres” to the growing ends of chromosomal DNA. Characterization of mammalian telomerase has been elusive because of its low level of expression. We describe a bioinformatics approach to enrich and characterize the human telomerase complex. Using local sequence homology search methods, we detected similarity of the Tetrahymena p80 subunit of telomerase with the autoantigen Ro60. Antibodies to Ro60 immunoprecipitated the telomerase activity. Ro60 and p80 proteins were cross-recognizable by antibodies to either protein. Telomerase activity and the RNA component of telomerase complex were localized to a doublet in a native gel from the Ro60 antibody-precipitated material. The enriched material showed specific binding to a TTA GGG probe in vitro in an RNA template-dependent manner. Polyclonal antibodies to the doublet also immunoprecipitated the telomerase activity. These results suggest an evolutionary conservation of the telomerase proteins.
Abstract:
Triabin, a 142-residue protein from the saliva of the blood-sucking triatomine bug Triatoma pallidipennis, is a potent and selective thrombin inhibitor. Its stoichiometric complex with bovine α-thrombin was crystallized, and its crystal structure was solved by Patterson search methods and refined at 2.6-Å resolution to an R value of 0.184. The analysis revealed that triabin is a compact one-domain molecule essentially consisting of an eight-stranded β-barrel. The eight strands A to H are arranged in the order A-C-B-D-E-F-G-H, with the first four strands exhibiting a hitherto unobserved up-up-down-down topology. Except for the B-C inversion, the triabin fold exhibits the regular up-and-down topology of lipocalins. In contrast to the typical ligand-binding lipocalins, however, the triabin barrel encloses a hydrophobic core intersected by a unique salt-bridge cluster. Triabin interacts with thrombin exclusively via its fibrinogen-recognition exosite. Surprisingly, most of the interface interactions are hydrophobic. A prominent exception represents thrombin’s Arg-77A side chain, which extends into a hydrophobic triabin pocket forming partially buried salt bridges with Glu-128 and Asp-135 of the inhibitor. The fully accessible active site of thrombin in this complex is in agreement with its retained hydrolytic activity toward small chromogenic substrates. Impairment of thrombin’s fibrinogen converting activity or of its thrombomodulin-mediated protein C activation capacity upon triabin binding is explained by usage of overlapping interaction sites of fibrinogen, thrombomodulin, and triabin on thrombin. These data demonstrate that triabin inhibits thrombin via a novel and unique mechanism that might be of interest in the context of potential therapeutic applications.
Abstract:
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
Abstract:
Systemic lupus erythematosus (SLE) is an autoimmune multisystem inflammatory disease characterized by the production of pathogenic autoantibodies. Previous genetic studies have suggested associations with HLA Class II alleles, complement gene deficiencies, and Fc receptor polymorphisms; however, it is likely that other genes contribute to SLE susceptibility and pathogenesis. Here, we report the results of a genome-wide microsatellite marker screen in 105 SLE sib-pair families. By using multipoint nonparametric methods, the strongest evidence for linkage was found near the HLA locus (6p11-p21) [D6S257, logarithm of odds (lod) = 3.90, P = 0.000011] and at three additional regions: 16q13 (D16S415, lod = 3.64, P = 0.000022), 14q21–23 (D14S276, lod = 2.81, P = 0.00016), and 20p12 (D20S186, lod = 2.62, P = 0.00025). Another nine regions (1p36, 1p13, 1q42, 2p15, 2q21–33, 3cent-q11, 4q28, 11p15, and 15q26) were identified with lod scores ≥1.00. These data support the hypothesis that multiple genes, including one in the HLA region, influence susceptibility to human SLE.
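The lod scores and P values quoted in this abstract can be related by a common rule of thumb, sketched below. It assumes that 2 ln(10) × lod behaves as a chi-square statistic with one degree of freedom and that the test is one-sided. This is an assumption made for illustration, not necessarily how the authors computed their values, and the function name `lod_to_p` is hypothetical.

```python
import math


def lod_to_p(lod):
    """Approximate one-sided P value for a lod score (1-df chi-square assumption)."""
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 df, the chi-square survival function is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# (lod, P) pairs as reported in the abstract.
REPORTED = [(3.90, 0.000011), (3.64, 0.000022), (2.81, 0.00016), (2.62, 0.00025)]

for lod, p_reported in REPORTED:
    print(f"lod={lod:.2f}  approx P={lod_to_p(lod):.2e}  reported P={p_reported:.2e}")
```

Under these assumptions the conversion lands within a few percent of each reported value. That is consistent with this kind of relationship but does not confirm the authors' exact procedure.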
Abstract:
To initiate homologous recombination, sequence similarity between two DNA molecules must be searched for and homology recognized. How the search for and recognition of homology occurs remains unproven. We have examined the influences of DNA topology and the polarity of RecA–single-stranded (ss)DNA filaments on the formation of synaptic complexes promoted by RecA. Using two complementary methods and various ssDNA and duplex DNA molecules as substrates, we demonstrate that topological constraints on a small circular RecA–ssDNA filament prevent it from interwinding with its duplex DNA target at the homologous region. We were unable to detect homologous pairing between a circular RecA–ssDNA filament and its relaxed or supercoiled circular duplex DNA targets. However, the formation of synaptic complexes between an invading linear RecA–ssDNA filament and covalently closed circular duplex DNAs is promoted by supercoiling of the duplex DNA. The results imply that a triplex structure formed by non-Watson–Crick hydrogen bonding is unlikely to be an intermediate in homology searching promoted by RecA. Rather, a model in which RecA-mediated homology searching requires unwinding of the duplex DNA coupled with local strand exchange is the likely mechanism. Furthermore, we show that polarity of the invading RecA–ssDNA does not affect its ability to pair and interwind with its circular target duplex DNA.
Abstract:
The emotif database is a collection of more than 170 000 highly specific and sensitive protein sequence motifs representing conserved biochemical properties and biological functions. These protein motifs are derived from 7697 sequence alignments in the BLOCKS+ database (released on June 23, 2000) and all 8244 protein sequence alignments in the PRINTS database (version 27.0) using the emotif-maker algorithm developed by Nevill-Manning et al. (Nevill-Manning,C.G., Wu,T.D. and Brutlag,D.L. (1998) Proc. Natl Acad. Sci. USA, 95, 5865–5871; Nevill-Manning,C.G., Sethi,K.S., Wu,T.D. and Brutlag,D.L. (1997) ISMB-97, 5, 202–209). Since the amino acids and the groups of amino acids in these sequence motifs represent critical positions conserved in evolution, search algorithms employing the emotif patterns can identify and classify more widely divergent sequences than methods based on global sequence similarity. The emotif protein pattern database is available at http://motif.stanford.edu/emotif/.
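A motif built from single amino acids and groups of interchangeable amino acids can be searched with an ordinary regular expression, as in the sketch below. The pattern and both sequences are hypothetical and are not taken from the emotif database, and the helper `find_motif` is illustrative rather than part of any emotif tool.

```python
import re

# Hypothetical motif: C, then one of I/L/M/V, any two residues,
# then D or E, then one aromatic residue (F/W/Y).
MOTIF = re.compile(r"C[ILMV].{2}[DE][FWY]")


def find_motif(sequence, motif=MOTIF):
    """Return (start index, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in motif.finditer(sequence)]


# Hypothetical protein sequences in one-letter code.
SEQUENCES = {
    "seq1": "MKTCLAAEWGH",  # contains the motif
    "seq2": "MKTCQAAEWGH",  # Q at the second motif position breaks the match
}

hits = {name: find_motif(seq) for name, seq in SEQUENCES.items()}
print(hits)
```

Because the motif constrains only a few conserved positions, it can match sequences that differ everywhere else. This is the property the abstract contrasts with methods based on global sequence similarity.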