980 results for Sequence-similarity searches


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Karwath, A. King, R. Homology induction: the use of machine learning to improve sequence similarity searches. BMC Bioinformatics. 23rd April 2002. 3:11. Additional file: describes the title organism species declaration in one string [http://www.biomedcentral.com/content/supplementary/1471-2105-3-11-S1.doc]. Sponsorship: Andreas Karwath and Ross D. King were supported by the EPSRC grant GR/L62849.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Cytochrome P450 (CYP450) is a class of enzymes for which substrate identification is particularly important. It would help medicinal chemists design drugs with fewer side effects arising from drug-drug interactions and extensive genetic polymorphism. Herein, we discuss the application of 2D and 3D similarity searches in identifying reference structures with a higher capacity to retrieve substrates of three important CYP enzymes (CYP2C9, CYP2D6, and CYP3A4). On the basis of the complementarity of multiple reference structures selected by different similarity search methods, we proposed the fusion of their individual Tanimoto scores into a consensus Tanimoto score (T(consensus)). Using this new score, true positive rates of 63% (CYP2C9) and 81% (CYP2D6) were achieved with false positive rates of 4% for the CYP2C9-CYP2D6 data set. Extended similarity searches were carried out on a validation data set, and the results showed that using the T(consensus) score not only increased the area under the ROC curve but also recovered more substrates at the beginning of the ranked list.
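As an illustration of the score fusion described in this abstract, here is a minimal Python sketch. The abstract does not state which fusion rule produces T(consensus), so taking the maximum over the reference structures is an assumption made here, and the fingerprints are hypothetical sets of "on" bit indices, not real molecular descriptors.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    bits_a, bits_b = set(bits_a), set(bits_b)
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def consensus_score(candidate, references, fuse=max):
    """Fuse the candidate's Tanimoto scores against several reference structures.

    The fusion rule (max by default) is an assumption, not the published method.
    """
    return fuse(tanimoto(candidate, ref) for ref in references)


def rank_by_consensus(candidates, references):
    """Return candidate names sorted by decreasing consensus score."""
    return sorted(candidates,
                  key=lambda name: consensus_score(candidates[name], references),
                  reverse=True)


# Hypothetical fingerprints, for illustration only.
references = [{1, 2, 3, 4}, {3, 4, 5, 6}]
candidates = {"mol_a": {1, 2, 3, 7}, "mol_b": {8, 9}}
print(rank_by_consensus(candidates, references))
```

Passing a different `fuse` callable (for example one that averages the scores) shows how the choice of fusion rule changes the ranking.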

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success in detecting coding regions in a gene. More recently, these methods have been used to establish similarity among genes. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence; the signature is then used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
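The abstract does not define the ICD transformation, so the sketch below is not the ICD method. It only illustrates the general idea of a Fourier-based, alignment-free signature: each base is turned into a 0/1 indicator sequence, its power spectrum is sampled at a fixed set of frequencies, and two sequences are compared by the Euclidean distance between signatures. The function names and the `n_freqs` parameter are assumptions made for this example.

```python
import cmath
import math

BASES = "ACGT"


def spectral_signature(seq, n_freqs=8):
    """Length-normalized power spectrum of the four base-indicator sequences.

    Frequencies are fixed (not tied to the sequence length), so sequences of
    different lengths yield signatures of the same size.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    signature = []
    for base in BASES:
        indicator = [1.0 if ch == base else 0.0 for ch in seq]
        for k in range(1, n_freqs + 1):
            freq = k / (2 * n_freqs)  # cycles per base, in (0, 0.5]
            coeff = sum(x * cmath.exp(-2j * math.pi * freq * pos)
                        for pos, x in enumerate(indicator))
            signature.append(abs(coeff) ** 2 / len(seq))
    return signature


def signature_distance(seq_a, seq_b, n_freqs=8):
    """Euclidean distance between the spectral signatures of two sequences."""
    return math.dist(spectral_signature(seq_a, n_freqs),
                     spectral_signature(seq_b, n_freqs))


print(signature_distance("ATGCATGCATGC", "ATGCATGCATGA"))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method, which is the role the genetic signature plays in the abstract.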

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Self-incompatibility in Brassica is controlled by a single multi-allelic locus (S locus), which contains at least two highly polymorphic genes expressed in the stigma: an S glycoprotein gene (SLG) and an S receptor kinase gene (SRK). The putative ligand-binding domain of SRK exhibits high homology to the secretory protein SLG, and it is believed that SLG and SRK form an active receptor kinase complex with a self-pollen ligand, which leads to the rejection of self-pollen. Here, we report 31 novel SLG sequences of Brassica oleracea and Brassica campestris. Sequence comparisons of a large number of SLG alleles and SLG-related genes revealed the following points. (i) The striking sequence similarity observed in an inter-specific comparison (95.6% identity between SLG14 of B. oleracea and SLG25 of B. campestris in deduced amino acid sequence) suggests that SLG diversification predates speciation. (ii) A perfect match of the sequences in hypervariable regions, which are thought to determine S specificity, in an intra-specific comparison (SLG8 and SLG46 of B. campestris), together with the observation that the hypervariable regions of SLG and SRK of the same S haplotype were not necessarily highly similar, suggests that SLG and SRK bind different sites of the pollen ligand and that they together determine S specificity. (iii) Comparison of the hypervariable regions of SLG alleles suggests that intragenic recombination, together with point mutations, has contributed to the generation of the high level of sequence variation in SLG alleles. Models for the evolution of SLG/SRK are presented.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We have previously shown that both a centromere (CEN) and a replication origin are necessary for plasmid maintenance in the yeast Yarrowia lipolytica (Vernis et al., 1997). Because of this requirement, only a small number of centromere-proximal replication origins have been isolated from Yarrowia. We used a CEN-based plasmid to obtain noncentromeric origins, and several new fragments, some unique and some repetitive sequences, were isolated. Some of them were analyzed by two-dimensional gel electrophoresis and correspond to actual sites of initiation (ORI) on the chromosome. We observed that a 125-bp fragment is sufficient for a functional ORI on plasmid, and that chromosomal origins moved to ectopic sites on the chromosome continue to act as initiation sites. These Yarrowia origins share an 8-bp motif, which is not essential for origin function on plasmids. The Yarrowia origins do not display any obvious common structural features, like bent DNA or DNA unwinding elements, generally present at or near eukaryotic replication origins. Y. lipolytica origins thus share features of those in the unicellular Saccharomyces cerevisiae and in multicellular eukaryotes: they are discrete and short genetic elements without sequence similarity.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A whole genome cattle-hamster radiation hybrid cell panel was used to construct a map of 54 markers located on bovine chromosome 5 (BTA5). Of the 54 markers, 34 are microsatellites selected from the cattle linkage map and 20 are genes. Among the 20 mapped genes, 10 are new assignments that were made by using the comparative mapping by annotation and sequence similarity strategy. A LOD-3 radiation hybrid framework map consisting of 21 markers was constructed. The relatively low retention frequency of markers on this chromosome (19%) prevented unambiguous ordering of the other 33 markers. The length of the map is 398.7 cR, corresponding to a ratio of ≈2.8 cR5,000/cM. Type I genes were binned for comparison of gene order among cattle, humans, and mice. Multiple internal rearrangements within conserved syntenic groups were apparent upon comparison of gene order on BTA5 and HSA12 and HSA22. A similarly high number of rearrangements were observed between BTA5 and MMU6, MMU10, and MMU15. The detailed comparative map of BTA5 should facilitate identification of genes affecting economically important traits that have been mapped to this chromosome and should contribute to our understanding of mammalian chromosome evolution.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

One gene locus on chromosome I in Saccharomyces cerevisiae encodes a protein (YAB5_YEAST; accession no. P31378) with local sequence similarity to the DNA repair glycosylase endonuclease III from Escherichia coli. We have analyzed the function of this gene, now designated NTG1 (endonuclease three-like glycosylase 1), by cloning, mutant analysis, and gene expression in E. coli. Targeted gene disruption of NTG1 produces a mutant that is sensitive to H2O2 and menadione, indicating that NTG1 is required for repair of oxidative DNA damage in vivo. Northern blot analysis and expression studies of an NTG1-lacZ gene fusion showed that NTG1 is induced by cell exposure to different DNA damaging agents, particularly menadione, and hence belongs to the DNA damage-inducible regulon in S. cerevisiae. When expressed in E. coli, the NTG1 gene product cleaves plasmid DNA damaged by osmium tetroxide, thus indicating specificity for thymine glycols in DNA, as is the case for EndoIII. However, NTG1 also releases formamidopyrimidines from DNA with high efficiency and, hence, represents a glycosylase with a novel range of substrate recognition. Sequences similar to NTG1 from other eukaryotes, including Caenorhabditis elegans, Schizosaccharomyces pombe, and mammals, have recently been entered in GenBank, suggesting the universal presence of NTG1-like genes in higher organisms. S. cerevisiae NTG1 does not have the [4Fe-4S] cluster DNA binding domain characteristic of the other members of this family.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Adenosine kinase catalyzes the phosphorylation of adenosine to AMP and hence is a potentially important regulator of extracellular adenosine concentrations. Despite extensive characterization of the kinetic properties of the enzyme, its primary structure has never been elucidated. Full-length cDNA clones encoding catalytically active adenosine kinase were obtained from lymphocyte, placental, and liver cDNA libraries. Corresponding mRNA species of 1.3 and 1.8 kb were noted on Northern blots of all tissues examined and were attributable to alternative polyadenylylation sites at the 3' end of the gene. The encoded protein consists of 345 amino acids with a calculated molecular size of 38.7 kDa and does not show any sequence similarity to other well-characterized mammalian nucleoside kinases, setting it apart from this family of structurally and functionally related proteins. In contrast, two regions were identified with significant sequence identity to microbial ribokinase and fructokinases and a bacterial inosine/guanosine kinase. Thus, adenosine kinase is a structurally distinct mammalian nucleoside kinase that appears to be akin to sugar kinases of microbial origin.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins (86%) shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or archaeal proteins. For more than 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters, namely permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.
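The abstract says only that paralog clusters were defined on the basis of pairwise similarity; it does not give the clustering rule or the score cutoff. The Python sketch below assumes single-linkage clustering (two proteins share a cluster if a chain of sufficiently similar pairs connects them), implemented with a union-find structure. The protein names, scores, and threshold are hypothetical.

```python
def cluster_by_similarity(proteins, pair_scores, threshold):
    """Single-linkage clusters of proteins whose pairwise score meets the threshold.

    pair_scores is an iterable of (protein_a, protein_b, score) tuples.
    Only clusters with at least two members are returned.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, score in pair_scores:
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values()
                  if len(members) > 1)


# Hypothetical proteins and similarity scores, for illustration only.
proteins = ["p1", "p2", "p3", "p4", "p5"]
pair_scores = [("p1", "p2", 0.9), ("p2", "p3", 0.8), ("p4", "p5", 0.3)]
print(cluster_by_similarity(proteins, pair_scores, threshold=0.5))
```

Single linkage tends to chain loosely related proteins into large groups, which is one reason a real analysis might prefer a stricter linkage rule or a separate motif-based step for superclusters.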

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We discuss several approaches to similarity preserving coding of symbol sequences and possible connections of their distributed versions to metric embeddings. Interpreting sequence representation methods with embeddings can help develop an approach to their analysis and may lead to discovering useful properties.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Since multimedia data, such as images and videos, are far more expressive and informative than ordinary text-based data, people find it more attractive to communicate and express themselves with them. Additionally, with the rising popularity of social networking tools such as Facebook and Twitter, multimedia information retrieval can no longer be considered a solitary task. Rather, people constantly collaborate with one another while searching and retrieving information. But the very cause of the popularity of multimedia data, namely the large amount and variety of information a single data object can carry, makes their management a challenging task. Multimedia data are commonly represented as multidimensional feature vectors and carry high-level semantic information. These two characteristics make them very different from traditional alpha-numeric data. Thus, trying to manage them with frameworks and rationales designed for primitive alpha-numeric data is inefficient. An index structure is the backbone of any database management system. It has been seen that index structures present in existing relational database management frameworks cannot handle multimedia data effectively. Thus, in this dissertation, a generalized multidimensional index structure is proposed which accommodates the atypical multidimensional representation and the semantic information carried by different multimedia data seamlessly within a single framework. Additionally, the dissertation investigates the evolving relationships among multimedia data in a collaborative environment and how such information can help to customize the design of the proposed index structure when it is used to manage multimedia data in a shared environment. Extensive experiments were conducted to demonstrate the usability and better performance of the proposed framework over current state-of-the-art approaches.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Homology-driven proteomics is a major tool to characterize proteomes of organisms with unsequenced genomes. This paper addresses practical aspects of automated homology-driven protein identifications by LC-MS/MS on a hybrid LTQ Orbitrap mass spectrometer. All essential software elements supporting the presented pipeline are either hosted on a publicly accessible web server or available for free download. © 2008 Elsevier B.V. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Only a small fraction of spectra acquired in LC-MS/MS runs matches peptides from target proteins upon database searches. The remaining spectra, operationally termed background, originate from a variety of poorly controlled sources and affect the throughput and confidence of database searches. Here, we report an algorithm and its software implementation that rapidly removes background spectra, regardless of their precise origin. The method estimates the dissimilarity distance between screened MS/MS spectra and unannotated spectra from a partially redundant background library compiled from several control and blank runs. Filtering MS/MS queries enhanced the protein identification capacity when searches lacked spectrum-to-sequence matching specificity. In sequence-similarity searches, it reduced by 30-fold on average the number of orphan hits, which were not explicitly related to background protein contaminants and required manual validation. Removing high-quality background MS/MS spectra, while preserving in the data set the genuine spectra from target proteins, decreased the false positive rate of stringent database searches and improved the identification of low-abundance proteins.
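To make the filtering step in this abstract concrete, here is a minimal Python sketch. The abstract does not specify the dissimilarity measure, so this example assumes one minus the cosine similarity of m/z-binned intensity vectors; the bin width, the `max_distance` cutoff, and the peak lists are hypothetical.

```python
import math


def binned(spectrum, bin_width=1.0):
    """Sum peak intensities into m/z bins; spectrum is a list of (mz, intensity)."""
    vector = {}
    for mz, intensity in spectrum:
        key = int(mz // bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector


def dissimilarity(spec_a, spec_b, bin_width=1.0):
    """One minus the cosine similarity of two binned spectra (assumed measure)."""
    a, b = binned(spec_a, bin_width), binned(spec_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def remove_background(queries, background_library, max_distance=0.1):
    """Keep only query spectra that are not close to any background spectrum."""
    return [query for query in queries
            if all(dissimilarity(query, bg) > max_distance
                   for bg in background_library)]


# Hypothetical peak lists, for illustration only.
background_library = [[(100.2, 10.0), (200.5, 5.0)]]
queries = [
    [(100.3, 10.0), (200.6, 5.0)],  # resembles the background spectrum
    [(150.1, 8.0), (300.4, 2.0)],   # shares no peaks with the background
]
print(len(remove_background(queries, background_library)))
```

The surviving spectra would then be submitted to the database or sequence-similarity search, which is where the abstract reports the drop in orphan hits.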