104 results for local sequence alignment problem
in University of Queensland eSpace - Australia
Abstract:
Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence to structure alignments using a more conventional pair-wise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from a problem of nonconvergence, which can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.
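The key point of the abstract above is that once the score for placing a residue at a structural site no longer depends on which other residues are aligned elsewhere, standard dynamic programming applies. The sketch below illustrates only that idea and is not the authors' score function: `thread_align`, the per-site score tables, the linear gap penalty, and the "higher is better" sign convention are all assumptions made for illustration.

```python
def thread_align(seq, site_scores, gap=-1.0):
    """Best global sequence-to-structure alignment score.

    seq         -- amino acid sequence (string)
    site_scores -- one dict per structural site, mapping residue -> score
                   (hypothetical; stands in for a site-only score function)
    gap         -- linear penalty for skipping a residue or a site (assumed)
    """
    n, m = len(seq), len(site_scores)
    # best[i][j] = best score aligning seq[:i] to the first j sites
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The placement score depends only on (residue, site),
            # which is what makes the recurrence exact.
            place = best[i - 1][j - 1] + site_scores[j - 1].get(seq[i - 1], 0.0)
            best[i][j] = max(place, best[i - 1][j] + gap, best[i][j - 1] + gap)
    return best[n][m]


sites = [{"A": 2.0}, {"L": 2.0}, {"K": 2.0}]
print(thread_align("ALK", sites))  # all three residues placed
print(thread_align("AK", sites))   # one site skipped
```

With a true pairwise score function the placement term would depend on residues aligned at other sites, and this recurrence would no longer be exact; that is the situation the frozen approximation addresses.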
Abstract:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
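As a rough illustration of maximum-likelihood estimation of a score distribution, the sketch below fits a Gumbel (extreme value) distribution to a list of scores and converts a score to a p-value. The abstract does not name the distribution family, so the Gumbel form is an assumption here, as are the function names `fit_gumbel_ml` and `gumbel_pvalue`; the sketch also does no length adjustment and does not censor scores with small E-values.

```python
import math
import random


def fit_gumbel_ml(scores, tol=1e-10):
    """Return (lam, mu) maximizing the Gumbel likelihood of `scores`."""
    n = len(scores)
    x_min = min(scores)
    if max(scores) == x_min:
        raise ValueError("scores must not all be identical")
    mean = sum(scores) / n

    def g(lam):
        # Derivative condition for lam; shifting by x_min keeps every
        # exponential in (0, 1] so nothing overflows.
        w = [math.exp(-lam * (x - x_min)) for x in scores]
        return 1.0 / lam - mean + sum(x * wi for x, wi in zip(scores, w)) / sum(w)

    # g is positive for small lam and negative for large lam: bisect.
    lo, hi = 1e-6, 1.0
    while g(hi) > 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w_sum = sum(math.exp(-lam * (x - x_min)) for x in scores)
    mu = x_min - math.log(w_sum / n) / lam
    return lam, mu


def gumbel_pvalue(score, lam, mu):
    """P(S >= score) under the fitted Gumbel distribution."""
    return 1.0 - math.exp(-math.exp(-lam * (score - mu)))


# Synthetic "unrelated sequence" scores drawn from a known Gumbel.
rng = random.Random(1)
sample = [20.0 - math.log(-math.log(min(max(rng.random(), 1e-12), 1 - 1e-12))) / 0.3
          for _ in range(5000)]
lam, mu = fit_gumbel_ml(sample)
print(lam, mu, gumbel_pvalue(40.0, lam, mu))
```

Bisection is used instead of Newton's method only to keep the sketch short and robust to the starting point.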
Abstract:
We present a fast method for finding optimal parameters for a low-resolution (threading) force field intended to distinguish correct from incorrect folds for a given protein sequence. In contrast to other methods, the parameterization uses information from more than 10^7 misfolded structures as well as a set of native sequence-structure pairs. In addition to testing the resulting force field's performance on the protein sequence threading problem, results are shown that characterize the number of parameters necessary for effective structure recognition.
Abstract:
The Alzheimer's disease amyloid protein precursor (APP) gene is part of a multi-gene super-family from which sixteen homologous amyloid precursor-like proteins (APLP) and APP species homologues have been isolated and characterised. Comparison of exon structure (including the uncharacterised APL-1 gene), construction of phylogenetic trees, and analysis of the protein sequence alignment of known homologues of the APP super-family were performed to reconstruct the evolution of the family and to assess the functional significance of conserved protein sequences between homologues. This analysis supports an adhesion function for all members of the APP super-family, with specificity determined by those sequences which are not conserved between APLP lineages, and provides evidence for an increasingly complex APP super-family during evolution. The analysis also suggests that Drosophila APPL and Caenorhabditis elegans APL-1 may be a fourth APLP lineage, indicating that these proteins, while not functional homologues of human APP, are similarly likely to regulate cell adhesion. Furthermore, the beta A4 sequence is highly conserved only in APP orthologues, strongly suggesting this sequence is of significant functional importance in this lineage. (C) 2000 Elsevier Science Ltd. All rights reserved.
Abstract:
Allergies are a major cause of chronic ill health in industrialised countries with the incidence of reported cases steadily increasing. This Research Focus details how bioinformatics is transforming the field of allergy through providing databases for management of allergen data, algorithms for characterisation of allergic crossreactivity, structural motifs and B- and T-cell epitopes, tools for prediction of allergenicity and techniques for genomic and proteomic analysis of allergens.
Abstract:
Allergy is a major cause of morbidity worldwide. The number of characterized allergens and related information is increasing rapidly creating demands for advanced information storage, retrieval and analysis. Bioinformatics provides useful tools for analysing allergens and these are complementary to traditional laboratory techniques for the study of allergens. Specific applications include structural analysis of allergens, identification of B- and T-cell epitopes, assessment of allergenicity and cross-reactivity, and genome analysis. In this paper, the most important bioinformatic tools and methods with relevance to the study of allergy have been reviewed.
Abstract:
Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
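The search described above can be sketched as a simulated annealing loop over candidate sequences, scoring each candidate by its summed distance to the inputs. This is a minimal illustration and not the published algorithm: the use of unit-cost edit distance, the single-character move set, the geometric cooling schedule, and every name and parameter below are assumptions.

```python
import math
import random


def edit_distance(a, b):
    """Unit-cost Levenshtein distance (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def total_distance(candidate, seqs):
    """Objective: sum of distances from the candidate to every input."""
    return sum(edit_distance(candidate, s) for s in seqs)


def mutate(seq, alphabet, rng):
    """Random single-character substitution, insertion, or deletion."""
    op = rng.choice(("sub", "ins", "del"))
    if op == "del" and len(seq) > 1:
        i = rng.randrange(len(seq))
        return seq[:i] + seq[i + 1:]
    if op == "ins":
        i = rng.randrange(len(seq) + 1)
        return seq[:i] + rng.choice(alphabet) + seq[i:]
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(alphabet) + seq[i + 1:]


def anneal_consensus(seqs, alphabet="ACGT", steps=2000, t0=2.0,
                     cooling=0.995, seed=0):
    """Return (consensus, cost) found by simulated annealing."""
    rng = random.Random(seed)
    # Start from the input closest to all others (non-empty inputs assumed).
    current = min(seqs, key=lambda s: total_distance(s, seqs))
    cur_cost = total_distance(current, seqs)
    best, best_cost = current, cur_cost
    t = t0
    for _ in range(steps):
        cand = mutate(current, alphabet, rng)
        cost = total_distance(cand, seqs)
        # Always accept improvements; accept worse moves with
        # probability exp(-increase / temperature).
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
        t *= cooling
    return best, best_cost


seqs = ["ACGTACGT", "ACGTTCGT", "ACGAACGT", "ACGTACG"]
print(anneal_consensus(seqs, steps=300))
```

Each step costs one distance evaluation per input sequence, each quadratic in sequence length, which matches the scaling stated in the abstract: linear in the number of inputs and quadratic in the consensus length.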
Abstract:
The three-dimensional structures of leucine-rich repeat (LRR)-containing proteins from five different families were previously predicted based on the crystal structure of the ribonuclease inhibitor, using an approach that combined homology-based modeling, structure-based sequence alignment of LRRs, and several rational assumptions. The structural models have been produced based on very limited sequence similarity, which, in general, cannot yield trustworthy predictions. Recently, the protein structures from three of these five families have been determined. In this report we estimate the quality of the modeling approach by comparing the models with the experimentally determined structures. The comparison suggests that the general architecture, curvature, interior/exterior orientations of side chains, and backbone conformation of the LRR structures can be predicted correctly. On the other hand, the analysis revealed that, in some cases, it is difficult to predict correctly the twist of the overall super-helical structure. Taking into consideration the conclusions from these comparisons, we identified a new family of bacterial LRR proteins and present its structural model. The reliability of the LRR protein modeling suggests that it would be informative to apply similar modeling approaches to other classes of solenoid proteins.
Abstract:
The recent discovery of isotrichid-like ciliates occurring as endosymbionts in macropodid marsupials posed interesting questions in regard to both their phyletic origin (all previous records confined to eutherian mammals) and their morphological evolution (Australian forms possibly representing missing links between previously described genera). The SSU rRNA gene was sequenced for three species (Dasytricha dehorityi, D. dogieli, and Batricha tasmaniensis) and aligned against representatives of all major ciliate classes. The Australian species did not group with the other isotrichid species but instead formed an independent radiation. Discrepancies between recent global phylogenies of the phylum Ciliophora were examined by manipulation of the aligned sequence data set. Sources of conflict between these studies did not stem from differences in outgroup choice or phylogenetic reconstruction methods. Differences in the application of confidence limits and primary sequence alignment have probably resulted in the reporting of spurious associations which are not supported by more conservative confidence or alignment methodology. At present, the ciliate subphylum Intramacronucleata is an unresolved polytomy, which may be due to deficiencies in the SSU rRNA gene sequence dataset or may indicate that the ciliates radiated into their extant classes by rapid burst-like evolution. (C) 2001 Academic Press.
Abstract:
Five ripening-related ACC synthase cDNA isoforms were cloned from 80% ripe papaya cv. 'Sinta' by reverse transcription-PCR using gene-specific primers. Clone 2 had the longest transcript and contained all common exons and three alternative exons. Clones 3 and 4 contained common exons and one alternative exon each, while clone 1, the most common transcript, contained only the common exons. Clone 5 could be due to cloning artifacts and might not be a unique cDNA fragment. Thus, there are only four isoforms of ACC synthase mRNA. Southern blot analysis indicates that all five clones came from only one gene existing as a single copy in the 'Sinta' papaya genome. Multiple sequence alignment indicates that the four isoforms arise from a single gene, possibly through alternative splicing mechanisms. All the putative alternative exons were present at the 5'-end of the gene, comprising the N-terminal region of the protein. 'Sinta' ACC synthase cDNAs were of the capacs 1 type and are most closely related to a 1.4 kb capacs 1-type DNA (AJ277160) from Eksotika papaya. No capacs 2-type cDNAs were cloned from 'Sinta' by RT-PCR. This is the first report of a possible alternative splicing mechanism in ripening-related ACC synthase genes in hybrid papaya, possibly to modulate or fine-tune gene expression relevant to fruit ripening.
Abstract:
Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids in the protein sequence. Similarities in the hydropathy distributions are obvious for homologous proteins within a protein family. They also were observed for proteins with related structures, even when sequence similarities were undetectable. Here we present a novel method that employs the hydropathy distribution in proteins for identification of (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, represented by vectors of specifically defined features. The features are derived from hydropathy of the individual amino acids. Projection of this space onto principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by the analysis of hydropathy distribution, without the need for sequence alignment. (C) 2005 Wiley-Liss, Inc.
Abstract:
A 16S rRNA gene database (http://greengenes.bl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.