195 results for Multiple Sequence Alignment


Relevance:

100.00%

Abstract:

Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaptation of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.

Relevance:

100.00%

Abstract:

Allergies are a major cause of chronic ill health in industrialised countries, with the incidence of reported cases steadily increasing. This Research Focus details how bioinformatics is transforming the field of allergy by providing databases for management of allergen data, algorithms for characterisation of allergic cross-reactivity, structural motifs and B- and T-cell epitopes, tools for prediction of allergenicity, and techniques for genomic and proteomic analysis of allergens.

Relevance:

100.00%

Abstract:

Allergy is a major cause of morbidity worldwide. The number of characterized allergens and related information is increasing rapidly creating demands for advanced information storage, retrieval and analysis. Bioinformatics provides useful tools for analysing allergens and these are complementary to traditional laboratory techniques for the study of allergens. Specific applications include structural analysis of allergens, identification of B- and T-cell epitopes, assessment of allergenicity and cross-reactivity, and genome analysis. In this paper, the most important bioinformatic tools and methods with relevance to the study of allergy have been reviewed.

Relevance:

100.00%

Abstract:

Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
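The idea in this abstract, searching directly for a sequence that minimises the sum of pairwise distances to the inputs, can be illustrated with a minimal sketch. It is not the authors' implementation: the use of unit-cost edit distance, the single-character move set, the geometric cooling schedule, and all function and parameter names are assumptions made for illustration.

```python
import math
import random


def edit_distance(a, b):
    """Unit-cost Levenshtein distance (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def total_distance(candidate, sequences):
    """Objective: sum of distances from the candidate to every input."""
    return sum(edit_distance(candidate, s) for s in sequences)


def mutate(candidate, alphabet, rng):
    """Propose a neighbour by one substitution, insertion, or deletion."""
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "delete" and len(candidate) > 1:
        pos = rng.randrange(len(candidate))
        return candidate[:pos] + candidate[pos + 1:]
    if op == "insert":
        pos = rng.randrange(len(candidate) + 1)
        return candidate[:pos] + rng.choice(alphabet) + candidate[pos:]
    # Substitution (also the fallback when a deletion would empty the string).
    pos = rng.randrange(len(candidate))
    return candidate[:pos] + rng.choice(alphabet) + candidate[pos + 1:]


def anneal_consensus(sequences, alphabet="ACGT", steps=2000,
                     t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over candidate consensus strings.

    Assumes every input sequence is non-empty.
    """
    rng = random.Random(seed)
    current = sequences[0]
    current_cost = total_distance(current, sequences)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        proposal = mutate(current, alphabet, rng)
        cost = total_distance(proposal, sequences)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = proposal, cost
    return best, best_cost


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTCGT", "ACGAACGT"]
    print(anneal_consensus(reads))
```

Each objective evaluation runs one dynamic-programming comparison per input sequence, which is consistent with the linear scaling in the number of inputs and quadratic scaling in consensus length that the abstract reports.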

Relevance:

100.00%

Abstract:

Five ripening-related ACC synthase cDNA isoforms were cloned from 80% ripe papaya cv. 'Sinta' by reverse transcription-PCR using gene-specific primers. Clone 2 had the longest transcript and contained all common exons and three alternative exons. Clones 3 and 4 contained common exons and one alternative exon each, while clone 1, the most common transcript, contained only the common exons. Clone 5 could be due to cloning artifacts and might not be a unique cDNA fragment. Thus, there are only four isoforms of ACC synthase mRNA. Southern blot analysis indicates that all five clones came from only one gene existing as a single copy in the 'Sinta' papaya genome. Multiple sequence alignment indicates that the four isoforms arise from a single gene, possibly through alternative splicing mechanisms. All the putative alternative exons were present at the 5'-end of the gene comprising the N-terminal region of the protein. 'Sinta' ACC synthase cDNAs were of the capacs 1 type and are most closely related to a 1.4 kb capacs 1-type DNA (AJ277160) from Eksotika papaya. No capacs 2-type cDNAs were cloned from 'Sinta' by RT-PCR. This is the first report of possible alternative splicing mechanism in ripening-related ACC synthase genes in hybrid papaya, possibly to modulate or fine-tune gene expression relevant to fruit ripening.

Relevance:

90.00%

Abstract:

We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.

Relevance:

90.00%

Abstract:

Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
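The contact-number definition given in this abstract (the number of C-beta atoms of other residues inside a sphere centred on a residue's own C-beta atom) can be sketched as follows. The sphere radius is not stated in the abstract, so the 10.0 Å default below is purely an assumption, as are the function names. The Pearson helper shows how a correlation coefficient between predicted and observed contact numbers could be computed; it does not reproduce the support vector regression itself.

```python
import math


def contact_numbers(cb_coords, radius=10.0):
    """Count, for each residue, the other C-beta atoms within `radius`.

    cb_coords: list of (x, y, z) C-beta coordinates, one per residue.
    radius: sphere radius in the same units (assumed value, not from the source).
    """
    r2 = radius * radius
    n = len(cb_coords)
    counts = [0] * n
    for i in range(n):
        xi, yi, zi = cb_coords[i]
        for j in range(i + 1, n):
            xj, yj, zj = cb_coords[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 <= r2:
                # Contact is symmetric, so credit both residues.
                counts[i] += 1
                counts[j] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(contact_numbers(coords, radius=5.0))
```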

Relevance:

90.00%

Abstract:

A 16S rRNA gene database (http://greengenes.bl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

Relevance:

90.00%

Abstract:

At present, little is known about signal transduction mechanisms in schistosomes, which cause the disease schistosomiasis. The mitogen-activated protein kinase (MAPK) signaling pathways, which are evolutionarily conserved from yeast to Homo sapiens, play key roles in multiple cellular processes. Here, we reconstructed the hypothetical MAPK signaling pathways in Schistosoma japonicum and compared the schistosome pathways with those of model eukaryote species. We identified 60 homologous components in the S. japonicum MAPK signaling pathways. Among these, 27 were predicted to be full-length sequences. Phylogenetic analysis of these proteins confirmed the evolutionary conservation of the MAPK signaling pathways. Remarkably, we identified S. japonicum homologues of GTP-binding protein beta and alpha-I subunits in the yeast mating pathway, which might also be involved in the regulation of different life stages and female sexual maturation processes in schistosomes. In addition, several pathway member genes, including ERK, JNK, Sja-DSP, MRAS and RAS, were determined through quantitative PCR analysis to be expressed in a stage-specific manner, with ERK, JNK and their inhibitor Sja-DSP markedly upregulated in adult female schistosomes. (c) 2006 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

Relevance:

80.00%

Abstract:

The Alzheimer's disease amyloid protein precursor (APP) gene is part of a multi-gene super-family from which sixteen homologous amyloid precursor-like proteins (APLP) and APP species homologues have been isolated and characterised. Comparison of exon structure (including the uncharacterised APL-1 gene), construction of phylogenetic trees, and analysis of the protein sequence alignment of known homologues of the APP super-family were performed to reconstruct the evolution of the family and to assess the functional significance of conserved protein sequences between homologues. This analysis supports an adhesion function for all members of the APP super-family, with specificity determined by those sequences which are not conserved between APLP lineages, and provides evidence for an increasingly complex APP super-family during evolution. The analysis also suggests that Drosophila APPL and Caenorhabditis elegans APL-1 may be a fourth APLP lineage, indicating that these proteins, while not functional homologues of human APP, are similarly likely to regulate cell adhesion. Furthermore, the beta A4 sequence is highly conserved only in APP orthologues, strongly suggesting this sequence is of significant functional importance in this lineage. (C) 2000 Elsevier Science Ltd. All rights reserved.

Relevance:

80.00%

Abstract:

The three-dimensional structures of leucine-rich repeat (LRR)-containing proteins from five different families were previously predicted based on the crystal structure of the ribonuclease inhibitor, using an approach that combined homology-based modeling, structure-based sequence alignment of LRRs, and several rational assumptions. The structural models have been produced based on very limited sequence similarity, which, in general, cannot yield trustworthy predictions. Recently, the protein structures from three of these five families have been determined. In this report we estimate the quality of the modeling approach by comparing the models with the experimentally determined structures. The comparison suggests that the general architecture, curvature, interior/exterior orientations of side chains, and backbone conformation of the LRR structures can be predicted correctly. On the other hand, the analysis revealed that, in some cases, it is difficult to predict correctly the twist of the overall super-helical structure. Taking into consideration the conclusions from these comparisons, we identified a new family of bacterial LRR proteins and present its structural model. The reliability of the LRR protein modeling suggests that it would be informative to apply similar modeling approaches to other classes of solenoid proteins.

Relevance:

80.00%

Abstract:

The recent discovery of isotrichid-like ciliates occurring as endosymbionts in macropodid marsupials posed interesting questions in regard to both their phyletic origin (all previous records confined to eutherian mammals) and their morphological evolution (Australian forms possibly representing missing links between previously described genera). The SSU rRNA gene was sequenced for three species (Dasytricha dehorityi, D. dogieli, and Batricha tasmaniensis) and aligned against representatives of all major ciliate classes. The Australian species did not group with the other isotrichid species but instead formed an independent radiation. Discrepancies between recent global phylogenies of the phylum Ciliophora were examined by manipulation of the aligned sequence data set. Sources of conflict between these studies did not stem from differences in outgroup choice or phylogenetic reconstruction methods. Differences in the application of confidence limits and primary sequence alignment have probably resulted in the reporting of spurious associations which are not supported by more conservative confidence or alignment methodology. At present, the ciliate subphylum Intramacronucleata is an unresolved polytomy which may be due to deficiencies in the SSU rRNA gene sequence dataset or indicate that the ciliates radiated into their extant classes by rapid burst-like evolution. (C) 2001 Academic Press.

Relevance:

80.00%

Abstract:

We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two windows of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. (C) 2004 Wiley-Liss, Inc.
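Two pieces of bookkeeping in this abstract lend themselves to a short sketch: the 25 residue pairs formed by two windows of size 5 centred on the residues of interest, and the accuracy measured over the best L/2 (or L/10) predictions. The function names and data layout below are hypothetical, and the sketch covers neither the correlated-mutation measure nor the neural network.

```python
def window_pairs(i, j, half_width=2):
    """All residue pairs between a window around i and a window around j.

    With half_width=2 each window has 5 positions, giving 5 * 5 = 25 pairs.
    Positions that fall outside the sequence are left for the caller to handle.
    """
    offsets = range(-half_width, half_width + 1)
    return [(i + a, j + b) for a in offsets for b in offsets]


def top_fraction_accuracy(scores, true_contacts, seq_len, divisor=2):
    """Fraction of the best L/divisor predicted pairs that are real contacts.

    scores: dict mapping (i, j) residue pairs (i < j) to a predicted score.
    true_contacts: set of (i, j) pairs (i < j) observed in the structure.
    seq_len: sequence length L.
    """
    n_keep = max(1, seq_len // divisor)
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


if __name__ == "__main__":
    toy_scores = {(0, 2): 0.9, (1, 3): 0.8, (0, 3): 0.1}
    toy_contacts = {(0, 2), (0, 3)}
    print(len(window_pairs(10, 30)))
    print(top_fraction_accuracy(toy_scores, toy_contacts, seq_len=4))
```

Passing `divisor=10` gives the stricter L/10 variant mentioned in the abstract.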

Relevance:

80.00%

Abstract:

Potato type II serine proteinase inhibitors are proteins that consist of multiple sequence repeats and exhibit a multidomain structure. The structural domains are circular permutations of the repeat sequence, as a result of intramolecular domain swapping. Structural studies give indications for the origins of this folding behaviour and the evolution of the inhibitor family.