86 results for protein sequence classification

in University of Queensland eSpace - Australia


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence to structure alignments using a more conventional pair-wise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from a problem of nonconvergence, which can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.
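
As a rough illustration of why a score that depends only on a (sequence residue, template position) pair lets dynamic programming be applied directly, here is a minimal Needleman-Wunsch-style sketch. It is not the authors' score function: the `score` matrix, the linear `gap` penalty, and the function name `align_score` are all assumptions made for this example.

```python
def align_score(score, gap=-1.0):
    """Best global alignment score of a sequence against a template.

    score[i][j] is a hypothetical score for placing sequence residue i at
    template position j. Because it depends only on that pair, the optimal
    alignment can be built up from optimal sub-alignments.
    The linear gap penalty is an assumption for this sketch.
    """
    n = len(score)
    m = len(score[0]) if n else 0
    # best[i][j]: best score aligning the first i residues
    # to the first j template positions.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[i - 1][j - 1],  # residue i on position j
                best[i - 1][j] + gap,                      # residue i left unaligned
                best[i][j - 1] + gap,                      # position j left unaligned
            )
    return best[n][m]
```

A pair-wise score function does not fit this scheme without an approximation, because the score at one position would depend on which residues were placed at other positions.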

Relevance:

100.00% 100.00%

Publisher:

Abstract:

CyBase is a curated database and information source for backbone-cyclized proteins. The database incorporates naturally occurring cyclic proteins as well as synthetic derivatives, grafted analogues and acyclic permutants. The database provides a centralized repository of information on all aspects of cyclic protein biology and addresses issues pertaining to the management and searching of topologically circular sequences. The database is freely available at http://research.imb.uq.edu.au/cybase.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
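
The contact number definition given in this abstract can be sketched directly from C-beta coordinates. This is only an illustration of the definition, not the prediction method: the 10 Å default radius and the function name `contact_numbers` are assumptions, since the abstract does not state the sphere radius.

```python
import math


def contact_numbers(cb_coords, radius=10.0):
    """Count, for each residue, the C-beta atoms of other residues
    lying within `radius` of its own C-beta atom.

    cb_coords: list of (x, y, z) tuples, one per residue.
    radius: sphere radius in the same units (assumed value, not from the source).
    """
    counts = []
    for i, centre in enumerate(cb_coords):
        n = 0
        for j, other in enumerate(cb_coords):
            # Skip the residue of interest itself.
            if i != j and math.dist(centre, other) <= radius:
                n += 1
        counts.append(n)
    return counts
```

For example, `contact_numbers([(0, 0, 0), (3, 0, 0), (20, 0, 0)])` gives `[1, 1, 0]`: the first two residues see each other, and the third is isolated.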

Relevance:

90.00% 90.00%

Publisher:

Abstract:

We present a fast method for finding optimal parameters for a low-resolution (threading) force field intended to distinguish correct from incorrect folds for a given protein sequence. In contrast to other methods, the parameterization uses information from >10^7 misfolded structures as well as a set of native sequence-structure pairs. In addition to testing the resulting force field's performance on the protein sequence threading problem, results are shown that characterize the number of parameters necessary for effective structure recognition.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

We describe two ways of optimizing score functions for protein sequence to structure threading. The first method adjusts parameters to improve sequence to structure alignment. The second adjusts parameters so as to improve a score function's ability to rank alignments calculated with the first score function. Unlike those functions known as knowledge-based force fields, the resulting parameter sets do not rely on Boltzmann statistics, have no claim to representing free energies and are purely constructions for recognizing protein folds. The methods give a small improvement, but suggest that functions can be profitably optimized for very specific aspects of protein fold recognition. Proteins 1999;36:454-461. (C) 1999 Wiley-Liss, Inc.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The Alzheimer's disease amyloid protein precursor (APP) gene is part of a multi-gene super-family from which sixteen homologous amyloid precursor-like proteins (APLP) and APP species homologues have been isolated and characterised. Comparison of exon structure (including the uncharacterised APL-1 gene), construction of phylogenetic trees, and analysis of the protein sequence alignment of known homologues of the APP super-family were performed to reconstruct the evolution of the family and to assess the functional significance of conserved protein sequences between homologues. This analysis supports an adhesion function for all members of the APP super-family, with specificity determined by those sequences which are not conserved between APLP lineages, and provides evidence for an increasingly complex APP super-family during evolution. The analysis also suggests that Drosophila APPL and Caenorhabditis elegans APL-1 may be a fourth APLP lineage, indicating that these proteins, while not functional homologues of human APP, are similarly likely to regulate cell adhesion. Furthermore, the beta A4 sequence is highly conserved only in APP orthologues, strongly suggesting this sequence is of significant functional importance in this lineage. (C) 2000 Elsevier Science Ltd. All rights reserved.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Sausage is a protein sequence threading program with remarkable run-time flexibility. Using different scripts, it can calculate protein sequence-structure alignments, search structure libraries, swap force fields, create models from alignments, convert file formats and analyse results. There are several different force fields which might be classed as knowledge-based, although they do not rely on Boltzmann statistics. Different force fields are used for alignment calculations and subsequent ranking of calculated models.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The SH3 domains of src and other nonreceptor tyrosine kinases have been shown to associate with the motif PXXP, where P and X stand for proline and an unspecified amino acid, but a motif that binds to the SH3 domain of myosin has thus far not been characterized. We previously showed that the SH3 domain of Acanthamoeba myosin-IC interacts with the protein Acan125. We now report that the Acan125 protein sequence contains two tandem consensus PXXP motifs near the C terminus. To test for binding, we expressed a polypeptide, AD3p, which includes 344 residues of native C-terminal sequence and a mutant polypeptide, AD3 Delta 977-994p, which lacks the sequence RPKPVPPPRGAKPAPPPR containing both PXXP motifs. The SH3 domain of Acanthamoeba myosin-IC bound AD3p and not AD3 Delta 977-994p, showing that the PXXP motifs are required for SH3 binding. The sequence of Acan125 is related overall to a protein of unknown function coded by Caenorhabditis elegans gene K07G5.1. The K07G5.1 gene product contains a proline-rich segment similar to the SH3 binding motif found in Acan125. The aligned sequences show considerable conservation of leucines and other hydrophobic residues, including the spacing of these residues, which matches a motif for leucine-rich repeats (LRRs). LRR domains have been demonstrated to be sites for ligand binding. Having an LRR domain and an SH3-binding domain, Acan125 and the C. elegans homologue define a novel family of bifunctional binding proteins.
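
To make the PXXP motif concrete, the following sketch scans a sequence for every occurrence of proline, any two residues, proline, including overlapping ones. The function name `find_pxxp` and the use of 0-based positions are choices made for this example; the test sequence is the deleted segment quoted in the abstract.

```python
import re

# Lookahead so that overlapping PXXP occurrences are all reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence):
    """Return (0-based start, matched text) for every PXXP occurrence."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence)]


# Segment deleted in the AD3 Delta 977-994p mutant, as quoted in the abstract.
segment = "RPKPVPPPRGAKPAPPPR"
print(find_pxxp(segment))
```

Run on that segment, the scan reports two matches, `PVPP` at offset 3 and `PAPP` at offset 12, in line with the two tandem PXXP motifs the abstract describes.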

Relevance:

90.00% 90.00%

Publisher:

Abstract:

There is increasing evidence that heterotrimeric G-proteins (G-proteins) are involved in many plant processes including phytohormone response, pathogen defence and stomatal control. In animal systems, each of the three G-protein subunits belongs to a large multigene family; however, few subunits have been isolated from plants. Here we report the cloning of a second plant G-protein γ-subunit (AGG2) from Arabidopsis thaliana. The predicted AGG2 protein sequence shows 48% identity to the first identified Arabidopsis Gγ-subunit, AGG1. Furthermore, AGG2 contains all of the conserved characteristics of γ-subunits including a small size (100 amino acids, 11.1 kDa), a C-terminal CAAX box and an N-terminal α-helix region capable of forming a coiled-coil interaction with the β-subunit. A strong interaction between AGG2 and both the tobacco (TGB1) and Arabidopsis (AGB1) β-subunits was observed in vivo using the yeast two-hybrid system. The strong association between AGG2 and AGB1 was confirmed in vitro. Southern and Northern analyses showed that AGG2 is a single copy gene in Arabidopsis producing two transcripts that are present in all tissues tested. The isolation of a second γ-subunit from A. thaliana indicates that plant G-proteins, like their mammalian counterparts, may form different heterotrimer combinations that presumably regulate multiple signal transduction pathways.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The 101 residue protein early pregnancy factor (EPF), also known as human chaperonin 10, was synthesized from four functionalized, but unprotected, peptide segments by a sequential thioether ligation strategy. The approach exploits the differential reactivity of a peptide-NHCH2CH2SH thiolate with XCH2CO-peptides, where X = Cl or I/Br. Initial model studies with short functionalized (but unprotected) peptides showed a significantly faster reaction of a peptide-NHCH2CH2SH thiolate with a BrCH2CO-peptide than with a ClCH2CO-peptide, where thiolate displacement of the halide leads to chemoselective formation of a thioether surrogate for the Gly-Gly peptide bond. This rate difference was used as the basis of a novel sequential ligation approach to the synthesis of large polypeptide chains. Thus, ligation of a model bifunctional N-alpha-chloroacetyl, C-terminal thiolated peptide with a second N-alpha-bromoacetyl peptide demonstrated chemoselective bromide displacement by the thiol group. Further investigations showed that the relatively unreactive N-alpha-chloroacetyl peptides could be activated by halide exchange using saturated KI solutions to yield the highly reactive N-alpha-iodoacetyl peptides. These findings were used to formulate a sequential thioether ligation strategy for the synthesis of EPF, a 101 amino acid protein containing three Gly-Gly sites approximately equidistantly spaced within the peptide chain. Four peptide segments or cassettes comprising the EPF protein sequence (BrAc-[EPF 78-101] 12, ClAc-[EPF 58-75]-[NHCH2CH2SH] 13, ClAc-[EPF 30-55]-[NHCH2CH2SH] 14, and Ac-[EPF 1-27]-[NHCH2CH2SH] 15) were synthesized in high yield and purity using Boc SPPS chemistry. In the stepwise sequential ligation strategy, reaction of peptides 12 and 13 was followed by conversion of the N-terminal chloroacetyl functional group to an iodoacetyl, thus activating the product peptide for further ligation with peptide 14. The process of ligation followed by iodoacetyl activation was repeated to yield an analogue of EPF (EPF psi(CH2S)(28-29,56-57,76-77)) 19 in 19% overall yield.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

A general overview of the protein sequence set for the mouse transcriptome produced during the FANTOM2 sequencing project is presented here. We applied different algorithms to characterize protein sequences derived from a nonredundant representative protein set (RPS) and a variant protein set (VPS) of the mouse transcriptome. The functional characterization and assignment of Gene Ontology terms was done by analysis of the proteome using InterPro. The Superfamily database analyses gave a detailed structural classification according to SCOP and provide additional evidence for the functional characterization of the proteome data. The MDS database analysis revealed new domains which are not present in existing protein domain databases. Thus the transcriptome gives us a unique source of data for the detection of new functional groups. The data obtained for the RPS and VPS sets facilitated the comparison of different patterns of protein expression. A comparison with other existing mouse and human protein sequence sets (e.g., the International Protein Index) demonstrates the common patterns in mammalian proteomes. The analysis of the membrane organization within the transcriptome of multiple eukaryotes provides valuable statistics about the distribution of secretory and transmembrane proteins.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

With the completion of the human and mouse genome sequences, the task now turns to identifying their encoded transcripts and assigning gene function. In this study, we have undertaken a computational approach to identify and classify all of the protein kinases and phosphatases present in the mouse gene complement. A nonredundant set of these sequences was produced by mining Ensembl gene predictions and publicly available cDNA sequences with a panel of InterPro domains. This approach identified 561 candidate protein kinases and 162 candidate protein phosphatases. This cohort was then analyzed using TribeMCL protein sequence similarity clustering followed by CLUSTALV alignment and hierarchical tree generation. This approach allowed us to (1) distinguish between true members of the protein kinase and phosphatase families and enzymes of related biochemistry, (2) determine the structure of the families, and (3) suggest functions for previously uncharacterized members. The classifications obtained by this approach were in good agreement with previous schemes and allowed us to demonstrate domain associations with a number of clusters. Finally, we comment on the complementary nature of cDNA and genome-based gene detection and the impact of the FANTOM2 transcriptome project.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The polypeptide backbones and side chains of proteins are constantly moving due to thermal motion and the kinetic energy of the atoms. The B-factors of protein crystal structures reflect the fluctuation of atoms about their average positions and provide important information about protein dynamics. Computational approaches to predict thermal motion are useful for analyzing the dynamic properties of proteins with unknown structures. In this article, we utilize a novel support vector regression (SVR) approach to predict the B-factor distribution (B-factor profile) of a protein from its sequence. We explore schemes for encoding sequences and various settings for the parameters used in SVR. Based on a large dataset of high-resolution proteins, our method predicts the B-factor distribution with a Pearson correlation coefficient (CC) of 0.53. In addition, our method predicts the B-factor profile with a CC of at least 0.56 for more than half of the proteins. Our method also performs well for classifying residues (rigid vs. flexible). For almost all predicted B-factor thresholds, prediction accuracies (percent of correctly predicted residues) are greater than 70%. These results exceed the best results of other sequence-based prediction methods. (C) 2005 Wiley-Liss, Inc.
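
The two evaluation measures named in this abstract, the Pearson correlation coefficient and the two-state (rigid vs. flexible) accuracy, can be sketched as follows. This illustrates the metrics only, not the SVR model; the function names and the choice to apply one shared threshold to both predicted and observed values are assumptions made for this example.

```python
import math


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (sd_p * sd_o)


def two_state_accuracy(predicted, observed, threshold):
    """Fraction of residues assigned to the same class (flexible if the
    value exceeds `threshold`, rigid otherwise) by prediction and observation."""
    correct = sum(
        (p > threshold) == (o > threshold) for p, o in zip(predicted, observed)
    )
    return correct / len(predicted)
```

For instance, with hypothetical normalized values, `two_state_accuracy([0.1, 0.9, 0.4, 0.8], [0.2, 0.7, 0.6, 0.9], 0.5)` is 0.75, since three of the four residues fall on the same side of the threshold in both lists.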