221 results for PROTEIN SEQUENCES
in University of Queensland eSpace - Australia
Abstract:
Cyclotides are peptides from plants of the Rubiaceae and Violaceae families that have the unusual characteristic of a macrocyclic backbone. They are further characterized by their incorporation of a cystine knot in which two disulfides, along with the intervening backbone residues, form a ring through which a third disulfide is threaded. The cyclotides have been found in every Violaceae species screened to date but are apparently present in only a few Rubiaceae species. The selective distribution reported so far raises questions about the evolution of the cyclotides within the plant kingdom. In this study, we use a combined bioinformatics and expression analysis approach to elucidate the evolution and distribution of the cyclotides in the plant kingdom and report the discovery of related sequences widespread in the Poaceae family, including crop plants such as rice (Oryza sativa), maize (Zea mays), and wheat (Triticum aestivum), which carry considerable economic and social importance. The presence of cyclotide-like sequences within these plants suggests that the cyclotides may be derived from an ancestral gene of great antiquity. Quantitative RT-PCR was used to show that two of the discovered cyclotide-like genes from rice and barley (Hordeum vulgare) have tissue-specific expression patterns.
Abstract:
The Alzheimer's disease amyloid protein precursor (APP) gene is part of a multi-gene super-family from which sixteen homologous amyloid precursor-like proteins (APLP) and APP species homologues have been isolated and characterised. Comparison of exon structure (including the uncharacterised APL-1 gene), construction of phylogenetic trees, and analysis of the protein sequence alignment of known homologues of the APP super-family were performed to reconstruct the evolution of the family and to assess the functional significance of conserved protein sequences between homologues. This analysis supports an adhesion function for all members of the APP super-family, with specificity determined by those sequences which are not conserved between APLP lineages, and provides evidence for an increasingly complex APP super-family during evolution. The analysis also suggests that Drosophila APPL and Caenorhabditis elegans APL-1 may be a fourth APLP lineage, indicating that these proteins, while not functional homologues of human APP, are similarly likely to regulate cell adhesion. Furthermore, the beta A4 sequence is highly conserved only in APP orthologues, strongly suggesting this sequence is of significant functional importance in this lineage. (C) 2000 Elsevier Science Ltd. All rights reserved.
Abstract:
CysView is a web-based application tool that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of cysteine pairing patterns. CysView displays cysteine patterns for those records in the data with disulfide annotations. It allows the viewing of records grouped by connectivity patterns. CysView's utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has proved useful for rapid detection of irrelevant and partial records, or those with incomplete annotations. CysView can be used to support distant homology between proteins. CysView is publicly available at http://research.i2r.a-star.edu.sg/CysView/.
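As a rough illustration of grouping records by disulfide connectivity pattern (this is not CysView's actual implementation), the sketch below numbers the cysteines of a sequence in order of appearance and writes each annotated disulfide as a pair of cysteine ranks. The function names, the pattern notation, and the toy records are all assumptions made for this example.

```python
from collections import defaultdict


def connectivity_pattern(sequence, disulfides):
    """Return a pattern such as '1-4,2-5,3-6'.

    `disulfides` holds pairs of 1-based residue positions; the pattern
    expresses each bond as the ranks of the two cysteines it joins.
    """
    # Rank every cysteine by its order of appearance in the sequence.
    cys_positions = [i + 1 for i, aa in enumerate(sequence) if aa == "C"]
    rank = {pos: n + 1 for n, pos in enumerate(cys_positions)}
    pairs = []
    for a, b in disulfides:
        if a not in rank or b not in rank:
            raise ValueError(f"disulfide {a}-{b} does not join two cysteines")
        pairs.append(tuple(sorted((rank[a], rank[b]))))
    return ",".join(f"{a}-{b}" for a, b in sorted(pairs))


def group_by_pattern(records):
    """Group record names by connectivity pattern (hypothetical record layout)."""
    groups = defaultdict(list)
    for name, sequence, disulfides in records:
        groups[connectivity_pattern(sequence, disulfides)].append(name)
    return dict(groups)


# Toy records invented for illustration: (name, sequence, annotated disulfides).
records = [
    ("toy_A", "ACGCKCAACGCWC", [(2, 9), (4, 11), (6, 13)]),
    ("toy_B", "CCAGCTTCGCKKC", [(1, 8), (2, 10), (5, 13)]),
    ("toy_C", "GCACAACKC", [(2, 4), (7, 9)]),
]

for pattern, names in group_by_pattern(records).items():
    print(pattern, names)
```

Records whose annotations point at non-cysteine positions raise an error here, which loosely mirrors the use of pattern views to spot incomplete or inconsistent annotations.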
Abstract:
Membrane organization describes the orientation of a protein with respect to the membrane and can be determined by the presence, or absence, and organization within the protein sequence of two features: endoplasmic reticulum signal peptides and alpha-helical transmembrane domains. These features allow protein sequences to be classified into one of five membrane organization categories: soluble intracellular proteins, soluble secreted proteins, type I membrane proteins, type II membrane proteins, and multi-spanning membrane proteins. Generation of protein isoforms with variable membrane organizations can change a protein's subcellular localization or association with the membrane. Application of MemO, a membrane organization annotation pipeline, to the FANTOM3 Isoform Protein Sequence mouse protein set revealed that within the 8,032 transcriptional units (TUs) with multiple protein isoforms, 573 had variation in their use of signal peptides, 1,527 had variation in their use of transmembrane domains, and 615 generated protein isoforms from distinct membrane organization classes. The mechanisms underlying these transcript variations were analyzed. While TUs were identified encoding all pairwise combinations of membrane organization categories, the most common was conversion of membrane proteins to soluble proteins. Observed within our high-confidence set were 156 TUs predicted to generate both extracellular soluble and membrane proteins, and 217 TUs generating both intracellular soluble and membrane proteins. The differential use of endoplasmic reticulum signal peptides and transmembrane domains is a common occurrence within the variable protein output of TUs. The generation of protein isoforms that are targeted to multiple subcellular locations represents a major functional consequence of transcript variation within the mouse transcriptome.
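The five-category scheme can be sketched as a small decision rule over the two features. The rule below is a simplified assumption (a single transmembrane domain with a signal peptide is treated as type I, without one as type II); the abstract does not give MemO's actual rules, and the function names and toy data are hypothetical.

```python
def membrane_organization(has_signal_peptide, n_tm_domains):
    """Assign one of five membrane organization categories (simplified rule)."""
    if n_tm_domains < 0:
        raise ValueError("n_tm_domains must be non-negative")
    if n_tm_domains == 0:
        return "soluble secreted" if has_signal_peptide else "soluble intracellular"
    if n_tm_domains == 1:
        return "type I membrane" if has_signal_peptide else "type II membrane"
    return "multi-spanning membrane"


def variable_units(isoforms_by_tu):
    """Return {TU: sorted categories} for TUs whose isoforms span several categories."""
    result = {}
    for tu, isoforms in isoforms_by_tu.items():
        categories = {membrane_organization(sp, tm) for sp, tm in isoforms}
        if len(categories) > 1:
            result[tu] = sorted(categories)
    return result


# Toy transcriptional units: each isoform is (has_signal_peptide, n_tm_domains).
toy_tus = {
    "TU1": [(True, 1), (True, 0)],    # membrane isoform and secreted isoform
    "TU2": [(False, 0), (False, 0)],  # both isoforms soluble intracellular
    "TU3": [(False, 7), (False, 0)],  # multi-spanning and soluble intracellular
}

print(variable_units(toy_tus))
```

In this toy set, TU1 and TU3 are the kind of unit the abstract counts as generating isoforms from distinct membrane organization classes.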
Abstract:
Kalata B1 is a member of a new family of polypeptides, isolated from plants, which have a cystine knot structure embedded within an amide-cyclized backbone. These molecules are the largest known cyclic peptides, and thus the mechanism of their synthesis and folding is of great interest. To provide information about both these phenomena, we have synthesized kalata B1 using two distinct strategies. In the first, oxidation of the cysteine residues of a linear precursor peptide to form the correct disulfide bonds results in folding of the three-dimensional structure and preorganization of the termini in close proximity for subsequent cyclization. The second approach involved cyclization prior to oxidation. In the first method, the correctly folded peptide was produced only in the presence of partially hydrophobic solvent conditions. These conditions are presumably required to stabilize the surface-exposed hydrophobic residues. However, in the synthesis involving cyclization prior to oxidation, the cyclic reduced peptide folded to a significant degree in the absence of hydrophobic solvents and even more efficiently in the presence of hydrophobic solvents. Cyclization clearly has a major effect on the folding pathway and facilitates formation of the correctly disulfide-bonded form in aqueous solution. In addition to facilitating folding to a compact stable structure, cyclization has an important effect on biological activity as assessed by hemolytic activity.
Abstract:
Recombinant protein production in bacteria is efficient except that insoluble inclusion bodies form when some gene sequences are expressed. Such proteins must undergo renaturation, which is an inefficient process due to protein aggregation on dilution from concentrated denaturant. In this study, the protein-protein interactions of eight distinct inclusion-body proteins are quantified, in different solution conditions, by measurement of protein second virial coefficients (SVCs). Protein solubility is shown to decrease as the SVC is reduced (i.e., as protein interactions become more attractive). Plots of SVC versus denaturant concentration demonstrate two clear groupings of proteins: a more aggregative group and a group having higher SVC and better solubility. A correlation of the measured SVC with protein molecular weight and hydropathicity, that is able to predict which group each of the eight proteins falls into, is presented. The inclusion of additives known to inhibit aggregation during renaturation improves solubility and increases the SVC of both protein groups. Furthermore, an estimate of maximum refolding yield (or solubility) using high-performance liquid chromatography was obtained for each protein tested, under different environmental conditions, enabling a relationship between yield and SVC to be demonstrated. Combined, the results enable an approximate estimation of the maximum refolding yield that is attainable for each of the eight proteins examined, under a selected chemical environment. Although the correlations must be tested with a far larger set of protein sequences, this work represents a significant move beyond empirical approaches for optimizing renaturation conditions. The approach moves toward the ideal of predicting maximum refolding yield using simple bioinformatic metrics that can be estimated from the gene sequence. 
Such a capability could potentially screen, in silico, those sequences suitable for expression in bacteria from those that must be expressed in more complex hosts. (C) 2004 Wiley Periodicals, Inc.
Abstract:
The spectral sensitivities of avian retinal photoreceptors are examined with respect to microspectrophotometric measurements of single cells, spectrophotometric measurements of extracted or in vitro regenerated visual pigments, and molecular genetic analyses of visual pigment opsin protein sequences. Bird species from diverse orders are compared in relation to their evolution, their habitats and the multiplicity of visual tasks they must perform. Birds have five different types of visual pigment and seven different types of photoreceptor: rods, double (uneven twin) cones and four types of single cone. The spectral locations of the wavelengths of maximum absorbance (λmax) of the different visual pigments, and the spectral transmittance characteristics of the intraocular spectral filters (cone oil droplets) that also determine photoreceptor spectral sensitivity, vary according to both habitat and phylogenetic relatedness. The primary influence on avian retinal design appears to be the range of wavelengths available for vision, regardless of whether that range is determined by the spectral distribution of the natural illumination or the spectral transmittance of the ocular media (cornea, aqueous humour, lens, vitreous humour). Nevertheless, other variations in spectral sensitivity exist that reflect the variability and complexity of avian visual ecology. (C) 2001 Elsevier Science Ltd. All rights reserved.
Abstract:
The four known tropomyosin genes have highly conserved DNA and amino acid sequences, and at least 18 isoforms are generated by alternative RNA splicing in muscle and non-muscle cells. No rabbit tropomyosin nucleotide sequences are known, although protein sequences for alpha- and beta-tropomyosin expressed by rabbit skeletal muscle have been described. Subtractive hybridisation was used to select for genes differentially expressed in rabbit aortic smooth muscle cells (SMC), during the change in cell phenotype in primary culture that is characterised by a loss of cytoskeletal filaments and contractile proteins. This led to the cloning of a tropomyosin gene predominantly expressed in rabbit SMC during this change. The full-length cDNA clone, designated rabbit TM-beta, contains an open reading frame of 284 amino acids, a 5' untranslated region (UTR) of 117 base pairs and a 3' UTR of 79 base pairs. It is closely related to the beta-gene isoforms in other species, with the highest homology in DNA and protein sequences to the human fibroblast isoform TM-1 (91.7% identity in 1035 bp and 93.3% identity in the entire 284 amino acid sequence of the protein). It differs from rabbit skeletal muscle beta-tropomyosin (81.7% homology at the protein level) mainly in two regions at amino acids 189-213 and 258-283, suggesting alternative splicing of exons 6a for 6b and 9d for 9a. Since this TM-beta gene was the only gene strongly enough expressed in SMC changing phenotype to be observed by the subtractive hybridisation screen, it likely plays a significant role in this process. (C) 2002 Published by Elsevier Science Ltd.
Abstract:
A general overview of the protein sequence set for the mouse transcriptome produced during the FANTOM2 sequencing project is presented here. We applied different algorithms to characterize protein sequences derived from a nonredundant representative protein set (RPS) and a variant protein set (VPS) of the mouse transcriptome. The functional characterization and assignment of Gene Ontology terms was done by analysis of the proteome using InterPro. The Superfamily database analyses gave a detailed structural classification according to SCOP and provide additional evidence for the functional characterization of the proteome data. The MDS database analysis revealed new domains which are not represented in existing protein domain databases. Thus the transcriptome gives us a unique source of data for the detection of new functional groups. The data obtained for the RPS and VPS sets facilitated the comparison of different patterns of protein expression. A comparison of other existing mouse and human protein sequence sets (e.g., the International Protein Index) demonstrates the common patterns in mammalian proteomes. The analysis of the membrane organization within the transcriptome of multiple eukaryotes provides valuable statistics about the distribution of secretory and transmembrane proteins.
Abstract:
We have developed a computational strategy to identify the set of soluble proteins secreted into the extracellular environment of a cell. Within the protein sequences predominantly derived from the RIKEN representative transcript and protein set, we identified 2033 unique soluble proteins that are potentially secreted from the cell. These proteins contain a signal peptide required for entry into the secretory pathway and lack any transmembrane domains or intracellular localization signals. This class of proteins, which we have termed the mouse secretome, included >500 novel proteins and 92 proteins
Abstract:
One of the great challenges in biology is to understand how particular complex morphological and physiological characters originated in specific evolutionary lineages. In this article, we address the origin of the vertebrate hypothalamic-pituitary-peripheral gland (H-P-PG) endocrine system, a complex network of specialized tissues, ligands and receptors. Analysis of metazoan nucleotide and protein sequences reveals a patchwork pattern of H-P-PG gene conservation between vertebrates and closely related invertebrates (ascidians). This is consistent with a model of how the vertebrate H-P-PG endocrine system could have emerged in relatively few steps by gene family expansion and by regulatory and structural modifications to genes that are present in a chordate ancestor. Some of these changes might have resulted in new connections between metabolic or signaling pathways, such as the bridging of 'synthesis islands' to form an efficient system for steroid hormone synthesis.
Abstract:
A new method has been developed for prediction of transmembrane helices using support vector machines. Different coding schemes of protein sequences were explored, and their performances were assessed by cross-validation tests. The best-performing method can predict transmembrane helices with a sensitivity of 93.4% and a precision of 92.0%. For each predicted transmembrane segment, a score is given to show the strength of the transmembrane signal and the prediction reliability. In particular, this method can distinguish transmembrane proteins from soluble proteins with an accuracy of ~99%. This method can be used to complement current transmembrane helix prediction methods and can be used for consensus analysis of entire proteomes. The predictor is located at http://genet.imb.uq.edu.au/predictors/SVMtm. (C) 2004 Wiley Periodicals, Inc.
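To make the idea of a sequence coding scheme concrete, the sketch below shows one common choice: a one-hot encoding of a sliding window centred on each residue, which yields one fixed-length feature vector per position for a classifier such as a support vector machine. This is an assumed, generic scheme, not necessarily one of those explored by the authors; the names and the default window size are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """20-element indicator vector; padding or unknown symbols map to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence, window=5):
    """Return one feature vector of length 20 * window per residue."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    # Pad both ends so that terminal residues also get a full window.
    padded = "-" * half + sequence + "-" * half
    features = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + window]:
            vector.extend(one_hot(residue))
        features.append(vector)
    return features


feats = window_features("ACD", window=3)
print(len(feats), len(feats[0]))
```

Each vector could then be paired with a per-residue label (helix or non-helix) to train and cross-validate a classifier.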
Abstract:
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed-forward networks, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
Abstract:
Background: The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance, raising the CC to 0.57 and lowering the RMSE to 0.79. In addition, combining the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and RMSE of 0.78, a performance at least comparable to that of other existing methods. Conclusion: The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
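The two evaluation measures quoted above can be computed as in the following sketch, which uses the standard textbook definitions of the Pearson correlation coefficient and root mean square error. Whether the authors normalized the RWCO values before computing them is not stated in the abstract, so the toy numbers here are purely illustrative and the function names are assumptions.

```python
import math


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    n = len(predicted)
    if n != len(observed) or n == 0:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Toy per-residue values, invented for illustration only.
predicted = [0.2, 0.5, 0.9, 1.4]
observed = [0.1, 0.7, 0.8, 1.6]
print(pearson_cc(predicted, observed), rmse(predicted, observed))
```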
Abstract:
Background: Determination of the subcellular location of a protein is essential to understanding its biochemical function. This information can provide insight into the function of hypothetical or novel proteins. These data are difficult to obtain experimentally but have become especially important since many whole genome sequencing projects have been finished and many resulting protein sequences are still lacking detailed functional information. In order to address this paucity of data, many computational prediction methods have been developed. However, these methods have varying levels of accuracy and perform differently based on the sequences that are presented to the underlying algorithm. It is therefore useful to compare these methods and monitor their performance. Results: In order to perform a comprehensive survey of prediction methods, we selected only methods that accepted large batches of protein sequences, were publicly available, and were able to predict localization to at least nine of the major subcellular locations (nucleus, cytosol, mitochondrion, extracellular region, plasma membrane, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, and lysosome). The selected methods were CELLO, MultiLoc, Proteome Analyst, pTarget and WoLF PSORT. These methods were evaluated using 3763 mouse proteins from SwissProt that represent the source of the training sets used in development of the individual methods. In addition, an independent evaluation set of 2145 mouse proteins from LOCATE with a bias towards the subcellular localization underrepresented in SwissProt was used. The sensitivity and specificity were calculated for each method and compared to a theoretical value based on what might be observed by random chance. Conclusion: No individual method had a sufficient level of sensitivity across both evaluation sets that would enable reliable application to hypothetical proteins. 
All methods showed lower performance on the LOCATE dataset and variable performance on individual subcellular localizations was observed. Proteins localized to the secretory pathway were the most difficult to predict, while nuclear and extracellular proteins were predicted with the highest sensitivity.
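A per-location evaluation of this kind can be sketched as follows. The abstract does not define its measures, so the code assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), and a single true location per protein; the function name and the toy labels are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, location):
    """Return (sensitivity, specificity) for one subcellular location."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == location:
            if pred == location:
                tp += 1
            else:
                fn += 1
        elif pred == location:
            fp += 1
        else:
            tn += 1
    # A measure is undefined (NaN) when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy annotations and predictions, invented for illustration.
truth = ["nucleus", "nucleus", "cytosol", "cytosol", "ER"]
preds = ["nucleus", "cytosol", "cytosol", "nucleus", "ER"]

for loc in sorted(set(truth)):
    print(loc, sensitivity_specificity(truth, preds, loc))
```

Running the same function over each evaluation set separately would allow the kind of side-by-side comparison between data sets that the abstract describes.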