986 results for Databases, Protein
Abstract:
A complete cDNA encoding a novel hybrid Pro-rich protein (HyPRP) was identified by differentially screening 3 × 10^4 recombinant plaques of a Cuscuta reflexa cytokinin-induced haustorial cDNA library constructed in lambda gt10. The nucleotide (nt) sequence consists of: (i) a 424-bp 5' non-coding region having five start codons (ATGs) and three upstream open reading frames (uORFs); (ii) an ORF of 987 bp with coding potential for a 329-amino-acid (aa) protein of M(r) 35,203 with a hydrophobic N-terminal region including a stretch of nine consecutive Phe, followed by a Pro-rich sequence and a Cys-rich hydrophobic C terminus; and (iii) a 178-bp 3'-UTR (untranslated region). Comparison of the predicted aa sequence with the NBRF and SWISSPROT databases and with a recent report of an embryo-specific protein of maize [Jose-Estanyol et al., Plant Cell 4 (1992) 413-423] showed it to be similar to the class of HyPRPs encoded by genes preferentially expressed in young tomato fruits, maize embryos and in vitro-cultured carrot embryos. Northern analysis revealed an approx. 1.8-kb mRNA of this gene expressed in the subapical region of the C. reflexa vine, the region that exhibited maximum sensitivity to cytokinin in haustorial induction.
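The sequence layout described above (a 5' leader containing ATGs and uORFs, followed by the main ORF) can be illustrated with a minimal sketch. The function name `find_uorfs` and the toy leader sequence are hypothetical and are not taken from the paper; the sketch only shows one simple way to list ATG-initiated reading frames that terminate inside a leader, plus the arithmetic linking the 987-bp ORF to 329 codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(leader):
    """List (start, end, codons) for every ATG whose reading frame
    reaches an in-frame stop codon inside the leader sequence.
    `codons` counts the ATG and following codons, excluding the stop."""
    leader = leader.upper()
    uorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the ATG until a stop is found.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3, (pos - start) // 3))
                break
    return uorfs

# Hypothetical toy leader, NOT the C. reflexa sequence.
toy_leader = "GGATGAAATAGCCATGCCCTGAGG"
toy_uorfs = find_uorfs(toy_leader)

# Arithmetic from the abstract: 987 bp / 3 = 329 codons,
# and the three reported segments sum to 1589 bp.
orf_codons = 987 // 3
cdna_length = 424 + 987 + 178
```

Note that this sketch reports one entry per ATG, so several ATGs sharing a stop codon would be listed separately; how the authors counted five ATGs and three uORFs is not stated in the abstract.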
Abstract:
Molecular understanding of disease processes can be accelerated if all interactions between the host and pathogen are known. The unavailability of experimental methods for large-scale detection of interactions across host and pathogen organisms hinders this process. Here we apply a simple method to predict protein-protein interactions across host and pathogen organisms. We use homology detection approaches against the protein-protein interaction databases DIP and iPfam to predict interacting proteins in a host-pathogen pair. In the present work, we first applied this approach to test cases involving the pairs phage T4 - Escherichia coli and phage lambda - E. coli, and show that previously known interactions could be recognized using our approach. We further apply this approach to predict interactions between human and three pathogens: E. coli, Salmonella enterica typhimurium and Yersinia pestis. We identified several novel interactions involving host or pathogen proteins that are likely to be highly relevant to the disease process. Serendipitously, many interactions involve hypothetical proteins of as yet unknown function. Hypothetical proteins are predicted from computational analysis of genome sequences, with no laboratory analysis of their functions yet available. The predicted interactions involving such proteins could provide hints to their functions. (C) 2011 Elsevier B.V. All rights reserved.
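The homology-transfer idea in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the homolog mappings are assumed to come from some prior homology search, and all identifiers (`T1`, `hostA`, `pathX`, the function name) are hypothetical.

```python
from itertools import product

def predict_interactions(template_pairs, host_homologs, pathogen_homologs):
    """Transfer known interactions to a host-pathogen pair.

    template_pairs    : iterable of (protein_a, protein_b) known to interact
                        (for example, entries from an interaction database)
    host_homologs     : dict template protein -> list of host homologs
    pathogen_homologs : dict template protein -> list of pathogen homologs
    Returns a set of predicted (host_protein, pathogen_protein) pairs.
    """
    predicted = set()
    for a, b in template_pairs:
        # Try both orientations: either partner may map to the host.
        for x, y in ((a, b), (b, a)):
            hosts = host_homologs.get(x, ())
            pathogens = pathogen_homologs.get(y, ())
            predicted.update(product(hosts, pathogens))
    return predicted

# Hypothetical toy data.
templates = [("T1", "T2"), ("T3", "T4")]
host_map = {"T1": ["hostA"], "T4": ["hostB"]}
pathogen_map = {"T2": ["pathX", "pathY"], "T3": ["pathZ"]}
pairs = predict_interactions(templates, host_map, pathogen_map)
```

A template interaction contributes a prediction only when one partner has a host homolog and the other a pathogen homolog, which is why unmapped templates are silently skipped.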
Abstract:
Repeats are two or more contiguous segments of amino acid residues that are believed to have arisen as a result of intragenic duplication, recombination and mutation events. These repeats can be utilized for protein structure prediction and can provide insights into protein evolution and phylogenetic relationships. Therefore, to aid structural biologists and phylogeneticists in their research, a computing resource (a web server and a database), Repeats in Protein Sequences (RPS), has been created. Using RPS, users can obtain useful information regarding identical, similar and distant repeats (of varying lengths) in protein sequences. In addition, users can check the frequency of occurrence of the repeats in sequence databases such as the Genome Database, PIR and SWISS-PROT and among the protein sequences available in the Protein Data Bank archive. Furthermore, users can view the three-dimensional structure of the repeats using the Java visualization plug-in Jmol. The proposed computing resource can be accessed over the World Wide Web at http://bioserver1.physics.iisc.ernet.in/rps/.
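As a rough illustration of the simplest case mentioned above (identical repeats of a fixed length), the following sketch indexes every segment of a sequence by position. It is not the RPS implementation; the function name and the toy sequence are hypothetical, and similar or distant repeats would need a scoring scheme that is not shown here.

```python
from collections import defaultdict

def identical_repeats(sequence, length):
    """Map each segment of the given length that occurs more than once
    to the list of its 0-based start positions (overlaps are allowed)."""
    positions = defaultdict(list)
    for i in range(len(sequence) - length + 1):
        positions[sequence[i:i + length]].append(i)
    return {seg: pos for seg, pos in positions.items() if len(pos) > 1}

# Hypothetical toy sequence.
repeats = identical_repeats("MKTAGPAGPLLKTAG", 3)
```

Running the same function over a range of lengths gives repeats "of varying lengths", at the cost of reporting shorter repeats that are contained in longer ones.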
Abstract:
Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another additionally containing designed sequences. Searches of the augmented database showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our statistically well-supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold. (C) 2013 Elsevier Ltd. All rights reserved.
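Roulette-wheel selection, mentioned above, picks an item with probability proportional to its weight. The sketch below applies it column by column to a small profile of residue weights. The abstract does not say how the wheel was applied to the aligned profiles, so the per-column scheme, the function names and the toy profile are all assumptions made for illustration.

```python
import random

def spin_wheel(weights, rng):
    """Pick one key with probability proportional to its weight."""
    total = sum(weights.values())
    r = rng.random() * total          # point on the wheel in [0, total)
    cumulative = 0.0
    for residue, weight in weights.items():
        cumulative += weight
        if r < cumulative:
            return residue
    return residue                    # guard against floating-point round-off

def design_sequence(profile, rng):
    """Sample one residue per profile column."""
    return "".join(spin_wheel(column, rng) for column in profile)

# Hypothetical three-column profile (residue -> weight).
toy_profile = [
    {"A": 0.7, "G": 0.3},
    {"L": 1.0},
    {"K": 0.5, "R": 0.4, "H": 0.1},
]
rng = random.Random(0)
designed = [design_sequence(toy_profile, rng) for _ in range(5)]
```

Because high-weight residues are favoured but low-weight ones still appear occasionally, repeated sampling yields a family of varied sequences rather than a single consensus.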
Abstract:
Most biological processes are governed through specific protein-ligand interactions. Discerning the different components that contribute toward a favorable protein-ligand interaction could contribute significantly toward better understanding protein function, rationalizing drug design and obtaining design principles for protein engineering. The Protein Data Bank (PDB) currently hosts the structures of ~68 000 protein-ligand complexes. Although several databases exist that classify proteins according to sequence and structure, only a handful of them annotate and classify protein-ligand interactions and provide information on different attributes of molecular recognition. In this study, an exhaustive comparison of all the biologically relevant ligand-binding sites (84 846 sites) has been conducted using PocketMatch, a rapid, parallel, in-house algorithm. PocketMatch quantifies the similarity between binding sites based on structural descriptors and residue attributes. A similarity network was constructed using binding sites whose PocketMatch scores exceeded a high similarity threshold (0.80). The binding site similarity network was clustered into discrete sets of similar sites using the Markov clustering (MCL) algorithm. Furthermore, various computational tools have been used to study different attributes of interactions within the individual clusters. The attributes can be roughly divided into (i) binding site characteristics including pocket shape, nature of residues and interaction profiles with different kinds of atomic probes, (ii) atomic contacts consisting of various types of polar, hydrophobic and aromatic contacts along with binding site water molecules that could play crucial roles in protein-ligand interactions and (iii) binding energetics involved in interactions derived from scoring functions developed for docking. For each ligand-binding site in each protein in the PDB, site similarity information, the clusters it belongs to and a description of site attributes are provided as a relational database, Protein-Ligand Interaction Clusters (PLIC).
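The network-construction step described above (keep only site pairs whose score exceeds 0.80, then group the sites) can be sketched as follows. Note the substitution: the study used Markov clustering (MCL), whereas this sketch groups sites by plain connected components, a much cruder stand-in chosen to keep the example short. The site names and scores are hypothetical and are not PocketMatch output.

```python
def cluster_sites(scores, threshold=0.80):
    """Build a similarity network from pairwise scores and return its
    connected components as a list of sets.

    scores : dict mapping (site_a, site_b) -> similarity score
    An edge is kept only when the score exceeds the threshold.
    """
    adjacency = {}
    for (a, b), score in scores.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if score > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, visited = [], set()
    for node in adjacency:
        if node in visited:
            continue
        component, stack = set(), [node]
        while stack:                      # depth-first traversal
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        visited |= component
        clusters.append(component)
    return clusters

# Hypothetical pairwise scores.
toy_scores = {
    ("s1", "s2"): 0.91,
    ("s2", "s3"): 0.85,
    ("s3", "s4"): 0.42,
    ("s4", "s5"): 0.88,
}
clusters = cluster_sites(toy_scores)
```

Connected components merge any two sites joined by a chain of edges, whereas MCL can split loosely connected groups, so real cluster boundaries would generally differ from this sketch.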
Abstract:
As the volume of data relating to proteins increases, researchers rely more and more on the analysis of published data, thus increasing the importance of good access to these data that vary from the supplemental material of individual articles, all the way to major reference databases with professional staff and long-term funding. Specialist protein resources fill an important middle ground, providing interactive web interfaces to their databases for a focused topic or family of proteins, using specialized approaches that are not feasible in the major reference databases. Many are labors of love, run by a single lab with little or no dedicated funding and there are many challenges to building and maintaining them. This perspective arose from a meeting of several specialist protein resources and major reference databases held at the Wellcome Trust Genome Campus (Cambridge, UK) on August 11 and 12, 2014. During this meeting some common key challenges involved in creating and maintaining such resources were discussed, along with various approaches to address them. In laying out these challenges, we aim to inform users about how these issues impact our resources and illustrate ways in which our working together could enhance their accuracy, currency, and overall value. Proteins 2015; 83:1005-1013. (c) 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Abstract:
During 11-12 August 2014, a Protein Bioinformatics and Community Resources Retreat was held at the Wellcome Trust Genome Campus in Hinxton, UK. This meeting brought together the principal investigators of several specialized protein resources (such as CAZy, TCDB and MEROPS) as well as those from protein databases from the large Bioinformatics centres (including UniProt and RefSeq). The retreat was divided into five sessions: (1) key challenges, (2) the databases represented, (3) best practices for maintenance and curation, (4) information flow to and from large data centers and (5) communication and funding. An important outcome of this meeting was the creation of a Specialist Protein Resource Network that we believe will improve coordination of the activities of its member resources. We invite further protein database resources to join the network and continue the dialogue.
Abstract:
Amino acid substitution matrices play an essential role in protein sequence alignment, a fundamental task in bioinformatics. Most widely used matrices, such as PAM matrices derived from homologous sequences and BLOSUM matrices derived from aligned segments of PROSITE, did not integrate conformation information in their construction. There are a few structure-based matrices, which are derived from limited data of structure alignment. Using databases PDB_SELECT and DSSP, we create a database of sequence-conformation blocks which explicitly represent sequence-structure relationship. Members in a block are identical in conformation and are highly similar in sequence. From this block database, we derive a conformation-specific amino acid substitution matrix CBSM60. The matrix shows an improved performance in conformational segment search and homolog detection.
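To make the idea of deriving a substitution matrix from blocks concrete, the sketch below computes BLOSUM-style log-odds scores from aligned blocks. The abstract does not give the construction details of CBSM60, so this is the generic log-odds recipe under simplifying assumptions (no sequence clustering or weighting, no rounding), with hypothetical toy blocks.

```python
import math
from collections import Counter
from itertools import combinations

def log_odds_scores(blocks):
    """Return {(a, b): score} with a <= b, where
    score = 2 * log2(observed pair frequency / expected pair frequency)."""
    pair_counts = Counter()
    for block in blocks:
        for column in zip(*block):               # one alignment column
            for a, b in combinations(column, 2):
                pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue, derived from the pair table.
    background = Counter()
    for (a, b), freq in observed.items():
        if a == b:
            background[a] += freq
        else:
            background[a] += freq / 2
            background[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        expected = background[a] * background[b] * (1 if a == b else 2)
        scores[(a, b)] = 2 * math.log2(freq / expected)
    return scores

# Hypothetical toy blocks: each block is a list of equal-length segments.
toy_blocks = [["AL", "AL", "AI"], ["LA", "LA"]]
scores = log_odds_scores(toy_blocks)
```

A positive score means the pair is seen in aligned columns more often than chance would predict; pairs never observed are simply absent from the result, so a real matrix would need pseudocounts or far more data.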
Abstract:
BACKGROUND: In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules which protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply transitive closure to the predictions. RESULTS: We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with a computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing. CONCLUSION: A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased positive predictive value), and that this increase is uniformly consistent across GO-term depth. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction, one that exploits the ontological structure of protein annotation databases in a principled manner, can offer substantial advantages over the successive application of 'flat' network-based methods.
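The 'true-path' consistency and the transitive-closure post-processing mentioned in this abstract can be sketched as below. This illustrates only the baseline post-processing step, not the authors' probabilistic classifier, and the term names and hierarchy are hypothetical rather than real GO identifiers.

```python
def ancestors(term, parents):
    """Return every ancestor of a term in a hierarchy given as
    a dict mapping term -> list of parent terms."""
    found, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found

def enforce_true_path(predicted, parents):
    """Transitive closure: a predicted term implies all of its ancestors."""
    closed = set(predicted)
    for term in predicted:
        closed |= ancestors(term, parents)
    return closed

def is_true_path_consistent(predicted, parents):
    """True when every predicted term has all its ancestors predicted."""
    return all(ancestors(t, parents) <= set(predicted) for t in predicted)

# Hypothetical toy hierarchy (child -> parents).
toy_parents = {
    "leaf_a": ["mid"],
    "leaf_b": ["mid"],
    "mid": ["root"],
    "other": ["root"],
}
raw = {"leaf_a"}
closed = enforce_true_path(raw, toy_parents)
```

Term-by-term ("flat") classifiers can output a set like `raw`, which violates the rule; the closure repairs it after the fact, whereas the abstract's framework is described as producing consistent predictions directly.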
Abstract:
Many sequelae associated with endotoxaemic-induced shock result from excessive production of the cytokine mediators, tumour necrosis factor alpha (TNF-alpha), interleukin 1 (IL-1) and IL-6 from lipopolysaccharide (LPS)-activated monocytes. Protein C (PC)/activated protein C (APC) has potent cytokine-modifying properties and is protective in animal models and human clinical trials of sepsis. The precise mechanism by which this anti-inflammatory response is achieved remains unknown; however, the recently described endothelial protein C receptor (EPCR) appears to be essential for this function. The pivotal role that monocytes play in the pathophysiology of septic shock led us to investigate the possible expression of a protein C receptor on the monocyte membrane. We used similarity algorithms to screen human sequence databases for paralogues of the EPCR but found none. However, using reverse transcription-polymerase chain reaction (RT-PCR), we detected an mRNA transcribed in primary human monocytes and THP1 cells that was identical to human EPCR mRNA. We also used immunocytochemical analysis to demonstrate the expression of a protein C receptor on the surface of monocytes encoded by the same gene as EPCR. These results confirm a new member of the protein C pathway involving primary monocytes. Further characterization will be necessary to compare and contrast its biological properties with those of EPCR.
Abstract:
Most cellular functions, including gene expression, cell growth and proliferation, metabolism, morphology, motility, intercellular communication and apoptosis, are regulated by protein-protein interactions (PPIs). The cell responds to a variety of stimuli; protein expression is therefore a dynamic process, and the complexes formed are assembled transiently, changing according to their functional cycle. In addition, many proteins are expressed in a cell type-dependent manner. At any given moment a cell may contain hundreds of thousands of binary PPIs, and finding the interaction partners of a protein is a means of inferring its function. Alterations in PPI networks can also provide information about disease mechanisms. The most frequently used binary identification method is the yeast two-hybrid system, adapted for large-scale screening. This methodology was used here to identify the isoform-specific interactomes of Protein Phosphatase 1 (PP1) in human brain. PP1 is a Ser/Thr protein phosphatase involved in a wide variety of cellular pathways and events. It is a conserved protein encoded by three genes, which give rise to the α, β and γ isoforms, the last of which yields γ1 and γ2 by alternative splicing. The different PP1 isoforms are regulated by their interaction partners, the PP1-interacting proteins (PIPs). The modular nature of PP1 complexes, together with their combinatorial association, generates a large repertoire of regulatory complexes and roles in cellular signalling circuits. The isoform-specific PP1 interactomes in brain are described here, with a total of 263 interactions identified and integrated with data collected from several PPI databases. In addition, two PIPs were selected for more detailed characterization of the interaction: Taperin and Synphilin-1A. Taperin is a still poorly described protein, recently discovered to be a PIP. Its interaction with the different PP1 isoforms and its cellular localization were analysed. Taperin was found to be cleaved, to be present in the cytoplasm, membrane and nucleus, and to increase PP1 levels in HeLa cells. At the membrane it co-localizes with PP1 and actin, and a form of Taperin mutated in the PP1-binding motif is enriched in the nucleus, together with actin. Moreover, Taperin was found to be expressed in testis and to localize to the acrosomal region of the sperm head, a structure where PP1 and actin are also present. Synphilin-1A, an isoform of Synphilin-1, is an aggregation-prone and toxic protein involved in Parkinson's disease. Synphilin-1A was shown to bind the PP1 isoforms by yeast co-transformation, and mutation of its PP1-binding motif significantly decreased the interaction in an overlay assay. When overexpressed in Cos-7 cells, Synphilin-1A formed inclusion bodies in which PP1 was present; however, the mutated form of Synphilin-1A was also able to aggregate, indicating that inclusion formation did not depend on PP1 binding. This work provides a new perspective on the PP1 interactomes, including the identification of dozens of isoform-specific binding partners, and emphasizes the importance of PIPs not only for understanding the cellular functions of PP1 but also as targets for therapeutic intervention.
Abstract:
Phenylketonuria is an inborn error of metabolism, involving, in most cases, a deficient activity of phenylalanine hydroxylase. Neonatal diagnosis and a prompt special diet (low phenylalanine and natural-protein restricted diets) are essential to the treatment. The lack of data concerning phenylalanine contents of processed foodstuffs is an additional limitation for an already very restrictive diet. Our goals were to quantify protein (Kjeldahl method) and amino acid (18) content (HPLC/fluorescence) in 16 dishes specifically conceived for phenylketonuric patients, and compare the most relevant results with those of several international food composition databases. As might be expected, all the meals contained low protein levels (0.67–3.15 g/100 g) with the highest ones occurring in boiled rice and potatoes. These foods also contained the highest amounts of phenylalanine (158.51 and 62.65 mg/100 g, respectively). In contrast to the other amino acids, it was possible to predict phenylalanine content based on protein alone. Slight deviations were observed when comparing results with the different food composition databases.
Abstract:
Neural tube defects (NTDs) are developmental anomalies in which the neural tube remains open (1-2/1000 births). To prevent this disease, greater knowledge of the underlying molecular processes is needed. The aetiology of NTDs is complex and involves genetic and environmental factors. Folic acid supplementation is known to reduce the risk of developing an NTD by 50-70%, and this reduction varies with the timing of the start of supplementation and with demographic origin. The genes involved in NTDs are largely unknown. Genetic studies of NTDs in humans have focused on genes of the folate metabolic pathway, because of its protective role in NTDs, and on candidate genes inferred from mouse models. The latter have shown a strong association between the non-canonical Wnt/planar cell polarity (PCP) pathway and NTDs. The gene Protein Tyrosine Kinase 7 is a member of this pathway that causes the severe NTD craniorachischisis in mutant mice. Ptk7 interacts genetically with Vangl2 (another PCP pathway gene), with double heterozygotes showing spina bifida. These data make PTK7 an excellent candidate for NTDs in humans. We re-sequenced the coding region and intron-exon junctions of this gene in a cohort of 473 patients affected by several types of NTD. We identified 6 rare missense mutations (allele frequency <1%) present in 1.1% of our cohort, 3 of which are absent from public databases. One variant, p.Gly348Ser, acted as a hypermorphic allele when overexpressed in the zebrafish model. Our results implicate mutation of PTK7 as a risk factor for NTDs and support the idea of a pathogenic role for PCP signalling in these malformations.
Abstract:
Protein–ligand binding site prediction methods aim to predict, from amino acid sequence, protein–ligand interactions, putative ligands, and ligand binding site residues using either sequence information, structural information, or a combination of both. In silico characterization of protein–ligand interactions has become extremely important to help determine a protein’s functionality, as in vivo-based functional elucidation is unable to keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analysis, such as drug discovery. Thus, in silico prediction of protein–ligand interactions must be utilized to aid in functional elucidation. Here, we briefly discuss protein function prediction, prediction of protein–ligand interactions, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss, in detail, our cutting-edge web-server method, FunFOLD, for the structurally informed prediction of protein–ligand interactions. Furthermore, we provide a step-by-step guide on using the FunFOLD web server and the FunFOLD3 downloadable application, along with some real-world examples where the FunFOLD methods have been used to aid functional elucidation.
Abstract:
Homology-driven proteomics is a major tool to characterize proteomes of organisms with unsequenced genomes. This paper addresses practical aspects of automated homology-driven protein identifications by LC-MS/MS on a hybrid LTQ Orbitrap mass spectrometer. All essential software elements supporting the presented pipeline are either hosted at a publicly accessible web server or are available for free download. (C) 2008 Elsevier B.V. All rights reserved.