929 results for protein sequence classification
Abstract:
Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids of the protein sequence. Similarities in hydropathy distributions are obvious for homologous proteins within a protein family, and they have also been observed for proteins with related structures even when sequence similarity was undetectable. Here we present a novel method that uses the hydropathy distribution in proteins to identify (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, defined by vectors of specifically designed features derived from the hydropathy of the individual amino acids. Projecting this space onto its principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by analyzing hydropathy distributions, without the need for sequence alignment. © 2005 Wiley-Liss, Inc.
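The abstract does not specify the hydropathy scale or the exact features. As a minimal sketch under stated assumptions, the Python below uses the Kyte-Doolittle scale, a hypothetical feature vector made of mean hydropathy over equal-length sequence segments, and power iteration to find the first principal axis of the feature covariance matrix. The names `hydropathy_features` and `first_principal_axis` are illustrative, not the authors' code.

```python
import math

# Kyte-Doolittle hydropathy scale (assumption: the abstract does not name a scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_features(seq, n_segments=5):
    """Hypothetical features: mean hydropathy in n equal-length sequence segments."""
    values = [KD[aa] for aa in seq.upper() if aa in KD]
    feats = []
    for k in range(n_segments):
        lo = k * len(values) // n_segments
        hi = (k + 1) * len(values) // n_segments
        chunk = values[lo:hi] or [0.0]  # guard against empty segments in short sequences
        feats.append(sum(chunk) / len(chunk))
    return feats

def first_principal_axis(vectors, iters=200):
    """Leading eigenvector of the sample covariance matrix, found by power iteration.
    Returns the axis and each vector's (centered) projection onto it."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - means[j] for j in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    axis = [1.0 / (j + 1) for j in range(d)]  # non-symmetric start vector
    for _ in range(iters):
        nxt = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
        axis = [x / norm for x in nxt]
    scores = [sum(c[j] * axis[j] for j in range(d)) for c in centered]
    return axis, scores

seqs = ["LLLLLKKKKK", "IIIIVDDDDE", "KKKKKLLLLL", "DDDEEIIIIV"]
axis, scores = first_principal_axis([hydropathy_features(s, 2) for s in seqs])
```

In this toy set, sequences with a hydrophobic N-terminal half fall on one side of the first axis and those with a hydrophobic C-terminal half on the other. A real analysis would use more segments, several components and a library PCA routine.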
Abstract:
Membrane organization describes the orientation of a protein with respect to the membrane and can be determined from the presence or absence, and the organization within the protein sequence, of two features: endoplasmic reticulum signal peptides and alpha-helical transmembrane domains. These features allow protein sequences to be classified into one of five membrane organization categories: soluble intracellular proteins, soluble secreted proteins, type I membrane proteins, type II membrane proteins, and multi-spanning membrane proteins. Generating protein isoforms with different membrane organizations can change a protein's subcellular localization or its association with the membrane. Applying MemO, a membrane organization annotation pipeline, to the FANTOM3 Isoform Protein Sequence mouse protein set revealed that, of the 8,032 transcriptional units (TUs) with multiple protein isoforms, 573 varied in their use of signal peptides, 1,527 varied in their use of transmembrane domains, and 615 generated protein isoforms from distinct membrane organization classes. The mechanisms underlying these transcript variations were analyzed. Although TUs were identified encoding all pairwise combinations of membrane organization categories, the most common was conversion of membrane proteins to soluble proteins. Within our high-confidence set, 156 TUs were predicted to generate both extracellular soluble and membrane proteins, and 217 TUs both intracellular soluble and membrane proteins. The differential use of endoplasmic reticulum signal peptides and transmembrane domains is common within the variable protein output of TUs. The generation of protein isoforms targeted to multiple subcellular locations represents a major functional consequence of transcript variation within the mouse transcriptome.
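One way to read the five categories as decision rules is sketched below. The rules (two or more transmembrane domains means multi-spanning; a single domain with a signal peptide means type I, without one type II) are an assumption for illustration, not MemO's documented logic, and signal peptide and transmembrane predictions are taken as given inputs. The names `classify` and `tu_has_variable_organization` are hypothetical.

```python
from enum import Enum

class MemOrg(Enum):
    SOLUBLE_INTRACELLULAR = "soluble intracellular"
    SOLUBLE_SECRETED = "soluble secreted"
    TYPE_I = "type I membrane"
    TYPE_II = "type II membrane"
    MULTI_SPANNING = "multi-spanning membrane"

def classify(has_signal_peptide: bool, n_tm: int) -> MemOrg:
    """Assumed rules mapping (signal peptide?, number of TM domains) to a category."""
    if n_tm >= 2:
        return MemOrg.MULTI_SPANNING
    if n_tm == 1:
        # With a cleaved signal peptide the single TM domain anchors a type I protein;
        # without one, the TM domain itself acts as a signal anchor (type II).
        return MemOrg.TYPE_I if has_signal_peptide else MemOrg.TYPE_II
    return MemOrg.SOLUBLE_SECRETED if has_signal_peptide else MemOrg.SOLUBLE_INTRACELLULAR

def tu_has_variable_organization(isoforms):
    """isoforms: (has_signal_peptide, n_tm) predictions for one transcriptional unit.
    True if the isoforms fall into more than one membrane organization class."""
    return len({classify(sp, tm) for sp, tm in isoforms}) > 1

# A hypothetical TU whose isoforms differ only in keeping or losing the TM domain
print(tu_has_variable_organization([(True, 1), (True, 0)]))
```

Counting TUs for which `tu_has_variable_organization` is true corresponds, under these assumed rules, to the "distinct membrane organization classes" tally in the abstract.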
Abstract:
We completed the genome sequence of Lettuce necrotic yellows virus (LNYV) by determining the nucleotide sequences of the 4a (putative phosphoprotein), 4b, M (matrix protein), G (glycoprotein) and L (polymerase) genes. The genome consists of 12,807 nucleotides and encodes six genes in the order 3' leader-N-4a(P)-4b-M-G-L-5' trailer. Sequences were derived from clones of a cDNA library made from LNYV genomic RNA and from fragments amplified by reverse transcription-polymerase chain reaction. The 4a protein has a low isoelectric point, characteristic of rhabdovirus phosphoproteins. The 4b protein shows significant sequence similarity to the movement proteins of capillo- and trichoviruses and may be involved in cell-to-cell movement. The putative G protein sequence contains a predicted 25-amino-acid signal peptide and endopeptidase cleavage site, three predicted glycosylation sites and a putative transmembrane domain. The deduced L protein sequence shows similarity to the L proteins of other plant rhabdoviruses and contains polymerase module motifs characteristic of the RNA-dependent RNA polymerases of negative-strand RNA viruses. Phylogenetic analysis of this motif among rhabdoviruses placed LNYV in a group with the other sequenced cytorhabdoviruses, most closely related to Strawberry crinkle virus. © 2005 Elsevier B.V. All rights reserved.
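A sequence-based isoelectric point, like the low pI reported for the 4a protein, is usually computed by finding the pH at which the net charge predicted from Henderson-Hasselbalch terms is zero. The sketch below does this by bisection. The pKa set is an assumption (prediction tools each use their own values, so pI estimates can differ by a few tenths of a unit), and `net_charge` and `isoelectric_point` are hypothetical names.

```python
# Assumed textbook-style pKa values; other tools use slightly different sets
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(seq, ph):
    """Net charge of a peptide at a given pH from Henderson-Hasselbalch fractions."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    # fraction protonated for basic groups, fraction deprotonated for acidic groups
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH: net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(isoelectric_point("MSDDEEKAE"), 2))  # hypothetical acidic peptide
```

Acidic residues pull the zero crossing toward low pH, which is the property the abstract notes for rhabdovirus phosphoproteins.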
Abstract:
Background: The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues along a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and can be considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor and contact number, RWCO provides comprehensive and indispensable information for reconstructing the three-dimensional structure of a protein from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could give deep insights into protein sequence-structure relationships. Results: We developed a novel approach, based on support vector regression (SVR), to predict residue-wise contact order values from primary amino acid sequences. We explored seven sequence encoding schemes to examine their effects on prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. Using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we predicted the RWCO distribution with a Pearson correlation coefficient (CC) of 0.55 between predicted and observed RWCO values and a root mean square error (RMSE) of 0.82, on a well-defined dataset of 680 protein sequences.
Moreover, incorporating global features such as molecular weight and amino acid composition further improved prediction performance, raising the CC to 0.57 and lowering the RMSE to 0.79. In addition, adding the secondary structure predicted by PSIPRED significantly improved prediction performance and yielded the best accuracy, with a CC of 0.60 and an RMSE of 0.78, at least comparable to that of other existing methods. Conclusion: The SVR method shows prediction performance competitive with, or at least comparable to, previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is well suited to estimating the raw value profiles of the samples. The successful application of SVR in this study reinforces the view that support vector regression is a powerful tool for extracting protein sequence-structure relationships and for estimating protein structural profiles from amino acid sequences.
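The quantities in this abstract can be illustrated with a short sketch. RWCO is computed here in one published form of the definition, assumed for illustration: RWCO_i = (1/L) * sum over j with |i - j| > 2 of |i - j| * C_ij, where L is the chain length and C_ij is 1 for residues in contact. The hard 12 Å cutoff on one coordinate per residue is an assumption (a smooth contact function can be used instead). The Pearson CC and RMSE functions follow the standard definitions used to score predicted against observed values; SVR training itself is not shown, and all function names are hypothetical.

```python
import math

def rwco(coords, cutoff=12.0, min_sep=3):
    """Residue-wise contact order with an assumed hard distance cutoff (angstroms).
    coords: one (x, y, z) point per residue, in sequence order."""
    L = len(coords)
    out = []
    for i in range(L):
        total = 0
        for j in range(L):
            if abs(i - j) >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
                total += abs(i - j)  # weight each contact by its sequence separation
        out.append(total / L)
    return out

def pearson_cc(pred, obs):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)

def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

# Toy straight chain with 3.8 A spacing between consecutive residues
chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
print(rwco(chain))
```

In the toy chain only residue pairs three apart (11.4 Å) fall inside the cutoff, so the middle residue has no long-range contacts and an RWCO of zero.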
Abstract:
Methods based on Bayesian networks were developed for protein sequence-based prediction of bacterial subcellular location. Separate predictive algorithms were created for each of the eight bacterial subcellular locations, and several variant methods were explored. The variants differed in the number of residues of the query sequence considered, ranging from the N-terminal 10 residues to the whole sequence, and in the residue representation, which took the form of amino acid composition, percentage amino acid composition or normalised amino acid composition. The accuracies of the best-performing networks were then compared with those of PSORTB. All individual location methods outperform PSORTB except the Gram+ cytoplasmic protein predictor, whose accuracy was essentially equal, and the outer membrane protein predictor, for which PSORTB outperforms the binary predictor. The method described here is an important new approach to developing subcellular location predictors. It is also a new, potentially valuable tool for selecting candidate subunit vaccines.
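The residue representations named above can be sketched as feature extractors over an N-terminal window. Only raw counts and percentage composition are shown; the abstract does not define "normalised" composition, so it is left out rather than guessed. The function names are hypothetical, and the Bayesian networks that consume these features are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq, n_terminal=None):
    """Counts of the 20 standard amino acids, optionally over the first n_terminal residues."""
    window = seq.upper()[:n_terminal] if n_terminal else seq.upper()
    return [window.count(aa) for aa in AMINO_ACIDS]

def percentage_composition(seq, n_terminal=None):
    """Composition expressed as percentages of the residues in the window."""
    counts = composition(seq, n_terminal)
    total = sum(counts) or 1  # avoid division by zero for empty windows
    return [100.0 * c / total for c in counts]

# Hypothetical query: compare the N-terminal 10 residues with the whole sequence
query = "MKKLLPTAAAGLLLLAAQPAMA"
print(composition(query, 10))
print(percentage_composition(query))
```

Varying `n_terminal` from 10 up to `None` (the whole sequence) mirrors the window sizes the abstract says were explored.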
Abstract:
Electrostatic interactions are of fundamental importance in determining the structure and stability of macromolecules. For example, charge-charge interactions modulate the folding and binding of proteins and influence protein solubility. Electrostatic interactions are highly variable and can be either favorable or unfavorable. Quantifying these interactions is challenging but vital to understanding the detailed balance and the major roles they play in different proteins and biological processes. Measuring the pKa values of ionizable groups provides a sensitive experimental probe of the electrostatic properties of a protein.
pKa values report the free energy of site-specific proton binding and provide a direct means of studying protein folding and pH-dependent stability. Using a combination of NMR, circular dichroism and fluorescence spectroscopy together with singular value decomposition, we investigated the contributions of electrostatic interactions to the thermodynamic stability and folding of P protein, the protein subunit of Bacillus subtilis ribonuclease P. Taken together, the results suggest that unfavorable electrostatics alone do not explain why P protein is intrinsically unfolded in the absence of ligand, because the pKa differences observed between the folded and unfolded states are small. Presumably, multiple factors encoded in the P protein sequence account for its intrinsically unfolded (IUP) character, which may play an important role in its function.
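The link between pKa shifts and stability can be made concrete with standard thermodynamic linkage (general background, not taken from the paper). For one ionizable site with pKa_F in the folded state and pKa_U in the unfolded state, the unfolding free energy at a given pH, relative to high pH where the site is deprotonated in both states, changes by ddG = -RT ln[(1 + 10^(pKa_U - pH)) / (1 + 10^(pKa_F - pH))]. Well below both pKa values this tends to -2.303 RT (pKa_U - pKa_F), about 1.36 kcal/mol per pKa unit at 25 °C, which is why small folded-unfolded pKa differences imply small electrostatic contributions. The function name and example pKa values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def stability_change_from_pka(pka_folded, pka_unfolded, ph, temp_k=298.15):
    """Change in unfolding free energy (kcal/mol) at `ph`, relative to high pH,
    contributed by one site whose pKa differs between folded and unfolded states.
    Negative values mean the protein is less stable at this pH."""
    rt = R_KCAL * temp_k
    ratio = (1 + 10 ** (pka_unfolded - ph)) / (1 + 10 ** (pka_folded - ph))
    return -rt * math.log(ratio)

# Hypothetical carboxylate whose pKa is lowered by 0.3 units on folding
print(round(stability_change_from_pka(3.7, 4.0, 2.0), 2))
```

Summing this term over all ionizable sites gives the total electrostatic contribution to pH-dependent stability under this simple independent-site model.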
Abstract:
Purpose: To investigate the expression of Myt272-3 recombinant protein and to predict whether it is a possible protein vaccine candidate against Mycobacterium tuberculosis. Methods: Myt272-3 protein was expressed from the pET30a+-Myt272-3 clone. The purity of the protein was determined using Dynabeads® His-Tag Isolation & Pulldown. The protein sequence was analysed in silico with bioinformatics software to predict allergenicity, antigenicity, MHC-I and MHC-II binding, and B-cell epitope binding. Results: The candidate protein was predicted to be a non-allergen, with a positive predictive value of 15.19%. It was also predicted to be antigenic, with binding affinity for MHC-I and MHC-II, as well as B-cell epitope binding. Conclusion: The predictions obtained in this study provide a guide for the practical design of a new tuberculosis vaccine.
Abstract:
Guarana seeds have the highest caffeine concentration among plants that accumulate purine alkaloids, but in contrast to coffee and tea, practically nothing is known about caffeine metabolism in this Amazonian plant. In this study, the levels of purine alkaloids in tissues of five guarana cultivars were determined. Theobromine was the main alkaloid accumulated in leaves, stems, inflorescences and fruit pericarps, whereas caffeine accumulated in the seeds, reaching levels of 3.3% to 5.8%. In all tissues analysed, the alkaloid concentration, whether theobromine or caffeine, was higher in young/immature tissues and decreased with plant development/maturation. Caffeine synthase activity was highest in seeds of immature fruit. A nucleotide sequence (PcCS) was assembled from sequences retrieved from the EST database REALGENE using caffeine synthase sequences from coffee and tea; its expression was also highest in seeds of immature fruit. PcCS is 1083 bp long, and its deduced protein sequence shows the greatest similarity and identity to the caffeine synthases from cocoa (BTS1) and tea (TCS1). A recombinant PcCS allowed functional characterization of the enzyme as a bifunctional caffeine synthase (CS), able to catalyse the methylation of 7-methylxanthine to theobromine (3,7-dimethylxanthine) and of theobromine to caffeine (1,3,7-trimethylxanthine). Among the substrates tested, PcCS showed higher affinity for theobromine, unlike all other caffeine synthases described so far, which have higher affinity for paraxanthine. In comparison with the known protein structure of coffee caffeine synthase, the unique substrate affinity of PcCS is probably explained by the amino acid residues in the active site of the predicted protein.
Abstract:
Glycoproteins from the total vesicular fluid of Taenia crassiceps (VF-Tc) were prepared by three purification methods: ConA-lectin affinity chromatography (ConA-Tc), preparative electrophoresis (SDS-PAGE) (14gp-Tc), and monoclonal antibody immunoaffinity chromatography (18/14-Tc). The complex VF-Tc and ConA-Tc antigens contained peptides ranging from 101 to 14 kDa and from 92 to 12 kDa, respectively. Immunoblotting with lectins confirmed glucose/mannose (glc/man) residues in the 18- and 14-kDa peptides, which are considered specific and immunodominant for the diagnosis of cysticercosis, and indicated that these fractions are glycoproteins. Serum antibodies from a patient with neurocysticercosis that reacted with the 14gp band from T. crassiceps (Tc) were eluted from immunoblotting membranes and showed reactivity to 14gp from Taenia solium. To identify the shared peptide sequence, the N-terminal amino acid sequence was determined and compared with sequences available in public databases. This sequence revealed partial homology between T. crassiceps and T. solium peptides. In addition, mass spectrometry together with the theoretical Mr and pI of 14gp-Tc suggested a close relationship to some peptides of a previously described 150-kDa protein complex of T. solium. Identifying these common immunogenic sites will contribute to future efforts to develop recombinant antigens and synthetic peptides for immunological assays. © 2009 Elsevier Inc. All rights reserved.
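A theoretical Mr of the kind compared with the mass spectrometry data is typically the sum of average residue masses plus one water for the free termini. The sketch below assumes standard average residue masses and an unmodified chain; glycans, which matter for glycoproteins such as 14gp, are not included. The function name `theoretical_mr` is hypothetical.

```python
# Average residue masses in daltons (standard average-mass values; assumed here)
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini

def theoretical_mr(seq):
    """Average mass of an unmodified polypeptide (no glycans or other modifications)."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

print(round(theoretical_mr("MKTAYIAKQR"), 2))  # hypothetical peptide
```

A gap between this value and an observed mass is one hint of post-translational modification such as glycosylation.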
Abstract:
Glycoprotein gp70 is an important intracellular antigen of Paracoccidioides brasiliensis that elicits both humoral and cellular immune responses. Here we report cloning of the PbGP70 gene from isolate Pb18 using internal peptide sequence information. The deduced protein sequence bears two N-glycosylation sites, antigenic sites and two mouse T-cell epitopes. Polyclonal antibodies against recombinant gp70 (rPbgp70) reacted with a 70-kDa component in a total cell extract of P. brasiliensis, while MAbC5F11 and sera from paracoccidioidomycosis patients recognized rPbgp70. Confocal microscopy with anti-rPbgp70 and MAbC5F11 showed intense staining and cytoplasmic co-localization. The protein sequence belongs to the flavoprotein monooxygenase family, which includes important anti-oxidative bioactive compounds. We found increased PbGP70 transcript accumulation under oxidative stress induced by H₂O₂, during fungal growth, and in yeasts phagocytosed by or bound to macrophages. Therefore, gp70 might play a dual role in P. brasiliensis, both eliciting cellular and humoral immune responses in the host and protecting the fungus from the oxidative stress generated by phagocytic cells. © 2009 Elsevier Inc. All rights reserved.
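Predicted N-glycosylation sites, such as the two in the deduced gp70 sequence, usually start from the N-X-S/T sequon, where X is any residue except proline. A minimal regular-expression scan is sketched below. It only finds sequons and does not predict whether a site is actually glycosylated; the function name `n_glyc_sites` is hypothetical.

```python
import re

# N-X-[S/T] with X != P; the lookahead lets overlapping sequons all be reported
SEQUON = re.compile(r"(?=N[^P][ST])")

def n_glyc_sites(seq):
    """1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]

print(n_glyc_sites("MNGSANPTNNST"))  # hypothetical sequence
```

The proline exclusion is why the `NPT` motif in the example is skipped while the overlapping `NNS` and `NST` motifs are both reported.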
Abstract:
In Escherichia coli, the DnaG primase is the RNA polymerase that synthesizes RNA primers at replication forks. It is composed of three domains: a small N-terminal zinc-binding domain, a larger central domain responsible for RNA synthesis, and a C-terminal domain comprising residues 434-581 [DnaG(434-581)] that interacts with the hexameric DnaB helicase. Presumably because of this interaction, it had not previously been possible to express the C-terminal domain in a stably transformed E. coli strain. This problem was overcome by expressing DnaG(434-581) under control of tandem bacteriophage λ promoters; the protein was purified in yields of 4-6 mg/L of culture and studied by NMR. A TOCSY spectrum of a 2 mM solution of the protein at pH 7.0 indicated that its structured core comprises residues 444-579, consistent with sequence conservation among the most closely related primases. Linewidths in a NOESY spectrum of a 0.5 mM sample in 10 mM phosphate, pH 6.05, 0.1 M NaCl, recorded at 36 °C, indicated that the protein is monomeric. Crystals of selenomethionine-substituted DnaG(434-581) obtained by the hanging-drop vapor-diffusion method were body-centered tetragonal, space group I4₁22, with unit cell parameters a = b = 142.2 Å, c = 192.1 Å, and diffracted beyond 2.7 Å resolution with synchrotron radiation. © 2003 Elsevier Inc. All rights reserved.
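The reported cell can be turned into a worked Matthews-coefficient estimate. For a tetragonal cell V = a²c; space group I4₁22 has 16 asymmetric units per cell, so V_M = V / (16 · n · M) for n molecules of mass M per asymmetric unit, and the solvent fraction is approximately 1 - 1.23/V_M. The abstract gives no molecular mass, so the 17 kDa below is a hypothetical round value for a roughly 148-residue fragment, and the range of n is illustrative.

```python
def tetragonal_cell_volume(a, c):
    """Unit cell volume in cubic angstroms for a tetragonal cell (a = b, all angles 90 degrees)."""
    return a * a * c

def matthews(volume, mass_da, n_per_asu, n_asym_units=16):
    """Matthews coefficient V_M (cubic angstroms per dalton) and approximate solvent fraction."""
    vm = volume / (n_asym_units * n_per_asu * mass_da)
    return vm, 1 - 1.23 / vm

V = tetragonal_cell_volume(142.2, 192.1)
HYPOTHETICAL_MASS_DA = 17_000  # assumption: the mass is not given in the abstract
for n in range(1, 7):
    vm, solvent = matthews(V, HYPOTHETICAL_MASS_DA, n)
    print(f"n={n}: V_M={vm:.2f} A^3/Da, solvent ~{solvent:.0%}")
```

Since typical protein crystals have V_M of roughly 2 to 3 Å³/Da, an estimate like this is normally used to shortlist plausible copy numbers per asymmetric unit before structure solution.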
Abstract:
Four viruses have been reported from taro: Dasheen mosaic virus (DsMV), Taro bacilliform virus (TaBV) and two putative rhabdoviruses, Colocasia bobone disease virus (CBDV) and Taro vein chlorosis virus (TaVCV). A fifth virus, tentatively named Taro reovirus (TaRV), has also been identified recently. In many cases, the distribution of these viruses throughout the Pacific Islands, and the symptoms associated with infection, are unknown because of a lack of sensitive diagnostic tests. We used recently developed PCR-based diagnostic tests to survey taro growing in 11 Pacific Island countries for the presence of known viruses. DsMV and TaBV were widespread, whereas TaVCV and TaRV were more restricted in their distribution. CBDV was restricted to Papua New Guinea (PNG) and Solomon Islands and was always associated with the two most serious viral diseases of taro, alomae disease and bobone disease, but the causal agent of these two diseases remains unclear.
Abstract:
RAD51 colocalizes with both BRCA1 and BRCA2, so genetic variants in RAD51 are candidate BRCA1/2 modifiers. We searched for RAD51 polymorphisms by sequencing 20 individuals. We compared polymorphism allele frequencies between female BRCA1/2 mutation carriers with and without breast or ovarian cancer, and between population-based ovarian cancer cases with BRCA1/2 mutations and cases and controls without mutations. We discovered two single nucleotide polymorphisms (SNPs), at positions 135 g→c and 172 g→t of the 5' untranslated region. In an initial group of BRCA1/2 mutation carriers, 14 (21%) of 67 breast cancer cases carried a c allele at RAD51:135 g→c, whereas 8 (7%) of 119 women without breast cancer carried this allele. In a second set of 466 mutation carriers from three centers, the association of RAD51:135 g→c with breast cancer risk was not confirmed. Analyses restricted to the 216 BRCA2 mutation carriers, however, showed a statistically significant association of the 135 c allele with breast cancer risk (adjusted odds ratio, 3.2; 95% confidence limits, 1.4-40). BRCA1/2 mutation carriers with ovarian cancer were only about half as likely to carry the RAD51:135 g→c SNP. Analysis of the RAD51:135 g→c SNP in 738 subjects from an Israeli ovarian cancer case-control study was consistent with a lower risk of ovarian cancer among BRCA1/2 mutation carriers with the c allele. We have identified a RAD51 5' untranslated region SNP that may be associated with an increased risk of breast cancer and a lower risk of ovarian cancer among BRCA2 mutation carriers. The biochemical basis of this risk modifier is currently unknown.
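The counts from the initial group can be turned into a crude odds ratio with a Woolf (log-scale) 95% confidence interval. This is an unadjusted illustration on the abstract's 2×2 counts (14 of 67 cases and 8 of 119 unaffected carriers with the c allele); it is not the adjusted estimate reported for BRCA2 carriers, which comes from a different subset and model. The function name is hypothetical.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude odds ratio for a 2x2 table with a Woolf (log) confidence interval.
    a: cases with allele, b: cases without, c: controls with allele, d: controls without."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Initial group from the abstract
print(odds_ratio_woolf(14, 67 - 14, 8, 119 - 8))
```

For these counts the crude odds ratio is (14 × 111)/(53 × 8), about 3.7.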
Abstract:
An improved differential display technique was used to search for changes in gene expression in the superior frontal cortex of alcoholics. A cDNA fragment was retrieved and cloned, and further cDNA sequence was obtained by 5' RACE and by screening a human brain cDNA library. The gene was named hNP22 (human neuronal protein 22). The deduced hNP22 protein has an estimated molecular mass of 22.4 kDa, a putative calcium-binding site, and phosphorylation sites for casein kinase II and protein kinase C. The deduced amino acid sequence of hNP22 shares homology (from 67% to 42%) with four other proteins: SM22 alpha, calponin, myophilin and mp20. This sequence homology suggests a potential interaction of hNP22 with cytoskeletal elements. hNP22 mRNA was expressed in various brain regions, and in alcoholics its expression was greater in the superior frontal cortex but not in the primary motor cortex or cerebellum. The results suggest that hNP22 may have a role in alcohol-related adaptations and may mediate regulatory signal transduction pathways in neurones.
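Homology percentages like those quoted depend on the alignment and on the identity convention used. A sketch of one common convention, identities divided by aligned columns where neither sequence has a gap, is given below; the input is assumed to be a precomputed pairwise alignment, and `percent_identity` is a hypothetical name. The abstract does not say whether its 67% to 42% figures are identities or similarities.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap
    (one of several conventions; others divide by alignment or sequence length)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

print(percent_identity("MAN-KGLE", "MANQKGVE"))  # hypothetical alignment
```

Switching the denominator to the full alignment length, gaps included, would lower the value for gapped alignments, so reported percentages are only comparable under the same convention.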
Abstract:
Dimethyl sulfide dehydrogenase from the purple phototrophic bacterium Rhodovulum sulfidophilum catalyzes the oxidation of dimethyl sulfide to dimethyl sulfoxide. Recent DNA sequence analysis of the ddh operon, which encodes dimethyl sulfide dehydrogenase (ddhABC), and biochemical analysis (1) have revealed that it is a member of the DMSO reductase family of molybdenum enzymes and is closely related to respiratory nitrate reductase (NarGHI). Variable-temperature X-band EPR spectra (120122 K) of purified heterotrimeric dimethyl sulfide dehydrogenase showed resonances arising from multiple redox centers: Mo(V), [3Fe-4S]⁺, [4Fe-4S]⁺ and a b-type heme. A pH-dependent EPR study of the Mo(V) center in ¹H₂O and ²H₂O revealed three Mo(V) species in equilibrium: Mo(V)-OH₂, Mo(V)-anion and Mo(V)-OH. Above pH 8.2 the dominant species was Mo(V)-OH. The maximum specific activity occurred at pH 9.27. Comparison of the rhombicity and anisotropy parameters of the Mo(V) species in DMS dehydrogenase with those of other molybdenum enzymes of the DMSO reductase family showed that it was most similar to the low-pH nitrite spectrum of Escherichia coli nitrate reductase (NarGHI), consistent with previous sequence analysis of DdhA and NarG. A sequence comparison of DdhB and NarH predicted four [Fe-S] clusters in DdhB. A [3Fe-4S]⁺ cluster was identified in dimethyl sulfide dehydrogenase whose properties resembled those of center 2 of NarH. A [4Fe-4S]⁺ cluster with unusual spin Hamiltonian parameters was also identified, suggesting that one of its iron atoms may have a fifth, non-sulfur ligand. The g matrix of this cluster is very similar to that found for the minor conformation of center 1 in NarH [Guigliarelli, B., Asso, M., More, C., Augier, V., Blasco, F., Pommier, J., Giordano, G., and Bertrand, P. (1992) Eur. J. Biochem. 307, 63-68]. Analysis of a ddhC mutant showed that this gene encodes the b-type cytochrome of dimethyl sulfide dehydrogenase.
Magnetic circular dichroism studies revealed that the axial ligands to the iron of this cytochrome are a histidine and a methionine, consistent with predictions from protein sequence analysis. Redox potentiometry showed that the b-type cytochrome has a high midpoint redox potential (E₀ = +315 mV, pH 8).
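The midpoint potential can be read through the Nernst equation, E = E_m + (RT/nF) ln([ox]/[red]). The sketch below returns the reduced fraction of the heme at a given applied potential, assuming n = 1 (a one-electron heme couple), 25 °C, and the +315 mV value from the abstract as the default midpoint. The function name `fraction_reduced` is hypothetical.

```python
import math

F = 96485.332  # Faraday constant, C/mol
R = 8.314462   # gas constant, J/(mol*K)

def fraction_reduced(e_mv, em_mv=315.0, n=1, temp_k=298.15):
    """Reduced fraction [red]/([ox]+[red]) at potential e_mv (mV), from the Nernst equation."""
    ratio_ox_red = math.exp(n * F * (e_mv - em_mv) * 1e-3 / (R * temp_k))
    return 1 / (1 + ratio_ox_red)

for e in (200, 315, 400):  # illustrative applied potentials in mV
    print(e, round(fraction_reduced(e), 3))
```

At 25 °C each 59 mV step above the midpoint increases the oxidized-to-reduced ratio tenfold for a one-electron center, which is the shape a potentiometric titration curve follows.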