86 results for protein sequence classification


Relevance:

90.00%

Publisher:

Abstract:

Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids in the protein sequence. Similarities in the hydropathy distributions are obvious for homologous proteins within a protein family. They also were observed for proteins with related structures, even when sequence similarities were undetectable. Here we present a novel method that employs the hydropathy distribution in proteins for identification of (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, represented by vectors of specifically defined features. The features are derived from hydropathy of the individual amino acids. Projection of this space onto principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by the analysis of hydropathy distribution, without the need for sequence alignment. (C) 2005 Wiley-Liss, Inc.
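The idea of placing proteins as points in a hydropathy feature space can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say which hydropathy scale or which features were used, so the Kyte-Doolittle scale and the per-segment mean features (`hydropathy_features`, `n_segments`) are assumptions chosen for simplicity, and the projection onto principal axes is omitted.

```python
# Kyte-Doolittle hydropathy scale (an assumed choice; the abstract names no scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_features(sequence, n_segments=4):
    """Hypothetical feature vector: mean hydropathy of n equal-width segments."""
    # Residues outside the 20 standard amino acids are skipped.
    values = [KD[aa] for aa in sequence.upper() if aa in KD]
    if len(values) < n_segments:
        raise ValueError("sequence too short for the requested number of segments")
    features = []
    for k in range(n_segments):
        start = k * len(values) // n_segments
        end = (k + 1) * len(values) // n_segments
        segment = values[start:end]
        features.append(sum(segment) / len(segment))
    return features


def distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Sequences whose feature vectors lie close together under `distance` have similar coarse hydropathy distributions, and no sequence alignment is needed to compare them.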

Relevance:

90.00%

Publisher:

Abstract:

Membrane organization describes the orientation of a protein with respect to the membrane and can be determined by the presence, or absence, and organization within the protein sequence of two features: endoplasmic reticulum signal peptides and alpha-helical transmembrane domains. These features allow protein sequences to be classified into one of five membrane organization categories: soluble intracellular proteins, soluble secreted proteins, type I membrane proteins, type II membrane proteins, and multi-spanning membrane proteins. Generation of protein isoforms with variable membrane organizations can change a protein's subcellular localization or association with the membrane. Application of MemO, a membrane organization annotation pipeline, to the FANTOM3 Isoform Protein Sequence mouse protein set revealed that within the 8,032 transcriptional units (TUs) with multiple protein isoforms, 573 had variation in their use of signal peptides, 1,527 had variation in their use of transmembrane domains, and 615 generated protein isoforms from distinct membrane organization classes. The mechanisms underlying these transcript variations were analyzed. While TUs were identified encoding all pairwise combinations of membrane organization categories, the most common was conversion of membrane proteins to soluble proteins. Observed within our high-confidence set were 156 TUs predicted to generate both extracellular soluble and membrane proteins, and 217 TUs generating both intracellular soluble and membrane proteins. The differential use of endoplasmic reticulum signal peptides and transmembrane domains is a common occurrence within the variable protein output of TUs. The generation of protein isoforms that are targeted to multiple subcellular locations represents a major functional consequence of transcript variation within the mouse transcriptome.
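The five-category scheme can be written as a small decision rule over the two features. The sketch below is an assumption about how such a rule might look, not the MemO pipeline itself: the abstract names the categories but not the exact mapping, so the convention used here (a signal peptide plus one transmembrane domain gives type I, one transmembrane domain without a signal peptide gives type II) and the function names are hypothetical.

```python
def membrane_organization(has_signal_peptide, n_tm_domains):
    """Assign one of five membrane organization categories (assumed rule)."""
    if n_tm_domains < 0:
        raise ValueError("number of transmembrane domains cannot be negative")
    if n_tm_domains == 0:
        # No membrane anchor: the signal peptide decides secretion.
        return "soluble secreted" if has_signal_peptide else "soluble intracellular"
    if n_tm_domains == 1:
        return "type I membrane" if has_signal_peptide else "type II membrane"
    return "multi-spanning membrane"


def has_distinct_classes(isoforms):
    """True if the isoforms of one transcriptional unit fall into more than one class.

    `isoforms` is an iterable of (has_signal_peptide, n_tm_domains) pairs.
    """
    classes = {membrane_organization(sp, tm) for sp, tm in isoforms}
    return len(classes) > 1
```

For example, a transcriptional unit with one isoform `(True, 1)` and another `(True, 0)` would be flagged by `has_distinct_classes`, because it yields both a type I membrane protein and a soluble secreted protein.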

Relevance:

90.00%

Publisher:

Abstract:

We completed the genome sequence of Lettuce necrotic yellows virus (LNYV) by determining the nucleotide sequences of the 4a (putative phosphoprotein), 4b, M (matrix protein), G (glycoprotein) and L (polymerase) genes. The genome consists of 12,807 nucleotides and encodes six genes in the order 3' leader-N-4a(P)-4b-M-G-L-5' trailer. Sequences were derived from clones of a cDNA library from LNYV genomic RNA and from fragments amplified using reverse transcription-polymerase chain reaction. The 4a protein has a low isoelectric point characteristic for rhabdovirus phosphoproteins. The 4b protein has significant sequence similarities with the movement proteins of capillo- and trichoviruses and may be involved in cell-to-cell movement. The putative G protein sequence contains a predicted 25 amino acids signal peptide and endopeptidase cleavage site, three predicted glycosylation sites and a putative transmembrane domain. The deduced L protein sequence shows similarities with the L proteins of other plant rhabdoviruses and contains polymerase module motifs characteristic for RNA-dependent RNA polymerases of negative-strand RNA viruses. Phylogenetic analysis of this motif among rhabdoviruses placed LNYV in a group with other sequenced cytorhabdoviruses, most closely related to Strawberry crinkle virus. (c) 2005 Elsevier B.V. All rights reserved.

Relevance:

90.00%

Publisher:

Abstract:

Background: The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion: The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
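The quantities named in this abstract can be illustrated with a short sketch. The RWCO function below is an assumption, not the paper's definition: it treats two residues as in contact when their representative atoms lie within a hard 8 Å cutoff and normalizes the summed sequence separations by chain length, whereas the abstract specifies neither choice. The two evaluation functions implement the standard Pearson correlation coefficient and root mean square error.

```python
import math


def rwco(coords, cutoff=8.0):
    """Residue-wise contact order from one 3D point per residue (assumed definition)."""
    n = len(coords)
    values = []
    for i in range(n):
        total = 0
        for j in range(n):
            # Sum sequence separations |i - j| over all residues in contact with i.
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                total += abs(i - j)
        values.append(total / n)
    return values


def pearson_cc(x, y):
    """Pearson correlation coefficient between predicted and observed values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

A regression model such as SVR would be trained to predict the output of `rwco` from sequence-derived features, and `pearson_cc` and `rmse` would then compare its predictions with the observed values.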

Relevance:

80.00%

Publisher:

Abstract:

In Escherichia coli, the DnaG primase is the RNA polymerase that synthesizes RNA primers at replication forks. It is composed of three domains: a small N-terminal zinc-binding domain, a larger central domain responsible for RNA synthesis, and a C-terminal domain comprising residues 434-581 [DnaG(434-581)] that interact with the hexameric DnaB helicase. Presumably because of this interaction, it had not been possible previously to express the C-terminal domain in a stably transformed E. coli strain. This problem was overcome by expression of DnaG(434-581) under control of tandem bacteriophage λ promoters, and the protein was purified in yields of 4-6 mg/L of culture and studied by NMR. A TOCSY spectrum of a 2 mM solution of the protein at pH 7.0 indicated that its structured core comprises residues 444-579. This was consistent with sequence conservation among most-closely related primases. Linewidths in a NOESY spectrum of a 0.5 mM sample in 10 mM phosphate, pH 6.05, 0.1 M NaCl, recorded at 36 °C, indicated the protein to be monomeric. Crystals of selenomethionine-substituted DnaG(434-581) obtained by the hanging-drop vapor-diffusion method were body-centered tetragonal, space group I4(1)22, with unit cell parameters a = b = 142.2 Å, c = 192.1 Å, and diffracted beyond 2.7 Å resolution with synchrotron radiation. (C) 2003 Elsevier Inc. All rights reserved.

Relevance:

80.00%

Publisher:

Abstract:

Four viruses have been reported from taro: Dasheen mosaic virus (DsMV), Taro bacilliform virus (TaBV) and two putative rhabdoviruses, Colocasia bobone disease virus (CBDV) and Taro vein chlorosis virus (TaVCV). A fifth virus, tentatively named Taro reovirus (TaRV), has also been recently identified. The distribution of these viruses throughout the Pacific Islands, and the symptoms associated with their infection, are unknown in many cases due to a lack of sensitive diagnostic tests. We have used recently developed PCR-based diagnostic tests to survey taro growing in 11 Pacific Island countries for the presence of known viruses. DsMV and TaBV were widespread, whereas TaVCV and TaRV were more restricted in their distribution. CBDV was restricted to PNG and Solomon Islands and was always associated with the two most serious viral diseases of taro, alomae disease and bobone disease, but the causal agent of these two diseases remains unclear.

Relevance:

80.00%

Publisher:

Abstract:

RAD51 colocalizes with both BRCA1 and BRCA2, and genetic variants in RAD51 would be candidate BRCA1/2 modifiers. We searched for RAD51 polymorphisms by sequencing 20 individuals. We compared the polymorphism allele frequencies between female BRCA1/2 mutation carriers with and without breast or ovarian cancer and between population-based ovarian cancer cases with BRCA1/2 mutations to cases and controls without mutations. We discovered two single nucleotide polymorphisms (SNPs) at positions 135 g-->c and 172 g-->t of the 5' untranslated region. In an initial group of BRCA1/2 mutation carriers, 14 (21%) of 67 breast cancer cases carried a c allele at RAD51:135 g-->c, whereas 8 (7%) of 119 women without breast cancer carried this allele. In a second set of 466 mutation carriers from three centers, the association of RAD51:135 g-->c with breast cancer risk was not confirmed. Analyses restricted to the 216 BRCA2 mutation carriers, however, showed a statistically significant association of the 135 c allele with the risk of breast cancer (adjusted odds ratio, 3.2; 95% confidence limit, 1.4-40). BRCA1/2 mutation carriers with ovarian cancer were only about one half as likely to carry the RAD51:135 g-->c SNP. Analysis of the RAD51:135 g-->c SNP in 738 subjects from an Israeli ovarian cancer case-control study was consistent with a lower risk of ovarian cancer among BRCA1/2 mutation carriers with the c allele. We have identified a RAD51 5' untranslated region SNP that may be associated with an increased risk of breast cancer and a lower risk of ovarian cancer among BRCA2 mutation carriers. The biochemical basis of this risk modifier is currently unknown.

Relevance:

80.00%

Publisher:

Abstract:

An improved differential display technique was used to search for changes in gene expression in the superior frontal cortex of alcoholics. A cDNA fragment was retrieved and cloned. Further sequence of the cDNA was determined from 5' RACE and screening of a human brain cDNA library. The gene was named hNP22 (human neuronal protein 22). The deduced protein sequence of hNP22 has an estimated molecular mass of 22.4 kDa with a putative calcium-binding site, and phosphorylation sites for casein kinase II and protein kinase C. The deduced amino acid sequence of hNP22 shares homology (from 67% to 42%) with four other proteins: SM22 alpha, calponin, myophilin and mp20. Sequence homology suggests a potential interaction of hNP22 with cytoskeletal elements. hNP22 mRNA was expressed in various brain regions, but in alcoholics greater mRNA expression occurred in the superior frontal cortex and not in the primary motor cortex or cerebellum. The results suggest that hNP22 may have a role in alcohol-related adaptations and may mediate regulatory signal transduction pathways in neurones.

Relevance:

80.00%

Publisher:

Abstract:

Dimethyl sulfide dehydrogenase from the purple phototrophic bacterium Rhodovulum sulfidophilum catalyzes the oxidation of dimethyl sulfide to dimethyl sulfoxide. Recent DNA sequence analysis of the ddh operon, encoding dimethyl sulfide dehydrogenase (ddhABC), and biochemical analysis (1) have revealed that it is a member of the DMSO reductase family of molybdenum enzymes and is closely related to respiratory nitrate reductase (NarGHI). Variable temperature X-band EPR spectra (120-122 K) of purified heterotrimeric dimethyl sulfide dehydrogenase showed resonances arising from multiple redox centers, Mo(V), [3Fe-4S]⁺, [4Fe-4S]⁺, and a b-type heme. A pH-dependent EPR study of the Mo(V) center in ¹H₂O and ²H₂O revealed the presence of three Mo(V) species in equilibrium, Mo(V)-OH₂, Mo(V)-anion, and Mo(V)-OH. Above pH 8.2 the dominant species was Mo(V)-OH. The maximum specific activity occurred at pH 9.27. Comparison of the rhombicity and anisotropy parameters for the Mo(V) species in DMS dehydrogenase with other molybdenum enzymes of the DMSO reductase family showed that it was most similar to the low-pH nitrite spectrum of Escherichia coli nitrate reductase (NarGHI), consistent with previous sequence analysis of DdhA and NarG. A sequence comparison of DdhB and NarH has predicted the presence of four [Fe-S] clusters in DdhB. A [3Fe-4S]⁺ cluster was identified in dimethyl sulfide dehydrogenase whose properties resembled those of center 2 of NarH. A [4Fe-4S]⁺ cluster was also identified with unusual spin Hamiltonian parameters, suggesting that one of the iron atoms may have a fifth non-sulfur ligand. The g matrix for this cluster is very similar to that found for the minor conformation of center 1 in NarH [Guigliarelli, B., Asso, M., More, C., Augher, V., Blasco, F., Pommier, J., Giodano, G., and Bertrand, P. (1992) Eur. J. Biochem. 307,63-68]. Analysis of a ddhC mutant showed that this gene encodes the b-type cytochrome in dimethyl sulfide dehydrogenase.
Magnetic circular dichroism studies revealed that the axial ligands to the iron in this cytochrome are a histidine and methionine, consistent with predictions from protein sequence analysis. Redox potentiometry showed that the b-type cytochrome has a high midpoint redox potential (E° = +315 mV, pH 8).

Relevance:

80.00%

Publisher:

Abstract:

Signal peptides and transmembrane helices both contain a stretch of hydrophobic amino acids. This common feature makes it difficult for signal peptide and transmembrane helix predictors to correctly assign identity to stretches of hydrophobic residues near the N-terminal methionine of a protein sequence. The inability to reliably distinguish between N-terminal transmembrane helix and signal peptide is an error with serious consequences for the prediction of protein secretory status or transmembrane topology. In this study, we report a new method for differentiating protein N-terminal signal peptides and transmembrane helices. Based on the sequence features extracted from hydrophobic regions (amino acid frequency, hydrophobicity, and the start position), we set up discriminant functions and examined them on non-redundant datasets with jackknife tests. This method can incorporate other signal peptide prediction methods and achieve higher prediction accuracy. For Gram-negative bacterial proteins, 95.7% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.90). Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 99% (coefficient 0.92). For eukaryotic proteins, 94.2% of N-terminal signal peptides and transmembrane helices can be correctly predicted with coefficient 0.83. Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 87% (coefficient 0.85). The method can be used to complement current transmembrane protein prediction and signal peptide prediction methods to improve their prediction accuracies. (C) 2003 Elsevier Inc. All rights reserved.
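The performance figures quoted above (percentage correctly predicted, sensitivity, precision, and a "coefficient") can all be derived from a confusion matrix. The sketch below assumes the coefficient is the Matthews correlation coefficient, which the abstract does not state. Transmembrane helices are taken as the positive class, and the function name and example counts are illustrative, not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, precision and Matthews correlation from raw counts.

    Positive class: transmembrane helix; negative class: signal peptide.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is reported as 0 when any marginal is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "mcc": mcc,
    }


# Made-up counts: 90 of 100 helices recovered, 1 signal peptide mislabeled.
example = classification_metrics(tp=90, fp=1, tn=99, fn=10)
```

Fixing the sensitivity at a target value, as the abstract does at 90%, corresponds to moving the decision threshold of the discriminant function until `sensitivity` reaches that value and then reading off `precision`.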

Relevance:

80.00%

Publisher:

Abstract:

With the sequencing and annotation of genomes and transcriptomes of several eukaryotes, the importance of noncoding RNA (ncRNA), that is, RNA molecules that are not translated to protein products, has become more evident. A subclass of ncRNA transcripts are encoded by highly regulated, multi-exon transcriptional units, are processed like typical protein-coding mRNAs, and are increasingly implicated in regulation of many cellular functions in eukaryotes. This study describes the identification of candidate functional ncRNAs from among the RIKEN mouse full-length cDNA collection, which contains 60,770 sequences, by using a systematic computational filtering approach. We initially searched for previously reported ncRNAs and found nine murine ncRNAs and homologs of several previously described nonmouse ncRNAs. Through our computational approach to filter artifact-free clones that lack protein coding potential, we extracted 4280 transcripts as the largest candidate set. Many clones in the set had EST hits, potential CpG islands surrounding the transcription start sites, and homologies with the human genome. This implies that many candidates are indeed transcribed in a regulated manner. Our results demonstrate that ncRNAs are a major functional subclass of processed transcripts in mammals.

Relevance:

80.00%

Publisher:

Abstract:

Sulfadoxine is predominantly used in combination with pyrimethamine, commonly known as Fansidar, for the treatment of Plasmodium falciparum. This combination is usually less effective against Plasmodium vivax, probably due to the innate refractoriness of parasites to the sulfadoxine component. To investigate this mechanism of resistance by P. vivax to sulfadoxine, we cloned and sequenced the P. vivax dhps (pvdhps) gene. The protein sequence was determined, and three-dimensional homology models of dihydropteroate synthase (DHPS) from P. vivax as well as P. falciparum were created. The docking of sulfadoxine to the two DHPS models allowed us to compare contact residues in the putative sulfadoxine-binding site in both species. The predicted sulfadoxine-binding sites between the species differ by one residue, V585 in P. vivax, equivalent to A613 in P. falciparum. V585 in P. vivax is predicted by energy minimization to cause a reduction in binding of sulfadoxine to DHPS in P. vivax compared to P. falciparum. Sequencing dhps genes from a limited set of geographically different P. vivax isolates revealed that V585 was present in all of the samples, suggesting that V585 may be responsible for innate resistance of P. vivax to sulfadoxine. Additionally, amino acid mutations were observed in some P. vivax isolates in positions known to cause resistance in P. falciparum, suggesting that, as in P. falciparum, these mutations are responsible for acquired increases in resistance of P. vivax to sulfadoxine.

Relevance:

80.00%

Publisher:

Abstract:

A key component of the venom of many Australian snakes belonging to the elapid family is a toxin that is structurally and functionally similar to that of the mammalian prothrombinase complex. In mammals, this complex is responsible for the cleavage of prothrombin to thrombin and is composed of factor Xa in association with its cofactors calcium, phospholipids, and factor Va. The snake prothrombin activators have been classified on the basis of their requirement for cofactors for activity. The two major subgroups described in Australian elapid snakes, groups C and D, are differentiated by their requirement for mammalian coagulation factor Va. In this study, we describe the cloning, characterization, and comparative analysis of the factor X- and factor V-like components of the prothrombin activators from the venom glands of snakes possessing either group C or D prothrombin activators. The overall domain arrangement in these proteins was highly conserved between all elapids and with the corresponding mammalian clotting factors. The deduced protein sequence for the factor X-like protease precursor, identified in elapids containing either group C or D prothrombin activators, demonstrated a remarkable degree of relatedness to each other (80%-97%). The factor V-like component of the prothrombin activator, present only in snakes containing group C complexes, also showed a very high degree of homology (96%-98%). Expression of both the factor X- and factor V-like proteins determined by immunoblotting provided an additional means of separating these two groups at the molecular level. The molecular phylogenetic analysis described here represents a new approach for distinguishing group C and D snake prothrombin activators and correlates well with previous classifications.

Relevance:

80.00%

Publisher:

Abstract:

Orphan nuclear receptors: therapeutic opportunities in skeletal muscle. Am J Physiol Cell Physiol 291: C203-C217, 2006; doi: 10.1152/ajpcell.00476.2005. Nuclear hormone receptors (NRs) are ligand-dependent transcription factors that bind DNA and translate physiological signals into gene regulation. The therapeutic utility of NRs is underscored by the diversity of drugs created to manage dysfunctional hormone signaling in the context of reproductive biology, inflammation, dermatology, cancer, and metabolic disease. For example, drugs that target nuclear receptors generate over $10 billion in annual sales. Almost two decades ago, gene products were identified that belonged to the NR superfamily on the basis of DNA and protein sequence identity. However, the endogenous and synthetic small molecules that modulate their action were not known, and they were denoted orphan NRs. Many of the remaining orphan NRs are highly enriched in energy-demanding major mass tissues, including skeletal muscle, brown and white adipose, brain, liver, and kidney. This review focuses on recently adopted and orphan NR function in skeletal muscle, a tissue that accounts for ~35% of the total body mass and energy expenditure, and is a major site of fatty acid and glucose utilization. Moreover, this lean tissue is involved in cholesterol efflux and secretes that control energy expenditure and adiposity. Consequently, muscle has a significant role in insulin sensitivity, the blood lipid profile, and energy balance. Accordingly, skeletal muscle plays a considerable role in the progression of dyslipidemia, diabetes, and obesity. These are risk factors for cardiovascular disease, which is the foremost cause of global mortality (>16.7 million deaths in 2003). Therefore, it is not surprising that orphan NRs and skeletal muscle are emerging as therapeutic candidates in the battle against dyslipidemia, diabetes, obesity, and cardiovascular disease.

Relevance:

80.00%

Publisher:

Abstract:

Large-scale gene discovery has been performed for the grass fungal endophytes Neotyphodium coenophialum, Neotyphodium lolii, and Epichloe festucae. The resulting sequences have been annotated by comparison with public DNA and protein sequence databases and using intermediate gene ontology annotation tools. Endophyte sequences have also been analysed for the presence of simple sequence repeat and single nucleotide polymorphism molecular genetic markers. Sequences and annotation are maintained within a MySQL database that may be queried using a custom web interface. Two cDNA-based microarrays have been generated from this genome resource. They permit the interrogation of 3806 Neotyphodium genes (Nchip (TM) microarray), and 4195 Neotyphodium and 920 Epichloe genes (EndoChip (TM) microarray), respectively. These microarrays provide tools for high-throughput transcriptome analysis, including genome-specific gene expression studies, profiling of novel endophyte genes, and investigation of the host grass-symbiont interaction. Comparative transcriptome analysis in Neotyphodium and Epichloe was performed. (c) 2006 Elsevier