929 results for protein sequence classification
Abstract:
Insulin-like growth factor binding protein 2 (IGFBP2) is known to be overexpressed in a majority of glioblastoma multiforme (GBM) tumors. While IGFBP2 is known to promote GBM tumor cell invasion, no mechanism has been established for how the protein participates in the signal transduction pathways that lead to enhanced cell invasion. We follow up on preliminary microarray data from IGFBP2-overexpressing GBM cells and on protein sequence analysis of IGFBP2 to generate the hypothesis that IGFBP2 interacts with integrin α5 in regulating cell mobility. Microarray data showing upregulation of integrin α5 by IGFBP2 are validated, and evidence of protein-protein interaction between IGFBP2 and integrin α5 is found. The exact binding domain on IGFBP2 responsible for its interaction with integrin α5 is also determined, confirming our initial findings and reaffirming that the IGFBP2/integrin α5 interaction is specific. Disruption of this interaction resulted in attenuation of IGFBP2-enhanced cell mobility. Further, we found that cell mobility is enhanced only when IGFBP2 and integrin α5 are both overexpressed and able to interact with each other. We also determined fibronectin to be a critical player in the activation of the IGFBP2/integrin α5 pathway. The activation of this pathway appears to be progressive and initiates once GBM cells have sufficiently established anchorage.
Abstract:
Bread wheat (Triticum aestivum ssp vulgare L., AABBDD, 2n=6x=42) has unique viscoelastic properties owing to the prolamins present in its flour: glutenins and gliadins. Both types of protein form part of the gluten network. On the basis of their mobility in SDS-PAGE, glutenins are classified into two groups: high molecular weight glutenins (HMW-GS) and low molecular weight glutenins (LMW-GS). The genes encoding HMW-GS are located at three loci on the group 1 chromosomes: Glu-A1, Glu-B1 and Glu-D1. Each locus encodes one or two polypeptides, or subunits. Allelic variation of HMW-GS is the main determinant of flour and bread-making quality and has been studied extensively at both the protein and DNA level. Knowledge of these proteins has contributed substantially to the progress of wheat quality breeding programs. Compared with HMW-GS, LMW-GS form a much more complex protein family. Most LMW genes are located on the group 1 chromosomes at three loci, Glu-A3, Glu-B3 and Glu-D3, which are closely linked to the loci encoding gliadins. The copy number of these genes has been estimated at between 10 and 40 in hexaploid wheat, but the exact number is still unknown because there is no efficient method for distinguishing the members of this multigene family. Naming LMW-GS alleles by conventional electrophoresis is complicated, and different authors assign different alleles to the same variety, which makes this complex family even harder to study. The use of molecular markers to discriminate LMW genes, although a difficult task, can be very useful for breeding programs. The aim of this work was to gain deeper insight into the relationship between glutenins and bread-making quality and to develop molecular markers that help classify HMW-GS and LMW-GS correctly.
Two populations of F4:6 advanced lines were obtained from the crosses ‘Tigre’ x ‘Gazul’ and ‘Fiel’ x ‘Taber’, and lines homogeneous for HMW-GS, LMW-GS and gliadins were selected for the quality analyses. HMW-GS alleles were determined by SDS-PAGE, complemented with molecular analyses, and a new PCR marker was developed to differentiate between the Bx7 and Bx7* subunits of the Glu-B1 locus. LMW-GS alleles were determined by SDS-PAGE following several nomenclatures and using standard varieties for each allele. The result was inconclusive for the Glu-B3 locus, so molecular markers were used. DNA from the parents and the standards was amplified using primers designed in conserved regions of the LMW genes and then analyzed by capillary electrophoresis. The amplification patterns obtained were compared among the samples and made it possible to establish a relationship with the LMW-GS alleles. With this method, the allelic determination of this locus could be clarified for the four parents. Flour quality was tested by percentage protein content, sedimentation test (SDSS) and Chopin alveograph (parameters P, L, P/L and W). The values were analyzed in relation to glutenin composition. The lines from the ‘Fiel’ x ‘Taber’ cross showed a clear influence of the Glu-A3 locus on the variation in SDSS values. Lines carrying the new allele Glu-A3b’ had significantly higher values than lines with the Glu-A3f allele. In the lines from the ‘Tigre’ x ‘Gazul’ cross, the Glu-B1 and Glu-B3 loci both influenced the quality parameters.
The results indicated that, for SDSS and P values, lines with the HMW-GS Bx7OE+By8 were significantly better than lines with Bx17+By18, and that lines carrying the Glu-B3ac allele had significantly higher P values and significantly lower L values than lines with the Glu-B3ad allele. Analysis of the quality values in relation to the amplified LMW fragments revealed a significant effect of two fragments (2-616 and 2-636) on P values. The presence of fragment 2-636 was associated with higher P values. These fragments were cloned and sequenced, confirming that they correspond to genes of the Glu-B3 locus. Sequence analysis revealed that the two differ by a few SNPs and by a 21-nucleotide deletion, which in the protein would correspond to an InDel of a heptapeptide in the repetitive region. In this work, the use of lines that differ at the Glu-B3 locus made it possible to analyze the influence of this locus (the most poorly characterized to date) on bread-making quality. In addition, the use of molecular markers for determining LMW-GS alleles, and their relationship with bread-making quality, was validated.
Summary:
Bread wheat (Triticum aestivum ssp vulgare L., AABBDD, 2n=6x=42) flour has unique dough viscoelastic properties conferred by prolamins: glutenins and gliadins. Both types of proteins are cross-linked to form gluten polymers. On the basis of their mobility in SDS-PAGE, glutenins can be classified into two groups: high molecular weight glutenins (HMW-GS) and low molecular weight glutenins (LMW-GS). Genes encoding HMW-GS are located on group 1 chromosomes at three loci: Glu-A1, Glu-B1 and Glu-D1, each one encoding two polypeptides, named subunits. Allelic variation of HMW-GS is the most important determinant of bread-making quality, and has been exhaustively studied at the protein and DNA level.
Knowledge of these proteins has contributed substantially to the genetic improvement of bread quality in breeding programs. Compared to HMW-GS, LMW-GS are a much more complex family. Most genes encoding LMW-GS are located on group 1 chromosomes, at the Glu-A3, Glu-B3 and Glu-D3 loci, which are closely linked to the gliadin loci. The total gene copy number has been estimated to vary from 10 to 40 in hexaploid wheat. However, the exact copy number of LMW-GS genes is still unknown, mostly owing to the lack of efficient methods to distinguish members of this multigene family. The nomenclature of LMW-GS alleles is also unclear, and different authors can assign different alleles to the same variety, increasing confusion in the study of this complex family. The use of molecular markers for the discrimination of LMW-GS genes might be very useful in breeding programs, but their wide application is not easy. The objective of this work is to gain insight into the relationship between glutenins and bread quality, and to develop molecular markers that help in the allele classification of HMW-GS and LMW-GS. Two populations of F4:6 advanced lines were obtained from the crosses ‘Tigre’ x ‘Gazul’ and ‘Fiel’ x ‘Taber’. Lines homogeneous for HMW-GS, LMW-GS and gliadin patterns were selected for quality analysis. The allele classification of HMW-GS was performed by SDS-PAGE, and then complemented by PCR analysis. A new PCR marker was developed to differentiate unambiguously between two similar subunits from the Glu-B1 locus, Bx7 and Bx7*. The allele classification of LMW-GS was initially performed by SDS-PAGE following different established nomenclatures and using standard varieties. The results were not fully conclusive for the Glu-B3 locus, so a molecular marker system was applied. DNA from parental lines and standard varieties was amplified using primers designed in conserved domains of LMW genes and analyzed by capillary electrophoresis.
The pattern of amplification products obtained was compared among samples and related to the protein allele classification. It was possible to establish a correspondence between specific amplification products and almost all LMW alleles analyzed. With this method, the allele classification of the four parental lines was clarified. Flour quality of the F4:6 advanced lines was tested by protein content, sedimentation test (SDSS) and alveograph (P, L, P/L and W). The values were analyzed in relation to the prolamin composition of the lines. In the ‘Fiel’ x ‘Taber’ population, the Glu-A3 locus showed an influence on SDSS values. Lines carrying the new allele Glu-A3b’ had a significantly higher SDSS value than lines with the Glu-A3f allele. In the ‘Tigre’ x ‘Gazul’ population, the Glu-B1 and Glu-B3 loci also showed an effect on quality parameters, namely SDSS, P and L values. Results indicated that, for SDSS and P, lines with Bx7OE+By8 were significantly better than lines with Bx17+By18, and that lines carrying the Glu-B3ac allele had significantly higher P values and lower L values than lines with the Glu-B3ad allele. The analysis of quality parameters and amplified LMW fragments revealed a significant influence of two peaks (2-616 and 2-636) on P values. The presence of the 2-636 peak gave higher P values than 2-616. These fragments were cloned, sequenced and identified as Glu-B3 genes. The sequence analysis revealed that the molecular difference between them was some SNPs and a small deletion of 21 nucleotides that in the protein would produce an InDel of a heptapeptide in the repetitive region. In this work, the analysis of two crosses with differences in Glu-3 composition made it possible to study the influence of LMW-GS on quality parameters. Specifically, it was possible to assess the influence of Glu-B3, the most interesting and least studied locus. The results have shown that Glu-B3 allele composition influences the alveograph parameter P (tenacity).
The existence of different molecular variants of Glu-B3 alleles has been assessed by using a molecular marker method. This work supports the use of molecular approaches in the study of the very complex LMW-GS family, and validates their application in the analysis of advanced recombinant lines for quality studies.
Abstract:
We have identified and characterized CLARP, a caspase-like apoptosis-regulatory protein. Sequence analysis revealed that human CLARP contains two amino-terminal death effector domains fused to a carboxyl-terminal caspase-like domain. The structure and amino acid sequence of CLARP resemble those of caspase-8, caspase-10, and DCP2, a Drosophila melanogaster protein identified in this study. Unlike caspase-8, caspase-10, and DCP2, however, two important residues predicted to be involved in catalysis were lost in the caspase-like domain of CLARP. Analysis with fluorogenic substrates for caspase activity confirmed that CLARP is catalytically inactive. CLARP was found to interact with caspase-8 but not with FADD/MORT-1, an upstream death effector domain-containing protein of the Fas and tumor necrosis factor receptor 1 signaling pathway. Expression of CLARP induced apoptosis, which was blocked by the viral caspase inhibitor p35, dominant negative mutant caspase-8, and the synthetic caspase inhibitor benzyloxycarbonyl-Val-Ala-Asp-(OMe)-fluoromethylketone (zVAD-fmk). Moreover, CLARP augmented the killing ability of caspase-8 and FADD/MORT-1 in mammalian cells. The human clarp gene maps to 2q33. Thus, CLARP represents a regulator of the upstream caspase-8, which may play a role in apoptosis during tissue development and homeostasis.
Abstract:
Site-directed mutagenesis and combinatorial libraries are powerful tools for providing information about the relationship between protein sequence and structure. Here we report two extensions that expand the utility of combinatorial mutagenesis for the quantitative assessment of hypotheses about the determinants of protein structure. First, we show that resin-splitting technology, which allows the construction of arbitrarily complex libraries of degenerate oligonucleotides, can be used to construct more complex protein libraries for hypothesis testing than can be constructed from oligonucleotides limited to degenerate codons. Second, using eglin c as a model protein, we show that regression analysis of activity scores from library data can be used to assess the relative contributions to the specific activity of the amino acids that were varied in the library. The regression parameters derived from the analysis of a 455-member sample from a library wherein four solvent-exposed sites in an α-helix can contain any of nine different amino acids are highly correlated (P < 0.0001, R² = 0.97) to the relative helix propensities for those amino acids, as estimated by a variety of biophysical and computational techniques.
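The regression idea in this abstract can be illustrated with a minimal Python sketch. It assumes a purely additive model and a complete, balanced toy library, in which case the least-squares estimate of each residue's contribution reduces to the mean activity of the variants carrying that residue minus the grand mean. The two-site library, the three residues and all effect values below are made up for illustration and are not the eglin c data; a sampled library such as the 455-member one would need a proper least-squares fit instead.

```python
from itertools import product
from statistics import mean


def main_effects(library):
    """Estimate additive per-site residue effects from a balanced library.

    library maps a tuple of residues (one per varied site) to an activity
    score. Returns the grand mean and one {residue: effect} dict per site.
    """
    grand = mean(library.values())
    n_sites = len(next(iter(library)))
    effects = []
    for site in range(n_sites):
        scores_by_residue = {}
        for residues, score in library.items():
            scores_by_residue.setdefault(residues[site], []).append(score)
        effects.append(
            {res: mean(vals) - grand for res, vals in scores_by_residue.items()}
        )
    return grand, effects


# Hypothetical additive effects (they sum to zero at each site).
TRUE_EFFECTS = [
    {"A": 0.5, "G": -0.5, "L": 0.0},
    {"A": 0.25, "G": -0.75, "L": 0.5},
]

# Build every variant of the toy two-site library with baseline activity 1.0.
toy_library = {
    combo: 1.0 + sum(TRUE_EFFECTS[i][res] for i, res in enumerate(combo))
    for combo in product("AGL", repeat=2)
}

grand_mean, estimated = main_effects(toy_library)
for site, site_effects in enumerate(estimated, start=1):
    print(site, {res: round(val, 3) for res, val in sorted(site_effects.items())})
```

The recovered effects could then be compared with an independent scale, such as helix propensities, which is the kind of comparison the abstract reports.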
Abstract:
The discovery of cyanobacterial phytochrome histidine kinases, together with the evidence that phytochromes from higher plants display protein kinase activity, bind ATP analogs, and possess C-terminal domains similar to bacterial histidine kinases, has fueled the controversial hypothesis that the eukaryotic phytochrome family of photoreceptors are light-regulated enzymes. Here we demonstrate that purified recombinant phytochromes from a higher plant and a green alga exhibit serine/threonine kinase activity similar to that of phytochrome isolated from dark grown seedlings. Phosphorylation of recombinant oat phytochrome is a light- and chromophore-regulated intramolecular process. Based on comparative protein sequence alignments and biochemical cross-talk experiments with the response regulator substrate of the cyanobacterial phytochrome Cph1, we propose that eukaryotic phytochromes are histidine kinase paralogs with serine/threonine specificity whose enzymatic activity diverged from that of a prokaryotic ancestor after duplication of the transmitter module.
Abstract:
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public databases. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially for biologists who do not have access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
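To give a feel for what a match between a protein sequence and a Prosite pattern involves, here is a minimal Python sketch that converts a PROSITE-style pattern into a regular expression and scans a sequence with it. It is not the Hits implementation. The converter handles only the basic syntax (`x`, `[..]`, `{..}`, repeat counts and `<`/`>` anchors), the function names are hypothetical, and the test sequence is invented. The pattern used is the well-known N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern to a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {ABC} means "any residue except A, B or C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) are repeat counts.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x is the wildcard position.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the sequence termini.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (start, matched_text) for every match, including overlaps."""
    # A lookahead is used so that overlapping matches are all reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# Invented sequence: one valid site (NGSA) and one blocked by proline (NPTA).
hits = scan("MANGSANPTA", "N-{P}-[ST]-{P}")
print(hits)
```

Profiles, as opposed to patterns, are position-specific scoring matrices and would need a scoring step rather than a regular expression.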
New approach for inhibiting Rev function and HIV-1 production using the influenza virus NS1 protein.
Abstract:
The Rev protein of HIV-1, which facilitates the nuclear export of HIV-1 pre-mRNAs, has been a target for antiviral therapy. Here we describe a new strategy for inhibiting Rev function and HIV-1 replication. In contrast to previous approaches, we use a wild-type rather than a mutant Rev protein and covalently link this Rev sequence to the NS1 protein of influenza A virus, a protein that inhibits the nuclear export of mRNAs. The NS1 protein contains an RNA-binding domain mutation (RM), so that the only functional RNA-binding domain in the chimeric protein (NS1RM-Rev) is in the Rev protein sequence. In the presence of the NS1RM-Rev chimeric protein, HIV-1 pre-mRNAs were retained in, rather than exported from, the nucleus. In addition, this chimeric protein effectively inhibited Rev function in trans in transfection experiments and effectively inhibited the production of HIV-1 in tissue culture cells transfected with an infectious molecular clone of HIV-1 DNA. The inhibitory activities of the NS1RM-Rev chimera were at least equivalent to those of the Rev M10 mutant protein, which has been considered to be the prototype trans inhibitor of Rev function and is currently in phase I clinical trials for the treatment of AIDS patients. We discuss (i) the potential for increasing the inhibitory activity of NS1-Rev chimeras against HIV-1 and (ii) the need for additional studies to evaluate these chimeras for the treatment of AIDS.
Abstract:
Recently, a large family of transducer proteins in the Archaeon Halobacterium salinarium was identified. On the basis of the comparison of the predicted structural domains of these transducers, three distinct subfamilies of transducers were proposed. Here we report isolation, complete gene sequences, and analysis of the encoded primary structures of transducer gene htrII, a member of family B, and its blue light receptor gene (sopII) of sensory rhodopsin II (SRII). The start codon ATG of the 714-bp sopII gene is one nucleotide beyond the termination codon TGA of the 2298-bp htrII gene. The deduced protein sequence of HtrII predicts a eubacterial chemotaxis transducer type with two hydrophobic membrane-spanning segments connecting sizable domains in the periplasm and cytoplasm. HtrII has a common feature with HtrI, the sensory rhodopsin I transducer; like HtrI, HtrII possesses a hydrophilic loop structure just after the second transmembrane segment. The C-terminal 299 residues (765 amino acid residues total) of HtrII show strong homology to the signaling and methylation domain of eubacterial transducer Tsr. The hydropathy plot of the primary structure of SRII indicates seven membrane-spanning alpha-helical segments, a characteristic feature of retinylidene proteins ("rhodopsins") from a widespread family of photoactive pigments. SRII shows high identity with SRI (42%), bacteriorhodopsin (BR) (32%), and halorhodopsin (24%). The crucial positions for retinal binding sites in these proteins are nearly identical, with the exception of Met-118 (numbering according to the mature BR sequence), which is replaced by Val in SRII. In BR, residues Asp-85 and Asp-96 are crucial in proton pumping. In SRII, the position corresponding to Asp-85 in BR is conserved, but the corresponding position of Asp-96 is replaced by an aromatic Tyr. 
Coexpression of the htrII and sopII genes restores SRII phototaxis to a mutant (Pho81) that contains a deletion in the htrI/sopI region and an insertion in the htrII/sopII region. This paper describes the first example in which both HtrI and HtrII exist in the same halobacterial cell, confirming that the different sensory rhodopsins SRI and SRII in the same organism have their own distinct transducers.
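A hydropathy plot of the kind mentioned for SRII can be sketched in Python as a sliding-window average over a per-residue hydrophobicity scale. The abstract does not say which scale or window was used, so the Kyte-Doolittle scale, the 19-residue window and the 1.6 cutoff below are assumptions chosen because they are commonly used for spotting membrane-spanning segments. The test sequence is artificial and is not the SRII sequence.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_membrane_windows(sequence, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy reaches the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, value in enumerate(profile) if value >= threshold]


# Artificial sequence: a 19-residue leucine stretch with charged flanks.
toy_sequence = "DDDDD" + "L" * 19 + "KKKKK"
profile = hydropathy_profile(toy_sequence)
print(len(profile), round(max(profile), 2))
print(candidate_membrane_windows(toy_sequence))
```

On a real seven-helix protein, one would look for seven separated runs of high-scoring windows rather than a single peak.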
Abstract:
Life falls into three fundamental domains: Archaea, Bacteria, and Eucarya (formerly archaebacteria, eubacteria, and eukaryotes, respectively). Though Archaea lack nuclei and share many morphological features with Bacteria, molecular analyses, principally of the transcription and translation machineries, have suggested that Archaea are more related to Eucarya than to Bacteria. Currently, little is known about the archaeal cell division apparatus. In Bacteria, a crucial component of the cell division machinery is FtsZ, a GTPase that localizes to a ring at the site of septation. Interestingly, FtsZ is distantly related in sequence to eukaryotic tubulins, which also interact with GTP and are components of the eukaryotic cell cytoskeleton. By screening for the ability to bind radiolabeled nucleotides, we have identified a protein of the hyperthermophilic archaeon Pyrococcus woesei that interacts tightly and specifically with GTP. Furthermore, through screening an expression library of P. woesei genomic DNA, we have cloned the gene encoding this protein. Sequence comparisons reveal that the P. woesei GTP-binding protein is strikingly related in sequence to eubacterial FtsZ and is marginally more similar to eukaryotic tubulins than are bacterial FtsZ proteins. Phylogenetic analyses reinforce the notion that there is an evolutionary linkage between FtsZ and tubulins. These findings suggest that the archaeal cell division apparatus may be fundamentally similar to that of Bacteria and lead us to consider the evolutionary relationships between Archaea, Bacteria, and Eucarya.
Abstract:
The three-dimensional structure of protein kinase C interacting protein 1 (PKCI-1) has been solved to high resolution by x-ray crystallography using single isomorphous replacement with anomalous scattering. The gene encoding human PKCI-1 was cloned from a cDNA library by using a partial sequence obtained from interactions identified in the yeast two-hybrid system between PKCI-1 and the regulatory domain of protein kinase C-beta. The PKCI-1 protein was expressed in Pichia pastoris as a dimer of two 13.7-kDa polypeptides. PKCI-1 is a member of the HIT family of proteins, shown by sequence identity to be conserved in a broad range of organisms including mycoplasma, plants, and humans. Despite the ubiquity of this protein sequence in nature, no distinct function has been shown for the protein product in vitro or in vivo. The PKCI-1 protomer has an alpha+beta meander fold containing a five-stranded antiparallel sheet and two helices. Two protomers come together to form a 10-stranded antiparallel sheet with extensive contacts between a helix and carboxy terminal amino acids of a protomer with the corresponding amino acids in the other protomer. PKCI-1 has been shown to interact specifically with zinc. The three-dimensional structure has been solved in the presence and absence of zinc and in two crystal forms. The structure of human PKCI-1 provides a model of this family of proteins which suggests a stable fold conserved throughout nature.
Abstract:
A novel cDNA, IA-2beta, was isolated from a mouse neonatal brain library. The predicted protein sequence revealed an extracellular domain, a transmembrane region, and an intracellular domain. The intracellular domain is 376 amino acids long and 74% identical to the intracellular domain of IA-2, a major autoantigen in insulin-dependent diabetes mellitus (IDDM). A partial sequence of the extracellular domain of IA-2beta indicates that it differs substantially (only 26% identical) from that of IA-2. Both molecules are expressed in islets and brain tissue. Forty-six percent (23 of 50) of the IDDM sera but none of the sera from normal controls (0 of 50) immunoprecipitated the intracellular domain of IA-2beta. Competitive inhibition experiments showed that IDDM sera have autoantibodies that recognize both common and distinct determinants on IA-2 and IA-2beta. Many IDDM sera are known to immunoprecipitate 37-kDa and 40-kDa tryptic fragments from islet cells, but the identity of the precursor protein(s) has remained elusive. The current study shows that treatment of recombinant IA-2beta and IA-2 with trypsin yields a 37-kDa fragment and a 40-kDa fragment, respectively, and that these fragments can be immunoprecipitated with diabetic sera. Absorption of diabetic sera with unlabeled recombinant IA-2 or IA-2beta, prior to incubation with radiolabeled 37-kDa and 40-kDa tryptic fragments derived from insulinoma or glucagonoma cells, blocks the immunoprecipitation of both of these radiolabeled tryptic fragments. We conclude that IA-2beta and IA-2 are the precursors of the 37-kDa and 40-kDa islet cell autoantigens, respectively, and that both IA-2 and IA-2beta are major autoantigens in IDDM.
Abstract:
Fas, a member of the tumor necrosis factor receptor family, can induce apoptosis when activated by Fas ligand binding or anti-Fas antibody crosslinking. Genetic studies have shown that a defect in Fas-mediated apoptosis resulted in abnormal development and function of the immune system in mice. A point mutation in the cytoplasmic domain of Fas (a single base change from T to A at base 786), replacing isoleucine with asparagine, abolishes the signal transducing property of Fas. Mice homozygous for this mutant allele (lprcg/lprcg mice) develop lymphadenopathy and a lupus-like autoimmune disease. Little is known about the mechanism of signal transduction in Fas-mediated apoptosis. In this study, we used the two-hybrid screen in yeast to isolate a Fas-associated protein factor, FAF1, which specifically interacts with the cytoplasmic domain of wild-type Fas but not the lprcg-mutated Fas protein. This interaction occurs not only in yeast but also in mammalian cells. When transiently expressed in L cells, FAF1 potentiated Fas-induced apoptosis. A search of available DNA and protein sequence data banks did not reveal significant homology between FAF1 and known proteins. Therefore, FAF1 is an unusual protein that binds to the wild type but not the inactive point mutant of Fas. FAF1 potentiates Fas-induced cell killing and is a candidate signal transducing molecule in the regulation of apoptosis.
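The point mutation described above (a single T-to-A change that replaces isoleucine with asparagine) can be checked against the standard genetic code. The following Python sketch, offered as an illustration only, enumerates every isoleucine codon and every single T-to-A substitution in it, and keeps those that yield an asparagine codon. It does not use the Fas sequence itself, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def t_to_a_changes(from_aa, to_aa):
    """List (codon, mutant_codon, position) for single T->A substitutions
    that change from_aa into to_aa. Position is 1-based within the codon."""
    changes = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for index, base in enumerate(codon):
            if base != "T":
                continue
            mutant = codon[:index] + "A" + codon[index + 1:]
            if CODON_TABLE[mutant] == to_aa:
                changes.append((codon, mutant, index + 1))
    return changes


# Isoleucine (I) to asparagine (N) through one T->A change.
ile_to_asn = t_to_a_changes("I", "N")
print(ile_to_asn)
```

Under the standard code, the only routes found are changes at the second codon position (ATT to AAT and ATC to AAC), which is consistent with a single-base T-to-A substitution producing this amino acid replacement.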
Abstract:
An approach was developed for the isolation and characterization of soybean plasma membrane-associated proteins by immunoscreening of a cDNA expression library. An antiserum was raised against purified plasma membrane vesicles. In a differential screening of approximately 500,000 plaque-forming units with the anti-(plasma membrane) serum and DNA probes derived from highly abundant clones isolated in a preliminary screening, 261 clones were selected from approximately 1,200 antiserum-positive plaques. These clones were classified into 40 groups by hybridization analysis and 5'- and 3'-terminal sequencing. By searching nucleic acid and protein sequence data bases, 11 groups of cDNAs were identified, among which valosin-containing protein (VCP), clathrin heavy chain, phospholipase C, and S-adenosylmethionine:delta 24-sterol-C-methyltransferase have not to date been cloned from plants. The remaining 29 groups did not match any current data base entries and may, therefore, represent additional or yet uncharacterized genes. A full-length cDNA encoding the soybean VCP was sequenced. The high level of amino acid identity with vertebrate VCP and yeast CDC48 protein indicates that the soybean protein is a plant homolog of vertebrate VCP and yeast CDC48 protein.
Abstract:
With the completion of the human and mouse genome sequences, the task now turns to identifying their encoded transcripts and assigning gene function. In this study, we have undertaken a computational approach to identify and classify all of the protein kinases and phosphatases present in the mouse gene complement. A nonredundant set of these sequences was produced by mining Ensembl gene predictions and publicly available cDNA sequences with a panel of InterPro domains. This approach identified 561 candidate protein kinases and 162 candidate protein phosphatases. This cohort was then analyzed using TribeMCL protein sequence similarity clustering followed by CLUSTALV alignment and hierarchical tree generation. This approach allowed us to (1) distinguish between true members of the protein kinase and phosphatase families and enzymes of related biochemistry, (2) determine the structure of the families, and (3) suggest functions for previously uncharacterized members. The classifications obtained by this approach were in good agreement with previous schemes and allowed us to demonstrate domain associations with a number of clusters. Finally, we comment on the complementary nature of cDNA and genome-based gene detection and the impact of the FANTOM2 transcriptome project.
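The clustering step can be illustrated with a deliberately simplified stand-in. TribeMCL applies Markov clustering to a sequence similarity graph; the Python sketch below instead uses plain single-linkage grouping (connected components above a similarity cutoff), which is much cruder but shows the general idea of turning pairwise similarities into families. The sequence identifiers, similarity scores and the 0.5 cutoff are all hypothetical.

```python
def cluster_by_similarity(pair_scores, threshold):
    """Group sequences into families by single linkage.

    pair_scores maps (id_a, id_b) to a similarity score. Two sequences end
    up in the same cluster if a chain of pairs scoring >= threshold links
    them. Note: this is NOT Markov clustering, only a simple stand-in.
    """
    parent = {}

    def find(item):
        # Union-find lookup with path halving.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for (id_a, id_b), score in pair_scores.items():
        root_a, root_b = find(id_a), find(id_b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), set()).add(item)
    # Sort for a stable, readable output order.
    return sorted(clusters.values(), key=lambda members: sorted(members))


# Hypothetical pairwise similarities between candidate sequences.
toy_scores = {
    ("kinA", "kinB"): 0.9,
    ("kinB", "kinC"): 0.8,
    ("phosA", "phosB"): 0.85,
    ("kinC", "phosA"): 0.1,
}
families = cluster_by_similarity(toy_scores, threshold=0.5)
print(families)
```

Single linkage tends to merge families through a single spurious high-scoring pair, which is one reason a flow-based method such as Markov clustering is preferred on real data.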
Abstract:
The polypeptide backbones and side chains of proteins are constantly moving due to thermal motion and the kinetic energy of the atoms. The B-factors of protein crystal structures reflect the fluctuation of atoms about their average positions and provide important information about protein dynamics. Computational approaches to predict thermal motion are useful for analyzing the dynamic properties of proteins with unknown structures. In this article, we utilize a novel support vector regression (SVR) approach to predict the B-factor distribution (B-factor profile) of a protein from its sequence. We explore schemes for encoding sequences and various settings for the parameters used in SVR. Based on a large dataset of high-resolution proteins, our method predicts the B-factor distribution with a Pearson correlation coefficient (CC) of 0.53. In addition, our method predicts the B-factor profile with a CC of at least 0.56 for more than half of the proteins. Our method also performs well for classifying residues (rigid vs. flexible). For almost all predicted B-factor thresholds, prediction accuracies (percent of correctly predicted residues) are greater than 70%. These results exceed the best results of other sequence-based prediction methods. (C) 2005 Wiley-Liss, Inc.
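The two evaluation measures quoted in this abstract, the Pearson correlation coefficient between predicted and observed B-factor profiles and the accuracy of a rigid versus flexible classification at a given threshold, can be sketched in Python as follows. This covers the evaluation only, not the SVR predictor. The per-residue values are invented, and the assumption that a residue counts as flexible when its (normalized) B-factor exceeds the threshold is ours, not the authors'.

```python
from math import sqrt


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


def two_state_accuracy(predicted, observed, threshold):
    """Fraction of residues assigned to the same class (flexible if the
    value exceeds the threshold, rigid otherwise) by both profiles."""
    agreements = sum(
        (p > threshold) == (o > threshold)
        for p, o in zip(predicted, observed)
    )
    return agreements / len(observed)


# Invented normalized B-factors for a five-residue toy protein.
observed_b = [0.2, -0.5, 1.3, -1.1, 0.4]
predicted_b = [0.1, -0.2, 0.9, -0.3, -0.1]

cc = pearson_cc(predicted_b, observed_b)
accuracy = two_state_accuracy(predicted_b, observed_b, threshold=0.0)
print(round(cc, 2), accuracy)
```

Repeating the accuracy calculation over a range of thresholds gives the kind of threshold-by-threshold comparison the abstract refers to.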