89 results for Protein Sequence Analysis
Abstract:
Dimethyl sulfide dehydrogenase from the purple phototrophic bacterium Rhodovulum sulfidophilum catalyzes the oxidation of dimethyl sulfide to dimethyl sulfoxide. Recent DNA sequence analysis of the ddh operon, encoding dimethyl sulfide dehydrogenase (ddhABC), and biochemical analysis (1) have revealed that it is a member of the DMSO reductase family of molybdenum enzymes and is closely related to respiratory nitrate reductase (NarGHI). Variable temperature X-band EPR spectra (120-122 K) of purified heterotrimeric dimethyl sulfide dehydrogenase showed resonances arising from multiple redox centers, Mo(V), [3Fe-4S](+), [4Fe-4S](+), and a b-type heme. A pH-dependent EPR study of the Mo(V) center in 1H2O and 2H2O revealed the presence of three Mo(V) species in equilibrium, Mo(V)-OH2, Mo(V)-anion, and Mo(V)-OH. Above pH 8.2 the dominant species was Mo(V)-OH. The maximum specific activity occurred at pH 9.27. Comparison of the rhombicity and anisotropy parameters for the Mo(V) species in DMS dehydrogenase with other molybdenum enzymes of the DMSO reductase family showed that it was most similar to the low-pH nitrite spectrum of Escherichia coli nitrate reductase (NarGHI), consistent with previous sequence analysis of DdhA and NarG. A sequence comparison of DdhB and NarH has predicted the presence of four [Fe-S] clusters in DdhB. A [3Fe-4S](+) cluster was identified in dimethyl sulfide dehydrogenase whose properties resembled those of center 2 of NarH. A [4Fe-4S](+) cluster was also identified with unusual spin Hamiltonian parameters, suggesting that one of the iron atoms may have a fifth non-sulfur ligand. The g matrix for this cluster is very similar to that found for the minor conformation of center 1 in NarH [Guigliarelli, B., Asso, M., More, C., Augher, V., Blasco, F., Pommier, J., Giodano, G., and Bertrand, P. (1992) Eur. J. Biochem. 307, 63-68]. Analysis of a ddhC mutant showed that this gene encodes the b-type cytochrome in dimethyl sulfide dehydrogenase. Magnetic circular dichroism studies revealed that the axial ligands to the iron in this cytochrome are a histidine and methionine, consistent with predictions from protein sequence analysis. Redox potentiometry showed that the b-type cytochrome has a high midpoint redox potential (E° = +315 mV, pH 8).
Abstract:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
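The contact number defined in this abstract (the count of C-beta atoms of other residues inside a sphere centred on a residue's own C-beta atom) can be illustrated with a minimal sketch. The function name `contact_numbers`, the toy coordinates, and the default sphere radius are assumptions made for illustration; the abstract does not state the radius the authors used, and this is not their prediction method, only the observed quantity their method aims to predict.

```python
import math


def contact_numbers(cb_coords, radius=10.0):
    """Return, for each residue, how many C-beta atoms of OTHER residues
    lie within `radius` (same units as the coordinates) of its own C-beta.

    `radius` is an illustrative assumption, not a value from the abstract.
    """
    counts = []
    for i, a in enumerate(cb_coords):
        n = 0
        for j, b in enumerate(cb_coords):
            # Skip the residue itself; count neighbours inside the sphere.
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts


# Hypothetical C-beta coordinates: three residues close together, one far away.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (30.0, 0.0, 0.0),
]

print(contact_numbers(toy_coords, radius=8.0))
```

With real data the coordinates would come from a solved structure, and the resulting per-residue counts would serve as the regression targets.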
Abstract:
Epstein-Barr virus (EBV)-encoded oncogene latent membrane protein (LMP) 1, which is consistently expressed in multiple EBV-associated malignancies, has been proposed as a potential target antigen for any future vaccine designed to control these malignancies. However, the high degree of genetic variation in the LMP1 sequence has been considered a major impediment for its use as a potential immunotherapeutic target for the treatment of EBV-associated malignancies. In the present study, we have employed a highly efficient strategy, based on ex vivo functional assays, to conduct an extensive sequence-wide analysis of LMP1-specific T-cell responses in a large panel of healthy virus carriers of diverse ethnic origin and nasopharyngeal carcinoma patients. By comparing the frequencies of T cells specific for overlapping peptides spanning LMP1, we mapped a number of novel HLA class I- and class II-restricted LMP1 T-cell epitopes, including an epitope with dual HLA class I restriction. More importantly, extensive sequence analysis of LMP1 revealed that the majority of the T-cell epitopes were highly conserved in EBV isolates from Caucasian, Papua New Guinean, African, and Southeast Asian populations, while unique geographically constrained genetic variation was observed within one HLA A2 supertype-restricted epitope. These findings indicate that conserved LMP1 epitopes should be considered in designing epitope-based immunotherapeutic strategies against EBV-associated malignancies in different ethnic populations.
Abstract:
The Alzheimer's disease amyloid protein precursor (APP) gene is part of a multi-gene super-family from which sixteen homologous amyloid precursor-like proteins (APLP) and APP species homologues have been isolated and characterised. Comparison of exon structure (including the uncharacterised APL-1 gene), construction of phylogenetic trees, and analysis of the protein sequence alignment of known homologues of the APP super-family were performed to reconstruct the evolution of the family and to assess the functional significance of conserved protein sequences between homologues. This analysis supports an adhesion function for all members of the APP super-family, with specificity determined by those sequences which are not conserved between APLP lineages, and provides evidence for an increasingly complex APP super-family during evolution. The analysis also suggests that Drosophila APPL and Caenorhabditis elegans APL-1 may be a fourth APLP lineage, indicating that these proteins, while not functional homologues of human APP, are similarly likely to regulate cell adhesion. Furthermore, the beta A4 sequence is highly conserved only in APP orthologues, strongly suggesting this sequence is of significant functional importance in this lineage. (C) 2000 Elsevier Science Ltd. All rights reserved.
Abstract:
A genomic region containing the fatty acid biosynthetic (fab) genes was isolated from the sugarcane leaf-scald pathogen Xanthomonas albilineans. The order and predicted products of fabG (beta-ketoacyl reductase), acpP (acyl carrier protein), fabF (ketoacyl synthase II) and downstream genes in X. albilineans are very similar to those in Escherichia coli, with one exception. Sequence analysis, confirmed by insertional knockout and specific substrate feeding experiments, shows that the position occupied by pabC (encoding aminodeoxychorismate lyase) in other bacteria is occupied instead by pabB (encoding aminodeoxychorismate synthase component I) in X. albilineans. Downstream of pabB, X. albilineans resumes the arrangement common to characterized Gram-negative bacteria, with three transcriptionally coupled genes, encoding an ORF340 protein of undefined function, thymidylate kinase and delta' subunit of DNA polymerase III holoenzyme (HolB). Different species may obtain a common advantage from coordinated regulation of the same biosynthetic pathways using different genes in this region. (C) 2000 Federation of European Microbiological Societies. Published by Elsevier Science B.V. All rights reserved.
Abstract:
Within steroid receptor heterocomplexes the large tetratricopeptide repeat-containing immunophilins, cyclophilin 40 (CyP40), FKBP51, and FKBP52, target a common interaction site in heat shock protein 90 (Hsp90) and act coordinately with Hsp90 to modulate receptor activity. The reversible nature of the interaction between the immunophilins and Hsp90 suggests that relative cellular abundance might be a key determinant of the immunophilin component within steroid receptor complexes. To investigate CyP40 gene regulation, we have isolated a fi-kilobase (kb) 5'-flanking region of the human gene and demonstrated that an approximately 50 base pair (bp) sequence adjacent to the transcription start site is essential for CyP40 basal expression. Three tandemly arranged Ets sites within this critical region were identified as binding elements for the multimeric Ets-related transcription factor, GA binding protein (GABP). Functional studies of this proximal promoter sequence, in combination with mutational analysis, confirmed these sites to be crucial for basal promoter function. Furthermore, overexpression of both GABP alpha and GABP beta subunits in Cos1 cells resulted in increased endogenous CyP40 mRNA levels. Significantly, a parallel increase in FKBP52 mRNA expression was not observed, highlighting an important difference in the mode of regulation of the CyP40 and FKBP52 genes. Our results identify GABP as a key regulator of CyP40 expression. GABP is a common target of mitogen and stress-activated pathways and may integrate these diverse extracellular signals to regulate CyP40 gene expression.
Abstract:
Fragile sites appear visually as nonstaining gaps on chromosomes that are inducible by specific cell culture conditions. Expansion of CGG/CCG repeats has been shown to be the molecular basis of all five folate-sensitive fragile sites characterized molecularly so far, i.e., FRAXA, FRAXE, FRAXF, FRA11B, and FRA16A. In the present study we have refined the localization of the FRA10A folate-sensitive fragile site by fluorescence in situ hybridization. Sequence analysis of a BAC clone spanning FRA10A identified a single, imperfect, but polymorphic CGG repeat that is part of a CpG island in the 5'UTR of a novel gene named FRA10AC1. The number of CGG repeats varied in the population from 8 to 13. Expansions exceeding 200 repeat units were methylated in all FRA10A fragile site carriers tested. The FRA10AC1 gene consists of 19 exons and is transcribed in the centromeric direction from the FRA10A repeat. The major transcript of approximately 1450 nt is ubiquitously expressed and codes for a highly conserved protein, FRA10AC1, of unknown function. Several splice variants leading to alternative 3' ends were identified (particularly in testis). These give rise to FRA10AC1 proteins with altered COOH-termini. Immunofluorescence analysis of full-length, recombinant EGFP-tagged FRA10AC1 protein showed that it was present exclusively in the nucleoplasm. We show that the expression of FRA10A, in parallel to the other cloned folate-sensitive fragile sites, is caused by an expansion and subsequent methylation of an unstable CGG trinucleotide repeat. Taking advantage of three cSNPs within the FRA10AC1 gene we demonstrate that one allele of the gene is not transcribed in a FRA10A carrier. Our data also suggest that in the heterozygous state FRA10A is likely a benign folate-sensitive fragile site. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
This report describes the identification of a murine cytomegalovirus (MCMV) G protein-coupled receptor (GCR) homolog. This open reading frame (M33) is most closely related to, and collinear with, human cytomegalovirus UL33, and homologs are also present in human herpesvirus 6 and 7 (U12 for both viruses). Conserved counterparts in the sequenced alpha- or gammaherpesviruses have not been identified to date, suggesting that these genes encode proteins which are important for the biological characteristics of betaherpesviruses. We have detected transcripts for both UL33 and M33 as early as 3 or 4 h postinfection, and these reappear at late times. In addition, we have identified N-terminal splicing for both the UL33 and M33 RNA transcripts. For both open reading frames, splicing results in the introduction of amino acids which are highly conserved among known GCRs. To characterise the function of M33 in the natural host, two independent MCMV recombinant viruses were prepared, each of which possesses an M33 open reading frame which has been disrupted with the beta-galactosidase gene. While the recombinant M33 null viruses showed no phenotypic differences in replication from wild-type MCMV in primary mouse embryo fibroblasts in vitro, they showed severely restricted growth in the salivary glands of infected mice. These data suggest that M33 plays an important role in vivo, in particular in the dissemination to or replication in the salivary gland, and provide the first evidence for the function of a viral GCR homolog in vivo.
Abstract:
A general overview of the protein sequence set for the mouse transcriptome produced during the FANTOM2 sequencing project is presented here. We applied different algorithms to characterize protein sequences derived from a nonredundant representative protein set (RPS) and a variant protein set (VPS) of the mouse transcriptome. The functional characterization and assignment of Gene Ontology terms was done by analysis of the proteome using InterPro. The Superfamily database analyses gave a detailed structural classification according to SCOP and provide additional evidence for the functional characterization of the proteome data. The MDS database analysis revealed new domains which are not presented in existing protein domain databases. Thus the transcriptome gives us a unique source of data for the detection of new functional groups. The data obtained for the RPS and VPS sets facilitated the comparison of different patterns of protein expression. A comparison of other existing mouse and human protein sequence sets (e.g., the International Protein Index) demonstrates the common patterns in mammalian proteomes. The analysis of the membrane organization within the transcriptome of multiple eukaryotes provides valuable statistics about the distribution of secretory and transmembrane proteins.
Abstract:
Four male cone-specific promoters were isolated from the genome of Pinus radiata D. Don, fused to the beta-glucuronidase (GUS) reporter gene and analysed in the heterologous host Arabidopsis thaliana (L.) Heynh. The temporal and spatial activities of the promoters PrCHS1, PrLTP2, PrMC2 and PrMALE1 during seven anther developmental stages are described in detail. The two promoters PrMC2 and PrMALE1 confer an identical GUS expression pattern on Arabidopsis anthers. DNA sequence analysis of the PrMC2 and PrMALE1 promoters revealed an 88% sequence identity over 276 bp and divergence further upstream (
Abstract:
Classic Hodgkin's lymphoma (HL) tissue contains a small population of morphologically distinct malignant cells called Hodgkin and Reed-Sternberg (HRS) cells, associated with the development of HL. Using 3'-rapid amplification of cDNA ends (RACE) we identified an alternative mRNA for the DEC-205 multilectin receptor in the HRS cell line L428. Sequence analysis revealed that the mRNA encodes a fusion protein between DEC-205 and a novel C-type lectin DCL-1. Although the 7.5-kb DEC-205 and 4.2-kb DCL-1 mRNA were expressed independently in myeloid and B lymphoid cell lines, the DEC-205/DCL-1 fusion mRNA (9.5 kb) predominated in the HRS cell lines (L428, KM-H2, and HDLM-2). The DEC-205 and DCL-1 genes comprising 35 and 6 exons, respectively, are juxtaposed on chromosome band 2q24 and separated by only 5.4 kb. We determined the DCL-1 transcription initiation site within the intervening sequence by 5'-RACE, confirming that DCL-1 is an independent gene. Two DEC-205/DCL-1 fusion mRNA variants may result from cotranscription of DEC-205 and DCL-1, followed by splicing DEC-205 exon 35 or 34-35 along with DCL-1 exon 1. The resulting reading frames encode the DEC-205 ectodomain plus the DCL-1 ectodomain, the transmembrane, and the cytoplasmic domain. Using DCL-1 cytoplasmic domain-specific polyclonal and DEC-205 monoclonal antibodies for immunoprecipitation/Western blot analysis, we showed that the fusion mRNA is translated into a DEC-205/DCL-1 fusion protein, expressed in the HRS cell lines. These results imply an unusual transcriptional control mechanism in HRS cells, which cotranscribe an mRNA containing DEC-205 and DCL-1 prior to generating the intergenically spliced mRNA to produce a DEC-205/DCL-1 fusion protein.
Abstract:
The polypeptide backbones and side chains of proteins are constantly moving due to thermal motion and the kinetic energy of the atoms. The B-factors of protein crystal structures reflect the fluctuation of atoms about their average positions and provide important information about protein dynamics. Computational approaches to predict thermal motion are useful for analyzing the dynamic properties of proteins with unknown structures. In this article, we utilize a novel support vector regression (SVR) approach to predict the B-factor distribution (B-factor profile) of a protein from its sequence. We explore schemes for encoding sequences and various settings for the parameters used in SVR. Based on a large dataset of high-resolution proteins, our method predicts the B-factor distribution with a Pearson correlation coefficient (CC) of 0.53. In addition, our method predicts the B-factor profile with a CC of at least 0.56 for more than half of the proteins. Our method also performs well for classifying residues (rigid vs. flexible). For almost all predicted B-factor thresholds, prediction accuracies (percent of correctly predicted residues) are greater than 70%. These results exceed the best results of other sequence-based prediction methods. (C) 2005 Wiley-Liss, Inc.
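The two evaluation measures quoted in this abstract, the Pearson correlation coefficient (CC) between predicted and observed B-factor profiles and the fraction of residues correctly classified as rigid or flexible at a threshold, can be sketched as follows. The function names, the toy profiles, and the convention that a value at or above the threshold means "flexible" are assumptions made for illustration; the abstract does not give the authors' exact definitions or data.

```python
import math


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("CC is undefined for a constant profile")
    return cov / math.sqrt(var_p * var_o)


def classification_accuracy(predicted, observed, threshold):
    """Fraction of residues whose rigid/flexible label agrees.

    Assumed convention: a value >= threshold is labelled flexible.
    """
    agree = sum(
        (p >= threshold) == (o >= threshold)
        for p, o in zip(predicted, observed)
    )
    return agree / len(predicted)


# Hypothetical normalized B-factor profiles for a four-residue fragment.
toy_predicted = [0.2, 0.8, 0.5, 0.1]
toy_observed = [0.3, 0.9, 0.4, 0.6]

print(pearson_cc(toy_predicted, toy_observed))
print(classification_accuracy(toy_predicted, toy_observed, threshold=0.5))
```

Computing the CC per protein and then summarising over a dataset is one way to arrive at statements such as "a CC of at least 0.56 for more than half of the proteins"; whether the authors aggregated in exactly this way is not stated in the abstract.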
Abstract:
Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids in the protein sequence. Similarities in the hydropathy distributions are obvious for homologous proteins within a protein family. They also were observed for proteins with related structures, even when sequence similarities were undetectable. Here we present a novel method that employs the hydropathy distribution in proteins for identification of (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, represented by vectors of specifically defined features. The features are derived from hydropathy of the individual amino acids. Projection of this space onto principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by the analysis of hydropathy distribution, without the need for sequence alignment. (C) 2005 Wiley-Liss, Inc.
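As a rough illustration of what a hydropathy distribution along a sequence looks like, the sketch below computes a sliding-window average of per-residue hydropathy values. It assumes the Kyte-Doolittle scale and a simple fixed window, neither of which is named in the abstract; the specifically defined features and the projection onto principal axes used by the authors are not reproduced here, and `hydropathy_profile` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy scale (an assumed choice; the abstract names no scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=3):
    """Sliding-window mean hydropathy along a protein sequence.

    Returns one value per window position (len(sequence) - window + 1 values).
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical short peptide: a hydrophobic stretch followed by charged residues.
print(hydropathy_profile("ILVKDE", window=3))
```

Feature vectors summarising such profiles could then be compared across proteins without aligning the sequences, which is the general idea the abstract describes.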
Abstract:
A key component of the venom of many Australian snakes belonging to the elapid family is a toxin that is structurally and functionally similar to that of the mammalian prothrombinase complex. In mammals, this complex is responsible for the cleavage of prothrombin to thrombin and is composed of factor Xa in association with its cofactors calcium, phospholipids, and factor Va. The snake prothrombin activators have been classified on the basis of their requirement for cofactors for activity. The two major subgroups described in Australian elapid snakes, groups C and D, are differentiated by their requirement for mammalian coagulation factor Va. In this study, we describe the cloning, characterization, and comparative analysis of the factor X- and factor V-like components of the prothrombin activators from the venom glands of snakes possessing either group C or D prothrombin activators. The overall domain arrangement in these proteins was highly conserved between all elapids and with the corresponding mammalian clotting factors. The deduced protein sequence for the factor X-like protease precursor, identified in elapids containing either group C or D prothrombin activators, demonstrated a remarkable degree of relatedness to each other (80%-97%). The factor V-like component of the prothrombin activator, present only in snakes containing group C complexes, also showed a very high degree of homology (96%-98%). Expression of both the factor X- and factor V-like proteins determined by immunoblotting provided an additional means of separating these two groups at the molecular level. The molecular phylogenetic analysis described here represents a new approach for distinguishing group C and D snake prothrombin activators and correlates well with previous classifications.