2 results for Specific protein(s)
in Bucknell University Digital Commons - Pennsylvania - USA
Abstract:
Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features actually contribute to the accuracy. We hypothesize that short subsequences (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function first harvests the entire set of 4- to 8-grams from the protein sequences of the different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution resulted in a large increase in the number of discriminatory n-grams harvested. Because the dataset is unbalanced, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to the n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences.
Our method fared well compared to an existing motif-finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The use of amino acid substitution scores for similarity detection and of the dampening factor to normalize the unbalanced datasets has a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
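The scoring steps described in this abstract (harvest n-grams, pool similar n-grams, dampen by class spread, apply a threshold) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the dampening formula, so an IDF-style factor `log(number of classes / classes containing the n-gram)` is assumed here; pooling counts of similar n-grams stands in for the paper's "combining" step; and `POSITIVE_PAIRS` is only a small hand-picked subset of pairs that score positively in BLOSUM62, where a real run would load the full matrix. All function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: a small subset of amino acid pairs with positive BLOSUM62 scores.
POSITIVE_PAIRS = {frozenset(p) for p in ("IL", "IV", "LV", "KR", "DE", "FY", "ST")}


def similar(a, b):
    """True if a and b have equal length and match or substitute positively at every position."""
    return len(a) == len(b) and all(
        x == y or frozenset((x, y)) in POSITIVE_PAIRS for x, y in zip(a, b)
    )


def harvest(seqs, n_min=4, n_max=8):
    """Count every n-gram of length n_min..n_max in a list of sequences."""
    counts = Counter()
    for s in seqs:
        for n in range(n_min, n_max + 1):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    return counts


def pooled_count(gram, counts):
    """Sum the counts of all n-grams similar to gram (quadratic overall; fine for a sketch)."""
    return sum(c for other, c in counts.items() if similar(gram, other))


def discriminative_ngrams(classes, threshold, n_min=4, n_max=8):
    """Return {class: {n-gram: score}} for n-grams scoring at or above threshold."""
    raw = {name: harvest(seqs, n_min, n_max) for name, seqs in classes.items()}
    n_classes = len(raw)
    result = {}
    for name, counts in raw.items():
        total = sum(counts.values()) or 1
        kept = {}
        for gram in counts:
            # Number of classes in which this n-gram (or a similar one) occurs.
            spread = sum(1 for c in raw.values() if pooled_count(gram, c) > 0)
            # Assumed dampening factor: zero when the n-gram occurs in every class.
            damp = math.log(n_classes / spread)
            score = pooled_count(gram, counts) / total * damp
            if score >= threshold:
                kept[gram] = score
        result[name] = kept
    return result


toy = {
    "nuclear": ["MKRKRA", "AKKKRS"],
    "other": ["MGLLVA", "AGILVS"],
}
hits = discriminative_ngrams(toy, threshold=0.2, n_min=4, n_max=4)
print({name: sorted(grams) for name, grams in hits.items()})
```

In the toy data, `KRKR` and `KKKR` differ only by a K/R substitution, so their counts are pooled and both clear the threshold, while n-grams seen only once do not. This mirrors the abstract's observation that substitution increases the number of discriminatory n-grams harvested.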
Abstract:
The TM0727 gene of Thermotoga maritima encodes what has been reported to be a modulator of DNA gyrase (pmbA). Although the function of pmbA is still unknown, it is believed to be involved in cell division, carbon storage regulation, and the synthesis of the antibiotic peptide microcin B17. It is suggested that it serves together with tldD, a known zinc-dependent protease, to modulate DNA gyrase. TM0727 is believed to be a zinc-dependent protease that binds zinc in the central active site of the molecule, located between two equivalent monomeric units. However, the crystal structure determined by Wilson et al. (2005) did not contain zinc. It therefore remains to be seen whether TM0727 requires zinc for activity or regulation, and whether the protein is indeed a protease. To begin studying this protein, the gene was expressed in BL21(DE3) pLysS cells and the induction time was optimized. Using affinity and ion exchange chromatography, the protein has been successfully purified. The purification procedure can be replicated to obtain sufficient protein for characterization. Purification results show that the protein loses stability after 24 hours and remains stable under an imidazole-free lysis workup. Preliminary characterization of TM0727 has focused on understanding the protein’s structural properties through tryptophan fluorescence anisotropy measurements. The four tryptophan residues located within the TM0727 dimer fluoresce at different maximum wavelengths and with different intensities upon excitation with 295 nm light. These emission properties are highly sensitive to the environment (solvent, surrounding residues) of each tryptophan residue. The low number of tryptophans allows for specific monitoring of the protein’s structure as it denatures. As more denaturant is added to the protein, its tryptophan environments are clearly altered. This is indicative of unfolding and increased solvent exposure of the protein.
This unfolding has been confirmed with the addition of a fluorescent quencher. Additionally, fluorescence anisotropy measurements have been carried out on the protein to gain a preliminary understanding of the rotational dynamics of the tryptophan residues. These experiments excite the tryptophan residues within the sample using a polarized light source. Polarized emission is then detected, the degree of which depends on the rotational dynamics and local environment of the tryptophan residues. The protein was denatured and the changes in emission were recorded to detect these structural changes. Results have shown that a large change in quaternary structure, consistent with a dimer-to-monomer transition, occurs at 1.5 M guanidine HCl. The crystal structure has also been examined for the location of a potential active site. The inner cavity of the protein was inspected visually to locate a potential site for a catalytic triad, specifically the amino acids found in the active sites of serine, cysteine, and aspartate proteases. It was found that a potential aspartic protease active site may be located between the Aspartate286 and Aspartate287 residues. Further investigation is warranted to test this remote possibility.
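For readers unfamiliar with the anisotropy measurement mentioned in this abstract, the quantity is conventionally computed from the emission intensities detected parallel and perpendicular to the polarized excitation. The abstract does not state how the values were calculated, so the sketch below simply applies the standard steady-state definition r = (I_par - G * I_perp) / (I_par + 2 * G * I_perp); the function name, the instrument correction factor `g`, and the example intensities are illustrative assumptions, not data from the study.

```python
def anisotropy(i_parallel, i_perpendicular, g=1.0):
    """Steady-state fluorescence anisotropy from polarized emission intensities.

    g is the instrument correction (G) factor; 1.0 assumes equal detection
    sensitivity for both polarizations, which is an assumption of this sketch.
    """
    denominator = i_parallel + 2.0 * g * i_perpendicular
    if denominator == 0:
        raise ValueError("total intensity is zero")
    return (i_parallel - g * i_perpendicular) / denominator


# Hypothetical intensities (arbitrary units), not measurements from the abstract.
print(round(anisotropy(300.0, 200.0), 4))
```

A drop in this value as denaturant is added would indicate faster rotation of the fluorophore, which is the kind of change one would expect to accompany a dimer-to-monomer transition.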