34 results for alignment-free methods
Abstract:
G-protein coupled receptors (GPCRs) are a superfamily of integral membrane proteins responsible for a large number of physiological functions. Approximately 50% of marketed drugs are targeted toward a GPCR. Despite showing a high degree of structural homology, there is a large variance in sequence within the GPCR superfamily, which has led to difficulties in identifying and classifying potential new GPCR proteins. Here the various computational techniques that can be used to characterize a novel GPCR protein are discussed, including both alignment-based and alignment-free approaches. In addition, the application of homology modeling to building the three-dimensional structures of GPCRs is described.
Abstract:
MOTIVATION: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but in characterizing the interrelationships between known GPCRs. RESULTS: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm. This represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.
Abstract:
The G-protein coupled receptor (GPCR) superfamily fulfils various metabolic functions and interacts with a diverse range of ligands. There is a lack of sequence similarity between the six classes that comprise the GPCR superfamily. Moreover, most novel GPCRs found have low sequence similarity to other family members, which makes it difficult to infer properties from related receptors. Many different approaches have been taken towards developing efficient and accurate methods for GPCR classification, ranging from motif-based systems to machine learning as well as a variety of alignment-free techniques based on the physicochemical properties of their amino acid sequences. This review describes the inherent difficulties in developing a GPCR classification algorithm and includes techniques previously employed in this area.
Abstract:
Background: Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach, such as speed and cost efficiency, its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Results: Bacterial, viral and tumour protein datasets were used to derive models for prediction of whole protein antigenicity. Every set consisted of 100 known antigens and 100 non-antigens. The derived models were tested by internal leave-one-out cross-validation and external validation using test sets. An additional five training sets for each class of antigens were used to test the stability of the discrimination between antigens and non-antigens. The models performed well in both validations, showing prediction accuracy of 70% to 89%. The models were implemented in a server, which we call VaxiJen. Conclusion: VaxiJen is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification solely based on the physicochemical properties of proteins without recourse to sequence alignment. The server can be used on its own or in combination with alignment-based prediction methods.
Abstract:
Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function, and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Results: Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from “first passage probability distribution” to summarize statistics of ensemble averaged amino acid propensity values. In this paper, we introduce and elaborate this approach.
Abstract:
Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex Alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences. Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and converted into uniform vectors by auto- and cross-covariance (ACC) transformation. Each protein was represented as a vector of 45 variables. Five machine learning methods for classification were applied in the study to derive models for allergen prediction. The methods were: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. In comparison to other servers for allergen prediction, AllerTOP outperforms them with 94% sensitivity. Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins. Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin. © 2013 Dimitrov et al.; licensee BioMed Central Ltd.
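The ACC step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used definition in which each auto- or cross-covariance term is the lag-averaged product of two descriptor values; with three descriptors and lags 1 to 5 this gives 3 × 3 × 5 = 45 variables, matching the vector length quoted. The function name `acc_transform`, the choice of five lags, and the descriptor table `TOY_DESCRIPTORS` are assumptions for illustration; the toy values are invented and are not the published z-scales.

```python
from typing import Dict, List, Sequence

# Invented descriptor values for a four-letter toy alphabet (NOT real z-scales).
TOY_DESCRIPTORS: Dict[str, Sequence[float]] = {
    "A": (0.1, -1.0, 0.5),
    "C": (0.7, -0.3, 1.2),
    "D": (2.0, 0.9, -0.4),
    "E": (1.5, 0.2, -1.1),
}


def acc_transform(
    seq: str,
    descriptors: Dict[str, Sequence[float]],
    max_lag: int = 5,
) -> List[float]:
    """Map a sequence of any length to a fixed-length ACC vector.

    For descriptors j, k and lag l the term is
        sum_i z[i][j] * z[i + l][k] / (n - l)
    where j == k gives auto-covariance and j != k cross-covariance.
    """
    n_desc = len(next(iter(descriptors.values())))
    z = [descriptors[aa] for aa in seq]  # KeyError on unknown residues
    n = len(z)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features: List[float] = []
    for j in range(n_desc):
        for k in range(n_desc):
            for lag in range(1, max_lag + 1):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


short_vec = acc_transform("ACDEACDE", TOY_DESCRIPTORS)
long_vec = acc_transform("ACDEACDEACDEEDCA", TOY_DESCRIPTORS)
print(len(short_vec), len(long_vec))
```

Sequences of different lengths map to vectors of the same length, which is what allows a standard classifier such as kNN to be applied without any alignment.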
Abstract:
The G-protein coupled receptors (GPCRs) comprise simultaneously one of the largest and one of the most multi-functional protein families known to modern-day molecular bioscience. From a drug discovery and pharmaceutical industry perspective, the GPCRs constitute one of the most commercially and economically important groups of proteins known. The GPCRs undertake numerous vital metabolic functions and interact with a hugely diverse range of small and large ligands. Many different methodologies have been developed to efficiently and accurately classify the GPCRs. These range from motif-based techniques to machine learning as well as a variety of alignment-free techniques based on the physicochemical properties of sequences. We review here the available methodologies for the classification of GPCRs. Part of this work focuses on how we have tried to build the intrinsically hierarchical nature of sequence relations, implicit within the family, into an adaptive approach to classification. Importantly, we also allude to some of the key innate problems in developing an effective approach to classifying the GPCRs: the lack of sequence similarity between the six classes that comprise the GPCR family and the low sequence similarity to other family members evinced by many newly revealed members of the family.
Abstract:
Phosphatase and tensin homolog (PTEN) is a redox-sensitive, dual-specificity protein phosphatase involved in regulating a number of cellular processes including metabolism, apoptosis, cell proliferation and survival. It acts as a tumor suppressor by negatively regulating the PI3K/Akt pathway. While direct evidence of a redox regulation of PTEN downstream signaling has been reported, the effect of cellular oxidative stress or direct PTEN oxidation on the PTEN interactome is still poorly defined. To investigate this, PTEN-GST fusion protein was prepared in its reduced form and an H2O2-oxidized form that was reversible by DTT treatment, and these were immobilized on a glutathione-sepharose-based support. The immobilized protein was incubated with cell lysate to capture interacting proteins. Captured proteins were eluted from the beads, analyzed by LC-MSMS and comparatively quantified using label-free methods. After subtraction of interactors that were also present in the resin and GST controls, 97 individual protein interactors were identified, including several that are novel. Fourteen interactors that varied significantly with the redox status of PTEN were identified, including thioredoxin and peroxiredoxin-1. Except for one interactor, their binding was higher for oxidized PTEN. Using western blotting, altered binding to PTEN was confirmed for 3 selected interactors (Prdx1, Trx, and Anxa2) and DDB1 was validated as a novel interactor with unaltered binding. Our results suggest that the redox status of PTEN causes a functional variation in the PTEN interactome which is important for the cellular function of PTEN. The resin capture method developed had distinct advantages in that the redox status of PTEN could be directly controlled and measured.
Abstract:
Accurate protein structure prediction remains an active objective of research in bioinformatics. Membrane proteins comprise approximately 20% of most genomes. They are, however, poorly tractable targets of experimental structure determination. Their analysis using bioinformatics thus makes an important contribution to their on-going study. Using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we have addressed the alignment-free discrimination of membrane from non-membrane proteins. The method successfully identifies prokaryotic and eukaryotic α-helical membrane proteins at 94.4% accuracy, β-barrel proteins at 72.4% accuracy, and distinguishes assorted non-membranous proteins with 85.9% accuracy. The method here is an important potential advance in the computational analysis of membrane protein structure. It represents a useful tool for the characterisation of membrane proteins with a wide variety of potential applications.
Abstract:
MOTIVATION: There is much interest in reducing the complexity inherent in the representation of the 20 standard amino acids within bioinformatics algorithms by developing a so-called reduced alphabet. Although there is no universally applicable residue grouping, there are numerous physicochemical criteria upon which one can base groupings. Local descriptors are a form of alignment-free analysis, the efficiency of which is dependent upon the correct selection of amino acid groupings. RESULTS: Within the context of G-protein coupled receptor (GPCR) classification, an optimization algorithm was developed, which was able to identify the most efficient grouping when used to generate local descriptors. The algorithm was inspired by the relatively new computational intelligence paradigm of artificial immune systems. A number of amino acid groupings produced by this algorithm were evaluated with respect to their ability to generate local descriptors capable of providing an accurate classification algorithm for GPCRs.
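As a rough illustration of what a reduced alphabet does, the sketch below maps the 20 standard residues onto six groups and computes the per-group composition of a sequence. The grouping `EXAMPLE_GROUPS` is a hypothetical, textbook-style partition chosen for this example; it is not one of the groupings produced by the optimization algorithm in the abstract, and group composition is only the simplest kind of descriptor, not the full local descriptors the authors refer to.

```python
import string
from collections import Counter
from typing import Dict, List

# Hypothetical grouping by broad physicochemical class (an assumption).
EXAMPLE_GROUPS: Dict[str, str] = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}


def build_residue_map(groups: Dict[str, str]) -> Dict[str, str]:
    """Assign each group a symbol (a, b, c, ...) in insertion order."""
    mapping: Dict[str, str] = {}
    for symbol, residues in zip(string.ascii_lowercase, groups.values()):
        for residue in residues:
            if residue in mapping:
                raise ValueError(f"residue {residue} is in more than one group")
            mapping[residue] = symbol
    return mapping


def reduce_sequence(seq: str, groups: Dict[str, str]) -> str:
    """Rewrite a protein sequence in the reduced alphabet."""
    residue_map = build_residue_map(groups)
    # Non-standard residues such as X raise KeyError in this sketch.
    return "".join(residue_map[aa] for aa in seq.upper())


def group_composition(seq: str, groups: Dict[str, str]) -> List[float]:
    """Fraction of the sequence falling in each group, in group order."""
    if not seq:
        raise ValueError("empty sequence")
    reduced = reduce_sequence(seq, groups)
    counts = Counter(reduced)
    symbols = string.ascii_lowercase[: len(groups)]
    return [counts[s] / len(reduced) for s in symbols]


print(reduce_sequence("AVFKDG", EXAMPLE_GROUPS))
print(group_composition("AVFK", EXAMPLE_GROUPS))
```

Changing the partition in `EXAMPLE_GROUPS` changes every downstream feature, which is why the choice of grouping is treated in the abstract as something to optimize.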
Abstract:
Phosphatase and tensin homolog (PTEN) is involved in a number of different cellular processes including metabolism, apoptosis, cell proliferation and survival. It is a redox-sensitive dual-specificity protein phosphatase that acts as a tumor suppressor by negatively regulating the PI3K/Akt pathway. While direct evidence of redox regulation of PTEN downstream signaling has been reported, the effect of PTEN redox status on its protein-protein interactions is poorly understood. PTEN-GST in its reduced and a DTT-reversible H2O2-oxidized form was immobilized on a glutathione-sepharose support and incubated with cell lysate to capture interacting proteins. Captured proteins were analyzed by LC-MSMS and comparatively quantified using label-free methods. Ninety-seven potential protein interactors were identified, including a significant number that are novel. The abundance of fourteen interactors was found to vary significantly with the redox status of PTEN. Altered binding to PTEN was confirmed by affinity pull-down and Western blotting for Prdx1, Trx, and Anxa2, while DDB1 was validated as a novel interactor with unaltered binding. These results suggest that the redox status of PTEN causes a functional variation in the PTEN interactome. The resin capture method developed had distinct advantages in that the redox status of PTEN could be directly controlled and measured.
Abstract:
The research described in this PhD thesis focuses on proteomics approaches to study the effect of oxidation on the modification status and protein-protein interactions of PTEN, a redox-sensitive phosphatase involved in a number of cellular processes including metabolism, apoptosis, cell proliferation, and survival. While direct evidence of a redox regulation of PTEN and its downstream signaling has been reported, the effect of cellular oxidative stress or direct PTEN oxidation on PTEN structure and interactome is still poorly defined. In a first study, GST-tagged PTEN was directly oxidized over a range of hypochlorous acid (HOCl) concentrations and assayed for phosphatase activity, and oxidative post-translational modifications (oxPTMs) were quantified using LC-MS/MS-based label-free methods. In a second study, GST-tagged PTEN was prepared in a reduced and reversibly H2O2-oxidized form, immobilized on a resin support and incubated with HCT116 cell lysate to capture PTEN-interacting proteins, which were analyzed by LC-MS/MS and comparatively quantified using label-free methods. In parallel experiments, HCT116 cells transfected with a GFP-tagged PTEN were treated with H2O2 and PTEN-interacting proteins immunoprecipitated using standard methods. Several high abundance HOCl-induced oxPTMs were mapped, including those taking place at amino acids known to be important for PTEN phosphatase activity and protein-protein interactions, such as Met35, Tyr155, Tyr240 and Tyr315. A PTEN redox interactome was also characterized, which identified a number of PTEN-interacting proteins that vary with the reversible inactivation of PTEN caused by H2O2 oxidation. These included new PTEN interactors as well as the redox proteins peroxiredoxin-1 (Prdx1) and thioredoxin (Trx), which are known to be involved in the recycling of the PTEN active site following H2O2-induced reversible inactivation. The results suggest that the oxidative modification of PTEN causes functional alterations in PTEN structure and interactome, with fundamental implications for the PTEN signaling role in many cellular processes, such as those involved in the pathophysiology of disease and ageing.
Abstract:
In inflammatory diseases, release of oxidants leads to oxidative damage to biomolecules. HOCl (hypochlorous acid), released by the myeloperoxidase/H2O2/Cl- system, can cause formation of phospholipid chlorohydrins, or alpha-chloro-fatty aldehydes from plasmalogens. It can attack several amino acid residues in proteins, causing post-translational oxidative modifications of proteins, but the formation of 3-chlorotyrosine is one of the most stable markers of HOCl-induced damage. Soft-ionization MS has proved invaluable for detecting the occurrence of oxidative modifications to both phospholipids and proteins, and characterizing the products generated by HOCl-induced attack. For both phospholipids and proteins, the application of advanced mass spectrometric methods such as product or precursor ion scanning and neutral loss analysis can yield information both about the specific nature of the oxidative modification and the biomolecule modified. The ideal is to be able to apply these methods to complex biological or clinical samples, to determine the site-specific modifications of particular cellular components. This is important for understanding disease mechanisms and offers potential for development of novel biomarkers of inflammatory diseases. In the present paper, we review some of the progress that has been made towards this goal.
Abstract:
Matrix application continues to be a critical step in sample preparation for matrix-assisted laser desorption/ionization (MALDI) mass spectrometry imaging (MSI). Imaging of small molecules such as drugs and metabolites is particularly problematic because the commonly used washing steps to remove salts are usually omitted as they may also remove the analyte, and analyte spreading is more likely with conventional wet matrix application methods. We have developed a method which uses the application of matrix as a dry, finely divided powder, here referred to as dry matrix application, for the imaging of drug compounds. This appears to offer a complementary method to wet matrix application for the MALDI-MSI of small molecules, with the alternative matrix application techniques producing different ion profiles, and allows the visualization of compounds not observed using wet matrix application methods. We demonstrate its value in imaging clozapine from rat kidney and 4-bromophenyl-1,4-diazabicyclo(3.2.2)nonane-4-carboxylic acid from rat brain. In addition, exposure of the dry matrix coated sample to a saturated moist atmosphere appears to enhance the visualization of a different set of molecules.
Abstract:
Using current software engineering technology, the robustness required for safety critical software is not assurable. However, different approaches are possible which can help to assure software robustness to some extent. For achieving high reliability software, methods should be adopted which avoid introducing faults (fault avoidance); then testing should be carried out to identify any faults which persist (error removal). Finally, techniques should be used which allow any undetected faults to be tolerated (fault tolerance). The verification of correctness in system design specification and performance analysis of the model are the basic issues in concurrent systems. In this context, modeling distributed concurrent software is one of the most important activities in the software life cycle, and communication analysis is a primary consideration to achieve reliability and safety. By and large, fault avoidance requires human analysis, which is error prone; by reducing human involvement in the tedious aspects of modeling and analysis of the software, it is hoped that fewer faults will persist into its implementation in the real-time environment. The Occam language supports concurrent programming and is a language where interprocess interaction takes place by communications. This may lead to deadlock due to communication failure. Proper systematic methods must be adopted in the design of concurrent software for distributed computing systems if the communication structure is to be free of pathologies, such as deadlock. The objective of this thesis is to provide a design environment which ensures that processes are free from deadlock. A software tool was designed and used to facilitate the production of fault-tolerant software for distributed concurrent systems. Where Occam is used as a design language, state space methods, such as Petri-nets, can be used in analysis and simulation to determine the dynamic behaviour of the software, and to identify structures which may be prone to deadlock so that they may be eliminated from the design before the program is ever run. This design software tool consists of two parts. The first takes an input program and translates it into a mathematical model (Petri-net), which is used for modeling and analysis of the concurrent software. The second part is the Petri-net simulator, which takes the translated program as its input and runs a simulation to generate the reachability tree. The tree identifies 'deadlock potential' which the user can explore further. Finally, the software tool has been applied to a number of Occam programs. Two examples were taken to show how the tool works in the early design phase for fault prevention before the program is ever run.
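The reachability analysis described in the abstract above can be sketched for a small, bounded Petri net. This is a generic breadth-first exploration under stated assumptions, not the thesis tool: the names `explore`, `DEADLOCKING_NET` and `SAFE_NET` are hypothetical, the nets are hand-written rather than translated from Occam, and a marking is reported as a deadlock when no transition is enabled and it is not a designated final marking.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Marking = Tuple[int, ...]              # tokens per place
Transition = Tuple[Marking, Marking]   # (tokens consumed, tokens produced)


def enabled(marking: Marking, transition: Transition) -> bool:
    return all(have >= need for have, need in zip(marking, transition[0]))


def fire(marking: Marking, transition: Transition) -> Marking:
    consume, produce = transition
    return tuple(h - c + p for h, c, p in zip(marking, consume, produce))


def explore(
    initial: Marking,
    transitions: Dict[str, Transition],
    final: Set[Marking],
    limit: int = 10_000,
) -> Tuple[Set[Marking], List[Marking]]:
    """Return (reachable markings, deadlocked markings) for a bounded net."""
    seen = {initial}
    queue = deque([initial])
    deadlocks: List[Marking] = []
    while queue:
        marking = queue.popleft()
        successors = [
            fire(marking, t) for t in transitions.values() if enabled(marking, t)
        ]
        if not successors and marking not in final:
            deadlocks.append(marking)
        for nxt in successors:
            if nxt not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("state limit reached; net may be unbounded")
                seen.add(nxt)
                queue.append(nxt)
    return seen, deadlocks


# Places: (P0, P1, P2, Q0, Q1, Q2) for two processes P and Q on channels a, b.
START: Marking = (1, 0, 0, 1, 0, 0)
DONE: Marking = (0, 0, 1, 0, 0, 1)

# P: output on a, then input on b.  Q: output on b, then input on a.
# Each synchronous communication needs both parties ready, so neither can fire.
DEADLOCKING_NET: Dict[str, Transition] = {
    "comm_a": ((1, 0, 0, 0, 1, 0), (0, 1, 0, 0, 0, 1)),
    "comm_b": ((0, 1, 0, 1, 0, 0), (0, 0, 1, 0, 1, 0)),
}

# Q reordered: input on a, then output on b.
SAFE_NET: Dict[str, Transition] = {
    "comm_a": ((1, 0, 0, 1, 0, 0), (0, 1, 0, 0, 1, 0)),
    "comm_b": ((0, 1, 0, 0, 1, 0), (0, 0, 1, 0, 0, 1)),
}

print(explore(START, DEADLOCKING_NET, {DONE})[1])
print(explore(START, SAFE_NET, {DONE})[1])
```

In the first net both processes try to output first, so the initial marking is already dead. Reordering one process removes the deadlock, which is the kind of design-time correction the abstract describes.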