2 results for molecular recognition
at Universidade Federal do Rio Grande do Norte (UFRN)
Abstract:
Classifying proteins into structural classes, which involves inferring patterns in their 3D conformation, is currently one of the most important open problems in Molecular Biology. The main reason is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory, so the problem has drawn the attention of many researchers in Bioinformatics. Given the large gap between the number of known protein sequences and the number of experimentally determined three-dimensional structures, the demand for automated techniques for the structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential. In this work, the following ML techniques are used to recognize protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. To improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work has imbalanced classes, class-balancing techniques (Random Undersampling, Tomek Links, Condensed Nearest Neighbor (CNN), Neighborhood Cleaning Rule (NCL) and One-Sided Selection (OSS)) are applied to reduce this problem. To evaluate the ML methods, a cross-validation procedure is applied in which the accuracy of the classifiers is measured by the mean classification error rate on independent test sets. These means are compared pairwise using a hypothesis test to determine whether the difference between them is statistically significant.
Among the individual classifiers, Support Vector Machine achieved the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, performance superior or similar to that of the individual classifiers, especially Boosting with Decision Tree and StackingC with Linear Regression as the meta-classifier. The Voting method, despite its simplicity, proved adequate for the problem addressed in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, they did improve the classification error for the minority class. In this respect, the NCL technique proved to be the most appropriate.
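As an illustration of one of the class-balancing techniques named in this abstract, the sketch below finds Tomek links: pairs of samples from different classes that are each other's nearest neighbor. It is a minimal pure-Python sketch, not the implementation used in the work. The function names, the Euclidean distance, and the toy data are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A Tomek link is a pair of mutually nearest neighbors with different labels.
    Euclidean distance is an assumption of this sketch.
    """
    def nearest(i):
        # Index of the closest other point (ties go to the lowest index).
        others = (j for j in range(len(points)) if j != i)
        return min(others, key=lambda j: dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_majority_in_links(points, labels, majority):
    """Undersample by dropping the majority-class member of each Tomek link."""
    drop = {k for pair in tomek_links(points, labels)
            for k in pair if labels[k] == majority}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: class "a" is the majority class.
toy_points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 5.0)]
toy_labels = ["a", "b", "a", "a", "b"]
links = tomek_links(toy_points, toy_labels)
balanced_points, balanced_labels = remove_majority_in_links(
    toy_points, toy_labels, majority="a")
```

Only the majority-class sample in each link is removed, so samples of the minority class are preserved, which is consistent with the abstract's observation that balancing mainly helps the minority-class error.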
Abstract:
Plant metabolism consists of a complex network of physical and chemical events resulting in photosynthesis, respiration, and the synthesis and degradation of organic compounds. This is possible only because of the different kinds of responses to the many environmental variations that plants have been subjected to over the course of evolution, which also allowed them to colonize new environments. The glyoxylate cycle is a metabolic pathway found in plant glyoxysomes that plays a unique role in seedling establishment. Considered a variation of the citric acid cycle, it uses acetyl coenzyme A derived from the beta-oxidation of lipids to synthesize compounds that are used in carbohydrate synthesis. The enzymes malate synthase (MLS) and isocitrate lyase (ICL) are unique to this cycle and essential in regulating carbohydrate biosynthesis. Because of the absence of decarboxylation steps as rate-limiting steps, detailed studies of the molecular phylogeny and evolution of these proteins enable the elucidation of the effects of the presence of this pathway on the evolutionary processes involved in their distribution across the genomes of different plant species. Therefore, the aim of this study was to relate the molecular evolution of the characteristics of the glyoxylate cycle enzymes (isocitrate lyase and malate synthase) to their molecular phylogeny among green plants (Viridiplantae). To this end, amino acid and nucleotide sequences were retrieved from online repositories such as UniProt and GenBank. The sequences were aligned and then analyzed to select the best-fit substitution models. The phylogeny was reconstructed using distance methods (neighbor-joining) and discrete methods (maximum likelihood, maximum parsimony and Bayesian analysis). Structural patterns in the evolution of the enzymes were identified through homology modeling and structure prediction from protein sequences.
Based on comparative analyses of the in silico models and on the results of the phylogenetic inferences, both enzymes show significant structural conservation, and their topologies are consistent with two processes of selection and specialization of the genes. These findings confirm the relevance of new studies to elucidate plant metabolism from an evolutionary perspective.
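Distance methods such as neighbor-joining take a matrix of pairwise evolutionary distances computed from the aligned sequences as input. The sketch below shows how such a matrix could be computed for aligned nucleotide sequences. It assumes the Jukes-Cantor (JC69) correction purely for illustration; the abstract does not state which substitution models were selected as best-fit. The function names and the toy alignment are hypothetical.

```python
from math import log


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69 distance for nucleotides: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * log(1 - 4 * p / 3)


def distance_matrix(seqs):
    """Map each unordered pair of sequence names to its JC69 distance."""
    names = list(seqs)
    return {(m, n): jukes_cantor(p_distance(seqs[m], seqs[n]))
            for i, m in enumerate(names) for n in names[i + 1:]}


# Hypothetical toy alignment (not real ICL or MLS sequences).
toy_alignment = {"sp1": "ACGTACGT", "sp2": "ACGTACGA", "sp3": "ACG-ACTA"}
toy_distances = distance_matrix(toy_alignment)
```

The correction makes the distance grow faster than the raw proportion of differences, to account for multiple substitutions at the same site. Protein alignments would need an amino acid model instead.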