3 resultados para Random Forests Classifier

em Universidade Federal do Rio Grande do Norte(UFRN)


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The power-law size distributions obtained experimentally for neuronal avalanches are an important evidence of criticality in the brain. This evidence is supported by the fact that a critical branching process exhibits the same exponent t~3=2. Models at criticality have been employed to mimic avalanche propagation and explain the statistics observed experimentally. However, a crucial aspect of neuronal recordings has been almost completely neglected in the models: undersampling. While in a typical multielectrode array hundreds of neurons are recorded, in the same area of neuronal tissue tens of thousands of neurons can be found. Here we investigate the consequences of undersampling in models with three different topologies (two-dimensional, small-world and random network) and three different dynamical regimes (subcritical, critical and supercritical). We found that undersampling modifies avalanche size distributions, extinguishing the power laws observed in critical systems. Distributions from subcritical systems are also modified, but the shape of the undersampled distributions is more similar to that of a fully sampled system. Undersampled supercritical systems can recover the general characteristics of the fully sampled version, provided that enough neurons are measured. Undersampling in two-dimensional and small-world networks leads to similar effects, while the random network is insensitive to sampling density due to the lack of a well-defined neighborhood. We conjecture that neuronal avalanches recorded from local field potentials avoid undersampling effects due to the nature of this signal, but the same does not hold for spike avalanches. We conclude that undersampled branching-process-like models in these topologies fail to reproduce the statistics of spike avalanches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Nowadays, classifying proteins in structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to be obtained experimentally in laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand of automated techniques for structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods have been chosen because they represent different paradigms of learning and have been widely used in the Bioinfornmatics literature. Aiming to obtain an improvment in the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multiclassification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial techniques for class balance (Undersampling Random, Tomek Links, CNN, NCL and OSS) are used to minimize such a problem. In order to evaluate the ML methods, a cross-validation procedure is applied, where the accuracy of the classifiers is measured using the mean of classification error rate, on independent test sets. These means are compared, two by two, by the hypothesis test aiming to evaluate if there is, statistically, a significant difference between them. With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. In terms of the multi-classification systems (homogeneous and heterogeneous), they showed, in general, a superior or similar performance when compared to the one achieved by the individual classifiers used - especially Boosting with Decision Tree and the StackingC with Linear Regression as meta classifier. The Voting method, despite of its simplicity, has shown to be adequate for solving the problem presented in this work. The techniques for class balance, on the other hand, have not produced a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique has shown to be more appropriated

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The pattern classification is one of the machine learning subareas that has the most outstanding. Among the various approaches to solve pattern classification problems, the Support Vector Machines (SVM) receive great emphasis, due to its ease of use and good generalization performance. The Least Squares formulation of SVM (LS-SVM) finds the solution by solving a set of linear equations instead of quadratic programming implemented in SVM. The LS-SVMs provide some free parameters that have to be correctly chosen to achieve satisfactory results in a given task. Despite the LS-SVMs having high performance, lots of tools have been developed to improve them, mainly the development of new classifying methods and the employment of ensembles, in other words, a combination of several classifiers. In this work, our proposal is to use an ensemble and a Genetic Algorithm (GA), search algorithm based on the evolution of species, to enhance the LSSVM classification. In the construction of this ensemble, we use a random selection of attributes of the original problem, which it splits the original problem into smaller ones where each classifier will act. So, we apply a genetic algorithm to find effective values of the LS-SVM parameters and also to find a weight vector, measuring the importance of each machine in the final classification. Finally, the final classification is obtained by a linear combination of the decision values of the LS-SVMs with the weight vector. We used several classification problems, taken as benchmarks to evaluate the performance of the algorithm and compared the results with other classifiers