861 resultados para Support vectors machine
Resumo:
Autism Spectrum Disorder (ASD) is diagnosed on the basis of behavioral symptoms, but cognitive abilities may also be useful in characterizing individuals with ASD. One hundred seventy-eight high-functioning male adults, half with ASD and half without, completed tasks assessing IQ, a broad range of cognitive skills, and autistic and comorbid symptomatology. The aims of the study were, first, to determine whether significant differences existed between cases and controls on cognitive tasks, and whether cognitive profiles, derived using a multivariate classification method with data from multiple cognitive tasks, could distinguish between the two groups. Second, to establish whether cognitive skill level was correlated with degree of autistic symptom severity, and third, whether cognitive skill level was correlated with degree of comorbid psychopathology. Fourth, cognitive characteristics of individuals with Asperger Syndrome (AS) and high-functioning autism (HFA) were compared. After controlling for IQ, ASD and control groups scored significantly differently on tasks of social cognition, motor performance, and executive function (P's < 0.05). To investigate cognitive profiles, 12 variables were entered into a support vector machine (SVM), which achieved good classification accuracy (81%) at a level significantly better than chance (P < 0.0001). After correcting for multiple correlations, there were no significant associations between cognitive performance and severity of either autistic or comorbid symptomatology. There were no significant differences between AS and HFA groups on the cognitive tasks. Cognitive classification models could be a useful aid to the diagnostic process when used in conjunction with other data sources-including clinical history.
Resumo:
An efficient data based-modeling algorithm for nonlinear system identification is introduced for radial basis function (RBF) neural networks with the aim of maximizing generalization capability based on the concept of leave-one-out (LOO) cross validation. Each of the RBF kernels has its own kernel width parameter and the basic idea is to optimize the multiple pairs of regularization parameters and kernel widths, each of which is associated with a kernel, one at a time within the orthogonal forward regression (OFR) procedure. Thus, each OFR step consists of one model term selection based on the LOO mean square error (LOOMSE), followed by the optimization of the associated kernel width and regularization parameter, also based on the LOOMSE. Since like our previous state-of-the-art local regularization assisted orthogonal least squares (LROLS) algorithm, the same LOOMSE is adopted for model selection, our proposed new OFR algorithm is also capable of producing a very sparse RBF model with excellent generalization performance. Unlike our previous LROLS algorithm which requires an additional iterative loop to optimize the regularization parameters as well as an additional procedure to optimize the kernel width, the proposed new OFR algorithm optimizes both the kernel widths and regularization parameters within the single OFR procedure, and consequently the required computational complexity is dramatically reduced. Nonlinear system identification examples are included to demonstrate the effectiveness of this new approach in comparison to the well-known approaches of support vector machine and least absolute shrinkage and selection operator as well as the LROLS algorithm.
An LDA and probability-based classifier for the diagnosis of Alzheimer's Disease from structural MRI
Resumo:
In this paper a custom classification algorithm based on linear discriminant analysis and probability-based weights is implemented and applied to the hippocampus measurements of structural magnetic resonance images from healthy subjects and Alzheimer’s Disease sufferers; and then attempts to diagnose them as accurately as possible. The classifier works by classifying each measurement of a hippocampal volume as healthy controlsized or Alzheimer’s Disease-sized, these new features are then weighted and used to classify the subject as a healthy control or suffering from Alzheimer’s Disease. The preliminary results obtained reach an accuracy of 85.8% and this is a similar accuracy to state-of-the-art methods such as a Naive Bayes classifier and a Support Vector Machine. An advantage of the method proposed in this paper over the aforementioned state of the art classifiers is the descriptive ability of the classifications it produces. The descriptive model can be of great help to aid a doctor in the diagnosis of Alzheimer’s Disease, or even further the understand of how Alzheimer’s Disease affects the hippocampus.
Resumo:
This work investigates the problem of feature selection in neuroimaging features from structural MRI brain images for the classification of subjects as healthy controls, suffering from Mild Cognitive Impairment or Alzheimer’s Disease. A Genetic Algorithm wrapper method for feature selection is adopted in conjunction with a Support Vector Machine classifier. In very large feature sets, feature selection is found to be redundant as the accuracy is often worsened when compared to an Support Vector Machine with no feature selection. However, when just the hippocampal subfields are used, feature selection shows a significant improvement of the classification accuracy. Three-class Support Vector Machines and two-class Support Vector Machines combined with weighted voting are also compared with the former and found more useful. The highest accuracy achieved at classifying the test data was 65.5% using a genetic algorithm for feature selection with a three-class Support Vector Machine classifier.
Resumo:
Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capturing shallow linguistic information. Complex background knowledge, such as semantic relationships, are typically either not used, or used in specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks; while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provide a way for making substantial progress in the field of WSD.
Resumo:
This thesis presents a system to recognise and classify road and traffic signs for the purpose of developing an inventory of them which could assist the highway engineers’ tasks of updating and maintaining them. It uses images taken by a camera from a moving vehicle. The system is based on three major stages: colour segmentation, recognition, and classification. Four colour segmentation algorithms are developed and tested. They are a shadow and highlight invariant, a dynamic threshold, a modification of de la Escalera’s algorithm and a Fuzzy colour segmentation algorithm. All algorithms are tested using hundreds of images and the shadow-highlight invariant algorithm is eventually chosen as the best performer. This is because it is immune to shadows and highlights. It is also robust as it was tested in different lighting conditions, weather conditions, and times of the day. Approximately 97% successful segmentation rate was achieved using this algorithm.Recognition of traffic signs is carried out using a fuzzy shape recogniser. Based on four shape measures - the rectangularity, triangularity, ellipticity, and octagonality, fuzzy rules were developed to determine the shape of the sign. Among these shape measures octangonality has been introduced in this research. The final decision of the recogniser is based on the combination of both the colour and shape of the sign. The recogniser was tested in a variety of testing conditions giving an overall performance of approximately 88%.Classification was undertaken using a Support Vector Machine (SVM) classifier. The classification is carried out in two stages: rim’s shape classification followed by the classification of interior of the sign. The classifier was trained and tested using binary images in addition to five different types of moments which are Geometric moments, Zernike moments, Legendre moments, Orthogonal Fourier-Mellin Moments, and Binary Haar features. The performance of the SVM was tested using different features, kernels, SVM types, SVM parameters, and moment’s orders. The average classification rate achieved is about 97%. Binary images show the best testing results followed by Legendre moments. Linear kernel gives the best testing results followed by RBF. C-SVM shows very good performance, but ?-SVM gives better results in some case.
Resumo:
A resistência a múltiplos fármacos é um grande problema na terapia anti-cancerígena, sendo a glicoproteína-P (P-gp) uma das responsáveis por esta resistência. A realização deste trabalho incidiu principalmente no desenvolvimento de modelos matemáticos/estatísticos e “químicos”. Para os modelos matemáticos/estatísticos utilizamos métodos de Machine Learning como o Support Vector Machine (SVM) e o Random Forest, (RF) em relação aos modelos químicos utilizou-se farmacóforos. Os métodos acima mencionados foram aplicados a diversas proteínas P-gp, p53 e complexo p53-MDM2, utilizando duas famílias: as pifitrinas para a p53 e flavonóides para P-gp e, em menor medida, um grupo diversificado de moléculas de diversas famílias químicas. Nos modelos obtidos pelo SVM quando aplicados à P-gp e à família dos flavonóides, obtivemos bons valores através do kernel Radial Basis Function (RBF), com precisão de conjunto de treino de 94% e especificidade de 96%. Quanto ao conjunto de teste com previsão de 70% e especificidade de 67%, sendo que o número de falsos negativos foi o mais baixo comparativamente aos restantes kernels. Aplicando o RF à família dos flavonóides verificou-se que o conjunto de treino apresenta 86% de precisão e uma especificidade de 90%, quanto ao conjunto de teste obtivemos uma previsão de 70% e uma especificidade de 60%, existindo a particularidade de o número de falsos negativos ser o mais baixo. Repetindo o procedimento anterior (RF) e utilizando um total de 63 descritores, os resultados apresentaram valores inferiores obtendo-se para o conjunto de treino 79% de precisão e 82% de especificidade. Aplicando o modelo ao conjunto de teste obteve-se 70% de previsão e 60% de especificidade. Comparando os dois métodos, escolhemos o método SVM com o kernel RBF como modelo que nos garante os melhores resultados de classificação. Aplicamos o método SVM à P-gp e a um conjunto de moléculas não flavonóides que são transportados pela P-gp, obteve-se bons valores através do kernel RBF, com precisão de conjunto de treino de 95% e especificidade de 93%. Quanto ao conjunto de teste, obtivemos uma previsão de 70% e uma especificidade de 69%, existindo a particularidade de o número de falsos negativos ser o mais baixo. Aplicou-se o método do farmacóforo a três alvos, sendo estes, um conjunto de inibidores flavonóides e de substratos não flavonóides para a P-gp, um grupo de piftrinas para a p53 e um conjunto diversificado de estruturas para a ligação da p53-MDM2. Em cada um dos quatro modelos de farmacóforos obtidos identificou-se três características, sendo que as características referentes ao anel aromático e ao dador de ligações de hidrogénio estão presentes em todos os modelos obtidos. Realizando o rastreio em diversas bases de dados utilizando os modelos, obtivemos hits com uma grande diversidade estrutural.
Resumo:
The use of the maps obtained from remote sensing orbital images submitted to digital processing became fundamental to optimize conservation and monitoring actions of the coral reefs. However, the accuracy reached in the mapping of submerged areas is limited by variation of the water column that degrades the signal received by the orbital sensor and introduces errors in the final result of the classification. The limited capacity of the traditional methods based on conventional statistical techniques to solve the problems related to the inter-classes took the search of alternative strategies in the area of the Computational Intelligence. In this work an ensemble classifiers was built based on the combination of Support Vector Machines and Minimum Distance Classifier with the objective of classifying remotely sensed images of coral reefs ecosystem. The system is composed by three stages, through which the progressive refinement of the classification process happens. The patterns that received an ambiguous classification in a certain stage of the process were revalued in the subsequent stage. The prediction non ambiguous for all the data happened through the reduction or elimination of the false positive. The images were classified into five bottom-types: deep water; under-water corals; inter-tidal corals; algal and sandy bottom. The highest overall accuracy (89%) was obtained from SVM with polynomial kernel. The accuracy of the classified image was compared through the use of error matrix to the results obtained by the application of other classification methods based on a single classifier (neural network and the k-means algorithm). In the final, the comparison of results achieved demonstrated the potential of the ensemble classifiers as a tool of classification of images from submerged areas subject to the noise caused by atmospheric effects and the water column
Resumo:
The automatic speech recognition by machine has been the target of researchers in the past five decades. In this period have been numerous advances, such as in the field of recognition of isolated words (commands), which has very high rates of recognition, currently. However, we are still far from developing a system that could have a performance similar to the human being (automatic continuous speech recognition). One of the great challenges of searches for continuous speech recognition is the large amount of pattern. The modern languages such as English, French, Spanish and Portuguese have approximately 500,000 words or patterns to be identified. The purpose of this study is to use smaller units than the word such as phonemes, syllables and difones units as the basis for the speech recognition, aiming to recognize any words without necessarily using them. The main goal is to reduce the restriction imposed by the excessive amount of patterns. In order to validate this proposal, the system was tested in the isolated word recognition in dependent-case. The phonemes characteristics of the Brazil s Portuguese language were used to developed the hierarchy decision system. These decisions are made through the use of neural networks SVM (Support Vector Machines). The main speech features used were obtained from the Wavelet Packet Transform. The descriptors MFCC (Mel-Frequency Cepstral Coefficient) are also used in this work. It was concluded that the method proposed in this work, showed good results in the steps of recognition of vowels, consonants (syllables) and words when compared with other existing methods in literature
Resumo:
The Support Vector Machines (SVM) has attracted increasing attention in machine learning area, particularly on classification and patterns recognition. However, in some cases it is not easy to determinate accurately the class which given pattern belongs. This thesis involves the construction of a intervalar pattern classifier using SVM in association with intervalar theory, in order to model the separation of a pattern set between distinct classes with precision, aiming to obtain an optimized separation capable to treat imprecisions contained in the initial data and generated during the computational processing. The SVM is a linear machine. In order to allow it to solve real-world problems (usually nonlinear problems), it is necessary to treat the pattern set, know as input set, transforming from nonlinear nature to linear problem. The kernel machines are responsible to do this mapping. To create the intervalar extension of SVM, both for linear and nonlinear problems, it was necessary define intervalar kernel and the Mercer s theorem (which caracterize a kernel function) to intervalar function
Resumo:
Nowadays, classifying proteins in structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to be obtained experimentally in laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand of automated techniques for structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods have been chosen because they represent different paradigms of learning and have been widely used in the Bioinfornmatics literature. Aiming to obtain an improvment in the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multiclassification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial techniques for class balance (Undersampling Random, Tomek Links, CNN, NCL and OSS) are used to minimize such a problem. In order to evaluate the ML methods, a cross-validation procedure is applied, where the accuracy of the classifiers is measured using the mean of classification error rate, on independent test sets. These means are compared, two by two, by the hypothesis test aiming to evaluate if there is, statistically, a significant difference between them. With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. In terms of the multi-classification systems (homogeneous and heterogeneous), they showed, in general, a superior or similar performance when compared to the one achieved by the individual classifiers used - especially Boosting with Decision Tree and the StackingC with Linear Regression as meta classifier. The Voting method, despite of its simplicity, has shown to be adequate for solving the problem presented in this work. The techniques for class balance, on the other hand, have not produced a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique has shown to be more appropriated
Resumo:
Reinforcement learning is a machine learning technique that, although finding a large number of applications, maybe is yet to reach its full potential. One of the inadequately tested possibilities is the use of reinforcement learning in combination with other methods for the solution of pattern classification problems. It is well documented in the literature the problems that support vector machine ensembles face in terms of generalization capacity. Algorithms such as Adaboost do not deal appropriately with the imbalances that arise in those situations. Several alternatives have been proposed, with varying degrees of success. This dissertation presents a new approach to building committees of support vector machines. The presented algorithm combines Adaboost algorithm with a layer of reinforcement learning to adjust committee parameters in order to avoid that imbalances on the committee components affect the generalization performance of the final hypothesis. Comparisons were made with ensembles using and not using the reinforcement learning layer, testing benchmark data sets widely known in area of pattern classification
Sistema inteligente para detecção de manchas de óleo na superfície marinha através de imagens de SAR
Resumo:
Oil spill on the sea, accidental or not, generates enormous negative consequences for the affected area. The damages are ambient and economic, mainly with the proximity of these spots of preservation areas and/or coastal zones. The development of automatic techniques for identification of oil spots on the sea surface, captured through Radar images, assist in a complete monitoring of the oceans and seas. However spots of different origins can be visualized in this type of imaging, which is a very difficult task. The system proposed in this work, based on techniques of digital image processing and artificial neural network, has the objective to identify the analyzed spot and to discern between oil and other generating phenomena of spot. Tests in functional blocks that compose the proposed system allow the implementation of different algorithms, as well as its detailed and prompt analysis. The algorithms of digital image processing (speckle filtering and gradient), as well as classifier algorithms (Multilayer Perceptron, Radial Basis Function, Support Vector Machine and Committe Machine) are presented and commented.The final performance of the system, with different kind of classifiers, is presented by ROC curve. The true positive rates are considered agreed with the literature about oil slick detection through SAR images presents
Resumo:
This paper presents two approaches of Artificial Immune System for Pattern Recognition (CLONALG and Parallel AIRS2) to classify automatically the well drilling operation stages. The classification is carried out through the analysis of some mud-logging parameters. In order to validate the performance of AIS techniques, the results were compared with others classification methods: neural network, support vector machine and lazy learning.