887 results for Support Vector machines
Abstract:
In this project, the pattern recognition problem is approached with the Support Vector Machines (SVM) technique, a binary classification method from machine learning that finds the hyperplane that best separates the data, extending the dimension of the input space where necessary. The system aims to classify two classes of pixels chosen by the user in the interface, one in the interest selection phase and one in the background selection phase, generating all the data to be used by LibSVM, a library that implements SVM, and illustrating the library's operation in an informal way. The data provided by the interface is organized in three types: RGB (red, green and blue color system), texture (calculated), or RGB + texture. The project showed successful results: each image pixel was classified as belonging to one of the two classes, the interest selection area or the background selection area. The simplest view of the classification results for the user is the RGB data arrangement, because it is the most concrete form of data acquisition
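As a rough illustration of how the selected pixels could be handed to LibSVM, the sketch below writes RGB pixels in the library's sparse text format (`<label> <index>:<value> ...`, with feature indices starting at 1). The function name `to_libsvm_line`, the sample pixel values, the +1/-1 label convention and the scaling to [0, 1] are assumptions made for the example; the abstract does not describe the project's actual export code.

```python
def to_libsvm_line(label, rgb):
    """Format one pixel as a LIBSVM text line: '<label> 1:<r> 2:<g> 3:<b>'.

    Channels are scaled from 0..255 to 0..1; feature indices start at 1.
    """
    features = " ".join(
        f"{index}:{value / 255:.4f}" for index, value in enumerate(rgb, start=1)
    )
    return f"{label:+d} {features}"


# Hypothetical user selections: +1 = interest area, -1 = background area.
interest_pixels = [(200, 30, 40), (210, 25, 35)]
background_pixels = [(20, 120, 200)]

lines = [to_libsvm_line(+1, pixel) for pixel in interest_pixels]
lines += [to_libsvm_line(-1, pixel) for pixel in background_pixels]
print("\n".join(lines))
```

A texture value, if available, could be appended as a fourth feature in the same way to obtain the RGB + texture arrangement.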
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Traditional supervised data classification considers only physical features (e.g., distance or similarity) of the input data. Here, this type of learning is called low level classification. The human (animal) brain, on the other hand, performs both low and high orders of learning and readily identifies patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation is here referred to as high level classification. In this paper, we propose a hybrid classification technique that combines both types of learning. The low level term can be implemented by any classification technique, while the high level term is realized by extracting features of the underlying network constructed from the input data. Thus, the former classifies the test instances by their physical features or class topologies, while the latter measures the compliance of the test instances with the pattern formation of the data. Our study shows that the proposed technique can not only perform classification according to the pattern formation, but also improve the performance of traditional classification techniques. Furthermore, as the complexity of the class configuration increases, for example through mixing among different classes, a larger portion of the high level term is required to obtain correct classification. This confirms that high level classification is especially important in complex classification situations. Finally, we show how the proposed technique can be employed in a real-world application, where it is capable of identifying variations and distortions of handwritten digit images. As a result, it improves the overall pattern recognition rate.
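One way to picture the "portion of the high level term" is a weighted mix of two per-class scores. The sketch below assumes a convex combination controlled by a parameter `lam` in [0, 1]; this form, the names `hybrid_scores` and `predict`, and the example scores are illustrative assumptions, since the abstract does not give the exact combination rule.

```python
def hybrid_scores(low_level, high_level, lam):
    """Mix per-class scores as (1 - lam) * low + lam * high, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {
        label: (1.0 - lam) * low_level[label] + lam * high_level[label]
        for label in low_level
    }


def predict(low_level, high_level, lam):
    """Return the class label with the largest mixed score."""
    scores = hybrid_scores(low_level, high_level, lam)
    return max(scores, key=scores.get)


# Hypothetical scores for one test instance and two classes.
low = {"A": 0.7, "B": 0.3}   # e.g. from any traditional classifier
high = {"A": 0.2, "B": 0.8}  # e.g. compliance with each class's pattern

print(predict(low, high, 0.0))  # low level term only
print(predict(low, high, 0.6))  # larger portion of the high level term
```

With these made-up numbers the decision flips from class A to class B once the high level term carries enough weight, which is the kind of behaviour the abstract describes for mixed classes.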
Abstract:
In this article we propose an efficient and accurate method for fault location in underground distribution systems by means of an Optimum-Path Forest (OPF) classifier. We applied the time-domain reflectometry method for signal acquisition, and the acquired signals were then analyzed by OPF and several other well-known pattern recognition techniques. The results indicated that OPF and support vector machines outperformed artificial neural networks and a Bayesian classifier, but OPF was much more efficient than all the other classifiers for training, and the second fastest for classification.
Abstract:
Distribution of the code is permitted under the terms of the three-clause BSD license.
Abstract:
[EN] This work presents an extensive experimental study of smile detection, testing Local Binary Patterns (LBP) combined with self-similarity (LAC) as the main image descriptors, together with the powerful Support Vector Machines classifier. Results show that error rates can be acceptable and that the self-similarity approach to smile detection is suitable for real-time interaction, although there is still room for improvement.
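For readers unfamiliar with the LBP descriptor, the following minimal sketch computes the basic 3x3 Local Binary Pattern code of a pixel and a 256-bin histogram over an image given as nested lists. The neighbour ordering, the `>=` threshold convention and the sample patch are assumptions for illustration; the study's own LBP variant and its self-similarity descriptor are not reproduced here.

```python
# Neighbour offsets (row, col), clockwise starting from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (borders are skipped)."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


# Hypothetical 3x3 grayscale patch.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # 241 with the ordering above
```

A histogram such as the one returned by `lbp_histogram` is the kind of feature vector that would typically be passed to an SVM classifier.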
Abstract:
In the post-genomic era, with its massive production of biological data, understanding the factors that affect protein stability is one of the most important and challenging tasks for highlighting the role of mutations in human diseases. The problem lies at the basis of what is referred to as molecular medicine, whose underlying idea is that pathologies can be described at the molecular level. To this end, scientific efforts focus on characterising mutations that hamper protein function and thereby affect the biological processes underlying cell physiology. New techniques have been developed to detail single nucleotide polymorphisms (SNPs) across all human chromosomes, and as a result the information in dedicated databases is increasing exponentially. Mutations found at the DNA level, when they occur in transcribed regions, may lead to mutated proteins, and this can be a serious medical problem that largely affects the phenotype. Bioinformatics tools are urgently needed to cope with the flood of genomic data stored in databases and to analyse the role of SNPs at the protein level. In principle, several experimental and theoretical observations suggest that protein stability in the solvent-protein space is responsible for correct protein functioning. Mutations found to be disease-related during DNA analysis are therefore often assumed to perturb protein stability as well. So far, however, no extensive analysis at the proteome level has investigated whether this is the case. Computational methods have also been developed to infer whether a mutation is disease-related and, independently, whether it affects protein stability. Whether the perturbation of protein stability is related to what is routinely referred to as a disease therefore remains an open question.
In this work we explore, for the first time, the relation between mutations at the protein level and their relevance to disease through a large-scale computational study of data from different databases. To this aim, in the first part of the thesis we derive two probabilistic indices for each mutation type (for 141 out of 150 possible SNPs): the perturbing index (Pp), which indicates the probability that a given mutation affects protein stability, considering all the available “in vitro” thermodynamic data, and the disease index (Pd), which indicates the probability that a mutation is disease-related, given all the mutations that have been clinically associated so far. We find, with robust statistics, that the two indices correlate, with the exception of all mutations that are somatic cancer-related. In this way each of the 150 mutations can be coded by two values that allow a direct comparison with database information. Furthermore, we implement a computational method that, starting from the protein structure, predicts the effect of a mutation on protein stability, and we find that it outperforms a set of other predictors performing the same task. The predictor is based on support vector machines and takes protein tertiary structures as input. We show that the predicted data correlate well with the data from the databases. All our efforts therefore add to the SNP annotation process and, more importantly, establish the relationship between protein stability perturbation and the human variome leading to the diseasome.
Abstract:
This work is part of a research project of the Ministero della Salute (Ministry of Health) and was carried out at the Medical Physics unit (Fisica Sanitaria) and the Radiation Oncology department of the Azienda Ospedaliero Universitaria di Modena. The goal is to build predictive models and neural networks for warping techniques in clinical settings. Volumetric and spatial changes in organs at risk and tumor targets during tomotherapy treatments can alter the dose distribution relative to the constraints defined at the planning stage. Radiotherapy methods for assessing organ motion, together with hybrid registration algorithms, make it possible to generate deformed ROIs automatically and to quantify the divergence from the initial treatment plan. The study focuses on Adaptive Radiation Therapy (ART) techniques through a meta-analysis of 51 patients treated with Tomotherapy. By studying the statistical behavior of the sample, predictive analyses were generated to quantify in real time the patients' anatomical and dosimetric divergences from the original plan and to anticipate their therapeutic replanning. The models were implemented in MATLAB using Cluster Analysis and Support Vector Machines; analysis of the dataset highlighted the added value that deformation algorithms and ART techniques can bring. The specificity and sensitivity of the method were validated using ROC analysis. The developments of this work have opened up a line of research and application in multicenter treatments and in evaluating the efficacy and efficiency of new technologies in RT.
Abstract:
Drowsiness while driving is a major problem and the cause of numerous road accidents. Detecting the signs that precede drowsiness is very important because it makes it possible to warn drivers, take corrective measures and prevent accidents. At present there is no effective method that measures drowsiness reliably and is also easy to apply. Drowsiness could be recognized from behavioral changes in the subject, such as yawning, eye closure or head-nodding movements. Drowsy subjects show deficits in their cognitive and psychomotor abilities. The same holds for drivers, who are unable to maintain a high level of attention when mentally fatigued: reaction times lengthen and decision-making ability declines. This is associated with changes in the delta, theta and alpha activity of an EEG trace. Studying EEG signals yields useful information about the waking state and the onset of sleep. In this thesis, support vector machines (SVM) were used as the classification tool to process and interpret these signals. SVMs are a set of learning methods that allow the classification of given patterns. They need a training data set to build a model, which is then tested on a different data set to evaluate its performance. The goal is to classify the input data correctly. One characteristic of SVMs is good generalization ability regardless of the dimension of the input space. This makes them particularly suitable for analyzing biomedical data such as multichannel EEG recordings, which are characterized by a certain intrinsic redundancy in the data.
Although it is fairly simple to distinguish the waking state from the sleeping state, the criteria for assessing the transition between them have not yet been standardized. Electro-oculographic (EOG) activity certainly provides useful information about sleep onset, since it is characterized by slow rolling eye movements (Slow Eye Movements, SEM) typical of the transition from wakefulness to sleep. SEM activity begins before stage 1 sleep, continues throughout stage 1, and declines progressively during the first minutes of stage 2 sleep until it ceases completely. In this study, recordings from a single EEG channel and two EOG channels were used to analyze the onset of drowsiness in drivers. Using a single EEG channel prevents clinicians from defining the hypnogram reliably. The first objective is therefore to build a reasonably reliable sleep classifier from a single EEG channel, in order to verify how SEMs are distributed around the moment of falling asleep. The expectation is that the onset of drowsiness is indeed characterized by a massive presence of SEMs.
Abstract:
Intelligent Transport Systems (ITS) consist of the application of ICT to transport in order to offer new and improved services for the mobility of people and freight. While using ITS, travellers produce large quantities of data that can be collected and analysed to study their behaviour and to provide information to decision makers and planners. The thesis proposes innovative deployments of classification algorithms for Intelligent Transport Systems with the aim of supporting decisions on traffic rerouting, bus transport demand and the behaviour of two-wheeled vehicles. The first part of this work provides an overview and a classification of a selection of clustering algorithms that can be implemented for the analysis of ITS data. The first contribution of this thesis is an innovative use of the agglomerative hierarchical clustering algorithm to group similar trips in terms of their origin and destination, together with a proposed methodology for analysing drivers’ route choice behaviour using GPS coordinates and optimal alternatives. The clusters of repetitive trips made by a sample of drivers are then analysed to compare observed route choices with the modelled alternatives. The results of the analysis show that drivers select routes that are more reliable but more expensive in terms of travel time. Next, different types of users of a service that provides real-time information on bus arrivals at stops are classified using Support Vector Machines. The results show that the classification of different types of bus transport users can be used to update or complement the census on bus transport flows. Finally, the problem of classifying accidents involving two-wheeled vehicles is presented, together with possible future applications of clustering methodologies aimed at identifying and classifying the different types of accidents.
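The idea of grouping trips by origin and destination with agglomerative hierarchical clustering can be sketched as follows. The single-linkage rule, the distance (sum of the origin gap and the destination gap), the stopping threshold, the planar coordinates and the function names are all assumptions chosen to keep the example short; the thesis's actual linkage and distance measure are not stated in the abstract.

```python
import math


def trip_distance(trip_a, trip_b):
    """Distance between two trips given as (origin_x, origin_y, dest_x, dest_y)."""
    origin_gap = math.dist(trip_a[:2], trip_b[:2])
    destination_gap = math.dist(trip_a[2:], trip_b[2:])
    return origin_gap + destination_gap


def agglomerate(trips, threshold):
    """Merge the two closest clusters repeatedly (single linkage) until the
    closest pair is farther apart than `threshold`. Returns lists of indices."""
    clusters = [[index] for index in range(len(trips))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = min(
                    trip_distance(trips[a], trips[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


# Hypothetical trips in planar coordinates (e.g. kilometres).
trips = [
    (0.0, 0.0, 10.0, 10.0),
    (0.1, 0.0, 10.0, 10.1),
    (5.0, 5.0, 0.0, 0.0),
    (5.0, 5.2, 0.0, 0.1),
]
print(agglomerate(trips, threshold=1.0))
```

Each resulting cluster collects repeated trips between roughly the same origin and destination, which is the grouping on which observed route choices could then be compared.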
Abstract:
High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has focused on assessing univariate associations between gene expression and clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation and support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as to data from a recently published prostate cancer study.
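To make the accuracy measure concrete, the sketch below scores each expression profile with a linear combination and computes the empirical area under the ROC curve as the fraction of (case, control) pairs ranked correctly, with ties counted as one half. The weights, the two-gene profiles and the function names are hypothetical; the sketch only evaluates a given combination and does not perform the LASSO-based fitting described in the abstract.

```python
def linear_score(weights, profile):
    """Linear combination of expression values: sum of w_j * x_j."""
    return sum(w * x for w, x in zip(weights, profile))


def empirical_auc(case_scores, control_scores):
    """Fraction of (case, control) pairs where the case scores higher;
    tied pairs count one half."""
    wins = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical two-gene expression profiles and candidate weights.
cases = [(2.0, 1.0), (1.5, 0.5), (0.5, 1.5)]
controls = [(1.0, 1.0), (0.0, 2.0)]
weights = (1.0, -0.5)

auc = empirical_auc(
    [linear_score(weights, profile) for profile in cases],
    [linear_score(weights, profile) for profile in controls],
)
print(round(auc, 4))  # 5 of 6 pairs are ranked correctly
```

Comparing this value across candidate weight vectors is one simple way to see what "maximizing an ROC-based accuracy measure" means in practice.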
Abstract:
The early detection of subjects with probable Alzheimer's disease (AD) is crucial for the effective application of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degrees of AD and their age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM) and feed-forward neural network. Based on 10-fold cross-validation runs, it could be demonstrated that even though modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forest classification, a considerable sensitivity of up to 85% and a specificity of 78% were reached for the test of even only mild AD patients, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity, respectively, were achieved. Such remarkable performance demonstrates the value of these classification algorithms for clinical diagnostics.
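The two evaluation ingredients named above, k-fold cross-validation and sensitivity/specificity, can be sketched in a few lines. The fold assignment rule (sample i goes to fold i mod k), the function names and the toy labels are assumptions for illustration and are unrelated to the study's data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; sample i is assigned to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true (otherwise a division by zero occurs).
    """
    tp = fn = tn = fp = 0
    for truth, prediction in zip(y_true, y_pred):
        if truth == positive:
            if prediction == positive:
                tp += 1
            else:
                fn += 1
        else:
            if prediction == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = patient, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.75)

folds = list(k_fold_indices(n_samples=20, k=10))
print(len(folds), folds[0][1])  # 10 folds; the first test fold is [0, 10]
```

In a full evaluation, a classifier would be trained on each `train` list and scored on the matching `test` list, with the two rates averaged or pooled over the ten folds.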
Abstract:
Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, standard text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents, following the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50% - 84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.
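The bag-of-words representation mentioned above amounts to counting how often each word analogue occurs in a document. The sketch below builds such a count vector over a fixed vocabulary; the token names (syllables, `v3` for a video word, `a7` for an audio word) and the vocabulary are invented for the example and do not come from the study.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Return the count of each vocabulary entry in `tokens`, in vocabulary order.

    Tokens outside the vocabulary are ignored.
    """
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Hypothetical word analogues from one news clip: syllables, a video word
# ("v3") and an audio word ("a7") share a single vocabulary.
vocabulary = ["ta", "ges", "schau", "v3", "a7", "v9"]
document = ["ta", "ges", "schau", "ta", "v3", "a7"]

print(bag_of_words(document, vocabulary))  # [2, 1, 1, 1, 1, 0]
```

Because every modality is reduced to the same kind of count vector, the vectors can be concatenated or compared directly before being passed to a classifier such as an SVM.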