12 resultados para semi-supervised machine learning

em Archivo Digital para la Docencia y la Investigación - Repositorio Institucional de la Universidad del País Vasco


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, the performance of semi-supervised learning has been theoretically investigated. However, most of this theoretical development has focussed on binary classification problems. In this paper, we take it a step further by extending the work of Castelli and Cover [1] [2] to the multi-class paradigm. Particularly, we consider the key problem in semi-supervised learning of classifying an unseen instance x into one of K different classes, using a training dataset sampled from a mixture density distribution and composed of l labelled records and u unlabelled examples. Even under the assumption of identifiability of the mixture and having infinite unlabelled examples, labelled records are needed to determine the K decision regions. Therefore, in this paper, we first investigate the minimum number of labelled examples needed to accomplish that task. Then, we propose an optimal multi-class learning algorithm which is a generalisation of the optimal procedure proposed in the literature for binary problems. Finally, we make use of this generalisation to study the probability of error when the binary class constraint is relaxed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Study of emotions in human-computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish Languages using different methods for feature selection. RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to seek for the most relevant feature subset. The three phases approach was selected to check the validity of the proposed approach. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean of 80,05% emotion recognition rate in Basque and a 74,82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on achieved results, a set of most relevant non-speaker dependent features is proposed for both languages and new perspectives are suggested.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

[EU]Lan honetan semantika distribuzionalaren eta ikasketa automatikoaren erabilera aztertzen dugu itzulpen automatiko estatistikoa hobetzeko. Bide horretan, erregresio logistikoan oinarritutako ikasketa automatikoko eredu bat proposatzen dugu hitz-segiden itzulpen- probabilitatea modu dinamikoan modelatzeko. Proposatutako eredua itzulpen automatiko estatistikoko ohiko itzulpen-probabilitateen orokortze bat dela frogatzen dugu, eta testuinguruko nahiz semantika distribuzionaleko informazioa barneratzeko baliatu ezaugarri lexiko, hitz-cluster eta hitzen errepresentazio bektorialen bidez. Horretaz gain, semantika distribuzionaleko ezagutza itzulpen automatiko estatistikoan txertatzeko beste hurbilpen bat lantzen dugu: hitzen errepresentazio bektorial elebidunak erabiltzea hitz-segiden itzulpenen antzekotasuna modelatzeko. Gure esperimentuek proposatutako ereduen baliagarritasuna erakusten dute, emaitza itxaropentsuak eskuratuz oinarrizko sistema sendo baten gainean. Era berean, gure lanak ekarpen garrantzitsuak egiten ditu errepresentazio bektorialen mapaketa elebidunei eta hitzen errepresentazio bektorialetan oinarritutako hitz-segiden antzekotasun neurriei dagokienean, itzulpen automatikoaz haratago balio propio bat dutenak semantika distribuzionalaren arloan.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In spite of over a century of research on cortical circuits, it is still unknown how many classes of cortical neurons exist. Neuronal classification has been a difficult problem because it is unclear what a neuronal cell class actually is and what are the best characteristics are to define them. Recently, unsupervised classifications using cluster analysis based on morphological, physiological or molecular characteristics, when applied to selected datasets, have provided quantitative and unbiased identification of distinct neuronal subtypes. However, better and more robust classification methods are needed for increasingly complex and larger datasets. We explored the use of affinity propagation, a recently developed unsupervised classification algorithm imported from machine learning, which gives a representative example or exemplar for each cluster. As a case study, we applied affinity propagation to a test dataset of 337 interneurons belonging to four subtypes, previously identified based on morphological and physiological characteristics. We found that affinity propagation correctly classified most of the neurons in a blind, non-supervised manner. In fact, using a combined anatomical/physiological dataset, our algorithm differentiated parvalbumin from somatostatin interneurons in 49 out of 50 cases. Affinity propagation could therefore be used in future studies to validly classify neurons, as a first step to help reverse engineer neural circuits.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Deep neural networks have recently gained popularity for improv- ing state-of-the-art machine learning algorithms in diverse areas such as speech recognition, computer vision and bioinformatics. Convolutional networks especially have shown prowess in visual recognition tasks such as object recognition and detection in which this work is focused on. Mod- ern award-winning architectures have systematically surpassed previous attempts at tackling computer vision problems and keep winning most current competitions. After a brief study of deep learning architectures and readily available frameworks and libraries, the LeNet handwriting digit recognition network study case is developed, and lastly a deep learn- ing network for playing simple videogames is reviewed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

DNA microarray, or DNA chip, is a technology that allows us to obtain the expression level of many genes in a single experiment. The fact that numerical expression values can be easily obtained gives us the possibility to use multiple statistical techniques of data analysis. In this project microarray data is obtained from Gene Expression Omnibus, the repository of National Center for Biotechnology Information (NCBI). Then, the noise is removed and data is normalized, also we use hypothesis tests to find the most relevant genes that may be involved in a disease and use machine learning methods like KNN, Random Forest or Kmeans. For performing the analysis we use Bioconductor, packages in R for the analysis of biological data, and we conduct a case study in Alzheimer disease. The complete code can be found in https://github.com/alberto-poncelas/ bioc-alzheimer

Relevância:

100.00% 100.00%

Publicador:

Resumo:

En este proyecto se analiza y compara el comportamiento del algoritmo CTC diseñado por el grupo de investigación ALDAPA usando bases de datos muy desbalanceadas. En concreto se emplea un conjunto de bases de datos disponibles en el sitio web asociado al proyecto KEEL (http://sci2s.ugr.es/keel/index.php) y que han sido ya utilizadas con diferentes algoritmos diseñados para afrontar el problema de clases desbalanceadas (Class imbalance problem) en el siguiente trabajo: A. Fernandez, S. García, J. Luengo, E. Bernadó-Mansilla, F. Herrera, "Genetics-Based Machine Learning for Rule Induction: State of the Art, Taxonomy and Comparative Study". IEEE Transactions on Evolutionary Computation 14:6 (2010) 913-941, http://dx.doi.org/10.1109/TEVC.2009.2039140 Las bases de datos (incluidas las muestras del cross-validation), junto con los resultados obtenidos asociados a la experimentación de este trabajo se pueden encontrar en un sitio web creado a tal efecto: http://sci2s.ugr.es/gbml/. Esto hace que los resultados del CTC obtenidos con estas muestras sean directamente comparables con los obtenidos por todos los algoritmos obtenidos en este trabajo.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The work presented here is part of a larger study to identify novel technologies and biomarkers for early Alzheimer disease (AD) detection and it focuses on evaluating the suitability of a new approach for early AD diagnosis by non-invasive methods. The purpose is to examine in a pilot study the potential of applying intelligent algorithms to speech features obtained from suspected patients in order to contribute to the improvement of diagnosis of AD and its degree of severity. In this sense, Artificial Neural Networks (ANN) have been used for the automatic classification of the two classes (AD and control subjects). Two human issues have been analyzed for feature selection: Spontaneous Speech and Emotional Response. Not only linear features but also non-linear ones, such as Fractal Dimension, have been explored. The approach is non invasive, low cost and without any side effects. Obtained experimental results were very satisfactory and promising for early diagnosis and classification of AD patients.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Singular Value Decomposition (SVD) is a key linear algebraic operation in many scientific and engineering applications. In particular, many computational intelligence systems rely on machine learning methods involving high dimensionality datasets that have to be fast processed for real-time adaptability. In this paper we describe a practical FPGA (Field Programmable Gate Array) implementation of a SVD processor for accelerating the solution of large LSE problems. The design approach has been comprehensive, from the algorithmic refinement to the numerical analysis to the customization for an efficient hardware realization. The processing scheme rests on an adaptive vector rotation evaluator for error regularization that enhances convergence speed with no penalty on the solution accuracy. The proposed architecture, which follows a data transfer scheme, is scalable and based on the interconnection of simple rotations units, which allows for a trade-off between occupied area and processing acceleration in the final implementation. This permits the SVD processor to be implemented both on low-cost and highend FPGAs, according to the final application requirements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The CTC algorithm, Consolidated Tree Construction algorithm, is a machine learning paradigm that was designed to solve a class imbalance problem, a fraud detection problem in the area of car insurance [1] where, besides, an explanation about the classification made was required. The algorithm is based on a decision tree construction algorithm, in this case the well-known C4.5, but it extracts knowledge from data using a set of samples instead of a single one as C4.5 does. In contrast to other methodologies based on several samples to build a classifier, such as bagging, the CTC builds a single tree and as a consequence, it obtains comprehensible classifiers. The main motivation of this implementation is to make public and available an implementation of the CTC algorithm. With this purpose we have implemented the algorithm within the well-known WEKA data mining environment http://www.cs.waikato.ac.nz/ml/weka/). WEKA is an open source project that contains a collection of machine learning algorithms written in Java for data mining tasks. J48 is the implementation of C4.5 algorithm within the WEKA package. We called J48Consolidated to the implementation of CTC algorithm based on the J48 Java class.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main contribution of this work is to analyze and describe the state of the art performance as regards answer scoring systems from the SemEval- 2013 task, as well as to continue with the development of an answer scoring system (EHU-ALM) developed in the University of the Basque Country. On the overall this master thesis focuses on finding any possible configuration that lets improve the results in the SemEval dataset by using attribute engineering techniques in order to find optimal feature subsets, along with trying different hierarchical configurations in order to analyze its performance against the traditional one versus all approach. Altogether, throughout the work we propose two alternative strategies: on the one hand, to improve the EHU-ALM system without changing the architecture, and, on the other hand, to improve the system adapting it to an hierarchical con- figuration. To build such new models we describe and use distinct attribute engineering, data preprocessing, and machine learning techniques.