9 resultados para multi-class classification

em Archivo Digital para la Docencia y la Investigación - Repositorio Institucional de la Universidad del País Vasco


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, the performance of semi-supervised learning has been theoretically investigated. However, most of this theoretical development has focussed on binary classification problems. In this paper, we take it a step further by extending the work of Castelli and Cover [1] [2] to the multi-class paradigm. Particularly, we consider the key problem in semi-supervised learning of classifying an unseen instance x into one of K different classes, using a training dataset sampled from a mixture density distribution and composed of l labelled records and u unlabelled examples. Even under the assumption of identifiability of the mixture and having infinite unlabelled examples, labelled records are needed to determine the K decision regions. Therefore, in this paper, we first investigate the minimum number of labelled examples needed to accomplish that task. Then, we propose an optimal multi-class learning algorithm which is a generalisation of the optimal procedure proposed in the literature for binary problems. Finally, we make use of this generalisation to study the probability of error when the binary class constraint is relaxed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the problem of one-class classification (OCC) one of the classes, the target class, has to be distinguished from all other possible objects, considered as nontargets. In many biomedical problems this situation arises, for example, in diagnosis, image based tumor recognition or analysis of electrocardiogram data. In this paper an approach to OCC based on a typicality test is experimentally compared with reference state-of-the-art OCC techniques-Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description-using biomedical data sets. We evaluate the ability of the procedures using twelve experimental data sets with not necessarily continuous data. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is considered as the target class and the units in the other classes are considered as new units to be classified. The results of the comparison show the good performance of the typicality approach, which is available for high dimensional data; it is worth mentioning that it can be used for any kind of data (continuous, discrete, or nominal), whereas state-of-the-art approaches application is not straightforward when nominal variables are present.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Accurate and fast decoding of speech imagery from electroencephalographic (EEG) data could serve as a basis for a new generation of brain computer interfaces (BCIs), more portable and easier to use. However, decoding of speech imagery from EEG is a hard problem due to many factors. In this paper we focus on the analysis of the classification step of speech imagery decoding for a three-class vowel speech imagery recognition problem. We empirically show that different classification subtasks may require different classifiers for accurately decoding and obtain a classification accuracy that improves the best results previously published. We further investigate the relationship between the classifiers and different sets of features selected by the common spatial patterns method. Our results indicate that further improvement on BCIs based on speech imagery could be achieved by carefully selecting an appropriate combination of classifiers for the subtasks involved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In spite of over a century of research on cortical circuits, it is still unknown how many classes of cortical neurons exist. Neuronal classification has been a difficult problem because it is unclear what a neuronal cell class actually is and what are the best characteristics are to define them. Recently, unsupervised classifications using cluster analysis based on morphological, physiological or molecular characteristics, when applied to selected datasets, have provided quantitative and unbiased identification of distinct neuronal subtypes. However, better and more robust classification methods are needed for increasingly complex and larger datasets. We explored the use of affinity propagation, a recently developed unsupervised classification algorithm imported from machine learning, which gives a representative example or exemplar for each cluster. As a case study, we applied affinity propagation to a test dataset of 337 interneurons belonging to four subtypes, previously identified based on morphological and physiological characteristics. We found that affinity propagation correctly classified most of the neurons in a blind, non-supervised manner. In fact, using a combined anatomical/physiological dataset, our algorithm differentiated parvalbumin from somatostatin interneurons in 49 out of 50 cases. Affinity propagation could therefore be used in future studies to validly classify neurons, as a first step to help reverse engineer neural circuits.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Familial hypercholesterolemia (FH) is a common autosomal codominant disease with a frequency of 1:500 individuals in its heterozygous form. The genetic basis of FH is most commonly mutations within the LDLR gene. Assessing the pathogenicity of LDLR variants is particularly important to give a patient a definitive diagnosis of FH. Current studies of LDLR activity ex vivo are based on the analysis of I-125-labeled lipoproteins (reference method) or fluorescent-labelled LDL. The main purpose of this study was to compare the effectiveness of these two methods to assess LDLR functionality in order to validate a functional assay to analyse LDLR mutations. LDLR activity of different variants has been studied by flow cytometry using FITC-labelled LDL and compared with studies performed previously with I-125-labeled lipoproteins. Flow cytometry results are in full agreement with the data obtained by the I-125 methodology. Additionally confocal microscopy allowed the assignment of different class mutation to the variants assayed. Use of fluorescence yielded similar results than I-125-labeled lipoproteins concerning LDLR activity determination, and also allows class mutation classification. The use of FITC-labelled LDL is easier in handling and disposal, cheaper than radioactivity and can be routinely performed by any group doing LDLR functional validations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

When it comes to information sets in real life, often pieces of the whole set may not be available. This problem can find its origin in various reasons, describing therefore different patterns. In the literature, this problem is known as Missing Data. This issue can be fixed in various ways, from not taking into consideration incomplete observations, to guessing what those values originally were, or just ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first one is to determine whether any kind of interactions exists between Missing Data, Imputation Methods and Supervised Classification algorithms, when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, understanding discrete as that it is assumed that there is no relation between observations. These datasets underwent processes involving different combina- tions of the three components mentioned. The outcome showed that the missing data pattern strongly influences the outcome produced by a classifier. Also, in some of the cases, the complex imputation techniques investigated in the thesis were able to obtain better results than simple ones. The second goal of this work is to propose a new imputation strategy, but this time we constrain the specifications of the previous problem to a special kind of datasets, the multivariate Time Series. We designed new imputation techniques for this particular domain, and combined them with some of the contrasted strategies tested in the pre- vious chapter of this thesis. The time series also were subjected to processes involving missing data and imputation to finally propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a wa- ter quality prediction problem. The databases that characterized this problem had their own original latent values, which provides a real-world benchmark to test the algorithms developed in this thesis.