882 resultados para Supervised and Unsupervised Classification


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Decision trees and self organising feature maps (SOFM) are frequently used to identify groups. This research aims to compare the similarities between any groupings found between supervised (Classification and Regression Trees - CART) and unsupervised classification (SOFM), and to identify insights into factors associated with doctor-patient stability. Although CART and SOFM uses different learning paradigms to produce groupings, both methods came up with many similar groupings. Both techniques showed that self perceived health and age are important indicators of stability. In addition, this study has indicated profiles of patients that are at risk which might be interesting to general practitioners.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a new semi-supervised method to effectively improve traffic classification performance when few supervised training data are available. Existing semi supervised methods label a large proportion of testing flows as unknown flows due to limited supervised information, which severely affects the classification performance. To address this problem, we propose to incorporate flow correlation into both training and testing stages. At the training stage, we make use of flow correlation to extend the supervised data set by automatically labeling unlabeled flows according to their correlation to the pre-labeled flows. Consequently, the traffic classifier has better performance due to the extended size and quality of the supervised data sets. At the testing stage, the correlated flows are identified and classified jointly by combining their individual predictions, so as to further boost the classification accuracy. The empirical study on the real-world network traffic shows that the proposed method outperforms the state-of-the-art flow statistical feature based classification methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a new semi-supervised method to effectively improve traffic classification performance when very few supervised training data are available. Existing semisupervised methods label a large proportion of testing flows as unknown flows due to limited supervised information, which severely affects the classification performance. To address this problem, we propose to incorporate flow correlation into both training and testing stages. At the training stage, we make use of flow correlation to extend the supervised data set by automatically labelling unlabelled flows according to their correlation to the pre-labelled flows. Consequently, a traffic classifier achieves excellent performance because of the enhanced training data set. At the testing stage, the correlated flows are identified and classified jointly by combining their individual predictions, so as to further boost the classification accuracy. The empirical study on the real-world network traffic shows that the proposed method significantly outperforms the state-of-the-art flow statistical feature based classification methods. Copyright © 2012 Inderscience Enterprises Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Time series classification has been extensively explored in many fields of study. Most methods are based on the historical or current information extracted from data. However, if interest is in a specific future time period, methods that directly relate to forecasts of time series are much more appropriate. An approach to time series classification is proposed based on a polarization measure of forecast densities of time series. By fitting autoregressive models, forecast replicates of each time series are obtained via the bias-corrected bootstrap, and a stationarity correction is considered when necessary. Kernel estimators are then employed to approximate forecast densities, and discrepancies of forecast densities of pairs of time series are estimated by a polarization measure, which evaluates the extent to which two densities overlap. Following the distributional properties of the polarization measure, a discriminant rule and a clustering method are proposed to conduct the supervised and unsupervised classification, respectively. The proposed methodology is applied to both simulated and real data sets, and the results show desirable properties.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper focuses on optimisation algorithms inspired by swarm intelligence for satellite image classification from high resolution satellite multi- spectral images. Amongst the multiple benefits and uses of remote sensing, one of the most important has been its use in solving the problem of land cover mapping. As the frontiers of space technology advance, the knowledge derived from the satellite data has also grown in sophistication. Image classification forms the core of the solution to the land cover mapping problem. No single classifier can prove to satisfactorily classify all the basic land cover classes of an urban region. In both supervised and unsupervised classification methods, the evolutionary algorithms are not exploited to their full potential. This work tackles the land map covering by Ant Colony Optimisation (ACO) and Particle Swarm Optimisation (PSO) which are arguably the most popular algorithms in this category. We present the results of classification techniques using swarm intelligence for the problem of land cover mapping for an urban region. The high resolution Quick-bird data has been used for the experiments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A malária é uma doença infecciosa complexa, que resulta do “vírus” plasmodium, e manifesta-se sob cinco tipos distintos de espécies protozoários (plasmodium vivax, plasmodium ovale, plasmodium falciparum, plasmodium malariae e plasmodium Knowlesi), atacando sobretudo os glóbulos vermelhos. Considerada a quinta maior causa de morte por doenças infecciosas em todo o mundo após doenças respiratórias, VIH/SIDA, doenças diarreicas e tuberculose, no continente africano, a malária é considerada a segunda causa do aumento da mortalidade, após VIH/SIDA. No caso particular da Guiné-Bissau, esta constitui a principal causa do incremento da morbilidade e da mortalidade naquele país, onde, em 2012 foram notificados 129.684 casos de paludismo, dos quais 370 resultaram em óbitos. Partindo da realidade acima constatada, em particular, da complexidade e o impacto global da doença associada a uma forte mortalidade e morbilidade, concluiu-se ser necessário abordar esta temática, utilizando os SIG e a DR no sentido de determinar as regiões de elevado risco. Entendeu-se serem necessárias novas abordagens e novas ferramentas de análise dos dados epidemiológicos e consequentemente novas metodologias que possibilitem a determinação de áreas de risco por malária. O presente estudo, pretende demonstrar o papel dos SIG e DR na determinação das regiões de risco por malária. A metodologia utilizada centrou-se numa abordagem quantitativa baseada na hierarquização das variáveis. Pretende-se, assim abordar os impactos da malária e simultaneamente demonstrar as potencialidades dos SIG e das ferramentas de Análise Espacial no estudo da disseminação da mesma na Guiné-Bissau.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions to be applied to input primary databases, with a warehouse architecture, of biomedical information. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, with the results being suitably stored in a complex object database for knowledge discovery. The kernel also includes a specific high-performance library that allows designing and applying the mining functions in parallel machines. The experimental results obtained by the application of the kernel functions are reported. © 2003 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the design of practical web page classification systems one often encounters a situation in which the labeled training set is created by choosing some examples from each class; but, the class proportions in this set are not the same as those in the test distribution to which the classifier will be actually applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods to deal with this situation. We empirically show that when the labeled training data is small, TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; also, when this estimate is used, both TSVM and EC/ER give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Given a set of images of scenes containing different object categories (e.g. grass, roads) our objective is to discover these objects in each image, and to use this object occurrences to perform a scene classification (e.g. beach scene, mountain scene). We achieve this by using a supervised learning algorithm able to learn with few images to facilitate the user task. We use a probabilistic model to recognise the objects and further we classify the scene based on their object occurrences. Experimental results are shown and evaluated to prove the validity of our proposal. Object recognition performance is compared to the approaches of He et al. (2004) and Marti et al. (2001) using their own datasets. Furthermore an unsupervised method is implemented in order to evaluate the advantages and disadvantages of our supervised classification approach versus an unsupervised one

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this paper is to show that Dempster-Shafer evidence theory may be successfully applied to unsupervised classification in multisource remote sensing. Dempster-Shafer formulation allows for consideration of unions of classes, and to represent both imprecision and uncertainty, through the definition of belief and plausibility functions. These two functions, derived from mass function, are generally chosen in a supervised way. In this paper, the authors describe an unsupervised method, based on the comparison of monosource classification results, to select the classes necessary for Dempster-Shafer evidence combination and to define their mass functions. Data fusion is then performed, discarding invalid clusters (e.g. corresponding to conflicting information) thank to an iterative process. Unsupervised multisource classification algorithm is applied to MAC-Europe'91 multisensor airborne campaign data collected over the Orgeval French site. Classification results using different combinations of sensors (TMS and AirSAR) or wavelengths (L- and C-bands) are compared. Performance of data fusion is evaluated in terms of identification of land cover types. The best results are obtained when all three data sets are used. Furthermore, some other combinations of data are tried, and their ability to discriminate between the different land cover types is quantified

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The early stages of dieting to lose weight have been associated with neuro-psychological impairments. Previous work has not elucidated whether these impairments are a function solely of unsupported or supported dieting. Raised cortico-steroid levels have been implicated as a possible causal mechanism. Healthy, overweight, pre-menopausal women were randomised to one of three conditions in which they dieted either as part of a commercially available weight loss group, dieted without any group support or acted as non-dieting controls for 8 weeks. Testing occurred at baseline and at 1, 4 and 8 weeks post baseline. During each session, participants completed measures of simple reaction time, motor speed, vigilance, immediate verbal recall, visuo-spatial processing and (at Week 1 only) executive function. Cortisol levels were gathered at the beginning and 30 min into each test session, via saliva samples. Also, food intake was self-recorded prior to each session and fasting body weight and percentage body fat were measured at each session. Participants in the unsupported diet condition displayed poorer vigilance performance (p=0.001) and impaired executive planning function (p=0.013) (along with a marginally significant trend for poorer visual recall (p=0.089)) after 1 week of dieting. No such impairments were observed in the other two groups. In addition, the unsupported dieters experienced a significant rise in salivary cortisol levels after 1 week of dieting (p<0.001). Both dieting groups lost roughly the same amount of body mass (p=0.011) over the course of the 8 weeks of dieting, although only the unsupported dieters experienced a significant drop in percentage body fat over the course of dieting (p=0.016). The precise causal nature of the relationship between stress, cortisol, unsupported dieting and cognitive function is, however, uncertain and should be the focus of further research. © 2005 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Comments constitute an important part of Web 2.0. In this paper, we consider comments on news articles. To simplify the task of relating the comment content to the article content the comments are about, we propose the idea of showing comments alongside article segments and explore automatic mapping of comments to article segments. This task is challenging because of the vocabulary mismatch between the articles and the comments. We present supervised and unsupervised techniques for aligning comments to segments the of article the comments are about. More specifically, we provide a novel formulation of supervised alignment problem using the framework of structured classification. Our experimental results show that structured classification model performs better than unsupervised matching and binary classification model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The presence of a large number of spectral bands in the hyperspectral images increases the capability to distinguish between various physical structures. However, they suffer from the high dimensionality of the data. Hence, the processing of hyperspectral images is applied in two stages: dimensionality reduction and unsupervised classification techniques. The high dimensionality of the data has been reduced with the help of Principal Component Analysis (PCA). The selected dimensions are classified using Niche Hierarchical Artificial Immune System (NHAIS). The NHAIS combines the splitting method to search for the optimal cluster centers using niching procedure and the merging method is used to group the data points based on majority voting. Results are presented for two hyperspectral images namely EO-1 Hyperion image and Indian pines image. A performance comparison of this proposed hierarchical clustering algorithm with the earlier three unsupervised algorithms is presented. From the results obtained, we deduce that the NHAIS is efficient.