918 results for Supervised classifiers
Abstract:
Current commercial dialogue systems typically use hand-crafted grammars for Spoken Language Understanding (SLU) operating on the top one or two hypotheses output by the speech recogniser. These systems are expensive to develop and they suffer from significant degradation in performance when faced with recognition errors. This paper presents a robust method for SLU based on features extracted from the full posterior distribution of recognition hypotheses encoded in the form of word confusion networks. Following [1], the system uses SVM classifiers operating on n-gram features, trained on unaligned input/output pairs. Performance is evaluated on both an off-line corpus and on-line in a live user trial. It is shown that a statistical discriminative approach to SLU operating on the full posterior ASR output distribution can substantially improve performance both in terms of accuracy and overall dialogue reward. Furthermore, additional gains can be obtained by incorporating features from the previous system output. © 2012 IEEE.
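As a rough illustration of n-gram features taken from a word confusion network rather than from a single top hypothesis, the sketch below computes expected n-gram counts. It is not the paper's feature extractor: the network layout (a list of slots, each mapping a word to its posterior), the function name `expected_ngram_counts`, and the treatment of adjacent slots as independent are all assumptions made for illustration, and null arcs are ignored.

```python
from collections import defaultdict
from itertools import product


def expected_ngram_counts(network, n):
    """Expected counts of n-grams over consecutive slots of a confusion network.

    network: list of slots; each slot is a dict mapping word -> posterior.
    Assumes (hypothetically) that slots are independent, so the weight of an
    n-gram is the product of the posteriors of its words.
    """
    counts = defaultdict(float)
    for start in range(len(network) - n + 1):
        slots = network[start:start + n]
        # Enumerate every path through the n consecutive slots.
        for combo in product(*(slot.items() for slot in slots)):
            words = tuple(word for word, _ in combo)
            weight = 1.0
            for _, posterior in combo:
                weight *= posterior
            counts[words] += weight
    return dict(counts)


# Toy two-slot network with made-up posteriors.
toy_network = [
    {"to": 0.7, "two": 0.3},
    {"boston": 0.9, "austin": 0.1},
]
bigram_features = expected_ngram_counts(toy_network, 2)
```

The resulting dictionary could serve as a sparse feature vector for a discriminative classifier such as an SVM; competing hypotheses contribute in proportion to their posterior mass instead of being discarded.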
Abstract:
Natural odors are usually mixtures; yet, humans and animals can experience them as unitary percepts. Olfaction also enables stimulus categorization and generalization. We studied how these computations are performed with the responses of 168 locust antennal lobe projection neurons (PNs) to varying mixtures of two monomolecular odors, and of 174 PNs and 209 mushroom body Kenyon cells (KCs) to mixtures of up to eight monomolecular odors. Single-PN responses showed strong hypoadditivity and population trajectories clustered by odor concentration and mixture similarity. KC responses were much sparser on average than those of PNs and often signaled the presence of single components in mixtures. Linear classifiers could read out the responses of both populations in single time bins to perform odor identification, categorization, and generalization. Our results suggest that odor representations in the mushroom body may result from competing optimization constraints to facilitate memorization (sparseness) while enabling identification, classification, and generalization.
Abstract:
This work applies a variety of multilinear function factorisation techniques to extract appropriate features or attributes from high dimensional multivariate time series for classification. Recently, a great deal of work has centred around designing time series classifiers using more and more complex feature extraction and machine learning schemes. This paper argues that complex learners and domain specific feature extraction schemes of this type are not necessarily needed for time series classification, as excellent classification results can be obtained by simply applying a number of existing matrix factorisation or linear projection techniques, which are simple and computationally inexpensive. We highlight this using a geometric separability measure and classification accuracies obtained through experiments on four different high dimensional multivariate time series datasets. © 2013 IEEE.
Abstract:
The importance of properly exploiting a classifier's inherent geometric characteristics when developing a classification methodology is emphasized as a prerequisite to achieving near optimal performance when carrying out thematic mapping. When used properly, it is argued that the long-standing maximum likelihood approach and the more recent support vector machine can perform comparably. Both contain the flexibility to segment the spectral domain in such a manner as to match inherent class separations in the data, as do most reasonable classifiers. The choice of which classifier to use in practice is determined largely by preference and related considerations, such as ease of training, multiclass capabilities, and classification cost. © 1980-2012 IEEE.
Abstract:
McCullagh and Yang (2006) suggest a family of classification algorithms based on Cox processes. We further investigate the log Gaussian variant which has a number of appealing properties. Conditioned on the covariates, the distribution over labels is given by a type of conditional Markov random field. In the supervised case, computation of the predictive probability of a single test point scales linearly with the number of training points and the multiclass generalization is straightforward. We show new links between the supervised method and classical nonparametric methods. We give a detailed analysis of the pairwise graph representable Markov random field, which we use to extend the model to semi-supervised learning problems, and propose an inference method based on graph min-cuts. We give the first experimental analysis on supervised and semi-supervised datasets and show good empirical performance.
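Abstract:
Word sense disambiguation has long been a key problem in natural language understanding, and how well it is solved directly affects the performance of many natural language processing applications. Because natural language knowledge is difficult to represent, and word sense disambiguation based on hand-crafted rules has struggled to achieve satisfactory results, various supervised machine learning methods have been applied to the task. Building on earlier work, this paper introduces the document term-weighting techniques of the vector space model from information retrieval to address the knowledge representation of the senses of polysemous words, proposes a method for computing context position weights, and presents a supervised machine learning method for word sense disambiguation based on the vector space model. The method maps the senses of a polysemous word and its contexts into a vector space, computes the distance between the context vector and each sense vector, and uses the k-NN (k=1) method to assign the context vector to a sense. In open and closed tests on nine high-frequency Chinese polysemous words, the method achieved outstanding results (average accuracy of 96.31% in the closed test and 92.98% in the open test), confirming its effectiveness.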
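The pipeline described in this abstract (position-weighted context vectors, sense vectors, nearest-neighbour assignment with k=1) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method: the 1/distance position weight, the window size, the use of cosine similarity as the distance, and all function names are hypothetical, and English tokens are used only to keep the example readable.

```python
import math
from collections import Counter


def context_vector(tokens, target_index, window=3):
    """Bag-of-words vector for words near the target word.

    Each neighbour is weighted by 1/distance to the target; this weighting
    is an assumed stand-in for the paper's position-weight formula.
    """
    vec = Counter()
    for i, tok in enumerate(tokens):
        distance = abs(i - target_index)
        if 0 < distance <= window:
            vec[tok] += 1.0 / distance
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_sense_vectors(examples):
    """Sum the context vectors of labelled examples, one vector per sense.

    examples: iterable of (tokens, target_index, sense_label).
    """
    senses = {}
    for tokens, idx, sense in examples:
        senses.setdefault(sense, Counter()).update(context_vector(tokens, idx))
    return senses


def disambiguate(tokens, target_index, sense_vectors):
    """1-NN: return the sense whose vector is most similar to the context."""
    ctx = context_vector(tokens, target_index)
    return max(sense_vectors, key=lambda s: cosine(ctx, sense_vectors[s]))


# Toy training data for the ambiguous word "bank" (illustrative only).
training = [
    ("he sat on the bank of the river".split(), 4, "shore"),
    ("she deposited money at the bank today".split(), 5, "finance"),
]
sense_vectors = build_sense_vectors(training)
predicted = disambiguate("the river bank was muddy".split(), 2, sense_vectors)
```

Because each sense is represented by a single vector, the k=1 rule reduces to picking the closest sense vector, so classification cost grows with the number of senses rather than the number of training contexts.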
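Abstract:
With the development of P2P technology, networks have become filled with a large number of P2P applications. The development of protocol encryption has made identifying and managing P2P applications very difficult. This paper describes how to apply semi-supervised machine learning theory: based on transport-layer features, a clustering algorithm is trained on the data to build an efficient online protocol identifier, which is used at the kernel protocol layer to identify protocols, P2P protocols in particular. Experiments on BitComet and Emule yielded a high identification accuracy (80%). The work also studies and solves the problem of using the selected features for clustering and efficiently implementing the final protocol identifier.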
Abstract:
The aim of this paper is to show that Dempster-Shafer evidence theory may be successfully applied to unsupervised classification in multisource remote sensing. The Dempster-Shafer formulation allows unions of classes to be considered, and both imprecision and uncertainty to be represented, through the definition of belief and plausibility functions. These two functions, derived from the mass function, are generally chosen in a supervised way. In this paper, the authors describe an unsupervised method, based on the comparison of monosource classification results, to select the classes necessary for Dempster-Shafer evidence combination and to define their mass functions. Data fusion is then performed, discarding invalid clusters (e.g. corresponding to conflicting information) thanks to an iterative process. The unsupervised multisource classification algorithm is applied to MAC-Europe'91 multisensor airborne campaign data collected over the Orgeval French site. Classification results using different combinations of sensors (TMS and AirSAR) or wavelengths (L- and C-bands) are compared. Performance of data fusion is evaluated in terms of identification of land cover types. The best results are obtained when all three data sets are used. Furthermore, some other combinations of data are tried, and their ability to discriminate between the different land cover types is quantified.
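For readers unfamiliar with the evidence-combination step, the sketch below implements the standard Dempster rule of combination together with the belief and plausibility functions derived from a mass function. It illustrates only the generic rule: the class names, the mass values, and the function names are made up for the example, and the paper's unsupervised procedure for selecting classes and defining masses is not reproduced.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule for two mass functions.

    Each mass function maps a frozenset of class labels (a focal element,
    possibly a union of classes) to its mass; the masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; combination undefined")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def belief(m, hypothesis):
    """Total mass of focal elements contained in the hypothesis."""
    return sum(mass for s, mass in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass of focal elements that intersect the hypothesis."""
    return sum(mass for s, mass in m.items() if s & hypothesis)


# Two hypothetical sources over the classes "forest" and "crop".
FOREST = frozenset({"forest"})
CROP = frozenset({"crop"})
EITHER = FOREST | CROP  # union of classes: the source cannot decide

source_1 = {FOREST: 0.6, EITHER: 0.4}
source_2 = {CROP: 0.3, EITHER: 0.7}
fused = combine(source_1, source_2)
```

Mass placed on `EITHER` expresses imprecision (the source cannot separate the two classes), while the gap between `belief(fused, FOREST)` and `plausibility(fused, FOREST)` expresses the remaining uncertainty about that class.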
Abstract:
Decision tree classification algorithms have significant potential for land cover mapping problems and have not been tested in detail by the remote sensing community relative to more conventional pattern recognition techniques such as maximum likelihood classification. In this paper, we present several types of decision tree classification algorithms and evaluate them on three different remote sensing data sets. The decision tree classification algorithms tested include a univariate decision tree, a multivariate decision tree, and a hybrid decision tree capable of including several different types of classification algorithms within a single decision tree structure. Classification accuracies produced by each of these decision tree algorithms are compared with both maximum likelihood and linear discriminant function classifiers. Results from this analysis show that the decision tree algorithms consistently outperform the maximum likelihood and linear discriminant function classifiers with regard to classification accuracy. In particular, the hybrid tree consistently produced the highest classification accuracies for the data sets tested. More generally, the results from this work show that decision trees have several advantages for remote sensing applications by virtue of their relatively simple, explicit, and intuitive classification structure. Further, decision tree algorithms are strictly nonparametric and, therefore, make no assumptions regarding the distribution of input data, and are flexible and robust with respect to nonlinear and noisy relations among input features and class labels.
Abstract:
Over the last two decades, numerous studies have used remotely sensed data from the Advanced Very High Resolution Radiometer (AVHRR) sensors to map land use and land cover at large spatial scales, but achieved only limited success. In this paper, we employ an approach that combines AVHRR images with geophysical datasets. Three geophysical datasets are used in this study: annual mean temperature, annual precipitation, and elevation. We first divide China into nine bio-climatic regions, using the long-term mean climate data. For each of the nine regions, the three geophysical data layers are stacked together with AVHRR data and AVHRR-derived vegetation index (Normalized Difference Vegetation Index) data, and the resultant multi-source datasets are then analysed to generate land-cover maps for individual regions, using supervised classification algorithms. The nine regional land-cover maps were assembled into a single map for China. The existing land-cover dataset derived from Landsat Thematic Mapper (TM) images was used to assess the accuracy of the classification that is based on AVHRR and geophysical data. Accuracy of individual regions varies from 73% to 89%, with an overall accuracy of 81% for China. The results showed that the methodology used in this study is, in general, feasible for large-scale land-cover mapping in China.
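The layer stacking described above can be pictured per pixel as follows. The NDVI formula, (NIR - red) / (NIR + red), is the standard definition; everything else in this sketch (the function names, the feature order, the units, and the sample values) is assumed for illustration and does not come from the study.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and near-infrared reflectance."""
    total = nir + red
    # Guard against division by zero for pixels with no signal.
    return (nir - red) / total if total else 0.0


def stack_features(red, nir, temperature_c, precipitation_mm, elevation_m):
    """Build one multi-source feature vector for a pixel.

    The ordering is arbitrary; a supervised classifier would receive one
    such vector per pixel, trained separately for each bio-climatic region.
    """
    return [red, nir, ndvi(red, nir), temperature_c, precipitation_mm, elevation_m]


# Hypothetical pixel values, for illustration only.
pixel = stack_features(red=0.1, nir=0.5, temperature_c=12.5,
                       precipitation_mm=640.0, elevation_m=350.0)
```

In practice the layers differ greatly in scale (reflectances near 0 to 1, elevation in hundreds or thousands of metres), so a real pipeline would likely normalise each feature before classification.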
Abstract:
Four models are employed in the landscape change detection of the newly created wetland. The models include ones for patch connectivity, ecological diversity, human impact intensity and mean center of land cover. The landscape data of the newly created wetland in the Yellow River Delta in 1984, 1991, and 1996 are produced by unsupervised and supervised classification of integrated Landsat TM images of the newly created wetland from the four seasons of each year. Applying the models to the data shows that the newly created wetland landscape in the Yellow River Delta underwent great change. The driving forces of the change are mainly the natural evolution of the newly created wetland and rapid population growth, especially non-peasant population growth, in the Yellow River Delta, because a considerable number of oil and gas fields have been found there. To prevent further destruction of the newly created wetland and to conserve the benign succession of its ecosystems, six measures are suggested on the basis of the research results. (C) 2003 Elsevier Science B.V. All rights reserved.
Abstract:
Orthogonal neighborhood-preserving projection (ONPP) is a recently developed orthogonal linear algorithm for overcoming the out-of-sample problem existing in the well-known manifold learning algorithm, i.e., locally linear embedding. It has been shown that ONPP is a strong analyzer of high-dimensional data. However, when applied to classification problems in a supervised setting, ONPP focuses only on the intraclass geometrical information while ignoring the interaction of samples from different classes. To enhance the performance of ONPP in classification, a new algorithm termed discriminative ONPP (DONPP) is proposed in this paper. DONPP 1) takes into account both intraclass and interclass geometries; 2) considers the neighborhood information of interclass relationships; and 3) follows the orthogonality property of ONPP. Furthermore, DONPP is extended to the semisupervised case, i.e., semisupervised DONPP (SDONPP). This uses unlabeled samples to improve the classification accuracy of the original DONPP. Empirical studies demonstrate the effectiveness of both DONPP and SDONPP.
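Abstract:
To address the problem that pattern recognition accuracy for EEG signals in brain-computer interface systems for service robots is too low to meet the robots' multi-task requirements, this paper proposes a pattern recognition method for EEG signals of multi-class complex hand operations based on a C-support vector multi-class classifier. The method is applied in a pattern recognition experiment on EEG signals of complex hand operations, achieving recognition of four classes of complex hand operations. Experimental results show that, compared with the earlier recognition using a BP neural network, the recognition rate increases from 85% to 90%.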