855 resultados para optimal feature selection


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Clustering is a process of partitioning a given set of patterns into meaningful groups. The clustering process can be viewed as consisting of the following three phases: (i) feature selection phase, (ii) classification phase, and (iii) description generation phase. Conventional clustering algorithms implicitly use knowledge about the clustering environment to a large extent in the feature selection phase. This reduces the need for the environmental knowledge in the remaining two phases, permitting the usage of simple numerical measure of similarity in the classification phase. Conceptual clustering algorithms proposed by Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] and Stepp and Michalski [Artif. Intell., pp. 43–69 (1986)] make use of the knowledge about the clustering environment in the form of a set of predefined concepts to compute the conceptual cohesiveness during the classification phase. Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] have argued that the results obtained with the conceptual clustering algorithms are superior to conventional methods of numerical classification. However, this claim was not supported by the experimental results obtained by Dale [IEEE Trans. PAMI, PAMI-7, 241–244 (1985)]. In this paper a theoretical framework, based on an intuitively appealing set of axioms, is developed to characterize the equivalence between the conceptual clustering and conventional clustering. In other words, it is shown that any classification obtained using conceptual clustering can also be obtained using conventional clustering and vice versa.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Maximum entropy approach to classification is very well studied in applied statistics and machine learning and almost all the methods that exists in literature are discriminative in nature. In this paper, we introduce a maximum entropy classification method with feature selection for large dimensional data such as text datasets that is generative in nature. To tackle the curse of dimensionality of large data sets, we employ conditional independence assumption (Naive Bayes) and we perform feature selection simultaneously, by enforcing a `maximum discrimination' between estimated class conditional densities. For two class problems, in the proposed method, we use Jeffreys (J) divergence to discriminate the class conditional densities. To extend our method to the multi-class case, we propose a completely new approach by considering a multi-distribution divergence: we replace Jeffreys divergence by Jensen-Shannon (JS) divergence to discriminate conditional densities of multiple classes. In order to reduce computational complexity, we employ a modified Jensen-Shannon divergence (JS(GM)), based on AM-GM inequality. We show that the resulting divergence is a natural generalization of Jeffreys divergence to a multiple distributions case. As far as the theoretical justifications are concerned we show that when one intends to select the best features in a generative maximum entropy approach, maximum discrimination using J-divergence emerges naturally in binary classification. Performance and comparative study of the proposed algorithms have been demonstrated on large dimensional text and gene expression datasets that show our methods scale up very well with large dimensional datasets.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Traffic classification using machine learning continues to be an active research area. The majority of work in this area uses off-the-shelf machine learning tools and treats them as black-box classifiers. This approach turns all the modelling complexity into a feature selection problem. In this paper, we build a problem-specific solution to the traffic classification problem by designing a custom probabilistic graphical model. Graphical models are a modular framework to design classifiers which incorporate domain-specific knowledge. More specifically, our solution introduces semi-supervised learning which means we learn from both labelled and unlabelled traffic flows. We show that our solution performs competitively compared to previous approaches while using less data and simpler features. Copyright © 2010 ACM.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This project introduces an improvement of the vision capacity of the robot Robotino operating under ROS platform. A method for recognizing object class using binary features has been developed. The proposed method performs a binary classification of the descriptors of each training image to characterize the appearance of the object class. It presents the use of the binary descriptor based on the difference of gray intensity of the pixels in the image. It shows that binary features are suitable to represent object class in spite of the low resolution and the weak information concerning details of the object in the image. It also introduces the use of a boosting method (Adaboost) of feature selection al- lowing to eliminate redundancies and noise in order to improve the performance of the classifier. Finally, a kernel classifier SVM (Support Vector Machine) is trained with the available database and applied for predictions on new images. One possible future work is to establish a visual servo-control that is to say the reac- tion of the robot to the detection of the object.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

[EN]Fundación Zain is developing new built heritage assessment protocols. The goal is to objectivize and standardize the analysis and decision process that leads to determining the degree of protection of built heritage in the Basque Country. The ultimate step in this objectivization and standardization effort will be the development of an information and communication technology (ICT) tool for the assessment of built heritage. This paper presents the ground work carried out to make this tool possible: the automatic, image-based delineation of stone masonry. This is a necessary first step in the development of the tool, as the built heritage that will be assessed consists of stone masonry construction, and many of the features analyzed can be characterized according to the geometry and arrangement of the stones. Much of the assessment is carried out through visual inspection. Thus, this process will be automated by applying image processing on digital images of the elements under inspection. The principal contribution of this paper is the automatic delineation the framework proposed. The other contribution is the performance evaluation of this delineation as the input to a classifier for a geometrically characterized feature of a built heritage object. The element chosen to perform this evaluation is the stone arrangement of masonry walls. The validity of the proposed framework is assessed on real images of masonry walls.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The main contribution of this work is to analyze and describe the state of the art performance as regards answer scoring systems from the SemEval- 2013 task, as well as to continue with the development of an answer scoring system (EHU-ALM) developed in the University of the Basque Country. On the overall this master thesis focuses on finding any possible configuration that lets improve the results in the SemEval dataset by using attribute engineering techniques in order to find optimal feature subsets, along with trying different hierarchical configurations in order to analyze its performance against the traditional one versus all approach. Altogether, throughout the work we propose two alternative strategies: on the one hand, to improve the EHU-ALM system without changing the architecture, and, on the other hand, to improve the system adapting it to an hierarchical con- figuration. To build such new models we describe and use distinct attribute engineering, data preprocessing, and machine learning techniques.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Fundacion Zain is developing new built heritage assessment protocols. The goal is to objectivize and standardize the analysis and decision process that leads to determining the degree of protection of built heritage in the Basque Country. The ultimate step in this objectivization and standardization effort will be the development of an information and communication technology (ICT) tool for the assessment of built heritage. This paper presents the ground work carried out to make this tool possible: the automatic, image-based delineation of stone masonry. This is a necessary first step in the development of the tool, as the built heritage that will be assessed consists of stone masonry construction, and many of the features analyzed can be characterized according to the geometry and arrangement of the stones. Much of the assessment is carried out through visual inspection. Thus, this process will be automated by applying image processing on digital images of the elements under inspection. The principal contribution of this paper is the automatic delineation the framework proposed. The other contribution is the performance evaluation of this delineation as the input to a classifier for a geometrically characterized feature of a built heritage object. The element chosen to perform this evaluation is the stone arrangement of masonry walls. The validity of the proposed framework is assessed on real images of masonry walls.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

比起传统的统计方法,人工神经网络具有很好的非线性处理和并行计算能力,在植被遥感信息处理中得到广泛的应用。本研究系统地介绍了人工神经网络理论及其在植被遥感信息处理中的应用现状。并就如何提高人工神经网络的相干被遥感影像的分类能力进行了详细研究。首次提出了结合植被指数和组成分分析的神经网络分类方法。过去这方面的研究工作大都集中在通过选择一个合适的神经网络模型来提高植被分类精度,而我们认为:根据植被遥感自身的规律,结合统计方法,确定合适的网络输入模式的特征变量,也可以提高分类精度。 研究结果表明,尽管一般的神经网络分类器不需要对输入的模式做明显的特征提取,网络的隐层就具有特征提取的功能。但对TM影像七个波段和常用的五个植被指数(PVI、NDVI、WDVI、PVI、MSAVI2),分别做主成分分析,从而获得人工神经网络输入的特征变量,使用这样一种结合VI、PCA的神经网络对遥感TM多波段影像进行植被分类,能大大提高分类的精度。

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The uncertainty associated with a rainfall-runoff and non-point source loading (NPS) model can be attributed to both the parameterization and model structure. An interesting implication of the areal nature of NPS models is the direct relationship between model structure (i.e. sub-watershed size) and sample size for the parameterization of spatial data. The approach of this research is to find structural limitations in scale for the use of the conceptual NPS model, then examine the scales at which suitable stochastic depictions of key parameter sets can be generated. The overlapping regions are optimal (and possibly the only suitable regions) for conducting meaningful stochastic analysis with a given NPS model. Previous work has sought to find optimal scales for deterministic analysis (where, in fact, calibration can be adjusted to compensate for sub-optimal scale selection); however, analysis of stochastic suitability and uncertainty associated with both the conceptual model and the parameter set, as presented here, is novel; as is the strategy of delineating a watershed based on the uncertainty distribution. The results of this paper demonstrate a narrow range of acceptable model structure for stochastic analysis in the chosen NPS model. In the case examined, the uncertainties associated with parameterization and parameter sensitivity are shown to be outweighed in significance by those resulting from structural and conceptual decisions. © 2011 Copyright IAHS Press.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The paper presents a new copula based method for measuring dependence between random variables. Our approach extends the Maximum Mean Discrepancy to the copula of the joint distribution. We prove that this approach has several advantageous properties. Similarly to Shannon mutual information, the proposed dependence measure is invariant to any strictly increasing transformation of the marginal variables. This is important in many applications, for example in feature selection. The estimator is consistent, robust to outliers, and uses rank statistics only. We derive upper bounds on the convergence rate and propose independence tests too. We illustrate the theoretical contributions through a series of experiments in feature selection and low-dimensional embedding of distributions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, a novel MPC strategy is proposed, and referred to as asso MPC. The new paradigm features an 1-regularised least squares loss function, in which the control error variance competes with the sum of input channels magnitude (or slew rate) over the whole horizon length. This cost choice is motivated by the successful development of LASSO theory in signal processing and machine learning. In the latter fields, sum-of-norms regularisation have shown a strong capability to provide robust and sparse solutions for system identification and feature selection. In this paper, a discrete-time dual-mode asso MPC is formulated, and its stability is proven by application of standard MPC arguments. The controller is then tested for the problem of ship course keeping and roll reduction with rudder and fins, in a directional stochastic sea. Simulations show the asso MPC to inherit positive features from its corresponding regressor: extreme reduction of decision variables' magnitude, namely, actuators' magnitude (or variations), with a finite energy error, being particularly promising for over-actuated systems. © 2012 AACC American Automatic Control Council).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that showly a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Compared with other existing methods, the feature point-based image watermarking schemes can resist to global geometric attacks and local geometric attacks, especially cropping and random bending attacks (RBAs), by binding watermark synchronization with salient image characteristics. However, the watermark detection rate remains low in the current feature point-based watermarking schemes. The main reason is that both of feature point extraction and watermark embedding are more or less related to the pixel position, which is seriously distorted by the interpolation error and the shift problem during geometric attacks. In view of these facts, this paper proposes a geometrically robust image watermarking scheme based on local histogram. Our scheme mainly consists of three components: (1) feature points extraction and local circular regions (LCRs) construction are conducted by using Harris-Laplace detector; (2) a mechanism of grapy theoretical clustering-based feature selection is used to choose a set of non-overlapped LCRs, then geometrically invariant LCRs are completely formed through dominant orientation normalization; and (3) the histogram and mean statistically independent of the pixel position are calculated over the selected LCRs and utilized to embed watermarks. Experimental results demonstrate that the proposed scheme can provide sufficient robustness against geometric attacks as well as common image processing operations. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Feature-based image watermarking schemes, which aim to survive various geometric distortions, have attracted great attention in recent years. Existing schemes have shown robustness against rotation, scaling, and translation, but few are resistant to cropping, nonisotropic scaling, random bending attacks (RBAs), and affine transformations. Seo and Yoo present a geometrically invariant image watermarking based on affine covariant regions (ACRs) that provide a certain degree of robustness. To further enhance the robustness, we propose a new image watermarking scheme on the basis of Seo's work, which is insensitive to geometric distortions as well as common image processing operations. Our scheme is mainly composed of three components: 1) feature selection procedure based on graph theoretical clustering algorithm is applied to obtain a set of stable and nonoverlapped ACRs; 2) for each chosen ACR, local normalization, and orientation alignment are performed to generate a geometrically invariant region, which can obviously improve the robustness of the proposed watermarking scheme; and 3) in order to prevent the degradation in image quality caused by the normalization and inverse normalization, indirect inverse normalization is adopted to achieve a good compromise between the imperceptibility and robustness. Experiments are carried out on an image set of 100 images collected from Internet, and the preliminary results demonstrate that the developed method improves the performance over some representative image watermarking approaches in terms of robustness.