3 resultados para Data selection

em Universidad de Alicante


Relevância:

40.00% 40.00%

Publicador:

Resumo:

In the current Information Age, data production and processing demands are ever increasing. This has motivated the appearance of large-scale distributed information. This phenomenon also applies to Pattern Recognition so that classic and common algorithms, such as the k-Nearest Neighbour, are unable to be used. To improve the efficiency of this classifier, Prototype Selection (PS) strategies can be used. Nevertheless, current PS algorithms were not designed to deal with distributed data, and their performance is therefore unknown under these conditions. This work is devoted to carrying out an experimental study on a simulated framework in which PS strategies can be compared under classical conditions as well as those expected in distributed scenarios. Our results report a general behaviour that is degraded as conditions approach to more realistic scenarios. However, our experiments also show that some methods are able to achieve a fairly similar performance to that of the non-distributed scenario. Thus, although there is a clear need for developing specific PS methodologies and algorithms for tackling these situations, those that reported a higher robustness against such conditions may be good candidates from which to start.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we propose a novel filter for feature selection. Such filter relies on the estimation of the mutual information between features and classes. We bypass the estimation of the probability density function with the aid of the entropic-graphs approximation of Rényi entropy, and the subsequent approximation of the Shannon one. The complexity of such bypassing process does not depend on the number of dimensions but on the number of patterns/samples, and thus the curse of dimensionality is circumvented. We show that it is then possible to outperform a greedy algorithm based on the maximal relevance and minimal redundancy criterion. We successfully test our method both in the contexts of image classification and microarray data classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The environmental, cultural and socio-economic causes and consequences of farmland abandonment are issues of increasing concern for researchers and policy makers. In previous studies, we proposed a new methodology for selecting the driving factors in farmland abandonment processes. Using Data Mining and GIS, it is possible to select those variables which are more significantly related to abandonment. The aim of this study is to investigate the application of the above mentioned methodology for finding relationships between relief and farmland abandonment in a Mediterranean region (SE Spain).We have taken into account up to 28 different variables in a single analysis, some of them commonly considered in land use change studies (slope, altitude, TWI, etc), but also other novel variables have been evaluated (sky view factor, terrain view factor, etc). The variable selection process provides results in line with the previous knowledge of the study area, describing some processes that are region specific (e.g. abandonment versus intensification of the agricultural activities). The European INSPIRE Directive (2007/2/EC) establishes that the digital elevation models for land surfaces should be available in all member countries, this means that the research described in this work can be extrapolated to any European country to determine whether these variables (slope, altitude, etc) are important in the process of abandonment.