34 resultados para Learning machine


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Feature discretization (FD) techniques often yield adequate and compact representations of the data, suitable for machine learning and pattern recognition problems. These representations usually decrease the training time, yielding higher classification accuracy while allowing for humans to better understand and visualize the data, as compared to the use of the original features. This paper proposes two new FD techniques. The first one is based on the well-known Linde-Buzo-Gray quantization algorithm, coupled with a relevance criterion, being able perform unsupervised, supervised, or semi-supervised discretization. The second technique works in supervised mode, being based on the maximization of the mutual information between each discrete feature and the class label. Our experimental results on standard benchmark datasets show that these techniques scale up to high-dimensional data, attaining in many cases better accuracy than existing unsupervised and supervised FD approaches, while using fewer discretization intervals.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work, we present a teaching-learning sequence on colour intended to a pre-service elementary teacher programme informed by History and Philosophy of Science. Working in a socio-constructivist framework, we made an excursion on the history of colour. Our excursion through history of colour, as well as the reported misconception on colour helps us to inform the constructions of the teaching-learning sequence. We apply a questionnaire both before and after each of the two cycles of action-research in order to assess students’ knowledge evolution on colour and to evaluate our teaching-learning sequence. Finally, we present a discussion on the persistence of deep-rooted alternative conceptions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.