8 resultados para Feature extraction and classification

em Bulgarian Digital Mathematics Library at IMI-BAS


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed and three different kinds of applications with respect to classification tasks are considered. The summary of obtained results concerning the accuracy of classification schemes is presented with the conclusion about the search for the most appropriate feature extraction method. The problem how to discover knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the decision support system and its basic structure are defined. The means of knowledge acquisition needed to build up the proposed system are considered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of recognition on finite set of events is considered. The generalization ability of classifiers for this problem is studied within the Bayesian approach. The method for non-uniform prior distribution specification on recognition tasks is suggested. It takes into account the assumed degree of intersection between classes. The results of the analysis are applied for pruning of classification trees.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An approach for knowledge extraction from the information arriving to the knowledge base input and also new knowledge distribution over knowledge subsets already present in the knowledge base is developed. It is also necessary to realize the knowledge transform into parameters (data) of the model for the following decision-making on the given subset. It is assumed to realize the decision-making with the fuzzy sets’ apparatus.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditional content-based filtering methods usually utilize text extraction and classification techniques for building user profiles as well as for representations of contents, i.e. item profiles. These methods have some disadvantages e.g. mismatch between user profile terms and item profile terms, leading to low performance. Some of the disadvantages can be overcome by incorporating a common ontology which enables representing both the users' and the items' profiles with concepts taken from the same vocabulary. We propose a new content-based method for filtering and ranking the relevancy of items for users, which utilizes a hierarchical ontology. The method measures the similarity of the user's profile to the items' profiles, considering the existing of mutual concepts in the two profiles, as well as the existence of "related" concepts, according to their position in the ontology. The proposed filtering algorithm computes the similarity between the users' profiles and the items' profiles, and rank-orders the relevant items according to their relevancy to each user. The method is being implemented in ePaper, a personalized electronic newspaper project, utilizing a hierarchical ontology designed specifically for classification of News items. It can, however, be utilized in other domains and extended to other ontologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The process of automatic handwriting investigation in forensic science is described. The general scheme of a computer-based handwriting analysis system is used to point out at the basic problems of image enhancement and segmentation, feature extraction and decision-making. Factors that may compromise the accuracy of expert’s conclusion are underlined and directions for future investigations are marked.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

After many years of scholar study, manuscript collections continue to be an important source of novel information for scholars, concerning both the history of earlier times as well as the development of cultural documentation over the centuries. D-SCRIBE project aims to support and facilitate current and future efforts in manuscript digitization and processing. It strives toward the creation of a comprehensive software product, which can assist the content holders in turning an archive of manuscripts into a digital collection using automated methods. In this paper, we focus on the problem of recognizing early Christian Greek manuscripts. We propose a novel digital image binarization scheme for low quality historical documents allowing further content exploitation in an efficient way. Based on the existence of closed cavity regions in the majority of characters and character ligatures in these scripts, we propose a novel, segmentation-free, fast and efficient technique that assists the recognition procedure by tracing and recognizing the most frequently appearing characters or character ligatures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article presents the principal results of the Ph.D. thesis Investigation and classification of doubly resolvable designs by Stela Zhelezova (Institute of Mathematics and Informatics, BAS), successfully defended at the Specialized Academic Council for Informatics and Mathematical Modeling on 22 February 2010.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is well established that accent recognition can be as accurate as up to 95% when the signals are noise-free, using feature extraction techniques such as mel-frequency cepstral coefficients and binary classifiers such as discriminant analysis, support vector machine and k-nearest neighbors. In this paper, we demonstrate that the predictive performance can be reduced by as much as 15% when the signals are noisy. Specifically, in this paper we perturb the signals with different levels of white noise, and as the noise become stronger, the out-of-sample predictive performance deteriorates from 95% to 80%, although the in-sample prediction gives overly-optimistic results. ACM Computing Classification System (1998): C.3, C.5.1, H.1.2, H.2.4., G.3.