Digital Library

997 results for Document Classification

Document Classification Methods for Organizing Explicit Knowledge

Relevance: 100.00%

Intelligent Search and Automatic Document Classification and Cataloging Based on Ontology Approach

Relevance: 100.00%

Abstract:

This paper presents an approach to developing an intelligent search system and automatic document classification and cataloging tools for a metadata-based CASE system. The described method combines the advantages of the ontology approach with those of the traditional keyword-based approach. It provides intelligent search capabilities and can be integrated with existing document search systems.


An approach to document classification using verb-object pairs

Relevance: 100.00%

Abstract:

The aim of this thesis is to present a new approach to document classification using verb-object pairs. We explore one possible strategy that uses the presence of relevant verb-object pairs in documents as features and trains a Naive Bayes classifier on them. We then assess the results of a case study that uses software based on this strategy, and draw conclusions.
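The strategy in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's software: it assumes the verb-object pairs have already been extracted from each document (the extraction step, the toy documents, and the function names `train_naive_bayes` and `predict` are all hypothetical), and it uses a Bernoulli Naive Bayes model with Laplace smoothing, since the abstract describes presence-based features.

```python
import math
from collections import defaultdict


def train_naive_bayes(docs, labels):
    """Fit a Bernoulli Naive Bayes model on sets of (verb, object) pairs."""
    vocab = set().union(*docs)
    class_counts = defaultdict(int)
    pair_counts = defaultdict(lambda: defaultdict(int))
    for pairs, label in zip(docs, labels):
        class_counts[label] += 1
        for pair in pairs:
            pair_counts[label][pair] += 1
    model = {"vocab": vocab, "prior": {}, "likelihood": {}}
    for label, n in class_counts.items():
        model["prior"][label] = n / len(docs)
        # Laplace-smoothed estimate of P(pair present | class).
        model["likelihood"][label] = {
            pair: (pair_counts[label][pair] + 1) / (n + 2) for pair in vocab
        }
    return model


def predict(model, pairs):
    """Return the class with the highest log-posterior for a set of pairs."""
    best_label, best_score = None, -math.inf
    for label, prior in model["prior"].items():
        score = math.log(prior)
        for pair in model["vocab"]:
            p = model["likelihood"][label][pair]
            # Presence and absence of each known pair both contribute.
            score += math.log(p if pair in pairs else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, pre-extracted verb-object pairs for four training documents.
train_docs = [
    {("install", "package"), ("run", "script")},
    {("compile", "code"), ("run", "script")},
    {("score", "goal"), ("win", "match")},
    {("win", "match"), ("sign", "player")},
]
train_labels = ["software", "software", "sports", "sports"]
model = train_naive_bayes(train_docs, train_labels)

print(predict(model, {("run", "script")}))
print(predict(model, {("win", "match"), ("score", "goal")}))
```

Pairs never seen in training are ignored at prediction time, which keeps the sketch short; a real system would also need a parser to produce the pairs.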


Effectiveness of document representation for classification

Relevance: 80.00%

Abstract:

Conventionally, document classification research focuses on improving the learning capabilities of classifiers. Nevertheless, according to our observation, the effectiveness of classification is limited by the suitability of the document representation. Intuitively, the more features are used in a representation, the more comprehensively documents are represented. However, if a representation contains too many irrelevant features, the classifier suffers not only from the curse of dimensionality but also from overfitting. To address this problem of the suitability of document representations, we present a classifier-independent approach to measuring their effectiveness. Our approach uses a labelled document corpus to estimate the distribution of documents in the feature space. By looking at documents in this way, we can clearly identify the contributions made by different features to document classification. Experiments have been performed to show how the effectiveness is evaluated. Our approach can be used as a tool to assist feature selection, dimensionality reduction and document classification.
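The abstract does not state which measure the authors use, so the sketch below substitutes a deliberately simple stand-in: for each term, the gap between its highest and lowest per-class document frequency in a labelled corpus. It only illustrates the general idea of scoring features from labelled data without training a classifier; the function name `feature_separation` and the toy corpus are assumptions.

```python
from collections import Counter


def feature_separation(docs, labels):
    """Score each term by the gap in its document frequency across classes.

    docs is a list of term sets and labels the matching class labels.
    A score near 0 means the term is equally common in every class;
    a score near 1 means it appears in one class and not in another.
    """
    class_sizes = Counter(labels)
    doc_freq = {label: Counter() for label in class_sizes}
    for terms, label in zip(docs, labels):
        doc_freq[label].update(set(terms))
    scores = {}
    for term in set().union(*docs):
        rates = [doc_freq[label][term] / class_sizes[label] for label in class_sizes]
        scores[term] = max(rates) - min(rates)
    return scores


# Hypothetical labelled corpus.
corpus = [
    {"goal", "match", "the"},
    {"match", "team", "the"},
    {"code", "bug", "the"},
    {"code", "release", "the"},
]
corpus_labels = ["sports", "sports", "software", "software"]
scores = feature_separation(corpus, corpus_labels)

for term, score in sorted(scores.items(), key=lambda item: (-item[1], item[0])):
    print(f"{term}: {score:.2f}")
```

A term such as "the", which occurs in every document, scores 0 and would be a candidate for removal during feature selection.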


Fitness assessment of document model

Relevance: 70.00%

Abstract:

Document classification is a supervised machine learning process in which predefined category labels are assigned to documents based on a hypothesis derived from a training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffield, United Kingdom, pp. 154-161.] pointed out that the effectiveness of a document classification system may vary across domains. This implies that the quality of the document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on candidate models. However, this kind of evaluation may suffer from either under-fitting or over-fitting, because the effectiveness scores are restricted by the learning capacities of the classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still providing satisfactory effectiveness with a small feature subset. Our experiments demonstrate how the fitness of models is assessed. The results of our work contribute to research on feature selection, dimensionality reduction and document classification.


A Statistical Approach for Multilingual Document Clustering and Topic Extraction from Clusters

Relevance: 70.00%

Abstract:

2000 Mathematics Subject Classification: 62H30

Modelo de representação de texto mais adequado à classificação (The most suitable text representation model for classification)

Relevance: 60.00%

Abstract:

Master's degree in Informatics Engineering

Classificação e agregação automática de notícias desportivas (Automatic classification and aggregation of sports news)

Relevance: 60.00%

Abstract:

Master's degree in Informatics Engineering - Specialization in Architectures, Systems and Networks

Semantic enrichment of knowledge sources supported by domain ontologies

Relevance: 60.00%

Abstract:

This thesis introduces a novel conceptual framework to support the creation of knowledge representations based on enriched Semantic Vectors, using the classical vector space model extended with ontological support. One of the primary research challenges addressed here is the formalization and representation of document contents, where most existing approaches are limited because they only take into account the explicit, word-based information in the document. This research explores how traditional knowledge representations can be enriched by incorporating implicit information derived from the complex relationships (semantic associations) modelled by domain ontologies, in addition to the information presented in documents. The relevant achievements pursued by this thesis are the following: (i) conceptualization of a model that enables the semantic enrichment of knowledge sources supported by domain experts; (ii) development of a method for extending the traditional vector space using domain ontologies; (iii) development of a method to support ontology learning, based on the discovery of new ontological relations expressed in non-structured information sources; (iv) development of a process to evaluate the semantic enrichment; (v) implementation of a proof of concept, named SENSE (Semantic Enrichment kNowledge SourcEs), which makes it possible to validate the ideas established within the scope of this thesis; (vi) publication of several scientific articles and support for 4 master's dissertations carried out by the Department of Electrical and Computer Engineering of FCT/UNL. It is worth mentioning that the work developed under the semantic referential covered by this thesis has reused relevant achievements from European research projects, in order to adopt approaches that are considered scientifically sound and coherent and to avoid “reinventing the wheel”.
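The idea of extending a keyword vector with ontology-derived weights can be illustrated with a small sketch. It is not the SENSE implementation: the toy ontology, the relation strengths, and the function names `keyword_vector`, `enrich` and `cosine` are all hypothetical, and the enrichment rule (adding a share of each term's weight to its related concepts) is just one simple possibility.

```python
import math
from collections import Counter

# Hypothetical toy ontology: term -> {related concept: relation strength}.
ONTOLOGY = {
    "beam": {"structure": 0.5},
    "column": {"structure": 0.5},
}


def keyword_vector(tokens):
    """Classical vector space model: raw term frequencies."""
    return dict(Counter(tokens))


def enrich(vector, ontology):
    """Add weight to concepts that the ontology relates to the document's terms."""
    enriched = dict(vector)
    for term, weight in vector.items():
        for related, strength in ontology.get(term, {}).items():
            enriched[related] = enriched.get(related, 0.0) + weight * strength
    return enriched


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = keyword_vector(["beam", "steel"])
doc_b = keyword_vector(["column", "concrete"])

plain_similarity = cosine(doc_a, doc_b)
enriched_similarity = cosine(enrich(doc_a, ONTOLOGY), enrich(doc_b, ONTOLOGY))
print(plain_similarity, enriched_similarity)
```

The two documents share no words, so their plain similarity is zero; after enrichment both carry weight on the shared concept "structure", and the similarity becomes positive.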


Explorando abordagens de múltiplos rótulos por floresta de caminhos ótimos (Exploring multi-label approaches with optimum-path forest)

Relevance: 60.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Incremental learning for interactive e-mail filtering

Relevance: 60.00%

Abstract:

In this article, we propose a framework, Prediction-Learning-Distillation (PLD), for interactive document classification and for distilling misclassified documents. Whenever a user points out misclassified documents, PLD learns from the mistakes and identifies the same mistakes among all other classified documents. PLD then applies this learning to future classifications. If the classifier fails to accept relevant documents or to reject irrelevant documents in certain categories, PLD assigns those documents as new positive or negative training instances. The classifier can then address its weaknesses by learning from these new training instances. Our experimental results demonstrate that the proposed algorithm can learn from user-identified misclassified documents and then distil the rest successfully.
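The predict, learn, and distil loop described here can be sketched with a toy classifier. This is an assumption-laden illustration and not the PLD algorithm itself: the term-overlap scoring, the class `FeedbackClassifier`, and the example e-mails are hypothetical. It only shows the control flow of learning from one user correction and then re-checking previously classified documents.

```python
from collections import Counter


class FeedbackClassifier:
    """Toy classifier that picks the label whose training terms overlap most."""

    def __init__(self):
        self.term_counts = {}  # label -> Counter of training terms
        self.history = {}      # doc_id -> (terms, assigned label)

    def learn(self, terms, label):
        self.term_counts.setdefault(label, Counter()).update(terms)

    def _best_label(self, terms):
        def score(label):
            counts = self.term_counts[label]
            total = sum(counts.values())
            return sum(counts[t] for t in terms) / total if total else 0.0
        # Sorting makes tie-breaking deterministic (alphabetical order).
        return max(sorted(self.term_counts), key=score)

    def predict(self, doc_id, terms):
        label = self._best_label(terms)
        self.history[doc_id] = (terms, label)
        return label

    def correct(self, doc_id, true_label):
        """Learn from a user correction, then re-check earlier predictions."""
        terms, _ = self.history[doc_id]
        self.learn(terms, true_label)
        self.history[doc_id] = (terms, true_label)
        relabelled = []
        for other_id, (other_terms, old_label) in list(self.history.items()):
            if other_id == doc_id:
                continue
            new_label = self._best_label(other_terms)
            if new_label != old_label:
                self.history[other_id] = (other_terms, new_label)
                relabelled.append(other_id)
        return relabelled


clf = FeedbackClassifier()
clf.learn(["win", "prize", "money"], "spam")
clf.learn(["meeting", "project", "report"], "ham")

first = clf.predict("doc1", ["discount", "offer"])
second = clf.predict("doc2", ["offer", "discount", "sale"])

# The user flags doc1 as spam; doc2 shares its terms and is relabelled too.
relabelled = clf.correct("doc1", "spam")
print(first, second, relabelled)
```

Both messages are initially filed as "ham" because no training term matches them; a single correction on doc1 is enough for doc2 to be found and moved as well.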


Opinion of the Committee on Economic and Monetary Affairs and Industrial Policy for the Committee on the Environment, Public Health and Consumer Protection on the proposal from the Commission to the Council for a directive amending for the seventh time Directive 67/548/EEC on the approximation of the laws, regulations and administrative provisions relating to the classification, packaging and labelling of dangerous substances (COM/89/575 final - C3-0047/90 - SYN 227). Session Documents 1990, Document A3-0230/90/ANNEX, 27 September 1990

Relevance: 40.00%

Report by the Committee on the Environment, Public Health and Consumer Protection on the proposal for a Council directive amending for the seventh time Directive 67/548/EEC on the approximation of the laws, regulations and administrative provision relating to the classification, packaging and labelling of dangerous substances (COM(89) 575 final - Doc. C3-0047/90 - SYN 0227). Session Documents 1990, Document A3-0230/90, 25 September 1990

Relevance: 40.00%

Generic object classification for autonomous robots

Relevance: 30.00%

Abstract:

One of the main problems in the interaction of autonomous robots is knowledge of the scene. Recognition is fundamental to solving this problem and to allowing robots to interact in an uncontrolled environment. In this paper we present a practical application of object capture, normalization, and the classification of triangular and circular signs. The system is embedded in Sony's Aibo robot to improve its interaction. The methodology presented has been tested in simulations and on real categorization problems, such as traffic sign classification, with very promising results.


A classification, up to hyperbolicity, of groups given by 2 generators and one relator of length 8

Relevance: 30.00%

Abstract:

"See the abstract at the beginning of the document in the attached file."