Digital Library

16 results for Text mining, Classificazione, Stemming, Text categorization

in University of Queensland eSpace - Australia

Fitness assessment of document model

Relevance:

100.00%

Publisher:

Abstract:

Document classification is a supervised machine learning process in which predefined category labels are assigned to documents based on a hypothesis derived from a training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffield, United Kingdom, pp. 154-161] pointed out that the effectiveness of a document classification system may vary across domains. This implies that the quality of the document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on candidate models. However, this kind of evaluation may suffer from under-fitting or over-fitting, because the effectiveness scores are restricted by the learning capacities of the classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still providing satisfactory effectiveness with a small feature subset. Our experiments demonstrate how the fitness of models is assessed. The results of our work contribute to research on feature selection, dimensionality reduction and document classification.
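To make the idea of a document "modelled as a collection of computable features" concrete, here is a minimal bag-of-words sketch. It is an illustration only and not the fitness evaluation method proposed in the abstract; the function names, the tokenization rule and the `max_features` parameter are all assumptions introduced for this example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents, max_features=None):
    """Pick the most frequent corpus terms as the feature set.

    max_features is a hypothetical knob standing in for a 'small feature
    subset'; None keeps every term.
    """
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]


def to_feature_vector(text, vocabulary):
    """Represent one document as term counts over the chosen vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


docs = ["text mining of text", "mining data"]
vocab = build_vocabulary(docs, max_features=2)
vectors = [to_feature_vector(d, vocab) for d in docs]
```

Shrinking `max_features` gives a smaller document model; whether such a model still separates positive from negative instances is the kind of question the abstract's fitness evaluation is concerned with.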

Supporting the curation of biological databases with reusable text mining

Relevance:

60.00%

Publisher:

EOPAS, the EthnoER online representation of interlinear text

Relevance:

40.00%

Publisher:

Abstract:

One of the goals of the ARC-funded eResearch project "Sharing access and analytical tools for ethnographic digital media using high speed networks", or simply EthnoER, is to take the outputs of normal linguistic analytical processes and present them online in a system we have called the EthnoER online presentation and annotation system, or EOPAS.

Special issue on advances in data mining and its applications

Relevance:

40.00%

Publisher:

Abstract:

Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information in large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured, and can take the form of text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that is large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been put into practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probabilistic methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
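As a small illustration of one of the example-based methods named above, the following is a minimal k-nearest neighbors sketch. It assumes numeric feature vectors, Euclidean distance and a simple majority vote; the function names and the toy data are hypothetical and are not taken from any of the cited works.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k closest training points."""
    # Pair every training point's distance with its label, nearest first.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training set: two well-separated clusters labelled "a" and "b".
points = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["a", "a", "b", "b"]

near_a = knn_predict(points, labels, (1, 0), k=3)
near_b = knn_predict(points, labels, (5, 6), k=3)
```

Because the method stores the training examples and defers all work to prediction time, it has no separate training step, which is what distinguishes it from the tree-based, probabilistic and neural approaches in the same list.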

Oden Salomos: Text, Übersetzung, Kommentar Teil 2: Oden 15-28

Relevance:

40.00%

Publisher:

Oden Salomos: Text, Übersetzung, Kommentar Teil 3: Oden 29-42. Transkription des Syrischen von Klaus Beyer

Relevance:

40.00%

Publisher:

The Road to Social Work and Human Service Practice: An Introductory Text

Relevance:

40.00%

Publisher:

Inhibition of neointimal formation by natural heparan sulfate proteoglycans of the arterial wall. Atherosclerosis IV: Recent Advances in Atherosclerosis Research, Volume 811, published Apr 1997. Edited by Fujio Numano; Russell Ross

Relevance:

40.00%

Publisher:

Text, theory, space: Land, literature and history in South Africa and Australia - Darian-Smith, K., Gunner, L., Nuttall, S.

Relevance:

40.00%

Publisher:

The celebrity in the text

Relevance:

40.00%

Publisher:

Transgendering shojo shosetsu: Girls' inter-text/sex-uality

Relevance:

40.00%

Publisher:

Abstract:

This essay recognises the power of reading and intertextuality (embedding texts within texts) in fiction targeted at girls and young women.

Case-Study: Text Publishing

Relevance:

40.00%

Publisher:

Which way to the tomb of Jesus? Martha and Myrrhbearer in image, text and liturgy

Relevance:

40.00%

Publisher:

Rhetorical styles and newstext: A contrastive analysis of rhetorical relations in Chinese and Australian news-journal text

Relevance:

40.00%

Publisher:

Book review of: 'Textual traffic: Colonialism, modernity and the economy of text' by S. Shankar

Relevance:

40.00%

Publisher:

Abstract:

No abstract
