906 results for Text mining, Classificazione, Stemming, Text categorization
Abstract:
Text file evaluation is an emerging topic in e-learning that responds to the shortcomings of assessment based on questions with predefined answers. Such questions are formalized in languages such as the IMS Question & Test Interoperability Specification (QTI) and are supported by many e-learning systems. Complex evaluation domains justify the development of specialized evaluators that participate in several business processes. The goal of this paper is to formalize the concept of text file evaluation within the scope of the E-Framework, a service-oriented framework for the development of e-learning systems maintained by a community of practice. The contribution includes an abstract service type and a service usage model. The former describes the generic capabilities of a text file evaluation service. The latter is a business process involving a set of services such as repositories of learning objects and learning management systems.
Abstract:
Arguably, the most difficult task in text classification is choosing an appropriate set of features that allows machine learning algorithms to classify accurately. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that they approach, and sometimes even outperform, previous state-of-the-art techniques, despite being much simpler in the sense that they do not require any text pre-processing or feature engineering.
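As a rough illustration of the idea of a dissimilarity-based representation, the following Python sketch maps a text to a vector of dissimilarities against a set of prototype texts and classifies it with a nearest-neighbor rule. The abstract does not name the specific measures used, so the normalized compression distance (NCD) computed with `zlib` is an assumption made here for illustration only, and the function names `ncd`, `dissimilarity_vector`, and `nearest_neighbor_label` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Assumption: NCD stands in for the unspecified information-theoretic
    dissimilarity measure mentioned in the abstract.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def dissimilarity_vector(text: str, prototypes: list[str]) -> list[float]:
    """Represent a text by its dissimilarity to each prototype text.

    No tokenization, stemming, or feature design is involved.
    """
    return [ncd(text, p) for p in prototypes]


def nearest_neighbor_label(text: str, labeled_texts: list[tuple[str, str]]) -> str:
    """Return the label of the training text with the smallest NCD to `text`."""
    closest = min(labeled_texts, key=lambda pair: ncd(text, pair[0]))
    return closest[1]
```

The vectors returned by `dissimilarity_vector` could be passed to any classical classifier, such as a support vector machine. Compression-based distances are noisy on very short strings, so a sketch like this is only meaningful for texts of reasonable length.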
Abstract:
In this paper, a rule-based automatic syllabifier for Danish is described that uses the Maximal Onset Principle. This work builds on the prior success of rule-based methods applied to Portuguese and Catalan syllabification modules. The system was implemented and tested using a very small set of rules. It achieved word accuracy rates of 96.9% and 98.7%, contrary to our initial expectations, since Danish has a complex syllabic structure and was therefore expected to be difficult to handle with rules. A comparison with a data-driven syllabification system using artificial neural networks showed a higher accuracy rate for the rule-based system.
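The Maximal Onset Principle can be sketched in a few lines of Python: each consonant cluster between two vowel nuclei is split so that the following syllable receives the longest cluster suffix that is a permitted onset. The vowel set and the onset inventory below are a toy assumption for illustration, not the rule set of the described system and not a faithful model of Danish phonotactics; the function name `syllabify` is hypothetical, and a run of adjacent vowels is treated as a single nucleus for simplicity.

```python
# Toy inventories (assumptions for illustration only).
VOWELS = set("aeiouyæøå")
LEGAL_ONSETS = {
    "", "b", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "r", "s", "t", "v",
    "bl", "br", "dr", "fl", "fr", "gl", "gr", "kl", "kn", "kr", "pl", "pr",
    "sk", "sl", "sm", "sn", "sp", "st", "tr", "skr", "spr", "str",
}


def syllabify(word: str) -> list[str]:
    """Split a word into syllables using the Maximal Onset Principle."""
    word = word.lower()

    # Locate nuclei as maximal runs of vowels: (start, end) index pairs.
    nuclei = []
    i = 0
    while i < len(word):
        if word[i] in VOWELS:
            j = i
            while j < len(word) and word[j] in VOWELS:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1

    if len(nuclei) < 2:
        return [word]

    # For each cluster between two nuclei, give the next syllable the
    # longest suffix that is a legal onset; the rest becomes the coda.
    boundaries = []
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        cluster = word[prev_end:next_start]
        for k in range(len(cluster) + 1):
            if cluster[k:] in LEGAL_ONSETS:  # "" is legal, so this always ends
                boundaries.append(prev_end + k)
                break

    # Cut the word at the computed boundaries.
    pieces, start = [], 0
    for b in boundaries:
        pieces.append(word[start:b])
        start = b
    pieces.append(word[start:])
    return pieces


print(syllabify("extra"))   # ['ex', 'tra']
print(syllabify("banana"))  # ['ba', 'na', 'na']
```

In "extra" the cluster "xtr" is not a legal onset under the toy inventory, but its suffix "tr" is, so "x" closes the first syllable and "tr" opens the second.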
Abstract:
Context and Objective: Chagas disease is considered a worldwide emerging disease; it is endemic in Mexico and the state of Coahuila, yet is considered of little relevance. The objective of this study was to determine the seroprevalence of T. cruzi infection in blood donors and of Chagas cardiomyopathy in patients from the coal mining region of Coahuila, Mexico. Design and Setting: Epidemiological, exploratory and prospective study in a general hospital during the period January to June 2011. Methods: We performed ELISA and indirect hemagglutination laboratory tests on three groups of individuals: 1) asymptomatic voluntary blood donors, 2) patients hospitalized in the cardiology department and 3) patients with dilated cardiomyopathy. Results: There were three levels of seroprevalence: 0.31% in asymptomatic individuals, 1.25% in cardiac patients and 21.14% in patients with dilated cardiomyopathy. Conclusions: Although autochthonous cases of Chagas disease were detected, the importance of the disease to local public health and the details of its transmission dynamics remain to be established; the study is therefore still in progress.