5 results for natural classification
at Universidad de Alicante
Abstract:
The field of natural language processing (NLP) has grown considerably in recent years; its research areas include information retrieval and extraction, data mining, machine translation, question answering systems, automatic summarization, and sentiment analysis, among others. This article presents concepts and some tools intended to contribute to the understanding of text processing with NLP techniques, with the aim of extracting relevant information that can be used in a wide range of applications. Automatic classifiers can be developed to categorize documents and recommend tags; these classifiers should be platform independent, easily customizable so that they can be integrated into different projects, and able to learn from examples. This article introduces these classification algorithms, analyzes some of the open-source tools currently available for carrying out these tasks, and compares several implementations using the F-measure to evaluate the classifiers.
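As an illustration of the evaluation metric mentioned in this abstract, the following is a minimal sketch of the F-measure for a single class. The function name `f_measure`, its parameters, and the toy labels are assumptions made for this example; the abstract does not describe the authors' actual evaluation code or data.

```python
def f_measure(true_labels, predicted_labels, positive, beta=1.0):
    """Compute the F-measure for one class treated as the positive class.

    Hypothetical helper for illustration only; not taken from the article.
    """
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0

    # Weighted harmonic mean of precision and recall (beta=1 gives F1).
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example with made-up document labels.
true_labels = ["sports", "politics", "sports", "sports", "politics"]
predicted_labels = ["sports", "sports", "sports", "politics", "politics"]
score = f_measure(true_labels, predicted_labels, positive="sports")
print(round(score, 3))
```

With `beta=1.0` precision and recall are weighted equally; a larger `beta` favors recall, which may suit tag recommendation where missing a relevant tag is costlier than suggesting an extra one.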
Abstract:
The present study aims to inventory and analyse the ethnobotanical knowledge about medicinal plants in the Serra de Mariola Natural Park. With respect to traditional uses, local informants reported 93 species with therapeutic uses, 27 used as food, 4 as natural dyes and 13 for handicrafts. We developed a methodology that allowed individuals or plant communities with a specific popular use to be located. We prepared a geographic information system (GIS) that included genus, family, scientific nomenclature and common names in Spanish and Catalan for each species. We also classified 39 medicinal uses according to the ATC (Anatomical Therapeutic Chemical) classification system. Labiatae (n=19), Compositae (n=9) and Leguminosae (n=6) were the families most represented among the plants used for different purposes in humans. The species with the highest cultural importance index (CI) values were Thymus vulgaris (CI=1.431), Rosmarinus officinalis (CI=1.415), Eryngium campestre (CI=1.325), Verbascum sinuatum (CI=1.106) and Sideritis angustifolia (CI=1.041). The collected plants with the most therapeutic uses were Lippia triphylla (12), Thymus vulgaris and Allium roseum (9) and Eryngium campestre (8). The most frequently reported ATC uses were G04 (urological use), D03 (treatment of wounds and ulcers) and R02 (throat diseases). These results were plotted on a geographic map in which each point represented an individual of a given species. A database was created with the corresponding therapeutic uses. This application is useful for identifying individuals and for selecting species with specific medicinal properties. Finally, knowledge of these useful plants may help to revive the local economy and, in some cases, promote their cultivation.
Abstract:
Hospitals attached to the Spanish Ministry of Health currently use the International Classification of Diseases 9 Clinical Modification (ICD9-CM) to classify health discharge records. At present, this work is done manually by experts. This paper tackles the automatic classification of real Discharge Records in Spanish following the ICD9-CM standard. The challenge is that the Discharge Records are written in spontaneous language. We explore several machine learning techniques to deal with the classification problem. Random Forest proved the most competitive, achieving an F-measure of 0.876.
Abstract:
This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches, which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving an error reduction of 59% and 21% respectively, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also found that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.
Abstract:
The evolution of CRISPR–cas loci, which encode adaptive immune systems in archaea and bacteria, involves rapid changes, in particular numerous rearrangements of the locus architecture and horizontal transfer of complete loci or individual modules. These dynamics complicate straightforward phylogenetic classification, but here we present an approach combining the analysis of signature protein families and features of the architecture of cas loci that unambiguously partitions most CRISPR–cas loci into distinct classes, types and subtypes. The new classification retains the overall structure of the previous version but is expanded to now encompass two classes, five types and 16 subtypes. The relative stability of the classification suggests that the most prevalent variants of CRISPR–Cas systems are already known. However, the existence of rare, currently unclassifiable variants implies that additional types and subtypes remain to be characterized.