59 results for Natural language techniques, Semantic spaces, Random projection, Documents
Abstract:
In the context of the Semantic Web, natural language descriptions associated with ontologies have proven to be of major importance not only to support ontology developers and adopters, but also to assist in tasks such as ontology mapping, information extraction, or natural language generation. In the state-of-the-art we find some attempts to provide guidelines for URI local names in English, and also some disagreement on the use of URIs for describing ontology elements. When trying to extrapolate these ideas to a multilingual scenario, some of these approaches fail to provide a valid solution. On the basis of some real experiences in the translation of ontologies from English into Spanish, we provide a preliminary set of guidelines for naming and labeling ontologies in a multilingual scenario.
Abstract:
This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system focuses on helping Deaf people when they want to renew their Driver's License. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. In the final version, the language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation, which was carried out at the Local Traffic Office in Toledo involving real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and discusses how to solve them (some of them specific to LSE).
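To make the hierarchical combination of translation strategies concrete, the following is a minimal sketch in Python of how an example-based, a rule-based and a statistical translator could be cascaded; all class names, the tiny corpus and lexicon, and the word-by-word "statistical" stub are illustrative assumptions, not the system described in the abstract.

```python
# Sketch of a hierarchical translator: high-precision strategies are tried
# first and the system only backs off to broader-coverage ones when they
# produce no output. Everything below is an invented, simplified example.

class ExampleBasedTranslator:
    def __init__(self, parallel_corpus):
        self.corpus = parallel_corpus          # Spanish sentence -> LSE sign sequence

    def translate(self, sentence):
        return self.corpus.get(sentence)       # None when no stored example matches


class RuleBasedTranslator:
    def __init__(self, lexicon):
        self.lexicon = lexicon                 # Spanish word -> LSE sign

    def translate(self, sentence):
        signs = [self.lexicon[w] for w in sentence.lower().split() if w in self.lexicon]
        return signs or None                   # None when no word could be mapped


class StatisticalStub:
    def translate(self, sentence):
        # Placeholder for a trained statistical translator (last-resort stage).
        return [w.upper() for w in sentence.split()]


class HierarchicalTranslator:
    """Try each strategy in a fixed order and keep the first non-empty output."""
    def __init__(self, *stages):
        self.stages = stages

    def translate(self, sentence):
        for stage in self.stages:
            signs = stage.translate(sentence)
            if signs:
                return signs
        return []


translator = HierarchicalTranslator(
    ExampleBasedTranslator({"quiero renovar el carnet": ["YO", "CARNET", "RENOVAR"]}),
    RuleBasedTranslator({"renovar": "RENOVAR", "carnet": "CARNET"}),
    StatisticalStub(),
)
print(translator.translate("quiero renovar el carnet"))    # example-based hit
print(translator.translate("necesito renovar mi carnet"))  # falls back to the rules
```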
Abstract:
In the context of the Semantic Web, resources on the net can be enriched by well-defined, machine-understandable metadata describing their associated conceptual meaning. These metadata consisting of natural language descriptions of concepts are the focus of the activity we describe in this chapter, namely, ontology localization. In the framework of the NeOn Methodology, ontology localization is defined as the activity of adapting an ontology to a particular language and culture. This adaptation mainly involves the translation of the natural language descriptions of the ontology from a source natural language to a target natural language, with the final objective of obtaining a multilingual ontology, that is, an ontology documented in several natural languages. The purpose of this chapter is to provide detailed and prescriptive methodological guidelines to support the performance of this activity.
Abstract:
Effective data summarization methods that use AI techniques can help humans understand large sets of data. In this paper, we describe a knowledge-based method for automatically generating summaries of geospatial and temporal data, i.e. data with geographical and temporal references. The method is useful for summarizing data streams, such as GPS traces and traffic information, that are becoming more prevalent with the increasing use of sensors in computing devices. The method presented here is an initial architecture for our ongoing research in this domain. In this paper we describe the data representations we have designed for our method and our implementation of the components that perform data abstraction and natural language generation. We also discuss evaluation results that show the ability of our method to generate certain types of geospatial and temporal descriptions.
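As a rough illustration of the two components mentioned (data abstraction and natural language generation), the sketch below abstracts a toy GPS trace into intervals and realizes them with a fixed template; the field names, regions and template are assumptions, not the paper's actual representations.

```python
# Two-stage sketch: (1) data abstraction collapses consecutive GPS samples in
# the same region into intervals, (2) a template realizes the summary text.
from datetime import datetime

trace = [  # (timestamp, region) samples from a hypothetical GPS stream
    (datetime(2011, 5, 3, 8, 0), "downtown"),
    (datetime(2011, 5, 3, 8, 20), "downtown"),
    (datetime(2011, 5, 3, 8, 40), "ring road"),
    (datetime(2011, 5, 3, 9, 10), "ring road"),
]

def abstract(trace):
    """Collapse consecutive samples in the same region into [region, start, end]."""
    intervals = []
    for ts, region in trace:
        if intervals and intervals[-1][0] == region:
            intervals[-1][2] = ts
        else:
            intervals.append([region, ts, ts])
    return intervals

def generate(intervals):
    """Surface realization from a fixed template."""
    parts = [f"from {s:%H:%M} to {e:%H:%M} the vehicle was in the {r}"
             for r, s, e in intervals]
    return "On the recorded trip, " + "; then ".join(parts) + "."

print(generate(abstract(trace)))
# -> "On the recorded trip, from 08:00 to 08:20 the vehicle was in the downtown;
#     then from 08:40 to 09:10 the vehicle was in the ring road."
```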
Abstract:
We live in an era in which there is an ever-growing amount of information. In the health domain, the electronic health record has made it possible to digitize all patient information. These electronic health records contain a great deal of valuable information written in narrative form that can only be extracted using natural language processing techniques. However, when searching these texts it is important to determine whether the information about symptoms, diseases, treatments, etc. refers to the patient or to the family history, and whether certain terms appear negated or are hypothetical. Although Spanish ranks second among the most spoken languages, with more than 500 million speakers, to the best of our knowledge there is no method for detecting negation, hypothesis, or historical status in clinical texts written in Spanish. This bachelor's thesis therefore presents an implementation based on the ConText algorithm for the detection of negation, hypothesis and historical status in clinical texts written in Spanish. The algorithm was validated on 454 sentences containing a total of 1,897 triggers, obtaining an accuracy of 83.5%, 96.1%, 96.9%, 99.7% and 93.4% for affirmed, negated, hypothetical, negated hypothetical and historical conditions, respectively.
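For readers unfamiliar with ConText, the sketch below illustrates, in heavily simplified form, the trigger-and-scope idea on which such an implementation is based; the trigger lists, the scope rule and the example sentence are assumptions and do not reproduce the lexicon or rules of the thesis.

```python
# A minimal ConText-style pass over a Spanish clinical sentence: triggers open
# a scope that marks the clinical terms that follow as negated, hypothetical
# or historical. Tiny illustrative trigger lists only.

TRIGGERS = {
    "negated": ["no ", "sin ", "niega"],
    "hypothetical": ["probable", "posible", "sospecha de"],
    "historical": ["antecedentes de", "historia de"],
}

def context_pass(sentence, terms):
    """Return {term: status} for the clinical terms found in the sentence."""
    text = sentence.lower()
    status = {t: "affirmed" for t in terms if t in text}
    for label, triggers in TRIGGERS.items():
        for trig in triggers:
            pos = text.find(trig)
            if pos == -1:
                continue
            # Simplification: the scope is everything after the trigger.
            scope = text[pos + len(trig):]
            for term in terms:
                if term in scope:
                    status[term] = label
    return status

print(context_pass("El paciente niega fiebre y tiene antecedentes de asma",
                   ["fiebre", "asma"]))
# -> {'fiebre': 'negated', 'asma': 'historical'}
```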
Abstract:
This work develops a REST service that transforms natural language sentences into RDF graphs. The generated graphs are directed graphs, where the nodes are formed from the nouns or adjectives of the sentences and the arcs are formed from the verbs. The service is used within the p-medicine project to support the following functionality. Natural language queries: the p-medicine platform currently provides a programmatic interface for SPARQL queries; the developed service would generate those queries automatically from natural language sentences. Database annotation using natural language: the p-medicine platform incorporates a tool, developed by the Biomedical Engineering Group of the Universidad Politécnica de Madrid, for annotating RDF databases. These annotations are necessary for the subsequent translation of the databases to a central schema. The annotation process requires the user to manually construct the RDF views to be annotated, which involves displaying the RDF schema graphically and having the user build the RDF views by selecting the required classes and relationships. This process is often complex and too difficult for a user without a technical background. The system will be incorporated so that these views can be constructed using natural language.
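The noun/adjective-to-node and verb-to-arc mapping can be pictured with a short rdflib sketch; the part-of-speech tags are hard-coded here for brevity (the actual service would obtain them from a tagger) and the namespace is invented, so this is only an approximation of the behaviour described, not the p-medicine service itself.

```python
# Sketch of the sentence-to-graph idea: nouns and adjectives become nodes,
# verbs become the arcs that link them.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/pmedicine/")

# (token, POS) pairs for the Spanish sentence "El paciente recibe quimioterapia"
tagged = [("El", "DET"), ("paciente", "NOUN"),
          ("recibe", "VERB"), ("quimioterapia", "NOUN")]

def sentence_to_rdf(tagged):
    g = Graph()
    subject, predicate = None, None
    for token, pos in tagged:
        if pos in ("NOUN", "ADJ"):
            node = EX[token.lower()]
            if subject is None:
                subject = node                     # first noun/adjective: subject
            elif predicate is not None:
                g.add((subject, predicate, node))  # close the triple
                subject, predicate = node, None    # allow chaining to continue
        elif pos == "VERB":
            predicate = EX[token.lower()]          # verbs become the arcs
    return g

for s, p, o in sentence_to_rdf(tagged):
    print(s, p, o)   # -> .../paciente  .../recibe  .../quimioterapia
```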
Abstract:
This paper describes the application of language translation technologies for generating bus information in Spanish Sign Language (LSE: Lengua de Signos Española). In this work, two main systems have been developed: the first for translating text messages from information panels, and the second for translating spoken Spanish in natural conversations at the information point of the bus company. Both systems are made up of a natural language translator (for converting a word sequence into a sequence of LSE signs) and a 3D avatar animation module (for playing back the signs). For the natural language translator, two technological approaches have been analyzed and integrated: an example-based strategy and a statistical translator. When translating spoken utterances, it is also necessary to incorporate a speech recognizer for decoding the spoken utterance into a word sequence, prior to the language translation module. This paper includes a detailed description of the field evaluation carried out in this domain. The evaluation was carried out at the customer information office in Madrid, involving both real bus company employees and deaf people, and includes objective measurements from the system and information from questionnaires. In the field evaluation, the whole translation process presents an SER (Sign Error Rate) of less than 10% and a BLEU greater than 90%.
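For reference, the two figures reported (SER and BLEU) can be computed along the following lines; the sign glosses are invented examples, SER is taken here as an edit distance over sign sequences, and BLEU is obtained from NLTK rather than from whatever toolkit was actually used in the evaluation.

```python
# SER as Levenshtein edit distance over sign sequences, plus sentence BLEU.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def sign_error_rate(reference, hypothesis):
    """(substitutions + insertions + deletions) / reference length."""
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[n][m] / n

ref = ["AUTOBUS", "PROXIMO", "LLEGAR", "DIEZ", "MINUTO"]   # invented gloss sequence
hyp = ["AUTOBUS", "PROXIMO", "LLEGAR", "MINUTO"]           # one sign deleted

print("SER  =", sign_error_rate(ref, hyp))                 # 1 error / 5 signs = 0.2
print("BLEU =", sentence_bleu([ref], hyp,
                              smoothing_function=SmoothingFunction().method1))
```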
Abstract:
An important part of human intelligence, both historically and operationally, is our ability to communicate. We learn how to communicate, and maintain our communicative skills, in a society of communicators – a highly effective way to reach and maintain proficiency in this complex skill. Principles that might allow artificial agents to learn language this way are incompletely known at present – the multi-dimensional nature of socio-communicative skills is beyond every machine learning framework so far proposed. Our work begins to address this challenge by proposing a way for observation-based machine learning of natural language and communication. Our framework can learn complex communicative skills with minimal up-front knowledge. The system learns by incrementally producing predictive models of causal relationships in observed data, guided by goal inference and reasoning using forward-inverse models. We present results from two experiments where our S1 agent learns human communication by observing two humans interacting in a real-time, TV-style interview, using multimodal communicative gesture and situated language to talk about the recycling of various materials and objects. S1 can learn complex multimodal language and multimodal communicative acts, with a vocabulary of 100 words forming natural sentences with relatively complex sentence structure, including manual deictic reference and anaphora. S1 is seeded only with high-level information about the goals of the interviewer and interviewee, and a small ontology; no grammar or other information is provided to S1 a priori. The agent learns the pragmatics, semantics, and syntax of complex spoken utterances and gestures from scratch, by observing the humans compare and contrast the cost and pollution related to recycling aluminum cans, glass bottles, newspaper, plastic, and wood. After 20 hours of observation S1 can perform an unscripted TV interview with a human, in the same style, without making mistakes.
Abstract:
This document is divided into two main parts. The first is a classification of authentication techniques. We will search the main electronic databases for papers related to authentication techniques, then summarize the related papers and show what classifications they use for the authentication techniques. After all of the documents have been read and summarized, we will analyse them and group the authentication techniques into the classifications found. The second part of the document focuses on the study of usability attributes of authentication techniques, in order to learn how authentication techniques compare to one another based on their usability attributes. We will search the main electronic databases for papers related to the usability attributes of authentication techniques, based on the usability definition of ISO/IEC 25010 (SQuaRE) and its attributes. We will then summarize the related papers and show which authentication methods they describe and which usability attributes they measure. After all of the documents have been read and summarized, we will analyse them according to their usability attributes. Finally, we will elaborate on those results to show which authentication techniques have better usability in terms of a specific usability attribute. This will help practitioners who are interested in using authentication methods but want or need to focus on a specific usability attribute; they will be able to use this work as a guide to choose the option that best fits their purpose.
Abstract:
The mobile apps market is a tremendous success, with millions of apps downloaded and used every day by users spread all around the world. For app developers, having their apps published on one of the major app stores (e.g. the Google Play market) is just the beginning of the app's lifecycle. Indeed, in order to successfully compete with the other apps in the market, an app has to be updated frequently by adding new attractive features and by fixing existing bugs. Clearly, any developer interested in increasing the success of her app should try to implement features desired by the app's users and to fix bugs affecting the user experience of many of them. A precious source of information for collecting users' opinions and wishes is the reviews left by users on the store from which they downloaded the app. However, to exploit such information the app's developer would have to manually read each user review and verify whether it contains useful information (e.g. suggestions for new features). This is not doable if the app receives hundreds of reviews per day, as happens for the very popular apps on the market. In this work, our aim is to support mobile app developers by proposing a novel approach exploiting data mining, natural language processing, machine learning, and clustering techniques in order to classify user reviews on the basis of the information they contain (e.g. useless, suggestion for new features, bug report). Such an approach has been empirically evaluated and made available in a web-based tool publicly available to all app developers. The achieved results show that the developed tool: (i) is able to correctly categorise user reviews on the basis of their content (e.g. isolating those reporting bugs) with 78% accuracy, (ii) produces clusters of reviews (e.g. grouping together reviews indicating exactly the same bug to be fixed) that are meaningful from a developer's point of view, and (iii) is considered useful by a software company working in the mobile app development market.
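A pipeline of the general kind described (text features, supervised classification of reviews, clustering within a category) might look like the following scikit-learn sketch; the toy reviews, labels and model choices are assumptions and do not reproduce the tool's actual features, classifier or categories.

```python
# Classify reviews into categories, then cluster the bug reports so that
# duplicates of the same bug end up in the same group.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

train_reviews = ["crashes when I open a photo", "please add dark mode",
                 "love it", "app freezes on startup",
                 "would like offline maps", "nice"]
train_labels = ["bug", "feature", "useless", "bug", "feature", "useless"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(train_reviews, train_labels)

new_reviews = ["it crashes whenever I open a picture",
               "the app freezes every time it starts",
               "add a night theme please"]
labels = classifier.predict(new_reviews)

# Cluster only the reviews classified as bug reports.
bugs = [r for r, l in zip(new_reviews, labels) if l == "bug"]
if len(bugs) >= 2:
    vec = TfidfVectorizer().fit_transform(bugs)
    groups = KMeans(n_clusters=2, n_init=10).fit_predict(vec)
    print(list(zip(bugs, groups)))
```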
Abstract:
Traditionally, virtual environments have been closely linked to fields such as the design of three-dimensional scenes or video games, leaving little room to consider their applications in other areas. However, this tendency can change once it is shown that the applications and advantages of this software can be carried over to teaching and learning. These applications are known as Intelligent Virtual Environments (EVI, for Entornos Virtuales Inteligentes), which use a virtual environment to carry out teaching and tutoring tasks, providing advantages such as the simulation of dangerous environments or personalized tutoring, something we cannot find in most real teaching situations.
This work addresses one of the problems that arises when working with any virtual environment and preparing it for its purpose, especially those aimed at teaching: automatically and intelligently giving its own semantics to each of the objects found in a virtual environment, and storing this information for later querying or for use in other tasks. The main goal of this work is therefore the process of collecting the information considered important about the objects in virtual environments, such as their shape, size or color. These aspects are really important for characterizing the objects and making them unique in a virtual environment where, a priori, all objects look the same to a computer. This task, which may seem trivial at first, is not, and it will serve as the fundamental basis for other existing or future applications to carry out their own tasks. One such task could be the generation of natural language directions to guide users in locating objects in a virtual environment, as in the LORO project, within which this work is framed. Examples of this task range from helping a user find their keys at home to helping a surgeon locate a particular tool in an operating room. To achieve this, it is essential to know the semantics and relevant information of each object in the scene and to clearly differentiate it from the rest. The proposed solution is a complete application integrated into Unity 3D, the most widely supported video game and 3D scene engine, which is linked to ontologies in order to store the information about the objects in each scene. This gives the application a potential for wide adoption, thanks to the development tools mentioned above and to the fact that it is designed for both expert and non-expert users.
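The per-object semantics the application stores (shape, size, colour) can be pictured with a small rdflib sketch; the real tool is integrated into Unity 3D (C#), so this Python fragment, its property names and its namespace are only an assumed approximation of the ontology side.

```python
# Store shape/size/colour facts about scene objects in an RDF graph.
from rdflib import Graph, Literal, Namespace, RDF

SCENE = Namespace("http://example.org/scene#")

objects = [  # invented scene objects
    {"id": "key_01",    "shape": "key",     "size": "small", "color": "silver"},
    {"id": "scalpel_3", "shape": "scalpel", "size": "small", "color": "steel"},
    {"id": "table_1",   "shape": "table",   "size": "large", "color": "white"},
]

g = Graph()
g.bind("scene", SCENE)
for obj in objects:
    node = SCENE[obj["id"]]
    g.add((node, RDF.type, SCENE.SceneObject))
    g.add((node, SCENE.hasShape, Literal(obj["shape"])))
    g.add((node, SCENE.hasSize, Literal(obj["size"])))
    g.add((node, SCENE.hasColor, Literal(obj["color"])))

print(g.serialize(format="turtle"))  # the semantics that a guidance task could query
```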
Abstract:
In recent years, new fields of information technology have emerged that explore the processing of the huge amount of existing digital data and its transformation into explicit knowledge. Natural Language Processing (NLP) techniques are able to extract information from digital texts presented in narrative form. In addition, machine learning techniques classify instances, or examples, into different categories according to their attributes, learning from others previously classified. Clinical texts are a great source of unstructured information and, consequently, of information that is not fully exploited. Some terms used in clinical texts appear in an affirmed, negated, hypothetical or historical context. Detecting this context is necessary for structuring the information, but it is also highly complex. By extracting linguistic features from the elements, or tokens, of the texts using NLP, transforming these tokens into instances and the features into attributes, we can use machine learning techniques to classify them in order to detect whether they are affirmed, negated, hypothetical or historical. The selection of the attributes each token must have for its classification, as well as the choice of the machine learning algorithm, are crucial for the classification; in fact, these elements make up the classification model. Consequently, this work addresses the feature extraction, attribute selection and machine learning algorithm selection process for the detection of negation in clinical texts in Spanish.
We present a classification model which, using the J48 algorithm and 35 attributes derived from linguistic features (morphological and syntactic) and negation triggers, detects whether a token is negated in 465 sentences from clinical texts, with an F-score of 73%, a recall of 66% and a precision of 81% under 10-fold cross-validation.
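As an approximation of the experimental set-up, the sketch below runs 10-fold cross-validation of a decision tree (scikit-learn's DecisionTreeClassifier standing in for Weka's J48/C4.5) on placeholder token features; the feature columns and labels are synthetic and do not correspond to the 35 attributes or the corpus of the thesis.

```python
# Token-level negation classification with a decision tree and 10-fold CV.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_tokens = 500
# Placeholder feature matrix: e.g. POS id, distance to the nearest negation
# trigger, in-trigger-scope flag, ... (invented columns).
X = rng.integers(0, 5, size=(n_tokens, 5))
# Placeholder labels derived (noisily) from one feature: 1 = negated token.
noise = rng.integers(0, 2, n_tokens)
y = ((X[:, 1] + noise) <= 1).astype(int)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)  # C4.5-like tree
scores = cross_val_score(clf, X, y, cv=10, scoring="f1")           # 10-fold F-score
print("mean F-score:", scores.mean())
```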
Abstract:
The aim of this work is the study, design and implementation of a software tool, with a graphical user interface, that makes it easy to apply various text analysis techniques. The analysis techniques implemented in the tool extract information from texts written in a human (i.e. non-artificial) language and present it to the user. The tool extracts three types of information: the categories a text belongs to, within a set of predefined categories; groups of texts that are similar to each other; and the polarity of the opinion expressed in a text towards its topic or object, which may be neutral, positive or negative.
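Of the three capabilities, opinion polarity lends itself to a very small illustration; the sketch below uses a tiny hand-made polarity lexicon, which is an assumption and not the technique actually implemented in the tool.

```python
# Lexicon-based polarity: count positive and negative words and compare.
POSITIVE = {"good", "excellent", "useful", "easy", "great"}
NEGATIVE = {"bad", "poor", "useless", "difficult", "slow"}

def polarity(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("The interface is easy to use and the results are excellent"))  # positive
print(polarity("Slow and difficult to configure"))                             # negative
```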
Abstract:
This paper describes the development of an Advanced Speech Communication System for Deaf People and its field evaluation in a real application domain: the renewal of Driver’s License. The system is composed of two modules. The first one is a Spanish into Spanish Sign Language (LSE: Lengua de Signos Española) translation module made up of a speech recognizer, a natural language translator (for converting a word sequence into a sequence of signs), and a 3D avatar animation module (for playing back the signs). The second module is a Spoken Spanish generator from sign-writing composed of a visual interface (for specifying a sequence of signs), a language translator (for generating the sequence of words in Spanish), and finally, a text to speech converter. For language translation, the system integrates three technologies: an example-based strategy, a rule-based translation method and a statistical translator. This paper also includes a detailed description of the evaluation carried out in the Local Traffic Office in the city of Toledo (Spain) involving real government employees and deaf people. This evaluation includes objective measurements from the system and subjective information from questionnaires. Finally, the paper reports an analysis of the main problems and a discussion about possible solutions.
Abstract:
This article presents a multi-agent expert system (SMAF) that allows the input of incidents occurring in different elements of the telecommunications area. SMAF interacts with experts and general users, and each agent interacts with the whole agent community, recording the incidents and their solutions in a knowledge base without analyzing their causes. The incidents are expressed using keywords taken from natural language (originally Spanish), and their main concepts are recorded together with the severities expressed by the users. The system then searches for the best solution for each incident, with the help of a human operator, using distance notions between incidents.