894 resultados para Query expansion, Text mining, Information retrieval, Chinese IR


Relevância:

100.00% 100.00%

Publicador:

Resumo:

La Gestión de Recursos Humanos a través de Internet es un problema latente y presente actualmente en cualquier sitio web dedicado a la búsqueda de empleo. Este problema también está presente en AFRICA BUILD Portal. AFRICA BUILD Portal es una emergente red socio-profesional nacida con el ánimo de crear comunidades virtuales que fomenten la educación e investigación en el área de la salud en países africanos. Uno de los métodos para fomentar la educación e investigación es mediante la movilidad de estudiantes e investigadores entre instituciones, apareciendo así, el citado problema de la gestión de recursos humanos. Por tanto, este trabajo se centra en solventar el problema de la gestión de recursos humanos en el entorno específico de AFRICA BUILD Portal. Para solventar este problema, el objetivo es desarrollar un sistema de recomendación que ayude en la gestión de recursos humanos en lo que concierne a la selección de las mejores ofertas y demandas de movilidad. Caracterizando al sistema de recomendación como un sistema semántico el cual ofrecerá las recomendaciones basándose en las reglas y restricciones impuestas por el dominio. La aproximación propuesta se basa en seguir el enfoque de los sistemas de Matchmaking semánticos. Siguiendo este enfoque, por un lado, se ha empleado un razonador de lógica descriptiva que ofrece inferencias útiles en el cálculo de las recomendaciones y por otro lado, herramientas de procesamiento de lenguaje natural para dar soporte al proceso de recomendación. Finalmente para la integración del sistema de recomendación con AFRICA BUILD Portal se han empleado diversas tecnologías web. Los resultados del sistema basados en la comparación de recomendaciones creadas por el sistema y por usuarios reales han mostrado un funcionamiento y rendimiento aceptable. Empleando medidas de evaluación de sistemas de recuperación de información se ha obtenido una precisión media del sistema de un 52%, cifra satisfactoria tratándose de un sistema semántico. Pudiendo concluir que con la solución implementada se ha construido un sistema estable y modular posibilitando: por un lado, una fácil evolución que debería ir encaminada a lograr un rendimiento mayor, incrementando su precisión y por otro lado, dejando abiertas nuevas vías de crecimiento orientadas a la explotación del potencial de AFRICA BUILD Portal mediante la Web 3.0. ---ABSTRACT---The Human Resource Management through Internet is currently a latent problem shown in any employment website. This problem has also appeared in AFRICA BUILD Portal. AFRICA BUILD Portal is an emerging socio-professional network with the objective of creating virtual communities to foster the capacity for health research and education in African countries. One way to foster this capacity of research and education is through the mobility of students and researches between institutions, thus appearing the Human Resource Management problem. Therefore, this dissertation focuses on solving the Human Resource Management problem in the specific environment of AFRICA BUILD Portal. To solve this problem, the objective is to develop a recommender system which assists the management of Human Resources with respect to the selection of the best mobility supplies and demands. The recommender system is a semantic system which will provide the recommendations according to the domain rules and restrictions. The proposed approach is based on semantic matchmaking solutions. So, this approach on the one hand uses a Description Logics reasoning engine which provides useful inferences to the recommendation process and on the other hand uses Natural Language Processing techniques to support the recommendation process. Finally, Web technologies are used in order to integrate the recommendation system into AFRICA BUILD Portal. The results of evaluating the system are based on the comparison between recommendations created by the system and by real users. These results have shown an acceptable behavior and performance. The average precision of the system has been obtained by evaluation measures for information retrieval systems, so the average precision of the system is at 52% which may be considered as a satisfactory result taking into account that the system is a semantic system. To conclude, it could be stated that the implemented system is stable and modular. This fact on the one hand allows an easy evolution that should aim to achieve a higher performance by increasing its average precision and on the other hand keeps open new ways to increase the functionality of the system oriented to exploit the potential of AFRICA BUILD Portal through Web 3.0.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Durante los últimos años ha aumentado la presencia de personas pertenecientes al mundo de la política en la red debido a la proliferación de las redes sociales, siendo Twitter la que mayor repercusión mediática tiene en este ámbito. El estudio del comportamiento de los políticos en Twitter y de la acogida que tienen entre los ciudadanos proporciona información muy valiosa a la hora de analizar las campañas electorales. De esta forma, se puede estudiar la repercusión real que tienen sus mensajes en los resultados electorales, así como distinguir aquellos comportamientos que tienen una mayor aceptación por parte de la la ciudadaná. Gracias a los avances desarrollados en el campo de la minería de textos, se poseen las herramientas necesarias para analizar un gran volumen de textos y extraer de ellos información de utilidad. Este proyecto tiene como finalidad recopilar una muestra significativa de mensajes de Twitter pertenecientes a los candidatos de los principales partidos políticos que se presentan a las elecciones autonómicas de Madrid en 2015. Estos mensajes, junto con las respuestas de otros usuarios, se han analizado usando algoritmos de aprendizaje automático y aplicando las técnicas de minería de textos más oportunas. Los resultados obtenidos para cada político se han examinado en profundidad y se han presentado mediante tablas y gráficas para facilitar su comprensión.---ABSTRACT---During the past few years the presence on the Internet of people related with politics has increased, due to the proliferation of social networks. Among all existing social networks, Twitter is the one which has the greatest media impact in this field. Therefore, an analysis of the behaviour of politicians in this social network, along with the response from the citizens, gives us very valuable information when analysing electoral campaigns. This way it is possible to know their messages impact in the election results. Moreover, it can be inferred which behaviours have better acceptance among the citizenship. Thanks to the advances achieved in the text mining field, its tools can be used to analyse a great amount of texts and extract from them useful information. The present project aims to collect a significant sample of Twitter messages from the candidates of the principal political parties for the 2015 autonomic elections in Madrid. These messages, as well as the answers received by the other users, have been analysed using machine learning algorithms and applying the most suitable data mining techniques. The results obtained for each politician have been examined in depth and have been presented using tables and graphs to make its understanding easier.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La tesis que se presenta tiene como propósito la construcción automática de ontologías a partir de textos, enmarcándose en el área denominada Ontology Learning. Esta disciplina tiene como objetivo automatizar la elaboración de modelos de dominio a partir de fuentes información estructurada o no estructurada, y tuvo su origen con el comienzo del milenio, a raíz del crecimiento exponencial del volumen de información accesible en Internet. Debido a que la mayoría de información se presenta en la web en forma de texto, el aprendizaje automático de ontologías se ha centrado en el análisis de este tipo de fuente, nutriéndose a lo largo de los años de técnicas muy diversas provenientes de áreas como la Recuperación de Información, Extracción de Información, Sumarización y, en general, de áreas relacionadas con el procesamiento del lenguaje natural. La principal contribución de esta tesis consiste en que, a diferencia de la mayoría de las técnicas actuales, el método que se propone no analiza la estructura sintáctica superficial del lenguaje, sino que estudia su nivel semántico profundo. Su objetivo, por tanto, es tratar de deducir el modelo del dominio a partir de la forma con la que se articulan los significados de las oraciones en lenguaje natural. Debido a que el nivel semántico profundo es independiente de la lengua, el método permitirá operar en escenarios multilingües, en los que es necesario combinar información proveniente de textos en diferentes idiomas. Para acceder a este nivel del lenguaje, el método utiliza el modelo de las interlinguas. Estos formalismos, provenientes del área de la traducción automática, permiten representar el significado de las oraciones de forma independiente de la lengua. Se utilizará en concreto UNL (Universal Networking Language), considerado como la única interlingua de propósito general que está normalizada. La aproximación utilizada en esta tesis supone la continuación de trabajos previos realizados tanto por su autor como por el equipo de investigación del que forma parte, en los que se estudió cómo utilizar el modelo de las interlinguas en las áreas de extracción y recuperación de información multilingüe. Básicamente, el procedimiento definido en el método trata de identificar, en la representación UNL de los textos, ciertas regularidades que permiten deducir las piezas de la ontología del dominio. Debido a que UNL es un formalismo basado en redes semánticas, estas regularidades se presentan en forma de grafos, generalizándose en estructuras denominadas patrones lingüísticos. Por otra parte, UNL aún conserva ciertos mecanismos de cohesión del discurso procedentes de los lenguajes naturales, como el fenómeno de la anáfora. Con el fin de aumentar la efectividad en la comprensión de las expresiones, el método provee, como otra contribución relevante, la definición de un algoritmo para la resolución de la anáfora pronominal circunscrita al modelo de la interlingua, limitada al caso de pronombres personales de tercera persona cuando su antecedente es un nombre propio. El método propuesto se sustenta en la definición de un marco formal, que ha debido elaborarse adaptando ciertas definiciones provenientes de la teoría de grafos e incorporando otras nuevas, con el objetivo de ubicar las nociones de expresión UNL, patrón lingüístico y las operaciones de encaje de patrones, que son la base de los procesos del método. Tanto el marco formal como todos los procesos que define el método se han implementado con el fin de realizar la experimentación, aplicándose sobre un artículo de la colección EOLSS “Encyclopedia of Life Support Systems” de la UNESCO. ABSTRACT The purpose of this thesis is the automatic construction of ontologies from texts. This thesis is set within the area of Ontology Learning. This discipline aims to automatize domain models from structured or unstructured information sources, and had its origin with the beginning of the millennium, as a result of the exponential growth in the volume of information accessible on the Internet. Since most information is presented on the web in the form of text, the automatic ontology learning is focused on the analysis of this type of source, nourished over the years by very different techniques from areas such as Information Retrieval, Information Extraction, Summarization and, in general, by areas related to natural language processing. The main contribution of this thesis consists of, in contrast with the majority of current techniques, the fact that the method proposed does not analyze the syntactic surface structure of the language, but explores his deep semantic level. Its objective, therefore, is trying to infer the domain model from the way the meanings of the sentences are articulated in natural language. Since the deep semantic level does not depend on the language, the method will allow to operate in multilingual scenarios, where it is necessary to combine information from texts in different languages. To access to this level of the language, the method uses the interlingua model. These formalisms, coming from the area of machine translation, allow to represent the meaning of the sentences independently of the language. In this particular case, UNL (Universal Networking Language) will be used, which considered to be the only interlingua of general purpose that is standardized. The approach used in this thesis corresponds to the continuation of previous works carried out both by the author of this thesis and by the research group of which he is part, in which it is studied how to use the interlingua model in the areas of multilingual information extraction and retrieval. Basically, the procedure defined in the method tries to identify certain regularities at the UNL representation of texts that allow the deduction of the parts of the ontology of the domain. Since UNL is a formalism based on semantic networks, these regularities are presented in the form of graphs, generalizing in structures called linguistic patterns. On the other hand, UNL still preserves certain mechanisms of discourse cohesion from natural languages, such as the phenomenon of the anaphora. In order to increase the effectiveness in the understanding of expressions, the method provides, as another significant contribution, the definition of an algorithm for the resolution of pronominal anaphora limited to the model of the interlingua, in the case of third person personal pronouns when its antecedent is a proper noun. The proposed method is based on the definition of a formal framework, adapting some definitions from Graph Theory and incorporating new ones, in order to locate the notions of UNL expression and linguistic pattern, as well as the operations of pattern matching, which are the basis of the method processes. Both the formal framework and all the processes that define the method have been implemented in order to carry out the experimentation, applying on an article of the "Encyclopedia of Life Support Systems" of the UNESCO-EOLSS collection.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using an event-related functional MRI design, we explored the relative roles of dorsal and ventral prefrontal cortex (PFC) regions during specific components (Encoding, Delay, Response) of a working memory task under different memory-load conditions. In a group analysis, effects of increased memory load were observed only in dorsal PFC in the encoding period. Activity was lateralized to the right hemisphere in the high but not the low memory-load condition. Individual analyses revealed variability in activation patterns across subjects. Regression analyses indicated that one source of variability was subjects’ memory retrieval rate. It was observed that dorsal PFC plays a differentially greater role in information retrieval for slower subjects, possibly because of inefficient retrieval processes or a reduced quality of mnemonic representations. This study supports the idea that dorsal and ventral PFC play different roles in component processes of working memory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Disruption of guanylyl cyclase-A (GC-A) results in mice displaying an elevated blood pressure, which is not altered by high or low dietary salt. However, atrial natriuretic peptide (ANP), a proposed ligand for GC-A, has been suggested as critical for the maintenance of normal blood pressure during high salt intake. In this report, we show that infusion of ANP results in substantial natriuresis and diuresis in wild-type mice but fails to cause significant changes in sodium excretion or urine output in GC-A-deficient mice. ANP, therefore, appears to signal through GC-A in the kidney. Other natriuretic/diuretic factors could be released from the heart. Therefore, acute volume expansion was used as a means to cause release of granules from the atrium of the heart. That granule release occurred was confirmed by measurements of plasma ANP concentrations, which were markedly elevated in both wild-type and GC-A-null mice. After volume expansion, urine output as well as urinary sodium and cyclic GMP excretion increased rapidly and markedly in wild-type mice, but the rapid increases were abolished in GC-A-deficient animals. These results strongly suggest that natriuretic/diuretic factors released from the heart function exclusively through GC-A.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, the new features that IR-n system applies on the topic processing for CL-SR are described. This set of features are based on applying logic forms to topics with the aim of incrementing the weight of topic terms according to a set of syntactic rules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a CL-SR system that employs two different techniques: the first one is based on NLP rules that consist on applying logic forms to the topic processing while the second one basically consists on applying the IR-n statistical search engine to the spoken document collection. The application of logic forms to the topics allows to increase the weight of topic terms according to a set of syntactic rules. Thus, the weights of the topic terms are used by IR-n system in the information retrieval process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El foco geográfico de un documento identifica el lugar o lugares en los que se centra el contenido del texto. En este trabajo se presenta una aproximación basada en corpus para la detección del foco geográfico en el texto. Frente a otras aproximaciones que se centran en el uso de información puramente geográfica para la detección del foco, nuestra propuesta emplea toda la información textual existente en los documentos del corpus de trabajo, partiendo de la hipótesis de que la aparición de determinados personajes, eventos, fechas e incluso términos comunes, pueden resultar fundamentales para esta tarea. Para validar nuestra hipótesis, se ha realizado un estudio sobre un corpus de noticias geolocalizadas que tuvieron lugar entre los años 2008 y 2011. Esta distribución temporal nos ha permitido, además, analizar la evolución del rendimiento del clasificador y de los términos más representativos de diferentes localidades a lo largo del tiempo.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article analyzes the appropriateness of a text summarization system, COMPENDIUM, for generating abstracts of biomedical papers. Two approaches are suggested: an extractive (COMPENDIUM E), which only selects and extracts the most relevant sentences of the documents, and an abstractive-oriented one (COMPENDIUM E–A), thus facing also the challenge of abstractive summarization. This novel strategy combines extractive information, with some pieces of information of the article that have been previously compressed or fused. Specifically, in this article, we want to study: i) whether COMPENDIUM produces good summaries in the biomedical domain; ii) which summarization approach is more suitable; and iii) the opinion of real users towards automatic summaries. Therefore, two types of evaluation were performed: quantitative and qualitative, for evaluating both the information contained in the summaries, as well as the user satisfaction. Results show that extractive and abstractive-oriented summaries perform similarly as far as the information they contain, so both approaches are able to keep the relevant information of the source documents, but the latter is more appropriate from a human perspective, when a user satisfaction assessment is carried out. This also confirms the suitability of our suggested approach for generating summaries following an abstractive-oriented paradigm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La gran cantidad de información disponible en Internet está dificultando cada vez más que los usuarios puedan digerir toda esa información, siendo actualmente casi impensable sin la ayuda de herramientas basadas en las Tecnologías del Lenguaje Humano (TLH), como pueden ser los recuperadores de información o resumidores automáticos. El interés de este proyecto emergente (y por tanto, su objetivo principal) viene motivado precisamente por la necesidad de definir y crear un marco tecnológico basado en TLH, capaz de procesar y anotar semánticamente la información, así como permitir la generación de información de forma automática, flexibilizando el tipo de información a presentar y adaptándola a las necesidades de los usuarios. En este artículo se proporciona una visión general de este proyecto, centrándonos en la arquitectura propuesta y el estado actual del mismo.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This qualitative study focuses on what contributes to making a music information-seeking experience satisfying in the context of everyday life. Data were collected through in-depth interviews conducted with 15 younger adults (18 to 29 years old). The analysis revealed that satisfaction could depend on both hedonic (i.e., experiencing pleasure) and utilitarian outcomes. It was found that two types of utilitarian outcomes contributed to satisfaction: (1) the acquisition of music, and (2) the acquisition of information about music. Information about music was gathered to (1) enrich the listening experience, (2) increase one's music knowledge, and/or (3) optimize future acquisition. This study contributes to a better understanding of music information-seeking behavior in recreational contexts. It also has implications for music information retrieval systems design: results suggest that these systems should be engaging, include a wealth of extra-musical information, allow users to navigate among music items, and encourage serendipitous encountering of music.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this paper is to evaluate the efficacy of the application WebBootCaT to create specialised corpora automatically, investigating the translation of articles of association from Italian into English. The first section reflects on the relevant literature and proposes the utility of corpora for translators. The second section discusses the methodology employed, and the third section analyses the results obtained and comments on how language professionals could possibly exploit the application to its full. The fourth section provides a few concrete usage examples of the thus built corpora, to then conclude that WebBootCaT is a genuinely powerful tool that could be implemented by professional translators in order to save time and improve their translations in the long term.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the article relevance of system development for subject search using computational linguistics is considered. The basic principles of system functioning are defined. The principle of grammar development for information retrieval from the partially structured text in a natural language is considered. The ranging principle of results of information search is defined.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This qualitative study focuses on what contributes to making a music information-seeking experience satisfying in the context of everyday life. Data were collected through in-depth interviews conducted with 15 younger adults (18 to 29 years old). The analysis revealed that satisfaction could depend on both hedonic (i.e., experiencing pleasure) and utilitarian outcomes. It was found that two types of utilitarian outcomes contributed to satisfaction: (1) the acquisition of music, and (2) the acquisition of information about music. Information about music was gathered to (1) enrich the listening experience, (2) increase one's music knowledge, and/or (3) optimize future acquisition. This study contributes to a better understanding of music information-seeking behavior in recreational contexts. It also has implications for music information retrieval systems design: results suggest that these systems should be engaging, include a wealth of extra-musical information, allow users to navigate among music items, and encourage serendipitous encountering of music.