17 resultados para language production, lexical retrieval, semantic interference
em Universidad de Alicante
Resumo:
In this paper we explore the use of semantic classes in an existing information retrieval system in order to improve its results. Thus, we use two different ontologies of semantic classes (WordNet domain and Basic Level Concepts) in order to re-rank the retrieved documents and obtain better recall and precision. Finally, we implement a new method for weighting the expanded terms taking into account the weights of the original query terms and their relations in WordNet with respect to the new ones (which have demonstrated to improve the results). The evaluation of these approaches was carried out in the CLEF Robust-WSD Task, obtaining an improvement of 1.8% in GMAP for the semantic classes approach and 10% in MAP employing the WordNet term weighting approach.
Resumo:
One of the main challenges to be addressed in text summarization concerns the detection of redundant information. This paper presents a detailed analysis of three methods for achieving such goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. Moreover, they are also analyzed for detecting relevance in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared to only the 19% of lexical-based ones. This is also reflected in the quality of the generated summaries, obtaining better summaries when employing syntactic- or semantic-based approaches to remove redundancy.
Resumo:
El proyecto ATTOS centra su actividad en el estudio y desarrollo de técnicas de análisis de opiniones, enfocado a proporcionar toda la información necesaria para que una empresa o una institución pueda tomar decisiones estratégicas en función a la imagen que la sociedad tiene sobre esa empresa, producto o servicio. El objetivo último del proyecto es la interpretación automática de estas opiniones, posibilitando así su posterior explotación. Para ello se estudian parámetros tales como la intensidad de la opinión, ubicación geográfica y perfil de usuario, entre otros factores, para facilitar la toma de decisiones. El objetivo general del proyecto se centra en el estudio, desarrollo y experimentación de técnicas, recursos y sistemas basados en Tecnologías del Lenguaje Humano (TLH), para conformar una plataforma de monitorización de la Web 2.0 que genere información sobre tendencias de opinión relacionadas con un tema.
Resumo:
This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving an error reduction of 59% and 21%, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also discovered that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.
Resumo:
In the last few years, there has been a wide development in the research on textual information systems. The goal is to improve these systems in order to allow an easy localization, treatment and access to the information stored in digital format (Digital Databases, Documental Databases, and so on). There are lots of applications focused on information access (for example, Web-search systems like Google or Altavista). However, these applications have problems when they must access to cross-language information, or when they need to show information in a language different from the one of the query. This paper explores the use of syntactic-sematic patterns as a method to access to multilingual information, and revise, in the case of Information Retrieval, where it is possible and useful to employ patterns when it comes to the multilingual and interactive aspects. On the one hand, the multilingual aspects that are going to be studied are the ones related to the access to documents in different languages from the one of the query, as well as the automatic translation of the document, i.e. a machine translation system based on patterns. On the other hand, this paper is going to go deep into the interactive aspects related to the reformulation of a query based on the syntactic-semantic pattern of the request.
Resumo:
Les investigacions recents sobre la transferència i l'adquisició d’una L3 han indagat sobre les interferències lingüístiques quan hi ha més d'una font de transferència. Els participants en aquest estudi de cas van ser parlants d’espanyol i d’anglès, que apreninen una tercera llengua, el català, tipològicament més pròxima a l’espanyol. Aquest estudi va investigar la producció dels bilingües del català fosc /ɫ/, un segment que no és present en espanyol ja que tots els laterals es produeixen com una clara /l/, i que, tanmateix, es realitza en anglès en posició final després de vocal. Contràriament al model que planteja la proximitat d'idiomes prèviament adquirids com un dels factors determinants per a la transferència de competències, l’estudi va mostrar que la proximitat tipològica a un dels L1 no és determinista per a la transferència en el nivell fonològic en l’aprenentatge d’una L3, ja que els participants produeixen laterals catalanes similars a /ɫ/. En aquest estudi de cas es constata, d’acord amb el Model de Millora Acumulativa, com es transfereix aquest segment fonològic des de l’anglés al català.
Resumo:
In this paper we present the enrichment of the Integration of Semantic Resources based in WordNet (ISR-WN Enriched). This new proposal improves the previous one where several semantic resources such as SUMO, WordNet Domains and WordNet Affects were related, adding other semantic resources such as Semantic Classes and SentiWordNet. Firstly, the paper describes the architecture of this proposal explaining the particularities of each integrated resource. After that, we analyze some problems related to the mappings of different versions and how we solve them. Moreover, we show the advantages that this kind of tool can provide to different applications of Natural Language Processing. Related to that question, we can demonstrate that the integration of semantic resources allows acquiring a multidimensional vision in the analysis of natural language.
Resumo:
In this paper we present a whole Natural Language Processing (NLP) system for Spanish. The core of this system is the parser, which uses the grammatical formalism Lexical-Functional Grammars (LFG). Another important component of this system is the anaphora resolution module. To solve the anaphora, this module contains a method based on linguistic information (lexical, morphological, syntactic and semantic), structural information (anaphoric accessibility space in which the anaphor obtains the antecedent) and statistical information. This method is based on constraints and preferences and solves pronouns and definite descriptions. Moreover, this system fits dialogue and non-dialogue discourse features. The anaphora resolution module uses several resources, such as a lexical database (Spanish WordNet) to provide semantic information and a POS tagger providing the part of speech for each word and its root to make this resolution process easier.
Resumo:
This paper describes the first participation of IR-n system at Spoken Document Retrieval, focusing on the experiments we made before participation and showing the results we obtained. IR-n system is an Information Retrieval system based on passages and the recognition of sentences to define them. So, the main goal of this experiment is to adapt IR-n system to the spoken document structure by means of the utterance splitter and the overlapping passage technique allowing to match utterances and sentences.
Resumo:
In this paper we present an automatic system for the extraction of syntactic semantic patterns applied to the development of multilingual processing tools. In order to achieve optimum methods for the automatic treatment of more than one language, we propose the use of syntactic semantic patterns. These patterns are formed by a verbal head and the main arguments, and they are aligned among languages. In this paper we present an automatic system for the extraction and alignment of syntactic semantic patterns from two manually annotated corpora, and evaluate the main linguistic problems that we must deal with in the alignment process.
Resumo:
The goal of the project is to analyze, experiment, and develop intelligent, interactive and multilingual Text Mining technologies, as a key element of the next generation of search engines, systems with the capacity to find "the need behind the query". This new generation will provide specialized services and interfaces according to the search domain and type of information needed. Moreover, it will integrate textual search (websites) and multimedia search (images, audio, video), it will be able to find and organize information, rather than generating ranked lists of websites.
Resumo:
Este trabajo presenta el uso de una ontología en el dominio financiero para la expansión de consultas con el fin de mejorar los resultados de un sistema de recuperación de información (RI) financiera. Este sistema está compuesto por una ontología y un índice de Lucene que permite recuperación de conceptos identificados mediante procesamiento de lenguaje natural. Se ha llevado a cabo una evaluación con un conjunto limitado de consultas y los resultados indican que la ambigüedad sigue siendo un problema al expandir la consulta. En ocasiones, la elección de las entidades adecuadas a la hora de expandir las consultas (filtrando por sector, empresa, etc.) permite resolver esa ambigüedad.
Resumo:
Automatic Text Summarization has been shown to be useful for Natural Language Processing tasks such as Question Answering or Text Classification and other related fields of computer science such as Information Retrieval. Since Geographical Information Retrieval can be considered as an extension of the Information Retrieval field, the generation of summaries could be integrated into these systems by acting as an intermediate stage, with the purpose of reducing the document length. In this manner, the access time for information searching will be improved, while at the same time relevant documents will be also retrieved. Therefore, in this paper we propose the generation of two types of summaries (generic and geographical) applying several compression rates in order to evaluate their effectiveness in the Geographical Information Retrieval task. The evaluation has been carried out using GeoCLEF as evaluation framework and following an Information Retrieval perspective without considering the geo-reranking phase commonly used in these systems. Although single-document summarization has not performed well in general, the slight improvements obtained for some types of the proposed summaries, particularly for those based on geographical information, made us believe that the integration of Text Summarization with Geographical Information Retrieval may be beneficial, and consequently, the experimental set-up developed in this research work serves as a basis for further investigations in this field.
Resumo:
One of the most important factors of recognition, belonging and identification in scientific communities is their specialized language: doctors, mathematicians and anthropologists feel they are part of a group with which they can interact because they share a common “language”. While ideology is present in all academic registers, it is in human sciences where its presence (or absence) leads to more visible linguistic phenomena. An interesting example is that of lesbian studies: as non-heterosexual members of society have become less stigmatized, lesbian studies have developed a language of their own. In our paper, we shall explore the mechanisms used in the creation of specific vocabulary in this academic area, paying special attention to the refashioning or deconstruction of meaning of established terms as a result of changes in social perception or the challenging of pre-determined meanings.
Resumo:
La gran cantidad de información disponible en Internet está dificultando cada vez más que los usuarios puedan digerir toda esa información, siendo actualmente casi impensable sin la ayuda de herramientas basadas en las Tecnologías del Lenguaje Humano (TLH), como pueden ser los recuperadores de información o resumidores automáticos. El interés de este proyecto emergente (y por tanto, su objetivo principal) viene motivado precisamente por la necesidad de definir y crear un marco tecnológico basado en TLH, capaz de procesar y anotar semánticamente la información, así como permitir la generación de información de forma automática, flexibilizando el tipo de información a presentar y adaptándola a las necesidades de los usuarios. En este artículo se proporciona una visión general de este proyecto, centrándonos en la arquitectura propuesta y el estado actual del mismo.