154 resultados para Procesamiento en lenguaje natural
Resumo:
En este artículo se presenta un método para recomendar artículos científicos teniendo en cuenta su grado de generalidad o especificidad. Este enfoque se basa en la idea de que personas menos expertas en un tema preferirían leer artículos más generales para introducirse en el mismo, mientras que personas más expertas preferirían artículos más específicos. Frente a otras técnicas de recomendación que se centran en el análisis de perfiles de usuario, nuestra propuesta se basa puramente en el análisis del contenido. Presentamos dos aproximaciones para recomendar artículos basados en el modelado de tópicos (Topic Modelling). El primero de ellos se basa en la divergencia de tópicos que se dan en los documentos, mientras que el segundo se basa en la similitud que se dan entre estos tópicos. Con ambas medidas se consiguió determinar lo general o específico de un artículo para su recomendación, superando en ambos casos a un sistema de recuperación de información tradicional.
Resumo:
Today's generation of Internet devices has changed how users are interacting with media, from passive and unidirectional users to proactive and interactive. Users can use these devices to comment or rate a TV show and search for related information regarding characters, facts or personalities. This phenomenon is known as second screen. This paper describes SAM, an EU-funded research project that focuses on developing an advanced digital media delivery platform based on second screen interaction and content syndication within a social media context, providing open and standardised ways of characterising, discovering and syndicating digital assets. This work provides an overview of the project and its main objectives, focusing on the NLP challenges to be faced and the technologies developed so far.
Resumo:
ElectionMap es una aplicación web que realiza un seguimiento a los comentarios publicados en Twitter en relación a entidades que refieren a partidos políticos. Las opiniones de los usuarios sobre estas entidades son clasificadas según su valoración y posteriormente representadas en un mapa geográfico para conocer la aceptación social sobre agrupaciones políticas en las distintas regiones de la geografía española.
Resumo:
En los países democráticos, conocer la intención de voto de los ciudadanos y las valoraciones de los principales partidos y líderes políticos es de gran interés tanto para los propios partidos como para los medios de comunicación y el público en general. Para ello se han utilizado tradicionalmente costosas encuestas personales. El auge de las redes sociales, principalmente Twitter, permite pensar en ellas como una alternativa barata a las encuestas. En este trabajo, revisamos la bibliografía científica más relevante en este ámbito, poniendo especial énfasis en el caso español.
Resumo:
Actualmente existe una gran cantidad de empresas ofreciendo servicios para el análisis de contenido y minería de datos de las redes sociales con el objetivo de realizar análisis de opiniones y gestión de la reputación. Un alto porcentaje de pequeñas y medianas empresas (pymes) ofrecen soluciones específicas a un sector o dominio industrial. Sin embargo, la adquisición de la necesaria tecnología básica para ofrecer tales servicios es demasiado compleja y constituye un sobrecoste demasiado alto para sus limitados recursos. El objetivo del proyecto europeo OpeNER es la reutilización y desarrollo de componentes y recursos para el procesamiento lingüístico que proporcione la tecnología necesaria para su uso industrial y/o académico.
Resumo:
This paper outlines the approach adopted by the PLSI research group at University of Alicante in the PASCAL-2006 second Recognising Textual Entailment challenge. Our system is composed of several components. On the one hand, the first component performs the derivation of the logic forms of the text/hypothesis pairs and, on the other hand, the second component provides us with a similarity score given by the semantic relations between the derived logic forms. In order to obtain this score we apply several measures of similitude and relatedness based on the structure and content of WordNet.
Resumo:
The present is marked by the availability of large volumes of heterogeneous data, whose management is extremely complex. While the treatment of factual data has been widely studied, the processing of subjective information still poses important challenges. This is especially true in tasks that combine Opinion Analysis with other challenges, such as the ones related to Question Answering. In this paper, we describe the different approaches we employed in the NTCIR 8 MOAT monolingual English (opinionatedness, relevance, answerness and polarity) and cross-lingual English-Chinese tasks, implemented in our OpAL system. The results obtained when using different settings of the system, as well as the error analysis performed after the competition, offered us some clear insights on the best combination of techniques, that balance between precision and recall. Contrary to our initial intuitions, we have also seen that the inclusion of specialized Natural Language Processing tools dealing with Temporality or Anaphora Resolution lowers the system performance, while the use of topic detection techniques using faceted search with Wikipedia and Latent Semantic Analysis leads to satisfactory system performance, both for the monolingual setting, as well as in a multilingual one.
Resumo:
This paper shows a system about the recognition of temporal expressions in Spanish and the resolution of their temporal reference. For the identification and recognition of temporal expressions we have based on a temporal expression grammar and for the resolution on an inference engine, where we have the information necessary to do the date operation based on the recognized expressions. For further information treatment, the output is proposed by means of XML tags in order to add standard information of the resolution obtained. Different kinds of annotation of temporal expressions are explained in another articles [WILSON2001][KATZ2001]. In the evaluation of our proposal we have obtained successful results.
Resumo:
The Internet boom in recent years has increased the interest in the field of plagiarism detection. A lot of documents are published on the Net everyday and anyone can access and plagiarize them. Of course, checking all cases of plagiarism manually is an unfeasible task. Therefore, it is necessary to create new systems that are able to automatically detect cases of plagiarism produced. In this paper, we introduce a new hybrid system for plagiarism detection which combines the advantages of the two main plagiarism detection techniques. This system consists of two analysis phases: the first phase uses an intrinsic detection technique which dismisses much of the text, and the second phase employs an external detection technique to identify the plagiarized text sections. With this combination we achieve a detection system which obtains accurate results and is also faster thanks to the prefiltering of the text.
Resumo:
Preliminary research demonstrated the EmotiBlog annotated corpus relevance as a Machine Learning resource to detect subjective data. In this paper we compare EmotiBlog with the JRC Quotes corpus in order to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out a deep Machine Learning experimentation also with the inclusion of lexical resources. The results obtained show a similarity with the ones obtained with the JRC Quotes corpus demonstrating the EmotiBlog validity as a resource for the SA task.
Resumo:
The development of the Web 2.0 led to the birth of new textual genres such as blogs, reviews or forum entries. The increasing number of such texts and the highly diverse topics they discuss make blogs a rich source for analysis. This paper presents a comparative study on open domain and opinion QA systems. A collection of opinion and mixed fact-opinion questions in English is defined and two Question Answering systems are employed to retrieve the answers to these queries. The first one is generic, while the second is specific for emotions. We comparatively evaluate and analyze the systems’ results, concluding that opinion Question Answering requires the use of specific resources and methods.
Resumo:
The exponential growth of the subjective information in the framework of the Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. They require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog – a fine-grained annotation scheme for subjectivity. We show the manner in which it is built and demonstrate the benefits it brings to the systems using it for training, through the experiments we carried out on opinion mining and emotion detection. We employ corpora of different textual genres –a set of annotated reported speech extracted from news articles, the set of news titles annotated with polarity and emotion from the SemEval 2007 (Task 14) and ISEAR, a corpus of real-life self-expressed emotion. We also show how the model built from the EmotiBlog annotations can be enhanced with external resources. The results demonstrate that EmotiBlog, through its structure and annotation paradigm, offers high quality training data for systems dealing both with opinion mining, as well as emotion detection.
Resumo:
In this paper we present a whole Natural Language Processing (NLP) system for Spanish. The core of this system is the parser, which uses the grammatical formalism Lexical-Functional Grammars (LFG). Another important component of this system is the anaphora resolution module. To solve the anaphora, this module contains a method based on linguistic information (lexical, morphological, syntactic and semantic), structural information (anaphoric accessibility space in which the anaphor obtains the antecedent) and statistical information. This method is based on constraints and preferences and solves pronouns and definite descriptions. Moreover, this system fits dialogue and non-dialogue discourse features. The anaphora resolution module uses several resources, such as a lexical database (Spanish WordNet) to provide semantic information and a POS tagger providing the part of speech for each word and its root to make this resolution process easier.
Resumo:
The treatment of factual data has been widely studied in different areas of Natural Language Processing (NLP). However, processing subjective information still poses important challenges. This paper presents research aimed at assessing techniques that have been suggested as appropriate in the context of subjective - Opinion Question Answering (OQA). We evaluate the performance of an OQA with these new components and propose methods to optimally tackle the issues encountered. We assess the impact of including additional resources and processes with the purpose of improving the system performance on two distinct blog datasets. The improvements obtained for the different combination of tools are statistically significant. We thus conclude that the proposed approach is adequate for the OQA task, offering a good strategy to deal with opinionated questions.
Resumo:
In the last few years, there has been a wide development in the research on textual information systems. The goal is to improve these systems in order to allow an easy localization, treatment and access to the information stored in digital format (Digital Databases, Documental Databases, and so on). There are lots of applications focused on information access (for example, Web-search systems like Google or Altavista). However, these applications have problems when they must access to cross-language information, or when they need to show information in a language different from the one of the query. This paper explores the use of syntactic-sematic patterns as a method to access to multilingual information, and revise, in the case of Information Retrieval, where it is possible and useful to employ patterns when it comes to the multilingual and interactive aspects. On the one hand, the multilingual aspects that are going to be studied are the ones related to the access to documents in different languages from the one of the query, as well as the automatic translation of the document, i.e. a machine translation system based on patterns. On the other hand, this paper is going to go deep into the interactive aspects related to the reformulation of a query based on the syntactic-semantic pattern of the request.