13 results for Sentiment de cohésion
at Universidad Politécnica de Madrid
Abstract:
In this paper we describe the specification of a model for the semantically interoperable representation of language resources for sentiment analysis. The model integrates "lemon", an RDF-based model for the specification of ontology-lexica (Buitelaar et al. 2009), which is increasingly used for the representation of language resources as Linked Data, with Marl, an RDF-based model for the representation of sentiment annotations (Westerski et al., 2011; Sánchez-Rada et al., 2013).
Abstract:
This paper describes our participation in the SemEval-2014 sentiment analysis task, in both contextual and message polarity classification. Our idea was to compare two different techniques for sentiment analysis: first, a machine learning classifier built specifically for the task using the provided training corpus; second, a lexicon-based approach using natural language processing techniques, developed for a generic sentiment analysis task with no adaptation to the provided training corpus. The results, though far from the best runs, show that the generic model is more robust, as it achieves a more balanced evaluation for message polarity across the different test sets.
Abstract:
This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). The approach aims at aligning, unifying, and expanding the set of sentiment lexicons available on the web in order to increase their robustness of coverage. One problem in automatically unifying the scores of different sentiment lexicons is that, for many lexical entries, the classification as positive, negative, or neutral {P, Z, N} depends on the unit of measurement used in the annotation methodology of the source sentiment lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are on a scale from 1 to -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated. This computation is carried out by the UnifiedMetrics procedure on both CPU and GPU. Another problem is the high processing time required for computing all the lexical entries in the unification task. The USL approach therefore uses parallel processing, computing a subset of lexical entries on each of the 1344 GPU cores, in order to unify 155,802 lexical entries. The results of the analysis show that the USL has 95,430 lexical entries, of which 35,201 are considered positive, 22,029 negative, and 38,200 neutral. Finally, the runtime was 10 minutes for 95,430 lexical entries, a threefold reduction in the computing time of the UnifiedMetrics procedure.
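To make the role of the Pearson coefficient concrete, the following is a minimal Python sketch, not the authors' UnifiedMetrics procedure. It assumes two hypothetical lexicons (`lexicon_a`, `lexicon_b`) that score the same words on different scales, and it computes the coefficient over the entries they share. The helper names `pearson` and `shared_scores` and all sample values are illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def shared_scores(lex_a, lex_b):
    """Return the scores of the entries present in both lexicons, aligned by word."""
    shared = sorted(lex_a.keys() & lex_b.keys())
    return [lex_a[w] for w in shared], [lex_b[w] for w in shared]


# Hypothetical lexicons: same words, different units of measurement.
lexicon_a = {"good": 0.8, "bad": -0.6, "fine": 0.2, "awful": -0.9}
lexicon_b = {"good": 4.0, "bad": -3.0, "fine": 1.0, "awful": -4.5, "okay": 0.5}

scores_a, scores_b = shared_scores(lexicon_a, lexicon_b)
r = pearson(scores_a, scores_b)
print(f"Pearson r over {len(scores_a)} shared entries: {r:.3f}")
```

Because the second lexicon is a rescaled copy of the first on the shared words, the coefficient is close to 1 even though the raw scores use different units; this scale independence is what makes the measure usable across differently annotated lexicons.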
Abstract:
This approach aims at aligning, unifying and expanding the set of sentiment lexicons available on the web in order to increase their robustness of coverage. A sentiment lexicon is a critical and essential resource for tagging subjective corpora on the web or elsewhere. In many situations, the multilingual property of the sentiment lexicon is important because the writer alternates between two languages in the same text, message or post. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are on a scale from 1 to -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated. This computation is carried out by the UnifiedMetrics procedure on both CPU and GPU.
Abstract:
Sentiment analysis has recently gained popularity in the financial domain thanks to its capability to predict the stock market based on the wisdom of the crowds. Nevertheless, current sentiment indicators are still silos that cannot be combined to gain better insight into the mood of different communities. In this article we propose a Linked Data approach for modelling sentiment and emotions about financial entities. The approach aims at integrating sentiment information from different communities or providers, and complements existing initiatives such as FIBO. It has been validated in the semantic annotation of tweets about several stocks in the Spanish stock market, including their sentiment information.
Abstract:
We present a methodology for legacy language resource adaptation that generates domain-specific sentiment lexicons organized around domain entities described with lexical information and sentiment words described in the context of these entities. We explain the steps of the methodology and we give a working example of our initial results. The resulting lexicons are modelled as Linked Data resources by use of established formats for Linguistic Linked Data (lemon, NIF) and for linked sentiment expressions (Marl), thereby contributing and linking to existing Language Resources in the Linguistic Linked Open Data cloud.
Abstract:
Sentiment and Emotion Analysis strongly depend on quality language resources, especially sentiment dictionaries. These resources are usually scattered, heterogeneous and limited to specific domains of application by simple algorithms. The EUROSENTIMENT project addresses these issues by 1) developing a common language resource representation model for sentiment analysis, and APIs for sentiment analysis services based on established Linked Data formats (lemon, Marl, NIF and ONYX), and 2) creating a Language Resource Pool (a.k.a. LRP) that makes existing scattered language resources and services for sentiment analysis available to the community in an interoperable way. In this paper we describe the available language resources and services in the LRP and some sample applications that can be developed on top of the EUROSENTIMENT LRP.
Abstract:
In this paper we present a dataset composed of domain-specific sentiment lexicons in six languages for two domains. We used existing collections of reviews from Trip Advisor, Amazon, the Stanford Network Analysis Project and the OpinRank Review Dataset. We use an RDF model based on the lemon and Marl formats to represent the lexicons. We describe the methodology that we applied to generate the domain-specific lexicons and we provide access information for our datasets.
Abstract:
This thesis is the result of a project whose objective has been to develop and deploy a dashboard for sentiment analysis of football on Twitter based on web components and D3.js. To do so, a visualisation server has been developed to present the data obtained from Twitter and analysed with Senpy. This visualisation server has been developed with Polymer web components and D3.js. Data mining has been done with a pipeline between Twitter, Senpy and ElasticSearch. Luigi has been used in this process because it helps build complex pipelines of batch jobs; with it, all tweets have been analysed and stored in ElasticSearch. D3.js has then been used to create interactive widgets that make data easily accessible; these widgets allow users to interact with them and filter the data most interesting to them. Polymer web components have been used to build the dashboard according to Google's material design and to show dynamic data in widgets. As a result, this project will allow an extensive analysis of the social network, pointing out the influence of players and teams and the emotions and sentiments that emerge over a period of time.
Abstract:
Social networks are highly relevant today: they not only take up a large part of people's daily lives but also serve millions of companies as an advertising channel, among other uses. The social network phenomenon has thus acquired a business dimension. The release of the APIs of some social networks has enabled the development of applications of all kinds and with different goals, such as this project. This project grew out of Ericsson's interest in a study of the Google+ API and in suggestions for adding value for telecommunications companies. It also complements the reference material available at Ericsson and two other projects on information retrieval from social networks, adding a series of options for the user in the application. To this end, we analysed and built an example of what can be obtained from social networks, mainly Twitter and Google+. The project began with a theoretical study of the origin of social networks, their development and their current state, analysing the main existing social networks and giving an overview of all of them. A state-of-the-art review was also carried out of a number of websites dedicated to exploiting this information available on the Internet. Then, among all the social networks with available APIs, Google+ was chosen because it is a new social network still to be explored and improved, and Twitter was chosen for the range of options and data that can be obtained from it. The APIs of both were studied in order to build, with the information obtained, a prototype application offering a set of useful functions based on data from these social networks.
Finally, a simple interface was built through which account data can be accessed as if the user were on Twitter or Google+. In addition, Twitter data can be used to run an advanced search with alerts, perform sentiment analysis, see your top retweets from your followers, and track and compare what is being said about two given topics. The project is intended to provide a general picture of everything related to social networks, the applications available for working with them, the Twitter and Google+ API information, and an idea of what can be obtained from them.
Abstract:
There are several different standardised and widespread formats to represent emotions. However, there is no standard semantic model yet. This paper presents a new ontology, called Onyx, that aims to become such a standard while adding concepts from the latest Semantic Web models. In particular, the ontology focuses on the representation of Emotion Analysis results. The model is nevertheless abstract and inherits from previous standards and formats, so it can be used as a reference representation of emotions in any future application or ontology. To prove this, we have translated resources from the EmotionML representation to Onyx. We also present several ways in which developers could benefit from using this ontology instead of an ad-hoc representation. Our ultimate goal is to foster the use of semantic technologies for Emotion Analysis while following the Linked Data ideals.
Abstract:
Sentiment analysis is an area in which multiple disciplines have contributed different approaches for building models capable of extracting the polarity of the analysed texts. Depending on the domain or category of the text, with Sports or Banking as examples of categories, these models must be modified to obtain a good-quality opinion analysis. This thesis presents a model that aims to perform opinion analysis independently of the category analysed, together with an extensive review of the state of the art in sentiment analysis. A quantitative approach is proposed that uses a seed polarized lexicon as the model's only qualitative resource. The proposed approach uses a corpus of texts annotated by polarity and category, together with the seed polarized lexicon, to produce a model able to perform good-quality opinion analysis across the categories analysed and to expand the seed polarized lexicon with terms suited to the processed categories.
Abstract:
This thesis presents a model, a methodology, an architecture, and several algorithms and programs for creating a unified sentiment lexicon (USL; LSU in the Spanish original) covering four languages: English, Spanish, Portuguese and Chinese. The main goal is to align, unify and expand the set of sentiment lexicons available on the Internet and those developed during this research. The main problem to solve is therefore the automated unification of the different sentiment lexicons obtained by the CSR crawler, because the unit of measurement used to assign the intensity of polarity values (manually, semi-automatically or automatically) varies with the methodology used to build each lexicon. The encoded representation of the data structure of the terms also varies from lexicon to lexicon. Unifying them into a single sentiment lexicon makes it possible to reuse the knowledge gathered by different research groups while increasing the scope, quality and robustness of the lexicons. Our USL methodology computes a unified polarity intensity value for each lexical entry present in at least two of the sentiment lexicons included in this study. In contrast, lexical entries not shared by at least two lexicons keep their original value. The resulting Pearson coefficient measures the correlation between lexical entries on a range from one to minus one, where one indicates that the term values are perfectly correlated, zero indicates no correlation, and minus one means they are inversely correlated. This procedure is carried out by the MetricasUnificadas (UnifiedMetrics) function on both the CPU and the GPU.
Another problem to solve is the processing time required to unify the polarity intensity and thereby achieve greater lemma coverage in existing sentiment lexicons. The USL methodology therefore uses parallel processing to unify the 155,802 terms. The USL algorithm distributes the subset of lexical entries in equal loads across each of the 1344 GPU cores. Our analysis yielded a total of 95,430 lexical entries, of which 35,201 obtained positive values, 22,029 negative and 38,200 neutral. Finally, the execution time was 2.506 seconds for all the lexical entries, which cut the computing time to as little as one third of that of the sequential algorithm. We conclude from these results that a unified sentiment lexicon that homogenises the polarity intensity of lexical units (with positive, negative and neutral values) benefits not only semantic analysis of corpora based on the terms with the strongest polarity, summaries of ratings, and neuromarketing trends, but also applications such as the subjective tagging of websites or of syntactic and semantic portals, to name a few.
ABSTRACT This thesis presents an approach to create what we have called a Unified Sentiment Lexicon (USL). The approach aims at aligning, unifying, and expanding the set of sentiment lexicons available on the web in order to increase their robustness of coverage. One problem in automatically unifying the scores of different sentiment lexicons is that, for many lexical entries, the classification as positive, negative, or neutral (P, N, Z) depends on the unit of measurement used in the annotation methodology of the source sentiment lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are on a scale from 1 to -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated. This computation is carried out by the UnifiedMetrics procedure on both CPU and GPU. Another problem is the high processing time required for computing all the lexical entries in the unification task. The USL approach therefore uses parallel processing, computing a subset of lexical entries on each of the 1344 GPU cores, in order to unify 155,802 lexical entries. The results of the analysis show that the USL has 95,430 lexical entries, of which 35,201 are considered positive, 22,029 negative, and 38,200 neutral. Finally, the runtime was 2.505 seconds for 95,430 lexical entries, a threefold reduction in the computing time of the UnifiedMetrics procedure with respect to the sequential implementation. A key contribution of this work is that we preserve the use of a unified sentiment lexicon for all tasks. Such a lexicon is used to define resources and resource-related properties that can be verified based on the results of the analysis, and it is powerful, general and extensible enough to express a large class of interesting properties. Some applications of this work include merging, aligning, pruning and extending the current sentiment lexicons.
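The equal-load distribution of lexical entries over GPU cores described in this abstract can be illustrated with a short Python sketch. It is only an assumption about how such a split might be computed, not the thesis's GPU code: the function name `equal_load_chunks` is hypothetical, and the figures 155,802 and 1344 are taken from the abstract.

```python
def equal_load_chunks(n_items, n_workers):
    """Split n_items into n_workers contiguous (start, end) ranges
    whose sizes differ by at most one item."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Figures from the abstract: 155,802 lexical entries over 1344 cores.
chunks = equal_load_chunks(155802, 1344)
sizes = [end - start for start, end in chunks]
print(f"{len(chunks)} chunks, sizes between {min(sizes)} and {max(sizes)}")
```

With these figures each core receives either 115 or 116 entries, so no core waits long for another to finish, provided every entry costs about the same to process.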