854 resultados para opinion mining
Resumo:
In this thesis we study the field of opinion mining by giving a comprehensive review of the available research that has been done in this topic. Also using this available knowledge we present a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field of opinion mining by discussing its historical roots, its motivations and applications as well as the different scientific approaches that have been used to solve this challenging problem of mining opinions. To deal with this huge subfield of natural language processing, we first give an abstraction of the problem of opinion mining and describe the theoretical frameworks that are available for dealing with appraisal language. Then we discuss the relation between opinion mining and computational linguistics which is a crucial pre-processing step for the accuracy of the subsequent steps of opinion mining. The second part of our thesis deals with the semantics of opinions where we describe the different ways used to collect lists of opinion words as well as the methods and techniques available for extracting knowledge from opinions present in unstructured textual data. In the part about collecting lists of opinion words we describe manual, semi manual and automatic ways to do so and give a review of the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining we divide the task into three levels that are the document, sentence and feature level. The techniques that are presented in the document and sentence level are divided into supervised and unsupervised approaches that are used to determine the subjectivity and polarity of texts and sentences at these levels of analysis. At the feature level we give a description of the techniques available for finding the opinion targets, the polarity of the opinions about these opinion targets and the opinion holders. Also at the feature level we discuss the various ways to summarize and visualize the results of this level of analysis. In the third part of our thesis we present a case study of a sales management system that uses free form text and that can benefit from an opinion mining system. Using the knowledge gathered in the review of this field we provide a theoretical multi level opinion mining system (MLOM) that can perform most of the tasks needed from an opinion mining system. Based on the previous research we give some hints that many of the laborious market research tasks that are done by the sales force, which uses this sales management system, can improve their insight about their partners and by that increase the quality of their sales services and their overall results.
Resumo:
Il problema relativo alla predizione, la ricerca di pattern predittivi all‘interno dei dati, è stato studiato ampiamente. Molte metodologie robuste ed efficienti sono state sviluppate, procedimenti che si basano sull‘analisi di informazioni numeriche strutturate. Quella testuale, d‘altro canto, è una tipologia di informazione fortemente destrutturata. Quindi, una immediata conclusione, porterebbe a pensare che per l‘analisi predittiva su dati testuali sia necessario sviluppare metodi completamente diversi da quelli ben noti dalle tecniche di data mining. Un problema di predizione può essere risolto utilizzando invece gli stessi metodi : dati testuali e documenti possono essere trasformati in valori numerici, considerando per esempio l‘assenza o la presenza di termini, rendendo di fatto possibile una utilizzazione efficiente delle tecniche già sviluppate. Il text mining abilita la congiunzione di concetti da campi di applicazione estremamente eterogenei. Con l‘immensa quantità di dati testuali presenti, basti pensare, sul World Wide Web, ed in continua crescita a causa dell‘utilizzo pervasivo di smartphones e computers, i campi di applicazione delle analisi di tipo testuale divengono innumerevoli. L‘avvento e la diffusione dei social networks e della pratica di micro blogging abilita le persone alla condivisione di opinioni e stati d‘animo, creando un corpus testuale di dimensioni incalcolabili aggiornato giornalmente. Le nuove tecniche di Sentiment Analysis, o Opinion Mining, si occupano di analizzare lo stato emotivo o la tipologia di opinione espressa all‘interno di un documento testuale. Esse sono discipline attraverso le quali, per esempio, estrarre indicatori dello stato d‘animo di un individuo, oppure di un insieme di individui, creando una rappresentazione dello stato emotivo sociale. L‘andamento dello stato emotivo sociale può condizionare macroscopicamente l‘evolvere di eventi globali? Studi in campo di Economia e Finanza Comportamentale assicurano un legame fra stato emotivo, capacità nel prendere decisioni ed indicatori economici. Grazie alle tecniche disponibili ed alla mole di dati testuali continuamente aggiornati riguardanti lo stato d‘animo di milioni di individui diviene possibile analizzare tali correlazioni. In questo studio viene costruito un sistema per la previsione delle variazioni di indici di borsa, basandosi su dati testuali estratti dalla piattaforma di microblogging Twitter, sotto forma di tweets pubblici; tale sistema include tecniche di miglioramento della previsione basate sullo studio di similarità dei testi, categorizzandone il contributo effettivo alla previsione.
Resumo:
Negli ultimi anni Internet ha cambiato le modalità di creazione e distribuzione delle informazioni turistiche. Un ruolo fondamentale viene ricoperto dalle piattaforme di social media, tecnologie che permettono ai consumatori di condividere le proprie esperienze ed opinioni. Diventa necessario, quindi, comprendere i cambiamenti in queste tecnologie e nel comportamento dei viaggiatori per poter applicare strategie di marketing di successo. In questo studio, utilizzando Opinion Finder, un software spesso impiegato nel campo dell'opinion mining, si esamineranno da un punto di vista qualitativo i post e commenti estratti da alcuni profili degli enti di promozione turistica nazionale in Europa, dividendo l'analisi per fattori che possono influenzare il sentimento degli utenti. Attraverso i risultati ottenuti, si può dimostrare che l'analisi delle opinioni e del sentimento si presenta come un ottimo strumento per evidenziare possibili fenomeni utili per la pianificazione di strategie di marketing per gli enti. Studi futuri potrebbero migliorare la valutazione di questi dati attraverso la creazione di un corpus di apprendimento per il software che contenga testi relativi al mondo del turismo e permettendo ad Opinion Finder di incrementare la validità della classificazione del sentimento, contestualizzando le espressioni in maniera corretta.
Resumo:
Preliminary research demonstrated the EmotiBlog annotated corpus relevance as a Machine Learning resource to detect subjective data. In this paper we compare EmotiBlog with the JRC Quotes corpus in order to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out a deep Machine Learning experimentation also with the inclusion of lexical resources. The results obtained show a similarity with the ones obtained with the JRC Quotes corpus demonstrating the EmotiBlog validity as a resource for the SA task.
Resumo:
Comunicación presentada en las IV Jornadas TIMM, Torres (Jaén), 7-8 abril 2011.
Resumo:
The exponential increase of subjective, user-generated content since the birth of the Social Web, has led to the necessity of developing automatic text processing systems able to extract, process and present relevant knowledge. In this paper, we tackle the Opinion Retrieval, Mining and Summarization task, by proposing a unified framework, composed of three crucial components (information retrieval, opinion mining and text summarization) that allow the retrieval, classification and summarization of subjective information. An extensive analysis is conducted, where different configurations of the framework are suggested and analyzed, in order to determine which is the best one, and under which conditions. The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as the framework as a whole. By achieving an improvement over 10% compared to the state-of-the-art approaches in the context of blogs, we can conclude that subjective text can be efficiently dealt with by means of our proposed framework.
Resumo:
The development of the Web 2.0 led to the birth of new textual genres such as blogs, reviews or forum entries. The increasing number of such texts and the highly diverse topics they discuss make blogs a rich source for analysis. This paper presents a comparative study on open domain and opinion QA systems. A collection of opinion and mixed fact-opinion questions in English is defined and two Question Answering systems are employed to retrieve the answers to these queries. The first one is generic, while the second is specific for emotions. We comparatively evaluate and analyze the systems’ results, concluding that opinion Question Answering requires the use of specific resources and methods.
Resumo:
The huge amount of data available on the Web needs to be organized in order to be accessible to users in real time. This paper presents a method for summarizing subjective texts based on the strength of the opinion expressed in them. We used a corpus of blog posts and their corresponding comments (blog threads) in English, structured around five topics and we divided them according to their polarity and subsequently summarized. Despite the difficulties of real Web data, the results obtained are encouraging; an average of 79% of the summaries is considered to be comprehensible. Our work allows the user to obtain a summary of the most relevant opinions contained in the blog. This allows them to save time and be able to look for information easily, allowing more effective searches on the Web.
Resumo:
This paper presents a preliminary study in which Machine Learning experiments applied to Opinion Mining in blogs have been carried out. We created and annotated a blog corpus in Spanish using EmotiBlog. We evaluated the utility of the features labelled firstly carrying out experiments with combinations of them and secondly using the feature selection techniques, we also deal with several problems, such as the noisy character of the input texts, the small size of the training set, the granularity of the annotation scheme and the language object of our study, Spanish, with less resource than English. We obtained promising results considering that it is a preliminary study.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Subjective language detection is one of the most important challenges in Sentiment Analysis. Because of the weight and frequency in opinionated texts, adjectives are considered a key piece in the opinion extraction process. These subjective units are more and more frequently collected in polarity lexicons in which they appear annotated with their prior polarity. However, at the moment, any polarity lexicon takes into account prior polarity variations across domains. This paper proves that a majority of adjectives change their prior polarity value depending on the domain. We propose a distinction between domain dependent and romain independent adjectives. Moreover, our analysis led us to propose a further classification related to subjectivity degree: constant, mixed and highly subjective adjectives. Following this classification, polarity values will be a better support for Sentiment Analysis.
Resumo:
Automatic creation of polarity lexicons is a crucial issue to be solved in order to reduce time andefforts in the first steps of Sentiment Analysis. In this paper we present a methodology based onlinguistic cues that allows us to automatically discover, extract and label subjective adjectivesthat should be collected in a domain-based polarity lexicon. For this purpose, we designed abootstrapping algorithm that, from a small set of seed polar adjectives, is capable to iterativelyidentify, extract and annotate positive and negative adjectives. Additionally, the methodautomatically creates lists of highly subjective elements that change their prior polarity evenwithin the same domain. The algorithm proposed reached a precision of 97.5% for positiveadjectives and 71.4% for negative ones in the semantic orientation identification task.
Resumo:
Progettazione di un sistema di Social Intelligence e Sentiment Analysis per un'azienda del settore consumer goods