847 resultados para Sentiment Analysis, Opinion Mining, Twitter


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Lexicon-based approaches to Twitter sentiment analysis are gaining much popularity due to their simplicity, domain independence, and relatively good performance. These approaches rely on sentiment lexicons, where a collection of words are marked with fixed sentiment polarities. However, words' sentiment orientation (positive, neural, negative) and/or sentiment strengths could change depending on context and targeted entities. In this paper we present SentiCircle; a novel lexicon-based approach that takes into account the contextual and conceptual semantics of words when calculating their sentiment orientation and strength in Twitter. We evaluate our approach on three Twitter datasets using three different sentiment lexicons. Results show that our approach significantly outperforms two lexicon baselines. Results are competitive but inconclusive when comparing to state-of-art SentiStrength, and vary from one dataset to another. SentiCircle outperforms SentiStrength in accuracy on average, but falls marginally behind in F-measure. © 2014 Springer International Publishing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweets data. A popular procedure to reduce the noise of textual data is to remove stopwords by using pre-compiled stopword lists or more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in the last few years. In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations on the level of data sparsity, the size of the classifier's feature space and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, the dynamic generation of stopword lists, by removing those infrequent terms appearing only once in the corpus, appears to be the optimal method to maintaining a high classification performance while reducing the data sparsity and substantially shrinking the feature space

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tweet sentiment analysis is an important research topic. An accurate and timely analysis report could give good indications on the general public's opinions. After reviewing the current research, we identify the need of effective and efficient methods to conduct tweet sentiment analysis. This paper aims to achieve a high level of performance for classifying tweets with sentiment information. We propose a feasible solution which improves the level of accuracy with good time efficiency. Specifically, we develop a novel feature combination scheme which utilizes the sentiment lexicons and the extracted tweet unigrams of high information gain. We evaluate the performance of six popular machine learning classifiers among which the Naive Bayes Multinomial (NBM) classifier achieves the accuracy rate of 84.60% and takes a few minutes to complete classifying thousands of tweets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Product rating systems are very popular on the web, and users are increasingly depending on the overall product ratings provided by websites to make purchase decisions or to compare various products. Currently most of these systems directly depend on users’ ratings and aggregate the ratings using simple aggregating methods such as mean or median [1]. In fact, many websites also allow users to express their opinions in the form of textual product reviews. In this paper, we propose a new product reputation model that uses opinion mining techniques in order to extract sentiments about product’s features, and then provide a method to generate a more realistic reputation value for every feature of the product and the product itself. We considered the strength of the opinion rather than its orientation only. We do not treat all product features equally when we calculate the overall product reputation, as some features are more important to customers than others, and consequently have more impact on customers buying decisions. Our method provides helpful details about the product features for customers rather than only representing reputation as a number only.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Social media channels, such as Facebook or Twitter, allow for people to express their views and opinions about any public topics. Public sentiment related to future events, such as demonstrations or parades, indicate public attitude and therefore may be applied while trying to estimate the level of disruption and disorder during such events. Consequently, sentiment analysis of social media content may be of interest for different organisations, especially in security and law enforcement sectors. This paper presents a new lexicon-based sentiment analysis algorithm that has been designed with the main focus on real time Twitter content analysis. The algorithm consists of two key components, namely sentiment normalisation and evidence-based combination function, which have been used in order to estimate the intensity of the sentiment rather than positive/negative label and to support the mixed sentiment classification process. Finally, we illustrate a case study examining the relation between negative sentiment of twitter posts related to English Defence League and the level of disorder during the organisation’s related events.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

L'informatica e le sue tecnologie nella società moderna si riassumono spesso in un assioma fuorviante: essa, infatti, è comunemente legata al concetto che ciò che le tecnologie ci offrono può essere accessibile da tutti e sfruttato, all'interno della propria quotidianità, in modi più o meno semplici. Anche se quello appena descritto è un obiettivo fondamentale del mondo high-tech, occorre chiarire subito una questione: l'informatica non è semplicemente tutto ciò che le tecnologie ci offrono, perchè questo pensiero sommario fa presagire ad un'informatica "generalizzante"; l'informatica invece si divide tra molteplici ambiti, toccando diversi mondi inter-disciplinari. L'importanza di queste tecnologie nella società moderna deve spingerci a porre domande, riflessioni sul perchè l'informatica, in tutte le sue sfaccettature, negli ultimi decenni, ha portato una vera e propria rivoluzione nelle nostre vite, nelle nostre abitudini, e non di meno importanza, nel nostro contesto lavorativo e aziendale, e non ha alcuna intenzione (per fortuna) di fermare le proprie possibilità di sviluppo. In questo trattato ci occuperemo di definire una particolare tecnica moderna relativa a una parte di quel mondo complesso che viene definito come "Intelligenza Artificiale". L'intelligenza Artificiale (IA) è una scienza che si è sviluppata proprio con il progresso tecnologico e dei suoi potenti strumenti, che non sono solo informatici, ma soprattutto teorico-matematici (probabilistici) e anche inerenti l'ambito Elettronico-TLC (basti pensare alla Robotica): ecco l'interdisciplinarità. Concetto che è fondamentale per poi affrontare il nocciolo del percorso presentato nel secondo capitolo del documento proposto: i due approcci possibili, semantico e probabilistico, verso l'elaborazione del linguaggio naturale(NLP), branca fondamentale di IA. Per quanto darò un buono spazio nella tesi a come le tecniche di NLP semantiche e statistiche si siano sviluppate nel tempo, verrà prestata attenzione soprattutto ai concetti fondamentali di questi ambiti, perché, come già detto sopra, anche se è fondamentale farsi delle basi e conoscere l'evoluzione di queste tecnologie nel tempo, l'obiettivo è quello a un certo punto di staccarsi e studiare il livello tecnologico moderno inerenti a questo mondo, con uno sguardo anche al domani: in questo caso, la Sentiment Analysis (capitolo 3). Sentiment Analysis (SA) è una tecnica di NLP che si sta definendo proprio ai giorni nostri, tecnica che si è sviluppata soprattutto in relazione all'esplosione del fenomeno Social Network, che viviamo e "tocchiamo" costantemente. L'approfondimento centrale della tesi verterà sulla presentazione di alcuni esempi moderni e modelli di SA che riguardano entrambi gli approcci (statistico e semantico), con particolare attenzione a modelli di SA che sono stati proposti per Twitter in questi ultimi anni, valutando quali sono gli scenari che propone questa tecnica moderna, e a quali conseguenze contestuali (e non) potrebbe portare questa particolare tecnica.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sentiment analysis concerns about automatically identifying sentiment or opinion expressed in a given piece of text. Most prior work either use prior lexical knowledge defined as sentiment polarity of words or view the task as a text classification problem and rely on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework where an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon with preferences on expectations of sentiment labels of those lexicon words being expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatical domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis for this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in last several decades is mainly focused on traditional types of text like emails, news and academic literatures, and several critical issues to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization framework, dominating set based summarization framework and learning-to-rank based summarization framework. The dominating set based summarization framework can be applied for different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guild the new summarization tasks. In addition, we integrate these techneques in an application study of event summarization for sports games as an example of how to better utilize social media data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dealing with the ever-growing information overload in the Internet, Recommender Systems are widely used online to suggest potential customers item they may like or find useful. Collaborative Filtering is the most popular techniques for Recommender Systems which collects opinions from customers in the form of ratings on items, services or service providers. In addition to the customer rating about a service provider, there is also a good number of online customer feedback information available over the Internet as customer reviews, comments, newsgroups post, discussion forums or blogs which is collectively called user generated contents. This information can be used to generate the public reputation of the service providers’. To do this, data mining techniques, specially recently emerged opinion mining could be a useful tool. In this paper we present a state of the art review of Opinion Mining from online customer feedback. We critically evaluate the existing work and expose cutting edge area of interest in opinion mining. We also classify the approaches taken by different researchers into several categories and sub-categories. Each of those steps is analyzed with their strength and limitations in this paper.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Since manually constructing domain-specific sentiment lexicons is extremely time consuming and it may not even be feasible for domains where linguistic expertise is not available. Research on the automatic construction of domain-specific sentiment lexicons has become a hot topic in recent years. The main contribution of this paper is the illustration of a novel semi-supervised learning method which exploits both term-to-term and document-to-term relations hidden in a corpus for the construction of domain specific sentiment lexicons. More specifically, the proposed two-pass pseudo labeling method combines shallow linguistic parsing and corpusbase statistical learning to make domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet these days. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high quality domain-specific sentiment lexicons as directly assessed by human experts. Moreover, the system generated domain-specific sentiment lexicons can improve polarity prediction tasks at the document level by 2:18% when compared to other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Product reviews are the foremost source of information for customers and manufacturers to help them make appropriate purchasing and production decisions. Natural language data is typically very sparse; the most common words are those that do not carry a lot of semantic content, and occurrences of any particular content-bearing word are rare, while co-occurrences of these words are rarer. Mining product aspects, along with corresponding opinions, is essential for Aspect-Based Opinion Mining (ABOM) as a result of the e-commerce revolution. Therefore, the need for automatic mining of reviews has reached a peak. In this work, we deal with ABOM as sequence labelling problem and propose a supervised extraction method to identify product aspects and corresponding opinions. We use Conditional Random Fields (CRFs) to solve the extraction problem and propose a feature function to enhance accuracy. The proposed method is evaluated using two different datasets. We also evaluate the effectiveness of feature function and the optimisation through multiple experiments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present the results of exploratory experiments using lexical valence extracted from brain using electroencephalography (EEG) for sentiment analysis. We selected 78 English words (36 for training and 42 for testing), presented as stimuli to 3 English native speakers. EEG signals were recorded from the subjects while they performed a mental imaging task for each word stimulus. Wavelet decomposition was employed to extract EEG features from the time-frequency domain. The extracted features were used as inputs to a sparse multinomial logistic regression (SMLR) classifier for valence classification, after univariate ANOVA feature selection. After mapping EEG signals to sentiment valences, we exploited the lexical polarity extracted from brain data for the prediction of the valence of 12 sentences taken from the SemEval-2007 shared task, and compared it against existing lexical resources.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The autism spectrum disorder (ASD) is increasingly being recognized as a major public health issue which affects approximately 0.5-0.6% of the population. Promoting the general awareness of the disorder, increasing the engagement with the affected individuals and their carers, and understanding the success of penetration of the current clinical recommendations in the target communities, is crucial in driving research as well as policy. The aim of the present work is to investigate if Twitter, as a highly popular platform for information exchange, can be used as a data-mining source which could aid in the aforementioned challenges. Specifically, using a large data set of harvested tweets, we present a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.