19 resultados para Legislation as topic
em Aston University Research Archive
Resumo:
In Information Filtering (IF) a user may be interested in several topics in parallel. But IF systems have been built on representational models derived from Information Retrieval and Text Categorization, which assume independence between terms. The linearity of these models results in user profiles that can only represent one topic of interest. We present a methodology that takes into account term dependencies to construct a single profile representation for multiple topics, in the form of a hierarchical term network. We also introduce a series of non-linear functions for evaluating documents against the profile. Initial experiments produced positive results.
Resumo:
Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework called joint sentiment-topic (JST) model based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. A reparameterized version of the JST model called Reverse-JST, obtained by reversing the sequence of sentiment and topic generation in the modeling process, is also studied. Although JST is equivalent to Reverse-JST without a hierarchical prior, extensive experiments show that when sentiment priors are added, JST performs consistently better than Reverse-JST. Besides, unlike supervised approaches to sentiment classification which often fail to produce satisfactory performance when shifting to other domains, the weakly supervised nature of JST makes it highly portable to other domains. This is verified by the experimental results on data sets from five different domains where the JST model even outperforms existing semi-supervised approaches in some of the data sets despite using no labeled documents. Moreover, the topics and topic sentiment detected by JST are indeed coherent and informative. We hypothesize that the JST model can readily meet the demand of large-scale sentiment analysis from the web in an open-ended fashion.
Resumo:
Web APIs have gained increasing popularity in recent Web service technology development owing to its simplicity of technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentations on the Web is still a challenging task even with the best resources available on the Web. In this paper we cast the problem of detecting the Web API documentations as a text classification problem of classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA) which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information such as labelled features automatically learned from data that can effectively help improving classification performance. Extensive experiments on our Web APIs documentation dataset shows that the feaLDA model outperforms three strong supervised baselines including naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.
Resumo:
Topic management by non-native speakers (NNSs) during informal conversations has received comparatively little attention from researchers, and receives surprisingly little attention in second language learning and teaching. This article reports on one of the topic management strategies employed by international students during informal, social interactions with native-speaker peers, exploring the process of maintaining topic continuity following temporary suspensions of topics. The concept of side sequences is employed to illustrate the nature of different types of topic suspension, as well as the process of jointly negotiating a return to the topic. Extracts from the conversations show that such sequences were not exclusively occasioned by language difficulties, and that the non-native speaker participants were able to effect successful returns to the main topic of the conversations.
Resumo:
Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify the review sentiment polarity and minimum prior information have also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.
Resumo:
Pharmacy originates from a background of medication compounding and supply. More recently, this role has developed away from an absolute focus on the supply of pharmaceuticals with, for example, the advent of pharmacist prescribing. Nevertheless, for a majority of the profession, medication supply remains a core activity. Regulation of the pharmacy profession is now the responsibility of the General Pharmaceutical Council, although up until 27 September 2010, this role fell to the Royal Pharmaceutical Society of Great Britain (RPSGB). Before this change, in one of the most high-profile legal cases involving a pharmacist in a professional capacity, R. v Lee, a pharmacist was prosecuted firstly for gross negligence manslaughter, later revised to offences under the Medicines Act 1968, for a single error relating to medication supply, and was given a suspended custodial sentence. Offences against sections 64 or 85 of the Medicines Act are absolute offences and there is no due diligence defence. Prosecution of a pharmacist for the supply of incorrect medication may seem a measured course of action to protect the public from the wrongful supply of potent pharmacotherapeutic agents; however, further analysis of Lee indicates that this approach may be counterproductive. An appeal of the original conviction in the Lee case has resulted in a clarification of the interpretation of section 85(5); however currently, prosecutions under section 64 are still a possibility. Owing to the seriousness of a criminal conviction under section 64, this continuation will potentially stifle the profession's ability to learn from dispensing errors. © The Author [2013]. Published by Oxford University Press; all rights reserved.
Resumo:
A large number of studies have been devoted to modeling the contents and interactions between users on Twitter. In this paper, we propose a method inspired from Social Role Theory (SRT), which assumes that a user behaves differently in different roles in the generation process of Twitter content. We consider the two most distinctive social roles on Twitter: originator and propagator, who respectively posts original messages and retweets or forwards the messages from others. In addition, we also consider role-specific social interactions, especially implicit interactions between users who share some common interests. All the above elements are integrated into a novel regularized topic model. We evaluate the proposed method on real Twitter data. The results show that our method is more effective than the existing ones which do not distinguish social roles. Copyright 2013 ACM.
Resumo:
Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media poses however new challenges since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which apply summarisation algorithms to generate topic labels. These algorithms are independent of external sources and only rely on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state of the art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels which capture event-related context compared to the top-n terms returned by LDA. © 2014 Association for Computational Linguistics.
Resumo:
Social media data are produced continuously by a large and uncontrolled number of users. The dynamic nature of such data requires the sentiment and topic analysis model to be also dynamically updated, capturing the most recent language use of sentiments and topics in text. We propose a dynamic Joint Sentiment-Topic model (dJST) which allows the detection and tracking of views of current and recurrent interests and shifts in topic and sentiment. Both topic and sentiment dynamics are captured by assuming that the current sentiment-topic-specific word distributions are generated according to the word distributions at previous epochs. We study three different ways of accounting for such dependency information: (1) Sliding window where the current sentiment-topic word distributions are dependent on the previous sentiment-topic-specific word distributions in the last S epochs; (2) skip model where history sentiment topic word distributions are considered by skipping some epochs in between; and (3) multiscale model where previous long- and shorttimescale distributions are taken into consideration. We derive efficient online inference procedures to sequentially update the model with newly arrived data and show the effectiveness of our proposed model on the Mozilla add-on reviews crawled between 2007 and 2011. © 2013 ACM 2157-6904/2013/12-ART5 $ 15.00.
Resumo:
In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.
Resumo:
This article explores some of the strategies used by international students of English to manage topic shifts in casual conversations with English-speaking peers. It therefore covers aspects of discourse which have been comparatively under-researched, and where research has also tended to focus on the problems rather than the communicative achievements of non-native speakers. A detailed analysis of the conversations under discussion, which were recorded by the participants themselves, showed that they all flowed smoothly, and this was in large measure due to the ways in which topic shifts were managed. The paper will focus on a very distinct type of topic shift, namely that of topic transitions, which enable a smooth flow from one topic to another, but which do not explicitly signal that a shift is taking place. It will examine how the non-native speakers achieved coherence in the topic transitions which they initiated, which strategies or procedures they employed, and show how their initiations were effective in enabling the proposed topic to be understood, taken up and developed. It therefore adds to our understanding of the interactional achievements of international speakers in informal, social contexts. © 2013 Elsevier B.V.
Resumo:
Short text messages a.k.a Microposts (e.g. Tweets) have proven to be an effective channel for revealing information about trends and events, ranging from those related to Disaster (e.g. hurricane Sandy) to those related to Violence (e.g. Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals by allowing such parties to immediately respond. In this work we study the problem of topic classification (TC) of Microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of Microposts however is a challenging task since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to Microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of Microposts with features extracted only from the Microposts content. In contrast our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called category meta-graph. This novel meta-graph provides a more fine grained categorisation of concepts providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of Microposts. Furthermore our goal is also to understand which semantic feature contributes to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of accuracy loss of a topic classifier on new, unseen Microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and Microposts at a conceptual level, considering the enriched representation of these documents. Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches using single KS without linked data and Twitter data only up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches considering content-based similarity measures. © 2014 Elsevier B.V. All rights reserved.
Resumo:
Topic classification (TC) of short text messages offers an effective and fast way to reveal events happening around the world ranging from those related to Disaster (e.g. Sandy hurricane) to those related to Violence (e.g. Egypt revolution). Previous approaches to TC have mostly focused on exploiting individual knowledge sources (KS) (e.g. DBpedia or Freebase) without considering the graph structures that surround concepts present in KSs when detecting the topics of Tweets. In this paper we introduce a novel approach for harnessing such graph structures from multiple linked KSs, by: (i) building a conceptual representation of the KSs, (ii) leveraging contextual information about concepts by exploiting semantic concept graphs, and (iii) providing a principled way for the combination of KSs. Experiments evaluating our TC classifier in the context of Violence detection (VD) and Emergency Responses (ER) show promising results that significantly outperform various baseline models including an approach using a single KS without linked data and an approach using only Tweets. Copyright 2013 ACM.