34 results for Topic segmentation
Abstract:
Web APIs have gained increasing popularity in recent Web service technology development owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentation on the Web is still a challenging task, even with the best resources available on the Web. In this paper we cast the problem of detecting Web API documentation as a text classification problem: classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information, such as labelled features automatically learned from data, that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that the feaLDA model outperforms three strong supervised baselines, including naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.
Abstract:
Topic management by non-native speakers (NNSs) during informal conversations has received comparatively little attention from researchers, and receives surprisingly little attention in second language learning and teaching. This article reports on one of the topic management strategies employed by international students during informal, social interactions with native-speaker peers, exploring the process of maintaining topic continuity following temporary suspensions of topics. The concept of side sequences is employed to illustrate the nature of different types of topic suspension, as well as the process of jointly negotiating a return to the topic. Extracts from the conversations show that such sequences were not exclusively occasioned by language difficulties, and that the non-native speaker participants were able to effect successful returns to the main topic of the conversations.
Abstract:
This paper addresses the problem of automatically obtaining the object/background segmentation of a rigid 3D object observed in a set of images that have been calibrated for camera pose and intrinsics. Such segmentations can be used to obtain a shape representation of a potentially texture-less object by computing a visual hull. We propose an automatic approach where the object to be segmented is identified by the pose of the cameras instead of user input such as 2D bounding rectangles or brush-strokes. The key to our method is a pairwise MRF framework that combines (a) foreground/background appearance models, (b) epipolar constraints and (c) weak stereo correspondence into a single segmentation cost function that can be efficiently solved by Graph-cuts. The segmentation thus obtained is further improved using silhouette coherency and then used to update the foreground/background appearance models, which are fed into the next Graph-cut computation. These two steps are iterated until the segmentation converges. Our method can automatically provide a 3D surface representation even in texture-less scenes where MVS methods might fail. Furthermore, it confers improved performance in images where the object is not readily separable from the background in colour space, an area that previous segmentation approaches have found challenging. © 2011 IEEE.
Abstract:
Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called the joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification, which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify review sentiment polarity, and the use of minimal prior information has also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.
Abstract:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT
Abstract:
A large number of studies have been devoted to modeling the contents and interactions between users on Twitter. In this paper, we propose a method inspired by Social Role Theory (SRT), which assumes that a user behaves differently in different roles in the generation process of Twitter content. We consider the two most distinctive social roles on Twitter: originators and propagators, who respectively post original messages and retweet or forward messages from others. In addition, we also consider role-specific social interactions, especially implicit interactions between users who share some common interests. All the above elements are integrated into a novel regularized topic model. We evaluate the proposed method on real Twitter data. The results show that our method is more effective than existing ones which do not distinguish social roles. Copyright 2013 ACM.
Abstract:
Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media, however, poses new challenges since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here, since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which applies summarisation algorithms to generate topic labels. These algorithms are independent of external sources and rely only on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state-of-the-art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels, which capture event-related context, compared to the top-n terms returned by LDA. © 2014 Association for Computational Linguistics.
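As a rough illustration of labelling a topic by summarisation rather than by its top-n terms, the sketch below scores each candidate text by the average frequency of its words across the topic's documents and returns the highest-scoring text as the label. The function name `label_topic`, the stop-word list, and the example tweets are all hypothetical, and frequency-based scoring is only one simple stand-in for the summarisation algorithms the paper compares.

```python
from collections import Counter
import re

# Tiny illustrative stop-word list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "in", "of", "to", "is", "and", "on", "for", "as"}


def tokenize(text):
    """Lowercase a text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def label_topic(documents):
    """Return the document whose words are, on average, most frequent in the set.

    Assumption: `documents` are the texts most strongly associated with one
    latent topic. No external knowledge source is consulted.
    """
    counts = Counter(w for doc in documents for w in tokenize(doc))

    def score(doc):
        words = tokenize(doc)
        return sum(counts[w] for w in words) / len(words) if words else 0.0

    return max(documents, key=score)


# Hypothetical tweets assigned to one topic.
tweets = [
    "Hurricane Sandy floods lower Manhattan",
    "Flood warnings as hurricane Sandy hits",
    "Power outage in Manhattan after hurricane",
]
label = label_topic(tweets)
```

Here the first tweet is selected because it contains the most dominant terms ("hurricane", "sandy", "manhattan") relative to its length.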
Abstract:
Measurement of lung ventilation is one of the most reliable techniques in diagnosing pulmonary diseases. The time-consuming and bias-prone traditional methods using hyperpolarized H 3He and 1H magnetic resonance imaging have recently been improved by an automated technique based on 'multiple active contour evolution'. This method involves a simultaneous evolution of multiple initial conditions, called 'snakes', eventually leading to their 'merging', and is entirely independent of the shapes and sizes of snakes or other parametric details. The objective of this paper is to show, through a theoretical analysis, that the functional dynamics of merging as depicted in the active contour method has a direct analogue in statistical physics, and that this explains its 'universality'. We show that the multiple active contour method has a universal scaling behaviour akin to that of classical nucleation in two spatial dimensions. We prove our point by comparing the numerically evaluated exponents with an equivalent thermodynamic model. © IOP Publishing Ltd and Deutsche Physikalische Gesellschaft.
Abstract:
Social media data are produced continuously by a large and uncontrolled number of users. The dynamic nature of such data requires the sentiment and topic analysis model to be dynamically updated as well, capturing the most recent language use of sentiments and topics in text. We propose a dynamic Joint Sentiment-Topic model (dJST) which allows the detection and tracking of views of current and recurrent interests and shifts in topic and sentiment. Both topic and sentiment dynamics are captured by assuming that the current sentiment-topic-specific word distributions are generated according to the word distributions at previous epochs. We study three different ways of accounting for such dependency information: (1) a sliding window, where the current sentiment-topic word distributions depend on the previous sentiment-topic-specific word distributions in the last S epochs; (2) a skip model, where historical sentiment-topic word distributions are considered by skipping some epochs in between; and (3) a multiscale model, where previous long- and short-timescale distributions are taken into consideration. We derive efficient online inference procedures to sequentially update the model with newly arrived data and show the effectiveness of our proposed model on the Mozilla add-on reviews crawled between 2007 and 2011. © 2013 ACM.
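The three dependency schemes can be pictured by asking which past epochs feed the word distributions at epoch t. The sketch below is a simplified illustration only: the function names are hypothetical, the power-of-two spacing used for the skip and multiscale variants is an assumption not stated in the abstract, and a plain weighted average stands in for the model's actual generative dependency.

```python
def history_epochs(t, S, mode):
    """Return the past epochs (or epoch ranges) that feed epoch t.

    "sliding":    the last S epochs, t-1 ... t-S.
    "skip":       S epochs at growing gaps, t-1, t-2, t-4, ... (assumed spacing).
    "multiscale": S windows of growing length ending at t-1 (assumed widths),
                  each given as an inclusive (start, end) pair.
    Epochs before 0 are dropped or clipped.
    """
    if mode == "sliding":
        return [t - s for s in range(1, S + 1) if t - s >= 0]
    if mode == "skip":
        return [t - 2 ** (s - 1) for s in range(1, S + 1) if t - 2 ** (s - 1) >= 0]
    if mode == "multiscale":
        return [(max(0, t - 2 ** (s - 1)), t - 1) for s in range(1, S + 1) if t >= 1]
    raise ValueError(f"unknown mode: {mode}")


def mix_distributions(distributions, weights):
    """Weighted average of word distributions given as dicts word -> probability."""
    total = sum(weights)
    vocab = set().union(*distributions)
    return {
        w: sum(wt * d.get(w, 0.0) for d, wt in zip(distributions, weights)) / total
        for w in vocab
    }


# Which epochs inform epoch 8 when S = 3?
sliding = history_epochs(8, 3, "sliding")
skip = history_epochs(8, 3, "skip")
multiscale = history_epochs(8, 3, "multiscale")

# Hypothetical word distributions from two earlier epochs, equally weighted.
prior = mix_distributions([{"good": 0.5, "bad": 0.5}, {"good": 1.0}], [1.0, 1.0])
```

With these assumptions, the sliding window looks at epochs 7, 6 and 5, the skip model at epochs 7, 6 and 4, and the multiscale model at windows covering the last one, two and four epochs.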
Abstract:
In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.
Abstract:
This article explores some of the strategies used by international students of English to manage topic shifts in casual conversations with English-speaking peers. It therefore covers aspects of discourse which have been comparatively under-researched, and where research has also tended to focus on the problems rather than the communicative achievements of non-native speakers. A detailed analysis of the conversations under discussion, which were recorded by the participants themselves, showed that they all flowed smoothly, and this was in large measure due to the ways in which topic shifts were managed. The paper will focus on a very distinct type of topic shift, namely that of topic transitions, which enable a smooth flow from one topic to another, but which do not explicitly signal that a shift is taking place. It will examine how the non-native speakers achieved coherence in the topic transitions which they initiated, which strategies or procedures they employed, and show how their initiations were effective in enabling the proposed topic to be understood, taken up and developed. It therefore adds to our understanding of the interactional achievements of international speakers in informal, social contexts. © 2013 Elsevier B.V.
Abstract:
Short text messages, a.k.a. Microposts (e.g. Tweets), have proven to be an effective channel for revealing information about trends and events, ranging from those related to Disaster (e.g. hurricane Sandy) to those related to Violence (e.g. Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals by allowing such parties to respond immediately. In this work we study the problem of topic classification (TC) of Microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of Microposts, however, is a challenging task since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to Microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of Microposts with features extracted only from the Microposts' content. In contrast, our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called the category meta-graph. This novel meta-graph provides a more fine-grained categorisation of concepts, providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of Microposts. Furthermore, our goal is also to understand which semantic features contribute to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of the accuracy loss of a topic classifier on new, unseen Microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and Microposts at a conceptual level, considering the enriched representation of these documents.
Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches, which use a single KS without linked data or Twitter data only, by up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore, our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches based on content-based similarity measures. © 2014 Elsevier B.V. All rights reserved.
Abstract:
The purpose of this study is threefold: (1) to identify the underlying benefits sought by international visitors to Macau, China, which has emerged as a popular gambling destination in Asia; (2) to segment tourists visiting Macau by employing a cluster analysis based on the benefits sought; and (3) to examine any salient differences between the segment groups with regard to their behavioral characteristics, socio-economic characteristics, and demographic profiles. A convenience sample was used to collect data in the Macau International Airport, in the Macau Ferry Terminal, and at the border gate with Mainland China. A total of 1,513 usable surveys were retained for data analysis. Cluster analysis discloses four distinct clusters: "convention and business seekers," "family and vacation seekers," "gambling and shopping seekers," and "multi-purpose seekers." Based on these findings, several managerial implications are discussed. © Taylor & Francis Group, LLC.
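Benefit-based segmentation of this kind can be sketched with a small clustering routine. The abstract does not name the clustering algorithm, so k-means is used here purely as an assumption, and the two benefit dimensions and the ratings are invented for illustration; they are not data from the survey.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: returns (centroids, assignments).

    Deterministic initialisation with the first k points keeps the sketch
    reproducible; a real analysis would use several random restarts.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each respondent joins the nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Hypothetical 1-5 importance ratings per respondent: (gambling, family vacation).
ratings = [(5, 1), (1, 5), (4, 2), (2, 4), (5, 2), (1, 4)]
centroids, segments = kmeans(ratings, k=2)
```

On this toy input the respondents split into a gambling-oriented segment and a family-oriented segment; the study itself reports four clusters derived from its own benefit items.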
Abstract:
This thesis is concerned with understanding how Emergency Management Agencies (EMAs) influence public preparedness for mass evacuation across seven countries. Due to the lack of cross-national research (Tierney et al., 2001), there is a lack of knowledge on EMAs' perspectives and approaches to the governance of public preparedness. This thesis seeks to address this gap through cross-national research that explores and contributes towards understanding the governance of public preparedness. The research draws upon the risk communication (Wood et al., 2011; Tierney et al., 2001), social marketing (Marshall et al., 2007; Kotler and Lee, 2008; Ramaprasad, 2005), risk governance (Walker et al., 2010, 2013; Kuhlicke et al., 2011; IRGC, 2005, 2007; Renn et al., 2011; Klinke and Renn, 2012), risk society (Beck, 1992, 1999, 2002) and governmentality (Foucault, 1978, 2003, 2009) literature to explain this governance and how EMAs responsibilize the public for their preparedness. EMAs from seven countries (Belgium, Denmark, Germany, Iceland, Japan, Sweden, the United Kingdom) explain how they prepare their public for mass evacuation in response to different types of risk. A cross-national (Hantrais, 1999) interpretive research approach, using qualitative methods including semi-structured interviews, documents and observation, was used to collect data. The data analysis process (Miles and Huberman, 1999) identified how the concepts of risk, knowledge and responsibility are critical for theorising how EMAs influence public preparedness for mass evacuation. The key findings grounded in these concepts include:
- Theoretically, risk is multi-functional in the governance of public preparedness. It regulates behaviour, enables surveillance and acts as a technique of exclusion.
- EMAs' knowledge and how this influenced their assessment of risk, together with how they share the responsibility for public preparedness across institutions and the public, are key to the governance of public preparedness for mass evacuation. This resulted in a form of public segmentation common to all countries, whereby the public were prepared unequally.
- EMAs use their prior knowledge and assessments of risk to target public preparedness in response to particular known hazards. However, this strategy places the non-targeted public at greater risk in relation to unknown hazards, such as a man-made disaster.
- A cross-national conceptual framework of four distinctive governance practices (exclusionary, informing, involving and influencing) is utilised to influence public preparedness.
- The uncertainty associated with particular types of risk limits the application of social marketing as a strategy for influencing the public to take responsibility and can potentially increase the risk to the public.