933 resultados para Databases as Topic


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework called joint sentiment-topic (JST) model based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. A reparameterized version of the JST model called Reverse-JST, obtained by reversing the sequence of sentiment and topic generation in the modeling process, is also studied. Although JST is equivalent to Reverse-JST without a hierarchical prior, extensive experiments show that when sentiment priors are added, JST performs consistently better than Reverse-JST. Besides, unlike supervised approaches to sentiment classification which often fail to produce satisfactory performance when shifting to other domains, the weakly supervised nature of JST makes it highly portable to other domains. This is verified by the experimental results on data sets from five different domains where the JST model even outperforms existing semi-supervised approaches in some of the data sets despite using no labeled documents. Moreover, the topics and topic sentiment detected by JST are indeed coherent and informative. We hypothesize that the JST model can readily meet the demand of large-scale sentiment analysis from the web in an open-ended fashion.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Text classification is essential for narrowing down the number of documents relevant to a particular topic for further pursual, especially when searching through large biomedical databases. Protein-protein interactions are an example of such a topic with databases being devoted specifically to them. This paper proposed a semi-supervised learning algorithm via local learning with class priors (LL-CP) for biomedical text classification where unlabeled data points are classified in a vector space based on their proximity to labeled nodes. The algorithm has been evaluated on a corpus of biomedical documents to identify abstracts containing information about protein-protein interactions with promising results. Experimental results show that LL-CP outperforms the traditional semisupervised learning algorithms such as SVMand it also performs better than local learning without incorporating class priors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Web APIs have gained increasing popularity in recent Web service technology development owing to its simplicity of technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentations on the Web is still a challenging task even with the best resources available on the Web. In this paper we cast the problem of detecting the Web API documentations as a text classification problem of classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA) which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information such as labelled features automatically learned from data that can effectively help improving classification performance. Extensive experiments on our Web APIs documentation dataset shows that the feaLDA model outperforms three strong supervised baselines including naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Topic management by non-native speakers (NNSs) during informal conversations has received comparatively little attention from researchers, and receives surprisingly little attention in second language learning and teaching. This article reports on one of the topic management strategies employed by international students during informal, social interactions with native-speaker peers, exploring the process of maintaining topic continuity following temporary suspensions of topics. The concept of side sequences is employed to illustrate the nature of different types of topic suspension, as well as the process of jointly negotiating a return to the topic. Extracts from the conversations show that such sequences were not exclusively occasioned by language difficulties, and that the non-native speaker participants were able to effect successful returns to the main topic of the conversations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify the review sentiment polarity and minimum prior information have also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Purpose – The purpose of this paper is to report the state-of-the-art of servitization by presenting a clinical review of literature currently available on the topic. The paper aims to define the servitization concept, report on its origin, features and drivers and give examples of its adoption along with future research challenges. Design/methodology/approach – In determining the scope of this study, the focus is on articles that are central and relevant to servitization within a wider manufacturing context. The methodology consists of identifying relevant publication databases, searching these using a wide range of key words and phrases associated with servitization, and then fully reviewing each article in turn. The key findings and their implications for research are all described. Findings – Servitization is the innovation of an organisation's capabilities and processes to shift from selling products to selling integrated products and services that deliver value in use. There are a diverse range of servitization examples in the literature. These tend to emphasize the potential to maintain revenue streams and improve profitability. Practical implications – Servitization does not represent a panacea for manufactures. However, it is a concept of significant potential value, providing routes for companies to move up the value chain and exploit higher value business activities. There is little work to date that can be used to help practitioners. Originality/value – This paper provides a useful review of servitization and a platform on which to base more in-depth research into the broader topic of service-led competitive strategy by drawing on the work from other related research communities.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A large number of studies have been devoted to modeling the contents and interactions between users on Twitter. In this paper, we propose a method inspired from Social Role Theory (SRT), which assumes that a user behaves differently in different roles in the generation process of Twitter content. We consider the two most distinctive social roles on Twitter: originator and propagator, who respectively posts original messages and retweets or forwards the messages from others. In addition, we also consider role-specific social interactions, especially implicit interactions between users who share some common interests. All the above elements are integrated into a novel regularized topic model. We evaluate the proposed method on real Twitter data. The results show that our method is more effective than the existing ones which do not distinguish social roles. Copyright 2013 ACM.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media poses however new challenges since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which apply summarisation algorithms to generate topic labels. These algorithms are independent of external sources and only rely on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state of the art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels which capture event-related context compared to the top-n terms returned by LDA. © 2014 Association for Computational Linguistics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mapping-based visualisations of image databases are well suited to users wanting to survey the overall content of a collection. Given the large amount of image data contained within such visualisations, however, this approach has yet to be applied to large image databases stored remotely. In this technical demonstration, we showcase our Web-Based Images Browser (WBIB). Our novel system makes use of image pyramids so that users can interactively explore mapping-based visualisations of large remote image databases. © 2012 Authors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Social media data are produced continuously by a large and uncontrolled number of users. The dynamic nature of such data requires the sentiment and topic analysis model to be also dynamically updated, capturing the most recent language use of sentiments and topics in text. We propose a dynamic Joint Sentiment-Topic model (dJST) which allows the detection and tracking of views of current and recurrent interests and shifts in topic and sentiment. Both topic and sentiment dynamics are captured by assuming that the current sentiment-topic-specific word distributions are generated according to the word distributions at previous epochs. We study three different ways of accounting for such dependency information: (1) Sliding window where the current sentiment-topic word distributions are dependent on the previous sentiment-topic-specific word distributions in the last S epochs; (2) skip model where history sentiment topic word distributions are considered by skipping some epochs in between; and (3) multiscale model where previous long- and shorttimescale distributions are taken into consideration. We derive efficient online inference procedures to sequentially update the model with newly arrived data and show the effectiveness of our proposed model on the Mozilla add-on reviews crawled between 2007 and 2011. © 2013 ACM 2157-6904/2013/12-ART5 $ 15.00.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The methods and software for integration of databases (DBs) on inorganic material and substance properties have been developed. The information systems integration is based on known approaches combination: EII (Enterprise Information Integration) and EAI (Enterprise Application Integration). The metabase - special database that stores data on integrated DBs contents is an integrated system kernel. Proposed methods have been applied for DBs integrated system creation in the field of inorganic chemistry and materials science. Important developed integrated system feature is ability to include DBs that have been created by means of different DBMS using essentially various computer platforms: Sun (DB "Diagram") and Intel (other DBs) and diverse operating systems: Sun Solaris (DB "Diagram") and Microsoft Windows Server (other DBs).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article explores some of the strategies used by international students of English to manage topic shifts in casual conversations with English-speaking peers. It therefore covers aspects of discourse which have been comparatively under-researched, and where research has also tended to focus on the problems rather than the communicative achievements of non-native speakers. A detailed analysis of the conversations under discussion, which were recorded by the participants themselves, showed that they all flowed smoothly, and this was in large measure due to the ways in which topic shifts were managed. The paper will focus on a very distinct type of topic shift, namely that of topic transitions, which enable a smooth flow from one topic to another, but which do not explicitly signal that a shift is taking place. It will examine how the non-native speakers achieved coherence in the topic transitions which they initiated, which strategies or procedures they employed, and show how their initiations were effective in enabling the proposed topic to be understood, taken up and developed. It therefore adds to our understanding of the interactional achievements of international speakers in informal, social contexts. © 2013 Elsevier B.V.