962 resultados para Topic analysis


Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Postprint

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We propose a novel hierarchical Bayesian framework, word-distance-dependent Chinese restaurant franchise (wd-dCRF) for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application on Electronic Medical Records (EMRs). Typically, a EMRs dataset consists of several patients (documents) and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically-coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wddCRF using MCMC technique. We evaluate on a real world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, hierarchical Dirichlet process (HDP), our model discovers topics which are superior in terms of both qualitative and quantitative measures.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Sentiment analysis over Twitter offer organisations a fast and effective way to monitor the publics' feelings towards their brand, business, directors, etc. A wide range of features and methods for training sentiment classifiers for Twitter datasets have been researched in recent years with varying results. In this paper, we introduce a novel approach of adding semantics as additional features into the training set for sentiment analysis. For each extracted entity (e.g. iPhone) from tweets, we add its semantic concept (e.g. Apple product) as an additional feature, and measure the correlation of the representative concept with negative/positive sentiment. We apply this approach to predict sentiment for three different Twitter datasets. Our results show an average increase of F harmonic accuracy score for identifying both negative and positive sentiment of around 6.5% and 4.8% over the baselines of unigrams and part-of-speech features respectively. We also compare against an approach based on sentiment-bearing topic analysis, and find that semantic features produce better Recall and F score when classifying negative sentiment, and better Precision with lower Recall and F score in positive sentiment classification.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Social media data are produced continuously by a large and uncontrolled number of users. The dynamic nature of such data requires the sentiment and topic analysis model to be also dynamically updated, capturing the most recent language use of sentiments and topics in text. We propose a dynamic Joint Sentiment-Topic model (dJST) which allows the detection and tracking of views of current and recurrent interests and shifts in topic and sentiment. Both topic and sentiment dynamics are captured by assuming that the current sentiment-topic-specific word distributions are generated according to the word distributions at previous epochs. We study three different ways of accounting for such dependency information: (1) Sliding window where the current sentiment-topic word distributions are dependent on the previous sentiment-topic-specific word distributions in the last S epochs; (2) skip model where history sentiment topic word distributions are considered by skipping some epochs in between; and (3) multiscale model where previous long- and shorttimescale distributions are taken into consideration. We derive efficient online inference procedures to sequentially update the model with newly arrived data and show the effectiveness of our proposed model on the Mozilla add-on reviews crawled between 2007 and 2011. © 2013 ACM 2157-6904/2013/12-ART5 $ 15.00.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Considerando-se a linguagem um elemento de interação social, concebe-se o texto, em todas as suas manifestações, como objeto norteador do ensino da Língua Materna, cujo objetivo, segundo orientações dos PCN de Língua Portuguesa, é formar cidadãos reflexivos e críticos, proficientes na prática da leitura e da escrita. Entretanto, face a um currículo extenso e limitada carga horária, as aulas de produção textual, não raras vezes, deixam de propiciar uma formação autônoma e reproduzem práticas que visam apenas à apropriação da norma padrão e à apreensão de modelos de escrita, gerando um aprendizado desvinculado da realidade do educando. Dessa forma, a presente dissertação pretende mostrar os desafios enfrentados por professores do Ensino Médio para desconstruir a passividade que se instaura no educando, ao longo do Ensino Fundamental II, perante seu próprio texto, seja no ato de exposição do tema proposto para a redação, seja na etapa destinada à avaliação da mesma, na qual pouco se estimula sua participação. Com base na perspectiva sociointeracionista de Bakhtin (2009), este trabalho defende a ideia de que o envolvimento do educando com seu texto deve se iniciar na fase de elaboração do tema, de modo que este possa ser integrado à vivência do aluno, até a fase de revisão e correção, em que o acompanhamento do processo visa a garantir ao autor o direito à analise e à reescritura de sua produção, momentos imprescindíveis ao desenvolvimento da competência leitora e escritora. O corpus desta pesquisa é formado por produções textuais de estudantes do primeiro e do segundo anos do Ensino Médio de uma escola da rede particular do município do Rio de Janeiro. Foram analisadas 64 redações, sendo 28 produzidas a partir de temas determinados pelo professor, focando procedimentos relativos à análise do tema e à reescrita, e 36 relacionadas à construção de temas a partir de textos não verbais, buscando-se maior amplitude interpretativa. Além disso, também foram aplicados questionários, nos quais os alunos expuseram seus pontos de vista sobre as aulas de redação na escola. A análise do corpus demonstrou que a escrita é um processo e que o ensino de produção de textos se torna mais produtivo na medida em que se possibilita maior participação dos educandos em todas as etapas.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

本文旨在研究如何基于小世界模型进行文本分割,确定片段主题,进而总结全文的中心主题,使文本的主题脉络呈现出来。为此首先证明由文本形成的词汇共现图呈现短路径,高聚集度的特性,说明小世界结构存在于文本中;然后依据小世界结构将词汇共现图划分为“簇”,通过计算“簇”在文本中所占的密度比重识别片段边界,使“簇”与片段对应起来;最后利用短路径,高聚集度的特性提取图“簇”的主题词,采取背景词汇聚类及主题词联想的方式将主题词扩充到待分析文本之外,尝试挖掘隐藏于字词表面之下的文本内涵。虽然国际上已有很多关于小世界结构及基于其上的应用研究,但利用小世界特性进行主题分析还是一个崭新的课题。实验表明,本文所给方法的结果明显好于其他方法,说明可以为下一步文本推理的工作提供有价值的预处理。

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how we can improve the quality of a search engine by optimizing the search results according to particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, in general, it is very important to incorporate it into analyzing and mining text data. In general, modeling the context in text, discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, which is a new paradigm of text mining treating context information as the ``first-class citizen.'' We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve the text mining problems in a lot of real world applications. We further introduce general components of contextual topic analysis, by adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are proved effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns, by generating meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the ``context'' of other text information management tasks, including ad hoc text retrieval and web search, we further prove the effectiveness of contextual text mining techniques in a quantitative way with large scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Depression afflicts one in four people during their lives. Several studies have shown that for the isolated and mentally ill, the Web and social media provide effective platforms for supports and treatments as well as to acquire scientific, clinical understanding of this mental condition. More and more individuals affected by depression join online communities to seek for information, express themselves, share their concerns and look for supports [12]. For the first time, we collect and study a large online depression community of more than 12,000 active members from Live Journal. We examine the effect of mood, social connectivity and age on the online messages authored by members in an online depression community. The posts are considered in two aspects: what is written (topic) and how it is written (language style). We use statistical and machine learning methods to discriminate the posts made by bloggers in low versus high valence mood, in different age categories and in different degrees of social connectivity. Using statistical tests, language styles are found to be significantly different between low and high valence cohorts, whilst topics are significantly different between people whose different degrees of social connectivity. High performance is achieved for low versus high valence post classification using writing style as features. The finding suggests the potential of using social media in depression screening, especially in online setting.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and treating each web page as a document linked to the original tweet show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify the review sentiment polarity and minimum prior information have also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A large number of studies have been devoted to modeling the contents and interactions between users on Twitter. In this paper, we propose a method inspired from Social Role Theory (SRT), which assumes that a user behaves differently in different roles in the generation process of Twitter content. We consider the two most distinctive social roles on Twitter: originator and propagator, who respectively posts original messages and retweets or forwards the messages from others. In addition, we also consider role-specific social interactions, especially implicit interactions between users who share some common interests. All the above elements are integrated into a novel regularized topic model. We evaluate the proposed method on real Twitter data. The results show that our method is more effective than the existing ones which do not distinguish social roles. Copyright 2013 ACM.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multicarrier code division multiple access (MC-CDMA) is a very promising candidate for the multiple access scheme in fourth generation wireless communi- cation systems. During asynchronous transmission, multiple access interference (MAI) is a major challenge for MC-CDMA systems and significantly affects their performance. The main objectives of this thesis are to analyze the MAI in asyn- chronous MC-CDMA, and to develop robust techniques to reduce the MAI effect. Focus is first on the statistical analysis of MAI in asynchronous MC-CDMA. A new statistical model of MAI is developed. In the new model, the derivation of MAI can be applied to different distributions of timing offset, and the MAI power is modelled as a Gamma distributed random variable. By applying the new statistical model of MAI, a new computer simulation model is proposed. This model is based on the modelling of a multiuser system as a single user system followed by an additive noise component representing the MAI, which enables the new simulation model to significantly reduce the computation load during computer simulations. MAI reduction using slow frequency hopping (SFH) technique is the topic of the second part of the thesis. Two subsystems are considered. The first sub- system involves subcarrier frequency hopping as a group, which is referred to as GSFH/MC-CDMA. In the second subsystem, the condition of group hopping is dropped, resulting in a more general system, namely individual subcarrier frequency hopping MC-CDMA (ISFH/MC-CDMA). This research found that with the introduction of SFH, both of GSFH/MC-CDMA and ISFH/MC-CDMA sys- tems generate less MAI power than the basic MC-CDMA system during asyn- chronous transmission. Because of this, both SFH systems are shown to outper- form MC-CDMA in terms of BER. This improvement, however, is at the expense of spectral widening. In the third part of this thesis, base station polarization diversity, as another MAI reduction technique, is introduced to asynchronous MC-CDMA. The com- bined system is referred to as Pol/MC-CDMA. In this part a new optimum com- bining technique namely maximal signal-to-MAI ratio combining (MSMAIRC) is proposed to combine the signals in two base station antennas. With the applica- tion of MSMAIRC and in the absents of additive white Gaussian noise (AWGN), the resulting signal-to-MAI ratio (SMAIR) is not only maximized but also in- dependent of cross polarization discrimination (XPD) and antenna angle. In the case when AWGN is present, the performance of MSMAIRC is still affected by the XPD and antenna angle, but to a much lesser degree than the traditional maximal ratio combining (MRC). Furthermore, this research found that the BER performance for Pol/MC-CDMA can be further improved by changing the angle between the two receiving antennas. Hence the optimum antenna angles for both MSMAIRC and MRC are derived and their effects on the BER performance are compared. With the derived optimum antenna angle, the Pol/MC-CDMA system is able to obtain the lowest BER for a given XPD.