989 results for Topic models
Abstract:
A large number of studies have been devoted to modeling the content and interactions of users on Twitter. In this paper, we propose a method inspired by Social Role Theory (SRT), which assumes that a user behaves differently in different roles in the generation process of Twitter content. We consider the two most distinctive social roles on Twitter: originator and propagator, who respectively post original messages and retweet or forward messages from others. In addition, we also consider role-specific social interactions, especially implicit interactions between users who share some common interests. All the above elements are integrated into a novel regularized topic model. We evaluate the proposed method on real Twitter data. The results show that our method is more effective than existing ones, which do not distinguish social roles. Copyright 2013 ACM.
Abstract:
Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media, however, poses new challenges since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which applies summarisation algorithms to generate topic labels. These algorithms are independent of external sources and rely only on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state-of-the-art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels, which capture event-related context, compared to the top-n terms returned by LDA. © 2014 Association for Computational Linguistics.
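The "top-n terms" baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes a topic is available as a mapping from terms to probabilities; the function name `top_n_terms` and the toy values are hypothetical and are not taken from the paper. The summarisation-based labellers themselves are not specified in the abstract and are not reproduced here.

```python
def top_n_terms(topic_word_probs, n):
    """Return the n most probable terms of one topic (the baseline label).

    topic_word_probs: dict mapping term -> probability under the topic
    (hypothetical input format, assumed for illustration).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(topic_word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Toy topic with made-up probabilities, for illustration only.
toy_topic = {
    "earthquake": 0.30,
    "japan": 0.25,
    "tsunami": 0.20,
    "news": 0.15,
    "today": 0.10,
}
baseline_label = top_n_terms(toy_topic, 3)
```

A list of this kind is the baseline that the abstract compares summarisation-generated labels against.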
Abstract:
In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.
Abstract:
This paper presents a dynamic LM adaptation based on the topic that has been identified in a speech segment. We use LSA and the given topic labels in the training dataset to obtain and use the topic models. We propose a dynamic language model adaptation to improve the recognition performance in a two-stage AST system. The final stage makes use of the topic identification with two variants: the first one uses just the most probable topic, and the other depends on the relative distances of the topics that have been identified. We perform the adaptation of the LM as a linear interpolation between a background model and a topic-based LM. The interpolation weight is dynamically adapted according to different parameters. The proposed method is evaluated on the Spanish partition of the EPPS speech database. We achieved a relative reduction in WER of 11.13% over the baseline system, which uses a single background LM.
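The linear interpolation between a background model and a topic-based LM described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses unigram probabilities stored in dictionaries, a fixed weight rather than the dynamically adapted one, and hypothetical names (`interpolate_lm`, `background`, `topic`) with made-up values.

```python
def interpolate_lm(p_background, p_topic, weight):
    """Linearly interpolate two word-probability tables.

    P(w) = weight * P_topic(w) + (1 - weight) * P_background(w)

    p_background, p_topic: dicts mapping word -> probability
    (unigram tables, assumed here for simplicity).
    weight: interpolation weight in [0, 1] given to the topic model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    vocab = set(p_background) | set(p_topic)
    return {
        w: weight * p_topic.get(w, 0.0)
        + (1.0 - weight) * p_background.get(w, 0.0)
        for w in vocab
    }


# Toy distributions with made-up values, for illustration only.
background = {"the": 0.5, "vote": 0.1, "budget": 0.4}
topic = {"the": 0.2, "vote": 0.6, "budget": 0.2}
adapted = interpolate_lm(background, topic, weight=0.25)
```

Because both inputs sum to one and the weights sum to one, the interpolated table is again a probability distribution. In the setting the abstract describes, the weight would vary per speech segment instead of being a constant.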
Abstract:
Detailed large-scale information on mammal distribution has often been lacking, hindering conservation efforts. We used the information from the 2009 IUCN Red List of Threatened Species as a baseline for developing habitat suitability models for 5027 out of 5330 known terrestrial mammal species, based on their habitat relationships. We focused on the following environmental variables: land cover, elevation and hydrological features. Models were developed at 300 m resolution and limited to within species' known geographical ranges. A subset of the models was validated using points of known species occurrence. We conducted a global, fine-scale analysis of patterns of species richness. The richness of mammal species estimated by the overlap of their suitable habitat is on average one-third less than that estimated by the overlap of their geographical ranges. The highest absolute difference is found in tropical and subtropical regions in South America, Africa and Southeast Asia that are not covered by dense forest. The proportion of suitable habitat within mammal geographical ranges correlates with the IUCN Red List category to which they have been assigned, decreasing monotonically from Least Concern to Endangered. These results demonstrate the importance of fine-resolution distribution data for the development of global conservation strategies for mammals.
Abstract:
Web APIs have gained increasing popularity in recent Web service technology development owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentation on the Web is still a challenging task, even with the best resources available on the Web. In this paper we cast the problem of detecting Web API documentation as a text classification problem: classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information, such as labelled features automatically learned from data, that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that the feaLDA model outperforms three strong supervised baselines, namely naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.
Abstract:
This keynote presentation will report some of our research work and experience on the development and applications of relevant methods, models, systems and simulation techniques in support of different types and various levels of decision making for business, management and engineering. In particular, the following topics will be covered: Modelling, multi-agent-based simulation and analysis of the allocation management of carbon dioxide emission permits in China (Nanfeng Liu & Shuliang Li); Agent-based simulation of the dynamic evolution of enterprise carbon assets (Yin Zeng & Shuliang Li); A framework & system for extracting and representing project knowledge contexts using topic models and dynamic knowledge maps: a big data perspective (Jin Xu, Zheng Li, Shuliang Li & Yanyan Zhang); Open innovation: intelligent model, social media & complex adaptive system simulation (Shuliang Li & Jim Zheng Li); A framework, model and software prototype for modelling and simulation for deshopping behaviour and how companies respond (Shawkat Rahman & Shuliang Li); Integrating multiple agents, simulation, knowledge bases and fuzzy logic for international marketing decision making (Shuliang Li & Jim Zheng Li); A Web-based hybrid intelligent system for combined conventional, digital, mobile, social media and mobile marketing strategy formulation (Shuliang Li & Jim Zheng Li); A hybrid intelligent model for Web & social media dynamics, and evolutionary and adaptive branding (Shuliang Li); A hybrid paradigm for modelling, simulation and analysis of brand virality in social media (Shuliang Li & Jim Zheng Li); Network configuration management: attack paradigms and architectures for computer network survivability (Tero Karvinen & Shuliang Li).
Abstract:
Conventional topic models are ineffective for topic extraction from microblog messages, since the lack of structure and context among the posts yields poor message-level word co-occurrence patterns. In this work, we organize microblog posts as conversation trees based on reposting and replying relations, which enrich context information and alleviate data sparseness. Our model generates words according to topic dependencies derived from the conversation structures. Specifically, we differentiate between leader messages, which initiate key aspects of previously focused topics or shift the focus to different topics, and follower messages, which do not introduce any new information but simply echo topics from the messages that they repost or reply to. Our model captures the different extents to which leader and follower messages may contain the key topical words, and thus further enhances the quality of the induced topics. The results of thorough experiments demonstrate the effectiveness of our proposed model.
Abstract:
Child protection is a subject of high value to society, but child abuse cases are difficult to identify. The process from suspicion to accusation is very difficult to complete, since it requires very strong evidence. Typically, health care services deal with these cases from the beginning, when there is evidence based on the diagnosis, but this evidence is not enough to support an accusation. In addition, the subject is highly sensitive because of the legal aspects involved, such as patient privacy, paternity issues and medical confidentiality, among others. We propose a model of a child abuse critical knowledge monitoring system that addresses this problem. This decision support system draws on multiple scientific domains: the capture of tokens from clinical documents from multiple sources; a topic model approach to identify the topics of the documents; and knowledge management through the use of ontologies to represent the sensitive critical-knowledge concepts and relations, such as symptoms and behaviors, among other evidence, in order to match them with the topics inferred from the clinical documents and then alert and log when clinical evidence is present. Based on these alerts, clinical personnel can analyze the situation and take the appropriate action.
Abstract:
INTRODUCTION: This study sought to increase understanding of women's thoughts and feelings about decision making and the experience of subsequent pregnancy following stillbirth (intrauterine death after 24 weeks' gestation). METHODS: Eleven women were interviewed, 8 of whom were pregnant at the time of the interview. Modified grounded theory was used to guide the research methodology and to analyze the data. RESULTS: A model was developed to illustrate women's experiences of decision making in relation to subsequent pregnancy and of subsequent pregnancy itself. DISCUSSION: The results of the current study have significant implications for women who have experienced stillbirth and the health professionals who work with them. Based on the model, women may find it helpful to discuss their beliefs in relation to healing, and health professionals may find it helpful to provide support with this in mind. Women and their partners may also benefit from explanations and support about the potentially conflicting emotions they may experience during this time.
Abstract:
OBJECTIVE: To calculate the variable costs involved with the process of delivering erythropoiesis stimulating agents (ESA) in European dialysis practices. METHODS: A conceptual model was developed to classify the processes and sub-processes followed in the pharmacy (ordering from supplier, receiving/storing/delivering ESA to the dialysis unit), dialysis unit (dose determination, ordering, receipt, registration, storage, administration, registration) and waste disposal unit. Time and material costs were recorded. Labour costs were derived from actual local wages while material costs came from the facilities' accounting records. Activities associated with ESA administration were listed and each activity evaluated to determine if dosing frequency affected the amount of resources required. RESULTS: A total of 21 centres in 8 European countries supplied data for 142 patients (mean) per hospital (range 42-648). Patients received various ESA regimens (thrice-weekly, twice-weekly, once-weekly, once every 2 weeks and once-monthly). Administering ESA every 2 weeks, the mean costs per patient per year for each process and the estimates of the percentage reduction in costs obtainable, respectively, were: pharmacy labour (10.1 euro, 39%); dialysis unit labour (66.0 euro, 65%); dialysis unit materials (4.11 euro, 61%) and waste unit materials (0.43 euro, 49%). LIMITATION: Impact on financial costs was not measured. CONCLUSION: ESA administration has quantifiable labour and material costs which are affected by dosing frequency.
Abstract:
BACKGROUND: Three different burnout types have been described: The "frenetic" type describes involved and ambitious subjects who sacrifice their health and personal lives for their jobs; the "underchallenged" type describes indifferent and bored workers who fail to find personal development in their jobs, and the "worn-out" type describes neglectful subjects who feel they have little control over results and whose efforts go unacknowledged. The study aimed to describe the possible associations between burnout types and general sociodemographic and occupational characteristics. METHODS: A cross-sectional study was carried out on a multi-occupational sample of randomly selected university employees (n = 409). The presence of burnout types was assessed by means of the "Burnout Clinical Subtype Questionnaire (BCSQ-36)", and the degree of association between variables was assessed using an adjusted odds ratio (OR) obtained from multivariate logistic regression models. RESULTS: Individuals working more than 40 hours per week presented with the greatest risk for "frenetic" burnout compared to those working fewer than 35 hours (adjusted OR = 5.69; 95% CI = 2.52-12.82; p < 0.001). Administration and service personnel presented the greatest risk of "underchallenged" burnout compared to teaching and research staff (adjusted OR = 2.85; 95% CI = 1.16-7.01; p = 0.023). Employees with more than sixteen years of service in the organisation presented the greatest risk of "worn-out" burnout compared to those with less than four years of service (adjusted OR = 4.56; 95% CI = 1.47-14.16; p = 0.009). CONCLUSIONS: This study is the first to our knowledge that suggests the existence of associations between the different burnout subtypes (classified according to the degree of dedication to work) and the different sociodemographic and occupational characteristics that are congruent with the definition of each of the subtypes.
These results are consistent with the clinical profile definitions of burnout syndrome. In addition, they assist the recognition of distinct profiles and reinforce the idea of differential characterisation of the syndrome for more effective treatment.
Abstract:
Despite medical advances, mortality in infective endocarditis (IE) is still very high. Previous studies on prognosis in IE have observed conflicting results. The aim of this study was to identify predictors of in-hospital mortality in a large multicenter cohort of left-sided IE. Methods: An observational multicenter study was conducted from January 1984 to December 2006 in seven hospitals in Andalusia, Spain. Seven hundred and five left-sided IE patients were included. The main outcome measure was in-hospital mortality. Several prognostic factors were analysed by univariate tests and then by a multilogistic regression model. Results: The overall mortality was 29.5% (25.5% from 1984 to 1995 and 31.9% from 1996 to 2006; Odds Ratio 1.25; 95% Confidence Interval: 0.97-1.60; p = 0.07). In univariate analysis, age, comorbidity (especially chronic liver disease), prosthetic valve, virulent microorganisms such as Staphylococcus aureus, Streptococcus agalactiae and fungi, and complications (septic shock, severe heart failure, renal insufficiency, neurologic manifestations and perivalvular extension) were associated with higher mortality. Independent factors for mortality in multivariate analysis were: Charlson comorbidity score (OR: 1.2; 95% CI: 1.1-1.3), prosthetic endocarditis (OR: 1.9; CI: 1.2-3.1), Staphylococcus aureus aetiology (OR: 2.1; CI: 1.3-3.5), severe heart failure (OR: 5.4; CI: 3.3-8.8), neurologic manifestations (OR: 1.9; CI: 1.2-2.9), septic shock (OR: 4.2; CI: 2.3-7.7), perivalvular extension (OR: 2.4; CI: 1.3-4.5) and acute renal failure (OR: 1.69; CI: 1.0-2.6). Conversely, Streptococcus viridans group aetiology (OR: 0.4; CI: 0.2-0.7) and surgical treatment (OR: 0.5; CI: 0.3-0.8) were protective factors. Conclusions: Several characteristics of left-sided endocarditis enable selection of a patient group at higher risk of mortality.
This group may benefit from more specialised attention in referral centers, and these characteristics should help to identify those patients who might benefit from more aggressive diagnostic and/or therapeutic procedures.