22 results for Incunabula as Topic


Relevance:

20.00%

Publisher:

Abstract:

A tagging recommender system allows Internet users to annotate resources with personalized tags and gives users the freedom to obtain recommendations. However, it is usually confronted with serious privacy concerns, because adversaries may re-identify a user and her/his sensitive tags with only a little background information. This paper proposes a privacy-preserving tagging release algorithm, PriTop, which is designed to protect users under the notion of differential privacy. The proposed PriTop algorithm includes three privacy-preserving operations: Private Topic Model Generation structures the uncontrolled tags; Private Weight Perturbation adds Laplace noise to the weights to hide the numbers of tags; and Private Tag Selection finally finds the most suitable replacement tags for the original tags. We present extensive experimental results on four real-world datasets, and the results suggest that the proposed PriTop algorithm can successfully retain the utility of the datasets while preserving privacy. © 2014 Springer International Publishing.
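The weight perturbation step can be illustrated with a minimal Python sketch of the standard Laplace mechanism. This is not the PriTop implementation: the function names, the per-topic weight dictionary, and the default sensitivity of 1.0 are assumptions made for illustration, since the abstract does not specify them.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_weights(weights, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity/epsilon to each weight.

    `weights` maps a (hypothetical) topic id to its tag count.
    `sensitivity` is an assumed value, not one taken from the paper.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {topic: w + laplace_noise(scale, rng) for topic, w in weights.items()}


# Hypothetical per-topic tag counts for one user.
noisy = perturb_weights({"topic_0": 12.0, "topic_1": 3.0}, epsilon=0.5, seed=42)
```

A smaller `epsilon` gives a larger noise scale, so the released weights reveal less about the true tag counts at the cost of utility.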

Relevance:

20.00%

Publisher:

Abstract:

We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wd-dCRF using an MCMC technique. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.
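To make the idea of a word-to-word distance over a code tree concrete, here is a small Python sketch that counts the edges between two codes through their lowest common ancestor. This is a generic path-length distance, not the distance functions introduced in the paper, and the toy hierarchy and its labels are hypothetical.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path between `a` and `b` in the tree."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first shared ancestor is the lowest one
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes do not share a root")


# Hypothetical two-level code hierarchy (child -> parent).
parent = {"A1": "A", "A2": "A", "B1": "B", "A": "root", "B": "root"}

print(tree_distance("A1", "A2", parent))  # 2: siblings under the same chapter
print(tree_distance("A1", "B1", parent))  # 4: codes in different chapters
```

Codes that share a close ancestor get a small distance, which is the kind of signal a distance-dependent prior could use to favor grouping related diagnoses into the same topic.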

Relevance:

20.00%

Publisher:

Abstract:

Although the randomized controlled trial is the gold standard in medical research, researchers are increasingly looking to alternative data sources for hypothesis generation and early-stage evidence collection. Coded clinical data are collected routinely in most hospitals. While they contain rich information directly related to the real clinical setting, they are both noisy and semantically diverse, making them difficult to analyze with conventional statistical tools. This paper presents a novel application of Bayesian nonparametric modeling to uncover latent information in coded clinical data. For a patient cohort, a Bayesian nonparametric model is used to reveal the common comorbidity groups shared by the patients and the proportion in which each comorbidity group is reflected in each individual patient. To demonstrate the method, we present a case study based on hospitalization coding from an Australian hospital. The model recovered 15 comorbidity groups among 1012 patients hospitalized during a month. When patients from two areas of unequal socio-economic status were compared, the model revealed a higher prevalence of diverticular disease in the region of lower socio-economic status. The study builds a convincing case for routine coded data to speed up hypothesis generation.

Relevance:

20.00%

Publisher:

Abstract:

Tagging recommender systems provide users the freedom to explore tags and obtain recommendations. The releasing and sharing of these tagging datasets will accelerate both commercial and research work on recommender systems. However, releasing the original tagging datasets is usually confronted with serious privacy concerns, because adversaries may re-identify a user and her/his sensitive information from tagging datasets with only a little background information. Recently, several privacy techniques have been proposed to address the problem, but most of these lack a strict privacy notion and rarely prevent individuals from being re-identified from the dataset. This paper proposes a privacy-preserving tag release algorithm, PriTop. This algorithm is designed to satisfy differential privacy, a strict privacy notion with the goal of protecting users in a tagging dataset. The proposed PriTop algorithm includes three privacy-preserving operations: private topic model generation structures the uncontrolled tags; private weight perturbation adds Laplace noise to the weights to hide the numbers of tags; and private tag selection finally finds the most suitable replacement tags for the original tags, so the exact tags can be hidden. We present extensive experimental results on four real-world datasets: Delicious, MovieLens, Last.fm and BibSonomy. While the recommendation algorithm is successful in all the cases, our results further suggest that the proposed PriTop algorithm can successfully retain the utility of the datasets while preserving privacy.

Relevance:

20.00%

Publisher:

Abstract:

Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
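The first step of such a pipeline, pulling URLs out of tweet text so the linked pages can later be fetched and treated as documents, might look like the following Python sketch. The regular expression, the function name, and the sample tweet are assumptions for illustration. The abstract does not describe the authors' extraction code, and page fetching and topic modeling are deliberately left out.

```python
import re

# Simple pattern: an http(s) scheme followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")

# Punctuation that often trails a URL in running text but is not part of it.
TRAILING = ".,;:!?)\"'"


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, with trailing punctuation removed."""
    return [match.rstrip(TRAILING) for match in URL_PATTERN.findall(tweet_text)]


# Hypothetical tweet text.
tweet = "New autism support resources: http://example.org/a, see also https://example.com/b."
print(extract_urls(tweet))  # ['http://example.org/a', 'https://example.com/b']
```

Keeping the tweet identifier alongside each extracted URL would preserve the link between the web page and the original tweet that the approach relies on.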

Relevance:

20.00%

Publisher:

Abstract:

Probabilistic topic models have become a standard in modern machine learning for a wide range of applications. Representing data through the dimensionality-reduced mixture proportions extracted from topic models is not only richer in semantic interpretation, but can also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification on data processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real-world datasets. TMK outperforms existing kernels on the distributional features and gives comparable results on non-probabilistic data types.
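As an illustration of what a kernel over topic mixture proportions can look like, the Python sketch below exponentiates the negative Jensen-Shannon divergence between two distributions. The abstract does not give the TMK formula, so this particular form, the `sigma` bandwidth parameter, and the function names are assumptions rather than the authors' definition.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_kernel(p, q, sigma=1.0):
    """Similarity in (0, 1]: equals 1 when p == q, smaller as they diverge."""
    return math.exp(-js_divergence(p, q) / sigma ** 2)


# Hypothetical topic proportions for two documents.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.1, 0.2, 0.7]
similarity = js_kernel(doc_a, doc_b)
```

A matrix of such pairwise values could be passed to an SVM implementation that accepts precomputed kernels.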

Relevance:

20.00%

Publisher:

Abstract:

Online social media systems have created new ways for individuals to communicate, share information and interact with a wide audience. For organisations, social media provide new avenues for communication and collaboration with their stakeholders. The potential value of social media tools to assist in successful communication and marketing inside and outside of engineering organisations has been identified. In the context of engineering education, the potential of social media to open new modes of communication, interaction and experimentation between students and teachers has also been identified, and a limited number of examples can be found documented in the literature. One of the most widely used social media tools is the ‘microblogging’ service Twitter. This research presents an analysis of nearly 19,000 tweets relating to ‘engineering education’ collected over a period of almost a year. Social network analysis is used to visualise the Twitter data. The Twitter social media communication is examined to identify who is active on this topic, who is influential, and what the structure of the online conversations relating to engineering education is. This work provides insights regarding how engineering education is currently represented in social media internationally, and offers a methodology to those interested in related future research.
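One simple way to approximate "who is active" and "who is influential" in such a dataset is to build a mention network and count outgoing and incoming edges. The Python sketch below does this for a handful of hypothetical tweets. The abstract does not state which network measures the study used, so the choice of mention edges and degree counts is an assumption.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Turn (author, text) pairs into directed (author, mentioned_user) edges."""
    edges = []
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # ignore self-mentions
                edges.append((author.lower(), target.lower()))
    return edges


def out_degree(edges):
    """How often each user mentions others (a rough activity signal)."""
    return Counter(source for source, _ in edges)


def in_degree(edges):
    """How often each user is mentioned (a rough influence signal)."""
    return Counter(target for _, target in edges)


# Hypothetical (author, text) pairs.
tweets = [
    ("alice", "Great #engineeringeducation thread by @carol"),
    ("bob", "Agree with @carol and @alice on lab-based learning"),
    ("carol", "Thanks @bob!"),
]
edges = mention_edges(tweets)
print(in_degree(edges).most_common(1))  # [('carol', 2)]
```

The same edge list could be loaded into a graph library to draw the conversation structure.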