21 resultados para Dirichlet process


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. Consisting of Bayesian nonparametric priors over the parameters the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Medical outcomes are inexorably linked to patient illness and clinical interventions. Interventions change the course of disease, crucially determining outcome. Traditional outcome prediction models build a single classifier by augmenting interventions with disease information. Interventions, however, differentially affect prognosis, thus a single prediction rule may not suffice to capture variations. Interventions also evolve over time as more advanced interventions replace older ones. To this end, we propose a Bayesian nonparametric, supervised framework that models a set of intervention groups through a mixture distribution building a separate prediction rule for each group, and allows the mixture distribution to change with time. This is achieved by using a hierarchical Dirichlet process mixture model over the interventions. The outcome is then modeled as conditional on both the latent grouping and the disease information through a Bayesian logistic regression. Experiments on synthetic and medical cohorts for 30-day readmission prediction demonstrate the superiority of the proposed model over clinical and data mining baselines.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Electronic Medical Record (EMR) has established itself as a valuable resource for large scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically-coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the form of ICD-10 tree structure which presents semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of wddCRF. Efficient inference is derived for the wddCRF by using MCMC technique. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets - PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-days readmission prediction. Beside these, the prediction for procedure codes based on the Corr-wddCRF also shows considerable accuracy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and treating each web page as a document linked to the original tweet show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A fundamental task in pervasive computing is reliable acquisition of contexts from sensor data. This is crucial to the operation of smart pervasive systems and services so that they might behave efficiently and appropriately upon a given context. Simple forms of context can often be extracted directly from raw data. Equally important, or more, is the hidden context and pattern buried inside the data, which is more challenging to discover. Most of existing approaches borrow methods and techniques from machine learning, dominantly employ parametric unsupervised learning and clustering techniques. Being parametric, a severe drawback of these methods is the requirement to specify the number of latent patterns in advance. In this paper, we explore the use of Bayesian nonparametric methods, a recent data modelling framework in machine learning, to infer latent patterns from sensor data acquired in a pervasive setting. Under this formalism, nonparametric prior distributions are used for data generative process, and thus, they allow the number of latent patterns to be learned automatically and grow with the data - as more data comes in, the model complexity can grow to explain new and unseen patterns. In particular, we make use of the hierarchical Dirichlet processes (HDP) to infer atomic activities and interaction patterns from honest signals collected from sociometric badges. We show how data from these sensors can be represented and learned with HDP. We illustrate insights into atomic patterns learned by the model and use them to achieve high-performance clustering. We also demonstrate the framework on the popular Reality Mining dataset, illustrating the ability of the model to automatically infer typical social groups in this dataset. Finally, our framework is generic and applicable to a much wider range of problems in pervasive computing where one needs to infer high-level, latent patterns and contexts from sensor data.