30 results for bayesian analysis


Relevance: 30.00%

Abstract:

We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wddCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). An EMR dataset typically consists of multiple patients (documents), and each patient has many diagnosis codes (words). We exploit side information available as a semantic tree structure over the diagnosis codes to discover semantically coherent disease topics. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient MCMC inference method for the wddCRF. We evaluate the model on a real-world medical dataset of about 1000 patients with polyvascular disease. Compared with a popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior on both qualitative and quantitative measures.
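
The abstract does not spell out its tree-based distance functions. One plausible choice, shown here only as an illustration, is the number of edges between two diagnosis codes through their lowest common ancestor. The parent map below is a small hand-written fragment of the ICD-10 circulatory chapter. `PARENT`, `path_to_root`, and `tree_distance` are hypothetical names and are not the authors' code.

```python
# Hypothetical fragment of the ICD-10 circulatory chapter: child -> parent
PARENT = {
    "I21.0": "I21", "I21.4": "I21", "I21": "I20-I25",
    "I25.1": "I25", "I25": "I20-I25", "I20-I25": "I00-I99",
    "I10": "I10-I15", "I10-I15": "I00-I99",
}

def path_to_root(code, parent):
    """List of nodes from `code` up to the root of the tree."""
    path = [code]
    while code in parent:
        code = parent[code]
        path.append(code)
    return path

def tree_distance(a, b, parent):
    """Number of edges between two codes via their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a, parent))}
    for j, node in enumerate(path_to_root(b, parent)):
        if node in steps_from_a:  # first shared node is the lowest common ancestor
            return steps_from_a[node] + j
    raise ValueError("codes do not share a root")

print(tree_distance("I21.0", "I21.4", PARENT))  # 2
print(tree_distance("I21.0", "I10", PARENT))    # 5
```

Under this assumption, sibling codes are close (2 edges apart). Codes that meet only at the chapter root are far apart.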

Relevance: 30.00%

Abstract:

BACKGROUND: The Millennium Declaration in 2000 brought special global attention to HIV, tuberculosis, and malaria through the formulation of Millennium Development Goal (MDG) 6. The Global Burden of Disease 2013 study provides a consistent and comprehensive approach to disease estimation for the period 1990 to 2013, and an opportunity to assess whether accelerated progress has occurred since the Millennium Declaration. METHODS: To estimate incidence and mortality for HIV, we used the UNAIDS Spectrum model, modified on the basis of a systematic review of available studies of mortality with and without antiretroviral therapy (ART). For concentrated epidemics, we calibrated Spectrum models to fit vital registration data corrected for misclassification of HIV deaths. In generalised epidemics, we minimised a loss function to select epidemic curves most consistent with prevalence data and demographic data for all-cause mortality. We analysed counterfactual scenarios for HIV to assess years of life saved through prevention of mother-to-child transmission (PMTCT) and ART. For tuberculosis, we analysed vital registration and verbal autopsy data to estimate mortality using cause of death ensemble modelling. We analysed data for corrected case-notifications, expert opinions on the case-detection rate, prevalence surveys, and estimated cause-specific mortality using Bayesian meta-regression to generate consistent trends in all parameters. We analysed malaria mortality and incidence using an updated cause of death database, a systematic analysis of verbal autopsy validation studies for malaria, and recent studies (2010-13) of incidence, drug resistance, and coverage of insecticide-treated bednets. FINDINGS: Globally in 2013, there were 1·8 million new HIV infections (95% uncertainty interval 1·7 million to 2·1 million), 29·2 million prevalent HIV cases (28·1 to 31·7), and 1·3 million HIV deaths (1·3 to 1·5). 
At the peak of the epidemic in 2005, HIV caused 1·7 million deaths (1·6 million to 1·9 million). Concentrated epidemics in Latin America and eastern Europe are substantially smaller than previously estimated. Through interventions including PMTCT and ART, 19·1 million life-years (16·6 million to 21·5 million) have been saved, 70·3% (65·4 to 76·1) of them in developing countries. From 2000 to 2011, the ratio of development assistance for health for HIV to years of life saved through intervention was US$4498 in developing countries. Including HIV-positive individuals, all-form tuberculosis incidence was 7·5 million (7·4 million to 7·7 million), prevalence was 11·9 million (11·6 million to 12·2 million), and number of deaths was 1·4 million (1·3 million to 1·5 million) in 2013. In the same year, among HIV-negative individuals only, all-form tuberculosis incidence was 7·1 million (6·9 million to 7·3 million), prevalence was 11·2 million (10·8 million to 11·6 million), and number of deaths was 1·3 million (1·2 million to 1·4 million). Annualised rates of change (ARC) for incidence, prevalence, and death became negative after 2000. Tuberculosis in HIV-negative individuals disproportionately occurs in men and boys (versus women and girls): 64·0% of cases (63·6 to 64·3) and 64·7% of deaths (60·8 to 70·3). Globally, malaria cases and deaths grew rapidly from 1990, reaching a peak of 232 million cases (143 million to 387 million) in 2003 and 1·2 million deaths (1·1 million to 1·4 million) in 2004. Since 2004, child deaths from malaria in sub-Saharan Africa have decreased by 31·5% (15·7 to 44·1). Outside of Africa, malaria mortality has been steadily decreasing since 1990. INTERPRETATION: Our estimates of the number of people living with HIV are 18·7% smaller than UNAIDS's estimates in 2012. The number of people living with malaria is larger than estimated by WHO. The numbers of people living with HIV, tuberculosis, or malaria have all decreased since 2000. 
At the global level, upward trends for malaria and HIV deaths have been reversed and declines in tuberculosis deaths have accelerated. 101 countries (74 of which are developing) still have increasing HIV incidence. Substantial progress since the Millennium Declaration is an encouraging sign of the effect of global action. FUNDING: Bill & Melinda Gates Foundation.
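
Annualised rates of change (ARC) such as those mentioned above are commonly computed in log-linear form: ARC = ln(v_end / v_start) / (t_end - t_start). The sketch below assumes that definition, which the abstract does not state. The function name is hypothetical. Applied purely as arithmetic to the point estimates for HIV deaths (1·7 million in 2005, 1·3 million in 2013), it gives about -3·4% per year.

```python
import math

def annualised_rate_of_change(value_start, value_end, year_start, year_end):
    """Log-linear annualised rate of change between two years (assumed definition)."""
    if value_start <= 0 or value_end <= 0:
        raise ValueError("values must be positive")
    if year_end == year_start:
        raise ValueError("years must differ")
    return math.log(value_end / value_start) / (year_end - year_start)

# HIV deaths (millions), point estimates quoted in the abstract
arc = annualised_rate_of_change(1.7, 1.3, 2005, 2013)
print(f"{arc:.4f}")  # -0.0335
```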

Relevance: 30.00%

Abstract:

BACKGROUND: The fifth Millennium Development Goal (MDG 5) established the goal of a 75% reduction in the maternal mortality ratio (MMR; number of maternal deaths per 100,000 livebirths) between 1990 and 2015. We aimed to measure levels and track trends in maternal mortality, the key causes contributing to maternal death, and the timing of maternal death with respect to delivery. METHODS: We used robust statistical methods, including the Cause of Death Ensemble model (CODEm), to analyse a database of 7065 site-years and estimate the number of maternal deaths from all causes in 188 countries between 1990 and 2013. We estimated the number of pregnancy-related deaths caused by HIV on the basis of a systematic review of the relative risk of dying during pregnancy for HIV-positive women compared with HIV-negative women. We also estimated the fraction of these deaths aggravated by pregnancy on the basis of a systematic review. To estimate the numbers of maternal deaths due to nine different causes, we identified 61 sources from a systematic review and 943 site-years of vital registration data. We also conducted a systematic review of reports about the timing of maternal death, identifying 142 sources to use in our analysis. We developed estimates for each country for 1990-2013 using Bayesian meta-regression. We estimated 95% uncertainty intervals (UIs) for all values. FINDINGS: 292,982 (95% UI 261,017-327,792) maternal deaths occurred in 2013, compared with 376,034 (343,483-407,574) in 1990. The global annual rate of change in the MMR was -0·3% (-1·1 to 0·6) from 1990 to 2003, and -2·7% (-3·9 to -1·5) from 2003 to 2013, with evidence of continued acceleration. MMRs reduced consistently in south, east, and southeast Asia between 1990 and 2013, but maternal deaths increased in much of sub-Saharan Africa during the 1990s. 2070 (1290-2866) maternal deaths were related to HIV in 2013, 0·4% (0·2-0·6) of the global total. 
MMR was highest in the oldest age groups in both 1990 and 2013. In 2013, most deaths occurred intrapartum or postpartum. Causes varied by region and changed between 1990 and 2013. We recorded substantial variation in the MMR by country in 2013, from 956·8 (685·1-1262·8) in South Sudan to 2·4 (1·6-3·6) in Iceland. INTERPRETATION: Global rates of change suggest that only 16 countries will achieve the MDG 5 target by 2015. Accelerated reductions since the Millennium Declaration in 2000 coincide with increased development assistance for maternal, newborn, and child health. Setting targets and associated interventions for the post-2015 period will need careful consideration of regions that are making slow progress, such as west and central Africa. FUNDING: Bill & Melinda Gates Foundation.
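
The sketch below makes the MDG 5 arithmetic concrete. It computes an MMR (maternal deaths per 100,000 livebirths, as defined above). It also computes the log-linear annual rate of change needed for a 75% reduction over 1990-2015. The log-linear form and the function names are assumptions for illustration. Under them, the target requires about -5·5% per year, roughly double the -2·7% global rate reported for 2003-2013.

```python
import math

def mmr(maternal_deaths, live_births):
    """Maternal mortality ratio: maternal deaths per 100,000 livebirths."""
    return maternal_deaths * 100_000 / live_births

def required_annual_rate(reduction, years):
    """Log-linear annual rate of change needed to cut a ratio by `reduction`."""
    return math.log(1 - reduction) / years

print(mmr(50, 25_000))                           # 200.0 (hypothetical counts)
print(round(required_annual_rate(0.75, 25), 4))  # -0.0555 (MDG 5: 75% over 1990-2015)
```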

Relevance: 30.00%

Abstract:

Performance in triathlon depends on factors including somatotype, physiological capacity, technical proficiency, and race strategy. Triathlon is multidisciplinary, and its three race components interact. Identifying target split times that can inform the design of training plans and race pacing strategies is therefore a complex task. The present study uses machine learning techniques to analyse a large database of performances in Olympic-distance triathlons (2008-2012). The analysis reveals patterns of performance in the five components of triathlon (three race “legs” and two transitions). It also reveals the complex relationships between performance in each component and overall race performance. The results provide three perspectives on the relationship between performance in each component and final placing. These perspectives make it possible to identify the target split times required to achieve a given final place. They also support evidence-based decisions about race tactics to optimise performance.

Relevance: 30.00%

Abstract:

My research exploits side information in advanced Bayesian nonparametric models. We have developed several novel models for data clustering and medical data analysis, and have made our methods scalable to large-scale data. I have published this research in several journal and conference papers.

Relevance: 30.00%

Abstract:

Objectives: To synthesize the efficacy and safety outcomes from randomized controlled trials (RCTs) of new oral anticoagulants, protease-activated receptor-1 (PAR-1) antagonists, and warfarin as adjuncts to aspirin for patients after acute coronary syndrome (ACS), using pair-wise and network meta-analyses.
Methods: A comprehensive literature search was performed in Embase, Medline, Cochrane Library, Web of Knowledge, and Scopus. Pair-wise meta-analyses were undertaken separately for each agent/treatment category in RevMan 5.1. A Bayesian network meta-analysis was conducted in WinBUGS, using both fixed- and random-effects models. Its purpose was to estimate the relative efficacy of each agent/treatment category while preserving the randomized comparisons within each trial. Covariate analysis was performed to explore the effects of length of follow-up and subject age on the final results.
Results: In total, 23 RCTs were included in the meta-analysis. The pair-wise meta-analysis results are given as (OR, 95% CI). New oral anticoagulants (0.85, [0.78, 0.93] and 3.04, [2.21, 4.19]), PAR-1 antagonists (0.80, [0.52, 1.22] and 1.55, [1.25, 1.93]) and warfarin (0.87, [0.74, 1.02] and 1.77, [1.46, 2.14]) might reduce the incidence of major adverse events (MAE), but at a higher bleeding risk compared with aspirin alone. Based on the model fit assessment, the random-effects model was adopted. The network meta-analysis (treatment effect compared with aspirin alone) identified four agents with greater improvements in MAE incidence: ximelagatran (-0.3044, [-0.8601, 0.2502]), dabigatran (-0.2144, [-0.8666, 0.4525]), rivaroxaban (-0.2179, [-0.5986, 0.1628]) and vorapaxar (-0.2272, [-0.81, 0.1664]). Vorapaxar (0.3764, [-0.4444, 1.124]), warfarin (0.663, [0.3375, 1.037]), ximelagatran (0.7509, [-0.4164, 2.002]) and apixaban (0.8594, [-0.0049, 1.7]) produced fewer major bleeding events. The indirect comparisons among drug categories (difference in incidence compared with aspirin alone) showed new oral anticoagulants (-0.1974, [-0.284, -0.111]) and PAR-1 antagonists (-0.1239, [-0.215, -0.033]) to be superior to warfarin (-0.1004, [-0.166, -0.035]) in the occurrence of MAE. For major bleeding events, PAR-1 antagonists (0.4292, [0.2123, 0.6476]) afforded better outcomes than warfarin (0.5742, [0.3889, 0.7619]) and new oral anticoagulants (1.169, [0.8667, 1.485]).
Conclusion: Based on these results, we cannot recommend routine administration of new oral anticoagulants as add-on treatment for patients after ACS. However, for ACS patients with comorbid atrial fibrillation, new oral anticoagulants might be superior to warfarin in both efficacy and safety outcomes.
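
The pair-wise step can be illustrated with the textbook inverse-variance approach. First, recover each trial's log odds ratio and standard error from its reported 95% CI. Next, pool them with fixed-effect weights. Finally, widen the weights by a DerSimonian-Laird between-trial variance to get the random-effects estimate. This is a sketch of standard frequentist pooling only. It does not reproduce the authors' RevMan settings or their Bayesian WinBUGS network model. The function names and the trial values in the usage lines are hypothetical.

```python
import math

def log_or_and_se(or_value, lower, upper, z=1.96):
    """Recover a log odds ratio and its standard error from an OR and its 95% CI."""
    return math.log(or_value), (math.log(upper) - math.log(lower)) / (2 * z)

def pool(estimates, z=1.96):
    """Fixed-effect and DerSimonian-Laird random-effects pooling of (log_or, se) pairs."""
    y = [e for e, _ in estimates]
    w = [1 / se ** 2 for _, se in estimates]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and the method-of-moments between-trial variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (1 / wi + tau2) for wi in w]
    sw_re = sum(w_re)
    random_effects = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re

    def on_or_scale(mu, total_weight):
        se = math.sqrt(1 / total_weight)
        return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se)

    return {"fixed": on_or_scale(fixed, sw),
            "random": on_or_scale(random_effects, sw_re),
            "tau2": tau2}

# Hypothetical trials (OR, lower, upper), not data from the review
trials = [log_or_and_se(0.82, 0.70, 0.96),
          log_or_and_se(0.90, 0.78, 1.04),
          log_or_and_se(0.86, 0.72, 1.03)]
result = pool(trials)
print(result["fixed"], result["random"], result["tau2"])
```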

Relevance: 30.00%

Abstract:

In data science, anomaly detection is the process of identifying items, events or observations that do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision and security management communities, discovering suspicious events is the key issue for abnormality detection in video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. However, the number of coherent segments in the surveillance stream and the number of traffic patterns are both unknown and hard to specify. This is the crucial challenge for both steps. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametrics (BNP) and develop a novel use of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system that allows users to inspect and browse suspicious events.
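
The Infinite Hidden Markov Model lets the number of hidden states, and hence of segments, be inferred rather than fixed. A widely used finite stand-in is the weak-limit approximation of the HDP-HMM. It draws global weights beta ~ Dir(gamma/L, ..., gamma/L) for a truncation level L, then gives each state a transition row pi_k ~ Dir(alpha * beta). The sketch below only samples a state sequence from that prior. It is an illustration under these assumptions, not the authors' inference code. `sample_weak_limit_hdp_hmm` and its default parameters are hypothetical.

```python
import numpy as np

def sample_weak_limit_hdp_hmm(T, L=20, gamma=3.0, alpha=5.0, seed=0):
    """Sample T hidden states from a weak-limit (truncated) HDP-HMM prior."""
    rng = np.random.default_rng(seed)
    # Global state weights: finite Dirichlet approximation of beta ~ GEM(gamma)
    beta = rng.dirichlet(np.full(L, gamma / L))
    beta = np.maximum(beta, 1e-12)  # numerical guard: keep every Dirichlet parameter > 0
    beta /= beta.sum()
    # Each state's transition row shares the global weights through alpha * beta
    pi = np.vstack([rng.dirichlet(alpha * beta) for _ in range(L)])
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(L, p=beta)
    for t in range(1, T):
        states[t] = rng.choice(L, p=pi[states[t - 1]])
    return states

states = sample_weak_limit_hdp_hmm(200)
print("distinct states used:", len(set(states.tolist())))
```

Typically only a handful of the L available states are visited. This mirrors how the nonparametric prior lets the data decide the number of segments.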

Relevance: 30.00%

Abstract:

Regression is a cornerstone of statistical analysis. Multilevel regression, by contrast, has received little research attention, even though it is prevalent in economics, biostatistics, and healthcare, among other fields. We present a Bayesian nonparametric framework for multilevel regression in which individuals, with their observations and outcomes, are organized into groups. Our approach also exploits additional group-specific context observations. We use a Dirichlet process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution. This accommodates the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model.
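
Clustering groups with a Dirichlet process induces a prior over partitions known as the Chinese restaurant process (CRP). Under it, group i joins an existing cluster k with probability proportional to its size n_k, or opens a new cluster with probability proportional to the concentration alpha. The sketch below samples only that prior partition. It is not the authors' nested DP model or their collapsed Gibbs sampler, and `sample_crp` is a hypothetical helper.

```python
import numpy as np

def sample_crp(n, alpha, seed=0):
    """Sample a CRP partition of n items; returns (labels, cluster sizes)."""
    rng = np.random.default_rng(seed)
    labels, counts = [0], [1]  # the first item always starts cluster 0
    for i in range(1, n):
        # i items are already seated, so the weights sum to i + alpha
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, counts

labels, counts = sample_crp(50, alpha=1.0)
print(len(counts), "clusters with sizes", counts)
```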

Relevance: 30.00%

Abstract:

Users often have additional knowledge when employing Bayesian nonparametric (BNP) models. For example, in clustering there may be prior knowledge that some data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint). Similarly, in topic modeling some words should be grouped together or kept apart because of an underlying semantic relationship. This can be achieved by imposing appropriate sampling probabilities based on such constraints. However, traditional Gibbs sampling inference for BNP models is time-consuming and does not scale to large data. Variational approximations are faster but often yield poor solutions. To address this, we present a small-variance asymptotic analysis of the MAP estimates of BNP models with constraints. We derive the objective function for the Dirichlet process mixture model with constraints and devise a simple and efficient K-means-type algorithm. We further extend the small-variance analysis to hierarchical BNP models with constraints and derive a similarly simple objective function. Experiments on synthetic and real datasets demonstrate the efficiency and effectiveness of our algorithms.
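
The small-variance limit of a Dirichlet process mixture is known to yield the DP-means objective: the sum of squared distances to cluster centres plus a penalty lambda per cluster. The sketch below shows one way constraints can be attached to a DP-means loop. Must-linked points are merged into chunklets that move together. A chunklet may not join a cluster that already holds a chunklet it is cannot-linked to. This is a greedy heuristic written for illustration, not the objective or algorithm derived in the paper. `constrained_dp_means` and its parameters are hypothetical.

```python
import numpy as np

def _chunklets(n, must_link):
    """Union-find: merge points joined by must-link constraints into chunklets."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def constrained_dp_means(x, lam, must_link=(), cannot_link=(), max_iter=100):
    """DP-means with must-link/cannot-link constraints; returns (labels, centres)."""
    x = np.asarray(x, dtype=float)
    chunks = _chunklets(len(x), must_link)
    chunk_of = {i: c for c, idx in enumerate(chunks) for i in idx}
    forbid = {c: set() for c in range(len(chunks))}
    for a, b in cannot_link:
        ca, cb = chunk_of[a], chunk_of[b]
        if ca == cb:
            raise ValueError("must-link and cannot-link constraints conflict")
        forbid[ca].add(cb)
        forbid[cb].add(ca)
    sizes = np.array([len(idx) for idx in chunks])
    means = np.array([x[idx].mean(axis=0) for idx in chunks])
    labels = np.full(len(chunks), -1)
    centers = [x.mean(axis=0)]
    for _ in range(max_iter):
        previous = labels.copy()
        for c in range(len(chunks)):
            banned = {int(labels[o]) for o in forbid[c]}
            best_k, best_cost = None, lam  # joining must beat the new-cluster penalty
            for k, mu in enumerate(centers):
                # extra cost of moving the whole chunklet to centre k
                cost = sizes[c] * float(np.sum((means[c] - mu) ** 2))
                if k not in banned and cost < best_cost:
                    best_k, best_cost = k, cost
            if best_k is None:  # a new cluster is cheaper, or every cluster is banned
                centers.append(means[c].copy())
                best_k = len(centers) - 1
            labels[c] = best_k
        # drop empty clusters, relabel 0..K-1, recompute size-weighted centres
        used = sorted(set(labels.tolist()))
        remap = {k: i for i, k in enumerate(used)}
        labels = np.array([remap[k] for k in labels.tolist()])
        centers = [np.average(means[labels == i], axis=0, weights=sizes[labels == i])
                   for i in range(len(used))]
        if np.array_equal(labels, previous):
            break
    point_labels = np.empty(len(x), dtype=int)
    for c, idx in enumerate(chunks):
        point_labels[idx] = labels[c]
    return point_labels, np.array(centers)

pts = [[0, 0], [0.1, 0], [5, 5], [5.1, 5]]
print(constrained_dp_means(pts, lam=1.0)[0])
print(constrained_dp_means(pts, lam=1.0, cannot_link=[(0, 1)])[0])
```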

Relevance: 30.00%

Abstract:

The Electronic Medical Record (EMR) has established itself as a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes), and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. Such topic modeling helps to explain the composition of patients' diseases and offers a tool for better treatment planning. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word-distance-dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the ICD-10 tree structure, which encodes semantic relationships between codes. We use a decay function to incorporate distances between words at the bottom level of the wddCRF. We derive efficient MCMC inference for the wddCRF. Furthermore, procedure codes are often correlated with diagnosis codes. We therefore develop the correspondence wddCRF (Corr-wddCRF) to explore the conditional relationships of procedure codes for a given disease pattern, and derive an efficient collapsed Gibbs sampler for it. We evaluate the proposed models on two real-world medical datasets: polyvascular disease and acute myocardial infarction. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. 
In addition, procedure-code prediction based on the Corr-wddCRF achieves considerable accuracy.
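
A common way to build a distance-dependent Chinese restaurant process works as follows. Each customer (here, a word occurrence) links to another customer j with probability proportional to a decay function f(d_ij). It links to itself with probability proportional to alpha, and self-links start new tables. The sketch below assumes an exponential decay f(d) = exp(-d / a) and a precomputed distance matrix, for example ICD-10 tree distances. The abstract does not state which decay function the wddCRF uses, and `ddcrp_link_probs` is a hypothetical name.

```python
import numpy as np

def ddcrp_link_probs(dist, alpha=1.0, a=1.0):
    """Customer-link probabilities of a distance-dependent CRP prior (row i = customer i)."""
    d = np.asarray(dist, dtype=float)
    weights = np.exp(-d / a)          # assumed decay f(d) = exp(-d / a) for j != i
    np.fill_diagonal(weights, alpha)  # self-link weight alpha (starts a new table)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical tree distances between three diagnosis codes
dist = [[0, 2, 5], [2, 0, 4], [5, 4, 0]]
print(np.round(ddcrp_link_probs(dist), 3))
```

Closer codes receive higher link probability, which is how tree structure nudges semantically related codes into the same topic.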

Relevance: 30.00%

Abstract:

The aim of this paper was to examine whether all-cause and cause-specific mortality rates vary between Asian ethnic subgroups. It also examined whether mortality rate ratios for overseas-born Asian subgroups varied by nativity and duration of residence. We used hierarchical Bayesian methods to allow for sparse data in the analysis of linked census-mortality data for New Zealanders aged 25-75 years. We found that directly standardised posterior all-cause and cardiovascular mortality rates were highest for the Indian ethnic group, significantly so when compared with those of Chinese ethnicity. In contrast, cancer mortality rates were lowest for ethnic Indians. Overseas-born Asian subgroups had about 70% of the mortality rate of their New Zealand-born Asian counterparts, a result that showed little variation by Asian subgroup or cause of death. This comparison was then made within the overseas-born population, regardless of ethnicity. All-cause mortality rates for migrants who had lived in New Zealand for 0-9 years were about 60% of those for migrants who had lived there for more than 25 years. The corresponding figure for cardiovascular mortality rates was 50%. However, while Chinese cancer mortality rates increased with duration of residence, Indian and Other Asian cancer mortality rates did not. Future research is needed on the mechanisms by which health worsens with increasing time spent in the host country. Such research would improve understanding of this process and assist policy-makers and health planners.
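
Direct standardisation, used above to compare ethnic groups, weights each group's age-specific rates by one common standard population. This stops differences in age structure from driving the comparison. The sketch assumes age-banded rates and standard-population counts are already available. It shows only the standardisation arithmetic, not the hierarchical Bayesian model the authors used to stabilise sparse age-specific rates. The function names and all numbers in the usage lines are hypothetical.

```python
def direct_standardised_rate(age_rates, standard_pop):
    """Age-standardised rate: age-specific rates weighted by a standard population."""
    if len(age_rates) != len(standard_pop):
        raise ValueError("need one standard-population count per age band")
    total = sum(standard_pop)
    return sum(rate * pop for rate, pop in zip(age_rates, standard_pop)) / total

def rate_ratio(rate_group, rate_reference):
    """Ratio of two standardised rates, e.g. overseas-born vs New Zealand-born."""
    return rate_group / rate_reference

# Hypothetical rates per 100,000 in three age bands and a hypothetical standard population
standard = [40_000, 35_000, 25_000]
overseas = direct_standardised_rate([80.0, 300.0, 900.0], standard)
nz_born = direct_standardised_rate([110.0, 430.0, 1300.0], standard)
print(round(rate_ratio(overseas, nz_born), 2))  # 0.7
```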

Relevance: 30.00%

Abstract:

Our research aims to contribute to multilevel modeling in data analytics. We address the tasks of multilevel clustering, multilevel regression, and classification, and provide state-of-the-art solutions to these critical problems.

Relevance: 30.00%

Abstract:

Background: The Global Burden of Disease, Injuries, and Risk Factor study 2013 (GBD 2013) is the first of a series of annual updates of the GBD. Risk factor quantification, particularly of modifiable risk factors, can help to identify emerging threats to population health and opportunities for prevention. The GBD 2013 provides a timely opportunity to update the comparative risk assessment with new data for exposure, relative risks, and evidence on the appropriate counterfactual risk distribution. Methods: Attributable deaths, years of life lost, years lived with disability, and disability-adjusted life-years (DALYs) have been estimated for 79 risks or clusters of risks using the GBD 2010 methods. Risk-outcome pairs meeting explicit evidence criteria were assessed for 188 countries for the period 1990-2013 by age and sex using three inputs: risk exposure, relative risks, and the theoretical minimum risk exposure level (TMREL). Risks are organised into a hierarchy with blocks of behavioural, environmental and occupational, and metabolic risks at the first level of the hierarchy. The next level in the hierarchy includes nine clusters of related risks and two individual risks, with more detail provided at levels 3 and 4 of the hierarchy. Compared with GBD 2010, six new risk factors have been added: handwashing practices, occupational exposure to trichloroethylene, childhood wasting, childhood stunting, unsafe sex, and low glomerular filtration rate. For most risks, data for exposure were synthesised with a Bayesian metaregression method, DisMod-MR 2.0, or spatial-temporal Gaussian process regression. Relative risks were based on meta-regressions of published cohort and intervention studies. Attributable burden for clusters of risks and all risks combined took into account evidence on the mediation of some risks such as high body-mass index (BMI) through other risks such as high systolic blood pressure and high cholesterol. 
Findings: All risks combined account for 57·2% (95% uncertainty interval [UI] 55·8-58·5) of deaths and 41·6% (40·1-43·0) of DALYs. The quantified risks account for 87·9% (86·5-89·3) of cardiovascular disease DALYs, and for as little as 0% of DALYs from neonatal disorders and from neglected tropical diseases and malaria. In terms of global DALYs in 2013, six risks or clusters of risks each caused more than 5% of DALYs: dietary risks accounting for 11·3 million deaths and 241·4 million DALYs, high systolic blood pressure for 10·4 million deaths and 208·1 million DALYs, child and maternal malnutrition for 1·7 million deaths and 176·9 million DALYs, tobacco smoke for 6·1 million deaths and 143·5 million DALYs, air pollution for 5·5 million deaths and 141·5 million DALYs, and high BMI for 4·4 million deaths and 134·0 million DALYs. Risk factor patterns vary across regions and countries and over time. In sub-Saharan Africa, the leading risk factors are child and maternal malnutrition, unsafe sex, and unsafe water, sanitation, and handwashing. In women, high BMI is the leading risk factor in nearly all countries in the Americas, north Africa, and the Middle East, and in many other high-income countries. High systolic blood pressure is the leading risk for women in most of Central and Eastern Europe and south and east Asia. For men, high systolic blood pressure or tobacco use is the leading risk in nearly all high-income countries, in north Africa and the Middle East, Europe, and Asia. For men and women, unsafe sex is the leading risk in a corridor from Kenya to South Africa. Interpretation: Behavioural, environmental and occupational, and metabolic risks can explain half of global mortality and more than one-third of global DALYs, providing many opportunities for prevention. Of the larger risks, the attributable burden of high BMI has increased in the past 23 years. 
In view of the prominence of behavioural risk factors, behavioural and social science research on interventions for these risks should be strengthened. Many prevention and primary care policy options are available now to act on key risks.
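
Attributable burden in a comparative risk assessment rests on the population attributable fraction (PAF). The PAF compares the mean relative risk under the observed exposure distribution with the risk at the theoretical minimum risk exposure level (TMREL). The sketch below uses the standard categorical-exposure form, PAF = (sum_i p_i RR_i - RR_TMREL) / sum_i p_i RR_i. It assumes a single categorical exposure. It ignores the continuous exposures and the mediation adjustments (for example, high BMI acting through blood pressure) that GBD 2013 handles. The function names and numbers are hypothetical.

```python
def categorical_paf(prevalence, relative_risk, tmrel_rr=1.0):
    """Population attributable fraction for a categorical exposure.

    prevalence: share of the population in each exposure category (sums to 1).
    relative_risk: relative risk of each category versus the reference.
    tmrel_rr: relative risk at the theoretical minimum risk exposure level.
    """
    if len(prevalence) != len(relative_risk):
        raise ValueError("need one relative risk per exposure category")
    if abs(sum(prevalence) - 1.0) > 1e-9:
        raise ValueError("prevalences must sum to 1")
    mean_rr = sum(p * rr for p, rr in zip(prevalence, relative_risk))
    return (mean_rr - tmrel_rr) / mean_rr

def attributable_burden(paf, total_dalys):
    """DALYs attributable to the risk: PAF times total DALYs for the outcome."""
    return paf * total_dalys

paf = categorical_paf([0.5, 0.5], [1.0, 2.0])
print(round(paf, 4))                              # 0.3333
print(round(attributable_burden(paf, 300.0), 1))  # 100.0
```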

Relevance: 30.00%

Abstract:

The spectrum nature of, and heterogeneity within, autism spectrum disorders (ASD) pose a challenge for treatment. Personalising the syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of the syllabus. We investigate a data-driven approach to disentangling this heterogeneity for syllabus personalisation. With the help of technology and a structured syllabus, it becomes possible to collect data while a child with ASD masters skills. The performance data collected, however, keep growing and contain missing elements. These depend on the pace and course each child takes through the syllabus. Bayesian nonparametric methods can automatically discover the number of latent components and their parameters, even when the model is complex. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built on the linear Poisson gamma model (LPGM) with an Indian buffet process prior, extended to handle data with missing elements. In this paper we present, for the first time, learning patterns deduced automatically by data mining and machine learning methods. These patterns come from intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means. Being parametric, these methods not only require the number of learning patterns to be specified in advance but also lack a principled approach to missing data. The F1 scores observed over varying degrees of similarity (Jaccard index) suggest that LPGM yields the best outcome. By examining these patterns together with additional knowledge of the syllabus, it may be possible to monitor progress and dynamically modify the syllabus for improved learning.
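
The Indian buffet process (IBP) prior lets the number of learning patterns grow with the data instead of being fixed in advance. This is the contrast the abstract draws with NMF and K-means. The sketch samples a binary child-by-pattern matrix from the standard IBP generative scheme. Customer i takes an existing dish k with probability m_k / i, then Poisson(alpha / i) new dishes. It covers only this prior. The Poisson-gamma likelihood and missing-data handling of the authors' LPGM are not shown, and `sample_ibp` is a hypothetical name.

```python
import numpy as np

def sample_ibp(n, alpha, seed=0):
    """Sample an n-row binary feature matrix from the Indian buffet process."""
    rng = np.random.default_rng(seed)
    dish_counts = []  # m_k: how many earlier customers took dish k
    rows = []
    for i in range(1, n + 1):
        # existing dishes: taken with probability m_k / i
        row = [bool(rng.random() < m / i) for m in dish_counts]
        for k, taken in enumerate(row):
            if taken:
                dish_counts[k] += 1
        # new dishes: Poisson(alpha / i) of them, all taken by customer i
        new = int(rng.poisson(alpha / i))
        row.extend([True] * new)
        dish_counts.extend([1] * new)
        rows.append(row)
    z = np.zeros((n, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        z[i, :len(row)] = row
    return z

z = sample_ibp(10, alpha=2.0)
print(z.shape, "latent patterns:", z.shape[1])
```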

Relevance: 30.00%

Abstract:

A novel in-cylinder pressure method for determining ignition delay has been proposed and demonstrated. The method uses a new Bayesian statistical model to resolve the start of combustion. The start of combustion is defined as the point at which the band-pass filtered in-cylinder pressure deviates from background noise and combustion resonance begins. It is further demonstrated that the method remains accurate in the presence of noise. The start of combustion can be resolved for each cycle without ad hoc methods such as cycle averaging. The method therefore allows analysis of consecutive cycles and inter-cycle variability studies. Ignition delays obtained by this method and from the net rate of heat release show good agreement. However, using combustion resonance to determine the start of combustion is preferable to the net rate of heat release method. It does not rely on knowledge of heat losses and still functions accurately in the presence of noise. Results are presented for a six-cylinder turbocharged common-rail diesel engine run on neat diesel fuel at full, three-quarter, and half load. Under these conditions, ignition delay increased as load decreased, with a significant increase at half load compared with three-quarter and full load.
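
The abstract does not give the form of its Bayesian model. A generic Bayesian way to locate where a band-pass signal switches from background noise to a higher-variance resonance is a single change-point model in the variance. The sketch assumes zero-mean Gaussian segments with each segment's variance integrated out under the Jeffreys prior p(sigma^2) proportional to 1/sigma^2. It also assumes a uniform prior on the change location and uses a synthetic signal. It illustrates the general idea only. `log_marginal` and `change_point_posterior` are hypothetical names, not the authors' model.

```python
import math
import numpy as np

def log_marginal(segment):
    """Log marginal likelihood of a zero-mean Gaussian segment.

    The variance is integrated out under p(sigma^2) proportional to 1 / sigma^2,
    giving lgamma(n / 2) - (n / 2) * log(pi * S) with S the sum of squares.
    The improper prior's constant is the same for every change location.
    """
    n = len(segment)
    s = max(float(np.sum(segment ** 2)), 1e-300)  # guard against log(0)
    return math.lgamma(n / 2) - (n / 2) * math.log(math.pi * s)

def change_point_posterior(x, min_seg=10):
    """Posterior over a single change in variance, uniform prior on its location."""
    x = np.asarray(x, dtype=float)
    taus = np.arange(min_seg, len(x) - min_seg + 1)
    logp = np.array([log_marginal(x[:t]) + log_marginal(x[t:]) for t in taus])
    post = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return taus, post / post.sum()

# Synthetic band-pass signal: low-level noise, then higher-amplitude "resonance"
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 0.1, 200), rng.normal(0, 1.0, 200)])
taus, post = change_point_posterior(signal)
print("most probable start index:", taus[np.argmax(post)])
```

Because the full posterior is returned for each cycle, its spread can serve as a per-cycle uncertainty on the start of combustion, with no cycle averaging needed.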