62 resultados para Estimateur de Bayes


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural networks, and instance-based, ensemble-based and kernel-based linear classifiers. An extensive pre-processing is carried out to ensure the quality of data and, in hence, the quality classification outcome. The records with a null entry in injury description are removed. The misspelling correction process is carried out by finding and replacing the misspelt word with a soundlike word. Meaningful phrases have been identified and kept, instead of removing the part of phrase as a stop word. The abbreviations appearing in many forms of entry are manually identified and only one form of abbreviations is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically from about 28,000 to 5000. The medical narrative text injury dataset, under consideration, is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, Matrix factorization techniques such as Singular Value Decomposition (SVD) and Non Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers with these reduced feature space have been built. In experiments, a set of tests are conducted to reflect which classification method is best for the medical text classification. The Non Negative Matrix Factorization with Support Vector Machine method can achieve 93% precision which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting which works well for long text classification is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as it affects the classification performance.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This research aims to explore and identify political risks on a large infrastructure project in an exaggerated environment to ascertain whether sufficient objective information can be gathered by project managers to utilise risk modelling techniques. During the study, the author proposes a new definition of political risk; performs a detailed project study of the Neelum Jhelum Hydroelectric Project in Pakistan; implements a probabilistic model using the principle of decomposition and Bayes probabilistic theorem and answers the question: was it possible for project managers to obtain all the relevant objective data to implement a probabilistic model?

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background Although the detrimental impact of major depressive disorder (MDD) at the individual level has been described, its global epidemiology remains unclear given limitations in the data. Here we present the modelled epidemiological profile of MDD dealing with heterogeneity in the data, enforcing internal consistency between epidemiological parameters and making estimates for world regions with no empirical data. These estimates were used to quantify the burden of MDD for the Global Burden of Disease Study 2010 (GBD 2010). Method Analyses drew on data from our existing literature review of the epidemiology of MDD. DisMod-MR, the latest version of the generic disease modelling system redesigned as a Bayesian meta-regression tool, derived prevalence by age, year and sex for 21 regions. Prior epidemiological knowledge, study- and country-level covariates adjusted sub-optimal raw data. Results There were over 298 million cases of MDD globally at any point in time in 2010, with the highest proportion of cases occurring between 25 and 34 years. Global point prevalence was very similar across time (4.4% (95% uncertainty: 4.2–4.7%) in 1990, 4.4% (4.1–4.7%) in 2005 and 2010), but higher in females (5.5% (5.0–6.0%) compared to males (3.2% (3.0–3.6%) in 2010. Regions in conflict had higher prevalence than those with no conflict. The annual incidence of an episode of MDD followed a similar age and regional pattern to prevalence but was about one and a half times higher, consistent with an average duration of 37.7 weeks. Conclusion We were able to integrate available data, including those from high quality surveys and sub-optimal studies, into a model adjusting for known methodological sources of heterogeneity. We were also able to estimate the epidemiology of MDD in regions with no available data. This informed GBD 2010 and the public health field, with a clearer understanding of the global distribution of MDD.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background Depressive disorders were a leading cause of burden in the Global Burden of Disease (GBD) 1990 and 2000 studies. Here, we analyze the burden of depressive disorders in GBD 2010 and present severity proportions, burden by country, region, age, sex, and year, as well as burden of depressive disorders as a risk factor for suicide and ischemic heart disease. Methods and Findings Burden was calculated for major depressive disorder (MDD) and dysthymia. A systematic review of epidemiological data was conducted. The data were pooled using a Bayesian meta-regression. Disability weights from population survey data quantified the severity of health loss from depressive disorders. These weights were used to calculate years lived with disability (YLDs) and disability adjusted life years (DALYs). Separate DALYs were estimated for suicide and ischemic heart disease attributable to depressive disorders.Depressive disorders were the second leading cause of YLDs in 2010. MDD accounted for 8.2% (5.9%-10.8%) of global YLDs and dysthymia for 1.4% (0.9%-2.0%). Depressive disorders were a leading cause of DALYs even though no mortality was attributed to them as the underlying cause. MDD accounted for 2.5% (1.9%-3.2%) of global DALYs and dysthymia for 0.5% (0.3%-0.6%). There was more regional variation in burden for MDD than for dysthymia; with higher estimates in females, and adults of working age. Whilst burden increased by 37.5% between 1990 and 2010, this was due to population growth and ageing. MDD explained 16 million suicide DALYs and almost 4 million ischemic heart disease DALYs. This attributable burden would increase the overall burden of depressive disorders from 3.0% (2.2%-3.8%) to 3.8% (3.0%-4.7%) of global DALYs. Conclusions GBD 2010 identified depressive disorders as a leading cause of burden. MDD was also a contributor of burden allocated to suicide and ischemic heart disease. These findings emphasize the importance of including depressive disorders as a public-health priority and implementing cost-effective interventions to reduce its burden.Please see later in the article for the Editors' Summary.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy. We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Narrative text is a useful way of identifying injury circumstances from the routine emergency department data collections. Automatically classifying narratives based on machine learning techniques is a promising technique, which can consequently reduce the tedious manual classification process. Existing works focus on using Naive Bayes which does not always offer the best performance. This paper proposes the Matrix Factorization approaches along with a learning enhancement process for this task. The results are compared with the performance of various other classification approaches. The impact on the classification results from the parameters setting during the classification of a medical text dataset is discussed. With the selection of right dimension k, Non Negative Matrix Factorization-model method achieves 10 CV accuracy of 0.93.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This study aimed to investigate the spatial clustering and dynamic dispersion of dengue incidence in Queensland, Australia. We used Moran's I statistic to assess the spatial autocorrelation of reported dengue cases. Spatial empirical Bayes smoothing estimates were used to display the spatial distribution of dengue in postal areas throughout Queensland. Local indicators of spatial association (LISA) maps and logistic regression models were used to identify spatial clusters and examine the spatio-temporal patterns of the spread of dengue. The results indicate that the spatial distribution of dengue was clustered during each of the three periods of 1993–1996, 1997–2000 and 2001–2004. The high-incidence clusters of dengue were primarily concentrated in the north of Queensland and low-incidence clusters occurred in the south-east of Queensland. The study concludes that the geographical range of notified dengue cases has significantly expanded in Queensland over recent years.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Diabetic macular edema (DME) is one of the most common causes of visual loss among diabetes mellitus patients. Early detection and successive treatment may improve the visual acuity. DME is mainly graded into non-clinically significant macular edema (NCSME) and clinically significant macular edema according to the location of hard exudates in the macula region. DME can be identified by manual examination of fundus images. It is laborious and resource intensive. Hence, in this work, automated grading of DME is proposed using higher-order spectra (HOS) of Radon transform projections of the fundus images. We have used third-order cumulants and bispectrum magnitude, in this work, as features, and compared their performance. They can capture subtle changes in the fundus image. Spectral regression discriminant analysis (SRDA) reduces feature dimension, and minimum redundancy maximum relevance method is used to rank the significant SRDA components. Ranked features are fed to various supervised classifiers, viz. Naive Bayes, AdaBoost and support vector machine, to discriminate No DME, NCSME and clinically significant macular edema classes. The performance of our system is evaluated using the publicly available MESSIDOR dataset (300 images) and also verified with a local dataset (300 images). Our results show that HOS cumulants and bispectrum magnitude obtained an average accuracy of 95.56 and 94.39 % for MESSIDOR dataset and 95.93 and 93.33 % for local dataset, respectively.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. We defined, using Bayes theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies. © 2012 Nature America, Inc. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Statistical comparison of oil samples is an integral part of oil spill identification, which deals with the process of linking an oil spill with its source of origin. In current practice, a frequentist hypothesis test is often used to evaluate evidence in support of a match between a spill and a source sample. As frequentist tests are only able to evaluate evidence against a hypothesis but not in support of it, we argue that this leads to unsound statistical reasoning. Moreover, currently only verbal conclusions on a very coarse scale can be made about the match between two samples, whereas a finer quantitative assessment would often be preferred. To address these issues, we propose a Bayesian predictive approach for evaluating the similarity between the chemical compositions of two oil samples. We derive the underlying statistical model from some basic assumptions on modeling assays in analytical chemistry, and to further facilitate and improve numerical evaluations, we develop analytical expressions for the key elements of Bayesian inference for this model. The approach is illustrated with both simulated and real data and is shown to have appealing properties in comparison with both standard frequentist and Bayesian approaches

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper the issue of finding uncertainty intervals for queries in a Bayesian Network is reconsidered. The investigation focuses on Bayesian Nets with discrete nodes and finite populations. An earlier asymptotic approach is compared with a simulation-based approach, together with further alternatives, one based on a single sample of the Bayesian Net of a particular finite population size, and another which uses expected population sizes together with exact probabilities. We conclude that a query of a Bayesian Net should be expressed as a probability embedded in an uncertainty interval. Based on an investigation of two Bayesian Net structures, the preferred method is the simulation method. However, both the single sample method and the expected sample size methods may be useful and are simpler to compute. Any method at all is more useful than none, when assessing a Bayesian Net under development, or when drawing conclusions from an ‘expert’ system.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Being able to accurately predict the risk of falling is crucial in patients with Parkinson’s dis- ease (PD). This is due to the unfavorable effect of falls, which can lower the quality of life as well as directly impact on survival. Three methods considered for predicting falls are decision trees (DT), Bayesian networks (BN), and support vector machines (SVM). Data on a 1-year prospective study conducted at IHBI, Australia, for 51 people with PD are used. Data processing are conducted using rpart and e1071 packages in R for DT and SVM, con- secutively; and Bayes Server 5.5 for the BN. The results show that BN and SVM produce consistently higher accuracy over the 12 months evaluation time points (average sensitivity and specificity > 92%) than DT (average sensitivity 88%, average specificity 72%). DT is prone to imbalanced data so needs to adjust for the misclassification cost. However, DT provides a straightforward, interpretable result and thus is appealing for helping to identify important items related to falls and to generate fallers’ profiles.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 +/- 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 +/- 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 +/- 0.06 s.e.), and ADHD and major depressive disorder (0.32 +/- 0.07 s.e.), low between schizophrenia and ASD (0.16 +/- 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We carried out a discriminant analysis with identity by descent (IBD) at each marker as inputs, and the sib pair type (affected-affected versus affected-unaffected) as the output. Using simple logistic regression for this discriminant analysis, we illustrate the importance of comparing models with different number of parameters. Such model comparisons are best carried out using either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). When AIC (or BIC) stepwise variable selection was applied to the German Asthma data set, a group of markers were selected which provide the best fit to the data (assuming an additive effect). Interestingly, these 25-26 markers were not identical to those with the highest (in magnitude) single-locus lod scores.