955 results for bayes theorem
Abstract:
The number of elderly patients requiring hospitalisation in Europe is rising. With a greater proportion of elderly people in the population comes a greater demand for health services and, in particular, hospital care. Thus, with a growing number of elderly patients requiring hospitalisation competing with non-elderly patients for a fixed (and in some cases, decreasing) number of hospital beds, this results in much longer waiting times for patients, often with a less satisfactory hospital experience. However, if a better understanding of the recurring nature of elderly patient movements between the community and hospital can be developed, then it may be possible for alternative provisions of care in the community to be put in place and thus prevent readmission to hospital. The research in this paper aims to model the multiple patient transitions between hospital and community by utilising a mixture of conditional Coxian phase-type distributions that incorporates Bayes' theorem. For the purpose of demonstration, the results of a simulation study are presented and the model is applied to hospital readmission data from the Lombardy region of Italy.
Abstract:
Machine and statistical learning techniques are used in almost all online advertisement systems. The problem of discovering which content is more in demand (e.g. receives more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (i.e., bandits with covariates, side information or associative reinforcement learning) associate with each specific content several features that define the “context” in which it appears (e.g. user, web page, time, region). This problem can be studied in the stochastic/statistical setting by means of the conditional probability paradigm using Bayes’ theorem. However, for very large contextual information and/or real-time constraints, the exact calculation of Bayes’ rule is computationally infeasible. In this article, we present a method that is able to handle large contextual information for learning in contextual-bandit problems. This method was tested in the Challenge on Yahoo! dataset at ICML2012’s Workshop “new Challenges for Exploration & Exploitation 3”, obtaining second place. Its basic exploration policy is deterministic in the sense that for the same input data (as a time series) the same results are obtained. We address the deterministic exploration vs. exploitation issue, explaining the way in which the proposed method deterministically finds an effective dynamic trade-off based solely on the input data, in contrast to other methods that use a random number generator.
Abstract:
Tobacco smoking, alcohol drinking, and occupational exposures to polycyclic aromatic hydrocarbons are the major proven risk factors for human head and neck squamous-cell cancer (HNSCC). Major research focus on gene-environment interactions concerning HNSCC has been on genes encoding enzymes of metabolism for tobacco smoke constituents and repair enzymes. To investigate the role of genetically determined individual predispositions in enzymes of xenobiotic metabolism and in repair enzymes under the exogenous risk factor tobacco smoke in the carcinogenesis of HNSCC, we conducted a case-control study on 312 cases and 300 noncancer controls. We focused on the impact of 22 sequence variations in CYP1A1, CYP1B1, CYP2E1, ERCC2/XPD, GSTM1, GSTP1, GSTT1, NAT2, NQO1, and XRCC1. To assess relevant main and interactive effects of polymorphic genes on the susceptibility to HNSCC we used statistical models such as logic regression and a Bayesian version of logic regression. In subgroup analysis of nonsmokers, main effects in ERCC2 (Lys751Gln) C/C genotype and combined ERCC2 (Arg156Arg) C/A and A/A genotypes were predominant. When stratifying for smokers, the data revealed main effects on combined CYP1B1 (Leu432Val) C/G and G/G genotypes, followed by CYP1B1 (Leu432Val) G/G genotype and CYP2E1 (-70G>T) G/T genotype. When fitting logistic regression models including relevant main effects and interactions in smokers, we found relevant associations of CYP1B1 (Leu432Val) C/G genotype and CYP2E1 (-70G>T) G/T genotype (OR, 10.84; 95% CI, 1.64-71.53) as well as CYP1B1 (Leu432Val) G/G genotype and GSTM1 null/null genotype (OR, 11.79; 95% CI, 2.18-63.77) with HNSCC. The findings underline the relevance of genotypes of polymorphic CYP1B1 combined with exposures to tobacco smoke.
Abstract:
Background: Although the detrimental impact of major depressive disorder (MDD) at the individual level has been described, its global epidemiology remains unclear given limitations in the data. Here we present the modelled epidemiological profile of MDD dealing with heterogeneity in the data, enforcing internal consistency between epidemiological parameters and making estimates for world regions with no empirical data. These estimates were used to quantify the burden of MDD for the Global Burden of Disease Study 2010 (GBD 2010). Method: Analyses drew on data from our existing literature review of the epidemiology of MDD. DisMod-MR, the latest version of the generic disease modelling system redesigned as a Bayesian meta-regression tool, derived prevalence by age, year and sex for 21 regions. Prior epidemiological knowledge, study- and country-level covariates adjusted sub-optimal raw data. Results: There were over 298 million cases of MDD globally at any point in time in 2010, with the highest proportion of cases occurring between 25 and 34 years. Global point prevalence was very similar across time (4.4% (95% uncertainty: 4.2–4.7%) in 1990, 4.4% (4.1–4.7%) in 2005 and 2010), but higher in females (5.5% (5.0–6.0%)) than in males (3.2% (3.0–3.6%)) in 2010. Regions in conflict had higher prevalence than those with no conflict. The annual incidence of an episode of MDD followed a similar age and regional pattern to prevalence but was about one and a half times higher, consistent with an average duration of 37.7 weeks. Conclusion: We were able to integrate available data, including those from high quality surveys and sub-optimal studies, into a model adjusting for known methodological sources of heterogeneity. We were also able to estimate the epidemiology of MDD in regions with no available data. This informed GBD 2010 and the public health field, with a clearer understanding of the global distribution of MDD.
Abstract:
Background: Depressive disorders were a leading cause of burden in the Global Burden of Disease (GBD) 1990 and 2000 studies. Here, we analyze the burden of depressive disorders in GBD 2010 and present severity proportions, burden by country, region, age, sex, and year, as well as burden of depressive disorders as a risk factor for suicide and ischemic heart disease. Methods and Findings: Burden was calculated for major depressive disorder (MDD) and dysthymia. A systematic review of epidemiological data was conducted. The data were pooled using a Bayesian meta-regression. Disability weights from population survey data quantified the severity of health loss from depressive disorders. These weights were used to calculate years lived with disability (YLDs) and disability adjusted life years (DALYs). Separate DALYs were estimated for suicide and ischemic heart disease attributable to depressive disorders. Depressive disorders were the second leading cause of YLDs in 2010. MDD accounted for 8.2% (5.9%-10.8%) of global YLDs and dysthymia for 1.4% (0.9%-2.0%). Depressive disorders were a leading cause of DALYs even though no mortality was attributed to them as the underlying cause. MDD accounted for 2.5% (1.9%-3.2%) of global DALYs and dysthymia for 0.5% (0.3%-0.6%). There was more regional variation in burden for MDD than for dysthymia, with higher estimates in females and adults of working age. Whilst burden increased by 37.5% between 1990 and 2010, this was due to population growth and ageing. MDD explained 16 million suicide DALYs and almost 4 million ischemic heart disease DALYs. This attributable burden would increase the overall burden of depressive disorders from 3.0% (2.2%-3.8%) to 3.8% (3.0%-4.7%) of global DALYs. Conclusions: GBD 2010 identified depressive disorders as a leading cause of burden. MDD was also a contributor of burden allocated to suicide and ischemic heart disease.
These findings emphasize the importance of including depressive disorders as a public-health priority and implementing cost-effective interventions to reduce their burden.
Abstract:
We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy.
Abstract:
To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. We defined, using Bayes' theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies. © 2012 Nature America, Inc. All rights reserved.
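As a rough illustration of how a 95% credible set can follow from Bayes' theorem, the sketch below assumes exactly one causal SNP per region and an equal prior probability for every SNP. Both are assumptions of this sketch, not details stated in the abstract. Under them, the posterior probability of each SNP is its Bayes factor divided by the sum of all Bayes factors. The function name `credible_set` and the example values are hypothetical.

```python
def credible_set(bayes_factors, coverage=0.95):
    """Return (snp_ids, posteriors) for the smallest set reaching `coverage`.

    bayes_factors: dict mapping SNP id -> Bayes factor for association.
    Assumes one causal SNP per region and a flat prior over SNPs.
    """
    total = sum(bayes_factors.values())
    # Bayes' theorem with equal priors: posterior is proportional to the Bayes factor.
    posteriors = {snp: bf / total for snp, bf in bayes_factors.items()}
    # Add SNPs in order of decreasing posterior until the coverage is reached.
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    chosen, cumulative = [], 0.0
    for snp in ranked:
        chosen.append(snp)
        cumulative += posteriors[snp]
        if cumulative >= coverage:
            break
    return chosen, posteriors


# Hypothetical Bayes factors for four SNPs in one region.
example = {"snp_a": 900.0, "snp_b": 60.0, "snp_c": 30.0, "snp_d": 10.0}
selected, post = credible_set(example)
print(selected)
```

A small credible set, as in this toy case, is what allows most SNPs in a region to be excluded as potentially causal.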
Abstract:
We carried out a discriminant analysis with identity by descent (IBD) at each marker as inputs, and the sib pair type (affected-affected versus affected-unaffected) as the output. Using simple logistic regression for this discriminant analysis, we illustrate the importance of comparing models with different numbers of parameters. Such model comparisons are best carried out using either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). When AIC (or BIC) stepwise variable selection was applied to the German Asthma data set, a group of markers was selected that provides the best fit to the data (assuming an additive effect). Interestingly, these 25-26 markers were not identical to those with the highest (in magnitude) single-locus lod scores.
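For reference, the two criteria can be computed from a model's maximized log-likelihood, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The sketch below is only illustrative: the candidate models, their log-likelihoods and the sample size are hypothetical values, not results from the German Asthma data set.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k fitted parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fits: (label, maximized log-likelihood, number of parameters).
fits = [("3 markers", -120.0, 4), ("10 markers", -115.0, 11)]
n_pairs = 200  # hypothetical number of sib pairs

for label, ll, k in fits:
    print(label, aic(ll, k), round(bic(ll, k, n_pairs), 2))

# Lower values are better; both criteria penalize extra parameters.
best_by_aic = min(fits, key=lambda f: aic(f[1], f[2]))[0]
best_by_bic = min(fits, key=lambda f: bic(f[1], f[2], n_pairs))[0]
```

In this toy comparison the larger model fits slightly better but not by enough to offset its penalty, which is the kind of trade-off stepwise selection evaluates at each step.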
Abstract:
The problem of time variant reliability analysis of existing structures subjected to stationary random dynamic excitations is considered. The study assumes that samples of dynamic response of the structure, under the action of external excitations, have been measured at a set of sparse points on the structure. The utilization of these measurements in updating reliability models, postulated prior to making any measurements, is considered. This is achieved by using dynamic state estimation methods which combine results from Markov process theory and Bayes' theorem. The uncertainties present in measurements as well as in the postulated model for the structural behaviour are accounted for. The samples of external excitations are taken to emanate from known stochastic models and allowance is made for ability (or lack of it) to measure the applied excitations. The future reliability of the structure is modeled using expected structural response conditioned on all the measurements made. This expected response is shown to have a time varying mean and a random component that can be treated as being weakly stationary. For linear systems, an approximate analytical solution for the problem of reliability model updating is obtained by combining theories of discrete Kalman filter and level crossing statistics. For the case of nonlinear systems, the problem is tackled by combining particle filtering strategies with data based extreme value analysis. In all these studies, the governing stochastic differential equations are discretized using the strong forms of Ito-Taylor's discretization schemes. The possibility of using conditional simulation strategies, when applied external actions are measured, is also considered. The proposed procedures are exemplified by considering the reliability analysis of a few low-dimensional dynamical systems based on synthetically generated measurement data.
The performance of the procedures developed is also assessed based on a limited amount of pertinent Monte Carlo simulations. (C) 2010 Elsevier Ltd. All rights reserved.
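The discrete Kalman filter mentioned for the linear case can be read as Bayes' theorem applied to a Gaussian prior and a Gaussian measurement model. The sketch below shows one predict/update cycle for a scalar state. It is a minimal illustration under assumed model parameters (`a`, `q`, `h`, `r` are hypothetical names and values), not the structural model used in the paper.

```python
def kalman_step(mean, var, measurement, a=1.0, q=0.0, h=1.0, r=1.0):
    """One predict/update cycle for x_k = a*x_{k-1} + w_k, z_k = h*x_k + v_k.

    q and r are the variances of the process noise w_k and measurement noise v_k.
    """
    # Predict: propagate the prior mean and variance through the state equation.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: combine the Gaussian prior with the Gaussian likelihood.
    gain = var_pred * h / (h * h * var_pred + r)
    mean_post = mean_pred + gain * (measurement - h * mean_pred)
    var_post = (1.0 - gain * h) * var_pred
    return mean_post, var_post


# Hypothetical measurements of a slowly varying response quantity.
mean, var = 0.0, 1.0
for z in [1.2, 0.8, 1.1]:
    mean, var = kalman_step(mean, var, z, q=0.01, r=0.5)
print(mean, var)
```

Each measurement shrinks the posterior variance relative to the predicted variance, which is how sparse response measurements sharpen a model postulated before any measurement was made.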
Abstract:
The relationship between site characteristics and understorey vegetation composition was analysed with quantitative methods, especially from the viewpoint of site quality estimation. Theoretical models were applied to an empirical data set collected from the upland forests of southern Finland comprising 104 sites dominated by Scots pine (Pinus sylvestris L.), and 165 sites dominated by Norway spruce (Picea abies (L.) Karsten). Site index H100 was used as an independent measure of site quality. A new model for the estimation of site quality at sites with a known understorey vegetation composition was introduced. It is based on the application of Bayes' theorem to the density function of site quality within the study area combined with the species-specific presence-absence response curves. The resulting posterior probability density function may be used for calculating an estimate for the site variable. Using this method, a jackknife estimate of site index H100 was calculated separately for pine- and spruce-dominated sites. The results indicated that the cross-validation root mean squared error (RMSEcv) of the estimates improved from 2.98 m down to 2.34 m relative to the "null" model (standard deviation of the sample distribution) in pine-dominated forests. In spruce-dominated forests RMSEcv decreased from 3.94 m down to 3.16 m. In order to assess these results, four other estimation methods based on understorey vegetation composition were applied to the same data set. The results showed that none of the methods was clearly superior to the others. In pine-dominated forests, RMSEcv varied between 2.34 and 2.47 m, and the corresponding range for spruce-dominated forests was from 3.13 to 3.57 m.
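The Bayesian estimate described above can be sketched on a grid of site index values: the prior density of site quality is multiplied by the probability of each observed presence or absence, then normalized. The logistic response curves, their coefficients, the Gaussian prior and the species names below are all hypothetical choices made for illustration. The abstract does not specify the form of the response curves.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def posterior_site_index(observed, curves, grid, prior):
    """Posterior weights over `grid` given presence/absence observations.

    observed: dict species -> True (present) or False (absent).
    curves: dict species -> (intercept, slope) of an assumed logistic response.
    prior: unnormalized prior density evaluated at each grid point.
    """
    weights = []
    for h, p in zip(grid, prior):
        likelihood = 1.0
        for species, present in observed.items():
            b0, b1 = curves[species]
            p_present = logistic(b0 + b1 * h)
            likelihood *= p_present if present else (1.0 - p_present)
        weights.append(p * likelihood)  # Bayes' theorem, up to a constant
    total = sum(weights)
    return [w / total for w in weights]


grid = [15.0 + 0.5 * i for i in range(41)]  # site index H100 from 15 to 35 m
prior = [math.exp(-0.5 * ((h - 25.0) / 3.0) ** 2) for h in grid]
curves = {"species_rich": (-10.0, 0.4), "species_poor": (8.0, -0.35)}
observed = {"species_rich": True, "species_poor": False}

post = posterior_site_index(observed, curves, grid, prior)
estimate = sum(h * w for h, w in zip(grid, post))  # posterior mean of H100
print(round(estimate, 2))
```

Here the presence of a species favoured by fertile sites and the absence of one favoured by poor sites both shift the posterior mean above the prior mean of 25 m.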
Abstract:
Given the increasing cost of designing and building new highway pavements, reliability analysis has become vital to ensure that a given pavement performs as expected in the field. Recognizing the importance of failure analysis to safety, reliability, performance, and economy, back analysis has been employed in various engineering applications to evaluate the inherent uncertainties of the design and analysis. The probabilistic back analysis method formulated on Bayes' theorem and solved using the Markov chain Monte Carlo simulation method with a Metropolis-Hastings algorithm has proved to be highly efficient to address this issue. It is also quite flexible and is applicable to any type of prior information. In this paper, this method has been used to back-analyze the parameters that influence the pavement life and to consider the uncertainty of the mechanistic-empirical pavement design model. The load-induced pavement structural responses (e.g., stresses, strains, and deflections) used to predict the pavement life are estimated using the response surface methodology model developed based on the results of linear elastic analysis. The failure criteria adopted for the analysis were based on the factor of safety (FOS), and the study was carried out for different sample sizes and jumping distributions to estimate the most robust posterior statistics. From the posterior statistics of the case considered, it was observed that after approximately 150 million standard axle load repetitions, the mean values of the pavement properties decrease as expected, with a significant decrease in the values of the elastic moduli of the expected layers. An analysis of the posterior statistics indicated that the parameters that contribute significantly to the pavement failure were the moduli of the base and surface layer, which is consistent with the findings from other studies. 
After the back analysis, the base modulus parameters show a significant decrease of 15.8% and the surface layer modulus a decrease of 3.12% in the mean value. The usefulness of the back analysis methodology is further highlighted by estimating the design parameters for specified values of the factor of safety. The analysis revealed that for the pavement section considered, a reliability of 89% and 94% can be achieved by adopting FOS values of 1.5 and 2, respectively. The methodology proposed can therefore be effectively used to identify the parameters that are critical to pavement failure in the design of pavements for specified levels of reliability. DOI: 10.1061/(ASCE)TE.1943-5436.0000455. (C) 2013 American Society of Civil Engineers.
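To make the sampling step concrete, the sketch below runs a random-walk Metropolis sampler, a special case of the Metropolis-Hastings algorithm with a symmetric jumping distribution, on a one-parameter toy posterior. The Gaussian prior, the single Gaussian observation and all numerical values are hypothetical. The paper's mechanistic-empirical model and response surface are not reproduced here.

```python
import math
import random


def log_posterior(theta, prior_mean=100.0, prior_sd=20.0, obs=80.0, obs_sd=10.0):
    """Unnormalized log posterior: Gaussian prior times Gaussian likelihood."""
    log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    log_like = -0.5 * ((obs - theta) / obs_sd) ** 2
    return log_prior + log_like


def metropolis_hastings(log_target, start, n_samples, step_sd, rng):
    """Random-walk Metropolis sampler with a Gaussian jumping distribution."""
    samples = []
    current, current_lp = start, log_target(start)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # Symmetric proposal: the acceptance ratio reduces to the target ratio.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(0)
samples = metropolis_hastings(log_posterior, 100.0, 20000, 10.0, rng)
kept = samples[2000:]  # discard burn-in
post_mean = sum(kept) / len(kept)
print(round(post_mean, 1))
```

For this conjugate toy problem the exact posterior mean is 84, so the sample mean gives a simple check on the sampler. Varying `step_sd` changes the jumping distribution, which is the kind of sensitivity the study examines.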
Abstract:
The study extends the first order reliability method (FORM) and inverse FORM to update reliability models for existing, statically loaded structures based on measured responses. Solutions based on Bayes' theorem, Markov chain Monte Carlo simulations, and inverse reliability analysis are developed. The case of linear systems with Gaussian uncertainties and linear performance functions is shown to be exactly solvable. FORM and inverse reliability based methods are subsequently developed to deal with more general problems. The proposed procedures are implemented by combining Matlab based reliability modules with finite element models residing on the Abaqus software. Numerical illustrations on linear and nonlinear frames are presented. (c) 2012 Elsevier Ltd. All rights reserved.
Abstract:
The application of Bayes' Theorem to signal processing provides a consistent framework for proceeding from prior knowledge to a posterior inference conditioned on both the prior knowledge and the observed signal data. The first part of the lecture will illustrate how the Bayesian methodology can be applied to a variety of signal processing problems. The second part of the lecture will introduce the concept of Markov Chain Monte-Carlo (MCMC) methods, which are an effective approach to overcoming many of the analytical and computational problems inherent in statistical inference. Such techniques are at the centre of the rapidly developing area of Bayesian signal processing which, with the continual increase in available computational power, is likely to provide the underlying framework for most signal processing applications.
Abstract:
Formation resistivity is one of the most important parameters in reservoir evaluation. Various types of resistivity logging tools have been developed to acquire the true resistivity of the virgin formation. However, as proved reserves increase, the pay zones of interest are becoming thinner and thinner, especially in terrestrial-deposit oilfields, so that electrical logging tools, limited by the conflicting requirements of resolution and depth of investigation, cannot provide the true value of the formation resistivity. Resistivity inversion techniques have therefore become popular for determining true formation resistivity from the improved logging data delivered by new tools. In geophysical inverse problems, non-unique solutions are inevitable because of noisy data and insufficient measurement information. I address this problem in my dissertation from three aspects: data acquisition, data processing/inversion, and application of the results with uncertainty evaluation of the non-unique solution. Some other problems of the traditional inversion methods, such as slow convergence and the dependence of the results on the initial values, are also addressed. First, I deal with the uncertainties in the data to be processed. The combination of the micro-spherically focused log (MSFL) and the dual laterolog (DLL) is the standard program for determining formation resistivity. During the inversion, the corrected MSFL readings are regarded as the resistivity of the invaded zone of the formation. However, the errors can be as large as 30 percent because of mud cake influence, even if the effects of a rugose borehole on the MSFL readings can be ignored. Furthermore, there is still debate about whether the two logs can be used quantitatively to determine formation resistivities, given their different measurement principles. Thus, a new type of laterolog tool is designed theoretically.
The new tool can provide three curves with different depths of investigation and nearly the same resolution, about 0.4 m. Second, because the popular iterative inversion method based on least-squares estimation cannot solve for more than two parameters simultaneously, and the new laterolog tool has not yet been applied in practice, my work focuses on two-parameter inversion (radius of invasion and resistivity of the virgin formation) of traditional dual laterolog data. A revised method with unequally weighted damping factors is developed to replace the parameter-revision technique used in the traditional inversion method. In this new method, the parameter update depends not only on the damping factor itself but also on the difference between the measured data and the fitted data in different layers. The method needs at least two fewer iterations than the older method, which reduces the computational cost of the inversion. The damped least-squares inversion method is the realization of Tikhonov's trade-off between smoothness of the solution and stability of the inversion process. It works by linearizing the non-linear inversion problem, which inevitably makes the solution depend on the initial values of the parameters. As a result, debate about the efficiency of this kind of method has intensified as non-linear processing methods have developed. An artificial neural network method is proposed in this dissertation. A database of the tool's response to formation parameters is built by modeling the laterolog tool and is then used to train the neural networks. A unit model is put forward to simplify the data space, and an additional physical constraint is applied to optimize the network after cross-validation.
Results show that the neural network inversion method could replace the traditional inversion method in a single formation and can be used to determine the initial values for the traditional method. Whatever method is developed, non-uniqueness and uncertainty of the solution are inevitable. Thus, it is wise to evaluate the non-uniqueness and uncertainty of the solution when applying inversion results. Bayes' theorem provides a way to solve such problems. This method is discussed and illustrated for a single formation and achieves plausible results. Finally, the traditional least-squares inversion method is used to process raw logging data; compared with core analysis, the calculated oil saturation is 20 percent higher than that obtained from unprocessed data.