937 results for Bayesian smoothing
Abstract:
The real purpose of collecting big data is to identify causality in the hope that this will facilitate credible predictivity. But the search for causality can trap one into infinite regress, and thus one takes refuge in seeking associations between variables in data sets. Regrettably, the mere knowledge of associations does not enable predictivity. Associations need to be embedded within the framework of probability calculus to make coherent predictions. This is so because associations are a feature of probability models, and hence they do not exist outside the framework of a model. Measures of association, like correlation, regression, and mutual information, merely refute a preconceived model. Estimated measures of association do not lead to a probability model; a model is the product of pure thought. This paper discusses these and other fundamentals that are germane to seeking associations in particular, and machine learning in general. ACM Computing Classification System (1998): H.1.2, H.2.4, G.3.
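As an aside (not part of the paper), a minimal sketch of the point about associations and models, assuming a bivariate normal model and made-up data: the estimated correlation r yields a prediction only through the model's conditional mean, mu_y + r (sigma_y / sigma_x)(x - mu_x).

```python
# Illustrative sketch only: a correlation coefficient becomes predictive only once
# it is embedded in an assumed probability model (here, a bivariate normal model).
# All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=1000)
y = 3.0 + 0.8 * x + rng.normal(0.0, 1.0, size=1000)

r = np.corrcoef(x, y)[0, 1]              # an estimated association, not yet a prediction
mu_x, mu_y = x.mean(), y.mean()
sigma_x, sigma_y = x.std(), y.std()

# A prediction exists only relative to the assumed bivariate normal model:
x_new = 12.0
y_pred = mu_y + r * (sigma_y / sigma_x) * (x_new - mu_x)
print("predicted E[Y | X = 12] under the assumed model:", round(y_pred, 2))
```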
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2015.
Abstract:
2010 Mathematics Subject Classification: 94A17.
Abstract:
2000 Mathematics Subject Classification: 62E16, 65C05, 65C20.
Abstract:
2000 Mathematics Subject Classification: 62E16, 62F15, 62H12, 62M20.
Abstract:
Storyline detection from news articles aims at summarizing events described under a certain news topic and revealing how those events evolve over time. It is a difficult task because it requires first the detection of events from news articles published in different time periods and then the construction of storylines by linking events into coherent news stories. Moreover, each storyline has different hierarchical structures which are dependent across epochs. Existing approaches often ignore the dependency of hierarchical structures in storyline generation. In this paper, we propose an unsupervised Bayesian model, called the dynamic storyline detection model, to extract structured representations and evolution patterns of storylines. The proposed model is evaluated on a large-scale news corpus. Experimental results show that our proposed model outperforms several baseline approaches.
Abstract:
Lifelong surveillance is not cost-effective after endovascular aneurysm repair (EVAR), but is required to detect aortic complications which are fatal if untreated (type 1/3 endoleak, sac expansion, device migration). Aneurysm morphology determines the probability of aortic complications and therefore the need for surveillance, but existing analyses have proven incapable of identifying patients at sufficiently low risk to justify abandoning surveillance. This study aimed to improve the prediction of aortic complications through the application of machine-learning techniques. Patients undergoing EVAR at 2 centres were studied from 2004 to 2010. Aneurysm morphology had previously been studied to derive the SGVI Score for predicting aortic complications. Bayesian Neural Networks were designed using the same data, to dichotomise patients into groups at low or high risk of aortic complications. Network training was performed only on patients treated at centre 1. External validation was performed by assessing network performance independently of network training, on patients treated at centre 2. Discrimination was assessed by Kaplan-Meier analysis to compare aortic complications in predicted low-risk versus predicted high-risk patients. 761 patients aged 75 ± 7 years underwent EVAR in the 2 centres. Mean follow-up was 36 ± 20 months. Neural networks were created incorporating neck angulation/length/diameter/volume; AAA diameter/area/volume/length/tortuosity; and common iliac tortuosity/diameter. A 19-feature network predicted aortic complications with excellent discrimination and external validation (5-year freedom from aortic complications in predicted low-risk vs predicted high-risk patients: 97.9% vs. 63%; p < 0.0001). A Bayesian Neural-Network algorithm can identify patients in whom it may be safe to abandon surveillance after EVAR. This proposal requires prospective study.
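A rough sketch of the kind of model described, not the authors' network: a tiny Bayesian neural network over 19 morphological features with Gaussian weight priors, sampled by random-walk Metropolis, dichotomising patients by posterior mean risk. The architecture, priors, sampler settings, 0.5 threshold, and placeholder data are all assumptions.

```python
# Minimal sketch (not the authors' model): a small Bayesian neural network that
# dichotomises patients into low-/high-risk groups from 19 morphological features.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 19, 5              # 19 features as in the abstract; 5 hidden units assumed

def unpack(theta):
    w1 = theta[:n_features * n_hidden].reshape(n_features, n_hidden)
    b1 = theta[n_features * n_hidden:n_features * n_hidden + n_hidden]
    w2 = theta[-(n_hidden + 1):-1]
    b2 = theta[-1]
    return w1, b1, w2, b2

def predict_prob(theta, X):
    w1, b1, w2, b2 = unpack(theta)
    h = np.tanh(X @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

def log_posterior(theta, X, y, prior_sd=1.0):
    p = np.clip(predict_prob(theta, X), 1e-9, 1 - 1e-9)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    log_prior = -0.5 * np.sum((theta / prior_sd) ** 2)   # Gaussian prior on all weights
    return log_lik + log_prior

def metropolis(X, y, n_samples=5000, step=0.02):
    dim = n_features * n_hidden + n_hidden + n_hidden + 1
    theta = rng.normal(scale=0.1, size=dim)
    samples, lp = [], log_posterior(theta, X, y)
    for _ in range(n_samples):
        prop = theta + rng.normal(scale=step, size=dim)
        lp_prop = log_posterior(prop, X, y)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples[n_samples // 2:])            # discard burn-in

# Train on centre-1 data, then dichotomise centre-2 patients by posterior mean risk.
X1, y1 = rng.normal(size=(500, n_features)), rng.integers(0, 2, 500)   # placeholder data
X2 = rng.normal(size=(261, n_features))                                # placeholder data
post = metropolis(X1, y1)
risk = np.mean([predict_prob(t, X2) for t in post], axis=0)
high_risk = risk > 0.5                                   # dichotomisation threshold is an assumption
```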
Abstract:
Ely and Peski (2006) and Friedenberg and Meier (2010) provide examples in which changing the type space behind a game, by taking a "bigger" type space, induces changes in the Bayesian Nash Equilibria; in other words, the Bayesian Nash Equilibrium is not invariant under type morphisms. In this paper we introduce the notion of a strong type morphism. Strong type morphisms are stronger than ordinary and conditional type morphisms (Ely and Peski, 2006), and we show that Bayesian Nash Equilibria are not invariant under strong type morphisms either. We present our results in a very simple, finite setting and conclude that there is no hope of finding reasonable assumptions under which Bayesian Nash Equilibria are invariant under any kind of reasonable type morphism.
Abstract:
In this study we have identified key genes that are critical in the development of astrocytic tumors. Meta-analysis of microarray studies which compared normal tissue to astrocytoma revealed a set of 646 differentially expressed genes in the majority of astrocytomas. Reverse engineering of these 646 genes using Bayesian network analysis produced a gene network for each grade of astrocytoma (Grade I–IV), and ‘key genes’ within each grade were identified. Genes found to be most influential to development of the highest grade of astrocytoma, Glioblastoma multiforme, were: COL4A1, EGFR, BTF3, MPP2, RAB31, CDK4, CD99, ANXA2, TOP2A, and SERBP1. All of these genes were up-regulated, except MPP2 (down-regulated). These 10 genes were able to predict tumor status with 96–100% confidence when using logistic regression, cross-validation, and support vector machine analysis. The Markov Blanket genes interact with NF-κB, ERK, MAPK, VEGF, growth hormone and collagen to produce a network whose top biological functions are cancer, neurological disease, and cellular movement. Three of the 10 genes in particular - EGFR, COL4A1, and CDK4 - seemed to be potential ‘hubs of activity’. Modified expression of these 10 Markov Blanket genes increases the lifetime risk of developing glioblastoma compared to the normal population. The glioblastoma risk estimates were dramatically increased with joint effects of 4 or more Markov Blanket genes. Joint interaction effects of 4, 5, 6, 7, 8, 9 or 10 Markov Blanket genes produced increases of 9, 13, 20.9, 26.7, 52.8, 53.2, 78.1 or 85.9%, respectively, in the lifetime risk of developing glioblastoma compared to the normal population. In summary, it appears that modified expression of several ‘key genes’ may be required for the development of glioblastoma. Further studies are needed to validate these ‘key genes’ as useful tools for early detection and novel therapeutic options for these tumors.
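A minimal sketch, not the authors' pipeline, of the classification step described: logistic regression and an SVM with cross-validation on the expression of the 10 reported genes; the expression matrix and labels below are random placeholders.

```python
# Sketch of tumour-vs-normal classification from the 10 reported genes.
# The expression values and labels are random placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

genes = ["COL4A1", "EGFR", "BTF3", "MPP2", "RAB31",
         "CDK4", "CD99", "ANXA2", "TOP2A", "SERBP1"]

rng = np.random.default_rng(1)
X = rng.normal(size=(120, len(genes)))      # samples x genes (placeholder expression values)
y = rng.integers(0, 2, size=120)            # 1 = astrocytoma, 0 = normal (placeholder labels)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("SVM", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean CV accuracy = {acc.mean():.2f}")
```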
Abstract:
This research explores Bayesian updating as a tool for estimating parameters probabilistically by dynamic analysis of data sequences. Two distinct Bayesian updating methodologies are assessed. The first approach focuses on Bayesian updating of failure rates for primary events in fault trees. A Poisson Exponentially Weighted Moving Average (PEWMA) model is implemented to carry out Bayesian updating of failure rates for individual primary events in the fault tree. To provide a basis for testing of the PEWMA model, a fault tree is developed based on the Texas City Refinery incident which occurred in 2005. A qualitative fault tree analysis is then carried out to obtain a logical expression for the top event. A dynamic fault tree analysis is carried out by evaluating the top event probability at each Bayesian updating step by Monte Carlo sampling from the posterior failure rate distributions. It is demonstrated that PEWMA modeling is advantageous over conventional conjugate Poisson-Gamma updating techniques when failure data is collected over long time spans. The second approach focuses on Bayesian updating of parameters in non-linear forward models. Specifically, the technique is applied to the hydrocarbon material balance equation. In order to test the accuracy of the implemented Bayesian updating models, a synthetic data set is developed using the Eclipse reservoir simulator. Both structured-grid and MCMC-sampling-based solution techniques are implemented and are shown to model the synthetic data set with good accuracy. Furthermore, a graphical analysis shows that the implemented MCMC model displays good convergence properties. A case study demonstrates that the likelihood variance affects the rate at which the posterior assimilates information from the measured data sequence. Error in the measured data significantly affects the accuracy of the posterior parameter distributions. Increasing the likelihood variance mitigates random measurement errors, but causes the overall variance of the posterior to increase. Bayesian updating is shown to be advantageous over deterministic regression techniques as it allows for the incorporation of prior belief and full modeling of uncertainty over the parameter ranges. As such, the Bayesian approach to estimation of parameters in the material balance equation shows utility for incorporation into reservoir engineering workflows.
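For context, a minimal sketch of the conventional conjugate Poisson-Gamma updating that the PEWMA model is compared against (not the PEWMA model itself), with Monte Carlo propagation of posterior failure rates through a hypothetical two-event AND gate; all priors, failure counts, and times are illustrative.

```python
# Conjugate Poisson-Gamma update: with a Gamma(alpha, beta) prior on the failure
# rate lambda, observing k failures over exposure time t gives Gamma(alpha + k, beta + t).
import numpy as np

rng = np.random.default_rng(2)

def update_gamma(alpha, beta, k_failures, exposure_time):
    """Conjugate Bayesian update of a Poisson failure rate."""
    return alpha + k_failures, beta + exposure_time

# Two hypothetical primary events combined through an AND gate (top event = A and B).
alpha_a, beta_a = update_gamma(1.0, 100.0, k_failures=2, exposure_time=500.0)
alpha_b, beta_b = update_gamma(1.0, 200.0, k_failures=1, exposure_time=500.0)

# Monte Carlo: sample posterior rates, convert to failure probabilities over a
# mission time, and propagate through the gate logic.
mission_time = 10.0
lam_a = rng.gamma(alpha_a, 1.0 / beta_a, size=100_000)
lam_b = rng.gamma(alpha_b, 1.0 / beta_b, size=100_000)
p_top = (1 - np.exp(-lam_a * mission_time)) * (1 - np.exp(-lam_b * mission_time))
print("posterior mean top-event probability:", p_top.mean())
```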
Abstract:
Rapid developments in industry have contributed to more complex systems that are prone to failure. In applications where the presence of faults may lead to premature failure, fault detection and diagnostics (FDD) tools are often implemented. The goal of this research is to improve the diagnostic ability of existing FDD methods. Kernel Principal Component Analysis (KPCA) has good fault detection capability; however, it can only detect a fault and identify a few variables that contribute to its occurrence, and is thus not precise in diagnosis. Hence, KPCA was used to detect abnormal events, and the variables contributing most were carried forward for further analysis in the diagnosis phase. The diagnosis phase was carried out in both a qualitative and a quantitative manner. In the qualitative mode, a network-based causality analysis method was developed to show the causal effects among the most contributing variables in the occurrence of the fault. To obtain a more quantitative diagnosis, a Bayesian network was constructed to analyze the problem from a probabilistic perspective.
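A minimal sketch of KPCA-based fault detection of the kind referred to above, using a T²-like monitoring statistic on the kernel principal component scores; the data, kernel settings, and 99% control limit are illustrative, and the contribution and diagnosis steps are not reproduced here.

```python
# Sketch: fit KPCA on normal-operation data, monitor new samples with a
# Hotelling-style statistic on the retained scores, flag exceedances.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(3)
X_normal = rng.normal(size=(500, 8))                      # training data from normal operation
X_test = np.vstack([rng.normal(size=(50, 8)),
                    rng.normal(loc=2.0, size=(50, 8))])   # second half simulates a faulty shift

kpca = KernelPCA(n_components=4, kernel="rbf", gamma=0.1)
T_train = kpca.fit_transform(X_normal)
T_test = kpca.transform(X_test)

score_var = T_train.var(axis=0)
t2 = np.sum(T_test ** 2 / score_var, axis=1)
limit = np.percentile(np.sum(T_train ** 2 / score_var, axis=1), 99)   # 99% control limit
fault_flags = t2 > limit
print("flagged samples:", int(fault_flags.sum()), "of", len(X_test))
```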
Abstract:
Bayesian adaptive methods have been extensively used in psychophysics to estimate the point at which performance on a task attains arbitrary percentage levels, although the statistical properties of these estimators have never been assessed. We used simulation techniques to determine the small-sample properties of Bayesian estimators of arbitrary performance points, specifically addressing the issues of bias and precision as a function of the target percentage level. The study covered three major types of psychophysical task (yes-no detection, 2AFC discrimination and 2AFC detection) and explored the entire range of target performance levels allowed for by each task. Other factors included in the study were the form and parameters of the actual psychometric function Psi, the form and parameters of the model function M assumed in the Bayesian method, and the location of Psi within the parameter space. Our results indicate that Bayesian adaptive methods render unbiased estimators of any arbitrary point on Psi only when M = Psi; otherwise they yield bias whose magnitude can be considerable as the target level moves away from the midpoint of the range of Psi. The standard error of the estimator also increases as the target level approaches extreme values, whether or not M = Psi. Contrary to widespread belief, neither the performance level at which bias is null nor that at which the standard error is minimal can be predicted by the sweat factor. A closed-form expression nevertheless gives a reasonable fit to data describing the dependence of the standard error on the number of trials and the target level, which allows determination of the number of trials that must be administered to obtain estimates with prescribed precision.
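A minimal sketch of a grid-based Bayesian adaptive procedure of the general kind studied: a logistic model function M with a uniform prior over threshold, trial-by-trial posterior updating, stimulus placement at the posterior mean, and the posterior mean as the estimate. The simulated observer, slope, guess/lapse rates, and trial count are assumptions.

```python
# Sketch of a Bayesian adaptive threshold-estimation procedure for a yes-no task.
import numpy as np

rng = np.random.default_rng(4)

def M(x, threshold, slope=1.0, gamma=0.02, lam=0.02):
    """Assumed model psychometric function (logistic with guess and lapse rates)."""
    return gamma + (1 - gamma - lam) / (1 + np.exp(-slope * (x - threshold)))

true_threshold = 0.7                          # parameter of the simulated observer
grid = np.linspace(-3, 3, 601)                # candidate thresholds
posterior = np.ones_like(grid) / len(grid)    # uniform prior

for _ in range(100):                          # 100 trials (illustrative)
    stim = np.sum(grid * posterior)           # place stimulus at the posterior mean
    resp = rng.random() < M(stim, true_threshold)        # simulated yes/no response
    like = M(stim, grid) if resp else 1 - M(stim, grid)  # likelihood over the grid
    posterior = posterior * like
    posterior /= posterior.sum()

threshold_hat = np.sum(grid * posterior)      # final estimate = posterior mean
print("estimated threshold:", round(threshold_hat, 3))
```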
Abstract:
Fixed-step-size (FSS) and Bayesian staircases are widely used methods to estimate sensory thresholds in 2AFC tasks, although a direct comparison of both types of procedure under identical conditions has not previously been reported. A simulation study and an empirical test were conducted to compare the performance of optimized Bayesian staircases with that of four optimized variants of the FSS staircase differing in their up-down rule. The ultimate goal was to determine whether FSS or Bayesian staircases are the better choice in experimental psychophysics. The comparison considered the properties of the estimates (i.e. bias and standard errors) in relation to their cost (i.e. the number of trials to completion). The simulation study showed that mean estimates of Bayesian and FSS staircases are dependable when sufficient trials are given and that, in both cases, the standard deviation (SD) of the estimates decreases with the number of trials, although the SD of Bayesian estimates is always lower than that of FSS estimates (and thus Bayesian staircases are more efficient). The empirical test did not support these conclusions, as (1) neither procedure rendered estimates converging on some value, (2) standard deviations did not follow the expected pattern of decrease with the number of trials, and (3) both procedures appeared to be equally efficient. Potential factors explaining the discrepancies between simulation and empirical results are commented upon and, all things considered, a sensible recommendation is for psychophysicists to run no fewer than 18 and no more than 30 reversals of an FSS staircase implementing the 1-up/3-down rule.
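A minimal sketch of a fixed-step-size 1-up/3-down staircase (the rule recommended above) run against a simulated 2AFC observer; the step size, start level, observer parameters, and the use of the last 12 of 18 reversals are illustrative assumptions. The 1-up/3-down rule targets roughly 79.4% correct.

```python
# Sketch of a 1-up/3-down fixed-step-size staircase with a simulated 2AFC observer.
import numpy as np

rng = np.random.default_rng(5)

def p_correct(level, threshold=0.0, slope=1.5):
    """Simulated 2AFC observer: 50% guessing floor, logistic above it."""
    return 0.5 + 0.5 / (1 + np.exp(-slope * (level - threshold)))

level, step = 3.0, 0.25
correct_run, direction, reversals = 0, 0, []
while len(reversals) < 18:                    # stop after 18 reversals (see abstract)
    correct = rng.random() < p_correct(level)
    if correct:
        correct_run += 1
        move = -1 if correct_run == 3 else 0  # 3 consecutive correct -> step down
        if correct_run == 3:
            correct_run = 0
    else:
        correct_run = 0
        move = +1                             # any incorrect response -> step up
    if move != 0:
        if direction != 0 and move != direction:
            reversals.append(level)           # direction change: record a reversal
        direction = move
        level += move * step

print("threshold estimate:", round(float(np.mean(reversals[-12:])), 3))
```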
Abstract:
Variants of adaptive Bayesian procedures for estimating the 5% point on a psychometric function were studied by simulation. Bias and standard error were the criteria used to evaluate performance. The results indicated a superiority of (a) uniform priors, (b) model likelihood functions that are odd-symmetric about threshold and that have parameter values larger than their counterparts in the psychometric function, (c) stimulus placement at the prior mean, and (d) estimates defined as the posterior mean. Unbiasedness is attained in as few as 10 trials, and 20 trials ensure constant standard errors. The standard error of the estimates equals 0.617 times the inverse of the square root of the number of trials. Other variants yielded bias and larger standard errors.
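A worked example of the reported relation SE = 0.617/sqrt(N): solving for the number of trials needed to reach a prescribed standard error (the target value below is illustrative).

```python
# Invert SE = 0.617 / sqrt(N) to find the trial count for a prescribed precision.
target_se = 0.05                       # illustrative target, in the units of the estimate
n_trials = (0.617 / target_se) ** 2
print(round(n_trials))                 # about 152 trials for SE = 0.05
```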