145 resultados para Bayesian risk prediction models
em Queensland University of Technology - ePrints Archive
Resumo:
We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy. We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy.
Resumo:
Traditional crash prediction models, such as generalized linear regression models, are incapable of taking into account the multilevel data structure, which extensively exists in crash data. Disregarding the possible within-group correlations can lead to the production of models giving unreliable and biased estimates of unknowns. This study innovatively proposes a -level hierarchy, viz. (Geographic region level – Traffic site level – Traffic crash level – Driver-vehicle unit level – Vehicle-occupant level) Time level, to establish a general form of multilevel data structure in traffic safety analysis. To properly model the potential cross-group heterogeneity due to the multilevel data structure, a framework of Bayesian hierarchical models that explicitly specify multilevel structure and correctly yield parameter estimates is introduced and recommended. The proposed method is illustrated in an individual-severity analysis of intersection crashes using the Singapore crash records. This study proved the importance of accounting for the within-group correlations and demonstrated the flexibilities and effectiveness of the Bayesian hierarchical method in modeling multilevel structure of traffic crash data.
Resumo:
Objective We aimed to predict sub-national spatial variation in numbers of people infected with Schistosoma haematobium, and associated uncertainties, in Burkina Faso, Mali and Niger, prior to implementation of national control programmes. Methods We used national field survey datasets covering a contiguous area 2,750 × 850 km, from 26,790 school-aged children (5–14 years) in 418 schools. Bayesian geostatistical models were used to predict prevalence of high and low intensity infections and associated 95% credible intervals (CrI). Numbers infected were determined by multiplying predicted prevalence by numbers of school-aged children in 1 km2 pixels covering the study area. Findings Numbers of school-aged children with low-intensity infections were: 433,268 in Burkina Faso, 872,328 in Mali and 580,286 in Niger. Numbers with high-intensity infections were: 416,009 in Burkina Faso, 511,845 in Mali and 254,150 in Niger. 95% CrIs (indicative of uncertainty) were wide; e.g. the mean number of boys aged 10–14 years infected in Mali was 140,200 (95% CrI 6200, 512,100). Conclusion National aggregate estimates for numbers infected mask important local variation, e.g. most S. haematobium infections in Niger occur in the Niger River valley. Prevalence of high-intensity infections was strongly clustered in foci in western and central Mali, north-eastern and northwestern Burkina Faso and the Niger River valley in Niger. Populations in these foci are likely to carry the bulk of the urinary schistosomiasis burden and should receive priority for schistosomiasis control. Uncertainties in predicted prevalence and numbers infected should be acknowledged and taken into consideration by control programme planners.
Resumo:
Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade many crash models have accounted for extra-variation in crash counts—variation over and above that accounted for by the Poisson density. The extra-variation – or dispersion – is theorized to capture unaccounted for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models—tantamount to assuming that unaccounted for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31–40] challenged the fixed dispersion parameter assumption, and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work is needed to determine the appropriateness of the findings for rural as well as other intersection types, to corroborate their findings, and to explore alternative dispersion functions. This study builds upon the work of Miaou and Lord, with exploration of additional dispersion functions, the use of an independent data set, and presents an opportunity to corroborate their findings. Data from Georgia are used in this study. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov Chain Monte Carlo (MCMC) and the Gibbs sampler. A total of eight model specifications were developed; four of them employed traffic flows as explanatory factors in mean structure while the remainder of them included geometric factors in addition to major and minor road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criteria (DIC) statistics. The findings indicate that the modeling of the dispersion parameter, which essentially explains the extra-variance structure, depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. It appears that extra-variation is a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables. In contrast, when sufficient explanatory variables are used to model the mean (expected crash count), extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences of expected crash counts are likely to be different than factors that might help to explain unaccounted for variation in crashes across sites
Resumo:
Early models of bankruptcy prediction employed financial ratios drawn from pre-bankruptcy financial statements and performed well both in-sample and out-of-sample. Since then there has been an ongoing effort in the literature to develop models with even greater predictive performance. A significant innovation in the literature was the introduction into bankruptcy prediction models of capital market data such as excess stock returns and stock return volatility, along with the application of the Black–Scholes–Merton option-pricing model. In this note, we test five key bankruptcy models from the literature using an upto- date data set and find that they each contain unique information regarding the probability of bankruptcy but that their performance varies over time. We build a new model comprising key variables from each of the five models and add a new variable that proxies for the degree of diversification within the firm. The degree of diversification is shown to be negatively associated with the risk of bankruptcy. This more general model outperforms the existing models in a variety of in-sample and out-of-sample tests.
Resumo:
Recent literature has focused on realized volatility models to predict financial risk. This paper studies the benefit of explicitly modeling jumps in this class of models for value at risk (VaR) prediction. Several popular realized volatility models are compared in terms of their VaR forecasting performances through a Monte Carlo study and an analysis based on empirical data of eight Chinese stocks. The results suggest that careful modeling of jumps in realized volatility models can largely improve VaR prediction, especially for emerging markets where jumps play a stronger role than those in developed markets.
Resumo:
Predicting safety on roadways is standard practice for road safety professionals and has a corresponding extensive literature. The majority of safety prediction models are estimated using roadway segment and intersection (microscale) data, while more recently efforts have been undertaken to predict safety at the planning level (macroscale). Safety prediction models typically include roadway, operations, and exposure variables—factors known to affect safety in fundamental ways. Environmental variables, in particular variables attempting to capture the effect of rain on road safety, are difficult to obtain and have rarely been considered. In the few cases weather variables have been included, historical averages rather than actual weather conditions during which crashes are observed have been used. Without the inclusion of weather related variables researchers have had difficulty explaining regional differences in the safety performance of various entities (e.g. intersections, road segments, highways, etc.) As part of the NCHRP 8-44 research effort, researchers developed PLANSAFE, or planning level safety prediction models. These models make use of socio-economic, demographic, and roadway variables for predicting planning level safety. Accounting for regional differences - similar to the experience for microscale safety models - has been problematic during the development of planning level safety prediction models. More specifically, without weather related variables there is an insufficient set of variables for explaining safety differences across regions and states. Furthermore, omitted variable bias resulting from excluding these important variables may adversely impact the coefficients of included variables, thus contributing to difficulty in model interpretation and accuracy. This paper summarizes the results of an effort to include weather related variables, particularly various measures of rainfall, into accident frequency prediction and the prediction of the frequency of fatal and/or injury degree of severity crash models. The purpose of the study was to determine whether these variables do in fact improve overall goodness of fit of the models, whether these variables may explain some or all of observed regional differences, and identifying the estimated effects of rainfall on safety. The models are based on Traffic Analysis Zone level datasets from Michigan, and Pima and Maricopa Counties in Arizona. Numerous rain-related variables were found to be statistically significant, selected rain related variables improved the overall goodness of fit, and inclusion of these variables reduced the portion of the model explained by the constant in the base models without weather variables. Rain tends to diminish safety, as expected, in fairly complex ways, depending on rain frequency and intensity.
Resumo:
A study was done to develop macrolevel crash prediction models that can be used to understand and identify effective countermeasures for improving signalized highway intersections and multilane stop-controlled highway intersections in rural areas. Poisson and negative binomial regression models were fit to intersection crash data from Georgia, California, and Michigan. To assess the suitability of the models, several goodness-of-fit measures were computed. The statistical models were then used to shed light on the relationships between crash occurrence and traffic and geometric features of the rural signalized intersections. The results revealed that traffic flow variables significantly affected the overall safety performance of the intersections regardless of intersection type and that the geometric features of intersections varied across intersection type and also influenced crash type.
Resumo:
Most of existing motorway traffic safety studies using disaggregate traffic flow data aim at developing models for identifying real-time traffic risks by comparing pre-crash and non-crash conditions. One of serious shortcomings in those studies is that non-crash conditions are arbitrarily selected and hence, not representative, i.e. selected non-crash data might not be the right data comparable with pre-crash data; the non-crash/pre-crash ratio is arbitrarily decided and neglects the abundance of non-crash over pre-crash conditions; etc. Here, we present a methodology for developing a real-time MotorwaY Traffic Risk Identification Model (MyTRIM) using individual vehicle data, meteorological data, and crash data. Non-crash data are clustered into groups called traffic regimes. Thereafter, pre-crash data are classified into regimes to match with relevant non-crash data. Among totally eight traffic regimes obtained, four highly risky regimes were identified; three regime-based Risk Identification Models (RIM) with sufficient pre-crash data were developed. MyTRIM memorizes the latest risk evolution identified by RIM to predict near future risks. Traffic practitioners can decide MyTRIM’s memory size based on the trade-off between detection and false alarm rates. Decreasing the memory size from 5 to 1 precipitates the increase of detection rate from 65.0% to 100.0% and of false alarm rate from 0.21% to 3.68%. Moreover, critical factors in differentiating pre-crash and non-crash conditions are recognized and usable for developing preventive measures. MyTRIM can be used by practitioners in real-time as an independent tool to make online decision or integrated with existing traffic management systems.
Resumo:
This paper examines the impact of allowing for stochastic volatility and jumps (SVJ) in a structural model on corporate credit risk prediction. The results from a simulation study verify the better performance of the SVJ model compared with the commonly used Merton model, and three sources are provided to explain the superiority. The empirical analysis on two real samples further ascertains the importance of recognizing the stochastic volatility and jumps by showing that the SVJ model decreases bias in spread prediction from the Merton model, and better explains the time variation in actual CDS spreads. The improvements are found particularly apparent in small firms or when the market is turbulent such as the recent financial crisis.
Resumo:
This study compares Value-at-Risk (VaR) measures for Australian banks over a period that includes the Global Financial Crisis (GFC) to determine whether the methodology and parameter selection are important for capital adequacy holdings that will ultimately support a bank in a crisis period. VaR methodology promoted under Basel II was largely criticised during the GFC for its failure to capture downside risk. However, results from this study indicate that 1-year parametric and historical models produce better measures of VaR than models with longer time frames. VaR estimates produced using Monte Carlo simulations show a high percentage of violations but with lower average magnitude of a violation when they occur. VaR estimates produced by the ARMA GARCH model also show a relatively high percentage of violations, however, the average magnitude of a violation is quite low. Our findings support the design of the revised Basel II VaR methodology which has also been adopted under Basel III.