Biblioteca Digital

We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy. We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy.

Veja mais

Design, construction and evaluation of systems to predict risk in obstetrics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Veja mais

Bayesian hierarchical analysis on crash prediction models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditional crash prediction models, such as generalized linear regression models, are incapable of taking into account the multilevel data structure, which extensively exists in crash data. Disregarding the possible within-group correlations can lead to the production of models giving unreliable and biased estimates of unknowns. This study innovatively proposes a -level hierarchy, viz. (Geographic region level – Traffic site level – Traffic crash level – Driver-vehicle unit level – Vehicle-occupant level) Time level, to establish a general form of multilevel data structure in traffic safety analysis. To properly model the potential cross-group heterogeneity due to the multilevel data structure, a framework of Bayesian hierarchical models that explicitly specify multilevel structure and correctly yield parameter estimates is introduced and recommended. The proposed method is illustrated in an individual-severity analysis of intersection crashes using the Singapore crash records. This study proved the importance of accounting for the within-group correlations and demonstrated the flexibilities and effectiveness of the Bayesian hierarchical method in modeling multilevel structure of traffic crash data.

Veja mais

Detecting genetic and nutritional lung cancer risk factors related to folate metabolism using Bayesian generalized linear models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex diseases, such as cancer, are caused by various genetic and environmental factors, and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically. Bayesian generalized linear models using student-t prior distributions on coefficients, is a novel method to simultaneously analyze genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, an improved method for variable selection when compared to standard methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for required sample size to achieve a high power of variable selection using Bayesian generalize linear models, considering different scales of number of candidate covariates. ^ Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible genetic-nutritional interactions from 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by never, former, and current smoking status. SNPs in MTRR were significantly associated with lung cancer risk across never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions, including an interaction between SHMT1 and vitamin B12, an interaction between MTRR and total fat intake, and an interaction between MTR and alcohol use, were also identified as associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.^

Veja mais

Use of Bayesian geostatistical prediction to estimate local variations in Schistosoma haematobium infection in western Africa

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objective We aimed to predict sub-national spatial variation in numbers of people infected with Schistosoma haematobium, and associated uncertainties, in Burkina Faso, Mali and Niger, prior to implementation of national control programmes. Methods We used national field survey datasets covering a contiguous area 2,750 × 850 km, from 26,790 school-aged children (5–14 years) in 418 schools. Bayesian geostatistical models were used to predict prevalence of high and low intensity infections and associated 95% credible intervals (CrI). Numbers infected were determined by multiplying predicted prevalence by numbers of school-aged children in 1 km2 pixels covering the study area. Findings Numbers of school-aged children with low-intensity infections were: 433,268 in Burkina Faso, 872,328 in Mali and 580,286 in Niger. Numbers with high-intensity infections were: 416,009 in Burkina Faso, 511,845 in Mali and 254,150 in Niger. 95% CrIs (indicative of uncertainty) were wide; e.g. the mean number of boys aged 10–14 years infected in Mali was 140,200 (95% CrI 6200, 512,100). Conclusion National aggregate estimates for numbers infected mask important local variation, e.g. most S. haematobium infections in Niger occur in the Niger River valley. Prevalence of high-intensity infections was strongly clustered in foci in western and central Mali, north-eastern and northwestern Burkina Faso and the Niger River valley in Niger. Populations in these foci are likely to carry the bulk of the urinary schistosomiasis burden and should receive priority for schistosomiasis control. Uncertainties in predicted prevalence and numbers infected should be acknowledged and taken into consideration by control programme planners.

Veja mais

On the nature of over-dispersion in motor vehicle crash prediction models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade many crash models have accounted for extra-variation in crash counts—variation over and above that accounted for by the Poisson density. The extra-variation – or dispersion – is theorized to capture unaccounted for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models—tantamount to assuming that unaccounted for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31–40] challenged the fixed dispersion parameter assumption, and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work is needed to determine the appropriateness of the findings for rural as well as other intersection types, to corroborate their findings, and to explore alternative dispersion functions. This study builds upon the work of Miaou and Lord, with exploration of additional dispersion functions, the use of an independent data set, and presents an opportunity to corroborate their findings. Data from Georgia are used in this study. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov Chain Monte Carlo (MCMC) and the Gibbs sampler. A total of eight model specifications were developed; four of them employed traffic flows as explanatory factors in mean structure while the remainder of them included geometric factors in addition to major and minor road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criteria (DIC) statistics. The findings indicate that the modeling of the dispersion parameter, which essentially explains the extra-variance structure, depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. It appears that extra-variation is a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables. In contrast, when sufficient explanatory variables are used to model the mean (expected crash count), extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences of expected crash counts are likely to be different than factors that might help to explain unaccounted for variation in crashes across sites

Veja mais

A comparison of alternative bankruptcy prediction models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Early models of bankruptcy prediction employed financial ratios drawn from pre-bankruptcy financial statements and performed well both in-sample and out-of-sample. Since then there has been an ongoing effort in the literature to develop models with even greater predictive performance. A significant innovation in the literature was the introduction into bankruptcy prediction models of capital market data such as excess stock returns and stock return volatility, along with the application of the Black–Scholes–Merton option-pricing model. In this note, we test five key bankruptcy models from the literature using an upto- date data set and find that they each contain unique information regarding the probability of bankruptcy but that their performance varies over time. We build a new model comprising key variables from each of the five models and add a new variable that proxies for the degree of diversification within the firm. The degree of diversification is shown to be negatively associated with the risk of bankruptcy. This more general model outperforms the existing models in a variety of in-sample and out-of-sample tests.

Veja mais

The benefit of modeling jumps in realized volatility for risk prediction : evidence from Chinese mainland stocks

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent literature has focused on realized volatility models to predict financial risk. This paper studies the benefit of explicitly modeling jumps in this class of models for value at risk (VaR) prediction. Several popular realized volatility models are compared in terms of their VaR forecasting performances through a Monte Carlo study and an analysis based on empirical data of eight Chinese stocks. The results suggest that careful modeling of jumps in realized volatility models can largely improve VaR prediction, especially for emerging markets where jumps play a stronger role than those in developed markets.

Veja mais

Proposal and validation of methodologies for the categorisation of continuous variables in the development of prediction models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

158 p.

Veja mais

Genetic Markers Enhance Coronary Risk Prediction in Men: The MORGAM Prospective Cohorts

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: More accurate coronary heart disease (CHD) prediction, specifically in middle-aged men, is needed to reduce the burden of disease more effectively. We hypothesised that a multilocus genetic risk score could refine CHD prediction beyond classic risk scores and obtain more precise risk estimates using a prospective cohort design.

Methods: Using data from nine prospective European cohorts, including 26,221 men, we selected in a case-cohort setting 4,818 healthy men at baseline, and used Cox proportional hazards models to examine associations between CHD and risk scores based on genetic variants representing 13 genomic regions. Over follow-up (range: 5-18 years), 1,736 incident CHD events occurred. Genetic risk scores were validated in men with at least 10 years of follow-up (632 cases, 1361 non-cases). Genetic risk score 1 (GRS1) combined 11 SNPs and two haplotypes, with effect estimates from previous genome-wide association studies. GRS2 combined 11 SNPs plus 4 SNPs from the haplotypes with coefficients estimated from these prospective cohorts using 10-fold cross-validation. Scores were added to a model adjusted for classic risk factors comprising the Framingham risk score and 10-year risks were derived.

Results: Both scores improved net reclassification (NRI) over the Framingham score (7.5%, p = 0.017 for GRS1, 6.5%, p = 0.044 for GRS2) but GRS2 also improved discrimination (c-index improvement 1.11%, p = 0.048). Subgroup analysis on men aged 50-59 (436 cases, 603 non-cases) improved net reclassification for GRS1 (13.8%) and GRS2 (12.5%). Net reclassification improvement remained significant for both scores when family history of CHD was added to the baseline model for this male subgroup improving prediction of early onset CHD events.

Conclusions: Genetic risk scores add precision to risk estimates for CHD and improve prediction beyond classic risk factors, particularly for middle aged men.

Veja mais

Risk prediction for weed infestation using classification rules

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a fuzzy classification system for the risk of infestation by weeds in agricultural zones considering the variability of weeds. The inputs of the system are features of the infestation extracted from estimated maps by kriging for the weed seed production and weed coverage, and from the competitiveness, inferred from narrow and broad-leaved weeds. Furthermore, a Bayesian network classifier is used to extract rules from data which are compared to the fuzzy rule set obtained on the base of specialist knowledge. Results for the risk inference in a maize crop field are presented and evaluated by the estimated yield loss. © 2009 IEEE.

Veja mais

Risk prediction of fever in neutropenia in children with cancer: a step towards individually tailored supportive therapy?

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: Fever in severe chemotherapy-induced neutropenia (FN) is the most frequent manifestation of a potentially lethal complication of current intensive chemotherapy regimens. This study aimed at establishing models predicting the risk of FN, and of FN with bacteremia, in pediatric cancer patients. METHODS: In a single-centre cohort study, characteristics potentially associated with FN and episodes of FN were retrospectively extracted from charts. Poisson regression accounting for chemotherapy exposure time was used for analysis. Prediction models were constructed based on a derivation set of two thirds of observations, and validated based on the remaining third of observations. RESULTS: In 360 pediatric cancer patients diagnosed and treated for a cumulative chemotherapy exposure time of 424 years, 629 FN were recorded (1.48 FN per patient per year, 95% confidence interval (CI), 1.37-1.61), 145 of them with bacteremia (23% of FN; 0.34; 0.29-0.40). More intensive chemotherapy, shorter time since diagnosis, bone marrow involvement, central venous access device (CVAD), and prior FN were significantly and independently associated with a higher risk to develop both FN and FN with bacteremia. The prediction models explained more than 30% of the respective risks. CONCLUSIONS: The two models predicting FN and FN with bacteremia were based on five easily accessible clinical variables. Before clinical application, they need to be validated by prospective studies.

Veja mais

830 resultados para Bayesian risk prediction models

Filtro por publicador