926 results for "predictive accuracy"


Relevance: 70.00%

Abstract:

This paper addresses the problem of predicting the outcome of an ongoing case of a business process based on event logs. In this setting, the outcome of a case may refer, for example, to the achievement of a performance objective or the fulfillment of a compliance rule upon completion of the case. Given a log consisting of traces of completed cases, a trace of an ongoing case, and two or more possible outcomes (e.g., a positive and a negative outcome), the paper addresses the problem of determining the most likely outcome for the case in question. Previous approaches to this problem are largely based on simple symbolic sequence classification: they extract features from traces seen as sequences of event labels and use these features to construct a classifier for runtime prediction. In doing so, these approaches ignore the data payload associated with each event. This paper approaches the problem from a different angle by treating traces as complex symbolic sequences, that is, sequences of events each carrying a data payload. In this context, the paper outlines different feature encodings of complex symbolic sequences and compares their predictive accuracy on real-life business process event logs.
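To make the distinction between simple and complex symbolic sequences concrete, here is a minimal sketch of three trace encodings: a boolean and a frequency encoding that look only at event labels, and an index-based encoding that also carries each event's payload. The event structure (`label`, `payload`) and the function names are hypothetical illustrations, not necessarily the paper's exact encodings.

```python
from collections import Counter

# Each event is a dict with an activity label and a data payload (hypothetical structure).

def boolean_encoding(trace, alphabet):
    """1 if the label occurs anywhere in the trace prefix, else 0 (payload ignored)."""
    labels = {event["label"] for event in trace}
    return [1 if a in labels else 0 for a in alphabet]

def frequency_encoding(trace, alphabet):
    """Number of occurrences of each label in the prefix (payload ignored)."""
    counts = Counter(event["label"] for event in trace)
    return [counts[a] for a in alphabet]

def index_payload_encoding(trace, prefix_len, attributes):
    """Label plus selected payload values of each of the first `prefix_len` events."""
    features = []
    for i in range(prefix_len):
        if i < len(trace):
            event = trace[i]
            features.append(event["label"])
            features.extend(event["payload"].get(attr) for attr in attributes)
        else:
            # Pad short prefixes so that all feature vectors have the same length.
            features.extend([None] * (1 + len(attributes)))
    return features

trace = [
    {"label": "register", "payload": {"amount": 120, "channel": "web"}},
    {"label": "check", "payload": {"amount": 120, "channel": "web"}},
    {"label": "check", "payload": {"amount": 95, "channel": "phone"}},
]
alphabet = ["register", "check", "approve"]
print(boolean_encoding(trace, alphabet))              # [1, 1, 0]
print(frequency_encoding(trace, alphabet))            # [1, 2, 0]
print(index_payload_encoding(trace, 2, ["amount"]))   # ['register', 120, 'check', 120]
```

The first two vectors are identical for any two traces with the same label counts, whatever their payloads; only the index-based vector can tell such traces apart, which is the information a classifier loses under simple symbolic encodings.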

Relevance: 70.00%

Abstract:

In this paper, we present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade-off between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels. Not only can it be computed efficiently, but it is also independent of the metric used in the continuous space. Our experiments with gene expression data show that discretization plays a crucial role in determining the resulting network structure.
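The paper's scoring function is not reproduced here. The sketch below only illustrates why a score that depends on level counts over the finest grid is independent of the metric: if cut points are placed between consecutive sorted data points (assuming no tied values), the level of every point, and hence every count, is unchanged by any strictly increasing transformation of the data. The `cut_ranks` interface is a hypothetical choice for the illustration.

```python
import math

def discretize_by_ranks(values, cut_ranks):
    """Assign levels using cuts between sorted data points (the finest grid).

    A cut at rank r separates the r smallest values from the rest.
    Assumes distinct values; ties would need cuts only between distinct values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    cuts = sorted(cut_ranks)
    levels = [0] * len(values)
    level = 0
    for rank, idx in enumerate(order):
        while level < len(cuts) and rank >= cuts[level]:
            level += 1
        levels[idx] = level
    return levels

def level_counts(levels):
    """Number of data points in each discretization level."""
    return [levels.count(k) for k in range(max(levels) + 1)]

x = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7]
levels_raw = discretize_by_ranks(x, [2, 4])
levels_exp = discretize_by_ranks([math.exp(v) for v in x], [2, 4])  # monotone rescaling
print(levels_raw, levels_exp, level_counts(levels_raw))  # [1, 0, 2, 1, 0, 2] [1, 0, 2, 1, 0, 2] [2, 2, 2]
```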

Relevance: 70.00%

Abstract:

Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p-value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, prediction R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
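The contrast between pruning plus thresholding (P+T) and posterior-mean weighting can be sketched for a single LD window. The sketch assumes standardized genotypes, marginal effect estimates `beta_hat` on that scale (so z ≈ beta_hat · √N), and an LD correlation matrix from a reference panel. It implements only the infinitesimal-prior special case described in the LDpred paper (to our understanding, posterior mean ≈ (M/(N h²) I + D)⁻¹ β̂, with M the total number of markers); the full method's point-normal prior and Gibbs sampler are not shown. All function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ldpred_inf(beta_hat, ld, n, h2, m_total):
    """Posterior mean effects under an infinitesimal prior: (M/(N h2) I + D)^-1 beta_hat."""
    k = len(beta_hat)
    shrink = m_total / (n * h2)
    a = [[ld[i][j] + (shrink if i == j else 0.0) for j in range(k)] for i in range(k)]
    return solve(a, beta_hat)

def prune_and_threshold(beta_hat, ld, n, r2_max=0.2, p_max=5e-8):
    """Keep the most significant markers with p < p_max and r^2 <= r2_max to every kept marker."""
    pvals = [math.erfc(abs(b) * math.sqrt(n) / math.sqrt(2)) for b in beta_hat]  # two-sided
    kept = []
    for j in sorted(range(len(beta_hat)), key=lambda j: pvals[j]):
        if pvals[j] < p_max and all(ld[j][k] ** 2 <= r2_max for k in kept):
            kept.append(j)
    return [beta_hat[j] if j in kept else 0.0 for j in range(len(beta_hat))]

def risk_score(genotypes, weights):
    """Polygenic score: weighted sum of (standardized) genotypes."""
    return sum(g * w for g, w in zip(genotypes, weights))
```

With two markers in strong LD (r = 0.9), P+T keeps only the more significant one and zeroes the other, whereas the posterior mean spreads shrunken weight over both; with no LD (D = I) the posterior mean reduces to uniform shrinkage β̂ / (1 + M/(N h²)).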

Relevance: 70.00%

Abstract:

Falls are common and burdensome accidents among the elderly. About one third of the population aged 65 years or more experience at least one fall each year. Fall risk assessment is believed to be beneficial for fall prevention. This thesis is about prognostic tools for falls for community-dwelling older adults. We provide an overview of the state of the art. We then take different approaches: we propose a theoretical probabilistic model to investigate some properties of prognostic tools for falls; we present a tool whose parameters were derived from data in the literature; and we train and test a data-driven prognostic tool. Finally, we present some preliminary results on the prediction of falls through features extracted from wearable inertial sensors. Heterogeneity in validation results is expected from theoretical considerations and is observed in empirical data. Differences in study design hinder comparability and collaborative research. Given the multifactorial etiology of falls, assessment of multiple risk factors is needed to achieve good predictive accuracy.

Relevance: 70.00%

Abstract:

The positive and negative predictive values are standard measures used to quantify the predictive accuracy of binary biomarkers when the outcome being predicted is also binary. When the biomarkers are instead used to predict a failure time outcome, there is no standard way of quantifying predictive accuracy. We propose a natural extension of the traditional predictive values to accommodate censored survival data. We discuss not only quantifying predictive accuracy using these extended predictive values, but also rigorously comparing the accuracy of two biomarkers in terms of their predictive values. Using a marginal regression framework, we describe how to estimate differences in predictive accuracy and how to test whether the observed difference is statistically significant.
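The sketch below first computes the standard PPV and NPV from a binary marker and a binary outcome. It then shows one plausible time-dependent reading for censored data, PPV(t) = P(T ≤ t | marker positive) and NPV(t) = P(T > t | marker negative), each estimated with a Kaplan-Meier curve within the marker group. This definition is an assumption for illustration; the paper's own estimator and its marginal regression framework are not reproduced.

```python
def predictive_values(marker, outcome):
    """Standard PPV = P(outcome | marker+) and NPV = P(no outcome | marker-)."""
    tp = sum(1 for m, d in zip(marker, outcome) if m and d)
    fp = sum(1 for m, d in zip(marker, outcome) if m and not d)
    tn = sum(1 for m, d in zip(marker, outcome) if not m and not d)
    fn = sum(1 for m, d in zip(marker, outcome) if not m and d)
    return tp / (tp + fp), tn / (tn + fn)

def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t); events[i] is 1 for a failure, 0 if censored."""
    surv = 1.0
    for u in sorted({ti for ti, e in zip(times, events) if e and ti <= t}):
        at_risk = sum(1 for ti in times if ti >= u)
        failures = sum(1 for ti, e in zip(times, events) if ti == u and e)
        surv *= 1 - failures / at_risk
    return surv

def time_dependent_predictive_values(marker, times, events, t):
    """Assumed extension: PPV(t) = 1 - S+(t), NPV(t) = S-(t), by marker group."""
    pos_t = [ti for m, ti in zip(marker, times) if m]
    pos_e = [e for m, e in zip(marker, events) if m]
    neg_t = [ti for m, ti in zip(marker, times) if not m]
    neg_e = [e for m, e in zip(marker, events) if not m]
    return 1 - kaplan_meier(pos_t, pos_e, t), kaplan_meier(neg_t, neg_e, t)

print(predictive_values([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))
print(time_dependent_predictive_values([1, 1, 1, 0, 0], [2, 3, 3, 5, 8], [1, 1, 0, 1, 0], t=4))
```

Censored subjects contribute to the risk set until they drop out instead of being discarded or counted as event-free, which is what distinguishes this from naively applying the binary formulas at time t.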

Relevance: 70.00%

Abstract:

OBJECTIVES: The goal of the present study was to compare the accuracy of in vivo tissue characterization obtained by intravascular ultrasound (IVUS) radiofrequency (RF) data analysis, known as Virtual Histology (VH), with the in vitro histopathology of coronary atherosclerotic plaques obtained by directional coronary atherectomy. BACKGROUND: Vulnerable plaque leading to acute coronary syndrome (ACS) has been associated with specific plaque composition, and its characterization is an important clinical focus. METHODS: VH IVUS images were acquired before and after a single debulking cut using directional coronary atherectomy. The debulked region on the in vivo VH image was identified by comparing pre- and post-debulking VH images. VH images were then compared with the corresponding tissue cross-sections. RESULTS: Fifteen patients with stable angina pectoris (AP) and 15 patients with ACS were enrolled. The results of IVUS RF data analysis correlated well with histopathologic examination (predictive accuracy, data from all patients: 87.1% for fibrous, 87.1% for fibro-fatty, 88.3% for necrotic core, and 96.5% for dense calcium regions). In addition, the frequency of necrotic core was significantly higher in the ACS group than in the stable AP group (in vitro histopathology: 22.6% vs. 12.6%, p = 0.02; in vivo virtual histology: 24.5% vs. 10.4%, p = 0.002). CONCLUSIONS: In vivo IVUS RF data analysis correlates with histopathology with high accuracy. It is a useful modality for classifying the different types of coronary plaque components and may play an important role in the detection of vulnerable plaque.
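The abstract does not define "predictive accuracy" per tissue type. A common convention, assumed here, is one-vs-rest accuracy: for each tissue class, (true positives + true negatives) / all matched regions, treating histopathology as the reference. The region-level pairing and the example labels below are hypothetical.

```python
# Tissue classes reported in the abstract.
TISSUES = ["fibrous", "fibro-fatty", "necrotic core", "dense calcium"]

def one_vs_rest_accuracy(pairs, tissue):
    """(TP + TN) / N with `tissue` as the positive class.

    `pairs` holds (vh_label, histology_label) for each matched region;
    this pairing scheme is an assumption, not the study's protocol.
    """
    agree = sum(1 for vh, histo in pairs if (vh == tissue) == (histo == tissue))
    return agree / len(pairs)

def accuracy_by_tissue(pairs):
    """One-vs-rest accuracy for every tissue class."""
    return {t: one_vs_rest_accuracy(pairs, t) for t in TISSUES}

pairs = [  # hypothetical (VH, histology) labels
    ("fibrous", "fibrous"),
    ("fibrous", "fibro-fatty"),
    ("necrotic core", "necrotic core"),
    ("dense calcium", "dense calcium"),
]
print(accuracy_by_tissue(pairs))
# {'fibrous': 0.75, 'fibro-fatty': 0.75, 'necrotic core': 1.0, 'dense calcium': 1.0}
```

Because true negatives count toward the score, one-vs-rest accuracy tends to be high for rare classes such as dense calcium even when sensitivity is modest, so it is usually reported alongside sensitivity and specificity.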

Relevance: 70.00%

Abstract:

We provide a comprehensive study of out-of-sample forecasts for the EUR/USD exchange rate based on multivariate macroeconomic models and forecast combinations. In addition to standard loss minimization measures, we use profit maximization measures based on directional accuracy and trading strategies. When comparing predictive accuracy and profit measures, we use tests that are free of data-snooping bias. The results indicate that forecast combinations, in particular those based on principal components of forecasts, help to improve on benchmark trading strategies, although the excess return per unit of deviation is limited.
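The profit-oriented measures mentioned above can be sketched as follows: combine several models' forecasts of the exchange-rate change, score the combination by directional accuracy (share of correctly predicted signs), and compute the returns of a simple long/short rule. For brevity the sketch uses an equal-weight average, a simpler stand-in for the principal-component combinations the study favours; it ignores transaction costs, and all names and numbers are hypothetical.

```python
def combine_mean(forecasts):
    """Equal-weight combination: average the models' forecasts period by period."""
    return [sum(col) / len(col) for col in zip(*forecasts)]

def directional_accuracy(predicted, actual):
    """Share of periods where the forecast has the same sign as the realized change.

    A zero change is treated as non-positive.
    """
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(actual)

def strategy_returns(predicted, actual):
    """Go long when the predicted change is positive, short otherwise (no costs)."""
    return [a if p > 0 else -a for p, a in zip(predicted, actual)]

# Hypothetical forecasts of the EUR/USD log change from two models, and realized changes.
forecasts = [[0.010, -0.020, 0.005], [0.003, -0.010, -0.009]]
actual = [0.004, -0.006, 0.002]
combined = combine_mean(forecasts)
print(directional_accuracy(combined, actual))   # 2 of 3 signs correct
print(strategy_returns(combined, actual))       # approximately [0.004, 0.006, -0.002]
```

A forecast can have a larger squared error than a benchmark and still earn more under these measures if it gets the sign right more often, which is why the study reports both loss-based and profit-based comparisons.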

Relevance: 60.00%

Abstract:

The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by increased knowledge of risk factors, effective preventive measures and proven therapeutic interventions. However, significant CVD morbidity remains, and sudden cardiac death continues to be the presenting feature for some patients subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia-related complications. Stress electrocardiography/exercise testing is predictive of 10-year risk of CVD events, and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to these data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages, and data summarisation and abstraction. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The main dataset is derived from the entire anaesthesia population; subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads).
Because of the unbalanced class distribution in the data, majority-class under-sampling and the Kappa statistic, together with misclassification rate (MR) and area under the ROC curve (AUC), are used to evaluate models generated by different prediction algorithms. The performance of models derived from feature-reduced datasets shows the filter method, Cfs subset evaluation, to be the most consistently effective, although Consistency-derived subsets tended to increase accuracy slightly but complexity markedly. MR as a measure of model performance is influenced by class distribution. This influence can be removed by considering the AUC or Kappa statistic, as well as by evaluating subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, the lowest value (10.69) being for data from which both outliers and noise were removed. For the raw time-series dataset, MR is 12.34. Feature selection reduces MR to between 9.8 and 10.16: 9.8 for time-segmented summary data (dataset F) and 9.92 for raw time-series summary data (dataset A). However, model complexity is high for all datasets based on time-series variables only. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series-only datasets, but models derived from these subsets consist of a single leaf. MR values are consistent with the class distribution in the subset folds evaluated by n-fold cross-validation. For models based on Cfs-selected time-series-derived and risk factor (RF) variables, MR ranges from 8.83 to 10.36, with 8.85 for dataset RF_A (raw time-series data and RF) and 9.09 for dataset RF_F (time-segmented time-series variables and RF).
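Why MR alone can mislead on unbalanced data, and how Kappa, AUC and majority-class under-sampling compensate, can be sketched in a few lines. These are textbook definitions (AUC computed as the probability that a random positive outranks a random negative), not the study's Weka-based pipeline; the data are hypothetical.

```python
import random

def misclassification_rate(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement beyond what the marginal class frequencies would give by chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (observed - expected) / (1 - expected)

def auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def undersample_majority(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    size = min(len(v) for v in by_class.values())
    return [(row, label) for label, rs in by_class.items() for row in rng.sample(rs, size)]

# A trivial "always negative" classifier on 10% prevalence: low MR, but zero Kappa.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(misclassification_rate(y_true, y_pred), cohen_kappa(y_true, y_pred))  # 0.1 0.0
print(auc(y_true, [0.5] * 10))  # 0.5: constant scores carry no ranking information
```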
The models based on counts of outliers and counts of data points outside the normal range (dataset RF_E), and on variables derived from time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (dataset RF_G), perform worst, with MR of 10.25 and 10.36, respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine-based method SMO have the highest MR, 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method J48 have MR of 8.85 and 9.0, respectively. DT rules are the most comprehensible and clinically relevant. The increase in predictive accuracy achieved by adding risk factor variables to time-series-based models is significant. Adding time-series-derived variables to models based on risk factor variables alone is associated with a trend towards improved performance. Data mining of feature-reduced anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules that are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables, compared with risk factors alone, is consistent with recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used together as model input.
In the absence of risk factor input, the use of time-series variables after outlier removal, and of time-series variables based on physiological values falling outside the accepted normal range, is associated with some improvement in model performance.
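The SAX transformation used for dataset RF_G can be sketched as: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA, segment means), and map each segment mean to a letter using breakpoints that cut the standard normal distribution into equiprobable regions. This minimal version assumes an alphabet of four symbols, a series length divisible by the number of segments and a non-constant series; the pattern clustering step used in the study is not shown.

```python
import statistics
from bisect import bisect_right

# Breakpoints splitting N(0, 1) into 4 equiprobable regions (quartiles).
BREAKPOINTS_4 = [-0.6745, 0.0, 0.6745]

def sax(series, n_segments, alphabet="abcd"):
    """Symbolic Aggregate Approximation with a 4-letter alphabet."""
    mean = statistics.mean(series)
    sd = statistics.pstdev(series)  # assumes a non-constant series (sd > 0)
    z = [(x - mean) / sd for x in series]
    size = len(z) // n_segments  # assumes len(series) is divisible by n_segments
    paa = [sum(z[i * size:(i + 1) * size]) / size for i in range(n_segments)]
    return "".join(alphabet[bisect_right(BREAKPOINTS_4, v)] for v in paa)

# A steadily rising signal (e.g. hypothetical heart-rate samples) maps to "abcd".
print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))
```

Because the letters are defined on the z-normalized scale, the resulting words describe the shape of a recording rather than its absolute level, which is what makes them usable as inputs for pattern clustering.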

Relevance: 60.00%

Abstract:

BACKGROUND AND AIMS: Crohn's disease (CD) is an inflammatory bowel disease (IBD) caused by a combination of genetic, clinical, and environmental factors. Identifying CD patients at high risk of requiring surgery may help clinicians decide between a top-down and a step-up treatment approach. METHODS: We conducted a retrospective case-control analysis of a population-based cohort of 503 CD patients. A regression-based data reduction approach was used to systematically analyse 63 genomic, clinical and environmental factors for association with IBD-related surgery as the primary outcome variable. RESULTS: A multi-factor model was identified that yielded the highest predictive accuracy for need for surgery. The factors included in the model were NOD2 genotype (OR = 1.607, P = 2.3 × 10⁻⁵), having ever had perianal disease (OR = 2.847, P = 4 × 10⁻⁶), smoking after diagnosis (OR = 6.312, P = 7.4 × 10⁻³), being an ex-smoker at diagnosis (OR = 2.405, P = 1.1 × 10⁻³) and age (OR = 1.012, P = 4.4 × 10⁻³). Diagnostic testing of this multi-factor model produced an area under the curve of 0.681 (P = 1 × 10⁻⁴) and an odds ratio of 3.169 (95% CI, P = 1 × 10⁻⁴), both higher than for any factor considered independently. CONCLUSIONS: The results of this study require validation in other populations but represent a step forward in the development of more accurate prognostic tests that help clinicians prescribe the optimal treatment approach for patients with complicated CD.
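If the reported odds ratios are adjusted coefficients from one multivariable logistic model (an assumption; the abstract does not give the intercept or the coding of each factor), they combine multiplicatively: the odds for one patient relative to another are exp(Σ (xᵢ − x_refᵢ) · ln ORᵢ). The sketch below computes such relative odds; the factor keys and their coding (NOD2 as a 0/1 indicator, age in years) are hypothetical, and absolute risk cannot be recovered without the intercept.

```python
import math

# Odds ratios reported in the abstract; the coding of each factor is assumed.
ODDS_RATIOS = {
    "nod2": 1.607,
    "perianal_disease": 2.847,
    "post_diagnosis_smoker": 6.312,
    "ex_smoker_at_diagnosis": 2.405,
    "age_years": 1.012,
}

def relative_odds(profile, reference):
    """Odds of surgery for `profile` relative to `reference` under a logistic model."""
    log_or = sum(
        (profile.get(k, 0) - reference.get(k, 0)) * math.log(v)
        for k, v in ODDS_RATIOS.items()
    )
    return math.exp(log_or)

# Perianal disease plus the NOD2 factor vs. neither, at the same age.
print(relative_odds({"perianal_disease": 1, "nod2": 1, "age_years": 30}, {"age_years": 30}))
# A 10-year age difference alone multiplies the odds by 1.012 ** 10.
print(relative_odds({"age_years": 50}, {"age_years": 40}))
```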

Relevance: 60.00%

Abstract:

Background: Patients with Crohn's disease (CD) often require surgery at some stage of the disease course. Prediction of CD outcome is influenced by clinical, environmental, serological, and genetic factors (e.g., NOD2). Being able to identify CD patients at high risk of surgical intervention should help clinicians decide whether to prescribe early aggressive treatment with immunomodulators. Methods: We performed a retrospective analysis of selected clinical (age at diagnosis, perianal disease, active smoking) and genetic (NOD2 genotype) data obtained for a population-based CD cohort from the Canterbury Inflammatory Bowel Disease study. Logistic regression was used to identify predictors of complicated outcome in these CD patients (i.e., need for inflammatory bowel disease-related surgery). Results: Perianal disease and NOD2 genotype were the only independent factors associated with the need for surgery in this patient group (odds ratio = 2.84 and 1.60, respectively). By combining the associated NOD2 genotype with perianal disease, we generated a single "clinicogenetic" variable. This variable was strongly associated with increased risk of surgery (odds ratio = 3.84, P = 0.00, confidence interval 2.28-6.46) and offered moderate predictive accuracy (positive predictive value = 0.62). Approximately one third of surgical outcomes in this population are attributable to the NOD2+PA variable (attributable risk = 0.32). Conclusions: Knowledge of perianal disease and NOD2 genotype in patients presenting with CD may offer clinicians some decision-making utility for early diagnosis of complicated CD progression and for initiating intensive treatment to avoid surgical intervention. Future studies should investigate combined effects of other genetic, clinical, and environmental factors when attempting to identify predictors of complicated CD outcomes.
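The summary statistics quoted for the combined variable (odds ratio with confidence interval, positive predictive value, attributable risk) can all be derived from a 2×2 table of exposure versus surgery. The sketch uses Woolf's log-based 95% interval and, for the attributable fraction, Miettinen's case-based formula with the odds ratio standing in for the relative risk; the study does not state which estimators it used, and the counts below are hypothetical.

```python
import math

def two_by_two_summary(a, b, c, d, z=1.959964):
    """a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))
    ppv = a / (a + b)  # P(surgery | exposed)
    # Attributable fraction among the population: share of cases exposed x (OR - 1) / OR.
    attributable = (a / (a + c)) * (odds_ratio - 1) / odds_ratio
    return odds_ratio, ci, ppv, attributable

# Hypothetical counts: 20 of 30 exposed patients and 30 of 90 unexposed patients had surgery.
or_, (lo, hi), ppv, af = two_by_two_summary(20, 10, 30, 60)
print(round(or_, 2), round(lo, 2), round(hi, 2), round(ppv, 2), round(af, 2))
```

The Woolf interval is symmetric on the log scale, so its limits multiply to the square of the odds ratio; that property is a quick check when reading published intervals such as 2.28-6.46 around 3.84.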