976 resultados para NCHS data brief (Series)
Resumo:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). Because of the unbalanced class distribution in the data, majority class under-sampling and Kappa statistic together with misclassification rate and area under the ROC curve (AUC) are used for evaluation of models generated using different prediction algorithms. The performance based on models derived from feature reduced datasets reveal the filter method, Cfs subset evaluation, to be most consistently effective although Consistency derived subsets tended to slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic as well by evaluation of subsets with under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62 with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in reduction in MR to 9.8 to 10.16 with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-cross validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36 with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09. The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_ G) perform the least well with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28 while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The effect of the pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. In the absence of risk factor input, the use of time-series variables after outlier removal and time series variables based on physiological variable values’ being outside the accepted normal range is associated with some improvement in model performance.
Resumo:
The thesis at hand adds to the existing literature by investigating the relationship between economic growth and outward foreign direct investments (OFDI) on a set of 16 emerging countries. Two different econometric techniques are employed: a panel data regression analysis and a time-series causality analysis. Results from the regression analysis indicate a positive and significant correlation between OFDI and economic growth. Additionally, the coefficient for the OFDI variable is robust in the sense specified by the Extreme Bound Analysis (EBA). On the other hand, the findings of the causality analysis are particularly heterogeneous. The vector autoregression (VAR) and the vector error correction model (VECM) approaches identify unidirectional Granger causality running either from OFDI to GDP or from GDP to OFDI in six countries. In four economies causality among the two variables is bidirectional, whereas in five countries no causality relationship between OFDI and GDP seems to be present.
Resumo:
Travel time prediction has long been the topic of transportation research. But most relevant prediction models in the literature are limited to motorways. Travel time prediction on arterial networks is challenging due to involving traffic signals and significant variability of individual vehicle travel time. The limited availability of traffic data from arterial networks makes travel time prediction even more challenging. Recently, there has been significant interest of exploiting Bluetooth data for travel time estimation. This research analysed the real travel time data collected by the Brisbane City Council using the Bluetooth technology on arterials. Databases, including experienced average daily travel time are created and classified for approximately 8 months. Thereafter, based on data characteristics, Seasonal Auto Regressive Integrated Moving Average (SARIMA) modelling is applied on the database for short-term travel time prediction. The SARMIA model not only takes the previous continuous lags into account, but also uses the values from the same time of previous days for travel time prediction. This is carried out by defining a seasonality coefficient which improves the accuracy of travel time prediction in linear models. The accuracy, robustness and transferability of the model are evaluated through comparing the real and predicted values on three sites within Brisbane network. The results contain the detailed validation for different prediction horizons (5 min to 90 minutes). The model performance is evaluated mainly on congested periods and compared to the naive technique of considering the historical average.
Resumo:
The ILR Impact Brief series highlights the research and project based work conducted by ILR faculty that is relevant to workplace issues and public policy. Brief #1 highlights the authors' research on employment arbitration, including a survey of the National Academy of Arbitrators.
Resumo:
We develop general model-free adjustment procedures for the calculation of unbiased volatility loss functions based on practically feasible realized volatility benchmarks. The procedures, which exploit recent nonparametric asymptotic distributional results, are both easy-to-implement and highly accurate in empirically realistic situations. We also illustrate that properly accounting for the measurement errors in the volatility forecast evaluations reported in the existing literature can result in markedly higher estimates for the true degree of return volatility predictability.
Resumo:
This note develops general model-free adjustment procedures for the calculation of unbiased volatility loss functions based on practically feasible realized volatility benchmarks. The procedures, which exploit the recent asymptotic distributional results in Barndorff-Nielsen and Shephard (2002a), are both easy to implement and highly accurate in empirically realistic situations. On properly accounting for the measurement errors in the volatility forecast evaluations reported in Andersen, Bollerslev, Diebold and Labys (2003), the adjustments result in markedly higher estimates for the true degree of return-volatility predictability.
Resumo:
BACKGROUND: In clinical practice a diagnosis is based on a combination of clinical history, physical examination and additional diagnostic tests. At present, studies on diagnostic research often report the accuracy of tests without taking into account the information already known from history and examination. Due to this lack of information, together with variations in design and quality of studies, conventional meta-analyses based on these studies will not show the accuracy of the tests in real practice. By using individual patient data (IPD) to perform meta-analyses, the accuracy of tests can be assessed in relation to other patient characteristics and allows the development or evaluation of diagnostic algorithms for individual patients. In this study we will examine these potential benefits in four clinical diagnostic problems in the field of gynaecology, obstetrics and reproductive medicine. METHODS/DESIGN: Based on earlier systematic reviews for each of the four clinical problems, studies are considered for inclusion. The first authors of the included studies will be invited to participate and share their original data. After assessment of validity and completeness the acquired datasets are merged. Based on these data, a series of analyses will be performed, including a systematic comparison of the results of the IPD meta-analysis with those of a conventional meta-analysis, development of multivariable models for clinical history alone and for the combination of history, physical examination and relevant diagnostic tests and development of clinical prediction rules for the individual patients. These will be made accessible for clinicians. DISCUSSION: The use of IPD meta-analysis will allow evaluating accuracy of diagnostic tests in relation to other relevant information. Ultimately, this could increase the efficiency of the diagnostic work-up, e.g. by reducing the need for invasive tests and/or improving the accuracy of the diagnostic workup. This study will assess whether these benefits of IPD meta-analysis over conventional meta-analysis can be exploited and will provide a framework for future IPD meta-analyses in diagnostic and prognostic research.