106 results for Misclassification
Abstract:
Objective: To examine the sources of coding discrepancy in injury morbidity data and to explore the implications of these sources for injury surveillance. Method: An on-site medical record review and recoding study was conducted for 4373 injury-related hospital admissions across Australia. Codes from the original dataset were compared with the recoded data to explore the reliability of the coded data and the sources of discrepancy. Results: The most common reason for differences in coding overall was assignment of the case to a different external cause category, with 8.5% of cases assigned to a different category. Differences in the specificity of codes assigned within a category accounted for 7.8% of coder differences. Differences in intent assignment accounted for 3.7% of the differences in code assignment. Conclusions: Where 8% of cases are misclassified by major category, setting injury targets on the basis of the extent of burden is a somewhat blunt instrument. Monitoring the effect of prevention programs aimed at reducing risk factors is not possible in datasets with this level of misclassification error in injury cause subcategories. Future research is needed to build the evidence base on the quality and utility of the ICD classification system and its application to injury surveillance in the hospital environment.
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains, and sudden cardiac death continues to be a presenting feature for some patients subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia-related complications. Stress electrocardiography/exercise testing is predictive of 10-year risk of CVD events, and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to these data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages, and data summarisation and data abstraction. Feature selection methods of both wrapper and filter types are applied to the derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of variables that are highly correlated with the class but non-redundant is assessed. The major dataset is derived from the entire anaesthesia population, and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads).
Because of the unbalanced class distribution in the data, majority-class under-sampling is used, and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), is used to evaluate models generated with different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be the most consistently effective, although Consistency-derived subsets tended to slightly increase accuracy but markedly increase complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This influence could be eliminated by considering the AUC or Kappa statistic, as well as by evaluating subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection reduces MR to 9.8 to 10.16, with MR of 9.8 for time-segmented summary data (dataset F) and 9.92 for raw time-series summary data (dataset A). However, for all datasets based only on time series, model complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series-only datasets, but models derived from these subsets consist of one leaf only. MR values are consistent with the class distribution in the subset folds evaluated in the n-fold cross-validation method. For models based on Cfs-selected time-series-derived and risk factor (RF) variables, MR ranges from 8.83 to 10.36, with 8.85 for dataset RF_A (raw time-series data and RF) and 9.09 for dataset RF_F (time-segmented time-series variables and RF).
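The point that misclassification rate depends on class distribution, whereas the Kappa statistic corrects for chance agreement, can be illustrated with a minimal Python sketch. The confusion-matrix counts in the usage lines are hypothetical, not taken from the study, and `undersample_majority` is an illustrative helper rather than the study's implementation.

```python
import random
from collections import Counter


def misclassification_rate(tp, fp, fn, tn):
    """Fraction of cases whose predicted class differs from the true class."""
    return (fp + fn) / (tp + fp + fn + tn)


def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 confusion matrix (agreement beyond chance)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: products of the predicted and actual class marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def undersample_majority(labels, seed=0):
    """Indices of a class-balanced subset: every class is randomly reduced,
    without replacement, to the size of the rarest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)


# Hypothetical data: 100 cases, 10 with CVD, and a model that always predicts "no CVD".
print(misclassification_rate(tp=0, fp=0, fn=10, tn=90))  # 0.1: looks accurate
print(cohens_kappa(tp=0, fp=0, fn=10, tn=90))            # 0.0: no skill beyond chance

labels = [0] * 90 + [1] * 10
balanced = undersample_majority(labels)
print(Counter(labels[i] for i in balanced))              # 10 cases of each class
```

On the balanced subset, the same "always negative" model would have an MR of 0.5, which exposes its lack of skill in a way the raw MR of 0.1 does not.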
The models based on counts of outliers and counts of data points outside the normal range (dataset RF_E) and on derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (dataset RF_G) perform least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine-based method, SMO, have the highest MR, 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method J48 have MR of 8.85 and 9.0 respectively. DT rules are the most comprehensible and clinically relevant. The increase in predictive accuracy achieved by adding risk factor variables to time-series-based models is significant. Adding time-series-derived variables to models based on risk factor variables alone is associated with a trend towards improved performance. Data mining of feature-reduced anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules that are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables compared with risk factors alone is consistent with recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input.
In the absence of risk factor input, the use of time-series variables after outlier removal, and of time-series variables based on physiological values being outside the accepted normal range, is associated with some improvement in model performance.
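SAX, mentioned above as the basis of dataset RF_G, can be sketched as follows: z-normalise the series, average it over equal segments (Piecewise Aggregate Approximation), then map each segment mean to a letter using breakpoints that cut the standard normal distribution into equiprobable regions. This is a simplified sketch under those textbook assumptions (integer segment boundaries, no pattern-clustering step), not the study's pre-processing code.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Symbolic Aggregate approXimation of a numeric series (simplified sketch)."""
    if not 1 <= n_segments <= len(series):
        raise ValueError("need 1 <= n_segments <= len(series)")
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series maps to all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each (roughly) equal segment.
    n = len(z)
    paa = [mean(z[k * n // n_segments:(k + 1) * n // n_segments])
           for k in range(n_segments)]
    # Breakpoints splitting N(0, 1) into alphabet_size equiprobable regions.
    std_normal = NormalDist()
    breakpoints = [std_normal.inv_cdf(i / alphabet_size)
                   for i in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # Each segment mean gets the letter of the region it falls in.
    return "".join(letters[sum(v > b for b in breakpoints)] for v in paa)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
```

A steadily rising signal becomes "abcd" and a falling one "dcba"; such words can then be counted or clustered as pattern features.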
Abstract:
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A³√((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.
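In display form, the bound stated above reads as follows. The symbol $\hat{\varepsilon}_m$ (our label for the squared-error-related estimate on the $m$ training patterns) and the $\tilde{O}$ notation (hiding constants and the omitted $\log A$ and $\log m$ factors) are our notation and not necessarily the paper's.

```latex
\Pr\bigl[\text{misclassification}\bigr]
  \;\le\; \hat{\varepsilon}_m
  \;+\; \tilde{O}\!\left( A^{3} \sqrt{\frac{\log n}{m}} \right)
```

The number of weights does not appear on the right-hand side: only the per-unit weight-magnitude bound $A$, the input dimension $n$ and the sample size $m$ do, which is why keeping $A$ small (for example by weight decay) can matter more than limiting network size.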
Abstract:
This study determined the rates of, and indications for, revision in cemented, uncemented, hybrid and resurfacing groups from NJR (6th edition) data. Data validity was determined by interrogating the data for episodes of misclassification. We identified 6,034 (2.7%) misclassified episodes, containing 97 (4.3%) revisions. Kaplan-Meier revision rates at 3 years were 0.9% for cemented, 1.9% for uncemented, 1.2% for hybrid and 3.0% for resurfacing arthroplasties (significant difference across all groups, p<0.001, with an identical pattern in patients <55 years). Regression analysis indicated that both prosthesis group and age significantly influenced failure (p<0.001). Revision rates for pain, aseptic loosening and malalignment were highest in uncemented and resurfacing arthroplasty. Revision for dislocation was highest in uncemented hips (significant difference between groups, p<0.001). Feedback on data misclassification has been provided to the NJR for future analyses. © 2012 Wichtig Editore.
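The Kaplan-Meier revision rates quoted above are 1 − S(t) read off a Kaplan-Meier survival curve, where revision is the event and hips without revision are censored at their last follow-up. A minimal Python sketch, using made-up follow-up data rather than NJR records; the function names are illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times  -- follow-up time for each hip (e.g. years)
    events -- 1 if the hip was revised at that time, 0 if censored
    Returns a list of (time, survival) steps at each distinct revision time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        revised = removed = 0
        # Group all observations tied at time t (censored ones still count as at risk).
        while i < len(data) and data[i][0] == t:
            revised += data[i][1]
            removed += 1
            i += 1
        if revised:
            survival *= 1 - revised / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def revision_rate_at(curve, t):
    """Cumulative revision probability 1 - S(t) from a Kaplan-Meier curve."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return 1 - s


# Hypothetical follow-up (years) for six hips; 1 = revised, 0 = censored.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 0])
print(round(revision_rate_at(curve, 3), 3))
```

Censoring is what distinguishes this from a simple proportion: hips followed for less than three years still contribute to the risk sets up to their last follow-up.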
Abstract:
Background: China has one of the highest suicide rates in the world; however, recent trends in suicide have not been adequately studied. This study aimed to examine potential changes in suicide rates and characteristics in a Chinese population. Methods: Data on suicide deaths in 1991–2010 were extracted from the Shandong Disease Surveillance Point (DSP) mortality dataset based on ICD-10 codes. The temporal trend in age-adjusted suicide rates for each subpopulation was tested using log-linear Poisson regression analysis. Results: From 1991 to 2010, there was a marked decrease in the overall suicide rate in Shandong, with an average reduction of 8% per year. The decreasing trend was stronger in rural than in urban areas and more evident in females than in males. Similar decreases were observed for all age groups. Pesticide ingestion and hanging remained the top two methods of suicide. Limitations: There are likely quality concerns in the mortality data, such as underreporting and misclassification, as well as low accuracy in determining the underlying causes of death. The representativeness of the DSP system may also be problematic because of rapid economic and demographic change. Conclusions: Completed suicides in Shandong have declined sharply over the past 20 years. Higher rates in females than in males and in rural than in urban areas, previously considered distinguishing features of suicide in China, are becoming less pronounced.
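A log-linear Poisson trend model expresses the expected number of deaths as log E[deaths] = log(population) + a + b·year, so an average reduction of 8% per year corresponds to exp(b) = 0.92, i.e. b ≈ −0.083. The sketch below fits this model by Newton-Raphson in plain Python. It assumes crude counts with a population offset (the study used age-adjusted rates, which are not reproduced here), and the series is hypothetical.

```python
import math


def poisson_trend(years, deaths, population, max_iter=50, tol=1e-10):
    """Fit log E[deaths_t] = log(population_t) + a + b * (year_t - mean_year)
    by Newton-Raphson on the Poisson log-likelihood; returns (a, b)."""
    t_bar = sum(years) / len(years)
    t = [y - t_bar for y in years]  # centred years keep a and b nearly uncorrelated
    a = math.log(sum(deaths) / sum(population))  # start at the pooled crude rate
    b = 0.0                                       # and a flat trend
    for _ in range(max_iter):
        mu = [p * math.exp(a + b * ti) for p, ti in zip(population, t)]
        # Score vector (gradient of the log-likelihood).
        ua = sum(d - m for d, m in zip(deaths, mu))
        ub = sum(ti * (d - m) for ti, d, m in zip(t, deaths, mu))
        # Fisher information matrix [[iaa, iab], [iab, ibb]].
        iaa = sum(mu)
        iab = sum(ti * m for ti, m in zip(t, mu))
        ibb = sum(ti * ti * m for ti, m in zip(t, mu))
        det = iaa * ibb - iab * iab
        # Newton step: inverse information matrix times score.
        da = (ibb * ua - iab * ub) / det
        db = (iaa * ub - iab * ua) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


def annual_percent_change(b):
    """Percentage change in the rate per year implied by the slope b."""
    return (math.exp(b) - 1) * 100


# Hypothetical series: rate falls by exactly 8% per year from 30 per 100,000.
years = list(range(1991, 2011))
population = [1_000_000] * len(years)
deaths = [300 * 0.92 ** (y - 1991) for y in years]
a, b = poisson_trend(years, deaths, population)
print(round(b, 4), round(annual_percent_change(b), 2))  # b ≈ ln(0.92), about -8% per year
```

In practice a GLM routine with a log link and an offset would be used, and would also provide the standard error needed to test the trend.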
Abstract:
Objectives: To evaluate the clinical value of pre-operative serum CA-125 in predicting the presence of extra-uterine disease in patients with apparent early-stage endometrial cancer. Methods: Between October 6, 2005 and June 17, 2010, 760 patients were enrolled in an international, multicentre, prospective randomized trial (LACE) comparing laparotomy with laparoscopy in the management of endometrial cancer apparently confined to the uterus. This study is based on data from 657 patients with endometrial adenocarcinoma who had a pre-operative serum CA-125 value, and was undertaken to correlate pre-operative serum CA-125 with final stage. Results: A pre-operative CA-125 cutpoint of 30 U/ml was associated with the smallest misclassification error (14.5%) using a multiple cross-validation method. Median pre-operative serum CA-125 was 14 U/ml, and using a cutpoint of 30 U/ml, 14.9% of patients had elevated CA-125 levels. Of 98 patients with an elevated CA-125 level, 36 (36.7%) had evidence of extra-uterine disease. Of the 116 patients (17.7%) with evidence of extra-uterine disease, 31.0% had an elevated CA-125 level. In univariate and multivariate logistic regression analysis, only pre-operative CA-125 level was found to be associated with extra-uterine spread of disease. A cutpoint of 30 U/ml achieved a sensitivity, specificity, positive predictive value and negative predictive value of 31.0%, 88.5%, 36.7% and 85.7% respectively. Overall, 326/657 (49.6%) patients had full surgical staging involving lymph node dissection. When analysis was limited to patients who had undergone full surgical staging, the outcomes remained essentially unchanged. Conclusions: Elevated CA-125 above 30 U/ml in patients with apparent early-stage disease is associated with a sensitivity of 31.0% and specificity of 88.5% in detecting extra-uterine disease. Pre-operative identification of this risk factor may help triage patients to tertiary centres and comprehensive surgical staging.
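The four percentages reported above follow from a single 2x2 table, which can be reconstructed from the counts in the abstract (36 true positives among 98 elevated results and 116 patients with extra-uterine disease, out of 657). A short Python check; only the true-negative count (479) is derived rather than reported, and `diagnostic_metrics` is an illustrative helper.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV of a test from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients with a positive test
        "specificity": tn / (tn + fp),  # disease-free patients with a negative test
        "ppv": tp / (tp + fp),          # positive tests that are true positives
        "npv": tn / (tn + fn),          # negative tests that are true negatives
    }


# Counts reported in the abstract.
total, elevated, extra_uterine, both = 657, 98, 116, 36
tp = both                  # elevated CA-125 and extra-uterine disease
fp = elevated - both       # elevated CA-125, no extra-uterine disease
fn = extra_uterine - both  # extra-uterine disease, CA-125 not elevated
tn = total - tp - fp - fn  # neither
metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

This prints 31.0%, 88.5%, 36.7% and 85.7%, matching the reported values. The raw error rate of this table, (62 + 80)/657 ≈ 21.6%, is a different quantity from the cross-validated misclassification error of 14.5% quoted in the abstract.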
Abstract:
Purpose: Flat-detector, cone-beam computed tomography (CBCT) has enormous potential to improve the accuracy of treatment delivery in image-guided radiotherapy (IGRT). To assist radiotherapists in interpreting these images, we use a Bayesian statistical model to label each voxel according to its tissue type. Methods: The rich sources of prior information in IGRT are incorporated into a hidden Markov random field (MRF) model of the 3D image lattice. Tissue densities in the reference CT scan are estimated using inverse regression and then rescaled to approximate the corresponding CBCT intensity values. The treatment planning contours are combined with published studies of physiological variability to produce a spatial prior distribution for changes in the size, shape and position of the tumour volume and organs at risk (OAR). The voxel labels are estimated using the iterated conditional modes (ICM) algorithm. Results: The accuracy of the method has been evaluated using 27 CBCT scans of an electron density phantom (CIRS, Inc. model 062). The mean voxel-wise misclassification rate was 6.2%, with Dice similarity coefficient of 0.73 for liver, muscle, breast and adipose tissue. Conclusions: By incorporating prior information, we are able to successfully segment CBCT images. This could be a viable approach for automated, online image analysis in radiotherapy.
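The two accuracy measures reported above can be computed from a reference labelling and an estimated labelling of the same voxels. A minimal Python sketch on flattened label lists; the six-voxel volumes are hypothetical, and how the study averaged Dice over tissues is not stated in the abstract, so only per-tissue values are shown.

```python
def voxel_misclassification_rate(true_labels, pred_labels):
    """Fraction of voxels whose estimated label differs from the reference label."""
    wrong = sum(t != p for t, p in zip(true_labels, pred_labels))
    return wrong / len(true_labels)


def dice(true_labels, pred_labels, tissue):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for one tissue label."""
    a = {i for i, t in enumerate(true_labels) if t == tissue}
    b = {i for i, p in enumerate(pred_labels) if p == tissue}
    if not a and not b:
        return 1.0  # tissue absent from both labellings: treat as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical flattened label volumes for six voxels.
reference = ["liver", "liver", "muscle", "adipose", "breast", "breast"]
estimated = ["liver", "muscle", "muscle", "adipose", "breast", "adipose"]
print(voxel_misclassification_rate(reference, estimated))  # 2 of 6 voxels wrong
for tissue in ("liver", "muscle", "breast", "adipose"):
    print(tissue, round(dice(reference, estimated, tissue), 2))
```

The misclassification rate summarises all tissues at once, while Dice is computed per tissue and penalises both missed and spurious voxels of that tissue.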
Abstract:
Cone-beam computed tomography (CBCT) has enormous potential to improve the accuracy of treatment delivery in image-guided radiotherapy (IGRT). To assist radiotherapists in interpreting these images, we use a Bayesian statistical model to label each voxel according to its tissue type. The rich sources of prior information in IGRT are incorporated into a hidden Markov random field model of the 3D image lattice. Tissue densities in the reference CT scan are estimated using inverse regression and then rescaled to approximate the corresponding CBCT intensity values. The treatment planning contours are combined with published studies of physiological variability to produce a spatial prior distribution for changes in the size, shape and position of the tumour volume and organs at risk. The voxel labels are estimated using iterated conditional modes. The accuracy of the method has been evaluated using 27 CBCT scans of an electron density phantom. The mean voxel-wise misclassification rate was 6.2%, with Dice similarity coefficient of 0.73 for liver, muscle, breast and adipose tissue. By incorporating prior information, we are able to successfully segment CBCT images. This could be a viable approach for automated, online image analysis in radiotherapy.
Abstract:
Current stocks of the LCC15-MB cell line, which we originally isolated from a human breast-bone metastasis, were found to be genetically matched to the MDA-MB-435 cell line from the Lombardi Cancer Center (MDA-MB-435-LCC) using comparative genomic hybridisation, DNA microsatellite analysis and chromosome number. LCC15-MB stocks used for our previously published studies, as well as the earliest available LCC15-MB cells, also showed identity to MDA-MB-435-LCC cells. The original karyotype reported for LCC15-MB cells was considerably different from that of MDA-MB-435 cells, indicating that the original LCC15-MB cells were lost to contamination by MDA-MB-435-LCC cells. Chromosome number is the simplest test to distinguish original LCC15-MB cells (n ∼ 75) from MDA-MB-435 cells (n ∼ 52). Collectively, our results prove that the LCC15-MB cells currently available are MDA-MB-435 cells, and we suggest their re-designation as MDA-MB-435-LCC15 cells. We also review the known misclassification of breast and prostate cancer cell lines to date and have initiated a register maintained at http://www.svi.edu.au/cell_lines_registry.doc.
Abstract:
The unique physical and movement characteristics of children necessitate the development of population-specific accelerometer equations and cut points. The purpose of this study is to develop an ecologically valid cut point for the Biotrainer Pro monitor that reflects a threshold for moderate-intensity physical activity in elementary school children. A sample of 30 children (ages 8-12) wore a Biotrainer monitor while completing a series of 7 movement tasks (calibration phase) and while participating in an organized group activity (cross-validation phase). Videotapes from each session were processed using a computerized direct-observation technique to provide a criterion measure of physical activity. Analyses involved mixed-model regression and receiver operating characteristic (ROC) curves. The results indicated that a cut point of 4 counts/min provides the optimal balance between sensitivity (correctly detecting activity) and specificity (limiting misclassification of inactivity as activity). With the cross-validation data, this value yielded the best overall kappa (.58) and high classification agreement (84%) for activity determination. The specificity of 93% indicates that the proposed cut point rarely classifies inactivity as activity; however, the lower sensitivity of 61% suggests that some minutes of activity might be incorrectly classified as inactivity. The cut point of 4 counts/min provides an ecologically valid threshold for capturing physical activity in children with the Biotrainer Pro activity monitor.
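Choosing a cut point from ROC analysis means scanning candidate thresholds and trading sensitivity against specificity. The abstract does not say which criterion defined the "optimal balance", so the sketch below assumes Youden's J (sensitivity + specificity − 1), a common choice. The minute-by-minute counts and activity labels are hypothetical, and both active and inactive minutes must be present.

```python
def sens_spec(counts, active, cut):
    """Sensitivity and specificity when a minute is called active if counts >= cut."""
    tp = sum(1 for c, a in zip(counts, active) if a and c >= cut)
    fn = sum(1 for c, a in zip(counts, active) if a and c < cut)
    tn = sum(1 for c, a in zip(counts, active) if not a and c < cut)
    fp = sum(1 for c, a in zip(counts, active) if not a and c >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def best_cut_point(counts, active):
    """Cut point maximising Youden's J; returns (cut, J, sensitivity, specificity)."""
    best = None
    for cut in sorted(set(counts)):
        se, sp = sens_spec(counts, active, cut)
        j = se + sp - 1
        if best is None or j > best[1]:
            best = (cut, j, se, sp)
    return best


# Hypothetical monitor counts per minute and directly observed activity.
counts = [0, 1, 2, 3, 5, 6, 7, 8]
active = [False, False, False, False, True, True, True, True]
print(best_cut_point(counts, active))
```

Raising the cut point increases specificity and lowers sensitivity; a criterion other than Youden's J (for example a minimum required sensitivity) could select a different threshold from the same ROC curve.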
Provincial mortality in South Africa, 2000 - priority-setting for now and a benchmark for the future
Abstract:
Background. Cause-of-death statistics are an essential component of health information. Despite improvements, underregistration and misclassification of causes make it difficult to interpret the official death statistics. Objective. To estimate consistent cause-specific death rates for the year 2000 and to identify the leading causes of death and premature mortality in the provinces. Methods. Total number of deaths and population size were estimated using the Actuarial Society of South Africa ASSA2000 AIDS and demographic model. Cause-of-death profiles based on Statistics South Africa's 15% sample, adjusted for misclassification of deaths due to ill-defined causes and AIDS deaths due to indicator conditions, were applied to the total deaths by age and sex. Age-standardised rates and years of life lost were calculated using age weighting and discounting. Results. Life expectancy in KwaZulu-Natal and Mpumalanga is about 10 years lower than that in the Western Cape, the province with the lowest mortality rate. HIV/AIDS is the leading cause of premature mortality for all provinces. Mortality due to pre-transitional causes, such as diarrhoea, is more pronounced in the poorer and more rural provinces. In contrast, non-communicable disease mortality is similar across all provinces, although the cause profiles differ. Injury mortality rates are particularly high in provinces with large metropolitan areas and in Mpumalanga. Conclusion. The quadruple burden experienced in all provinces requires a broad range of interventions, including improved access to health care; ensuring that basic needs such as those related to water and sanitation are met; disease and injury prevention; and promotion of a healthy lifestyle. High death rates as a result of HIV/AIDS highlight the urgent need to accelerate the implementation of the treatment and prevention plan. 
In addition, there is an urgent need to improve the cause-of-death data system to provide reliable cause-of-death statistics at health district level.
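The age-standardised rates mentioned above remove differences in age structure between provinces by weighting each age-specific rate by a fixed standard population. A minimal sketch of direct standardisation; the age bands, counts and standard weights are hypothetical (the study's standard population is not stated here), and the age weighting and discounting used for years of life lost are not shown.

```python
def age_standardised_rate(deaths, population, standard_weights, per=100_000):
    """Directly age-standardised rate: age-specific rates weighted by a
    standard population's age distribution (weights must sum to 1)."""
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard population weights must sum to 1")
    rates = [d / p for d, p in zip(deaths, population)]
    return per * sum(w * r for w, r in zip(standard_weights, rates))


# Hypothetical province with three broad age bands (young, middle, old).
deaths = [200, 900, 3_000]
population = [400_000, 300_000, 100_000]
weights = [0.5, 0.35, 0.15]  # assumed standard population age structure
print(round(age_standardised_rate(deaths, population, weights), 1))
```

Because every province is weighted by the same standard age structure, a province with an older population does not appear to have higher mortality merely because of its age profile.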
Abstract:
We propose expected attainable discrimination (EAD) as a measure to select discrete valued features for reliable discrimination between two classes of data. EAD is an average of the area under the ROC curves obtained when a simple histogram probability density model is trained and tested on many random partitions of a data set. EAD can be incorporated into various stepwise search methods to determine promising subsets of features, particularly when misclassification costs are difficult or impossible to specify. Experimental application to the problem of risk prediction in pregnancy is described.
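The EAD idea, averaging the test-set AUC of a histogram density model over many random partitions, can be sketched for a single discrete feature as follows. The abstract leaves several details open, so this sketch assumes stratified 50/50 random partitions, add-one (Laplace) smoothing of the class histograms, and a log-likelihood-ratio score; it is not the authors' implementation, and the stepwise feature search is not shown.

```python
import math
import random
from collections import Counter


def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney statistic; tied scores count one half."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def expected_attainable_discrimination(values, labels, n_partitions=100, seed=0):
    """Mean test-set AUC of a class-conditional histogram model for one discrete
    feature, over random stratified half/half train/test partitions."""
    rng = random.Random(seed)
    n_values = len(set(values))  # size of the feature's discrete support
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in (0, 1)}
    if min(len(idx) for idx in by_class.values()) < 2:
        raise ValueError("need at least two cases of each class")
    aucs = []
    for _ in range(n_partitions):
        train, test = [], []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            train += shuffled[:half]
            test += shuffled[half:]
        # Histogram density per class, with add-one (Laplace) smoothing.
        hist = {c: Counter(values[i] for i in train if labels[i] == c) for c in (0, 1)}
        size = {c: sum(hist[c].values()) for c in (0, 1)}

        def log_ratio(v):
            p1 = (hist[1][v] + 1) / (size[1] + n_values)
            p0 = (hist[0][v] + 1) / (size[0] + n_values)
            return math.log(p1 / p0)

        pos = [log_ratio(values[i]) for i in test if labels[i] == 1]
        neg = [log_ratio(values[i]) for i in test if labels[i] == 0]
        aucs.append(auc(pos, neg))
    return sum(aucs) / len(aucs)


labels = [0] * 20 + [1] * 20
print(expected_attainable_discrimination([0] * 20 + [1] * 20, labels))  # separating feature
print(expected_attainable_discrimination([0] * 40, labels))             # constant feature
```

A perfectly separating feature scores 1.0 and an uninformative constant feature 0.5; because only ranks matter, no misclassification costs or decision threshold need to be specified.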
Abstract:
A novel shape recognition algorithm was developed to autonomously classify the Northern Pacific Sea Star (Asterias amurensis) in benthic images collected by the Starbug AUV along 6 km of transects in the Derwent estuary. Despite the effects of scattering, attenuation, soft focus and motion blur in the underwater images, an optimal joint classification rate of 77.5% and a misclassification rate of 13.5% were achieved. The performance of the algorithm was largely attributed to its ability to recognise locally deformed sea star shapes created during segmentation of the distorted images.
An external field prior for the hidden Potts model with application to cone-beam computed tomography
Abstract:
In images with low contrast-to-noise ratio (CNR), the information gain from the observed pixel values can be insufficient to distinguish foreground objects. A Bayesian approach to this problem is to incorporate prior information about the objects into a statistical model. A method for representing spatial prior information as an external field in a hidden Potts model is introduced. This prior distribution over the latent pixel labels is a mixture of Gaussian fields, centred on the positions of the objects at a previous point in time. It is particularly applicable in longitudinal imaging studies, where the manual segmentation of one image can be used as a prior for automatic segmentation of subsequent images. The method is demonstrated by application to cone-beam computed tomography (CT), an imaging modality that exhibits distortions in pixel values due to X-ray scatter. The external field prior results in a substantial improvement in segmentation accuracy, reducing the mean pixel misclassification rate for an electron density phantom from 87% to 6%. The method is also applied to radiotherapy patient data, demonstrating how to derive the external field prior in a clinical context.
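This abstract does not say how the labels are estimated under the external field prior; the related CBCT abstract above uses iterated conditional modes (ICM), so the sketch below shows ICM for a hidden Potts model on a 2D grid with an optional external field term. It assumes a Gaussian likelihood with known per-label means and a common noise standard deviation, and takes the external field as precomputed log-prior values per pixel and label. Building that field from a mixture of Gaussian fields centred on earlier object positions is not shown, and all names and parameters are illustrative.

```python
def icm_potts(y, means, sigma, beta, field=None, max_sweeps=20):
    """Iterated conditional modes for a hidden Potts model on a 2D grid.

    y      -- 2D list of observed pixel intensities
    means  -- assumed Gaussian mean intensity for each label
    sigma  -- assumed common noise standard deviation
    beta   -- Potts interaction strength (rewards agreeing 4-neighbours)
    field  -- optional external field: field[i][j][k] is added to the
              log-probability of label k at pixel (i, j)
    """
    rows, cols, n_labels = len(y), len(y[0]), len(means)
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))

    def local_score(i, j, k, z):
        # Gaussian log-likelihood up to a constant shared by all labels.
        s = -0.5 * ((y[i][j] - means[k]) / sigma) ** 2
        if field is not None:
            s += field[i][j][k]  # external field: prior preference for label k here
        if z is not None:
            for di, dj in neighbours:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and z[ni][nj] == k:
                    s += beta  # Potts term: reward agreement with each neighbour
        return s

    def best_label(i, j, z):
        return max(range(n_labels), key=lambda k: local_score(i, j, k, z))

    # Initial labels ignore the neighbourhood term.
    z = [[best_label(i, j, None) for j in range(cols)] for i in range(rows)]
    for _ in range(max_sweeps):
        changed = 0
        for i in range(rows):
            for j in range(cols):
                k = best_label(i, j, z)
                if k != z[i][j]:
                    z[i][j] = k
                    changed += 1
        if changed == 0:  # no pixel changed: a local (coordinate-wise) mode
            break
    return z


# Hypothetical 3x3 image whose intensities sit halfway between the two label means,
# so the likelihood alone cannot separate background (0) from object (1).
image = [[0.5] * 3 for _ in range(3)]
prior = [[[0.0, 1.0] if j == 2 else [1.0, 0.0] for j in range(3)] for _ in range(3)]
print(icm_potts(image, means=[0.0, 1.0], sigma=0.5, beta=0.5, field=prior))
# [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
```

When the contrast-to-noise ratio is too low for the pixel values to decide, the external field supplies the missing information, while the Potts term keeps the resulting labelling spatially smooth.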