327 results for Data Accuracy
Dynamic analysis of on-board mass data to determine tampering in heavy vehicle on-board mass systems
Abstract:
Transport Certification Australia Limited, jointly with the National Transport Commission, has undertaken a project to investigate the feasibility of on-board mass monitoring (OBM) devices for regulatory purposes. OBM increases jurisdictional confidence in operational heavy vehicle compliance. This paper covers technical issues regarding the potential use of dynamic data from OBM systems to indicate that tampering has occurred. The tamper-evidence and accuracy of current OBM systems needed to be determined before any regulatory schemes were put in place for their use. Tests were performed to determine the potential for, and ease of, tampering. An algorithm was developed to detect tamper events, and its results are detailed.
Abstract:
A pragmatic method is proposed for assessing the accuracy and precision of a given processing pipeline for converting computed tomography (CT) image data of bones into representative three-dimensional (3D) models of bone shapes. The method is based on coprocessing a control object with known geometry, which enables the assessment of the quality of the resulting 3D models. At three stages of the conversion process, distance measurements were obtained and statistically evaluated. For this study, 31 CT datasets were processed. The final 3D model of the control object contained an average deviation from reference values of −1.07±0.52 mm standard deviation (SD) for edge distances and −0.647±0.43 mm SD for parallel side distances of the control object. Coprocessing a reference object enables the assessment of the accuracy and precision of a given processing pipeline for creating CT-based 3D bone models and is suitable for detecting most systematic or human errors when processing a CT scan. Typical errors have about the same size as the scan resolution.
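The deviation figures quoted above (mean ± SD against known reference distances) can be illustrated with a minimal sketch. The function name `deviation_summary` and the measurement values are hypothetical and are not taken from the study; the sample standard deviation is assumed.

```python
import statistics

def deviation_summary(measured_mm, reference_mm):
    """Return (mean, sample SD) of measured-minus-reference distances in mm."""
    deviations = [m - reference_mm for m in measured_mm]
    return statistics.mean(deviations), statistics.stdev(deviations)

# Hypothetical edge-distance measurements of a control object whose
# true edge length is assumed to be 50.0 mm.
mean_dev, sd_dev = deviation_summary([49.0, 48.5, 49.5], 50.0)
print(f"{mean_dev:.2f} ± {sd_dev:.2f} mm")
```

A negative mean indicates that the model systematically underestimates the reference distance, while the SD reflects the precision of the pipeline.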
Abstract:
Habitat models are widely used in ecology; however, there are relatively few studies of rare species, primarily because of a paucity of survey records and a lack of robust means of assessing the accuracy of modelled spatial predictions. We investigated the potential of compiled ecological data in developing habitat models for Macadamia integrifolia, a vulnerable mid-stratum tree endemic to lowland subtropical rainforests of southeast Queensland, Australia. We compared the performance of two binomial models, Classification and Regression Trees (CART) and Generalised Additive Models (GAM), with Maximum Entropy (MAXENT) models developed from (i) presence records and available absence data and (ii) presence records and background data. The GAM model was the best performer across the range of evaluation measures employed; however, all models were assessed as potentially useful for informing in situ conservation of M. integrifolia. A significant loss in the amount of M. integrifolia habitat has occurred (p < 0.05), with only 37% of former habitat (pre-clearing) remaining in 2003. Remnant patches are significantly smaller, have larger edge-to-area ratios and are more isolated from each other compared to pre-clearing configurations (p < 0.05). Whilst the network of suitable habitat patches is still largely intact, there are numerous smaller patches that are more isolated in the contemporary landscape compared with their connectedness before clearing. These results suggest that in situ conservation of M. integrifolia may be best achieved through a landscape approach that considers the relative contribution of small remnant habitat fragments to the species as a whole, as well as facilitating connectivity among the entire network of habitat patches.
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). 
Because of the unbalanced class distribution in the data, majority class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be most consistently effective, although Consistency-derived subsets tended towards slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This influence could be eliminated by consideration of the AUC or Kappa statistic, as well as by evaluation of subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in a reduction in MR to between 9.8 and 10.16, with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-fold cross-validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36 with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09.
The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_G) perform the least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature-reduced anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input.
In the absence of risk factor input, the use of time-series variables after outlier removal and time-series variables based on physiological variable values being outside the accepted normal range is associated with some improvement in model performance.
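The abstract's remark that misclassification rate (MR) is influenced by class distribution, whereas the Kappa statistic is not, can be shown with a minimal sketch. The confusion-matrix counts below are hypothetical (a 10%-positive dataset and a one-leaf model that always predicts the majority class); they are not the study's data.

```python
def misclassification_rate(tp, fp, fn, tn):
    """Fraction of all cases that were classified incorrectly."""
    return (fp + fn) / (tp + fp + fn + tn)

def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    chance_pos = ((tp + fn) / total) * ((tp + fp) / total)
    chance_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = chance_pos + chance_neg
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present and predicted
    return (observed - expected) / (1.0 - expected)

# Hypothetical one-leaf model: every case is predicted negative.
one_leaf = dict(tp=0, fp=0, fn=10, tn=90)
print(misclassification_rate(**one_leaf))  # equals the minority-class share
print(cohen_kappa(**one_leaf))             # no agreement beyond chance
```

Under these assumed counts the MR simply mirrors the minority-class proportion, while kappa shows that the model has no skill beyond chance.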
Abstract:
Aims: To develop clinical protocols for acquiring PET images, performing CT-PET registration and tumour volume definition based on the PET image data, for radiotherapy for lung cancer patients and then to test these protocols with respect to levels of accuracy and reproducibility. Method: A phantom-based quality assurance study of the processes associated with using registered CT and PET scans for tumour volume definition was conducted to: (1) investigate image acquisition and manipulation techniques for registering and contouring CT and PET images in a radiotherapy treatment planning system, and (2) determine technology-based errors in the registration and contouring processes. The outcomes of the phantom image based quality assurance study were used to determine clinical protocols. Protocols were developed for (1) acquiring patient PET image data for incorporation into the 3DCRT process, particularly for ensuring that the patient is positioned in their treatment position; (2) CT-PET image registration techniques and (3) GTV definition using the PET image data. The developed clinical protocols were tested using retrospective clinical trials to assess levels of inter-user variability which may be attributed to the use of these protocols. A Siemens Somatom Open Sensation 20 slice CT scanner and a Philips Allegro stand-alone PET scanner were used to acquire the images for this research. The Philips Pinnacle3 treatment planning system was used to perform the image registration and contouring of the CT and PET images. Results: Both the attenuation-corrected and transmission images obtained from standard whole-body PET staging clinical scanning protocols were acquired and imported into the treatment planning system for the phantom-based quality assurance study. Protocols for manipulating the PET images in the treatment planning system, particularly for quantifying uptake in volumes of interest and window levels for accurate geometric visualisation were determined. 
The automatic registration algorithms were found to have sub-voxel levels of accuracy, with transmission scan-based CT-PET registration more accurate than emission scan-based registration of the phantom images. Respiration-induced image artifacts were not found to influence registration accuracy, while inadequate pre-registration overlap of the CT and PET images was found to result in large registration errors. A threshold value based on a percentage of the maximum uptake within a volume of interest was found to accurately contour the different features of the phantom despite the lower spatial resolution of the PET images. Appropriate selection of the threshold value is dependent on target-to-background ratios and the presence of respiratory motion. The results from the phantom-based study were used to design, implement and test clinical CT-PET fusion protocols. The patient PET image acquisition protocols enabled patients to be successfully identified and positioned in their radiotherapy treatment position during the acquisition of their whole-body PET staging scan. While automatic registration techniques were found to reduce inter-user variation compared to manual techniques, there was no significant difference in the registration outcomes for transmission or emission scan-based registration of the patient images using the protocol. Tumour volumes contoured on registered patient CT-PET images using the tested threshold values and viewing windows determined from the phantom study demonstrated less inter-user variation for the primary tumour volume contours than those contoured using only the patient’s planning CT scans. Conclusions: The developed clinical protocols allow a patient’s whole-body PET staging scan to be incorporated, manipulated and quantified in the treatment planning process to improve the accuracy of gross tumour volume localisation in 3D conformal radiotherapy for lung cancer.
Image registration protocols which factor in potential software-based errors, combined with adequate user training, are recommended to increase the accuracy and reproducibility of registration outcomes. A semi-automated adaptive threshold contouring technique incorporating a PET windowing protocol accurately defines the geometric edge of a tumour volume using PET image data from a stand-alone PET scanner, including 4D target volumes.
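The idea of contouring with a threshold set as a percentage of the maximum uptake in a volume of interest can be sketched as follows. This is an illustration only: the function `threshold_mask`, the 2D slice and the 40% fraction are assumptions, since the abstract does not state the threshold values or the treatment planning system's method.

```python
def threshold_mask(uptake_slice, fraction):
    """Return a 0/1 mask of voxels at or above fraction * maximum uptake."""
    peak = max(max(row) for row in uptake_slice)
    cutoff = fraction * peak
    return [[1 if value >= cutoff else 0 for value in row]
            for row in uptake_slice]

# Hypothetical uptake values for one slice of a volume of interest.
uptake = [
    [1, 2, 2, 1],
    [2, 8, 9, 2],
    [2, 7, 10, 2],
    [1, 2, 2, 1],
]
for row in threshold_mask(uptake, 0.4):  # 40% is an assumed value
    print(row)
```

As the abstract notes, a suitable fraction depends on the target-to-background ratio and on respiratory motion, so a fixed value like the one above would need to be validated on phantom data.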
Abstract:
Maintenance activities in a large-scale engineering system are usually scheduled according to the lifetimes of various components in order to ensure the overall reliability of the system. Lifetimes of components can be deduced from the corresponding probability distributions, with parameters estimated from past failure data. When failure data for the components are not readily available, engineers have to be content with only the basic information from the manufacturers, such as the mean and standard deviation of lifetime, to plan maintenance activities. In this paper, the moment-based piecewise polynomial model (MPPM) is proposed to estimate the parameters of the reliability probability distribution of products when only the mean and standard deviation of the product lifetime are known. This method employs a group of polynomial functions to estimate the two parameters of the Weibull distribution according to the mathematical relationship between the shape parameter of the two-parameter Weibull distribution and the ratio of mean and standard deviation. Tests are carried out to evaluate the validity and accuracy of the proposed method, with a discussion of its suitability for applications. The proposed method is particularly useful for reliability-critical systems, such as railway and power systems, in which the maintenance activities are scheduled according to the expected lifetimes of the system components.
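The relationship the abstract relies on is that, for a two-parameter Weibull distribution, the ratio of standard deviation to mean depends only on the shape parameter. The sketch below solves that relationship directly by bisection; it is not the paper's MPPM, which approximates the same relationship with piecewise polynomials. The function names and search bounds are assumptions.

```python
import math

def weibull_cv(shape):
    """Coefficient of variation (SD / mean) of a Weibull with this shape."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / g1 ** 2 - 1.0)

def weibull_from_moments(mean, sd, lo=0.1, hi=100.0, tol=1e-10):
    """Estimate (shape, scale) from the mean and SD of lifetime.

    Uses bisection on the shape, relying on the CV decreasing as the
    shape increases. The bounds lo and hi are assumed search limits.
    """
    target = sd / mean
    if not (weibull_cv(hi) < target < weibull_cv(lo)):
        raise ValueError("SD/mean ratio is outside the supported range")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if weibull_cv(mid) > target:
            lo = mid  # CV too large, so the shape must be larger
        else:
            hi = mid
    shape = (lo + hi) / 2.0
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale

# A mean equal to the SD corresponds to the exponential case (shape of 1).
shape, scale = weibull_from_moments(mean=1000.0, sd=1000.0)
print(round(shape, 6), round(scale, 3))
```

A polynomial approximation such as the one the paper proposes avoids the iterative search, which may matter when the estimate has to be computed by hand or in a spreadsheet.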
Abstract:
Autonomous underwater gliders are robust and widely used ocean sampling platforms that are characterized by their endurance, and are one of the best approaches to gathering subsurface data at the appropriate spatial resolution to advance our knowledge of the ocean environment. Gliders generally do not employ sophisticated sensors for underwater localization, but instead dead-reckon between set waypoints. Thus, these vehicles are subject to large positional errors between prescribed and actual surfacing locations. Here, we investigate the incorporation of a large-scale, regional ocean model into the trajectory design for autonomous gliders to improve their navigational accuracy. We compute the dead-reckoning error for our Slocum gliders, and compare this to the average positional error recorded from multiple deployments conducted over the past year. We then compare trajectory plans computed on-board the vehicle during recent deployments to our prediction-based trajectory plans for 140 surfacing occurrences.
Applying incremental EM to Bayesian classifiers in the learning of hyperspectral remote sensing data
Abstract:
In this paper, we apply the incremental EM method to Bayesian Network Classifiers to learn and interpret hyperspectral sensor data in robotic planetary missions. Hyperspectral image spectroscopy is an emerging technique for geological investigations from airborne or orbital sensors. Many spacecraft carry spectroscopic equipment as wavelengths outside the visible light in the electromagnetic spectrum give much greater information about an object. The algorithm used is an extension to the standard Expectation Maximisation (EM). The incremental method allows us to learn and interpret the data as they become available. Two Bayesian network classifiers were tested: the Naive Bayes, and the Tree-Augmented-Naive Bayes structures. Our preliminary experiments show that incremental learning with unlabelled data can improve the accuracy of the classifier.
Abstract:
Objective: to assess the accuracy of data linkage across the spectrum of emergency care in the absence of a unique patient identifier, and to use the linked data to examine service delivery outcomes in an emergency department (ED) setting. Design: automated data linkage and manual data linkage were compared to determine their relative accuracy. Data were extracted from three separate health information systems (ambulance, ED and hospital inpatients), then linked to provide information about the emergency journey of each patient. The linking was done manually through physical review of records and automatically using a data linking tool (Health Data Integration) developed by the CSIRO. Match rate and quality of the linking were compared. Setting: 10,835 patient presentations to a large, regional teaching hospital ED over a two-month period (August-September 2007). Results: comparison of the manual and automated linkage outcomes for each pair of linked datasets demonstrated a sensitivity of between 95% and 99%; a specificity of between 75% and 99%; and a positive predictive value of between 88% and 95%. Conclusions: Our results indicate that automated linking provides a sound basis for health service analysis, even in the absence of a unique patient identifier. The use of an automated linking tool yields accurate data suitable for planning and service delivery purposes and enables the data to be linked regularly to examine service delivery outcomes.
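The sensitivity, specificity and positive predictive value reported above can be computed from two sets of links, as in the following minimal sketch. It assumes the manual linkage is treated as the reference standard, which the abstract implies but does not state; the function name and the record identifiers are hypothetical.

```python
def linkage_metrics(candidate_pairs, manual_links, automated_links):
    """Compare automated links with manual links (assumed reference)."""
    tp = len(manual_links & automated_links)   # linked by both methods
    fp = len(automated_links - manual_links)   # linked only automatically
    fn = len(manual_links - automated_links)   # missed by the automated tool
    tn = len(candidate_pairs - manual_links - automated_links)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

# Hypothetical ambulance (A*) and ED (E*) records: every combination
# is a candidate pair.
ambulance = ["A1", "A2", "A3"]
ed = ["E1", "E2", "E3"]
candidates = {(a, e) for a in ambulance for e in ed}
manual = {("A1", "E1"), ("A2", "E2"), ("A3", "E3")}
automated = {("A1", "E1"), ("A2", "E2"), ("A3", "E2")}
print(linkage_metrics(candidates, manual, automated))
```

The sketch does not guard against empty denominators, so it assumes each method produces at least one link and at least one non-link.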
Abstract:
Intelligible and accurate risk-based decision-making requires a complex balance of information from different sources, appropriate statistical analysis of this information and consequent intelligent inference and decisions made on the basis of these analyses. Importantly, this requires an explicit acknowledgement of uncertainty in the inputs and outputs of the statistical model. The aim of this paper is to progress a discussion of these issues in the context of several motivating problems related to the wider scope of agricultural production. These problems include biosecurity surveillance design, pest incursion, environmental monitoring and import risk assessment. The information to be integrated includes observational and experimental data, remotely sensed data and expert information. We describe our efforts in addressing these problems using Bayesian models and Bayesian networks. These approaches provide a coherent and transparent framework for modelling complex systems, combining the different information sources, and allowing for uncertainty in inputs and outputs. While the theory underlying Bayesian modelling has a long and well established history, its application is only now becoming more possible for complex problems, due to increased availability of methodological and computational tools. Of course, there are still hurdles and constraints, which we also address through sharing our endeavours and experiences.
Abstract:
Road safety is a major concern worldwide. Road safety will improve as road conditions and their effects on crashes are continually investigated. This paper proposes to use the capability of data mining to include the greater set of road variables for all available crashes with skid resistance values across the Queensland state main road network in order to understand the relationships among crash, traffic and road variables. This paper presents a data mining based methodology for road asset management data to identify the road properties that contribute unduly to crashes. The models demonstrate high levels of accuracy in predicting crashes on roads when various road properties are included. This paper presents the findings of these models to show the relationships among skid resistance, crashes, crash characteristics and other road characteristics such as seal type, seal age, road type, texture depth, lane count, pavement width, rutting, speed limit, traffic rates, intersections, traffic signage and road design.
Abstract:
Data preprocessing is widely recognized as an important stage in anomaly detection. This paper reviews the data preprocessing techniques used by anomaly-based network intrusion detection systems (NIDS), concentrating on which aspects of the network traffic are analyzed, and what feature construction and selection methods have been used. Motivation for the paper comes from the large impact data preprocessing has on the accuracy and capability of anomaly-based NIDS. The review finds that many NIDS limit their view of network traffic to the TCP/IP packet headers. Time-based statistics can be derived from these headers to detect network scans, network worm behavior, and denial of service attacks. A number of other NIDS perform deeper inspection of request packets to detect attacks against network services and network applications. More recent approaches analyze full service responses to detect attacks targeting clients. The review covers a wide range of NIDS, highlighting which classes of attack are detectable by each of these approaches. Data preprocessing is found to predominantly rely on expert domain knowledge for identifying the most relevant parts of network traffic and for constructing the initial candidate set of traffic features. On the other hand, automated methods have been widely used for feature extraction to reduce data dimensionality, and feature selection to find the most relevant subset of features from this candidate set. The review shows a trend toward deeper packet inspection to construct more relevant features through targeted content parsing. These context sensitive features are required to detect current attacks.
Abstract:
Orthopaedic fracture fixation implants are increasingly being designed using accurate 3D models of long bones based on computed tomography (CT). Unlike CT, magnetic resonance imaging (MRI) does not involve ionising radiation and is therefore a desirable alternative to CT. This study aims to quantify the accuracy of MRI-based 3D models compared to CT-based 3D models of long bones. The femora of five intact cadaver ovine limbs were scanned using a 1.5T MRI and a CT scanner. Image segmentation of CT and MRI data was performed using a multi-threshold segmentation method. Reference models were generated by digitising the bone surfaces free of soft tissue with a mechanical contact scanner. The MRI- and CT-derived models were validated against the reference models. The results demonstrated that the CT-based models contained an average error of 0.15 mm while the MRI-based models contained an average error of 0.23 mm. Statistical validation shows that there are no significant differences between 3D models based on CT and MRI data. These results indicate that the geometric accuracy of MRI-based 3D models was comparable to that of CT-based models and therefore MRI is a potential alternative to CT for generation of 3D models with high geometric accuracy.
Abstract:
Acoustic sensors play an important role in augmenting the traditional biodiversity monitoring activities carried out by ecologists and conservation biologists. With this ability, however, comes the burden of analysing large volumes of complex acoustic data. Given the complexity of acoustic sensor data, fully automated analysis for a wide range of species is still a significant challenge. This research investigates the use of citizen scientists to analyse large volumes of environmental acoustic data in order to identify bird species. Specifically, it investigates ways in which the efficiency of a user can be improved through the use of species identification tools and the use of reputation models to predict the accuracy of users with unidentified skill levels. Initial experimental results are reported.
Abstract:
High levels of sitting have been linked with poor health outcomes. Previously, a pragmatic MTI accelerometer data cut-point (100 counts/min) has been used to estimate sitting. Data on the accuracy of this cut-point are unavailable. PURPOSE: To ascertain whether the 100 counts/min cut-point accurately isolates sitting from standing activities. METHODS: Participants fitted with an MTI accelerometer were observed performing a range of sitting, standing, light and moderate activities. 1-min epoch MTI data were matched to observed activities, then re-categorized as either sitting or not using the 100 counts/min cut-point. Self-reported demographics and current physical activity were collected. Generalized estimating equation (GEE) analyses for repeated measures with a binary logistic model, corrected for age, gender and BMI, were conducted to ascertain the odds of the MTI data being misclassified. RESULTS: Data were from 26 healthy subjects (8 men; 50% aged <25 years; mean BMI (SD) 22.7 (3.8) kg/m2). The mode of the MTI sitting and standing data was 0 counts/min, with 46% of sitting activities and 21% of standing activities recording 0 counts/min. The GEE was unable to accurately isolate sitting from standing activities using the 100 counts/min cut-point, since all sitting activities were incorrectly predicted as standing (p=0.05). To further explore the sensitivity of MTI data to delineate sitting from standing, the upper 95% confidence interval of the mean for the sitting activities (46 counts/min) was used to re-categorise the data; this resulted in the GEE correctly classifying 49% of sitting and 69% of standing activities. Using the 100 counts/min cut-point, the data were re-categorised into a combined ‘sit/stand’ category and tested against other light activities: 88% of sit/stand and 87% of light activities were accurately predicted. Using Freedson’s moderate cut-point of 1952 counts/min, the GEE accurately predicted 97% of light vs. 90% of moderate activities.
CONCLUSION: The distributions of MTI recorded sitting and standing data overlap considerably; as such, the 100 counts/min cut-point did not accurately isolate sitting from other static standing activities. The 100 counts/min cut-point more accurately predicted sit/stand vs. other movement-orientated activities.
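The reason a single cut-point cannot separate sitting from standing when both produce low counts can be shown with a small sketch. The epoch values and the function `share_below` are hypothetical, and treating the cut-point as a strict "less than" comparison is an assumption; the study itself used GEE models rather than this simple tally.

```python
def share_below(observed, counts_per_min, activity, cut_point=100):
    """Proportion of epochs of one observed activity below the cut-point."""
    flags = [count < cut_point
             for label, count in zip(observed, counts_per_min)
             if label == activity]
    return sum(flags) / len(flags)

# Hypothetical 1-min epochs: directly observed activity and MTI counts.
observed = ["sitting", "sitting", "standing", "standing", "light"]
counts = [0, 30, 0, 60, 450]

for activity in ("sitting", "standing", "light"):
    print(activity, share_below(observed, counts, activity))
```

In this made-up sample every sitting and standing epoch falls below 100 counts/min, so the cut-point separates the combined sit/stand group from light activity but cannot distinguish sitting from standing.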