413 results for Data validation
in Queensland University of Technology - ePrints Archive
Abstract:
This dissertation develops the model of a prototype system for the digital lodgement of spatial data sets with statutory bodies responsible for the registration and approval of land related actions under the Torrens Title system. Spatial data pertain to the location of geographical entities together with their spatial dimensions and are classified as point, line, area or surface. This dissertation deals with a sub-set of spatial data, land boundary data that result from the activities performed by surveying and mapping organisations for the development of land parcels. The prototype system has been developed, utilising an event-driven paradigm for the user-interface, to exploit the potential of digital spatial data being generated from the utilisation of electronic techniques. The system provides for the creation of a digital model of the cadastral network and dependent data sets for an area of interest from hard copy records. This initial model is calibrated on registered control and updated by field survey to produce an amended model. The field-calibrated model then is electronically validated to ensure it complies with standards of format and content. The prototype system was designed specifically to create a database of land boundary data for subsequent retrieval by land professionals for surveying, mapping and related activities. Data extracted from this database are utilised for subsequent field survey operations without the need to create an initial digital model of an area of interest. Statistical reporting of differences resulting when subsequent initial and calibrated models are compared, replaces the traditional checking operations of spatial data performed by a land registry office. Digital lodgement of survey data is fundamental to the creation of the database of accurate land boundary data. 
The creation of this database is also fundamental to the efficient integration of accurate spatial data about land being generated by modern technology such as global positioning systems, and remote sensing and imaging, with land boundary information and other information held in Government databases. The prototype system developed provides for the delivery of accurate, digital land boundary data for the land registration process to ensure the continued maintenance of the integrity of the cadastre. Such data should also meet the more general and encompassing requirements of, and prove to be of tangible, longer-term benefit to, the developing electronic land information industry.
Abstract:
The objective of this chapter is to provide an overview of traffic data collection that can and should be used for the calibration and validation of traffic simulation models. There are big differences in availability of data from different sources. Some types of data such as loop detector data are widely available and used. Some can be measured with additional effort, for example, travel time data from GPS probe vehicles. Some types such as trajectory data are available only in rare situations such as research projects.
Abstract:
Background When large scale trials are investigating the effects of interventions on appetite, it is paramount to efficiently monitor large amounts of human data. The original hand-held Electronic Appetite Ratings System (EARS) was designed to facilitate the administering and data management of visual analogue scales (VAS) of subjective appetite sensations. The purpose of this study was to validate a novel hand-held method (EARS II (HP® iPAQ)) against the standard Pen and Paper (P&P) method and the previously validated EARS. Methods Twelve participants (5 male, 7 female, aged 18-40) were involved in a fully repeated measures design. Participants were randomly assigned in a crossover design, to either high fat (>48% fat) or low fat (<28% fat) meal days, one week apart and completed ratings using the three data capture methods ordered according to Latin Square. The first set of appetite sensations was completed in a fasted state, immediately before a fixed breakfast. Thereafter, appetite sensations were completed every thirty minutes for 4h. An ad libitum lunch was provided immediately before completing a final set of appetite sensations. Results Repeated measures ANOVAs were conducted for ratings of hunger, fullness and desire to eat. There were no significant differences between P&P compared with either EARS or EARS II (p > 0.05). Correlation coefficients between P&P and EARS II, controlling for age and gender, were performed on Area Under the Curve ratings. R2 for Hunger (0.89), Fullness (0.96) and Desire to Eat (0.95) were statistically significant (p < 0.05). Conclusions EARS II was sensitive to the impact of a meal and recovery of appetite during the postprandial period and is therefore an effective device for monitoring appetite sensations. This study provides evidence and support for further validation of the novel EARS II method for monitoring appetite sensations during large scale studies. 
The added versatility means that the system could in future be used to monitor a range of other behavioural and physiological measures that are often important in clinical and free-living trials.
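The Area Under the Curve summary mentioned in this abstract can be illustrated with a short sketch. The abstract does not say how the area was computed, so the trapezoidal rule used here is an assumption, and the function name, time grid and hunger ratings are hypothetical values chosen only to match the described schedule (a fasted rating, then one every thirty minutes for 4 h).

```python
def trapezoid_auc(times_min, ratings_mm):
    """Area under a rating curve by the trapezoidal rule (units: mm * min).

    Assumption: the abstract does not state the integration method;
    the trapezoidal rule is used here purely for illustration.
    """
    if len(times_min) != len(ratings_mm) or len(times_min) < 2:
        raise ValueError("need at least two paired time/rating values")
    area = 0.0
    for i in range(1, len(times_min)):
        width = times_min[i] - times_min[i - 1]
        # Each segment contributes width * mean of its two end ratings.
        area += width * (ratings_mm[i] + ratings_mm[i - 1]) / 2.0
    return area


# Hypothetical VAS hunger ratings (0-100 mm) at 0, 30, ..., 240 minutes.
times = list(range(0, 241, 30))
hunger = [80, 20, 25, 35, 45, 55, 65, 70, 78]
hunger_auc = trapezoid_auc(times, hunger)
```

One such value per participant and per capture method would then be the quantity compared between methods.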
Abstract:
This article presents the field applications and validations for the controlled Monte Carlo data generation scheme. This scheme was previously derived to assist the Mahalanobis squared distance–based damage identification method to cope with data-shortage problems, which often cause inadequate data multinormality and unreliable identification outcomes. To do so, real-vibration datasets from two actual civil engineering structures with such data (and identification) problems are selected as the test objects, which are then shown to be in need of enhancement to consolidate their conditions. By utilizing the robust probability measures of the data condition indices in controlled Monte Carlo data generation and statistical sensitivity analysis of the Mahalanobis squared distance computational system, well-conditioned synthetic data generated by optimal controlled Monte Carlo data generation configurations can be evaluated without bias against those generated by other set-ups and against the original data. The analysis results reconfirm that controlled Monte Carlo data generation is able to overcome the shortage of observations, improve the data multinormality and enhance the reliability of the Mahalanobis squared distance–based damage identification method, particularly with respect to false-positive errors. The results also highlight the dynamic structure of controlled Monte Carlo data generation, which makes this scheme adaptable to any type of input data with any (original) distributional condition.
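For readers unfamiliar with the quantity at the centre of this abstract, the sketch below computes a Mahalanobis squared distance of a new observation from a set of baseline observations. It is a minimal illustration using NumPy, not the authors' damage identification procedure or their data generation scheme; the function name and the toy data are hypothetical.

```python
import numpy as np


def mahalanobis_squared(baseline, sample):
    """Squared Mahalanobis distance of `sample` from the `baseline` cloud.

    baseline: array of shape (n_observations, n_features)
    sample:   array of shape (n_features,)
    """
    mean = baseline.mean(axis=0)
    # Sample covariance of the baseline features (rows are observations).
    cov = np.cov(baseline, rowvar=False)
    diff = sample - mean
    # Solve cov @ x = diff rather than inverting cov explicitly.
    return float(diff @ np.linalg.solve(cov, diff))


# Toy baseline: four observations of two features (hypothetical values).
baseline = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
distance = mahalanobis_squared(baseline, np.array([3.0, 1.0]))
```

With few baseline observations relative to the number of features, the covariance estimate becomes poorly conditioned or singular, which is one way a data shortage can make such distances unreliable.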
Abstract:
In recent years, increasing focus has been placed on making good business decisions using the products of data analysis. With the advent of the Big Data phenomenon, this is more apparent than ever before. But how can organizations trust decisions made on the basis of results obtained from the analysis of untrusted data? They need assurance that the data and datasets that inform these decisions have not been tainted by an outside agency. This study proposes enabling the authentication of datasets, specifically by extending the RESTful architectural scheme to include authentication parameters while operating within a larger, holistic security framework architecture or model that complies with legislation.
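The abstract does not describe which authentication parameters the proposal adds, so the following is only one possible reading, sketched under stated assumptions: a keyed hash (HMAC-SHA256) of the dataset is attached to a resource URL as query parameters and re-checked by the consumer. The parameter names `sig_alg` and `sig`, the function names and the example URL are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def sign_dataset(dataset_bytes, secret_key):
    """Return a hex HMAC-SHA256 tag over the raw dataset bytes."""
    return hmac.new(secret_key, dataset_bytes, hashlib.sha256).hexdigest()


def build_dataset_url(base_url, dataset_bytes, secret_key):
    """Append hypothetical authentication parameters to a resource URL."""
    params = {
        "sig_alg": "hmac-sha256",  # hypothetical parameter name
        "sig": sign_dataset(dataset_bytes, secret_key),  # hypothetical name
    }
    return base_url + "?" + urlencode(params)


def verify_dataset(dataset_bytes, secret_key, received_sig):
    """Recompute the tag and compare it in constant time."""
    expected = sign_dataset(dataset_bytes, secret_key)
    return hmac.compare_digest(expected, received_sig)


data = b"id,value\n1,42\n"
key = b"shared-secret-for-illustration-only"
signature = sign_dataset(data, key)
url = build_dataset_url("https://example.org/datasets/1", data, key)
```

A shared-secret tag like this only shows that the bytes are unchanged since signing by a key holder; a real deployment would also need key management, which is outside this sketch.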
Abstract:
This paper presents a validation proposal for the development of diagnostic and prognostic algorithms for SF6 puffer circuit-breakers reproduced from actual site waveforms. The re-ignition/restriking rates are duplicated in given circuits, as is the cumulative energy dissipated in interrupters by the restriking currents. The objective is to provide a simulated database for the diagnosis of re-ignition/restrikes relating to the phase-to-earth voltage and the number of re-ignition/restrikes, as well as for estimating the remaining life of SF6 circuit-breakers. A model-based diagnostic tool will be useful in monitoring re-ignition/restrikes as well as predicting a nozzle’s lifetime. This will help ATP users with practical study cases and component data compilation for shunt reactor switching and capacitor switching. The method can easily be applied with different data for the different dielectric curves of circuit-breakers and networks. This paper presents modelling details and some of the available cases, the required project support, the validation proposal, the specific plan for implementation and the proposed main contributions.
Abstract:
The validation of Computed Tomography (CT) based 3D models is an integral part of studies involving 3D models of bones. This is of particular importance when such models are used for Finite Element studies. The validation of 3D models typically involves the generation of a reference model representing the bone’s outer surface. Several different devices have been utilised for digitising a bone’s outer surface, such as mechanical 3D digitising arms, mechanical 3D contact scanners, electro-magnetic tracking devices and 3D laser scanners. However, none of these devices is capable of digitising a bone’s internal surfaces, such as the medullary canal of a long bone. Therefore, this study investigated the use of a 3D contact scanner, in conjunction with a microCT scanner, for generating a reference standard for validating the internal and external surfaces of a CT based 3D model of an ovine femur. One fresh ovine limb was scanned using a clinical CT scanner (Philips, Brilliance 64) with a pixel size of 0.4 mm² and slice spacing of 0.5 mm. The limb was then dissected to obtain the soft-tissue-free bone, while care was taken to protect the bone’s surface. A desktop mechanical 3D contact scanner (Roland DG Corporation, MDX 20, Japan) was used to digitise the surface of the denuded bone. The scanner was used with a resolution of 0.3 × 0.3 × 0.025 mm. The digitised surfaces were reconstructed into a 3D model using reverse engineering techniques in Rapidform (Inus Technology, Korea). After digitisation, the distal and proximal parts of the bone were removed so that the shaft could be scanned with a microCT (µCT40, Scanco Medical, Switzerland) scanner. The shaft, with the bone marrow removed, was immersed in water and scanned with a voxel size of 0.03 mm³. The bone contours were extracted from the image data utilising the Canny edge filter in Matlab (The MathWorks). The extracted bone contours were reconstructed into 3D models using Amira 5.1 (Visage Imaging, Germany).
The 3D models of the bone’s outer surface reconstructed from CT and microCT data were compared against the 3D model generated using the contact scanner. The 3D model of the inner canal reconstructed from the microCT data was compared against the 3D models reconstructed from the clinical CT scanner data. The disparity between the surface geometries of two models was calculated in Rapidform and recorded as average distance with standard deviation. The comparison of the 3D model of the whole bone generated from the clinical CT data with the reference model gave a mean error of 0.19±0.16 mm, while the shaft was more accurate (0.08±0.06 mm) than the proximal (0.26±0.18 mm) and distal (0.22±0.16 mm) parts. The comparison between the outer 3D model generated from the microCT data and the contact scanner model gave a mean error of 0.10±0.03 mm, indicating that the microCT generated models are sufficiently accurate for validation of 3D models generated from other methods. The comparison of the inner models generated from microCT data with those from clinical CT data gave an error of 0.09±0.07 mm. Utilising a mechanical contact scanner in conjunction with a microCT scanner made it possible to validate the outer surface of a CT based 3D model of an ovine femur as well as the surface of the model’s medullary canal.
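The "average distance with standard deviation" reported above can be illustrated with a simplified sketch. The study computed surface disparity in Rapidform, whose exact algorithm is not described in the abstract; the version below is an assumption that approximates each surface by a point cloud and measures, for every test point, the distance to the nearest reference point. The function name and coordinates are hypothetical.

```python
import numpy as np


def surface_disparity(test_points, reference_points):
    """Mean and sample standard deviation of nearest-point distances.

    Both inputs are arrays of shape (n_points, 3), in millimetres.
    Point-to-point distance is a simplification of true
    point-to-surface distance and overestimates it on sparse clouds.
    """
    # Pairwise differences, shape (n_test, n_reference, 3).
    deltas = test_points[:, None, :] - reference_points[None, :, :]
    distances = np.linalg.norm(deltas, axis=2)
    nearest = distances.min(axis=1)
    return float(nearest.mean()), float(nearest.std(ddof=1))


# Hypothetical point clouds (mm).
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
test = np.array([[0.0, 0.0, 0.1], [1.0, 0.0, 0.3]])
mean_error, sd_error = surface_disparity(test, reference)
```

The brute-force pairwise matrix is fine for small clouds; for dense scans a spatial index would be needed to keep memory use manageable.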
Abstract:
Objective: To examine the reliability of work-related activity coding for injury-related hospitalisations in Australia. Method: A random sample of 4373 injury-related hospital separations from 1 July 2002 to 30 June 2004 was obtained from a stratified random sample of 50 hospitals across 4 states in Australia. From this sample, cases were identified as work-related if they contained an ICD-10-AM work-related activity code (U73) allocated by either: (i) the original coder; (ii) an independent auditor, blinded to the original code; or (iii) a research assistant, blinded to both the original and auditor codes, who reviewed narrative text extracted from the medical record. The concordance of activity coding and number of cases identified as work-related using each method were compared. Results: Of the 4373 cases sampled, 318 cases were identified as being work-related using any of the three methods for identification. The original coder identified 217 and the auditor identified 266 work-related cases (68.2% and 83.6% of the total cases identified, respectively). Around 10% of cases were only identified through the text description review. The original coder and auditor agreed on the assignment of work-relatedness for 68.9% of cases. Conclusions and Implications: The current best estimates of the frequency of hospital admissions for occupational injury underestimate the burden by around 32%. This is a substantial underestimate that has major implications for public policy, and highlights the need for further work on improving the quality and completeness of routine, administrative data sources for a more complete identification of work-related injuries.
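The percentages in this abstract can be reproduced from the reported counts. The short calculation below uses only figures stated above; reading the "around 32%" underestimate as the share of cases missed by the original coder is an interpretation that appears consistent with the numbers, not something the abstract states explicitly. The variable and function names are illustrative.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


total_work_related = 318   # identified by any of the three methods
original_coder = 217       # identified by the original coder
auditor = 266              # identified by the independent auditor

original_share = percent(original_coder, total_work_related)
auditor_share = percent(auditor, total_work_related)
# Cases not flagged by the original coder, as a share of all identified.
missed_by_original = 100.0 - original_share
```

Rounded to one decimal place, the two shares match the 68.2% and 83.6% quoted in the results, and the remainder is close to the 32% quoted in the conclusions.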
Abstract:
A method is presented for the development of a regional Landsat-5 Thematic Mapper (TM) and Landsat-7 Enhanced Thematic Mapper plus (ETM+) spectral greenness index, coherent with a six-dimensional index set, based on a single ETM+ spectral image of a reference landscape. The first three indices of the set are determined by a polar transformation of the first three principal components of the reference image and relate to scene brightness, percent foliage projective cover (FPC) and water related features. The remaining three principal components, of diminishing significance with respect to the reference image, complete the set. The reference landscape, a 2200 km² area containing a mix of cattle pasture, native woodland and forest, is located near Injune in South East Queensland, Australia. The indices developed from the reference image were tested using TM spectral images from 19 regionally dispersed areas in Queensland, representative of dissimilar landscapes containing woody vegetation ranging from tall closed forest to low open woodland. Examples of image transformations and two-dimensional feature space plots are used to demonstrate image interpretations related to the first three indices. Coherent, sensible interpretations of landscape features in images composed of the first three indices can be made in terms of brightness (red), foliage cover (green) and water (blue). A limited comparison is made with similar existing indices. The proposed greenness index was found to be very strongly related to FPC and insensitive to smoke. A novel Bayesian bounded-space modelling method was used to validate the greenness index as a good predictor of FPC. Airborne LiDAR (Light Detection and Ranging) estimates of FPC along transects of the 19 sites provided the training and validation data. Other spectral indices from the set were found to be useful as model covariates that could improve FPC predictions.
They act to adjust the greenness/FPC relationship to suit different spectral backgrounds. The inclusion of an external meteorological covariate showed that further improvements to regional-scale predictions of FPC could be gained over those based on spectral indices alone.
Abstract:
The two outcome indices described in a companion paper (Sanson et al., Child Indicators Research, 2009) were developed using data from the Longitudinal Study of Australian Children (LSAC). These indices, one for infants and the other for 4 year to 5 year old children, were designed to fill the need for parsimonious measures of children’s developmental status to be used in analyses by a broad range of data users and to guide government policy and interventions to support young children’s optimal development. This paper presents evidence from Wave 1 data from LSAC to support the validity of these indices and their three domain scores of Physical, Social/Emotional, and Learning. Relationships between the indices and child, maternal, family, and neighborhood factors which are known to relate concurrently to child outcomes were examined. Meaningful associations were found with the selected variables, thereby demonstrating the usefulness of the outcome indices as tools for understanding children’s development in their family and socio-cultural contexts. It is concluded that the outcome indices are valuable tools for increasing understanding of influences on children’s development, and for guiding policy and practice to optimize children’s life chances.
Abstract:
The Longitudinal Study of Australian Children (LSAC) is a major national study examining the lives of Australian children, using a cross-sequential cohort design and data from parents, children, and teachers for 5,107 infants (3–19 months) and 4,983 children (4–5 years). Its data are publicly accessible and are used by researchers from many disciplinary backgrounds. It contains multiple measures of children’s developmental outcomes as well as a broad range of information on the contexts of their lives. This paper reports on the development of summary outcome indices of child development using the LSAC data. The indices were developed to fill the need for indicators suitable for use by diverse data users in order to guide government policy and interventions which support young children’s optimal development. The concepts underpinning the indices and the methods of their development are presented. Two outcome indices (infant and child) were developed, each consisting of three domains—health and physical development, social and emotional functioning, and learning competency. A total of 16 measures are used to make up these three domains in the Outcome Index for the Child Cohort and six measures for the Infant Cohort. These measures are described and evidence supporting the structure of the domains and their underlying latent constructs is provided for both cohorts. The factorial structure of the Outcome Index is adequate for both cohorts, but was stronger for the child than infant cohort. It is concluded that the LSAC Outcome Index is a parsimonious measure representing the major components of development which is suitable for non-specialist data users. A companion paper (Sanson et al. 2010) presents evidence of the validity of the Index.
Abstract:
OBJECTIVES: To develop and validate a wandering typology. DESIGN: Cross-sectional, correlational descriptive design. SETTING: Twenty-two nursing homes and six assisted living facilities. PARTICIPANTS: One hundred forty-two residents with dementia who spoke English, met Diagnostic and Statistical Manual for Mental Disorders, Fourth Edition, criteria for dementia, scored less than 24 on the Mini-Mental State Examination (MMSE), were ambulatory (with or without assistive device), and maintained a stable regime of psychotropic medications were studied. MEASUREMENTS: Data on wandering were collected using direct observations, plotted serially according to rate and duration to yield 21 parameters, and reduced through factor analysis to four components: high rate, high duration, low to moderate rate and duration, and time of day. Other measures included the MMSE, Minimum Data Set 2.0 mobility items, Cumulative Illness Rating Scale-Geriatric, and tympanic body temperature readings. RESULTS: Three groups of wanderers were identified through cluster analysis: classic, moderate, and subclinical. MMSE, mobility, and cardiac and upper and lower gastrointestinal problems differed between groups of wanderers and in comparison with nonwanderers. CONCLUSION: Results have implications for improving identification of wanderers and treatment of possible contributing factors.
Abstract:
Frontline employee behaviours are recognised as vital for achieving a competitive advantage for service organisations. The services marketing literature has comprehensively examined ways to improve frontline employee behaviours in service delivery and recovery. However, limited attention has been paid to frontline employee behaviours that favour customers in ways that go against organisational norms or rules. This study examines these behaviours by introducing a behavioural concept of Customer-Oriented Deviance (COD). COD is defined as, “frontline employees exhibiting extra-role behaviours that they perceive to defy existing expectations or prescribed rules of higher authority through service adaptation, communication and use of resources to benefit customers during interpersonal service encounters.” This thesis develops a COD measure and examines the key determinants of these behaviours from a frontline employee perspective. Existing research on similar behaviours that has originated in the positive deviance and pro-social behaviour domains has limitations and is considered inadequate to examine COD in the services context. The absence of a well-developed body of knowledge on non-conforming service behaviours has implications for both theory and practice. The provision of ‘special favours’ increases customer satisfaction but the over-servicing of customers is also counterproductive for the service delivery and costly for the organisation. Despite these implications of non-conforming service behaviours, there is little understanding about the nature of these behaviours and its key drivers. This research builds on inadequacies in prior research on positive deviance, pro-social and pro-customer literature to develop the theoretical foundation of COD. The concept of positive deviance which has predominantly been used to study organisational behaviours is applied within a services marketing setting. 
Further, it addresses previous limitations in pro-social and pro-customer behavioural literature that has examined limited forms of behaviours with no clear understanding on the nature of these behaviours. Building upon these literature streams, this research adopts a holistic approach towards the conceptualisation of COD. It addresses previous shortcomings in the literature by providing a well bounded definition, developing a psychometrically sound measure of COD and a conceptually well-founded model of COD. The concept of COD was examined across three separate studies and based on the theoretical foundations of role theory and social identity theory. Study 1 was exploratory and based on in-depth interviews using the Critical Incident Technique (CIT). The aim of Study 1 was to understand the nature of COD and qualitatively identify its key drivers. Thematic analysis was conducted to analyse the data and the two potential dimensions of COD behaviours of Deviant Service Adaptation (DSA) and Deviant Service Communication (DSC) were revealed in the analysis. In addition, themes representing the potential influences of COD were broadly classified as individual factors, situational factors, and organisational factors. Study 2 was a scale development procedure that involved the generation and purification of items for the measure based on two student samples working in customer service roles (Pilot sample, N=278; Initial validation sample, N=231). The results for the reliability and Exploratory Factor Analyses (EFA) on the pilot sample suggested the scale had poor psychometric properties. As a result, major revisions were made in terms of item wordings and new items were developed based on the literature to reflect a new dimension, Deviant Use of Resources (DUR). The revised items were tested on the initial validation sample with the EFA analysis suggesting a four-factor structure of COD. 
The aim of Study 3 was to further purify the COD measure and test for nomological validity based on its theoretical relationships with key antecedents and similar constructs (key correlates). The theoretical model of COD consisting of nine hypotheses was tested on a retail and hospitality sample of frontline employees (Retail N=311; Hospitality N=305) of a market research panel using an online survey. The data was analysed using Structural Equation Modelling (SEM). The results provided support for a re-specified second-order three-factor model of COD which consists of 11 items. Overall, the COD measure was found to be reliable and valid, demonstrating convergent validity, discriminant validity and marginal partial invariance for the factor loadings. The results showed support for nomological validity, although the antecedents had differing impact on COD across samples. Specifically, empathy and perspective-taking, role conflict, and job autonomy significantly influenced COD in the retail sample, whereas empathy and perspective-taking, risk-taking propensity and role conflict were significant predictors in the hospitality sample. In addition, customer orientation-selling orientation, the altruistic dimension of organisational citizenship behaviours, workplace deviance, and social desirability responding were found to correlate with COD. This research makes several contributions to theory. First, the findings of this thesis extend the literature on positive deviance, pro-social and pro-customer behaviours. Second, the research provides an empirically tested model which describes the antecedents of COD. Third, this research contributes by providing a reliable and valid measure of COD. Finally, the research investigates the differential effects of the key antecedents in different service sectors on COD. The research findings also contribute to services marketing practice. 
Based on the research findings, service practitioners can better understand the phenomenon of COD and utilise the measurement tool to calibrate COD levels within their organisations. Knowledge on the key determinants of COD will help improve recruitment and training programs and drive internal initiatives within the firm.
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). 
Because of the unbalanced class distribution in the data, majority class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be the most consistently effective, although Consistency-derived subsets tended to give slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This influence could be eliminated by consideration of the AUC or Kappa statistic, as well as by evaluation of subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in a reduction in MR to 9.8 to 10.16, with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets, but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-fold cross-validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36, with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09.
The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_G) perform the least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are the most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input.
In the absence of risk factor input, the use of time-series variables after outlier removal, and of time-series variables based on physiological variable values being outside the accepted normal range, is associated with some improvement in model performance.
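The point made in this abstract about misclassification rate depending on class distribution can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical and are not taken from the thesis; the functions implement the standard definitions of misclassification rate and Cohen's Kappa for a two-class problem.

```python
def misclassification_rate(tp, fp, fn, tn):
    """Fraction of cases assigned to the wrong class."""
    return (fp + fn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa: agreement between prediction and truth beyond chance."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement from the marginal totals of each class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present and predicted
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: 1000 cases, 100 of them positive (10% prevalence).
# A one-leaf model that always predicts the majority (negative) class:
one_leaf = dict(tp=0, fp=0, fn=100, tn=900)
# A model that actually discriminates between the classes:
discriminating = dict(tp=60, fp=40, fn=40, tn=860)

mr_one_leaf = misclassification_rate(**one_leaf)
kappa_one_leaf = cohen_kappa(**one_leaf)
mr_discriminating = misclassification_rate(**discriminating)
kappa_discriminating = cohen_kappa(**discriminating)
```

In this toy case the one-leaf model's misclassification rate simply equals the minority class share, while its Kappa is zero, which is why a chance-corrected statistic is the more informative of the two on unbalanced data.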