953 resultados para DATA QUALITY
Resumo:
© Springer International Publishing Switzerland 2015. Making sound asset management decisions, such as whether to replace or maintain an ageing underground water pipe, are critical to ensure that organisations maximise the performance of their assets. These decisions are only as good as the data that supports them, and hence many asset management organisations are in desperate need to improve the quality of their data. This chapter reviews the key academic research on data quality (DQ) and Information Quality (IQ) (used interchangeably in this chapter) in asset management, combines this with the current DQ problems faced by asset management organisations in various business sectors, and presents a classification of the most important DQ problems that need to be tackled by asset management organisations. In this research, eleven semi structured interviews were carried out with asset management professionals in a range of business sectors in the UK. The problems described in the academic literature were cross checked against the problems found in industry. In order to support asset management professionals in solving these problems, we categorised them into seven different DQ dimensions, used in the academic literature, so that it is clear how these problems fit within the standard frameworks for assessing and improving data quality. Asset management professionals can therefore now use these frameworks to underpin their DQ improvement initiatives while focussing on the most critical DQ problems.
Resumo:
BACKGROUND: Historically, only partial assessments of data quality have been performed in clinical trials, for which the most common method of measuring database error rates has been to compare the case report form (CRF) to database entries and count discrepancies. Importantly, errors arising from medical record abstraction and transcription are rarely evaluated as part of such quality assessments. Electronic Data Capture (EDC) technology has had a further impact, as paper CRFs typically leveraged for quality measurement are not used in EDC processes. METHODS AND PRINCIPAL FINDINGS: The National Institute on Drug Abuse Treatment Clinical Trials Network has developed, implemented, and evaluated methodology for holistically assessing data quality on EDC trials. We characterize the average source-to-database error rate (14.3 errors per 10,000 fields) for the first year of use of the new evaluation method. This error rate was significantly lower than the average of published error rates for source-to-database audits, and was similar to CRF-to-database error rates reported in the published literature. We attribute this largely to an absence of medical record abstraction on the trials we examined, and to an outpatient setting characterized by less acute patient conditions. CONCLUSIONS: Historically, medical record abstraction is the most significant source of error by an order of magnitude, and should be measured and managed during the course of clinical trials. Source-to-database error rates are highly dependent on the amount of structured data collection in the clinical setting and on the complexity of the medical record, dependencies that should be considered when developing data quality benchmarks.
Resumo:
Background: SPARCLE is a cross-sectional survey in nine European regions, examining the relationship of the environment of children with cerebral palsy to their participation and quality of life. The objective of this report is to assess data quality, in particular heterogeneity between regions, family and item non-response and potential for bias. Methods: 1,174 children aged 8–12 years were selected from eight population-based registers of children with cerebral palsy; one further centre recruited 75 children from multiple sources. Families were visited by trained researchers who administered psychometric questionnaires. Logistic regression was used to assess factors related to family non-response and self-completion of questionnaires by children. Results: 431/1,174 (37%) families identified from registers did not respond: 146 (12%) were not traced; of the 1,028 traced families, 250 (24%) declined to participate and 35 (3%) were not approached. Families whose disabled children could walk unaided were more likely to decline to participate. 818 children entered the study of which 500 (61%) self-reported their quality of life; children with low IQ, seizures or inability to walk were less likely to self-report. There was substantial heterogeneity between regions in response rates and socio-demographic characteristics of families but not in age or gender of children. Item non-response was 2% for children and ranged from 0.4% to 5% for questionnaires completed by parents. Conclusion: While the proportion of untraced families was higher than in similar surveys, the refusal rate was comparable. To reduce bias, all analyses should allow for region, walking ability, age and socio-demographic characteristics. The 75 children in the region without a population based register are unlikely to introduce bias
Resumo:
Data quality is a difficult notion to define precisely, and different communities have different views and understandings of the subject. This causes confusion, a lack of harmonization of data across communities and omission of vital quality information. For some existing data infrastructures, data quality standards cannot address the problem adequately and cannot fulfil all user needs or cover all concepts of data quality. In this study, we discuss some philosophical issues on data quality. We identify actual user needs on data quality, review existing standards and specifications on data quality, and propose an integrated model for data quality in the field of Earth observation (EO). We also propose a practical mechanism for applying the integrated quality information model to a large number of datasets through metadata inheritance. While our data quality management approach is in the domain of EO, we believe that the ideas and methodologies for data quality management can be applied to wider domains and disciplines to facilitate quality-enabled scientific research.
Resumo:
Geospatial information of many kinds, from topographic maps to scientific data, is increasingly being made available through web mapping services. These allow georeferenced map images to be served from data stores and displayed in websites and geographic information systems, where they can be integrated with other geographic information. The Open Geospatial Consortium’s Web Map Service (WMS) standard has been widely adopted in diverse communities for sharing data in this way. However, current services typically provide little or no information about the quality or accuracy of the data they serve. In this paper we will describe the design and implementation of a new “quality-enabled” profile of WMS, which we call “WMS-Q”. This describes how information about data quality can be transmitted to the user through WMS. Such information can exist at many levels, from entire datasets to individual measurements, and includes the many different ways in which data uncertainty can be expressed. We also describe proposed extensions to the Symbology Encoding specification, which include provision for visualizing uncertainty in raster data in a number of different ways, including contours, shading and bivariate colour maps. We shall also describe new open-source implementations of the new specifications, which include both clients and servers.
Resumo:
To have good data quality with high complexity is often seen to be important. Intuition says that the higher accuracy and complexity the data have the better the analytic solutions becomes if it is possible to handle the increasing computing time. However, for most of the practical computational problems, high complexity data means that computational times become too long or that heuristics used to solve the problem have difficulties to reach good solutions. This is even further stressed when the size of the combinatorial problem increases. Consequently, we often need a simplified data to deal with complex combinatorial problems. In this study we stress the question of how the complexity and accuracy in a network affect the quality of the heuristic solutions for different sizes of the combinatorial problem. We evaluate this question by applying the commonly used p-median model, which is used to find optimal locations in a network of p supply points that serve n demand points. To evaluate this, we vary both the accuracy (the number of nodes) of the network and the size of the combinatorial problem (p). The investigation is conducted by the means of a case study in a region in Sweden with an asymmetrically distributed population (15,000 weighted demand points), Dalecarlia. To locate 5 to 50 supply points we use the national transport administrations official road network (NVDB). The road network consists of 1.5 million nodes. To find the optimal location we start with 500 candidate nodes in the network and increase the number of candidate nodes in steps up to 67,000 (which is aggregated from the 1.5 million nodes). To find the optimal solution we use a simulated annealing algorithm with adaptive tuning of the temperature. The results show that there is a limited improvement in the optimal solutions when the accuracy in the road network increase and the combinatorial problem (low p) is simple. When the combinatorial problem is complex (large p) the improvements of increasing the accuracy in the road network are much larger. The results also show that choice of the best accuracy of the network depends on the complexity of the combinatorial (varying p) problem.
Resumo:
Includes bibliography
Resumo:
Data on antimicrobial use play a key role in the development of policies for the containment of antimicrobial resistance. On-farm data could provide a detailed overview of the antimicrobial use, but technical and methodological aspects of data collection and interpretation, as well as data quality need to be further assessed. The aims of this study were (1) to quantify antimicrobial use in the study population using different units of measurement and contrast the results obtained, (2) to evaluate data quality of farm records on antimicrobial use, and (3) to compare data quality of different recording systems. During 1 year, data on antimicrobial use were collected from 97 dairy farms. Antimicrobial consumption was quantified using: (1) the incidence density of antimicrobial treatments; (2) the weight of active substance; (3) the used daily dose and (4) the used course dose for antimicrobials for intestinal, intrauterine and systemic use; and (5) the used unit dose, for antimicrobials for intramammary use. Data quality was evaluated by describing completeness and accuracy of the recorded information, and by comparing farmers' and veterinarians' records. Relative consumption of antimicrobials depended on the unit of measurement: used doses reflected the treatment intensity better than weight of active substance. The use of antimicrobials classified as high priority was low, although under- and overdosing were frequently observed. Electronic recording systems allowed better traceability of the animals treated. Recording drug name or dosage often resulted in incomplete or inaccurate information. Veterinarians tended to record more drugs than farmers. The integration of veterinarian and farm data would improve data quality.