918 results for data quality issues
Abstract:
Data in an organisation often contains business secrets that the organisation does not want to release. However, there are occasions when an organisation needs to release its data, such as when outsourcing work or using the cloud for Data Quality (DQ) related tasks like data cleansing. Currently, there is no mechanism that allows organisations to release their data for DQ tasks while ensuring that business secrets remain suitably protected. The aim of this paper is therefore to present our current progress on determining which methods can modify secret data while retaining its DQ problems. So far we have identified how two alteration methods, data swapping and the SHA-2 hash function, can be used to preserve the missing-data, incorrectly-formatted-value and domain-violation DQ problems while minimising the risk of disclosing secrets.
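A minimal sketch of the two alteration methods named above, using pandas and hashlib; the DataFrame, column names and example records are illustrative assumptions, not the paper's data. SHA-256 hashing masks values while leaving missing entries missing, and swapping permutes a column so that badly formatted values and domain violations survive, just on different rows.

```python
import hashlib
import pandas as pd

def hash_value(value):
    """Replace a non-missing value with its SHA-256 digest.
    Missing values are left untouched so the 'missing data' DQ problem survives."""
    if pd.isna(value):
        return value
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def swap_column(df, column, random_state=None):
    """Randomly permute a column's values. Incorrectly formatted values and
    domain violations are preserved (the bad values still exist, on different
    rows), while row-level associations that carry secrets are broken."""
    out = df.copy()
    out[column] = out[column].sample(frac=1, random_state=random_state).to_numpy()
    return out

# Hypothetical customer table with a missing email and a badly formatted date.
customers = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "email": ["alice@example.com", None, "carol@example.com"],
    "join_date": ["2011-03-02", "31/02/2011", "2011-07-15"],  # second value violates the domain
})

released = swap_column(customers, "join_date", random_state=0)
released["email"] = released["email"].map(hash_value)
print(released)
```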
Abstract:
Making sound asset management decisions, such as whether to replace or maintain an ageing underground water pipe, is critical to ensuring that organisations maximise the performance of their assets. These decisions are only as good as the data that supports them, and hence many asset management organisations urgently need to improve the quality of their data. This chapter reviews the key academic research on Data Quality (DQ) and Information Quality (IQ) (used interchangeably in this chapter) in asset management, combines this with the current DQ problems faced by asset management organisations in various business sectors, and presents a classification of the most important DQ problems that asset management organisations need to tackle. In this research, eleven semi-structured interviews were carried out with asset management professionals in a range of business sectors in the UK. The problems described in the academic literature were cross-checked against the problems found in industry. To support asset management professionals in solving these problems, we categorised them into seven DQ dimensions used in the academic literature, so that it is clear how these problems fit within the standard frameworks for assessing and improving data quality. Asset management professionals can therefore now use these frameworks to underpin their DQ improvement initiatives while focussing on the most critical DQ problems.
Abstract:
BACKGROUND: Historically, only partial assessments of data quality have been performed in clinical trials, for which the most common method of measuring database error rates has been to compare the case report form (CRF) to database entries and count discrepancies. Importantly, errors arising from medical record abstraction and transcription are rarely evaluated as part of such quality assessments. Electronic Data Capture (EDC) technology has had a further impact, as paper CRFs typically leveraged for quality measurement are not used in EDC processes. METHODS AND PRINCIPAL FINDINGS: The National Institute on Drug Abuse Treatment Clinical Trials Network has developed, implemented, and evaluated methodology for holistically assessing data quality on EDC trials. We characterize the average source-to-database error rate (14.3 errors per 10,000 fields) for the first year of use of the new evaluation method. This error rate was significantly lower than the average of published error rates for source-to-database audits, and was similar to CRF-to-database error rates reported in the published literature. We attribute this largely to an absence of medical record abstraction on the trials we examined, and to an outpatient setting characterized by less acute patient conditions. CONCLUSIONS: Historically, medical record abstraction is the most significant source of error by an order of magnitude, and should be measured and managed during the course of clinical trials. Source-to-database error rates are highly dependent on the amount of structured data collection in the clinical setting and on the complexity of the medical record, dependencies that should be considered when developing data quality benchmarks.
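For concreteness, a minimal sketch of the error-rate arithmetic above, normalising errors found in an audit per 10,000 data fields; the sample counts are illustrative and not figures from the study.

```python
def error_rate_per_10k(error_count: int, fields_audited: int) -> float:
    """Return errors per 10,000 fields inspected."""
    return error_count / fields_audited * 10_000

# e.g. 43 discrepancies found while auditing 30,000 fields
print(f"{error_rate_per_10k(43, 30_000):.1f} errors per 10,000 fields")  # 14.3
```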
Abstract:
Background: SPARCLE is a cross-sectional survey in nine European regions, examining the relationship of the environment of children with cerebral palsy to their participation and quality of life. The objective of this report is to assess data quality, in particular heterogeneity between regions, family and item non-response, and potential for bias. Methods: 1,174 children aged 8–12 years were selected from eight population-based registers of children with cerebral palsy; one further centre recruited 75 children from multiple sources. Families were visited by trained researchers who administered psychometric questionnaires. Logistic regression was used to assess factors related to family non-response and self-completion of questionnaires by children. Results: 431/1,174 (37%) families identified from registers did not respond: 146 (12%) were not traced; of the 1,028 traced families, 250 (24%) declined to participate and 35 (3%) were not approached. Families whose disabled children could walk unaided were more likely to decline to participate. 818 children entered the study, of whom 500 (61%) self-reported their quality of life; children with low IQ, seizures or inability to walk were less likely to self-report. There was substantial heterogeneity between regions in response rates and socio-demographic characteristics of families but not in age or gender of children. Item non-response was 2% for children and ranged from 0.4% to 5% for questionnaires completed by parents. Conclusion: While the proportion of untraced families was higher than in similar surveys, the refusal rate was comparable. To reduce bias, all analyses should allow for region, walking ability, age and socio-demographic characteristics. The 75 children in the region without a population-based register are unlikely to introduce bias.
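A minimal sketch of the kind of non-response check described above: a logistic regression of response status on covariates, fitted with statsmodels. The variable names and the small data frame are hypothetical assumptions, not the SPARCLE data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical register extract: one row per identified family.
families = pd.DataFrame({
    "responded":   [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1],
    "child_walks": [0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1],   # 1 = walks unaided
    "child_age":   [8, 9, 10, 12, 11, 8, 9, 10, 12, 11, 9, 10],
})

# Model the probability of responding as a function of walking ability and age.
model = smf.logit("responded ~ child_walks + child_age", data=families).fit(disp=0)
print(model.summary())
```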
Abstract:
Geospatial information of many kinds, from topographic maps to scientific data, is increasingly being made available through web mapping services. These allow georeferenced map images to be served from data stores and displayed in websites and geographic information systems, where they can be integrated with other geographic information. The Open Geospatial Consortium's Web Map Service (WMS) standard has been widely adopted in diverse communities for sharing data in this way. However, current services typically provide little or no information about the quality or accuracy of the data they serve. In this paper we describe the design and implementation of a new "quality-enabled" profile of WMS, which we call "WMS-Q". This profile describes how information about data quality can be transmitted to the user through WMS. Such information can exist at many levels, from entire datasets to individual measurements, and includes the many different ways in which data uncertainty can be expressed. We also describe proposed extensions to the Symbology Encoding specification, which include provision for visualising uncertainty in raster data in a number of different ways, including contours, shading and bivariate colour maps. We also describe new open-source implementations of these specifications, including both clients and servers.
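A minimal sketch of requesting a map image from a standard WMS endpoint with OWSLib. The service URL and layer name are placeholders, and any WMS-Q quality layers or metadata would depend on what a quality-enabled server actually advertises in its capabilities document.

```python
from owslib.wms import WebMapService

# Placeholder endpoint; substitute a real (quality-enabled) WMS URL.
wms = WebMapService("https://example.org/wms", version="1.1.1")
print(list(wms.contents))  # inspect advertised layers, including any quality layers

img = wms.getmap(
    layers=["sea_surface_temperature"],   # assumed layer name
    srs="EPSG:4326",
    bbox=(-10.0, 40.0, 5.0, 55.0),
    size=(512, 512),
    format="image/png",
    transparent=True,
)
with open("map.png", "wb") as f:
    f.write(img.read())
```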
Abstract:
Good data quality with high complexity is often seen as important. Intuition says that the higher the accuracy and complexity of the data, the better the analytic solutions become, provided the increasing computing time can be handled. However, for most practical computational problems, high-complexity data means that computation times become too long or that the heuristics used to solve the problem have difficulty reaching good solutions. This is stressed even further when the size of the combinatorial problem increases. Consequently, we often need simplified data to deal with complex combinatorial problems. In this study we address the question of how the complexity and accuracy of a network affect the quality of heuristic solutions for different sizes of the combinatorial problem. We evaluate this question by applying the commonly used p-median model, which finds optimal locations in a network of p supply points that serve n demand points. To do this, we vary both the accuracy (the number of nodes) of the network and the size of the combinatorial problem (p). The investigation is conducted by means of a case study in Dalecarlia, a region in Sweden with an asymmetrically distributed population (15,000 weighted demand points). To locate 5 to 50 supply points we use the national transport administration's official road network (NVDB), which consists of 1.5 million nodes. To find the optimal locations we start with 500 candidate nodes in the network and increase the number of candidate nodes in steps up to 67,000 (aggregated from the 1.5 million nodes). To find the optimal solution we use a simulated annealing algorithm with adaptive tuning of the temperature. The results show that there is only limited improvement in the optimal solutions when the accuracy of the road network increases and the combinatorial problem is simple (low p). When the combinatorial problem is complex (large p), the improvements from increasing the accuracy of the road network are much larger. The results also show that the choice of the best network accuracy depends on the complexity of the combinatorial problem (varying p).
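A minimal sketch of a simulated-annealing heuristic for the p-median objective on a tiny synthetic instance. Random planar points and a simple geometric cooling schedule stand in for the study's road-network distances and adaptive temperature tuning, which are not reproduced here.

```python
import math
import random

def total_weighted_distance(facilities, demand, candidates, weight):
    """p-median objective: sum over demand points of weight * distance to nearest open facility."""
    cost = 0.0
    for (dx, dy), w in zip(demand, weight):
        nearest = min(math.dist((dx, dy), candidates[f]) for f in facilities)
        cost += w * nearest
    return cost

def anneal_p_median(demand, weight, candidates, p, iters=5000, t0=1.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    current = rng.sample(range(len(candidates)), p)
    best = list(current)
    cur_cost = best_cost = total_weighted_distance(current, demand, candidates, weight)
    t = t0
    for _ in range(iters):
        # Neighbour move: swap one open facility for a closed candidate site.
        out_idx = rng.randrange(p)
        new_fac = rng.choice([i for i in range(len(candidates)) if i not in current])
        cand = list(current)
        cand[out_idx] = new_fac
        cand_cost = total_weighted_distance(cand, demand, candidates, weight)
        # Accept improvements always; accept worse moves with a temperature-dependent probability.
        if cand_cost < cur_cost or rng.random() < math.exp((cur_cost - cand_cost) / t):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        t *= cooling
    return best, best_cost

# Tiny synthetic instance: 200 weighted demand points, 50 candidate sites, p = 5.
rng = random.Random(1)
demand = [(rng.random(), rng.random()) for _ in range(200)]
weight = [rng.randint(1, 10) for _ in demand]
candidates = [(rng.random(), rng.random()) for _ in range(50)]
print(anneal_p_median(demand, weight, candidates, p=5))
```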
Abstract:
Includes bibliography