936 results for Data quality problems
Abstract:
Tower platforms, with instrumentation at six levels above the surface up to a height of 30 m, were used to record various atmospheric parameters in the surface layer. Sensors for measuring both mean and fluctuating quantities were used, the majority of them indigenously built. Soil temperature sensors, installed down to a depth of 30 cm below the surface, were among the sensors connected to the mean data logger. A PC-based data acquisition system built at the Centre for Atmospheric Sciences, IISc, was used to acquire the data from fast-response sensors. This paper reports the various components of a typical MONTBLEX tower observatory and describes the actual experiments carried out in the surface layer at four sites over the monsoon trough region as a part of the MONTBLEX programme. It also describes and discusses several checks made on randomly selected tower data-sets acquired during the experiment. Checks made include visual inspection of time traces from various sensors, comparative plots of sensors measuring the same variable, wind and temperature profile plots, calculation of roughness lengths, statistical and stability parameters, diurnal variation of stability parameters, and plots of probability density and energy spectrum for the different sensors. Results from these checks are found to be very encouraging and reveal the potential for further detailed analysis to understand more about surface layer characteristics.
Abstract:
Various problems associated with the quality of fishery products, such as spoilage, discolouration and microbiological problems, are outlined. Their causes and remedial measures are discussed. The importance of proper handling, processing and hygiene is stressed.
Abstract:
There is increasing evidence that many of the mitochondrial DNA (mtDNA) databases published in the fields of forensic science and molecular anthropology are flawed. An a posteriori phylogenetic analysis of the sequences could help to eliminate most of the errors and thus greatly improve data quality. However, previously published caveats and recommendations along these lines have not yet been taken up by all researchers. Here we call for stringent quality control of mtDNA data by haplogroup-directed database comparisons. We take some problematic databases of East Asian mtDNAs, published in the Journal of Forensic Sciences and Forensic Science International, as examples to demonstrate the process of pinpointing obvious errors. Our results show that data sets are not only notoriously plagued by base shifts and artificial recombination but also by lab-specific phantom mutations, especially in the second hypervariable region (HVR-II). (C) 2003 Elsevier Ireland Ltd. All rights reserved.
Abstract:
The Dependency Structure Matrix (DSM) has proved to be a useful tool for system structure elicitation and analysis. However, as with any modelling approach, the insights gained from analysis are limited by the quality and correctness of input information. This paper explores how the quality of data in a DSM can be enhanced by elicitation methods which include comparison of information acquired from different perspectives and levels of abstraction. The approach is based on comparison of dependencies according to their structural importance. It is illustrated through two case studies: creation of a DSM showing the spatial connections between elements in a product, and a DSM capturing information flows in an organisation. We conclude that considering structural criteria can lead to improved data quality in DSM models, although further research is required to fully explore the benefits and limitations of our proposed approach.
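As a rough illustration of comparing information acquired from different perspectives, the sketch below lists the cells on which two binary DSMs of the same elements disagree. It is not the paper's method, which ranks dependencies by structural importance; the function name `disagreements` and the example matrices are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def disagreements(a: Matrix, b: Matrix) -> List[Tuple[int, int]]:
    """Return off-diagonal (row, column) cells where two DSMs differ."""
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("both DSMs must cover the same elements")
    return [
        (i, j)
        for i, (row_a, row_b) in enumerate(zip(a, b))
        for j, (x, y) in enumerate(zip(row_a, row_b))
        if i != j and x != y  # the diagonal carries no dependency
    ]


# Hypothetical DSMs elicited from two different sources.
dsm_from_drawings = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
dsm_from_interviews = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
cells_to_review = disagreements(dsm_from_drawings, dsm_from_interviews)
```

Each cell returned is a candidate for a follow-up question to the people who supplied the data, rather than proof that either matrix is wrong.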
Abstract:
BACKGROUND: Historically, only partial assessments of data quality have been performed in clinical trials, for which the most common method of measuring database error rates has been to compare the case report form (CRF) to database entries and count discrepancies. Importantly, errors arising from medical record abstraction and transcription are rarely evaluated as part of such quality assessments. Electronic Data Capture (EDC) technology has had a further impact, as paper CRFs typically leveraged for quality measurement are not used in EDC processes. METHODS AND PRINCIPAL FINDINGS: The National Institute on Drug Abuse Treatment Clinical Trials Network has developed, implemented, and evaluated methodology for holistically assessing data quality on EDC trials. We characterize the average source-to-database error rate (14.3 errors per 10,000 fields) for the first year of use of the new evaluation method. This error rate was significantly lower than the average of published error rates for source-to-database audits, and was similar to CRF-to-database error rates reported in the published literature. We attribute this largely to an absence of medical record abstraction on the trials we examined, and to an outpatient setting characterized by less acute patient conditions. CONCLUSIONS: Historically, medical record abstraction is the most significant source of error by an order of magnitude, and should be measured and managed during the course of clinical trials. Source-to-database error rates are highly dependent on the amount of structured data collection in the clinical setting and on the complexity of the medical record, dependencies that should be considered when developing data quality benchmarks.
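The error rate quoted above is a count of discrepancies normalised to 10,000 audited fields. A minimal sketch of that arithmetic follows; the abstract does not give the underlying counts, so the audit figures below are invented purely to reproduce a rate of 14.3, and the function name `errors_per_10k` is an assumption.

```python
def errors_per_10k(error_count: int, fields_audited: int) -> float:
    """Normalise an audit's discrepancy count to errors per 10,000 fields."""
    if fields_audited <= 0:
        raise ValueError("fields_audited must be positive")
    if error_count < 0:
        raise ValueError("error_count must not be negative")
    return error_count * 10_000 / fields_audited


# Hypothetical audit, NOT taken from the study:
# 143 discrepancies found across 100,000 audited fields.
rate = errors_per_10k(143, 100_000)
```

Expressing the rate per 10,000 fields lets audits of different sizes be compared directly, which is what a benchmark of the kind the authors discuss would need.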
Abstract:
Background: SPARCLE is a cross-sectional survey in nine European regions, examining the relationship of the environment of children with cerebral palsy to their participation and quality of life. The objective of this report is to assess data quality, in particular heterogeneity between regions, family and item non-response and potential for bias. Methods: 1,174 children aged 8–12 years were selected from eight population-based registers of children with cerebral palsy; one further centre recruited 75 children from multiple sources. Families were visited by trained researchers who administered psychometric questionnaires. Logistic regression was used to assess factors related to family non-response and self-completion of questionnaires by children. Results: 431/1,174 (37%) families identified from registers did not respond: 146 (12%) were not traced; of the 1,028 traced families, 250 (24%) declined to participate and 35 (3%) were not approached. Families whose disabled children could walk unaided were more likely to decline to participate. 818 children entered the study, of whom 500 (61%) self-reported their quality of life; children with low IQ, seizures or inability to walk were less likely to self-report. There was substantial heterogeneity between regions in response rates and socio-demographic characteristics of families but not in age or gender of children. Item non-response was 2% for children and ranged from 0.4% to 5% for questionnaires completed by parents. Conclusion: While the proportion of untraced families was higher than in similar surveys, the refusal rate was comparable. To reduce bias, all analyses should allow for region, walking ability, age and socio-demographic characteristics. The 75 children in the region without a population-based register are unlikely to introduce bias.
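The counts reported in the Results can be checked for internal consistency. The sketch below uses only figures stated in the abstract; the variable names and the `pct` helper are mine, and it assumes percentages were rounded to the nearest whole number.

```python
# Counts as reported in the abstract.
registered = 1174        # children selected from the eight registers
untraced = 146
declined = 250
not_approached = 35
extra_centre = 75        # recruited from multiple sources, no register
entered = 818
self_reported = 500


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


traced = registered - untraced                           # denominator for refusals
non_responders = untraced + declined + not_approached
responders = registered - non_responders

checks = {
    "non-response": pct(non_responders, registered),     # reported as 37%
    "untraced": pct(untraced, registered),               # reported as 12%
    "declined": pct(declined, traced),                   # reported as 24%
    "not approached": pct(not_approached, traced),       # reported as 3%
    "self-report": pct(self_reported, entered),          # reported as 61%
}
```

Note that the denominators differ: the untraced share is taken over all registered families, while the declined and not-approached shares are taken over traced families only.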
Abstract:
This paper presents the SmartClean tool. The purpose of this tool is to detect and correct data quality problems (DQPs). Compared with existing tools, SmartClean has the following main advantage: the user does not need to specify the execution sequence of the data cleaning operations. An execution sequence was developed for this purpose, and the problems are manipulated (i.e., detected and corrected) following that sequence. The sequence also supports the incremental execution of the operations. In this paper, the underlying architecture of the tool is presented and its components are described in detail. The validity of the tool and, consequently, of its architecture is demonstrated through a case study. Although SmartClean also has cleaning capabilities at all other levels, this paper describes only those related to the attribute value level.
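The idea of a tool-owned execution sequence can be sketched as follows. This is not SmartClean's code or its actual sequence, neither of which is given in the abstract; the operation names, their order, the placeholder `"UNKNOWN"` and the allowed domain are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Value = Optional[str]
ALLOWED = {"PT", "ES", "FR", "UNKNOWN"}  # hypothetical attribute domain


def fill_missing(v: Value) -> Value:
    # Detect a missing value (None or blank) and substitute a placeholder.
    return "UNKNOWN" if v is None or not v.strip() else v


def normalise_syntax(v: Value) -> Value:
    # Trim whitespace and upper-case the value; None is left untouched.
    return v.strip().upper() if v is not None else v


def enforce_domain(v: Value) -> Value:
    # Replace any value outside the allowed domain with the placeholder.
    return v if v in ALLOWED else "UNKNOWN"


OPERATIONS: Dict[str, Callable[[Value], Value]] = {
    "missing_value": fill_missing,
    "syntax": normalise_syntax,
    "domain": enforce_domain,
}

# Assumed fixed order, owned by the tool rather than supplied by the user.
SEQUENCE: List[str] = ["missing_value", "syntax", "domain"]


def clean(values: List[Value], requested: Iterable[str]) -> List[Value]:
    """Run the requested operations in the tool's order, not the caller's."""
    wanted = set(requested)
    for name in SEQUENCE:  # names not in SEQUENCE are silently ignored
        if name in wanted:
            values = [OPERATIONS[name](v) for v in values]
    return values


raw = [" pt", None, "xx", "FR "]
# The caller lists operations in an arbitrary order.
cleaned = clean(raw, ["domain", "syntax", "missing_value"])
# Incremental use: run a single operation on its own.
partial = clean(raw, ["syntax"])
```

Because the order lives in `SEQUENCE`, the domain check always sees values that have already been filled and normalised, whatever order the caller used.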