966 results for Data quality-aware mechanisms


Relevance: 100.00%

Abstract:

Data quality is a difficult notion to define precisely, and different communities have different views and understandings of the subject. This causes confusion, a lack of harmonization of data across communities and omission of vital quality information. For some existing data infrastructures, data quality standards cannot address the problem adequately and cannot fulfil all user needs or cover all concepts of data quality. In this study, we discuss some philosophical issues concerning data quality. We identify actual user needs regarding data quality, review existing standards and specifications on data quality, and propose an integrated model for data quality in the field of Earth observation (EO). We also propose a practical mechanism for applying the integrated quality information model to a large number of datasets through metadata inheritance. While our data quality management approach is in the domain of EO, we believe that the ideas and methodologies for data quality management can be applied to wider domains and disciplines to facilitate quality-enabled scientific research.
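
As a sketch of how quality metadata might be applied to many datasets through inheritance, consider the following; the class and method names (`Node`, `resolve_quality`) are hypothetical illustrations, not from the paper:

```python
# Minimal sketch of quality-metadata inheritance: a dataset inherits
# quality fields from its parent collection unless it overrides them.
# All names and fields here are illustrative, not from the paper.

class Node:
    def __init__(self, quality=None, parent=None):
        self.quality = quality or {}   # e.g. {"accuracy": "...", "lineage": "..."}
        self.parent = parent

    def resolve_quality(self):
        """Walk up the hierarchy; nearer definitions override inherited ones."""
        inherited = self.parent.resolve_quality() if self.parent else {}
        return {**inherited, **self.quality}

collection = Node(quality={"lineage": "MODIS L1B", "accuracy": "±0.5 K"})
granule = Node(quality={"cloud_cover": "12%"}, parent=collection)
print(granule.resolve_quality())
# {'lineage': 'MODIS L1B', 'accuracy': '±0.5 K', 'cloud_cover': '12%'}
```

The point of the inheritance step is scale: quality information recorded once at the collection level reaches every member dataset without per-dataset duplication.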

Relevance: 100.00%

Abstract:

Geospatial information of many kinds, from topographic maps to scientific data, is increasingly being made available through web mapping services. These allow georeferenced map images to be served from data stores and displayed in websites and geographic information systems, where they can be integrated with other geographic information. The Open Geospatial Consortium’s Web Map Service (WMS) standard has been widely adopted in diverse communities for sharing data in this way. However, current services typically provide little or no information about the quality or accuracy of the data they serve. In this paper we describe the design and implementation of a new “quality-enabled” profile of WMS, which we call “WMS-Q”. This profile describes how information about data quality can be conveyed to the user through WMS. Such information can exist at many levels, from entire datasets to individual measurements, and includes the many different ways in which data uncertainty can be expressed. We also describe proposed extensions to the Symbology Encoding specification, which include provision for visualizing uncertainty in raster data in a number of different ways, including contours, shading and bivariate colour maps. Finally, we describe new open-source implementations of the new specifications, which include both clients and servers.
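
One of the visualization styles mentioned, the bivariate colour map, can be illustrated generically: the data value drives the colour while the uncertainty fades it toward grey. This is a minimal sketch of the general technique on synthetic data, not the paper's Symbology Encoding extension:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic field and its uncertainty on the same grid (stand-ins for
# a real raster layer and its per-pixel uncertainty).
x, y = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
value = np.sin(3 * x) * np.cos(3 * y)                     # mapped quantity
uncertainty = np.clip(np.hypot(x - 0.5, y - 0.5), 0, 1)   # grows from centre

# Bivariate mapping: value -> colour, uncertainty -> blend toward grey.
rgba = plt.cm.viridis((value - value.min()) / np.ptp(value))
grey = np.full_like(rgba, 0.7)
grey[..., 3] = 1.0
w = uncertainty[..., None]          # 0 = certain, 1 = fully uncertain
blended = (1 - w) * rgba + w * grey

plt.imshow(blended, origin="lower")
plt.title("Value (colour) with uncertainty (fade to grey)")
plt.show()
```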

Relevance: 100.00%

Abstract:

Good data quality with high complexity is often seen as important. Intuition says that the higher the accuracy and complexity of the data, the better the analytic solutions become, provided the increasing computing time can be handled. However, for most practical computational problems, high-complexity data means that computation times become too long or that the heuristics used to solve the problem have difficulty reaching good solutions. This is stressed even further when the size of the combinatorial problem increases. Consequently, we often need simplified data to deal with complex combinatorial problems. In this study we address the question of how the complexity and accuracy of a network affect the quality of heuristic solutions for different sizes of the combinatorial problem. We evaluate this question by applying the commonly used p-median model, which finds the optimal locations in a network of p supply points that serve n demand points. To evaluate this, we vary both the accuracy (the number of nodes) of the network and the size of the combinatorial problem (p). The investigation is conducted by means of a case study in Dalecarlia, a region in Sweden with an asymmetrically distributed population (15,000 weighted demand points). To locate 5 to 50 supply points we use the national transport administration's official road network (NVDB), which consists of 1.5 million nodes. To find the optimal location we start with 500 candidate nodes in the network and increase the number of candidate nodes in steps up to 67,000 (aggregated from the 1.5 million nodes). To find the optimal solution we use a simulated annealing algorithm with adaptive tuning of the temperature. The results show that there is limited improvement in the optimal solutions when the accuracy of the road network increases and the combinatorial problem is simple (low p). When the combinatorial problem is complex (large p), the improvements from increasing the accuracy of the road network are much larger. The results also show that the best choice of network accuracy depends on the complexity of the combinatorial problem (varying p).
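
For readers unfamiliar with the model, a minimal sketch of the p-median objective and a simulated-annealing swap move might look like the following; it uses simple geometric cooling rather than the paper's adaptive temperature tuning, and random stand-in distances rather than the NVDB road network:

```python
import random, math

# p-median: choose p supply nodes minimising total weighted distance
# from each demand point to its nearest chosen supply node.
random.seed(1)
n_demand, n_candidates, p = 100, 30, 5
dist = [[random.random() for _ in range(n_candidates)] for _ in range(n_demand)]
weight = [random.randint(1, 10) for _ in range(n_demand)]

def cost(supply):
    return sum(w * min(dist[i][j] for j in supply)
               for i, w in enumerate(weight))

current = random.sample(range(n_candidates), p)
best, T = list(current), 1.0
while T > 1e-3:
    # Swap move: replace one chosen node with an unused candidate.
    trial = list(current)
    trial[random.randrange(p)] = random.choice(
        [j for j in range(n_candidates) if j not in current])
    delta = cost(trial) - cost(current)
    # Accept improvements always, worsenings with Boltzmann probability.
    if delta < 0 or random.random() < math.exp(-delta / T):
        current = trial
        if cost(current) < cost(best):
            best = list(current)
    T *= 0.99  # fixed geometric cooling; the paper tunes this adaptively

print("best supply nodes:", sorted(best), "cost:", round(cost(best), 2))
```

The study's trade-off appears here directly: enlarging `n_candidates` (network accuracy) grows the neighbourhood the heuristic must search, which matters little for small p but increasingly for large p.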

Relevance: 100.00%

Abstract:

Includes bibliography

Relevance: 100.00%

Abstract:

Data on antimicrobial use play a key role in the development of policies for the containment of antimicrobial resistance. On-farm data could provide a detailed overview of antimicrobial use, but technical and methodological aspects of data collection and interpretation, as well as data quality, need to be further assessed. The aims of this study were (1) to quantify antimicrobial use in the study population using different units of measurement and contrast the results obtained, (2) to evaluate the data quality of farm records on antimicrobial use, and (3) to compare the data quality of different recording systems. During 1 year, data on antimicrobial use were collected from 97 dairy farms. Antimicrobial consumption was quantified using: (1) the incidence density of antimicrobial treatments; (2) the weight of active substance; (3) the used daily dose and (4) the used course dose for antimicrobials for intestinal, intrauterine and systemic use; and (5) the used unit dose for antimicrobials for intramammary use. Data quality was evaluated by describing the completeness and accuracy of the recorded information, and by comparing farmers' and veterinarians' records. The relative consumption of antimicrobials depended on the unit of measurement: used doses reflected treatment intensity better than the weight of active substance. The use of antimicrobials classified as high priority was low, although under- and overdosing were frequently observed. Electronic recording systems allowed better traceability of the animals treated. Recording the drug name or dosage often resulted in incomplete or inaccurate information. Veterinarians tended to record more drugs than farmers. The integration of veterinarian and farm data would improve data quality.
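
The difference between weight-based and dose-based units can be made concrete with a toy calculation; the drug names, dose rates, and the 600 kg animal weight below are invented for illustration:

```python
# Toy comparison of two units of measurement for antimicrobial use.
# Drugs, dose rates and the 600 kg cow weight are illustrative only.
treatments = [
    # (drug, grams of active substance used, standard dose in mg/kg/day)
    ("drug_A", 20.0, 10.0),   # low-potency: needs a large daily amount
    ("drug_B", 2.0, 1.0),     # high-potency: small daily amount suffices
]
cow_kg = 600

total_grams = sum(g for _, g, _ in treatments)
# Used daily doses: amount used divided by the amount one animal-day requires.
used_daily_doses = sum(g * 1000 / (rate * cow_kg) for _, g, rate in treatments)

print(f"weight of active substance: {total_grams:.1f} g")
print(f"used daily doses (animal-days treated): {used_daily_doses:.1f}")
# Both drugs treat the same number of animal-days (about 3.3 each),
# yet drug_A dominates the weight-based figure tenfold - which is why
# dose-based units reflect treatment intensity better than weight.
```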

Relevance: 100.00%

Abstract:

Background: The recent development of semi-automated techniques for staining and analyzing flow cytometry samples has presented new challenges. Quality control and quality assessment are critical when developing new high-throughput technologies and their associated information services. Our experience suggests that significant bottlenecks remain in the development of high-throughput flow cytometry methods for data analysis and display. In particular, data quality control and quality assessment are crucial steps in processing and analyzing high-throughput flow cytometry data. Methods: We propose a variety of graphical exploratory data analytic tools for exploring ungated flow cytometry data. We have implemented a number of specialized functions and methods in the Bioconductor package rflowcyt. We demonstrate the use of these approaches by investigating two independent sets of high-throughput flow cytometry data. Results: We found that graphical representations can reveal substantial non-biological differences in samples. Empirical cumulative distribution function (ECDF) plots and summary scatterplots were especially useful in the rapid identification of problems not identified by manual review. Conclusions: Graphical exploratory data analytic tools are a quick and useful means of assessing data quality. We propose that the described visualizations be used as quality assessment tools and, where possible, for quality control.
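
The paper's tooling is the R/Bioconductor package rflowcyt; as a language-neutral illustration of the ECDF idea, the sketch below overlays per-sample ECDFs of one channel on synthetic data, so that a sample shifted for non-biological reasons stands out at a glance:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic fluorescence intensities: samples 0-4 behave alike,
# sample 5 is shifted, e.g. by a staining or instrument problem.
samples = [rng.normal(100, 15, 5000) for _ in range(5)]
samples.append(rng.normal(130, 15, 5000))

for i, s in enumerate(samples):
    xs = np.sort(s)
    ys = np.arange(1, len(xs) + 1) / len(xs)   # empirical CDF
    plt.plot(xs, ys, label=f"sample {i}")

plt.xlabel("intensity")
plt.ylabel("ECDF")
plt.legend()
plt.show()
```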

Relevance: 100.00%

Abstract:

OBJECTIVE: To describe the electronic medical databases used in antiretroviral therapy (ART) programmes in lower-income countries and assess the measures such programmes employ to maintain and improve data quality and reduce the loss of patients to follow-up. METHODS: In 15 countries of Africa, South America and Asia, a survey was conducted from December 2006 to February 2007 on the use of electronic medical record systems in ART programmes. Patients enrolled in the sites at the time of the survey but not seen during the previous 12 months were considered lost to follow-up. The quality of the data was assessed by computing the percentage of missing key variables (age, sex, clinical stage of HIV infection, CD4+ lymphocyte count and year of ART initiation). Associations between site characteristics (such as number of staff members dedicated to data management), measures to reduce loss to follow-up (such as the presence of staff dedicated to tracing patients) and data quality and loss to follow-up were analysed using multivariate logit models. FINDINGS: Twenty-one sites that together provided ART to 50 060 patients were included (median number of patients per site: 1000; interquartile range, IQR: 72-19 320). Eighteen sites (86%) used an electronic database for medical record-keeping; 15 (83%) such sites relied on software intended for personal or small business use. The median percentage of missing data for key variables per site was 10.9% (IQR: 2.0-18.9%) and declined with training in data management (odds ratio, OR: 0.58; 95% confidence interval, CI: 0.37-0.90) and weekly hours spent by a clerk on the database per 100 patients on ART (OR: 0.95; 95% CI: 0.90-0.99). About 10 weekly hours per 100 patients on ART were required to reduce missing data for key variables to below 10%. The median percentage of patients lost to follow-up 1 year after starting ART was 8.5% (IQR: 4.2-19.7%). Strategies to reduce loss to follow-up included outreach teams, community-based organizations and checking death registry data. Implementation of all three strategies substantially reduced losses to follow-up (OR: 0.17; 95% CI: 0.15-0.20). CONCLUSION: The quality of the data collected and the retention of patients in ART treatment programmes are unsatisfactory for many sites involved in the scale-up of ART in resource-limited settings, mainly because of insufficient staff trained to manage data and trace patients lost to follow-up.
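
The study's central quality metric, the percentage of missing key variables per site, is simple to compute; a sketch on made-up records (the column names and values are illustrative, not the survey data) could look like this:

```python
import pandas as pd

# Made-up patient records; None marks a missing key variable.
records = pd.DataFrame({
    "site":      ["A", "A", "A", "B", "B"],
    "age":       [34, None, 41, 29, None],
    "sex":       ["F", "M", None, "F", "M"],
    "cd4_count": [210, 180, None, None, 95],
})
key_vars = ["age", "sex", "cd4_count"]

# Per site: missing cells among key variables / total key-variable cells.
missing_pct = (records.groupby("site")[key_vars]
               .apply(lambda g: 100 * g.isna().to_numpy().mean()))
print(missing_pct.round(1))
```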

Relevance: 100.00%

Abstract:

In-cylinder pressure transducers have been used for decades to record combustion pressure inside a running engine. However, due to the extreme operating environment, transducer design and installation must be considered in order to minimize measurement error. One such error is caused by thermal shock, in which the pressure transducer experiences a high heat flux that can distort the transducer diaphragm and also change the crystal sensitivity. This research investigated the effects of thermal shock on in-cylinder pressure transducer data quality using a 2.0L, four-cylinder, spark-ignited, direct-injected, turbocharged GM engine. Cylinder four was modified with five ports to accommodate pressure transducers from different manufacturers: an AVL GH14D, an AVL GH15D, a Kistler 6125C, and a Kistler 6054AR. The GH14D, GH15D, and 6054AR were M5-size transducers; the 6125C was a larger, 6.2 mm transducer. Both of the AVL pressure transducers utilized a PH03 flame arrestor.

Sweeps of ignition timing (spark sweep), engine speed, and engine load were performed to study the effects of thermal shock on each pressure transducer. The project consisted of two distinct phases: experimental engine testing and simulation using a commercially available software package. A comparison was performed to characterize the quality of the data between the actual cylinder pressure and the simulated results. This comparison was valuable because the simulation results did not include thermal shock effects.

All three sets of tests showed that the peak cylinder pressure was essentially unaffected by thermal shock, and comparison of the experimental data with the simulated results showed very good correlation. The spark sweep, performed at 1300 RPM and 3.3 bar NMEP, showed that the differences between the simulated results (no thermal shock) and the experimental data for the indicated mean effective pressure (IMEP) and the pumping mean effective pressure (PMEP) were significantly less than the published accuracies: all transducers had a percent difference of less than 0.038% for IMEP and less than 0.32% for PMEP. Kistler and AVL publish that the accuracy of their pressure transducers is within plus or minus 1% for the IMEP (AVL 2011; Kistler 2011). In addition, the difference in average exhaust absolute pressure between the simulated results and the experimental data was greatest for the two Kistler pressure transducers; their location and lack of a flame arrestor are believed to be the cause of the increased error. For the engine speed sweep, the torque output was held constant at 203 Nm (150 ft-lbf) from 1500 to 4000 RPM. The difference in IMEP was less than 0.01% and the PMEP was less than 1%, except for the AVL GH14D, which was 5%, and the AVL GH15DK, which was 2.25%. A noticeable error in PMEP appeared as the load increased during the engine speed sweeps, as expected. The load sweep was conducted at 2000 RPM over a range of NMEP from 1.1 to 14 bar. The differences in IMEP values were less than 0.08%, while the PMEP values were below 1%, except for the AVL GH14D, at 1.8%, and the AVL GH15DK, at 1.25%.

In-cylinder pressure transducer data quality was effectively analyzed using a combination of experimental data and simulation results. Several criteria can be used to investigate the impact of thermal shock on data quality and to determine the best location and thermal protection for various transducers.
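
The headline metric, IMEP, is the cyclic integral of pressure with respect to volume divided by the displaced volume. A minimal sketch of the computation on a synthetic pressure-volume loop (toy numbers, not the GM engine measurements) follows:

```python
import numpy as np

# IMEP = (cyclic integral of p dV) / Vd: indicated work per cycle
# divided by displaced volume. Toy cycle: polytropic compression,
# constant-volume heat addition at TDC, polytropic expansion.
Vc, Vd = 5e-5, 5e-4                            # clearance / displaced volume, m^3
V_comp = np.linspace(Vc + Vd, Vc, 720)         # compression stroke
p_comp = 1e5 * ((Vc + Vd) / V_comp) ** 1.3     # compress from 1 bar, n = 1.3
V_exp = np.linspace(Vc, Vc + Vd, 720)          # expansion stroke
p_exp = 3 * p_comp[-1] * (Vc / V_exp) ** 1.3   # pressure tripled at TDC by combustion

# Trapezoidal integration of p dV around the compression-expansion loop.
V = np.concatenate([V_comp, V_exp])
p = np.concatenate([p_comp, p_exp])
work = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(V))   # J, gross indicated work
print(f"gross IMEP = {work / Vd / 1e5:.2f} bar")
```

On measured data the same integral is taken over the full cycle, so thermal-shock distortion of the pressure trace feeds directly into the IMEP and PMEP differences the study reports.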

Relevance: 100.00%

Abstract:

High-quality data are essential for veterinary surveillance systems, and their quality can be affected by the source and the method of collection. Data recorded on farms could provide detailed information about the health of a population of animals, but the accuracy of the data recorded by farmers is uncertain. The aims of this study were to evaluate the quality of the data on animal health recorded on 97 Swiss dairy farms, to compare the quality of the data obtained by different recording systems, and to obtain baseline data on the health of the animals on the 97 farms. Data on animal health were collected from the farms for a year. Their quality was evaluated by assessing the completeness and accuracy of the recorded information, and by comparing farmers' and veterinarians' records. The quality of the data provided by the farmers was satisfactory, although electronic recording systems made it easier to trace the animals treated. The farmers tended to record more health-related events than the veterinarians, although this varied with the event considered, and some events were recorded only by the veterinarians. The farmers' attitude towards data collection was positive. Factors such as motivation, feedback, training, and simplicity and standardisation of data collection were important because they influenced the quality of the data.

Relevance: 100.00%

Abstract:

Opportunistic routing (OR) takes advantage of the broadcast nature and spatial diversity of wireless transmission to improve the performance of wireless ad-hoc networks. Instead of using a predetermined path to send packets, OR postpones the choice of the next hop to the receiver side and lets the multiple receivers of a packet coordinate to decide which one will be the forwarder. Existing OR protocols choose the next-hop forwarder based on a predefined candidate list calculated from a single network metric. In this paper, we propose TLG, a Topology- and Link-quality-aware Geographical opportunistic routing protocol. TLG uses multiple network metrics, such as network topology, link quality, and geographic location, to implement the coordination mechanism of OR. We compare TLG with well-known existing solutions, and simulation results show that TLG outperforms them in terms of both QoS and QoE metrics.
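
The coordination idea, ranking candidate forwarders by a blend of metrics, can be sketched generically. The three inputs below mirror the metric families the abstract names (topology, link quality, geographic location), but the weights and scoring formula are invented for illustration and are not TLG's actual mechanism:

```python
import math

# Hypothetical forwarder ranking combining the three metric families
# the abstract names; weights and formula are illustrative only.
DEST = (100.0, 0.0)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def progress(node, sender):
    """Geographic advance toward the destination, normalised to [0, 1]."""
    return max(0.0, (dist(sender, DEST) - dist(node, DEST)) / dist(sender, DEST))

def score(pos, sender, delivery_ratio, degree, w=(0.4, 0.4, 0.2)):
    topo = min(degree / 10.0, 1.0)          # crude connectivity proxy
    return (w[0] * delivery_ratio           # link quality
            + w[1] * progress(pos, sender)  # geographic location
            + w[2] * topo)                  # network topology

sender = (0.0, 0.0)
candidates = {
    "A": ((40.0, 10.0), 0.90, 4),   # (position, link delivery ratio, degree)
    "B": ((70.0, 5.0), 0.50, 8),
    "C": ((20.0, 0.0), 0.95, 2),
}
ranked = sorted(candidates.items(),
                key=lambda kv: score(kv[1][0], sender, kv[1][1], kv[1][2]),
                reverse=True)
print("forwarding priority:", [name for name, _ in ranked])
```

In an OR protocol the receivers of a broadcast would each compute such a score for themselves; the highest-ranked candidate suppresses the others and forwards the packet.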