Biblioteca Digital

969 resultados para Data Quality

Assessing vertical resolution requirements for operational weather radar data quality

Relevância:

100.00% 100.00%

Publicador:

Veja mais

An integrated view of data quality in Earth observation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data quality is a difficult notion to define precisely, and different communities have different views and understandings of the subject. This causes confusion, a lack of harmonization of data across communities and omission of vital quality information. For some existing data infrastructures, data quality standards cannot address the problem adequately and cannot fulfil all user needs or cover all concepts of data quality. In this study, we discuss some philosophical issues on data quality. We identify actual user needs on data quality, review existing standards and specifications on data quality, and propose an integrated model for data quality in the field of Earth observation (EO). We also propose a practical mechanism for applying the integrated quality information model to a large number of datasets through metadata inheritance. While our data quality management approach is in the domain of EO, we believe that the ideas and methodologies for data quality management can be applied to wider domains and disciplines to facilitate quality-enabled scientific research.

Veja mais

Communicating thematic data quality with web map services

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Geospatial information of many kinds, from topographic maps to scientific data, is increasingly being made available through web mapping services. These allow georeferenced map images to be served from data stores and displayed in websites and geographic information systems, where they can be integrated with other geographic information. The Open Geospatial Consortium’s Web Map Service (WMS) standard has been widely adopted in diverse communities for sharing data in this way. However, current services typically provide little or no information about the quality or accuracy of the data they serve. In this paper we will describe the design and implementation of a new “quality-enabled” profile of WMS, which we call “WMS-Q”. This describes how information about data quality can be transmitted to the user through WMS. Such information can exist at many levels, from entire datasets to individual measurements, and includes the many different ways in which data uncertainty can be expressed. We also describe proposed extensions to the Symbology Encoding specification, which include provision for visualizing uncertainty in raster data in a number of different ways, including contours, shading and bivariate colour maps. We shall also describe new open-source implementations of the new specifications, which include both clients and servers.

Veja mais

How does data quality in a network affect heuristic solutions?

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To have good data quality with high complexity is often seen to be important. Intuition says that the higher accuracy and complexity the data have the better the analytic solutions becomes if it is possible to handle the increasing computing time. However, for most of the practical computational problems, high complexity data means that computational times become too long or that heuristics used to solve the problem have difficulties to reach good solutions. This is even further stressed when the size of the combinatorial problem increases. Consequently, we often need a simplified data to deal with complex combinatorial problems. In this study we stress the question of how the complexity and accuracy in a network affect the quality of the heuristic solutions for different sizes of the combinatorial problem. We evaluate this question by applying the commonly used p-median model, which is used to find optimal locations in a network of p supply points that serve n demand points. To evaluate this, we vary both the accuracy (the number of nodes) of the network and the size of the combinatorial problem (p). The investigation is conducted by the means of a case study in a region in Sweden with an asymmetrically distributed population (15,000 weighted demand points), Dalecarlia. To locate 5 to 50 supply points we use the national transport administrations official road network (NVDB). The road network consists of 1.5 million nodes. To find the optimal location we start with 500 candidate nodes in the network and increase the number of candidate nodes in steps up to 67,000 (which is aggregated from the 1.5 million nodes). To find the optimal solution we use a simulated annealing algorithm with adaptive tuning of the temperature. The results show that there is a limited improvement in the optimal solutions when the accuracy in the road network increase and the combinatorial problem (low p) is simple. When the combinatorial problem is complex (large p) the improvements of increasing the accuracy in the road network are much larger. The results also show that choice of the best accuracy of the network depends on the complexity of the combinatorial (varying p) problem.

Veja mais

Managing data quality in a statistical agency

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Includes bibliography

Veja mais

Data quality in national statistics institutes

Relevância:

100.00% 100.00%

Publicador:

Veja mais

Antimicrobial use in Swiss dairy farms: quantification and evaluation of data quality

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data on antimicrobial use play a key role in the development of policies for the containment of antimicrobial resistance. On-farm data could provide a detailed overview of the antimicrobial use, but technical and methodological aspects of data collection and interpretation, as well as data quality need to be further assessed. The aims of this study were (1) to quantify antimicrobial use in the study population using different units of measurement and contrast the results obtained, (2) to evaluate data quality of farm records on antimicrobial use, and (3) to compare data quality of different recording systems. During 1 year, data on antimicrobial use were collected from 97 dairy farms. Antimicrobial consumption was quantified using: (1) the incidence density of antimicrobial treatments; (2) the weight of active substance; (3) the used daily dose and (4) the used course dose for antimicrobials for intestinal, intrauterine and systemic use; and (5) the used unit dose, for antimicrobials for intramammary use. Data quality was evaluated by describing completeness and accuracy of the recorded information, and by comparing farmers' and veterinarians' records. Relative consumption of antimicrobials depended on the unit of measurement: used doses reflected the treatment intensity better than weight of active substance. The use of antimicrobials classified as high priority was low, although under- and overdosing were frequently observed. Electronic recording systems allowed better traceability of the animals treated. Recording drug name or dosage often resulted in incomplete or inaccurate information. Veterinarians tended to record more drugs than farmers. The integration of veterinarian and farm data would improve data quality.

Veja mais

Data Quality Assessment of Ungated Flow Cytometry Data in High

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The recent development of semi-automated techniques for staining and analyzing flow cytometry samples has presented new challenges. Quality control and quality assessment are critical when developing new high throughput technologies and their associated information services. Our experience suggests that significant bottlenecks remain in the development of high throughput flow cytometry methods for data analysis and display. Especially, data quality control and quality assessment are crucial steps in processing and analyzing high throughput flow cytometry data. Methods: We propose a variety of graphical exploratory data analytic tools for exploring ungated flow cytometry data. We have implemented a number of specialized functions and methods in the Bioconductor package rflowcyt. We demonstrate the use of these approaches by investigating two independent sets of high throughput flow cytometry data. Results: We found that graphical representations can reveal substantial non-biological differences in samples. Empirical Cumulative Distribution Function and summary scatterplots were especially useful in the rapid identification of problems not identified by manual review. Conclusions: Graphical exploratory data analytic tools are quick and useful means of assessing data quality. We propose that the described visualizations should be used as quality assessment tools and where possible, be used for quality control.

Veja mais

Electronic medical record systems, data quality and loss to follow-up: survey of antiretroviral therapy programmes in resource-limited settings

Relevância:

100.00% 100.00%

Publicador:

Resumo:

OBJECTIVE: To describe the electronic medical databases used in antiretroviral therapy (ART) programmes in lower-income countries and assess the measures such programmes employ to maintain and improve data quality and reduce the loss of patients to follow-up. METHODS: In 15 countries of Africa, South America and Asia, a survey was conducted from December 2006 to February 2007 on the use of electronic medical record systems in ART programmes. Patients enrolled in the sites at the time of the survey but not seen during the previous 12 months were considered lost to follow-up. The quality of the data was assessed by computing the percentage of missing key variables (age, sex, clinical stage of HIV infection, CD4+ lymphocyte count and year of ART initiation). Associations between site characteristics (such as number of staff members dedicated to data management), measures to reduce loss to follow-up (such as the presence of staff dedicated to tracing patients) and data quality and loss to follow-up were analysed using multivariate logit models. FINDINGS: Twenty-one sites that together provided ART to 50 060 patients were included (median number of patients per site: 1000; interquartile range, IQR: 72-19 320). Eighteen sites (86%) used an electronic database for medical record-keeping; 15 (83%) such sites relied on software intended for personal or small business use. The median percentage of missing data for key variables per site was 10.9% (IQR: 2.0-18.9%) and declined with training in data management (odds ratio, OR: 0.58; 95% confidence interval, CI: 0.37-0.90) and weekly hours spent by a clerk on the database per 100 patients on ART (OR: 0.95; 95% CI: 0.90-0.99). About 10 weekly hours per 100 patients on ART were required to reduce missing data for key variables to below 10%. The median percentage of patients lost to follow-up 1 year after starting ART was 8.5% (IQR: 4.2-19.7%). Strategies to reduce loss to follow-up included outreach teams, community-based organizations and checking death registry data. Implementation of all three strategies substantially reduced losses to follow-up (OR: 0.17; 95% CI: 0.15-0.20). CONCLUSION: The quality of the data collected and the retention of patients in ART treatment programmes are unsatisfactory for many sites involved in the scale-up of ART in resource-limited settings, mainly because of insufficient staff trained to manage data and trace patients lost to follow-up.

Veja mais

Analysis of in-cylinder pressure transducer data quality utilizing a SIDI turbocharged engine

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In-cylinder pressure transducers have been used for decades to record combustion pressure inside a running engine. However, due to the extreme operating environment, transducer design and installation must be considered in order to minimize measurement error. One such error is caused by thermal shock, where the pressure transducer experiences a high heat flux that can distort the pressure transducer diaphragm and also change the crystal sensitivity. This research focused on investigating the effects of thermal shock on in-cylinder pressure transducer data quality using a 2.0L, four-cylinder, spark-ignited, direct-injected, turbo-charged GM engine. Cylinder four was modified with five ports to accommodate pressure transducers of different manufacturers. They included an AVL GH14D, an AVL GH15D, a Kistler 6125C, and a Kistler 6054AR. The GH14D, GH15D, and 6054AR were M5 size transducers. The 6125C was a larger, 6.2mm transducer. Note that both of the AVL pressure transducers utilized a PH03 flame arrestor. Sweeps of ignition timing (spark sweep), engine speed, and engine load were performed to study the effects of thermal shock on each pressure transducer. The project consisted of two distinct phases which included experimental engine testing as well as simulation using a commercially available software package. A comparison was performed to characterize the quality of the data between the actual cylinder pressure and the simulated results. This comparison was valuable because the simulation results did not include thermal shock effects. All three sets of tests showed the peak cylinder pressure was basically unaffected by thermal shock. Comparison of the experimental data with the simulated results showed very good correlation. The spark sweep was performed at 1300 RPM and 3.3 bar NMEP and showed that the differences between the simulated results (no thermal shock) and the experimental data for the indicated mean effective pressure (IMEP) and the pumping mean effective pressure (PMEP) were significantly less than the published accuracies. All transducers had an IMEP percent difference less than 0.038% and less than 0.32% for PMEP. Kistler and AVL publish that the accuracy of their pressure transducers are within plus or minus 1% for the IMEP (AVL 2011; Kistler 2011). In addition, the difference in average exhaust absolute pressure between the simulated results and experimental data was the greatest for the two Kistler pressure transducers. The location and lack of flame arrestor are believed to be the cause of the increased error. For the engine speed sweep, the torque output was held constant at 203 Nm (150 ft-lbf) from 1500 to 4000 RPM. The difference in IMEP was less than 0.01% and the PMEP was less than 1%, except for the AVL GH14D which was 5% and the AVL GH15DK which was 2.25%. A noticeable error in PMEP appeared as the load increased during the engine speed sweeps, as expected. The load sweep was conducted at 2000 RPM over a range of NMEP from 1.1 to 14 bar. The difference in IMEP values were less 0.08% while the PMEP values were below 1% except for the AVL GH14D which was 1.8% and the AVL GH15DK which was at 1.25%. In-cylinder pressure transducer data quality was effectively analyzed using a combination of experimental data and simulation results. Several criteria can be used to investigate the impact of thermal shock on data quality as well as determine the best location and thermal protection for various transducers.

Veja mais

Data quality of animal health records on Swiss dairy farms

Relevância:

100.00% 100.00%

Publicador:

Resumo:

High-quality data are essential for veterinary surveillance systems, and their quality can be affected by the source and the method of collection. Data recorded on farms could provide detailed information about the health of a population of animals, but the accuracy of the data recorded by farmers is uncertain. The aims of this study were to evaluate the quality of the data on animal health recorded on 97 Swiss dairy farms, to compare the quality of the data obtained by different recording systems, and to obtain baseline data on the health of the animals on the 97 farms. Data on animal health were collected from the farms for a year. Their quality was evaluated by assessing the completeness and accuracy of the recorded information, and by comparing farmers' and veterinarians' records. The quality of the data provided by the farmers was satisfactory, although electronic recording systems made it easier to trace the animals treated. The farmers tended to record more health-related events than the veterinarians, although this varied with the event considered, and some events were recorded only by the veterinarians. The farmers' attitude towards data collection was positive. Factors such as motivation, feedback, training, and simplicity and standardisation of data collection were important because they influenced the quality of the data.

Veja mais

Monitoring and data quality assessment of the ATLAS liquid argon calorimeter

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The liquid argon calorimeter is a key component of the ATLAS detector installed at the CERN Large Hadron Collider. The primary purpose of this calorimeter is the measurement of electron and photon kinematic properties. It also provides a crucial input for measuring jets and missing transverse momentum. An advanced data monitoring procedure was designed to quickly identify issues that would affect detector performance and ensure that only the best quality data are used for physics analysis. This article presents the validation procedure developed during the 2011 and 2012 LHC data-taking periods, in which more than 98% of the proton-proton luminosity recorded by ATLAS at a centre-of-mass energy of 7–8 TeV had calorimeter data quality suitable for physics analysis.

Veja mais

An Exploratory Study of Data Quality Management Practices in the ERP Software Systems Context

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Quality data are not only relevant for successful Data Warehousing or Business Intelligence applications; they are also a precondition for efficient and effective use of Enterprise Resource Planning (ERP) systems. ERP professionals in all kinds of businesses are concerned with data quality issues, as a survey, conducted by the Institute of Information Systems at the University of Bern, has shown. This paper demonstrates, by using results of this survey, why data quality problems in modern ERP systems can occur and suggests how ERP researchers and practitioners can handle issues around the quality of data in an ERP software Environment.

Veja mais

Data quality in trauma transfusion studies and the impact of missing data on predicting massive transfusion

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Maximizing data quality may be especially difficult in trauma-related clinical research. Strategies are needed to improve data quality and assess the impact of data quality on clinical predictive models. This study had two objectives. The first was to compare missing data between two multi-center trauma transfusion studies: a retrospective study (RS) using medical chart data with minimal data quality review and the PRospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study with standardized quality assurance. The second objective was to assess the impact of missing data on clinical prediction algorithms by evaluating blood transfusion prediction models using PROMMTT data. RS (2005-06) and PROMMTT (2009-10) investigated trauma patients receiving ≥ 1 unit of red blood cells (RBC) from ten Level I trauma centers. Missing data were compared for 33 variables collected in both studies using mixed effects logistic regression (including random intercepts for study site). Massive transfusion (MT) patients received ≥ 10 RBC units within 24h of admission. Correct classification percentages for three MT prediction models were evaluated using complete case analysis and multiple imputation based on the multivariate normal distribution. A sensitivity analysis for missing data was conducted to estimate the upper and lower bounds of correct classification using assumptions about missing data under best and worst case scenarios. Most variables (17/33=52%) had <1% missing data in RS and PROMMTT. Of the remaining variables, 50% demonstrated less missingness in PROMMTT, 25% had less missingness in RS, and 25% were similar between studies. Missing percentages for MT prediction variables in PROMMTT ranged from 2.2% (heart rate) to 45% (respiratory rate). For variables missing >1%, study site was associated with missingness (all p≤0.021). Survival time predicted missingness for 50% of RS and 60% of PROMMTT variables. MT models complete case proportions ranged from 41% to 88%. Complete case analysis and multiple imputation demonstrated similar correct classification results. Sensitivity analysis upper-lower bound ranges for the three MT models were 59-63%, 36-46%, and 46-58%. Prospective collection of ten-fold more variables with data quality assurance reduced overall missing data. Study site and patient survival were associated with missingness, suggesting that data were not missing completely at random, and complete case analysis may lead to biased results. Evaluating clinical prediction model accuracy may be misleading in the presence of missing data, especially with many predictor variables. The proposed sensitivity analysis estimating correct classification under upper (best case scenario)/lower (worst case scenario) bounds may be more informative than multiple imputation, which provided results similar to complete case analysis.^

Veja mais

Open business intelligence: on the importance of data quality awareness in user-friendly data mining

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Citizens demand more and more data for making decisions in their daily life. Therefore, mechanisms that allow citizens to understand and analyze linked open data (LOD) in a user-friendly manner are highly required. To this aim, the concept of Open Business Intelligence (OpenBI) is introduced in this position paper. OpenBI facilitates non-expert users to (i) analyze and visualize LOD, thus generating actionable information by means of reporting, OLAP analysis, dashboards or data mining; and to (ii) share the new acquired information as LOD to be reused by anyone. One of the most challenging issues of OpenBI is related to data mining, since non-experts (as citizens) need guidance during preprocessing and application of mining algorithms due to the complexity of the mining process and the low quality of the data sources. This is even worst when dealing with LOD, not only because of the different kind of links among data, but also because of its high dimensionality. As a consequence, in this position paper we advocate that data mining for OpenBI requires data quality-aware mechanisms for guiding non-expert users in obtaining and sharing the most reliable knowledge from the available LOD.

Veja mais

969 resultados para Data Quality

Filtro por publicador