969 resultados para Data Quality


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although managers consider accurate, timely, and relevant information as critical to the quality of their decisions, evidence of large variations in data quality abounds. Over a period of twelve months, the action research project reported herein attempted to investigate and track data quality initiatives undertaken by the participating organisation. The investigation focused on two types of errors: transaction input errors and processing errors. Whenever the action research initiative identified non-trivial errors, the participating organisation introduced actions to correct the errors and prevent similar errors in the future. Data quality metrics were taken quarterly to measure improvements resulting from the activities undertaken during the action research project. The action research project results indicated that for a mission-critical database to ensure and maintain data quality, commitment to continuous data quality improvement is necessary. Also, communication among all stakeholders is required to ensure common understanding of data quality improvement goals. The action research project found that to further substantially improve data quality, structural changes within the organisation and to the information systems are sometimes necessary. The major goal of the action research study is to increase the level of data quality awareness within all organisations and to motivate them to examine the importance of achieving and maintaining high-quality data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Indicators which summarise the characteristics of spatiotemporal data coverages significantly simplify quality evaluation, decision making and justification processes by providing a number of quality cues that are easy to manage and avoiding information overflow. Criteria which are commonly prioritised in evaluating spatial data quality and assessing a dataset’s fitness for use include lineage, completeness, logical consistency, positional accuracy, temporal and attribute accuracy. However, user requirements may go far beyond these broadlyaccepted spatial quality metrics, to incorporate specific and complex factors which are less easily measured. This paper discusses the results of a study of high level user requirements in geospatial data selection and data quality evaluation. It reports on the geospatial data quality indicators which were identified as user priorities, and which can potentially be standardised to enable intercomparison of datasets against user requirements. We briefly describe the implications for tools and standards to support the communication and intercomparison of data quality, and the ways in which these can contribute to the generation of a GEO label.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data quality is a difficult notion to define precisely, and different communities have different views and understandings of the subject. This causes confusion, a lack of harmonization of data across communities and omission of vital quality information. For some existing data infrastructures, data quality standards cannot address the problem adequately and cannot full all user needs or cover all concepts of data quality. In this paper we discuss some philosophical issues on data quality. We identify actual user needs on data quality, review existing standards and specification on data quality, and propose an integrated model for data quality in the eld of Earth observation. We also propose a practical mechanism for applying the integrated quality information model to large number of datasets through metadata inheritance. While our data quality management approach is in the domain of Earth observation, we believe the ideas and methodologies for data quality management can be applied to wider domains and disciplines to facilitate quality-enabled scientific research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The accuracy of a map is dependent on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases and the support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated, with overall classification accuracy declining by 8% (significant at 95% level of confidence) with the use of a training set containing 20% mislabelled cases.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data-flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD Thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the life-time of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge amount of frames expected (≃100000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment, and a particular care has to be used to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”), to devise methods to keep under control, and eventually to correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: with the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and to the maintenance of the data archives. However, the field I was personally responsible for was photometry and in particular relative photometry for the production of short-term light curves. In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light curves production and analysis.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Data Quality Campaign (DQC) has been focused since 2005 on advocating for states to build robust state longitudinal data systems (SLDS). While states have made great progress in their data infrastructure, and should continue to emphasize this work, t data systems alone will not improve outcomes. It is time for both DQC and states to focus on building capacity to use the information that these systems are producing at every level – from classrooms to state houses. To impact system performance and student achievement, the ingrained culture must be replaced with one that focuses on data use for continuous improvement. The effective use of data to inform decisions, provide transparency, improve the measurement of outcomes, and fuel continuous improvement will not come to fruition unless there is a system wide focus on building capacity around the collection, analysis, dissemination, and use of this data, including through research.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

As the number of data sources publishing their data on the Web of Data is growing, we are experiencing an immense growth of the Linked Open Data cloud. The lack of control on the published sources, which could be untrustworthy or unreliable, along with their dynamic nature that often invalidates links and causes conflicts or other discrepancies, could lead to poor quality data. In order to judge data quality, a number of quality indicators have been proposed, coupled with quality metrics that quantify the “quality level” of a dataset. In addition to the above, some approaches address how to improve the quality of the datasets through a repair process that focuses on how to correct invalidities caused by constraint violations by either removing or adding triples. In this paper we argue that provenance is a critical factor that should be taken into account during repairs to ensure that the most reliable data is kept. Based on this idea, we propose quality metrics that take into account provenance and evaluate their applicability as repair guidelines in a particular data fusion setting.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Background and purpose Survey data quality is a combination of the representativeness of the sample, the accuracy and precision of measurements, data processing and management with several subcomponents in each. The purpose of this paper is to show how, in the final risk factor surveys of the WHO MONICA Project, information on data quality were obtained, quantified, and used in the analysis. Methods and results In the WHO MONICA (Multinational MONItoring of trends and determinants in CArdiovascular disease) Project, the information about the data quality components was documented in retrospective quality assessment reports. On the basis of the documented information and the survey data, the quality of each data component was assessed and summarized using quality scores. The quality scores were used in sensitivity testing of the results both by excluding populations with low quality scores and by weighting the data by its quality scores. Conclusions Detailed documentation of all survey procedures with standardized protocols, training, and quality control are steps towards optimizing data quality. Quantifying data quality is a further step. Methods used in the WHO MONICA Project could be adopted to improve quality in other health surveys.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The evaluation of geospatial data quality and trustworthiness presents a major challenge to geospatial data users when making a dataset selection decision. The research presented here therefore focused on defining and developing a GEO label – a decision support mechanism to assist data users in efficient and effective geospatial dataset selection on the basis of quality, trustworthiness and fitness for use. This thesis thus presents six phases of research and development conducted to: (a) identify the informational aspects upon which users rely when assessing geospatial dataset quality and trustworthiness; (2) elicit initial user views on the GEO label role in supporting dataset comparison and selection; (3) evaluate prototype label visualisations; (4) develop a Web service to support GEO label generation; (5) develop a prototype GEO label-based dataset discovery and intercomparison decision support tool; and (6) evaluate the prototype tool in a controlled human-subject study. The results of the studies revealed, and subsequently confirmed, eight geospatial data informational aspects that were considered important by users when evaluating geospatial dataset quality and trustworthiness, namely: producer information, producer comments, lineage information, compliance with standards, quantitative quality information, user feedback, expert reviews, and citations information. Following an iterative user-centred design (UCD) approach, it was established that the GEO label should visually summarise availability and allow interrogation of these key informational aspects. A Web service was developed to support generation of dynamic GEO label representations and integrated into a number of real-world GIS applications. The service was also utilised in the development of the GEO LINC tool – a GEO label-based dataset discovery and intercomparison decision support tool. The results of the final evaluation study indicated that (a) the GEO label effectively communicates the availability of dataset quality and trustworthiness information and (b) GEO LINC successfully facilitates ‘at a glance’ dataset intercomparison and fitness for purpose-based dataset selection.