962 results for Data quality problems


Relevance: 100.00%

Abstract:

Quality data are not only relevant for successful data warehousing or business intelligence applications; they are also a precondition for the efficient and effective use of Enterprise Resource Planning (ERP) systems. ERP professionals in all kinds of businesses are concerned with data quality issues, as a survey conducted by the Institute of Information Systems at the University of Bern has shown. Using results of this survey, this paper demonstrates why data quality problems can occur in modern ERP systems and suggests how ERP researchers and practitioners can handle issues around the quality of data in an ERP software environment.

Relevance: 100.00%

Abstract:

Good data quality with high complexity is often seen as important. Intuition says that the more accurate and complex the data, the better the analytic solutions become, provided the increased computing time can be handled. However, for most practical computational problems, high-complexity data means that computation times become too long or that the heuristics used to solve the problem have difficulty reaching good solutions. This is stressed even further when the size of the combinatorial problem increases. Consequently, we often need simplified data to deal with complex combinatorial problems. In this study we examine how the complexity and accuracy of a network affect the quality of heuristic solutions for different sizes of the combinatorial problem. We evaluate this question by applying the commonly used p-median model, which finds optimal locations in a network for p supply points that serve n demand points. To do so, we vary both the accuracy (the number of nodes) of the network and the size of the combinatorial problem (p). The investigation is conducted by means of a case study of Dalecarlia, a region in Sweden with an asymmetrically distributed population (15,000 weighted demand points). To locate 5 to 50 supply points we use the national transport administration's official road network (NVDB), which consists of 1.5 million nodes. To find the optimal locations we start with 500 candidate nodes in the network and increase the number of candidate nodes in steps up to 67,000 (aggregated from the 1.5 million nodes). We use a simulated annealing algorithm with adaptive tuning of the temperature. The results show only a limited improvement in the optimal solutions when the accuracy of the road network increases and the combinatorial problem is simple (low p). When the combinatorial problem is complex (large p), the improvements from increasing the accuracy of the road network are much larger. The results also show that the choice of the best accuracy for the network depends on the complexity of the combinatorial problem (varying p).
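The search strategy described above can be sketched in a few lines. The following is a minimal, self-contained illustration of a p-median heuristic using simulated annealing with swap moves; the node set, distance matrix and parameters are hypothetical toy values, and the paper's adaptive temperature tuning is replaced here by a simple geometric cooling schedule.

```python
import math
import random

def total_cost(facilities, demand, dist):
    """Sum over demand points of the distance to the nearest open facility."""
    return sum(min(dist[d][f] for f in facilities) for d in demand)

def p_median_sa(nodes, demand, dist, p, iters=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated-annealing heuristic for the p-median problem:
    choose p supply nodes minimising total distance to demand points.
    Neighbourhood move: swap one open facility for a closed candidate."""
    rng = random.Random(seed)
    current = rng.sample(nodes, p)
    best, best_cost = list(current), total_cost(current, demand, dist)
    cur_cost, t = best_cost, t0
    for _ in range(iters):
        out = rng.randrange(p)
        cand = rng.choice([n for n in nodes if n not in current])
        neighbour = list(current)
        neighbour[out] = cand
        c = total_cost(neighbour, demand, dist)
        # accept improvements always, worsenings with a temperature-dependent probability
        if c < cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            current, cur_cost = neighbour, c
            if c < best_cost:
                best, best_cost = list(current), c
        t *= cooling  # geometric cooling; the paper's adaptive tuning is richer
    return best, best_cost
```

On a toy line network of five nodes with `p = 1`, the heuristic recovers the median node, illustrating the "simple combinatorial problem" regime where little accuracy is needed.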

Relevance: 100.00%

Abstract:

Background: The recent development of semi-automated techniques for staining and analyzing flow cytometry samples has presented new challenges. Quality control and quality assessment are critical when developing new high-throughput technologies and their associated information services. Our experience suggests that significant bottlenecks remain in the development of high-throughput flow cytometry methods for data analysis and display. In particular, data quality control and quality assessment are crucial steps in processing and analyzing high-throughput flow cytometry data. Methods: We propose a variety of graphical exploratory data analytic tools for exploring ungated flow cytometry data. We have implemented a number of specialized functions and methods in the Bioconductor package rflowcyt. We demonstrate the use of these approaches by investigating two independent sets of high-throughput flow cytometry data. Results: We found that graphical representations can reveal substantial non-biological differences in samples. Empirical cumulative distribution function (ECDF) plots and summary scatterplots were especially useful in the rapid identification of problems not identified by manual review. Conclusions: Graphical exploratory data analytic tools are a quick and useful means of assessing data quality. We propose that the described visualizations should be used as quality assessment tools and, where possible, for quality control.
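An ECDF comparison of the kind the abstract highlights can be reduced to a few lines. This is a minimal sketch, not the rflowcyt implementation: it computes empirical CDFs for two hypothetical samples and reports the largest vertical gap between them, a Kolmogorov-Smirnov-style statistic that can flag non-biological differences worth inspecting.

```python
import bisect

def ecdf(values):
    """Return F where F(x) = fraction of values <= x (the empirical CDF)."""
    xs = sorted(values)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def max_ecdf_gap(sample_a, sample_b):
    """Largest vertical gap between two empirical CDFs, evaluated at all
    observed data points. 0 means identical distributions, 1 means disjoint."""
    fa, fb = ecdf(sample_a), ecdf(sample_b)
    return max(abs(fa(x) - fb(x)) for x in list(sample_a) + list(sample_b))
```

Two replicate samples of the same biological material should show a small gap; a large gap is a prompt for manual review rather than a verdict.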

Relevance: 100.00%

Abstract:

Websites are nowadays the face of institutions, but they are often neglected, especially when it comes to content. In the present paper, we describe research whose final goal is the development of a model for measuring data quality in the institutional websites of health units. To that end, we have carried out a bibliographic review of the available approaches for evaluating website content quality, in order to identify the most recurrent dimensions and attributes, and we are currently carrying out a Delphi method process, presently in its second stage, with the purpose of reaching an adequate set of attributes for measuring content quality.

Relevance: 100.00%

Abstract:

This article presents research whose goal was to develop a model for evaluating data quality in the institutional websites of health units in a broad and balanced way. We carried out a literature review of the available approaches for evaluating website content quality, in order to identify the most recurrent dimensions and attributes, and we also carried out a Delphi method process with experts in order to reach an adequate set of attributes, and their respective weights, for measuring content quality. The results obtained revealed a high level of consensus among the experts who participated in the Delphi process, and the various statistical analyses and techniques applied are robust and lend confidence to our results and to the resulting model.
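Once a Delphi process has fixed the attributes and their weights, scoring a website reduces to a weighted mean. A minimal sketch, with hypothetical attribute names and weights rather than those of the study:

```python
def quality_score(scores, weights):
    """Weighted mean of per-attribute scores in [0, 1].
    Weights need not sum to 1; they are normalised here."""
    total_w = sum(weights.values())
    return sum(weights[a] * scores[a] for a in weights) / total_w

# Hypothetical example: attributes and weights are illustrative only.
weights = {"accuracy": 2.0, "timeliness": 1.0}
scores = {"accuracy": 1.0, "timeliness": 0.4}
```

With these illustrative numbers the site scores (2.0 × 1.0 + 1.0 × 0.4) / 3.0 = 0.8.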

Relevance: 100.00%

Abstract:

This paper presents the SmartClean tool, whose purpose is to detect and correct data quality problems (DQPs). Compared with existing tools, SmartClean has the following main advantage: the user does not need to specify the execution sequence of the data cleaning operations. To that end, an execution sequence was developed; the problems are manipulated (i.e., detected and corrected) following that sequence, which also supports the incremental execution of the operations. In this paper, the underlying architecture of the tool is presented and its components are described in detail. The validity of the tool, and consequently of the architecture, is demonstrated through the presentation of a case study. Although SmartClean has cleaning capabilities at all other levels, only those related to the attribute-value level are described in this paper.
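The idea of a fixed execution sequence for attribute-value-level operations can be illustrated as follows. This is a sketch under assumed operations, not SmartClean's actual operation set: three hypothetical cleaning steps run in a fixed order, and the order matters because later steps assume earlier ones have already run.

```python
def trim(v):
    """Strip leading/trailing whitespace from string values."""
    return v.strip() if isinstance(v, str) else v

def collapse(v):
    """Collapse internal runs of whitespace to a single space."""
    return " ".join(v.split()) if isinstance(v, str) else v

def null_marker(v):
    """Map common 'missing value' spellings to None.
    Relies on trim/collapse having normalised the string first."""
    return None if isinstance(v, str) and v.lower() in {"", "n/a", "null", "-"} else v

# Fixed execution sequence: the user never has to specify this order.
SEQUENCE = [trim, collapse, null_marker]

def clean_value(v):
    for op in SEQUENCE:
        v = op(v)
    return v

def clean_table(rows):
    """Apply the attribute-value-level sequence to every cell of every row."""
    return [{k: clean_value(v) for k, v in row.items()} for row in rows]
```

Because `null_marker` compares against already-normalised strings, running it before `trim` would miss values like `" N/A "`; a fixed sequence encodes exactly this kind of dependency.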

Relevance: 100.00%

Abstract:

The clinical content of administrative databases includes, among other things, patient demographic characteristics and codes for diagnoses and procedures. The data in these databases are standardized, clearly defined, readily available, less expensive than data collected by other means, and normally cover hospitalizations in entire geographic areas. Despite some limitations, these data are often used to evaluate the quality of healthcare. Under these circumstances, the quality of the data themselves, for instance the presence of errors or their completeness, is of central importance and should never be ignored. Both the minimization of data quality problems and a deep knowledge of the data (e.g., how to select a patient group) are important for users in order to trust and correctly interpret results. In this paper we present, discuss and give recommendations for some problems found in these administrative databases. We also present a simple tool that can be used to screen the quality of the data through domain-specific data quality indicators. These indicators can contribute significantly to better data, to steps towards a continuous increase in data quality and, certainly, to better-informed decision-making.
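A screening tool of the kind described can be built from a handful of domain-specific indicators. The sketch below is illustrative, not the authors' tool: it computes completeness and domain validity for each field and flags indicators that fall below a hypothetical threshold.

```python
def completeness(records, field):
    """Share of records with a non-missing value for `field`."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def validity(records, field, valid):
    """Share of filled values that fall inside the allowed domain."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    return sum(1 for v in values if v in valid) / len(values) if values else 1.0

def screen(records, rules, threshold=0.95):
    """Compute both indicators per field and return those below threshold.
    `rules` maps each field to its allowed value set (domain knowledge)."""
    report = {}
    for field, valid in rules.items():
        report[f"completeness:{field}"] = completeness(records, field)
        report[f"validity:{field}"] = validity(records, field, valid)
    return {k: v for k, v in report.items() if v < threshold}
```

The threshold and rules encode the domain knowledge the abstract calls for; a hospitalization database would carry rules for diagnosis and procedure code fields instead of this toy example.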

Relevance: 100.00%

Abstract:

Biofilm research is growing more diverse and dependent on high-throughput technologies, and the large-scale production of results makes data substantiation harder. In particular, experimental protocols are often adapted to meet the needs of a particular laboratory, and no statistical validation of the modified method is provided. This paper discusses the impact of intra-laboratory adaptation and non-rigorous documentation of experimental protocols on biofilm data interchange and validation. The case study is a non-standard, but widely used, workflow for Pseudomonas aeruginosa biofilm development, considering three analysis assays: the crystal violet (CV) assay for biomass quantification, the XTT assay for respiratory activity assessment, and the colony forming units (CFU) assay for determination of cell viability. The ruggedness of the protocol was assessed by introducing small changes in the biofilm growth conditions, which simulate minor protocol adaptations and non-rigorous protocol documentation. Results show that even minor variations in the biofilm growth conditions may affect the results considerably, and that the biofilm analysis assays lack repeatability. Intra-laboratory validation of non-standard protocols is found to be critical to ensure data quality and enable the comparison of results within and among laboratories.
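Repeatability across protocol variants is commonly summarized with the coefficient of variation of replicate assay readings. A minimal sketch; the numeric values in the test are hypothetical, not the paper's data:

```python
def cv_percent(values):
    """Coefficient of variation in percent: sample standard deviation
    divided by the mean. A simple repeatability measure for replicates."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return (var ** 0.5 / m) * 100
```

A low CV across replicates under slightly varied growth conditions would indicate a rugged protocol; the paper's finding is that even minor variations inflate this spread considerably.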

Relevance: 100.00%

Abstract:

We construct estimates of educational attainment for a sample of OECD countries using previously unexploited sources. We follow a heuristic approach to obtain plausible time profiles for attainment levels by removing sharp breaks in the data that seem to reflect changes in classification criteria. We then construct indicators of the information content of our series and a number of previously available data sets and examine their performance in several growth specifications. We find a clear positive correlation between data quality and the size and significance of human capital coefficients in growth regressions. Using an extension of the classical errors in variables model, we construct a set of meta-estimates of the coefficient of years of schooling in an aggregate Cobb-Douglas production function. Our results suggest that, after correcting for measurement error bias, the value of this parameter is well above 0.50.
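The classical errors-in-variables correction behind such meta-estimates can be stated in two lines: under classical measurement error, the OLS coefficient is attenuated by the reliability ratio λ = Var(true x) / (Var(true x) + Var(noise)), so the corrected coefficient is β_OLS / λ. A sketch with illustrative numbers, not the paper's estimates:

```python
def reliability_ratio(signal_var, noise_var):
    """lambda = Var(true x) / Var(observed x) under classical measurement error."""
    return signal_var / (signal_var + noise_var)

def corrected_coefficient(beta_ols, signal_var, noise_var):
    """Undo attenuation bias: beta_true = beta_ols / lambda."""
    return beta_ols / reliability_ratio(signal_var, noise_var)
```

For example, an attenuated schooling coefficient of 0.35 combined with a (hypothetical) reliability of 0.7 implies a corrected value of 0.5, illustrating how measurement error correction pushes the estimate upward.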

Relevance: 100.00%

Abstract:

The European Surveillance of Congenital Anomalies (EUROCAT) network of population-based congenital anomaly registries is an important source of epidemiologic information on congenital anomalies in Europe, covering live births, fetal deaths from 20 weeks gestation, and terminations of pregnancy for fetal anomaly. EUROCAT's policy is to strive for high-quality data while ensuring consistency and transparency across all member registries. A set of 30 data quality indicators (DQIs) was developed to assess five key elements of data quality: completeness of case ascertainment, accuracy of diagnosis, completeness of information on EUROCAT variables, timeliness of data transmission, and availability of population denominator information. This article describes each of the individual DQIs and presents the output for each registry, as well as the EUROCAT (unweighted) average, for 29 full member registries for 2004-2008. This information is also available on the EUROCAT website for previous years. The DQIs allow registries to evaluate their performance in relation to other registries and allow appropriate interpretations to be made of the data collected. They provide direction for improving data collection and ascertainment, and they allow annual assessment for monitoring continuous improvement. The DQIs are constantly reviewed and refined to best document registry procedures and processes regarding data collection, to ensure the appropriateness of the DQIs, and to ensure transparency, so that the data collected can make a substantial and useful contribution to epidemiologic research on congenital anomalies.

Relevance: 100.00%

Abstract:

Breastfeeding has important health benefits for both mother and child. Breastfed babies are less likely to present with gastric, respiratory and urinary tract infections and allergic diseases, and they are also less likely to become obese in later childhood. Improving breastfeeding initiation has become a national priority, and a national target has been set "to deliver an increase of two percentage points per annum in breastfeeding initiation rate, focusing especially on women from disadvantaged areas". Despite improvements in data quality in previous years, it still remains difficult to construct an accurate and reliable picture of variations and trends in breastfeeding in the East Midlands. It is essential that nationally standardised data collection systems are put in place to enable effective and accurate monitoring and evaluation of breastfeeding status at both a local and national level.

Relevance: 100.00%

Abstract:

One in a series of six data briefings based on regional-level analysis of data from the National Child Measurement Programme (NCMP) undertaken by the National Obesity Observatory (NOO). The briefings are intended to complement the headline results for the region published in January 2010. This briefing covers issues relating to the quality and completeness of the NCMP data. Detailed analysis of the NCMP at national level is available from NOO at http://www.noo.org.uk/NOO_pub. Information on the methods used to

Relevance: 100.00%

Abstract:

The NRS state data quality standards identify the policies, processes and materials that states and local programs should have in place to collect valid and reliable data for the National Reporting System (NRS). The Division of Adult Education (DAEL) within the Office of Vocational and Adult Education developed the standards to define the characteristics of high-quality state and local data collection systems for the NRS. The standards provide an organized way for DAEL to understand the quality of NRS data collection within the states, and also provide guidance to states on how to improve their systems. States are to complete this checklist, which incorporates the standards, with their annual NRS data submission to rate their level of implementation of the standards. The accompanying policy document describes DAEL's requirements for state conformance to the standards and explains the use of the information from this checklist.