936 resultados para Data quality problems


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Clinical Research Data Quality Literature Review and Pooled Analysis We present a literature review and secondary analysis of data accuracy in clinical research and related secondary data uses. A total of 93 papers meeting our inclusion criteria were categorized according to the data processing methods. Quantitative data accuracy information was abstracted from the articles and pooled. Our analysis demonstrates that the accuracy associated with data processing methods varies widely, with error rates ranging from 2 errors per 10,000 files to 5019 errors per 10,000 fields. Medical record abstraction was associated with the highest error rates (70â5019 errors per 10,000 fields). Data entered and processed at healthcare facilities had comparable error rates to data processed at central data processing centers. Error rates for data processed with single entry in the presence of on-screen checks were comparable to double entered data. While data processing and cleaning methods may explain a significant amount of the variability in data accuracy, additional factors not resolvable here likely exist. Defining Data Quality for Clinical Research: A Concept Analysis Despite notable previous attempts by experts to define data quality, the concept remains ambiguous and subject to the vagaries of natural language. This current lack of clarity continues to hamper research related to data quality issues. We present a formal concept analysis of data quality, which builds on and synthesizes previously published work. We further posit that discipline-level specificity may be required to achieve the desired definitional clarity. To this end, we combine work from the clinical research domain with findings from the general data quality literature to produce a discipline-specific definition and operationalization for data quality in clinical research. While the results are helpful to clinical research, the methodology of concept analysis may be useful in other fields to clarify data quality attributes and to achieve operational definitions. Medical Record Abstractorâs Perceptions of Factors Impacting the Accuracy of Abstracted Data Medical record abstraction (MRA) is known to be a significant source of data errors in secondary data uses. Factors impacting the accuracy of abstracted data are not reported consistently in the literature. Two Delphi processes were conducted with experienced medical record abstractors to assess abstractorâs perceptions about the factors. The Delphi process identified 9 factors that were not found in the literature, and differed with the literature by 5 factors in the top 25%. The Delphi results refuted seven factors reported in the literature as impacting the quality of abstracted data. The results provide insight into and indicate content validity of a significant number of the factors reported in the literature. Further, the results indicate general consistency between the perceptions of clinical research medical record abstractors and registry and quality improvement abstractors. Distributed Cognition Artifacts on Clinical Research Data Collection Forms Medical record abstraction, a primary mode of data collection in secondary data use, is associated with high error rates. Distributed cognition in medical record abstraction has not been studied as a possible explanation for abstraction errors. We employed the theory of distributed representation and representational analysis to systematically evaluate cognitive demands in medical record abstraction and the extent of external cognitive support employed in a sample of clinical research data collection forms. We show that the cognitive load required for abstraction in 61% of the sampled data elements was high, exceedingly so in 9%. Further, the data collection forms did not support external cognition for the most complex data elements. High working memory demands are a possible explanation for the association of data errors with data elements requiring abstractor interpretation, comparison, mapping or calculation. The representational analysis used here can be used to identify data elements with high cognitive demands.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Two seismic surveys were carried out on the high-altitude glacier saddle, Colle Gnifetti, Monte Rosa, Italy/Switzerland. Explosive and vibroseismic sources were tested to explore the best way to generate seismic waves to deduce shallow and intermediate properties (<100 m) of firn and ice. The explosive source (SISSY) excites strong surface and diving waves, degrading data quality for processing; no englacial reflections besides the noisy bed reflector are visible. However, the strong diving waves are analyzed to derive the density distribution of the firn pack, yielding results similar to a nearby ice core. The vibrator source (ElViS), used in both P- and SH-wave modes, produces detectable laterally coherent reflections within the firn and ice column. We compare these with ice-core and radar data. The SH-wave data are particularly useful in providing detailed, high-resolution information on firn and ice stratigraphy. Our analyses demonstrate the potential of seismic methods to determine physical properties of firn and ice, particularly density and potentially also crystal-orientation fabric.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Sedimentary sequences in ancient or long-lived lakes can reach several thousands of meters in thickness and often provide an unrivalled perspective of the lake's regional climatic, environmental, and biological history. Over the last few years, deep-drilling projects in ancient lakes became increasingly multi- and interdisciplinary, as, among others, seismological, sedimentological, biogeochemical, climatic, environmental, paleontological, and evolutionary information can be obtained from sediment cores. However, these multi- and interdisciplinary projects pose several challenges. The scientists involved typically approach problems from different scientific perspectives and backgrounds, and setting up the program requires clear communication and the alignment of interests. One of the most challenging tasks, besides the actual drilling operation, is to link diverse datasets with varying resolution, data quality, and age uncertainties to answer interdisciplinary questions synthetically and coherently. These problems are especially relevant when secondary data, i.e., datasets obtained independently of the drilling operation, are incorporated in analyses. Nonetheless, the inclusion of secondary information, such as isotopic data from fossils found in outcrops or genetic data from extant species, may help to achieve synthetic answers. Recent technological and methodological advances in paleolimnology are likely to increase the possibilities of integrating secondary information. Some of the new approaches have started to revolutionize scientific drilling in ancient lakes, but at the same time, they also add a new layer of complexity to the generation and analysis of sediment-core data. The enhanced opportunities presented by new scientific approaches to study the paleolimnological history of these lakes, therefore, come at the expense of higher logistic, communication, and analytical efforts. Here we review types of data that can be obtained in ancient lake drilling projects and the analytical approaches that can be applied to empirically and statistically link diverse datasets to create an integrative perspective on geological and biological data. In doing so, we highlight strengths and potential weaknesses of new methods and analyses, and provide recommendations for future interdisciplinary deep-drilling projects.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Due to the relative transparency of its embryos and larvae, the zebrafish is an ideal model organism for bioimaging approaches in vertebrates. Novel microscope technologies allow the imaging of developmental processes in unprecedented detail, and they enable the use of complex image-based read-outs for high-throughput/high-content screening. Such applications can easily generate Terabytes of image data, the handling and analysis of which becomes a major bottleneck in extracting the targeted information. Here, we describe the current state of the art in computational image analysis in the zebrafish system. We discuss the challenges encountered when handling high-content image data, especially with regard to data quality, annotation, and storage. We survey methods for preprocessing image data for further analysis, and describe selected examples of automated image analysis, including the tracking of cells during embryogenesis, heartbeat detection, identification of dead embryos, recognition of tissues and anatomical landmarks, and quantification of behavioral patterns of adult fish. We review recent examples for applications using such methods, such as the comprehensive analysis of cell lineages during early development, the generation of a three-dimensional brain atlas of zebrafish larvae, and high-throughput drug screens based on movement patterns. Finally, we identify future challenges for the zebrafish image analysis community, notably those concerning the compatibility of algorithms and data formats for the assembly of modular analysis pipelines.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The research described here is supported by the award made by the RCUK Digital Economy program to the dot.rural Digital Economy Hub; award reference: EP/G066051/1.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Phase equilibrium data regression is an unavoidable task necessary to obtain the appropriate values for any model to be used in separation equipment design for chemical process simulation and optimization. The accuracy of this process depends on different factors such as the experimental data quality, the selected model and the calculation algorithm. The present paper summarizes the results and conclusions achieved in our research on the capabilities and limitations of the existing GE models and about strategies that can be included in the correlation algorithms to improve the convergence and avoid inconsistencies. The NRTL model has been selected as a representative local composition model. New capabilities of this model, but also several relevant limitations, have been identified and some examples of the application of a modified NRTL equation have been discussed. Furthermore, a regression algorithm has been developed that allows for the advisable simultaneous regression of all the condensed phase equilibrium regions that are present in ternary systems at constant T and P. It includes specific strategies designed to avoid some of the pitfalls frequently found in commercial regression tools for phase equilibrium calculations. Most of the proposed strategies are based on the geometrical interpretation of the lowest common tangent plane equilibrium criterion, which allows an unambiguous comprehension of the behavior of the mixtures. The paper aims to show all the work as a whole in order to reveal the necessary efforts that must be devoted to overcome the difficulties that still exist in the phase equilibrium data regression problem.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Comunicación presentada en las XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 septiembre 2011.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The data structure of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. This research develops a methodology for evaluating, ex ante, the relative desirability of alternative data structures for end user queries. This research theorizes that the data structure that yields the lowest weighted average complexity for a representative sample of information requests is the most desirable data structure for end user queries. The theory was tested in an experiment that compared queries from two different relational database schemas. As theorized, end users querying the data structure associated with the less complex queries performed better Complexity was measured using three different Halstead metrics. Each of the three metrics provided excellent predictions of end user performance. This research supplies strong evidence that organizations can use complexity metrics to evaluate, ex ante, the desirability of alternate data structures. Organizations can use these evaluations to enhance the efficient and effective retrieval of information by creating data structures that minimize end user query complexity.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper reviews the key features of an environment to support domain users in spatial information system (SIS) development. It presents a full design and prototype implementation of a repository system for the storage and management of metadata, focusing on a subset of spatial data integrity constraint classes. The system is designed to support spatial system development and customization by users within the domain that the system will operate.