953 results for DATA QUALITY
Abstract:
Pyridoxal kinase (PdxK; EC 2.7.1.35) belongs to the phosphotransferase family of enzymes and catalyzes the conversion of the three active forms of vitamin B6 (pyridoxine, pyridoxal and pyridoxamine) to their phosphorylated forms, thereby playing a key role in pyridoxal 5'-phosphate salvage. In the present study, pyridoxal kinase from Salmonella typhimurium was cloned and overexpressed in Escherichia coli, purified using Ni-NTA affinity chromatography and crystallized. X-ray diffraction data were collected to 2.6 Å resolution at 100 K. The crystal belonged to the primitive orthorhombic space group P2₁2₁2₁, with unit-cell parameters a = 65.11, b = 72.89, c = 107.52 Å. The data quality obtained by routine processing was poor owing to the presence of strong diffraction rings caused by a polycrystalline material of an unknown small molecule in all oscillation images. Excluding the reflections close to powder/polycrystalline rings provided data of sufficient quality for structure determination. A preliminary structure solution has been obtained by molecular replacement with the Phaser program in the CCP4 suite using E. coli pyridoxal kinase (PDB entry 2ddm) as the phasing model. Further refinement and analysis of the structure are likely to provide valuable insights into catalysis by pyridoxal kinases.
Abstract:
Displacement estimation is a key step in the evaluation of tissue elasticity by quasistatic strain imaging. An efficient approach may incorporate a tracking strategy whereby each estimate is initially obtained from its neighbours' displacements and then refined through a localized search. This increases the accuracy and reduces the computational expense compared with exhaustive search. However, simple tracking strategies fail when the target displacement map exhibits complex structure. For example, there may be discontinuities and regions of indeterminate displacement caused by decorrelation between the pre- and post-deformation radio frequency (RF) echo signals. This paper introduces a novel displacement tracking algorithm, with a search strategy guided by a data quality indicator. Comparisons with existing methods show that the proposed algorithm is more robust when the displacement distribution is challenging.
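The tracking idea described above can be illustrated with a minimal 1D sketch. It is not the paper's algorithm: it assumes normalized cross-correlation as the data quality indicator, a handful of exhaustively searched seed windows, and a priority queue that always propagates from the highest-quality estimate so far. All function names and parameters (`win`, `step`, `seed_every`, `seed_range`, `local`) are hypothetical.

```python
import heapq
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-length windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_shift(pre, post, start, win, candidates):
    """Return (quality, shift) with the highest correlation among candidates."""
    ref = pre[start:start + win]
    best = (-1.0, 0)
    for s in candidates:
        lo = start + s
        if lo < 0 or lo + win > len(post):
            continue  # candidate window falls outside the post-deformation signal
        q = ncc(ref, post[lo:lo + win])
        if q > best[0]:
            best = (q, s)
    return best

def quality_guided_tracking(pre, post, win=32, step=16,
                            seed_every=8, seed_range=20, local=2):
    """Estimate one integer displacement per window, best-quality-first."""
    starts = list(range(0, len(pre) - win + 1, step))
    n = len(starts)
    shift = [None] * n
    quality = [0.0] * n
    heap = []
    # Seeds: a few windows get an exhaustive search over a wide range.
    for i in range(0, n, seed_every):
        q, s = best_shift(pre, post, starts[i], win,
                          range(-seed_range, seed_range + 1))
        shift[i], quality[i] = s, q
        heapq.heappush(heap, (-q, i))
    # Propagation: expand from the most reliable estimate found so far,
    # searching only a small range around that neighbour's displacement.
    while heap:
        _, i = heapq.heappop(heap)
        for j in (i - 1, i + 1):
            if 0 <= j < n and shift[j] is None:
                q, s = best_shift(pre, post, starts[j], win,
                                  range(shift[i] - local, shift[i] + local + 1))
                shift[j], quality[j] = s, q
                heapq.heappush(heap, (-q, j))
    return np.array(shift), np.array(quality)
```

Because low-quality windows are popped last, an unreliable estimate (for example in a decorrelated region) is less likely to be used as the starting point for its neighbours.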
Abstract:
The mapping and geospatial analysis of benthic environments are multidisciplinary tasks that have become more accessible in recent years because of advances in technology and cost reductions in survey systems. The complex relationships that exist among physical, biological, and chemical seafloor components require advanced, integrated analysis techniques to enable scientists and others to visualize patterns and, in so doing, allow inferences to be made about benthic processes. Effective mapping, analysis, and visualization of marine habitats are particularly important because the subtidal seafloor environment is not readily viewed directly by eye. Research in benthic environments relies heavily, therefore, on remote sensing techniques to collect effective data. Because many benthic scientists are not mapping professionals, they may not adequately consider the links between data collection, data analysis, and data visualization. Projects often start with clear goals, but may be hampered by the technical details and skills required for maintaining data quality through the entire process from collection through analysis and presentation. The lack of technical understanding of the entire data handling process can represent a significant impediment to success. While many benthic mapping efforts have detailed their methodology as it relates to the overall scientific goals of a project, only a few published papers and reports focus on the analysis and visualization components (Paton et al. 1997, Weihe et al. 1999, Basu and Saxena 1999, Bruce et al. 1997). In particular, the benthic mapping literature often briefly describes data collection and analysis methods, but fails to provide sufficiently detailed explanation of particular analysis techniques or display methodologies so that others can employ them. 
In general, such techniques are in large part guided by the data acquisition methods, which can include both aerial and water-based remote sensing methods to map the seafloor without physical disturbance, as well as physical sampling methodologies (e.g., grab or core sampling). The terms benthic mapping and benthic habitat mapping are often used synonymously to describe seafloor mapping conducted for the purpose of benthic habitat identification. There is a subtle yet important difference, however, between general benthic mapping and benthic habitat mapping. The distinction is important because it dictates the sequential analysis and visualization techniques that are employed following data collection. In this paper general seafloor mapping for identification of regional geologic features and morphology is defined as benthic mapping. Benthic habitat mapping incorporates the regional scale geologic information but also includes higher resolution surveys and analysis of biological communities to identify the biological habitats. In addition, this paper adopts the definition of habitats established by Kostylev et al. (2001) as a “spatially defined area where the physical, chemical, and biological environment is distinctly different from the surrounding environment.” (PDF contains 31 pages)
Abstract:
Smartphones and other powerful sensor-equipped consumer devices make it possible to sense the physical world at an unprecedented scale. Nearly 2 million Android and iOS devices are activated every day, each carrying numerous sensors and a high-speed internet connection. Whereas traditional sensor networks have typically deployed a fixed number of devices to sense a particular phenomenon, community networks can grow as additional participants choose to install apps and join the network. In principle, this allows networks of thousands or millions of sensors to be created quickly and at low cost. However, making reliable inferences about the world using so many community sensors involves several challenges, including scalability, data quality, mobility, and user privacy.
This thesis focuses on how learning at both the sensor and network level can provide scalable techniques for data collection and event detection. First, this thesis considers the abstract problem of distributed algorithms for data collection, and proposes a distributed, online approach to selecting which set of sensors should be queried. In addition to providing theoretical guarantees for submodular objective functions, the approach is also compatible with local rules or heuristics for detecting and transmitting potentially valuable observations. Next, the thesis presents a decentralized algorithm for spatial event detection, and describes its use in detecting strong earthquakes within the Caltech Community Seismic Network. Although strong earthquakes are rare and complex events, and community sensors can be very noisy, our decentralized anomaly detection approach obtains theoretical guarantees for event detection performance while simultaneously limiting the rate of false alarms.
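To make the notion of a submodular objective concrete, here is a minimal sketch of the classic centralized greedy rule for a coverage objective. It is a simpler stand-in and not the distributed, online method proposed in the thesis; the sensor names and the coverage sets are hypothetical.

```python
def greedy_select(coverage, k):
    """Pick up to k sensors, each time adding the one with the largest marginal gain.

    coverage maps a sensor id to the set of cells it observes. Set coverage is
    a submodular objective: a sensor's marginal gain can only shrink as more
    sensors are chosen.
    """
    chosen, covered = [], set()
    for _ in range(k):
        candidates = [s for s in coverage if s not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda s: len(coverage[s] - covered))
        if not coverage[best] - covered:
            break  # no remaining sensor adds new coverage
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered


# Hypothetical sensors and the grid cells each one observes.
sensors = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
picked, cells = greedy_select(sensors, k=2)
```

For monotone submodular objectives under a cardinality constraint, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is the kind of guarantee the abstract alludes to.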
Abstract:
Technology-supported citizen science has created huge volumes of data with increasing potential to facilitate scientific progress; however, verifying data quality is still a substantial hurdle due to the limitations of existing data quality mechanisms. In this study, we adopted a mixed methods approach to investigate community-based data validation practices and the characteristics of records of wildlife species observations that affected the outcomes of collaborative data quality management in an online community where people record what they see in nature. The findings describe the processes that both relied upon and added to information provenance through information stewardship behaviors, which led to improved reliability and informativeness. The likelihood of community-based validation interactions was predicted by several factors, including the types of organisms observed and whether the data were submitted from a mobile device. We conclude with implications for technology design, citizen science practices, and research.
Abstract:
The GEOTRACES Intermediate Data Product 2014 (IDP2014) is the first publicly available data product of the international GEOTRACES programme, and contains data measured and quality controlled before the end of 2013. It consists of two parts: (1) a compilation of digital data for more than 200 trace elements and isotopes (TEIs) as well as classical hydrographic parameters, and (2) the eGEOTRACES Electronic Atlas providing a strongly inter-linked on-line atlas including more than 300 section plots and 90 animated 3D scenes. The IDP2014 covers the Atlantic, Arctic, and Indian oceans, exhibiting highest data density in the Atlantic. The TEI data in the IDP2014 are quality controlled by careful assessment of intercalibration results and multi-laboratory data comparisons at cross-over stations. The digital data are provided in several formats, including ASCII spreadsheet, Excel spreadsheet, netCDF, and Ocean Data View collection. In addition to the actual data values the IDP2014 also contains data quality flags and 1-sigma data error values where available. Quality flags and error values are useful for data filtering. Metadata about data originators, analytical methods and original publications related to the data are linked to the data in an easily accessible way. The eGEOTRACES Electronic Atlas is the visual representation of the IDP2014 data providing section plots and a new kind of animated 3D scene. The basin-wide 3D scenes allow for viewing of data from many cruises at the same time, thereby providing quick overviews of large-scale tracer distributions. In addition, the 3D scenes provide geographical and bathymetric context that is crucial for the interpretation and assessment of observed tracer plumes, as well as for making inferences about controlling processes.
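As an illustration of filtering by quality flag and error value, the sketch below keeps only samples with an accepted flag and, optionally, a bounded relative 1-sigma error. The flag codes, field names and threshold are assumptions made for the example; they are not taken from the IDP2014 documentation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sample:
    value: float
    flag: int                # hypothetical codes: 1 = good, 2 = questionable, 3 = bad
    sigma: Optional[float]   # 1-sigma error, None where unavailable

def filter_samples(samples, accepted_flags=(1,), max_rel_error=None):
    """Keep samples with an accepted flag and, if requested, a small relative error."""
    kept = []
    for s in samples:
        if s.flag not in accepted_flags:
            continue
        # Samples without an error value are kept: the error test cannot be applied.
        if max_rel_error is not None and s.sigma is not None and s.value != 0:
            if abs(s.sigma / s.value) > max_rel_error:
                continue
        kept.append(s)
    return kept


data = [
    Sample(0.52, 1, 0.01),
    Sample(0.48, 1, 0.20),   # good flag, but large relative error
    Sample(0.75, 3, 0.01),   # rejected by flag
    Sample(0.60, 1, None),   # no error value available
]
good = filter_samples(data, accepted_flags=(1,), max_rel_error=0.1)
```

Whether samples lacking an error value should be kept or dropped is a design choice that depends on the analysis; the sketch keeps them.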
Abstract:
In many environmental valuation applications standard sample sizes for choice modelling surveys are impractical to achieve. One can improve data quality using more in-depth surveys administered to fewer respondents. We report on a study using high quality rank-ordered data elicited with the best-worst approach. The resulting "exploded logit" choice model, estimated on 64 responses per person, was used to study the willingness to pay for external benefits by visitors for policies which maintain the cultural heritage of alpine grazing commons. We find evidence supporting this approach and reasonable estimates of mean WTP, which appear theoretically valid and policy informative. © The Author (2011).
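For readers unfamiliar with the exploded (rank-ordered) logit, the sketch below computes the log-probability of one complete ranking as a sequence of multinomial logit choices: the top-ranked alternative is chosen from the full set, the next from the remainder, and so on. It assumes a full best-to-worst ranking with known utilities, which simplifies the best-worst elicitation used in the study; the function name is hypothetical.

```python
import math

def exploded_logit_logprob(utilities_in_rank_order):
    """Log-probability of a ranking, given utilities listed from best to worst."""
    v = list(utilities_in_rank_order)
    logp = 0.0
    # The last alternative is "chosen" from a set of one, with probability 1.
    for j in range(len(v) - 1):
        remaining = v[j:]
        m = max(remaining)  # shift by the maximum for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in remaining))
        logp += v[j] - log_denom
    return logp
```

With three equally attractive alternatives, every one of the 3! = 6 rankings is equally likely, so `math.exp(exploded_logit_logprob([0.0, 0.0, 0.0]))` evaluates to 1/6 up to floating-point rounding.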
Abstract:
This paper presents the SmartClean tool, whose purpose is to detect and correct data quality problems (DQPs). Compared with existing tools, SmartClean has the following main advantage: the user does not need to specify the execution sequence of the data cleaning operations. To that end, an execution sequence was developed, and the problems are handled (i.e., detected and corrected) following that sequence. The sequence also supports the incremental execution of the operations. In this paper, the underlying architecture of the tool is presented and its components are described in detail. The validity of the tool and, consequently, of the architecture is demonstrated through a case study. Although SmartClean also has cleaning capabilities at all other levels, this paper describes only those related to the attribute value level.
Abstract:
The emergence of new business models, namely the establishment of partnerships between organizations and the opportunity for companies to add data available on the web, especially the semantic web, to their own information, has drawn attention to existing problems in databases, particularly those related to data quality. Poor data can result in a loss of competitiveness for the organizations holding them, and may even lead to their disappearance, since many of their decision-making processes are based on these data. For this reason, data cleaning is essential. Current approaches to solving these problems are closely tied to database schemas and specific domains. For data cleaning to be usable across different repositories, computer systems must understand the data, i.e., associated semantics are needed. The solution presented in this paper uses ontologies (i) to specify data cleaning operations and (ii) to resolve the semantic heterogeneity of data stored in different sources. With data cleaning operations defined at a conceptual level and mappings established between domain ontologies and an ontology derived from a database, the operations can be instantiated and proposed to the expert/specialist for execution over that database, thus enabling their interoperability.
Abstract:
The enhanced functional sensitivity offered by ultra-high field imaging may significantly benefit simultaneous EEG-fMRI studies, but the concurrent increases in artifact contamination can strongly compromise EEG data quality. In the present study, we focus on EEG artifacts created by head motion in the static B0 field. A novel approach for motion artifact detection is proposed, based on a simple modification of a commercial EEG cap, in which four electrodes are non-permanently adapted to record only magnetic induction effects. Simultaneous EEG-fMRI data were acquired with this setup, at 7T, from healthy volunteers undergoing a reversing-checkerboard visual stimulation paradigm. Data analysis assisted by the motion sensors revealed that, after gradient artifact correction, EEG signal variance was largely dominated by pulse artifacts (81-93%), but contributions from spontaneous motion (4-13%) were still comparable to or even larger than those of actual neuronal activity (3-9%). Multiple approaches were tested to determine the most effective procedure for denoising EEG data incorporating motion sensor information. Optimal results were obtained by applying an initial pulse artifact correction step (AAS-based), followed by motion artifact correction (based on the motion sensors) and ICA denoising. On average, motion artifact correction (after AAS) yielded a 61% reduction in signal power and a 62% increase in VEP trial-by-trial consistency. Combined with ICA, these improvements rose to a 74% power reduction and an 86% increase in trial consistency. Overall, the improvements achieved were clearly appreciable at single-subject and single-trial levels, and set an encouraging quality mark for simultaneous EEG-fMRI at ultra-high field.
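The AAS step mentioned above (average artifact subtraction) can be sketched in its most basic form: cut an epoch at each artifact onset, average the epochs into a template, and subtract the template from every epoch. This is a simplified illustration, not the study's pipeline: it assumes known, non-overlapping onsets and a single global template, whereas practical implementations typically use a sliding average. Function and parameter names are hypothetical.

```python
import numpy as np

def average_artifact_subtraction(signal, onsets, length):
    """Subtract the mean artifact epoch from each epoch of a single channel."""
    signal = np.asarray(signal, dtype=float)
    # Keep only epochs that fit entirely inside the recording.
    valid = [t for t in onsets if t >= 0 and t + length <= signal.size]
    if not valid:
        return signal.copy()
    template = np.mean([signal[t:t + length] for t in valid], axis=0)
    cleaned = signal.copy()
    for t in valid:
        cleaned[t:t + length] -= template  # assumes epochs do not overlap
    return cleaned
```

Activity that is not time-locked to the onsets averages towards zero in the template and so largely survives the subtraction, while the repeating artifact is removed.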
Abstract:
Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on raw-data-based quotation systems to select the best suppliers for their customers (airlines). Data quantity and quality are key to the success of an MRO job, since cost and quality benchmarks must be met. This paper introduces a data mining approach to create an MRO quotation system that enhances data quantity and data quality and enables significantly more precise MRO job quotations. Regular expressions were used to analyse descriptive textual feedback (i.e. engineers' reports) in order to extract more referable, highly normalised data for job quotation. A text-mining-based key influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that the system data would improve cost quotation in 40% of MRO jobs and would reduce service cost without causing a drop in service quality.
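The regular-expression step can be illustrated with a small sketch that turns a free-text engineer's report into normalised fields. The report wording, the `P/N` and `USD` conventions, and the defect vocabulary are all invented for the example; the paper does not specify its patterns.

```python
import re

# Hypothetical conventions: part numbers follow "P/N", costs follow "USD".
PART_RE = re.compile(r"\bP/N[:\s]+([A-Z0-9-]+)", re.IGNORECASE)
DEFECT_RE = re.compile(
    r"\b(crack(?:ed)?|corro(?:sion|ded)|leak(?:ing|age)?|worn|dent(?:ed)?)\b",
    re.IGNORECASE,
)
COST_RE = re.compile(r"\bUSD\s*([0-9]+(?:\.[0-9]{1,2})?)")

def extract(report):
    """Extract a part number, defect keywords and a cost from free text."""
    part = PART_RE.search(report)
    cost = COST_RE.search(report)
    return {
        "part_number": part.group(1).upper() if part else None,
        "defects": sorted({m.lower() for m in DEFECT_RE.findall(report)}),
        "cost_usd": float(cost.group(1)) if cost else None,
    }


report = ("Inspected P/N: 65-4432-A. Found corrosion on flange and a "
          "cracked seal. Estimated repair USD 1250.50.")
record = extract(report)
```

Fields that cannot be found are returned as `None` or an empty list rather than guessed, so downstream queries can tell missing data apart from extracted data.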
Abstract:
For users of climate services, the ability to quickly determine the datasets that best fit one's needs would be invaluable. The volume, variety and complexity of climate data make this judgment difficult. The ambition of CHARMe ("Characterization of metadata to enable high-quality climate services") is to give a wider interdisciplinary community access to a range of supporting information, such as journal articles, technical reports or feedback on previous applications of the data. The capture and discovery of this "commentary" information, often created by data users rather than data providers, and currently not linked to the data themselves, has not been significantly addressed previously. CHARMe applies the principles of Linked Data and open web standards to associate, record, search and publish user-derived annotations in a way that can be read both by users and automated systems. Tools have been developed within the CHARMe project that enable annotation capability for data delivery systems already in wide use for discovering climate data. In addition, the project has developed advanced tools for exploring data and commentary in innovative ways, including an interactive data explorer and comparator ("CHARMe Maps") and a tool for correlating climate time series with external "significant events" (e.g. instrument failures or large volcanic eruptions) that affect the data quality. Although the project focuses on climate science, the concepts are general and could be applied to other fields. All CHARMe system software is open-source, released under a liberal licence, permitting future projects to re-use the source code as they wish.
Abstract:
This work presents software developed to process solar radiation data. The software can be used in meteorological and climatic stations, and also to support solar radiation measurements in research on solar energy availability, allowing data quality control, statistical calculations and validation of models, as well as easy interchange of data. © 1999 Elsevier B.V. Ltd. All rights reserved.
Abstract:
With the expansion of reference station networks, several positioning techniques have been developed and/or improved. Among them, the VRS (Virtual Reference Station) concept has been widely used. The goal of this paper is to generate VRS data using a modified technique. In the proposed methodology the DD (double difference) ambiguities are not computed. The network correction terms are obtained using only atmospheric (ionospheric and tropospheric) models. The experiments used data from five reference stations of the GPS Active Network of the West of São Paulo State and one extra station. To evaluate the VRS data quality, three different strategies were used: PPP (Precise Point Positioning) and Relative Positioning in static and kinematic modes, and DGPS (Differential GPS). Furthermore, the VRS data were generated at the position of a real reference station. The results provided by the VRS data agree quite well with those of the real file data.