936 results for Data quality problems
Abstract:
National estimates of the prevalence of child abuse-related injuries are obtained from a variety of sectors, including welfare, justice, and health, resulting in inconsistent estimates across sectors. The International Classification of Diseases (ICD) is used as the international standard for categorising health data and aggregating data for statistical purposes, though there has been limited validation of the quality, completeness or concordance of these data with other sectors. This study examined the quality of documentation and coding of child abuse recorded in hospital records in Queensland and the concordance of these data with child welfare records. A retrospective medical record review was used to examine the clinical documentation of over 1000 hospitalised injured children from 20 hospitals in Queensland. A data linkage methodology was used to link these records with records in the child welfare database. Cases were sampled from three sub-groups according to the presence of target ICD codes: definite abuse, possible abuse, and unintentional injury. Less than 2% of cases coded as unintentional were recoded after review as possible abuse, and only 5% of cases coded as possible abuse were reclassified as unintentional, though there was greater variation in the classification of cases as definite abuse compared to possible abuse. Concordance of health data with child welfare data varied across patient subgroups. This study will inform the development of strategies to improve the quality, consistency and concordance of information between health and welfare agencies to ensure adequate system responses to children at risk of abuse.
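The concordance analysis described above can be pictured as a linkage-and-cross-tabulation step. The sketch below is illustrative only: the record layout, the shared identifier and the subgroup labels are assumptions, since the abstract does not specify the actual linkage keys or software used.

```python
from collections import defaultdict

# Hypothetical hospital records: (linkage_id, icd_subgroup)
hospital = [("a1", "definite abuse"), ("a2", "possible abuse"),
            ("a3", "unintentional"), ("a4", "possible abuse")]

# Hypothetical identifiers also present in the child welfare database.
welfare_ids = {"a1", "a2"}

# Concordance per ICD subgroup: share of hospital cases with a welfare record.
linked = defaultdict(lambda: [0, 0])          # subgroup -> [matched, total]
for linkage_id, subgroup in hospital:
    linked[subgroup][1] += 1
    if linkage_id in welfare_ids:
        linked[subgroup][0] += 1

for subgroup, (matched, total) in linked.items():
    print(f"{subgroup}: {matched}/{total} linked ({100 * matched / total:.0f}%)")
```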
Abstract:
Field robots often rely on laser range finders (LRFs) to detect obstacles and navigate autonomously. Despite recent progress in sensing technology and perception algorithms, adverse environmental conditions, such as the presence of smoke, remain a challenging issue for these robots. In this paper, we investigate the possibility of improving laser-based perception applications by anticipating situations in which laser data are affected by smoke, using supervised learning and state-of-the-art visual image quality analysis. We propose to train a k-nearest-neighbour (kNN) classifier to recognise situations where a laser scan is likely to be affected by smoke, based on visual data quality features. The method is evaluated experimentally using a mobile robot equipped with LRFs and a visual camera. The strengths and limitations of the technique are identified and discussed, and we show that the method is beneficial when conservative decisions are the most appropriate.
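As a rough illustration of the classification step, the sketch below trains a kNN classifier on per-frame image quality features and predicts whether the accompanying laser scan is likely to be smoke-affected. The feature names and values are placeholders; the abstract does not enumerate the actual image quality metrics used.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder features per camera frame, e.g. [contrast_score, sharpness_score].
X_train = np.array([[0.80, 0.90], [0.75, 0.85], [0.30, 0.20], [0.25, 0.35]])
y_train = np.array([0, 0, 1, 1])  # 0 = clear scan, 1 = smoke-affected scan

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# For a new frame, decide whether to trust the co-registered laser scan.
new_frame = np.array([[0.28, 0.30]])
if clf.predict(new_frame)[0] == 1:
    print("Scan likely smoke-affected: fall back to a conservative behaviour.")
else:
    print("Scan likely reliable: use it for obstacle detection.")
```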
Abstract:
Health Information Exchange (HIE) is a patient-centric scenario for managing health and medical information, enhanced by the integration of Information and Communication Technologies (ICT). While health information systems are repositioning complex system directives in the wake of the big data paradigm, extracting quality information remains challenging. This talk will share ICT-enabled healthcare scenarios that use big data analytics and will discuss related research and development, including current trends in using these technologies for healthcare services and the critical research challenges in extracting quality information to improve quality of life.
Abstract:
This program of research linked police and health data collections to investigate the potential benefits for road safety in terms of enhancing data quality. This research has important implications for road safety because, although police-collected data have historically underpinned efforts in the area, it is known that many road crashes are not reported to police and that these data lack specific injury severity information. This research shows that data linkage provides a more accurate quantification of the severity and prevalence of road crash injuries, which is essential for prioritising funding, targeting interventions, and estimating the burden and cost of road trauma.
Abstract:
Most real-life data analysis problems are difficult to solve using exact methods, due to the size of the datasets and the nature of the underlying mechanisms of the system under investigation. As datasets grow even larger, finding the balance between the quality of the approximation and the computing time of the heuristic becomes non-trivial. One solution is to consider parallel methods and to use the increased computational power to perform a deeper exploration of the solution space in a similar time. It is, however, difficult to estimate a priori whether parallelisation will provide the expected improvement. In this paper we consider a well-known method, genetic algorithms, and evaluate the behaviour of the classic and parallel implementations on two distinct problem types.
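A common way to parallelise a genetic algorithm while keeping the classic algorithm's structure is to distribute only the fitness evaluations across worker processes. The sketch below shows that master-worker pattern on a toy objective; it is not the paper's implementation, whose parallelisation scheme and problem types are not detailed in the abstract.

```python
import random
from multiprocessing import Pool

def fitness(individual):
    # Toy objective: maximise the number of ones in a bit string.
    return sum(individual)

def evolve(pop, generations=50, workers=4):
    with Pool(workers) as pool:
        for _ in range(generations):
            # Parallel step: fitness of each individual is independent.
            scores = pool.map(fitness, pop)
            ranked = [ind for _, ind in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[: len(pop) // 2]
            # Classic sequential steps: crossover and mutation.
            children = []
            while len(children) < len(pop):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, len(a))
                child = a[:cut] + b[cut:]
                if random.random() < 0.1:
                    i = random.randrange(len(child))
                    child[i] ^= 1
                children.append(child)
            pop = children
    return max(pop, key=fitness)

if __name__ == "__main__":
    population = [[random.randint(0, 1) for _ in range(32)] for _ in range(40)]
    print(evolve(population))
```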
Abstract:
Free software and open source projects are often perceived to be of high quality. It has been suggested that the high level of quality found in some free software projects is related to the open development model which promotes peer review. While the quality of some free software projects is comparable to, if not better than, that of closed source software, not all free software projects are successful and of high quality. Even mature and successful projects face quality problems; some of these are related to the unique characteristics of free software and open source as a distributed development model led primarily by volunteers. In exploratory interviews performed with free software and open source developers, several common quality practices as well as actual quality problems have been identified. The results of these interviews are presented in this paper in order to take stock of the current status of quality in free software projects and to act as a starting point for the implementation of quality process improvement strategies.
Abstract:
As the title indicates, this thesis studies coverage problems with limited range. Given a set of antennas (or any other wireless device able to receive or transmit signals), the goal of this work is to compute the minimum range of the antennas so that they completely cover a path between two points in a region. A path with these characteristics is a safe itinerary. The definition of coverage varies and depends on the intended application. In critical situations such as fire control or military scenarios, the definition of coverage resorts to the use of more than one antenna to increase the effectiveness of this type of surveillance. However, the range of the antennas should be minimised in order to keep the surveillance active for as long as possible. Consequently, this thesis is centred on solving this optimisation problem and on obtaining a particular solution for each case. Although this optimisation problem has been investigated as a coverage problem, it is possible to draw a parallel between coverage problems and illumination and surveillance problems, which are usually designated as Art Gallery problems. To convert a coverage problem into an illumination one, it suffices to consider a set of lights instead of a set of antennas and to subject it to identical restrictions. The main topic from the family of Art Gallery problems addressed in this thesis is 1-good illumination. An object is said to be 1-well illuminated by a set of lights if their convex hull contains the object, which makes this concept a type of quality illumination. The goal of this part of the work is therefore to minimise the range of the lights while maintaining quality illumination. Two variants of 1-good illumination are also presented: orthogonal illumination and good !-illumination. The latter has applications in data depth and data visualisation problems, topics that are frequently addressed in statistics. Solving these problems using the Embracing Voronoi diagram (a variant of the Voronoi diagram adapted to good illumination problems) is also proposed in this thesis.
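In symbols, the notion of 1-good illumination with limited range described above can be written roughly as follows; the uniform-range formulation is an assumption made here for compactness, since the thesis may also treat per-light ranges.

```latex
% Lights F = {f_1, ..., f_n} in the plane; F_r(p) are the lights reaching a point p
% when every light has range r:
\[
  F_r(p) \;=\; \{\, f \in F \;:\; d(f, p) \le r \,\}.
\]
% p is 1-well illuminated when it lies in the convex hull of the lights reaching it,
% and the quantity to minimise is the smallest range that achieves this:
\[
  \rho(p) \;=\; \min \{\, r \ge 0 \;:\; p \in \operatorname{conv}(F_r(p)) \,\}.
\]
```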
Abstract:
The emergence of new business models, namely the establishment of partnerships between organizations and the opportunity companies have to add existing web data, especially from the semantic web, to their own information, has drawn attention to problems that exist in databases, particularly those related to data quality. Poor data can result in a loss of competitiveness for the organizations holding them, and may even lead to their disappearance, since many of their decision-making processes are based on these data. For this reason, data cleaning is essential. Current approaches to solving these problems are closely tied to database schemas and specific domains. For data cleaning to be usable across different repositories, computer systems need to understand these data, i.e., an associated semantics is needed. The solution presented in this paper includes the use of ontologies: (i) for the specification of data cleaning operations and (ii) as a way of solving the semantic heterogeneity problems of data stored in different sources. With data cleaning operations defined at a conceptual level, and with mappings between domain ontologies and an ontology derived from a database, the operations may be instantiated and proposed to the expert/specialist for execution over that database, thus enabling their interoperability.
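To make the idea of schema-independent cleaning operations concrete, the sketch below defines a cleaning rule against domain-ontology terms and instantiates it over a particular database using a term-to-column mapping. All names (concepts, tables, columns) are hypothetical, and the paper's actual ontology language and operation catalogue are not given in the abstract.

```python
# A cleaning operation expressed at the conceptual (ontology) level:
# "values of Customer.hasEmail must not be missing".
conceptual_rule = {"concept": "Customer", "property": "hasEmail",
                   "operation": "fill_missing"}

# Mapping from domain-ontology terms to one concrete database schema.
mapping = {
    ("Customer", "hasEmail"): ("clients", "email_address"),
}

def instantiate(rule, mapping):
    """Turn a conceptual rule into a concrete statement to propose to the expert."""
    table, column = mapping[(rule["concept"], rule["property"])]
    if rule["operation"] == "fill_missing":
        return (f"SELECT * FROM {table} WHERE {column} IS NULL "
                f"-- candidate rows for the expert to review and complete")
    raise ValueError(f"unknown operation: {rule['operation']}")

print(instantiate(conceptual_rule, mapping))
```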
Abstract:
Dissertation prepared in partial fulfilment of the requirements of the Master's Degree in Civil Engineering in the speciality area of Hydraulics.
Abstract:
In January 1992, there was a major pollution event on the River Carnon and downstream of its confluence with the River Fal and the Fal estuary in west Cornwall. This incident was associated with the discharge of several million gallons of highly polluted water from the abandoned Wheal Jane tin mine, which had also extracted Ag, Cu and Zn ore. Later that year, the Centre for Ecology and Hydrology (CEH; then Institute of Hydrology), Wallingford, undertook daily monitoring of the River Carnon for a range of major, minor and trace elements to assess the nature and dynamics of the pollutant discharges. These data cover an 18-month period during which major water-quality problems remained after the initial phase of surface water contamination. Here, a summary is provided of the water quality found, as a backdrop against which to set subsequent remediation. Two types of water-quality determinant grouping were observed. The first type comprises the determinants B, Cs, Ca, Li, K, Na, SO4, Rb and Sr; their concentrations are positively correlated with each other but inversely correlated with flow. This type of water-quality determinant shows variations in concentration that broadly link to the normal hydrogeochemical processes within the catchment, with limited confounding issues associated with mine drainage. The second type of water-quality determinant comprises Al, Be, Cd, Ce, Co, Cu, Fe, La, Pb, Pr, Nd, Ni, Si, Sb, U, Y and Zn, and concentrations for all of this group are positively correlated. The determinants in this second group all have concentrations that are negatively correlated with pH. This group links primarily to polluting mine discharge. The water-quality variations in the River Carnon are described in relation to these two distinct hydrogeochemical groupings.
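The two groupings reported above follow from screening each determinant's correlation with flow and with pH. The sketch below shows one way such a screening could be run on a daily monitoring table; the file name, column names and thresholds are assumptions, not taken from the paper.

```python
import pandas as pd

# Hypothetical daily monitoring table: one row per day, one column per variable.
df = pd.read_csv("carnon_daily_monitoring.csv")   # assumed file layout

determinants = [c for c in df.columns if c not in ("date", "flow", "pH")]
groups = {}
for det in determinants:
    r_flow = df[det].corr(df["flow"], method="spearman")
    r_ph = df[det].corr(df["pH"], method="spearman")
    if r_flow < -0.5:
        groups[det] = "group 1: catchment-controlled (inverse with flow)"
    elif r_ph < -0.5:
        groups[det] = "group 2: mine-drainage related (inverse with pH)"
    else:
        groups[det] = "unclassified"

print(pd.Series(groups).value_counts())
```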
Abstract:
The Primary Care Information System (SIAB) concentrates basic healthcare information from all regions of Brazil. The information is collected by primary care teams through a paper-based procedure that degrades the quality of the information provided to the healthcare authorities and slows down decision making. To overcome these problems we propose a new data-gathering application, to be used by the primary care teams for collecting family data, that runs on a mobile device connected to a 3G network and equipped with a GPS. A prototype was developed in which a digital version of one SIAB form is made available on the mobile device. The prototype was tested in a basic healthcare unit located in a suburb of Sao Paulo. The results obtained so far show that the proposed process is a better alternative for data collection in primary care, both in terms of data quality and of shorter delivery time to the healthcare authorities.
Abstract:
The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data flow coming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the lifetime of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Because of the different observing sites involved and the huge number of frames expected (around 100,000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment, and particular care has to be taken to test the capabilities of each telescope/instrument combination (through the instrument familiarization plan) and to devise methods to keep under control, and where necessary correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and the maintenance of the data archives. However, the field I was personally responsible for was photometry, and in particular relative photometry for the production of short-term light curves. In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light curve production and analysis.
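A minimal sketch of the aperture photometry step with a simple quality-control flag is given below, using plain NumPy. It only illustrates the kind of catalogue entry such a pipeline would produce; the thesis pipeline's actual software, aperture sizes and QC criteria are not specified in the abstract.

```python
import numpy as np

def aperture_photometry(image, x0, y0, r_ap=5.0, r_in=8.0, r_out=12.0):
    """Sum counts in a circular aperture and subtract a local sky background."""
    yy, xx = np.indices(image.shape)
    dist = np.hypot(xx - x0, yy - y0)
    aperture = dist <= r_ap
    annulus = (dist >= r_in) & (dist <= r_out)
    sky_per_pixel = np.median(image[annulus])
    flux = image[aperture].sum() - sky_per_pixel * aperture.sum()
    # Crude quality-control flag: reject measurements with low signal above sky.
    noise = image[annulus].std() * np.sqrt(aperture.sum())
    quality_ok = noise > 0 and flux / noise > 10.0
    return flux, quality_ok

# Toy frame with one bright source near the centre.
rng = np.random.default_rng(0)
frame = rng.normal(100.0, 5.0, size=(64, 64))
frame[30:34, 30:34] += 500.0
print(aperture_photometry(frame, 32, 32))
```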
Abstract:
Bovine spongiform encephalopathy (BSE) rapid tests and routine BSE-testing laboratories are subject to strict regulations for approval. Due to the lack of BSE-positive control samples, however, full assay validation at the level of individual test runs and continuous monitoring of test performance on-site are difficult. Most rapid tests use synthetic prion protein peptides, but it is not known to what extent these reflect assay performance on field samples, or whether they are sufficient to indicate on-site assay quality problems. To address this question we compared the test scores of the provided kit peptide controls to those of standardized weak BSE-positive tissue samples, in individual test runs as well as continuously over time by quality control charts, in two widely used BSE rapid tests. Our results reveal only a weak correlation between the weak positive tissue control and the peptide control scores. We identified kit-lot-related shifts in assay performance that were not reflected by the peptide control scores. Vice versa, not all shifts indicated by the peptide control scores actually reflected a shift in assay performance. In conclusion, these data highlight that the use of the kit peptide controls for continuous quality control purposes may result in unjustified rejection or acceptance of test runs. In contrast, standardized weak positive tissue controls in combination with Shewhart-CUSUM control charts appear to be reliable for continuously monitoring assay performance on-site and identifying undesired deviations.
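The monitoring approach named above combines a Shewhart chart (flagging single points far from the target) with a CUSUM chart (flagging small, sustained drifts). A minimal sketch of the CUSUM part is shown below; the target value, reference value k and decision interval h are illustrative, not those used in the study.

```python
def cusum(scores, target, k=0.5, h=4.0):
    """Tabular CUSUM: flag sustained upward or downward shifts from the target."""
    s_hi = s_lo = 0.0
    alarms = []
    for i, x in enumerate(scores):
        s_hi = max(0.0, s_hi + (x - target - k))   # accumulates upward drift
        s_lo = max(0.0, s_lo + (target - x - k))   # accumulates downward drift
        if s_hi > h or s_lo > h:
            alarms.append(i)
            s_hi = s_lo = 0.0                      # reset after an alarm
    return alarms

# Hypothetical weak-positive tissue control scores across consecutive runs;
# the later runs drift downward, e.g. after a kit-lot change.
runs = [10.1, 9.8, 10.2, 10.0, 9.9, 9.0, 8.5, 8.2, 8.0, 7.8]
print(cusum(runs, target=10.0))    # flags the run where the drift accumulates
```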
Abstract:
The current state of health and biomedicine includes an enormous number of heterogeneous data silos, collected for different purposes and represented differently, that are presently impossible to share or analyze in toto. The greatest challenge for large-scale and meaningful analyses of health-related data is to achieve a uniform data representation for data extracted from heterogeneous source representations. Based upon an analysis and categorization of heterogeneities, a process for achieving comparable data content by using a uniform terminological representation is developed. This process addresses the types of representational heterogeneities that commonly arise in healthcare data integration problems. Specifically, it uses a reference terminology and associated "maps" to transform heterogeneous data to a standard representation for comparability and secondary use. Capturing the quality and precision of the maps between local terms and reference terminology concepts enhances the meaning of the aggregated data, empowering end users with better-informed queries for subsequent analyses. A data integration case study in the domain of pediatric asthma illustrates the development and use of a reference terminology for creating comparable data from heterogeneous source representations. The contribution of this research is a generalized process for the integration of data from heterogeneous source representations, and this process can be applied and extended to other problems where heterogeneous data need to be merged.
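The mapping step can be pictured as a lookup from local source terms to reference terminology concepts, with each map carrying a quality/precision annotation that is preserved alongside the transformed record. The sketch below is purely illustrative; the local terms, reference concepts and annotation scheme are invented, not taken from the study.

```python
# Map from (source system, local term) to a reference concept,
# annotated with the precision of the match.
terminology_map = {
    ("clinic_a", "asthma, mild"): {"concept": "REF:0001", "precision": "exact"},
    ("clinic_b", "wheeze"):       {"concept": "REF:0001", "precision": "broader"},
}

def to_reference(record):
    """Rewrite one source record into the uniform terminological representation."""
    key = (record["source"], record["local_term"].lower())
    mapping = terminology_map.get(key)
    if mapping is None:
        return {**record, "concept": None, "precision": "unmapped"}
    return {**record, **mapping}

records = [
    {"source": "clinic_a", "local_term": "Asthma, mild", "patient": "p1"},
    {"source": "clinic_b", "local_term": "wheeze", "patient": "p2"},
]
for r in records:
    print(to_reference(r))
```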
Abstract:
The Data Quality Campaign (DQC) has been focused since 2005 on advocating for states to build robust state longitudinal data systems (SLDS). While states have made great progress on their data infrastructure, and should continue to emphasize this work, data systems alone will not improve outcomes. It is time for both DQC and the states to focus on building capacity to use the information that these systems are producing at every level, from classrooms to state houses. To impact system performance and student achievement, the ingrained culture must be replaced with one that focuses on data use for continuous improvement. The effective use of data to inform decisions, provide transparency, improve the measurement of outcomes, and fuel continuous improvement will not come to fruition unless there is a system-wide focus on building capacity around the collection, analysis, dissemination, and use of these data, including through research.