936 results for Data quality problems
Abstract:
Ensemble stream modeling and data cleaning are sensor information processing systems that use different training and testing methods to cross-validate their goals. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is performed during each node split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications, which further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
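As a rough illustration of the scoring step mentioned in this abstract, the sketch below computes an F-measure and a false alarm rate from confusion-matrix counts. It assumes the conventional definition of the F-measure as the (weighted) harmonic mean of precision and recall; the function names and the example counts are hypothetical and are not taken from the study.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # no correct detections at all
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def false_alarm_rate(fp: int, tn: int) -> float:
    """Share of non-events that were wrongly flagged as events."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical counts for a fire-event detector (illustration only).
score = f_measure(tp=8, fp=2, fn=2)
alarms = false_alarm_rate(fp=2, tn=98)
```

With `beta = 1` precision and recall are weighted equally; a larger `beta` would weight recall (missed fire events) more heavily.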
Abstract:
Recent marine long-offset transient electromagnetic (LOTEM) measurements yielded the offshore delineation of a fresh groundwater body beneath the seafloor in the region of Bat Yam, Israel. The LOTEM application was effective in detecting this freshwater body underneath the Mediterranean Sea and allowed an estimation of its seaward extent. However, the measured data set was insufficient to understand the hydrogeological configuration and the mechanism controlling the occurrence of this fresh groundwater discovery. In particular, the lateral geometry of the freshwater boundary, which is important for hydrogeological modelling, could not be resolved. Without such an understanding, rational management of this unexploited groundwater reservoir is not possible. Two new high-resolution marine time-domain electromagnetic methods are theoretically developed to derive the hydrogeological structure of the western aquifer boundary. The first is called Circular Electric Dipole (CED). It is the land-based analogue of the Vertical Electric Dipole (VED), which is commonly applied to detect resistive structures in the subsurface. Although the CED shows exceptional detectability characteristics in the step-off signal towards the sub-seafloor freshwater body, an actual application was not carried out within the scope of this study. It was found that the method suffers from insufficient signal strength to adequately delineate the resistive aquifer under realistic noise conditions. Moreover, modelling studies demonstrated that severe signal distortions are caused by the slightest geometrical inaccuracies. As a result, a successful application of CED in Israel proved to be rather doubtful. A second method called Differential Electric Dipole (DED) is developed as an alternative to the intended CED method.
Compared to the conventional marine time-domain electromagnetic system, which commonly applies a horizontal electric dipole transmitter, the DED is composed of two horizontal electric dipoles in an in-line configuration that share a common central electrode. Theoretically, DED has detectability and resolution characteristics similar to those of the conventional LOTEM system. However, its superior lateral resolution towards multi-dimensional resistivity structures makes an application desirable. Furthermore, the method is less susceptible to geometrical errors, making an application in Israel feasible. Within the scope of this thesis, the novel marine DED method is substantiated using several one-dimensional (1D) and multi-dimensional (2D/3D) modelling studies. The main emphasis lies on the application in Israel. Preliminary resistivity models are derived from the previous marine LOTEM measurement and tested for a DED application. The DED method is effective in locating the two-dimensional resistivity structure at the western aquifer boundary. Moreover, a prediction regarding the hydrogeological boundary conditions is feasible, provided a brackish water zone exists at the head of the interface. A seafloor-based DED transmitter/receiver system is designed and built at the Institute of Geophysics and Meteorology at the University of Cologne. The first DED measurements were carried out in Israel in April 2016. The acquired data set is the first of its kind. The measured data is processed and subsequently interpreted using 1D inversion. The intended aim of interpreting both step-on and step-off signals failed, due to the insufficient data quality of the latter. Yet, the 1D inversion models of the DED step-on signals clearly detect the freshwater body for receivers located close to the Israeli coast. Additionally, a lateral resistivity contrast is observable in the 1D inversion models, which allows the seaward extent of this freshwater body to be constrained.
A large-scale 2D modelling study followed the 1D interpretation. In total, 425 600 forward calculations are conducted to find a sub-seafloor resistivity distribution that adequately explains the measured data. The results indicate that the western aquifer boundary is located 3600 m to 3700 m from the coast. Moreover, a brackish water zone of 3 Ω·m to 5 Ω·m with a lateral extent of less than 300 m is likely located at the head of the freshwater aquifer. Based on these results, it is predicted that the sub-seafloor freshwater body is indeed open to the sea and may be vulnerable to seawater intrusion.
Abstract:
Collecting ground truth data is an important step to be accomplished before performing a supervised classification. However, its quality depends on human, financial and time resources. It is therefore important to apply a validation process to assess the reliability of the acquired data. In this study, agricultural information was collected in the Brazilian Amazonian State of Mato Grosso in order to map crop expansion based on MODIS EVI temporal profiles. The field work was carried out through interviews for the years 2005-2006 and 2006-2007. This work presents a methodology to validate the training data quality and determine the optimal sample to be used according to the classifier employed. The technique is based on the detection of outlier pixels for each class and is carried out by computing Mahalanobis distances for each pixel. The higher the distance, the further the pixel is from the class centre. Preliminary observations through the coefficient of variation validate the efficiency of the technique in detecting outliers. Then, various subsamples are defined by applying different thresholds to exclude outlier pixels from the classification process. The classification results prove the robustness of the Maximum Likelihood and Spectral Angle Mapper classifiers. Indeed, those classifiers were insensitive to outlier exclusion. By contrast, the decision tree classifier showed better results when deleting 7.5% of pixels in the training data. The technique managed to detect outliers for all classes. In this study, few outliers were present in the training data, so the classification quality was not deeply affected by the outliers.
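The outlier screening described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming NumPy is available and that each class is stored as an array with one row per pixel and one column per feature (for example, EVI values at successive dates); the function names are hypothetical, and the study's exact thresholding procedure is not given in the abstract.

```python
import numpy as np


def mahalanobis_distances(pixels: np.ndarray) -> np.ndarray:
    """Distance of every pixel (row) from the centre of its own class."""
    centre = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    centred = pixels - centre
    d2 = np.einsum("ij,jk,ik->i", centred, inv_cov, centred)
    return np.sqrt(np.maximum(d2, 0.0))


def drop_outliers(pixels: np.ndarray, fraction: float) -> np.ndarray:
    """Remove the given fraction of pixels lying furthest from the class centre."""
    distances = mahalanobis_distances(pixels)
    n_keep = len(pixels) - int(round(fraction * len(pixels)))
    keep = np.sort(np.argsort(distances)[:n_keep])  # preserve original order
    return pixels[keep]


# Synthetic class of 200 pixels with 4 features and one planted outlier.
rng = np.random.default_rng(0)
training_class = rng.normal(size=(200, 4))
training_class[0] = 50.0
cleaned = drop_outliers(training_class, fraction=0.075)
```

Applying `drop_outliers` separately to each class with several values of `fraction` would give the family of subsamples that the abstract compares across classifiers.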
Abstract:
Objective: To assess extent of coder agreement for external causes of injury using ICD-10-AM for injury-related hospitalisations in Australian public hospitals. Methods: A random sample of 4850 discharges from 2002 to 2004 was obtained from a stratified random sample of 50 hospitals across four states in Australia. On-site medical record reviews were conducted and external cause codes were assigned blinded to the original coded data. Code agreement levels were grouped into the following agreement categories: block level, 3-character level, 4-character level, 5th-character level, and complete code level. Results: At a broad block level, code agreement was found in over 90% of cases for most mechanisms (eg, transport, fall). Percentage disagreement was 26.0% at the 3-character level; agreement for the complete external cause code was 67.6%. For activity codes, the percentage of disagreement at the 3-character level was 7.3% and agreement for the complete activity code was 68.0%. For place of occurrence codes, the percentage of disagreement at the 4-character level was 22.0%; agreement for the complete place code was 75.4%. Conclusions: With 68% agreement for complete codes and 74% agreement for 3-character codes, as well as variability in agreement levels across different code blocks, place and activity codes, researchers need to be aware of the reliability of their specific data of interest when they wish to undertake trend analyses or case selection for specific causes of interest.
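One way to picture the agreement levels used in this abstract is to compare the original and re-coded values after truncating them to a given number of characters. The sketch below is an assumption about how such a comparison could be computed, not the authors' procedure; the function name and the sample codes are hypothetical.

```python
from typing import Optional, Sequence


def agreement_rate(original: Sequence[str], recoded: Sequence[str],
                   n_chars: Optional[int] = None) -> float:
    """Share of records whose codes match on the first n_chars characters.

    Dots are ignored, so "W18.9" becomes "W189". With n_chars=None the
    complete code is compared.
    """
    if len(original) != len(recoded) or not original:
        raise ValueError("need two non-empty code lists of equal length")

    def key(code: str) -> str:
        stripped = code.replace(".", "").upper()
        return stripped if n_chars is None else stripped[:n_chars]

    matches = sum(key(a) == key(b) for a, b in zip(original, recoded))
    return matches / len(original)


# Hypothetical codes for four discharges (illustration only).
hospital_codes = ["W18.9", "V43.5", "X59.0", "W01.0"]
review_codes = ["W18.9", "V43.6", "X58.0", "W01.0"]
three_char = agreement_rate(hospital_codes, review_codes, n_chars=3)
complete = agreement_rate(hospital_codes, review_codes)
```

Percentage disagreement at a given level is then simply one minus the agreement rate at that level.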
Abstract:
Catheter-related bloodstream infections are a serious problem. Many interventions reduce risk, and some have been evaluated in cost-effectiveness studies. We review the usefulness and quality of these economic studies. Evidence is incomplete, and data required to inform a coherent policy are missing. The cost-effectiveness studies are characterized by a lack of transparency, short time-horizons, and narrow economic perspectives. Data quality is low for some important model parameters. Authors of future economic evaluations should aim to model the complete policy and not just single interventions. They should be rigorous in developing the structure of the economic model, include all relevant economic outcomes, use a systematic approach for selecting data sources for model parameters, and propagate the effect of uncertainty in model parameters on conclusions. This will inform future data collection and improve our understanding of the economics of preventing these infections.
Abstract:
SOH see significant benefit in digitising its drawings and operation and maintenance manuals. Since SOH do not currently have digital models of the Opera House structure or other components, there is an opportunity for this national case study to promote the application of Digital Facility Modelling using standardised Building Information Models (BIM). The digital modelling element of this project examined the potential of building information models for Facility Management (FM), focusing on the following areas: the re-usability of building information for FM purposes; BIM as an integrated information model for facility management; extendibility of the BIM to cope with business-specific requirements; commercial facility management software using standardised building information models; the ability to add (organisation-specific) intelligence to the model; and a roadmap for SOH to adopt BIM for FM. The project has established that BIM (building information modelling) is an appropriate and potentially beneficial technology for the storage of integrated building, maintenance and management data for SOH. Based on the attributes of a BIM, several advantages can be envisioned: consistency in the data, intelligence in the model, multiple representations, and a source of information for intelligent programs and intelligent queries. The IFC open building exchange standard specification provides comprehensive support for asset and facility management functions, and offers new management, collaboration and procurement relationships based on sharing of intelligent building data.
The major advantages of using an open standard are: information can be read and manipulated by any compliant software; user lock-in to proprietary solutions is reduced; third-party software can be the best of breed to suit the process and scope at hand; standardised BIM solutions consider the wider implications of information exchange outside the scope of any particular vendor; information can be archived as ASCII files; and data quality can be enhanced, as the now single source of users' information has improved accuracy, correctness, currency, completeness and relevance. SOH's current building standards have been successfully drafted for a BIM environment and are confidently expected to be fully developed when BIM is adopted operationally by SOH. There have been remarkably few technical difficulties in converting the House's existing conventions and standards to the new model-based environment. This demonstrates that the IFC model represents world practice for building data representation and management (see Sydney Opera House FM Exemplar Project Report Number 2005-001-C-3, Open Specification for BIM: Sydney Opera House Case Study). Availability of FM applications based on BIM is in its infancy, but focussed systems are already in operation internationally and show excellent prospects for implementation systems at SOH. In addition to the generic benefits of standardised BIM described above, the following FM-specific advantages can be expected from this new integrated facilities management environment: faster and more effective processes, controlled whole-life costs and environmental data, better customer service, a common operational picture for current and strategic planning, visual decision-making and a total ownership cost model.
Tests with partial BIM data provided by several of SOH's current consultants show that the creation of a complete SOH model is realistic, but subject to resolution of compliance and detailed functional support by participating software applications. The showcase has demonstrated successfully that IFC-based exchange is possible with several common BIM-based applications through the creation of a new partial model of the building. Data exchanged has been geometrically accurate (the SOH building structure represents some of the most complex building elements) and supports rich information describing the types of objects, with their properties and relationships.
Abstract:
Introduction: Some types of antimicrobial-coated central venous catheters (A-CVC) have been shown to be cost-effective in preventing catheter-related bloodstream infection (CR-BSI). However, not all types have been evaluated, and there are concerns over the quality and usefulness of these earlier studies. There is uncertainty amongst clinicians over which, if any, antimicrobial-coated central venous catheters to use. We re-evaluated the cost-effectiveness of all commercially available antimicrobial-coated central venous catheters for prevention of catheter-related bloodstream infection in adult intensive care unit (ICU) patients. Methods: We used a Markov decision model to compare the cost-effectiveness of antimicrobial-coated central venous catheters relative to uncoated catheters. Four catheter types were evaluated: minocycline and rifampicin (MR)-coated catheters; silver, platinum and carbon (SPC)-impregnated catheters; and two chlorhexidine and silver sulfadiazine-coated catheters, one coated on the external surface (CH/SSD (ext)) and the other coated on both surfaces (CH/SSD (int/ext)). The incremental cost per quality-adjusted life-year gained and the expected net monetary benefits were estimated for each. Uncertainty arising from data estimates, data quality and heterogeneity was explored in sensitivity analyses. Results: The baseline analysis, with no consideration of uncertainty, indicated all four types of antimicrobial-coated central venous catheters were cost-saving relative to uncoated catheters. Minocycline and rifampicin-coated catheters prevented 15 infections per 1,000 catheters and generated the greatest health benefits, 1.6 quality-adjusted life-years, and cost-savings, AUD $130,289. After considering uncertainty in the current evidence, the minocycline and rifampicin-coated catheters returned the highest incremental monetary net benefits of $948 per catheter; but there was a 62% probability of error in this conclusion.
Although the minocycline and rifampicin-coated catheters had the highest monetary net benefits across multiple scenarios, the decision was always associated with high uncertainty. Conclusions: Current evidence suggests that the cost-effectiveness of using antimicrobial-coated central venous catheters within the ICU is highly uncertain. Policies to prevent catheter-related bloodstream infection amongst ICU patients should consider the cost-effectiveness of competing interventions in the light of this uncertainty. Decision makers would do well to consider the current gaps in knowledge and the complexity of producing good quality evidence in this area.
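The two decision quantities reported in this abstract can be illustrated with a short sketch. It assumes the standard definition of incremental net monetary benefit (willingness to pay per QALY times the QALY gain, minus the incremental cost) and treats the probability of error as the share of simulated draws that contradict the decision implied by the mean. The function names and all numbers are hypothetical and do not reproduce the study's Markov model or its results.

```python
from typing import Sequence


def incremental_net_monetary_benefit(delta_qalys: float, delta_cost: float,
                                     willingness_to_pay: float) -> float:
    """INMB of an intervention relative to its comparator.

    delta_cost is negative when the intervention saves money.
    """
    return willingness_to_pay * delta_qalys - delta_cost


def probability_of_error(inmb_draws: Sequence[float]) -> float:
    """Share of simulated draws that disagree with the mean-based decision."""
    if not inmb_draws:
        raise ValueError("need at least one simulated draw")
    mean = sum(inmb_draws) / len(inmb_draws)
    adopt = mean > 0  # adopt the intervention if its expected INMB is positive
    wrong = sum((draw > 0) != adopt for draw in inmb_draws)
    return wrong / len(inmb_draws)


# Hypothetical inputs (illustration only).
inmb = incremental_net_monetary_benefit(delta_qalys=0.002, delta_cost=-100.0,
                                        willingness_to_pay=40_000.0)
error = probability_of_error([10.0, -5.0, 20.0, -1.0])
```

In a probabilistic sensitivity analysis the draws would come from repeatedly sampling the model parameters, which is how a positive expected benefit can coexist with a high probability of error.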
Abstract:
This paper proposes a novel application of Skid-to-Turn maneuvers for fixed-wing Unmanned Aerial Vehicles (UAVs) inspecting locally linear infrastructure. Fixed-wing UAVs, following the design of manned aircraft, commonly employ Bank-to-Turn maneuvers to change heading and thus direction of travel. Whilst effective, banking an aircraft during the inspection of ground-based features hinders data collection, with body-fixed sensors angled away from the direction of turn and a panning motion induced through roll rate that can reduce data quality. By adopting Skid-to-Turn maneuvers, the aircraft can change heading whilst maintaining wings-level flight, thus allowing body-fixed sensors to maintain a downward-facing orientation. An Image-Based Visual Servo controller is developed to directly control the position of features as captured by onboard inspection sensors. This improves on the indirect approach taken by other tracking controllers, where a course over ground directly above the feature is assumed to capture it centered in the field of view. Performance of the proposed controller is compared against that of a Bank-to-Turn tracking controller driven by GPS-derived cross-track error in a simulation environment developed to replicate the field of view of a body-fixed camera.
Abstract:
Post-license advanced driver training programs in the US and early programs in Europe have often failed to accomplish their stated objectives because, it is suspected, drivers gain self-perceived driving skills that exceed their true skills, leading to increased post-training crashes. The consensus from the evaluation of countless advanced driver training programs is that these programs are a detriment to safety, especially for novice, young, male drivers. Some European countries, including Sweden, Finland, Austria, Luxembourg, and Norway, have continued to refine these programs, with an entirely new training philosophy emerging around 1990. These renewed programs have shown considerable promise, despite various data quality and availability concerns. They share a focus on teaching drivers about self-assessment and anticipation of risk, as opposed to teaching drivers how to master driving at the limits of tire adhesion. The programs focus on factors such as self-actualization and driving discipline, rather than low-level mastery of skills. Drivers are meant to leave these renewed programs with a more realistic assessment of their driving abilities. These renewed programs require considerable specialized and costly infrastructure, including dedicated driver training facilities with driving modules engineered specifically for advanced driver training, and highly structured curricula. They are conspicuously missing from both the US road safety toolbox and the academic literature. Given the considerable road safety concerns associated with US novice male drivers in particular, these programs warrant further attention. This paper reviews the predominant features of, and empirical evidence surrounding, post-licensing advanced driver training programs focused on novice drivers. A clear articulation of differences between the renewed and current US advanced driver training programs is provided.
While the individual quantitative evaluations range from marginally to significantly effective in reducing novice driver crash risk, they have been criticized for evaluation deficiencies ranging from small sample sizes to confounding variables to lack of exposure metrics. Collectively, however, the programs cited in the paper suggest at least a marginally positive effect that needs to be validated with further studies. If additional well-controlled studies can validate these programs, a pilot program in the US should be considered.
Abstract:
In this paper, the performance of voltage-source converter-based shunt and series compensators used for load voltage control in electrical power distribution systems has been analyzed and compared, when a nonlinear load is connected across the load bus. The comparison has been made based on the closed-loop frequency response characteristics of the compensated distribution system. A distribution static compensator (DSTATCOM) as a shunt device and a dynamic voltage restorer (DVR) as a series device are considered in the voltage-control mode for the comparison. The power-quality problems which these compensators address include voltage sags/swells, load voltage harmonic distortions, and unbalancing. The effect of various system parameters on the control performance of the compensator can be studied using the proposed analysis. In particular, the performance of the two compensators is compared for a strong ac-supply (stiff source) and a weak ac-supply (non-stiff source) distribution system. The experimental verification of the analytical results derived has been obtained using a laboratory model of the single-phase DSTATCOM and DVR. A generalized converter topology using a cascaded multilevel inverter has been proposed for the medium-voltage distribution system. Simulation studies have been performed in the PSCAD/EMTDC software to verify the results in the three-phase system.
Abstract:
There has been an increasing interest by governments worldwide in the potential benefits of open access to public sector information (PSI). However, an important question remains: can a government incur tortious liability for incorrect information released online under an open content licence? This paper argues that the release of PSI online for free under an open content licence, specifically a Creative Commons licence, is within the bounds of an acceptable level of risk to government, especially where users are informed of the limitations of the data and appropriate information management policies and principles are in place to ensure accountability for data quality and accuracy.
Abstract:
The impact of urban development and climate change has created the impetus to monitor changes in the environment, particularly the behaviour, habitat and movement of fauna species. The aim of this chapter is to present the design and development of a sensor network based on smart phones to automatically collect and analyse acoustic and visual data for environmental monitoring purposes. Due to the communication and sophisticated programming facilities offered by smart phones, software tools can be developed to allow data to be collected, partially processed and sent to a remote server over the network for storage and further processing. This sensor network, which employs a client-server architecture, has been deployed in three applications: monitoring a rare bird species near Brisbane Airport, studying koala behaviour at St Bees Island, and detecting fruit flies. The users of this system include scientists (e.g. ecologists, ornithologists, computer scientists) and community groups participating in data collection or reporting on the environment (e.g. students, bird watchers). The chapter focuses on the following aspects of our research: issues involved in using smart phones as sensors; the overall framework for data acquisition, data quality control, data management and analysis; current and future applications of the smart phone-based sensor network; and our future research directions.
Abstract:
Online learning algorithms have recently risen to prominence due to their strong theoretical guarantees and an increasing number of practical applications for large-scale data analysis problems. In this paper, we analyze a class of online learning algorithms based on fixed potentials and nonlinearized losses, which yields algorithms with implicit update rules. We show how to efficiently compute these updates, and we prove regret bounds for the algorithms. We apply our formulation to several special cases where our approach has benefits over existing online learning methods. In particular, we provide improved algorithms and bounds for the online metric learning problem, and show improved robustness for online linear prediction problems. Results over a variety of data sets demonstrate the advantages of our framework.
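To make the idea of an implicit update concrete, the sketch below works through one simple special case chosen for illustration: a squared Euclidean potential with squared loss for linear prediction. These choices are assumptions, since the abstract does not specify the potentials or losses analysed. The new weight vector minimises 0.5*||w - w_t||^2 + eta*0.5*(w.x - y)^2 with the loss left unlinearised, which has the closed form w = w_t - eta*(w_t.x - y)/(1 + eta*||x||^2) * x.

```python
from typing import List, Sequence


def implicit_squared_loss_update(weights: Sequence[float], x: Sequence[float],
                                 y: float, eta: float) -> List[float]:
    """One implicit online step for squared loss with a Euclidean potential.

    The residual is divided by (1 + eta * ||x||^2), so a large learning
    rate cannot overshoot the target the way an explicit gradient step can.
    """
    prediction = sum(w_i * x_i for w_i, x_i in zip(weights, x))
    norm_sq = sum(x_i * x_i for x_i in x)
    scale = eta * (prediction - y) / (1.0 + eta * norm_sq)
    return [w_i - scale * x_i for w_i, x_i in zip(weights, x)]


# One step from zero weights on a single hypothetical example.
w_next = implicit_squared_loss_update([0.0, 0.0], [1.0, 0.0], y=1.0, eta=1.0)
```

For comparison, the explicit gradient step on the same example would be w_t - eta*(w_t.x - y)*x; the two coincide only in the limit of small eta.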
Abstract:
This paper highlights the contemporary disadvantaged position of Indigenous peoples of Australia. It discusses a number of data quality issues concerning Indigenous data, before examining Indigenous disadvantage across five key areas: (1) education; (2) employment; (3) housing and living conditions; (4) health and wellbeing; and (5) crime and justice. Given the call for all governments to implement a framework to overcome Indigenous disadvantage, we recommend that future research begin with an investigation of non-Indigenous attitudes towards, and knowledge of, the position of Indigenous peoples in Australia. This is essential to developing an understanding of the general public's current perceptions of Indigenous peoples' position in Australia, particularly where the development of policies pertaining to Indigenous peoples requires cooperative action and the support of the broader Australian population.
Abstract:
The World Health Organization recommends that data on mortality in its member countries are collected utilising the Medical Certificate of Cause of Death published in the instruction volume of the ICD-10. However, investment in the health information processes necessary to promote the use of this certificate and improve mortality information is lacking in many countries. An appeal for support to make improvements has been launched through the Health Metrics Network's MOVE-IT strategy (Monitoring of Vital Events Information Technology) [World Health Organization, 2011]. Despite this international spotlight on the need for capture of mortality data and on the use of the ICD-10 to code the data reported on such certificates, there is little cohesion in the way that certifiers of deaths receive instruction in how to complete the death certificate, which is the main source document for mortality statistics. Complete and accurate documentation of the immediate, underlying and contributory causes of death of the decedent on the death certificate is required to produce standardised statistical information and cause-specific mortality statistics that can be compared between populations and across time. This paper reports on a research project conducted to determine the efficacy and accessibility of the certification module of the WHO's newly developed web-based training tool for coders and certifiers of deaths. Involving a population of medical students from the Fiji School of Medicine and a pre- and post-test research design, the study entailed completion of death certificates based on vignettes before and after access to the training tool. The ability of the participants to complete the death certificates and analysis of the completeness and specificity of the ICD-10 coding of the reported causes of death were used to measure the effect of the students' learning from the training tool.
The quality of death certificate completion was assessed using a Quality Index before and after the participants accessed the training tool. In addition, the views of the participants about accessibility and use of the training tool were elicited using a supplementary questionnaire. The results of the study demonstrated improvement in the ability of the participants to complete death certificates completely and accurately according to best practice. The training tool was viewed very positively and its implementation in the curriculum for medical students was encouraged. Participants also recommended that interactive discussions to examine the certification exercises would be an advantage.