858 results for Data cleaning

in Queensland University of Technology - ePrints Archive


Relevance:

70.00%

Publisher:

Abstract:

This thesis provides a query model suitable for context sensitive access to a wide range of distributed linked datasets which are available to scientists using the Internet. The model is designed based on scientific research standards which require scientists to provide replicable methods in their publications. Although there are query models available that provide limited replicability, they do not contextualise the process whereby different scientists select dataset locations based on their trust and physical location. In different contexts, scientists need to perform different data cleaning actions, independent of the overall query, and the model was designed to accommodate this function. The query model was implemented as a prototype web application and its features were verified through its use as the engine behind a major scientific data access site, Bio2RDF.org. The prototype showed that it was possible to have context sensitive behaviour for each of the three mirrors of Bio2RDF.org using a single set of configuration settings. The prototype provided executable query provenance that could be attached to scientific publications to fulfil replicability requirements. The model was designed to make it simple to independently interpret and execute the query provenance documents using context specific profiles, without modifying the original provenance documents. Experiments using the prototype as the data access tool in workflow management systems confirmed that the design of the model made it possible to replicate results in different contexts with minimal additions, and no deletions, to query provenance documents.

Relevance:

70.00%

Publisher:

Abstract:

This thesis describes the development of a robust and novel prototype to address data quality problems that relate to the dimension of outlier data. It thoroughly investigates the problems associated with detecting, assessing and determining the severity of outlier data, and proposes alternative techniques based on granule mining to significantly improve the effectiveness of mining and assessing outlier data.

Relevance:

30.00%

Publisher:

Abstract:

Laboratory-based studies of human dietary behaviour benefit from highly controlled conditions; however, this approach can lack ecological validity. Identifying a reliable method to capture and quantify natural dietary behaviours represents an important challenge for researchers. In this study, we scrutinised cafeteria-style meals in the ‘Restaurant of the Future.’ Self-selected meals were weighed and photographed, both before and after consumption. Using standard portions of the same foods, these images were independently coded to produce accurate and reliable estimates of (i) initial self-served portions, and (ii) food remaining at the end of the meal. Plate cleaning was extremely common; in 86% of meals at least 90% of self-selected calories were consumed. Males ate a greater proportion of their self-selected meals than did females. Finally, when participants visited the restaurant more than once, the correspondence between selected portions was better predicted by the weight of the meal than by its energy content. These findings illustrate the potential benefits of meal photography in this context. However, they also highlight significant limitations, in particular, the need to exclude large amounts of data when one food obscures another.

Relevance:

30.00%

Publisher:

Abstract:

With the overwhelming increase in the amount of text on the web, it is almost impossible for people to keep abreast of up-to-date information. Text mining is a process by which interesting information is derived from text through the discovery of patterns and trends. Text mining algorithms are used to guarantee the quality of extracted knowledge. However, patterns extracted using text or data mining algorithms are often noisy and inconsistent. Thus, different challenges arise, such as the question of how to understand these patterns, whether the model that has been used is suitable, and whether all the extracted patterns are relevant. Furthermore, the research raises the question of how to give a correct weight to the extracted knowledge. To address these issues, this paper presents a text post-processing method, which uses a pattern co-occurrence matrix to find the relation between extracted patterns in order to reduce noisy patterns. The main objective of this paper is not only to reduce the number of closed sequential patterns but also to improve the performance of pattern mining. Experimental results on the Reuters Corpus Volume 1 data collection and TREC filtering topics show that the proposed method is promising.
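The abstract does not say how the pattern co-occurrence matrix is built or applied, so the following is only a minimal sketch of the general idea, under stated assumptions: each document is represented by the set of patterns found in it, two patterns co-occur when they appear in the same document, and a pattern is treated as noise when it never co-occurs often enough with any other pattern. The function names, the `min_cooccurrence` threshold and the sample patterns are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(docs_patterns):
    """Count, for each unordered pair of patterns, the documents containing both.

    The sparse "matrix" is a Counter keyed by sorted (pattern_a, pattern_b) pairs.
    """
    counts = Counter()
    for patterns in docs_patterns:
        for a, b in combinations(sorted(set(patterns)), 2):
            counts[(a, b)] += 1
    return counts


def filter_isolated_patterns(docs_patterns, min_cooccurrence=2):
    """Keep patterns that co-occur with some other pattern in at least
    `min_cooccurrence` documents (an assumed noise criterion, not the paper's)."""
    kept = set()
    for (a, b), n in cooccurrence_matrix(docs_patterns).items():
        if n >= min_cooccurrence:
            kept.update((a, b))
    return kept


# Hypothetical patterns extracted from three documents.
docs = [
    {"oil price", "opec"},
    {"oil price", "opec", "barrel"},
    {"oil price", "weather"},
]
print(filter_isolated_patterns(docs))
```

In this toy input only "oil price" and "opec" appear together in two documents, so they are the only patterns retained.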

Relevance:

30.00%

Publisher:

Abstract:

Travel time is an important transport performance indicator. Different modes of transport (buses and cars) have different mechanical and operational characteristics, resulting in significantly different travel behaviours and complexities in multimodal travel time estimation on urban networks. This paper explores the relationship between bus and car travel time on urban networks by utilising empirical Bluetooth and Bus Vehicle Identification data from Brisbane. The technologies and issues behind the two datasets are studied. After cleaning the data to remove outliers, the relationship between not-in-service bus and car travel time and the relationship between in-service bus and car travel time are discussed. The travel time estimation models reveal that not-in-service bus travel time is similar to car travel time, and that in-service bus travel time could be used to estimate car travel time during off-peak hours.
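The abstract does not name the outlier rule used in the cleaning step. As an illustration only, one common choice for skewed travel-time samples is a filter based on the median absolute deviation (MAD); the sketch below assumes that technique, and the function name, the 3.5 threshold and the sample values are hypothetical rather than taken from the paper.

```python
from statistics import median


def remove_outliers_mad(travel_times, threshold=3.5):
    """Drop observations whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation of the sample.
    """
    times = list(travel_times)
    if not times:
        return []
    med = median(times)
    mad = median(abs(t - med) for t in times)
    if mad == 0:
        # No spread around the median, so nothing can be flagged reliably.
        return times
    return [t for t in times if 0.6745 * abs(t - med) / mad <= threshold]


# Hypothetical travel times in seconds; 2900 could be a vehicle that stopped en route.
observed = [310, 295, 320, 305, 2900, 300, 315]
print(remove_outliers_mad(observed))
```

A median-based rule is used here because a single very long trip inflates the mean and standard deviation, which can hide the outlier it is meant to detect.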

Relevance:

30.00%

Publisher:

Abstract:

This thesis studies the automatic discovery of text features for describing user information needs. It presents an innovative data-mining approach that discovers useful knowledge from both relevance and non-relevance feedback information. The proposed approach can substantially reduce noise in discovered patterns and significantly improve the performance of text mining systems. This study provides a promising method for research in Data Mining and Web Intelligence.

Relevance:

30.00%

Publisher:

Abstract:

Background: The Researching Effective Approaches to Cleaning in Hospitals (REACH) study will generate evidence about the effectiveness and cost-effectiveness of a novel cleaning initiative that aims to improve the environmental cleanliness of hospitals. The initiative is an environmental cleaning bundle, with five interdependent, evidence-based components (training, technique, product, audit and communication) implemented with environmental services staff to enhance hospital cleaning practices.

Methods/design: The REACH study will use a stepped-wedge randomised controlled design to test the study intervention, an environmental cleaning bundle, in 11 Australian hospitals. All trial hospitals will receive the intervention and act as their own control, with analysis undertaken of the change within each hospital based on data collected in the control and intervention periods. Each site will be randomised to one of the 11 intervention timings with staggered commencement dates in 2016 and an intervention period between 20 and 50 weeks. All sites complete the trial at the same time in 2017. The inclusion criteria allow for a purposive sample of both public and private hospitals that have higher-risk patient populations for healthcare-associated infections (HAIs). The primary outcome (objective one) is the monthly number of Staphylococcus aureus bacteraemias (SABs), Clostridium difficile infections (CDIs) and vancomycin-resistant enterococci (VRE) infections, per 10,000 bed days. Secondary outcomes for objective one include the thoroughness of hospital cleaning assessed using fluorescent marker technology, the bio-burden of frequent touch surfaces post cleaning and changes in staff knowledge and attitudes about environmental cleaning. A cost-effectiveness analysis will determine the second key outcome (objective two): the incremental cost-effectiveness ratio from implementation of the cleaning bundle. The study uses the integrated Promoting Action on Research Implementation in Health Services (iPARIHS) framework to support the tailored implementation of the environmental cleaning bundle in each hospital.

Discussion: Evidence from the REACH trial will contribute to future policy and practice guidelines about hospital environmental cleaning. It will be used by healthcare leaders and clinicians to inform decision-making and implementation of best-practice infection prevention strategies to reduce HAIs in hospitals.

Trial registration: Australia New Zealand Clinical Trial Registry ACTRN12615000325505
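The two outcome measures named in the protocol, an infection rate per 10,000 bed days and an incremental cost-effectiveness ratio (ICER), are simple ratios. The sketch below shows the arithmetic only; the function names are hypothetical and every number is invented for illustration, not drawn from the trial. The ICER is taken in its usual form, the difference in cost divided by the difference in effect between the intervention and the comparator.

```python
def rate_per_10000_bed_days(infections: int, bed_days: int) -> float:
    """Infections per 10,000 occupied bed days."""
    if bed_days <= 0:
        raise ValueError("bed_days must be positive")
    return infections * 10_000 / bed_days


def icer(cost_new: float, cost_old: float,
         effect_new: float, effect_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effects are equal")
    return (cost_new - cost_old) / delta_effect


# Illustrative figures only: 6 infections over 12,000 bed days in one month.
print(rate_per_10000_bed_days(6, 12_000))

# Illustrative figures only: the bundle costs 50,000 more and prevents 20 infections.
print(icer(cost_new=150_000, cost_old=100_000, effect_new=20, effect_old=0))
```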

Relevance:

20.00%

Publisher:

Relevance:

20.00%

Publisher:

Abstract:

Recent data indicate that levels of overweight and obesity are increasing at an alarming rate throughout the world. At a population level (and commonly to assess individual health risk), the prevalence of overweight and obesity is calculated using cut-offs of the Body Mass Index (BMI) derived from height and weight. Similarly, the BMI is also used to classify individuals and to provide a notional indication of potential health risk. It is likely that epidemiologic surveys that are reliant on BMI as a measure of adiposity will overestimate the number of individuals in the overweight (and slightly obese) categories. This tendency to misclassify individuals may be more pronounced in athletic populations or groups in which the proportion of more active individuals is higher. This differential is most pronounced in sports where it is advantageous to have a high BMI (but not necessarily high fatness). To illustrate this point we calculated the BMIs of international professional rugby players from the four teams involved in the semi-finals of the 2003 Rugby Union World Cup. According to the World Health Organisation (WHO) cut-offs for BMI, approximately 65% of the players were classified as overweight and approximately 25% as obese. These findings demonstrate that a high BMI is commonplace (and a potentially desirable attribute for sport performance) in professional rugby players. An unanswered question is what proportion of the wider population, classified as overweight (or obese) according to the BMI, is misclassified according to both fatness and health risk? It is evident that being overweight should not be an obstacle to a physically active lifestyle. Similarly, a reliance on BMI alone may misclassify a number of individuals who might otherwise have been automatically considered fat and/or unfit.
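The classification the abstract relies on can be sketched as follows. BMI is weight in kilograms divided by the square of height in metres, and the standard adult WHO cut-offs are 18.5, 25 and 30. The function names and the sample player are hypothetical; no data from the study are reproduced.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """Classify an adult BMI using the standard WHO cut-offs."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


# Hypothetical athlete: 105 kg at 1.88 m gives a BMI of about 29.7.
player_bmi = bmi(105.0, 1.88)
print(round(player_bmi, 1), who_category(player_bmi))
```

The calculation uses only height and weight, which is why a heavily muscled athlete can land in the overweight band without carrying excess fat.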

Relevance:

20.00%

Publisher:

Abstract:

In this paper, a singularly perturbed ordinary differential equation with non-smooth data is considered. The numerical method is a Petrov-Galerkin finite element method with piecewise-exponential test functions and piecewise-linear trial functions. A special technique is used at the point of discontinuity of the coefficient. The method is shown to be first-order accurate and uniformly convergent with respect to the singular perturbation parameter. Finally, numerical results are presented, which agree with the theoretical results.