Digital Library

991 results for Data-cleaning

Bluetooth and Wi-Fi MAC address based crowd data collection and monitoring : benefits, challenges and enhancement

Relevance: 20.00%

Abstract:

This paper first presents the benefits and critical challenges of using Bluetooth and Wi-Fi for crowd data collection and monitoring. The major challenges include antenna characteristics, the complexity of the environment and scanning features. Wi-Fi and Bluetooth are compared in terms of architecture, discovery time, popularity of use and signal strength. The type of antenna used and the complexity of the environment, such as trees outdoors and partitions indoors, strongly affect the scanning range. These challenges are evaluated empirically through “real” experiments using Bluetooth and Wi-Fi scanners. Issues related to antenna characteristics are also highlighted by experimenting with different antenna types. Novel scanning approaches, including the Overlapped Zones and Single Point Multi-Range detection methods, are then presented and verified by real-world tests. These techniques are applied to identify the locations of the captured MAC IDs, which can reveal more about the dynamics of people's movement.
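
The abstract does not specify how the Overlapped Zones and Single Point Multi-Range methods work internally. As a hedged illustration only, assuming that overlapping scanner coverage and nested antenna ranges are used to narrow down where a MAC ID was seen, the idea might be sketched as follows. The scanner names, zone labels and range values are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical layout: two scanners, A and B, whose detection zones partly overlap.
# The combination of scanners that saw a MAC ID identifies the zone it was in.
ZONE_BY_SCANNERS: Dict[FrozenSet[str], str] = {
    frozenset({"A"}): "A only",
    frozenset({"B"}): "B only",
    frozenset({"A", "B"}): "A and B overlap",
}


def locate_overlapped(detections: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    """Map each MAC ID to a zone, given the set of scanners that detected it."""
    return {mac: ZONE_BY_SCANNERS.get(frozenset(seen)) for mac, seen in detections.items()}


def locate_multi_range(ranges_seen_m: Set[float]) -> Optional[float]:
    """Single point, multiple ranges (assumed): several antennas with nested
    ranges at one site; the smallest range that detected the device bounds
    its distance from the scanner. Returns None if nothing detected it."""
    return min(ranges_seen_m) if ranges_seen_m else None


# Hypothetical detections.
print(locate_overlapped({"aa:bb:cc:dd:ee:01": {"A", "B"}, "aa:bb:cc:dd:ee:02": {"A"}}))
print(locate_multi_range({10.0, 30.0, 100.0}))
```

A device reported in the overlap of two zones is localised more tightly than one seen by a single scanner, which is the kind of extra movement information the abstract refers to; the paper's actual procedure may differ.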

Measurement of energy expenditure of daily tasks among mothers of young children

Relevance: 20.00%

Abstract:

There is currently some debate about whether the energy expenditure of domestic tasks is sufficient to confer health benefits. The aim of this study was therefore to measure the energy cost of five activities commonly undertaken by mothers of young children. Seven women with at least one child younger than five years of age spent 15 minutes in each of the following activities: sitting quietly, vacuum cleaning, washing windows, walking at a moderate pace (approximately 5 km/h), walking with a stroller, and grocery shopping in a supermarket. All six 'trials' were completed on the same day, in random order. A carefully calibrated portable gas analyser was used to measure oxygen uptake during each activity, and the data were converted to metabolic equivalents (METs), a unit of energy expenditure. Vacuum cleaning, washing windows and walking with and without a stroller were found to be 'moderate intensity activities' (3 to 6 METs), but supermarket shopping did not reach this criterion. The MET values for these activities were similar to those reported in the Compendium of Physical Activities (Ainsworth et al., 2000). However, the energy expenditure of walking, both with and without a stroller, was higher than reported in the Compendium. The findings suggest that some of the tasks associated with domestic caring duties are performed at an intensity sufficient to confer some health benefit. However, such benefits will accrue only if the daily duration of these activities is sufficient to meet current guidelines.
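
The abstract converts measured oxygen uptake to METs and treats 3 to 6 METs as moderate intensity. A minimal sketch of that conversion, assuming the conventional resting value of 1 MET = 3.5 mL O2 per kg per minute (the study's exact conversion is not stated) and treating values above 6 METs as vigorous, which is also an assumption:

```python
# Assumption: the conventional resting value of 1 MET = 3.5 mL O2 per kg per minute.
RESTING_VO2_ML_PER_KG_MIN = 3.5


def vo2_to_mets(vo2_l_per_min: float, body_mass_kg: float) -> float:
    """Convert absolute oxygen uptake (L/min) to METs for a given body mass."""
    relative_vo2 = vo2_l_per_min * 1000.0 / body_mass_kg  # mL O2 per kg per min
    return relative_vo2 / RESTING_VO2_ML_PER_KG_MIN


def intensity_band(mets: float) -> str:
    """Classify intensity; 3 to 6 METs is 'moderate', as in the abstract."""
    if mets < 3.0:
        return "light"
    if mets <= 6.0:
        return "moderate"
    return "vigorous"  # assumption: above 6 METs treated as vigorous


# Hypothetical reading: 0.84 L/min of oxygen for a 60 kg participant.
mets = vo2_to_mets(0.84, 60.0)
print(round(mets, 2), intensity_band(mets))
```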

Hunting and gathering : new imperatives in mapping and collecting student learning data to assure quality outcomes

Relevance: 20.00%

Abstract:

Assurance of learning (AOL) is a quality enhancement and quality assurance process used in higher education. It involves determining programme learning outcomes and standards, and systematically gathering evidence to measure students' performance against them. The systematic assessment of whole-of-programme outcomes provides a basis for curriculum development and management, continuous improvement, and accreditation. To better understand how AOL processes operate, a national study of university practices in one discipline area, business and management, was undertaken. To solicit data on AOL practice, interviews were conducted with a sample of business school representatives (n = 25). Two key processes emerged: (1) mapping of graduate attributes and (2) collection of assurance data. External drivers such as professional accreditation and government legislation were the primary reasons for undertaking AOL, but intrinsic motivators related to continuous improvement were also evident. Most universities in the study fostered academic commitment by taking an embedded approach to AOL. A sustainable and inclusive AOL process was seen to support wider stakeholder engagement in the development of higher education learning outcomes.

Mortality following hip arthroplasty—inappropriate use of National Joint Registry (NJR) data

Relevance: 20.00%

Abstract:

Mortality following hip arthroplasty is affected by a large number of confounding variables, each of which must be considered to enable valid interpretation. Relevant variables available from the 2011 NJR data set were included in a Cox model. Mortality rates in hip arthroplasty patients were lower than in the age-matched population across all hip types. Age at surgery, ASA grade, diagnosis, gender, provider type, hip type and lead surgeon grade all had a significant effect on mortality. Schemper's statistic showed that only 18.98% of the variation in mortality was explained by the variables available in the NJR data set. It is inappropriate to use NJR data to study an outcome affected by a multitude of confounding variables when these cannot be adequately accounted for in the available data set.

Methodology for developing real-time motorway traffic risk identification models using individual-vehicle data

Relevance: 20.00%

Abstract:

Most existing motorway traffic safety studies that use disaggregate traffic flow data aim to develop models that identify real-time traffic risks by comparing pre-crash and non-crash conditions. One serious shortcoming of those studies is that non-crash conditions are selected arbitrarily and are hence not representative: the selected non-crash data might not be comparable with the pre-crash data, and the non-crash/pre-crash ratio is chosen arbitrarily, neglecting the abundance of non-crash over pre-crash conditions, among other problems. Here, we present a methodology for developing a real-time MotorwaY Traffic Risk Identification Model (MyTRIM) using individual-vehicle data, meteorological data and crash data. Non-crash data are clustered into groups called traffic regimes. Pre-crash data are then classified into these regimes to match them with the relevant non-crash data. Of the eight traffic regimes obtained, four highly risky regimes were identified, and three regime-based Risk Identification Models (RIM) were developed for the regimes with sufficient pre-crash data. MyTRIM memorizes the latest risk evolution identified by RIM to predict near-future risks. Traffic practitioners can choose MyTRIM's memory size based on the trade-off between detection and false alarm rates: decreasing the memory size from 5 to 1 increases the detection rate from 65.0% to 100.0% and the false alarm rate from 0.21% to 3.68%. Moreover, critical factors in differentiating pre-crash and non-crash conditions are identified and can be used to develop preventive measures. MyTRIM can be used by practitioners in real time as an independent tool for online decision-making or integrated with existing traffic management systems.
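
The abstract reports that a smaller memory size raises both the detection rate and the false alarm rate, but it does not give MyTRIM's decision rule. A hedged sketch of one rule with that behaviour, assuming (hypothetically) that an alarm is raised only when every one of the latest `memory_size` RIM identifications is risky:

```python
from collections import deque
from typing import Deque, Iterable, List


class MemoryAlarm:
    """Hypothetical rule: alarm only when each of the latest `memory_size`
    risk identifications from a RIM is positive."""

    def __init__(self, memory_size: int) -> None:
        if memory_size < 1:
            raise ValueError("memory_size must be at least 1")
        self.memory: Deque[bool] = deque(maxlen=memory_size)

    def update(self, risky: bool) -> bool:
        self.memory.append(risky)  # the oldest identification drops out automatically
        return len(self.memory) == self.memory.maxlen and all(self.memory)


def run_alarm(flags: Iterable[bool], memory_size: int) -> List[bool]:
    """Feed a stream of RIM outputs through the rule and collect alarm decisions."""
    alarm = MemoryAlarm(memory_size)
    return [alarm.update(flag) for flag in flags]


flags = [True, False, True, True, True]  # hypothetical RIM outputs over time
print(run_alarm(flags, memory_size=1))
print(run_alarm(flags, memory_size=3))
```

With a memory of 1, every risky identification triggers an alarm (more detections, more false alarms); a longer memory waits for a sustained risky pattern. This only mirrors the reported trade-off qualitatively and is not presented as the paper's algorithm.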

The unfinished revolution : what is missing from the E&P industry’s move to “big data”

Relevance: 20.00%

Abstract:

One cannot help but be impressed by the inroads that digital oilfield technologies have made into the exploration and production (E&P) industry in the past decade. Today’s production systems can be monitored by “smart” sensors that allow engineers to observe almost any aspect of performance in real time. Our understanding of how reservoirs are behaving has improved considerably since the dawn of this revolution, and the industry has been able to move away from point answers to more holistic “big picture” integrated solutions. Indeed, the industry has already reaped the rewards of many of these kinds of investments. Many billions of dollars of value have been delivered by this heightened awareness of what is going on within our assets and the world around them (Van Den Berg et al. 2010).

Integrating the social and environmental determinants of physical activity : linking CATI and GIS data

Relevance: 20.00%

Noisy Bluetooth traffic data?

Relevance: 20.00%

Abstract:

Traffic state estimation in urban road networks remains a challenge for traffic models, and how such a network performs remains a difficult question for traffic operators to answer. A lack of detailed traffic information has long restricted research in this area. The introduction of Bluetooth into the automotive world presented an alternative that has now developed to a stage where large-scale test-beds are becoming available for traffic monitoring and model validation. But how much confidence should we have in such data? This paper gives an overview of the use of Bluetooth, primarily for the city-scale management of urban transport networks, and encourages researchers and practitioners to take a more cautious look at what is currently regarded as a mature technology for monitoring travellers in urban environments. We argue that the full value of this technology has yet to be realised, because the accuracy issues peculiar to the data have still to be adequately resolved.

Using the Gini coefficient with BIOLOG substrate utilisation data to provide an alternative quantitative measure for comparing bacterial soil communities

Relevance: 20.00%

Abstract:

A measure quantifying unequal use of carbon sources, the Gini coefficient (G), has been developed to allow comparisons of the observed functional diversity of bacterial soil communities. This approach was applied to the analysis of substrate utilisation data obtained from BIOLOG microtiter plates in a study that compared the decomposition of two contrasting plant substrates in two different soils. The relevance of the Gini coefficient as a measure of observed functional diversity for soil bacterial communities is evaluated against the Shannon index (H) and average well colour development (AWCD), a measure of total microbial activity. Correlation analysis and analysis of variance of the experimental data show that the Gini coefficient, the Shannon index and AWCD provided similar information when used in isolation. However, analyses based on the Gini coefficient and the Shannon index, when total activity on the microtiter plates was held constant (i.e. with AWCD as a covariate), indicate that additional information about the distribution of the carbon sources being utilised can be obtained. We demonstrate that the Lorenz curve and its measure of inequality, the Gini coefficient, not only provide information comparable to AWCD and the Shannon index but, when used together with AWCD, encompass measures of both total microbial activity and absorbance inequality across all the carbon sources. This information is especially relevant for comparing the observed functional diversity of soil microbial communities.
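
The three measures compared in the abstract can be computed from a plate's well absorbances. A minimal sketch, assuming the readings are already blank-corrected with negative values set to zero, and using the standard discrete Gini formula over absorbances sorted in ascending order (the paper's exact estimator is not stated in the abstract):

```python
import math
from typing import List, Sequence


def clip_negative(absorbances: Sequence[float]) -> List[float]:
    """Assumption: blank-corrected readings below zero are set to zero."""
    return [max(a, 0.0) for a in absorbances]


def awcd(absorbances: Sequence[float]) -> float:
    """Average well colour development: mean absorbance over the substrate wells."""
    xs = clip_negative(absorbances)
    return sum(xs) / len(xs)


def shannon_index(absorbances: Sequence[float]) -> float:
    """H = -sum(p_i * ln p_i), where p_i is each well's share of total absorbance."""
    xs = clip_negative(absorbances)
    total = sum(xs)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in xs if x > 0)


def gini_coefficient(absorbances: Sequence[float]) -> float:
    """Discrete Gini coefficient from the Lorenz curve of sorted absorbances.

    0 means every carbon source is used equally; the value rises towards
    (n - 1) / n as use concentrates on a single source.
    """
    xs = sorted(clip_negative(absorbances))
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    rank_weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * rank_weighted / (n * total) - (n + 1) / n


# Two hypothetical plates with identical total activity (same AWCD)
# but very different inequality of carbon-source use.
even = [0.5, 0.5, 0.5, 0.5]
skewed = [0.0, 0.0, 0.0, 2.0]
for plate in (even, skewed):
    print(awcd(plate), round(shannon_index(plate), 3), gini_coefficient(plate))
```

The two plates share the same AWCD, yet their Gini coefficients and Shannon indices differ, which illustrates why holding AWCD constant can expose extra information about how carbon sources are used.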

Non-linear principal components analysis : an alternative method for finding patterns in environmental data

Relevance: 20.00%

Abstract:

The main purpose of this article is to gain insight into the relationships between variables describing the environmental conditions of the Far Northern section of the Great Barrier Reef, Australia. Several of these variables had different measurement levels, and their relationships were often non-linear. Non-linear principal components analysis made it possible to gain insight into these relationships. Furthermore, three geographical areas with unique environmental characteristics could be identified.

Connecting people to their resource consumption through real-time data visualisations

Relevance: 20.00%

Abstract:

Combining human-computer interaction and urban informatics, this design research developed and tested novel interfaces that give users real-time feedback on their paper and energy consumption. Findings from deploying these interfaces in both domestic and office environments in Australia, the UK and Ireland will inform future generations of resource monitoring technologies. The study draws conclusions with implications for government policy, the energy industry and sustainability researchers.

eHealth-as-a-Service (eHaaS) : a data-driven decision making approach in Australian context

Relevance: 20.00%

Abstract:

A commitment in 2010 by the Australian Federal Government to spend $466.7 million on the implementation of personally controlled electronic health records (PCEHR) heralded a shift to a more effective and safer patient-centric eHealth system. However, deployment of the PCEHR has met with much criticism, underlined by poor adoption rates over the first 12 months of operation. The indifferent response of the public and of healthcare providers, largely sceptical of its utility and safety, speaks to the complex sociotechnical drivers and obstacles inherent in embedding large (national) scale eHealth projects. With government efforts to inflate consumer and practitioner engagement numbers giving rise to further consumer disillusionment, the broader utilitarian opportunities available with the PCEHR are at risk. This paper discusses the implications of establishing the PCEHR as the cornerstone of a holistic eHealth strategy for the aggregation of longitudinal patient information. A viewpoint is offered that the real value in patient data lies not just in the collection of data but in the integration of this information into clinical processes within the framework of a commoditised data-driven approach. The eHealth-as-a-Service (eHaaS) construct is considered as a disruptive next step for co-ordinated, individualised healthcare in the Australian context.

The analysis of large scale data taken from the world groundnut (Arachis hypogaea L.) germplasm collection I. Two-way quantitative data

Relevance: 20.00%

Abstract:

Data associated with germplasm collections are typically large and multivariate, with a considerable number of descriptors measured on each of many accessions. Pattern analysis methods of clustering and ordination have been identified as techniques for statistically evaluating the available diversity in germplasm data. Although used in many studies, these approaches have not dealt explicitly with the computational consequences of large data sets (i.e. more than 5000 accessions). To consider the application of these techniques to germplasm evaluation data, 11328 accessions of groundnut (Arachis hypogaea L.) from the International Research Institute for the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the rainy and post-rainy growing seasons were used. The ordination technique of principal component analysis was used to reduce the dimensionality of the germplasm data. Identifying phenotypically similar groups of accessions in data of this scale with computationally intensive hierarchical clustering techniques was not feasible, so non-hierarchical techniques had to be used. Finite mixture models, which maximise the likelihood of an accession belonging to a cluster, were used to cluster the accessions in this collection. The patterns of response for the different growing seasons were found to be highly correlated. However, when the results were related to passport and other characterisation and evaluation descriptors, the observed patterns did not appear to be related to taxonomy or to any other well-known characteristics of groundnut.
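
The dimensionality-reduction step described above can be sketched with a singular value decomposition. This is a minimal sketch under stated assumptions: descriptors are standardised first (correlation-matrix PCA, suited to descriptors in different units), the data are random stand-ins rather than the groundnut collection, and missing values, constant columns and the subsequent finite mixture clustering step are not handled.

```python
import numpy as np


def pca_scores(data: np.ndarray, n_components: int):
    """Return principal component scores and explained-variance ratios.

    Columns are standardised first (an assumption); rows are accessions,
    columns are quantitative descriptors.
    """
    centred = data - data.mean(axis=0)
    scaled = centred / centred.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(scaled, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = scaled @ vt[:n_components].T
    return scores, explained[:n_components]


# Hypothetical stand-in: 1000 accessions x 9 quantitative descriptors.
rng = np.random.default_rng(seed=0)
descriptors = rng.normal(size=(1000, 9))
scores, ratios = pca_scores(descriptors, n_components=2)
print(scores.shape, ratios)
```

Because the SVD works on the data matrix directly, its cost grows linearly with the number of accessions, which is why ordination remains practical at this scale where agglomerative hierarchical clustering, with its pairwise distance matrix, does not.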

The analysis of large scale data taken from the world groundnut (Arachis hypogaea L.) germplasm collection. II. Two-way data with mixed data types

Relevance: 20.00%

Abstract:

As a sequel to a paper that dealt with the analysis of two-way quantitative data in large germplasm collections, this paper presents analytical methods appropriate for two-way data matrices consisting of mixed data types, namely, ordered multicategory and quantitative data types. While various pattern analysis techniques have been identified as suitable for analysing the mixed data types that occur in germplasm collections, the clustering and ordination methods used often cannot deal explicitly with the computational consequences of large data sets (i.e. more than 5000 accessions) with incomplete information. However, it is shown that the ordination technique of principal component analysis and the mixture maximum likelihood method of clustering can be employed to achieve such analyses. Germplasm evaluation data for 11436 accessions of groundnut (Arachis hypogaea L.) from the International Research Institute of the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the post-rainy season and five ordered multicategory descriptors were used. Pattern analysis results generally indicated that the accessions could be distinguished into four regions along the continuum of growth habit (or plant erectness). Interpretation of accession membership in these regions was found to be consistent with taxonomic information, such as subspecies. Each growth habit region contained accessions from three of the most common groundnut botanical varieties. This implies that within each of the habit types there is the full range of expression for the other descriptors used in the analysis. Using these types of insights, the patterns of variability in germplasm collections can provide scientists with valuable information for their plant improvement programs.

Taxonomic resolution and quantification of freshwater macroinvertebrate samples from an Australian dryland river : The benefits and costs of using species abundance data

Relevance: 20.00%

Abstract:

In studies using macroinvertebrates as indicators for monitoring rivers and streams, species-level identifications can have greater information content than lower-resolution identifications and can result in more reliable site classifications and a better capacity to discriminate between sites, yet many such programmes identify specimens to family rather than species. This is often because family-level data are cheaper to obtain than species-level data. Choice of an appropriate taxonomic resolution is a compromise between the cost of obtaining data at high taxonomic resolutions and the loss of information at lower resolutions. Optimum taxonomic resolution should be determined by the information required to address programme objectives. Costs saved by identifying macroinvertebrates to family level may not be justified if family-level data cannot give the answers required, and the extra cost of obtaining species-level data may not be warranted if cheaper family-level data retain sufficient information to meet objectives. We investigated the influence of taxonomic resolution and sample quantification (abundance vs. presence/absence) on the representation of aquatic macroinvertebrate species assemblage patterns and species richness estimates. The study was conducted in a physically harsh dryland river system (the Condamine-Balonne River system, located in south-western Queensland, Australia), characterised by low macroinvertebrate diversity. Our 29 study sites covered a wide geographic range and a diversity of lotic conditions, and this was reflected in differences between sites in macroinvertebrate assemblage composition and richness. The usefulness of expending the extra cost necessary to identify macroinvertebrates to species was quantified via the benefits this higher-resolution data offered in its capacity to discriminate between sites and give accurate estimates of site species richness.
We found that very little information (<6%) was lost by identifying taxa to family (or genus), as opposed to species, and that quantifying the abundance of taxa provided greater resolution for pattern interpretation than simply noting their presence/absence. Species richness was very well represented by genus, family and order richness, so each of these could be used as a surrogate for species richness if, for example, surveying to identify diversity hot-spots. It is suggested that the sharing of common ecological responses among species within higher taxonomic units is the most plausible mechanism for the results. Based on a cost/benefit analysis, family-level abundance data are recommended as the best resolution for resolving patterns in macroinvertebrate assemblages in this system. The relevance of these findings is discussed in the context of other low-diversity, harsh, dryland river systems.
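
The comparison above rests on three simple transformations of a site's species counts: aggregating to a coarser taxonomic level, reducing abundance to presence/absence, and counting richness. A minimal sketch, in which the species names, family key and counts are all hypothetical:

```python
from collections import Counter
from typing import Dict

# Hypothetical species-to-family key, for illustration only.
FAMILY_OF: Dict[str, str] = {
    "species_1": "family_A",
    "species_2": "family_A",
    "species_3": "family_B",
    "species_4": "family_C",
}


def aggregate_to_family(abundance: Dict[str, int], family_of: Dict[str, str]) -> Dict[str, int]:
    """Sum species abundances within each family (coarser taxonomic resolution)."""
    totals: Counter = Counter()
    for species, count in abundance.items():
        totals[family_of[species]] += count
    return dict(totals)


def presence_absence(abundance: Dict[str, int]) -> Dict[str, int]:
    """Reduce abundances to 1 (present) or 0 (absent)."""
    return {taxon: int(count > 0) for taxon, count in abundance.items()}


def richness(abundance: Dict[str, int]) -> int:
    """Number of taxa with non-zero abundance."""
    return sum(1 for count in abundance.values() if count > 0)


site = {"species_1": 5, "species_2": 0, "species_3": 2, "species_4": 7}  # hypothetical sample
families = aggregate_to_family(site, FAMILY_OF)
print(families, presence_absence(families), richness(site), richness(families))
```

Site-by-taxon tables built this way at each resolution can then be fed to the same ordination or classification, so that any loss of information from coarser identification or from presence/absence coding can be compared directly.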
