802 results for Data stream mining


Relevance:

30.00%

Publisher:

Abstract:

The study of the relationship between macroscopic traffic parameters, such as flow, speed and travel time, is essential to the understanding of the behaviour of freeway and arterial roads. However, the temporal dynamics of these parameters are difficult to model, especially for arterial roads, where the process of traffic change is driven by a variety of variables. The introduction of Bluetooth technology into the transportation area has proven exceptionally useful for monitoring vehicular traffic, as it allows reliable estimation of travel times and traffic demands. In this work, we propose an approach based on Bayesian networks for analyzing and predicting the complex dynamics of flow or volume, based on travel time observations from Bluetooth sensors. The spatio-temporal relationship between volume and travel time is captured through a first-order transition model and a univariate Gaussian sensor model. The two models are trained and tested on travel time and volume data from an arterial link, collected over a period of six days. To reduce the computational costs of the inference tasks, volume is converted into a discrete variable; the discretization is carried out with a Self-Organizing Map. Preliminary results show that a simple Bayesian network can effectively estimate and predict the complex temporal dynamics of arterial volumes from travel time data. Not only is the model well suited to producing posterior distributions over single past, current and future states, but it also allows computing joint distributions over sequences of states. Furthermore, the Bayesian network can achieve excellent prediction even when the stream of travel time observations is partially incomplete.
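As a concrete illustration of the inference described above, here is a minimal sketch of forward filtering in such a model: volume is discretised into a few states (simple binning stands in for the paper's Self-Organizing Map), a first-order transition matrix links consecutive states, and travel time is a univariate Gaussian per volume state. All parameter values are invented placeholders, not the paper's fitted values.

```python
import numpy as np

K = 3                                  # discrete volume states (low/med/high)
T = np.array([[0.8, 0.2, 0.0],         # first-order transition model P(v_t | v_{t-1})
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])
mu = np.array([60.0, 90.0, 140.0])     # mean travel time (s) per volume state
sigma = np.array([10.0, 15.0, 25.0])   # std dev of travel time per state

def gaussian_likelihood(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def forward_filter(observations, prior):
    """Posterior P(v_t | y_1..t); None marks a missing Bluetooth observation."""
    belief = prior.copy()
    for y in observations:
        belief = T.T @ belief                      # predict one step ahead
        if y is not None:                          # update only when observed
            belief *= gaussian_likelihood(y, mu, sigma)
            belief /= belief.sum()
    return belief

# A partially incomplete travel-time stream (seconds); gaps are tolerated.
print(forward_filter([65.0, None, 95.0, 150.0], np.full(K, 1.0 / K)))
```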

Relevance:

30.00%

Publisher:

Abstract:

Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a 'noisy' environment such as contemporary social media, is to collect the pertinent information: information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or signals which give an advantage to those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing with the passage of time, and both collection and analytic methodologies need to be continually adapted to respond to this changing information. While many of the data sets collected and analyzed are preformed, that is, built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders. The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting those which need to be immediately placed in front of responders for manual filtering and possible action. The paper suggests two solutions for this: content analysis and user profiling. In the former case, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand and the urgency of the information, whilst the latter attempts to identify those users who are either serving as amplifiers of information or are known as an authoritative source. Through these techniques, the information contained in a large dataset can be filtered down to match the expected capacity of emergency responders, and knowledge of the core keywords or hashtags relating to the current event is constantly refined for future data collection. The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to a particular prediction market: tennis betting. As increasing numbers of professional sportsmen and sportswomen create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form and emotions which has the potential to impact on future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, like American Football (NFL) and Baseball (MLB), has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services remain niche operations, much of the value of the information is lost by the time it reaches one of them. The paper thus considers how information could be filtered from Twitter user lists and hashtag or keyword monitoring, assessing the value of the source, the information, and the prediction markets to which it may relate. The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect the tweets and make them available for exploration and analysis.
A strategy to respond to changes in the social movement is also required, or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list was created; in a way, keyword creation is part strategy and part art. In this paper we describe strategies for the creation of a social media archive, specifically of tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement's presence on Twitter changes over time. We also discuss the opportunities and methods for extracting smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study. The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid future data collection. The intention is that, through the papers presented and the subsequent discussion, the panel will inform the wider research community not only on the objectives and limitations of data collection, live analytics, and filtering, but also on current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.
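As a rough illustration of the first paper's triage idea, the sketch below scores incoming tweets by keyword content and author profile and keeps only as many as responders can handle. The keyword weights, handle list and scoring formula are invented assumptions for illustration, not the panel's actual mechanisms.

```python
URGENT_TERMS = {"trapped": 3.0, "injured": 3.0, "help": 2.0, "flood": 1.5}
AUTHORITATIVE_USERS = {"qldpolice", "bom_au"}      # hypothetical handles

def score_tweet(text: str, author: str, follower_count: int) -> float:
    # Content analysis: weight by urgent keywords present in the tweet.
    content = sum(w for term, w in URGENT_TERMS.items() if term in text.lower())
    # User profiling: known authorities score highest; amplifiers score by reach.
    profile = 5.0 if author in AUTHORITATIVE_USERS else min(follower_count / 1e5, 2.0)
    return content + profile

def triage(tweets: list[dict], capacity: int) -> list[dict]:
    """Filter a large stream down to the expected capacity of responders."""
    ranked = sorted(tweets,
                    key=lambda t: score_tweet(t["text"], t["author"], t["followers"]),
                    reverse=True)
    return ranked[:capacity]

stream = [
    {"text": "People trapped by flood water near the bridge", "author": "resident1", "followers": 120},
    {"text": "Nice sunset tonight", "author": "resident2", "followers": 80},
]
print(triage(stream, capacity=1))
```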

Relevance:

30.00%

Publisher:

Abstract:

Modern health information systems can generate several exabytes of patient data, the so-called "Health Big Data", per year. Many health managers and experts believe that, with these data, it is possible to discover useful knowledge easily to improve health policies, increase patient safety and eliminate redundancies and unnecessary costs. The objective of this paper is to discuss the characteristics of Health Big Data as well as the challenges and solutions for health Big Data Analytics (BDA), the process of extracting knowledge from sets of Health Big Data, and to design and evaluate a pipelined framework for use as a guideline/reference in health BDA.
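A minimal sketch of what a pipelined BDA framework of this kind might look like, with generic acquire/clean/analyse stages chained together; the stage names and toy records are assumptions for illustration, not the paper's framework.

```python
from typing import Callable, Iterable

Stage = Callable[[Iterable[dict]], Iterable[dict]]

def acquire(_: Iterable[dict]) -> Iterable[dict]:
    # Stand-in for pulling records from a health information system.
    yield {"patient": "p1", "readmitted": True, "cost": 1200.0}
    yield {"patient": "p2", "readmitted": False, "cost": None}

def clean(records: Iterable[dict]) -> Iterable[dict]:
    for r in records:
        if r["cost"] is not None:       # drop incomplete records
            yield r

def analyse(records: Iterable[dict]) -> Iterable[dict]:
    for r in records:
        r["high_cost"] = r["cost"] > 1000.0   # toy knowledge-extraction step
        yield r

def run_pipeline(stages: list[Stage]) -> list[dict]:
    """Chain stages so each consumes the previous stage's output."""
    data: Iterable[dict] = []
    for stage in stages:
        data = stage(data)
    return list(data)

print(run_pipeline([acquire, clean, analyse]))
```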

Relevance:

30.00%

Publisher:

Abstract:

Road networks are a national critical infrastructure. Road assets need to be monitored and maintained efficiently as their condition deteriorates over time. The condition of one such asset, the road pavement, plays a major role in road network maintenance programmes. Pavement condition depends upon many factors, such as pavement type, traffic and environmental conditions. This paper presents a data analytics case study for assessing the factors affecting the pavement deflection values measured by the traffic speed deflectometer (TSD) device. The analytics process includes the acquisition and integration of data from multiple sources, data pre-processing, mining useful information from the data, and utilising the data mining outputs for knowledge deployment. Data mining techniques are able to show how TSD outputs vary across different roads and under different traffic and environmental conditions. The generated data mining models map the TSD outputs to classes and define a correction factor for each class.
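The final knowledge-deployment step might look like the sketch below: readings are mapped to condition classes and a per-class correction factor is applied. The class boundaries and correction factors are invented placeholders; in the paper they are derived from the mined road, traffic and environment data.

```python
import pandas as pd

# Assumed per-class correction factors (illustrative only).
CORRECTION = {"rigid": 0.90, "flexible": 1.00, "degraded": 1.15}

def classify(row: pd.Series) -> str:
    # Toy class rules standing in for the paper's mined models.
    if row["pavement_type"] == "concrete":
        return "rigid"
    return "degraded" if row["deflection_um"] > 400 else "flexible"

readings = pd.DataFrame({
    "deflection_um": [250, 480, 310],              # TSD deflection (micrometres)
    "pavement_type": ["concrete", "asphalt", "asphalt"],
})
readings["cls"] = readings.apply(classify, axis=1)
readings["corrected"] = readings["deflection_um"] * readings["cls"].map(CORRECTION)
print(readings)
```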

Relevance:

30.00%

Publisher:

Abstract:

Term-based approaches can extract many features from text documents, but most of these features are noisy. Many popular text-mining strategies have been adapted to reduce noisy information among extracted features; however, text-mining techniques still suffer from the low-frequency problem. The key issue is how to discover relevance features in text documents that fulfil user information needs. To address this issue, we propose a new method to extract specific features from user relevance feedback. The proposed approach includes two stages. The first stage extracts topics (or patterns) from text documents to focus on interesting topics. In the second stage, topics are deployed to lower-level terms to address the low-frequency problem and find specific terms. The specific terms are determined based on their appearances in relevance feedback and their distribution in topics or high-level patterns. We test our proposed method with extensive experiments on the Reuters Corpus Volume 1 dataset and TREC topics. Results show that our proposed approach significantly outperforms state-of-the-art models.
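A toy sketch of the stage-two weighting idea under stated assumptions: terms that appear often in relevance feedback but are concentrated in few topics score as more specific. The scoring formula is a plausible stand-in, not the paper's exact weighting.

```python
from collections import Counter

relevant_docs = [["stream", "mining", "drift"], ["stream", "window", "drift"]]
topics = [{"stream", "drift"}, {"window", "mining"}]     # stage-1 output (assumed)

def term_scores(docs: list[list[str]], topics: list[set[str]]) -> dict[str, float]:
    freq = Counter(t for d in docs for t in d)            # appearances in feedback
    scores = {}
    for term, f in freq.items():
        spread = sum(term in topic for topic in topics)   # distribution over topics
        scores[term] = f / spread if spread else 0.0      # specific terms score high
    return scores

print(sorted(term_scores(relevant_docs, topics).items(), key=lambda kv: -kv[1]))
```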

Relevance:

30.00%

Publisher:

Abstract:

We study the multicast stream authentication problem when an opponent can drop, reorder and introduce data packets into the communication channel. In such a model, packet overhead and computing efficiency are two parameters to be taken into account when designing a multicast stream protocol. In this paper, we propose to use two families of erasure codes to deal with this problem, namely, rateless codes and maximum distance separable codes. Our constructions have the following advantages. First, the packet overhead is small. Second, the number of signature verifications to be performed at the receiver is O(1). Third, every receiver is able to recover all the original data packets emitted by the sender despite losses and injections occurring during transmission.
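To illustrate the maximum-distance-separable branch of the construction, the sketch below uses a Reed-Solomon code via the third-party reedsolo package (an assumption; the paper does not prescribe a library). A single signed block is spread over a codeword, so the receiver verifies one signature yet still recovers the data despite a few corrupted symbols.

```python
from reedsolo import RSCodec   # pip install reedsolo (assumed available)

rsc = RSCodec(10)                       # tolerate up to 10 erased bytes (5 errors)
block = b"payload||signature"           # signed once at the sender
codeword = rsc.encode(block)

# Simulate an adversarial channel: corrupt a few symbols in transit.
received = bytearray(codeword)
received[0] ^= 0xFF
received[5] ^= 0xFF

decoded, _, _ = rsc.decode(bytes(received))
assert decoded == block                 # receiver recovers all original data
print("recovered:", bytes(decoded))
```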

Relevance:

30.00%

Publisher:

Abstract:

The practices and public reputation of mining have been changing over time. In the past, mining operations frequently stood accused of being socially and environmentally disruptive, whereas mining today invests heavily in ‘socially responsible’ and ‘sustainable’ business practices. Changes such as these can be witnessed internationally as well as in places like Western Australia (WA), where the mining sector has matured into an economic pillar of the state, and indeed the nation in the context of the recent resources boom. This paper explores the role of mining in WA, presenting a multi-disciplinary perspective on the sector's contribution to sustainable development in the state. The perspectives offered here are drawn from community-based research and the associated academic literature as well as data derived from government sources and the not-for-profit sector. Findings suggest that despite noteworthy attitudinal and operational improvements in the industry, social, economic and environmental problem areas remain. As mining in WA is expected to grow in the years to come, these problem areas require the attention of business and government alike to ensure the long-term sustainability of development as well as people and place.

Relevance:

30.00%

Publisher:

Abstract:

This paper draws upon Hubbard's (1999, p. 57) term 'scary heterosexualities', that is, non-normative heterosexuality, in the context of the rural, drawing on data from fieldwork in the remote Western Australian mining town of Kalgoorlie. Our focus is 'the skimpie', a female barmaid who serves in her underwear and who, in both historical and contemporary times, is strongly associated with rural mining communities. Interviews with skimpies and local residents, as well as participant observation, reveal how potential fears and anxieties about skimpies are managed. We identify the discursive and spatial processes by which skimpie work is contained in Kalgoorlie, so that the potential scariness 'the skimpie' represents to the rural is muted and buttressed in terms of a more conventional and less threatening rural heterosexuality.

Relevance:

30.00%

Publisher:

Abstract:

A key derivation function (KDF) is a function that transforms secret non-uniformly random source material, together with some public strings, into one or more cryptographic keys. These cryptographic keys are used with a cryptographic algorithm for protecting electronic data during both transmission over insecure channels and storage. In this thesis, we propose a new method for constructing a generic stream-cipher-based key derivation function. We show that our proposed key derivation function based on stream ciphers is secure if the underlying stream cipher is secure. We simulate instances of this stream-cipher-based key derivation function using three eSTREAM finalists: Trivium, Sosemanuk and Rabbit. The simulation results show that these stream-cipher-based key derivation functions offer efficiency advantages over the more commonly used key derivation functions based on block ciphers and hash functions.
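A minimal sketch of the general construction, with ChaCha20 (from the cryptography package) standing in for the eSTREAM finalists, since Trivium, Sosemanuk and Rabbit are not in common Python libraries; the SHA-256 extract step is likewise an assumption for illustration, not necessarily the thesis design.

```python
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

def stream_kdf(secret: bytes, salt: bytes, n_keys: int, key_len: int = 32) -> list[bytes]:
    # Extract: condense the non-uniform source material into a cipher key.
    k = hashlib.sha256(secret + salt).digest()
    # Expand: run the stream cipher on zeros to obtain its raw keystream.
    nonce = b"\x00" * 16                                 # fixed nonce: one-shot use only
    enc = Cipher(algorithms.ChaCha20(k, nonce), mode=None).encryptor()
    keystream = enc.update(b"\x00" * (n_keys * key_len))
    # Slice the keystream into the requested cryptographic keys.
    return [keystream[i * key_len:(i + 1) * key_len] for i in range(n_keys)]

enc_key, mac_key = stream_kdf(b"shared-secret", b"session-salt", n_keys=2)
print(enc_key.hex(), mac_key.hex(), sep="\n")
```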

Relevance:

30.00%

Publisher:

Abstract:

This study was a step toward improving the performance of discovering useful knowledge, association rules in particular, in databases. The thesis proposed an approach that uses granules instead of patterns to represent knowledge implicitly contained in relational databases, and a multi-tier structure to interpret association rules in terms of granules. Association mappings were proposed for the construction of the multi-tier structure. With these tools, association rules can be quickly assessed, and meaningless association rules can be identified and justified as such using the association mappings. The experimental results indicated that the proposed approach is promising.
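A toy sketch of the granule idea under stated assumptions: rows of a relational table are grouped into granules (equivalence classes over attribute subsets), and an association mapping links each coarse-tier granule to the finer-tier granules that refine it. The attribute names and data are invented for illustration.

```python
from collections import defaultdict

rows = [
    {"age": "young", "income": "low",  "buys": "yes"},
    {"age": "young", "income": "low",  "buys": "no"},
    {"age": "old",   "income": "high", "buys": "yes"},
]

def granulate(rows: list[dict], attrs: list[str]) -> dict[tuple, set[int]]:
    """Map each attribute-value combination (a granule) to its row indices."""
    granules = defaultdict(set)
    for i, r in enumerate(rows):
        granules[tuple(r[a] for a in attrs)].add(i)
    return granules

tier1 = granulate(rows, ["age"])                 # coarse tier
tier2 = granulate(rows, ["age", "income"])       # finer tier
# An association mapping: which tier-2 granules refine each tier-1 granule.
mapping = {g1: [g2 for g2 in tier2 if g2[0] == g1[0]] for g1 in tier1}
print(mapping)
```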

Relevance:

30.00%

Publisher:

Abstract:

A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovementioned variations. We use affine subspaces not only to represent the object, but also the candidate areas that the object may occupy. We furthermore propose a novel approach to measure affine subspace-to-subspace distance via the use of non-Euclidean geometry of Grassmann manifolds. The tracking problem is then considered as an inference task in a Markov Chain Monte Carlo framework via particle filtering. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILtrack.
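The subspace-to-subspace distance can be sketched as follows: given orthonormal bases for two subspaces, the principal angles come from the singular values of their inner-product matrix, and the geodesic distance on the Grassmann manifold is the norm of those angles. Basis dimensions below are illustrative stand-ins for bases built from several object images.

```python
import numpy as np

def grassmann_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Geodesic distance between subspaces spanned by orthonormal columns."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)   # cosines of principal angles
    theta = np.arccos(np.clip(s, -1.0, 1.0))       # principal angles
    return float(np.linalg.norm(theta))

rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((100, 3)))   # object appearance basis
B, _ = np.linalg.qr(rng.standard_normal((100, 3)))   # candidate-area basis
print(grassmann_distance(A, B))
```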

Relevance:

30.00%

Publisher:

Abstract:

1. Biodiversity, water quality and ecosystem processes in streams are known to be influenced by the terrestrial landscape over a range of spatial and temporal scales. Lumped attributes (i.e. per cent land use) are often used to characterise the condition of the catchment; however, they are not spatially explicit and do not account for the disproportionate influence of land located near the stream or connected by overland flow.

2. We compared seven landscape representation metrics to determine whether accounting for the spatial proximity and hydrological effects of land use can account for additional variability in indicators of stream ecosystem health. The landscape metrics included a lumped metric, four inverse-distance-weighted (IDW) metrics based on distance to the stream or survey site, and two modified IDW metrics that also accounted for the level of hydrologic activity (HA-IDW). Ecosystem health data were obtained from the Ecological Health Monitoring Programme in Southeast Queensland, Australia, and included measures of fish, invertebrates, physicochemistry and nutrients collected during two seasons over 4 years. Linear models were fitted to the stream indicators and landscape metrics, by season, and compared using an information-theoretic approach.

3. Although no single metric was most suitable for modelling all stream indicators, lumped metrics rarely performed as well as the other metric types. Metrics based on proximity to the stream (IDW and HA-IDW) were more suitable for modelling fish indicators, while the HA-IDW metric based on proximity to the survey site generally outperformed the others for invertebrates, irrespective of season. There was consistent support for metrics based on proximity to the survey site (IDW or HA-IDW) for all physicochemical indicators during the dry season, while a HA-IDW metric based on proximity to the stream was suitable for five of the six physicochemical indicators in the post-wet season. Only one nutrient indicator was tested, and results showed that catchment area had a significant effect on the relationship between land use metrics and algal stable isotope ratios in both seasons.

4. Spatially explicit methods of landscape representation can clearly improve the predictive ability of many empirical models currently used to study the relationship between landscape, habitat and stream condition. A comparison of different metrics may provide clues about causal pathways and mechanistic processes behind correlative relationships and could be used to target restoration efforts strategically.
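A small sketch contrasting a lumped metric with an inverse-distance-weighted one; the parcel data and decay exponent are invented assumptions, and a HA-IDW variant would further scale each weight by the parcel's hydrologic activity.

```python
import numpy as np

dist_to_site = np.array([50.0, 200.0, 1000.0])   # metres from survey site
is_intensive = np.array([1.0, 0.0, 1.0])         # 1 = intensive land use
area = np.array([2.0, 5.0, 3.0])                 # parcel area (ha)

w_lumped = area                                   # lumped: area only, not spatially explicit
w_idw = area / dist_to_site**1.0                  # IDW: nearby land dominates

for name, w in [("lumped", w_lumped), ("IDW", w_idw)]:
    # Weighted per cent intensive land use under each metric.
    print(name, float((w * is_intensive).sum() / w.sum()))
```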

Relevance:

30.00%

Publisher:

Abstract:

The Climate Change Adaptation for Natural Resource Management (NRM) in East Coast Australia Project aims to foster and support an effective "community of practice" for climate change adaptation within the East Coast Cluster NRM regions that will increase the capacity for adaptation to climate change through enhancements in knowledge and skills and through the establishment of long-term collaborations. It is being delivered by six consortium research partners:

* The University of Queensland (project lead)
* Griffith University
* University of the Sunshine Coast
* CSIRO
* New South Wales Office of Environment and Heritage
* Queensland Department of Science, IT, Innovation and the Arts (Queensland Herbarium)

The project relates to the East Coast Cluster, comprising the six coastal NRM regions and regional bodies between Rockhampton and Sydney:

* Fitzroy Basin Association (FBA)
* Burnett-Mary Regional Group (BMRG)
* SEQ Catchments (SEQC)
* Northern Rivers Catchment Management Authority (NRCMA)
* Hunter-Central Rivers CMA (HCRCMA)
* Hawkesbury Nepean CMA (HNCMA)

The aims of this report are to summarise the needs of the regional bodies in relation to NRM planning for climate change adaptation, and to provide a basis for developing the detailed work plan for the research consortium. Two primary methods were used to identify the needs of the regional bodies: (1) document analysis of the existing NRM/Catchment Action Plans (CAPs) and of applications by the regional bodies for funding under Stream 1 of the Regional NRM Planning for Climate Change Fund; and (2) a needs analysis workshop, held in May 2013, involving representatives from the research consortium partners and the regional bodies.

The East Coast Cluster includes five of the ten largest significant urban areas in Australia, world heritage listed natural environments, significant agriculture, mining and extensive grazing. The three NSW CMAs have recently completed strategic-level CAPs, with implementation plans to be finalised in 2014/2015. SEQC and FBA are beginning a review of their existing NRM Plans, to be completed in 2014 and 2015 respectively, while BMRG is aiming to produce an NRM and Climate Variability Action Strategy. The regional bodies will receive funding from the Australian Government through the Regional NRM Planning for Climate Change Fund (NRM Fund) to improve regional planning for climate change and help guide the location of carbon and biodiversity activities, including wildlife corridors. The bulk of the funding will be available for activities in 2013/2014, with smaller amounts available in subsequent years. Most regional bodies aim to have a large proportion of the planning work complete by the end of 2014. In addition, NSW CMAs are undergoing major structural change and will be incorporated into semi-autonomous statutory Local Land Services bodies from 2014. Boundaries will align with local government boundaries and there will be significant change in staff and structures.

The regional bodies in the cluster have varying degrees of climate knowledge. All plans recognise climate change as a key driver of change, but there are few specific actions or targets addressing climate change. Regional bodies also have varying capacity to analyse large volumes of spatial or modelling data. Due to the complex nature of natural resource management, all regional bodies work with key stakeholders (e.g. local government, industry groups, and community groups) to deliver NRM outcomes.
Regional bodies therefore require project outputs that can be used directly in stakeholder engagement activities, and are likely to require some form of capacity building associated with each of the outputs to maximise uptake. Among the immediate needs of the regional bodies are a summary of information or tools that can be used immediately, and a summary of the key outputs and milestone dates for the project, to facilitate alignment of planning activities with research outputs. A project framework is useful to show the linkages between research elements and the relevance of the research to the adaptive management cycle for NRM planning in which the regional bodies are engaged. A draft framework is proposed to stimulate and promote discussion on research elements and linkages; this will be refined during and following the development of the detailed project work plan. The regional bodies strongly emphasised the need to shift to a systems-based resilience approach to NRM planning, and that approach is included in the framework.

The regional bodies identified that information on climate projections would be most useful at regional and subregional scales, to feed into scenario planning and impact analysis. Outputs should be 'engagement ready', and there is a need for capacity building to enable regional bodies to understand and use the projections in stakeholder engagement. There was interest in understanding the impacts of climate change projections on ecosystems (e.g. ecosystem shift) and the consequent impacts on the production of ecosystem services. It was emphasised that any modelling should be usable by the regional bodies with their stakeholders to allow for community input (i.e. no black-box models). The online regrowth benefits tool was of great interest to the regional bodies, as spatial mapping of carbon farming opportunities would be relevant to their funding requirements. The NSW CMAs identified an interest in development of the tool for NSW vegetation types.

Needs relating to socio-economic information included understanding the socio-economic determinants of carbon farming uptake and managing community expectations. A need was also identified to understand the vulnerability of industry groups as well as the community to climate change impacts, and in particular to understand how changes in the flow of ecosystem services would interact with the vulnerability of these groups to impact on the linked ecological and socio-economic system. Responses to disasters (particularly flooding and storm surge) and recovery responses were also identified as being of interest.

An ecosystem services framework was highlighted as a useful approach to synthesising biophysical and socio-economic information in the context of a systems-based, resilience approach to NRM planning. A need was identified to develop processes for moving towards such an approach to NRM planning from the current asset management approach. Examples of best practice in incorporating climate science into planning, using scenarios for stakeholder engagement in planning, and processes for institutionalising learning were also identified as cross-cutting needs. The over-arching theme identified was the need for capacity building for the NRM bodies to best use the information available at any point in time.
To this end, a planners' working group has been established to support the building of a network of informed and articulate NRM agents with knowledge of current climate science and the capacity to use current tools to engage stakeholders in NRM planning for climate change adaptation. The planners' working group would form the core group of the community of practice, with the broader group of stakeholders participating when activities align with their interests. In this way, it is anticipated that the Project will contribute to building capacity within the wider community to plan effectively for climate change adaptation.

Relevance:

30.00%

Publisher:

Abstract:

The Common Scrambling Algorithm Stream Cipher (CSA-SC) is a shift-register-based stream cipher designed to encrypt digital video broadcasts. CSA-SC produces a pseudo-random binary sequence that is used to mask the contents of the transmission. In this paper, we analyse the initialisation process of the CSA-SC keystream generator and demonstrate weaknesses which lead to state convergence, slid pairs and shifted keystreams. As a result, the cipher may be vulnerable to distinguishing attacks, time-memory-data trade-off attacks or slide attacks.
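For intuition about slid pairs and shifted keystreams, here is a toy demonstration on a simple LFSR generator (not CSA-SC itself): if one loading state is a clocked version of another, the two keystreams are shifts of each other, which is exactly the relationship a slide attack exploits.

```python
def lfsr_keystream(state: list[int], taps=(0, 2), n=16) -> list[int]:
    """Output n keystream bits from a toy Fibonacci LFSR."""
    state, out = list(state), []
    for _ in range(n):
        out.append(state[-1])              # output bit
        fb = 0
        for t in taps:
            fb ^= state[t]                 # linear feedback
        state = [fb] + state[:-1]          # shift register one step
    return out

s1 = [1, 0, 0, 1]
ks1 = lfsr_keystream(s1)
# Advance s1 by one clock to obtain the "slid" loading state s2.
s2 = [s1[0] ^ s1[2]] + s1[:-1]
ks2 = lfsr_keystream(s2)
assert ks1[1:] == ks2[:-1]                 # keystreams are shifts of each other
print(ks1, ks2, sep="\n")
```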

Relevance:

30.00%

Publisher:

Abstract:

Variations in the treatment of patients with similar symptoms across different hospitals substantially impact the quality and costs of healthcare. Consequently, it is important to understand the similarities and differences between practices across different hospitals. This paper presents a case study on the application of process mining techniques to measure and quantify the differences in the treatment of patients presenting with chest pain symptoms at four South Australian hospitals. Our case study focuses on cross-organisational benchmarking of processes and their performance. Techniques such as clustering, process discovery, performance analysis, and scientific workflows were applied to facilitate these comparative analyses. Lessons learned in overcoming unique challenges in cross-organisational process mining, such as ensuring population comparability, data granularity comparability, and experimental repeatability, are also presented.
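A hedged sketch of per-hospital process discovery using the open-source pm4py library (an assumption; the paper does not name its toolset). One event log per hospital cohort is discovered and compared on replay fitness; the file names are hypothetical placeholders.

```python
import pm4py

for hospital in ["hospital_A", "hospital_B"]:
    # One event log of chest-pain cases per hospital (hypothetical files).
    log = pm4py.read_xes(f"{hospital}_chest_pain.xes")
    # Discover a process model for this cohort.
    net, initial, final = pm4py.discover_petri_net_inductive(log)
    # Compare cohorts via token-based replay fitness of log against model.
    fitness = pm4py.fitness_token_based_replay(log, net, initial, final)
    print(hospital, fitness["average_trace_fitness"])
```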