200 resultados para Data stream mining


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Council of Australian Governments (COAG) in 2003 gave in-principle approval to a best-practice report recommending a holistic approach to managing natural disasters in Australia incorporating a move from a traditional response-centric approach to a greater focus on mitigation, recovery and resilience with community well-being at the core. Since that time, there have been a range of complementary developments that have supported the COAG recommended approach. Developments have been administrative, legislative and technological, both, in reaction to the COAG initiative and resulting from regular natural disasters. This paper reviews the characteristics of the spatial data that is becoming increasingly available at Federal, state and regional jurisdictions with respect to their being fit for the purpose for disaster planning and mitigation and strengthening community resilience. In particular, Queensland foundation spatial data, which is increasingly accessible by the public under the provisions of the Right to Information Act 2009, Information Privacy Act 2009, and recent open data reform initiatives are evaluated. The Fitzroy River catchment and floodplain is used as a case study for the review undertaken. The catchment covers an area of 142,545 km2, the largest river catchment flowing to the eastern coast of Australia. The Fitzroy River basin experienced extensive flooding during the 2010–2011 Queensland floods. The basin is an area of important economic, environmental and heritage values and contains significant infrastructure critical for the mining and agricultural sectors, the two most important economic sectors for Queensland State. Consequently, the spatial datasets for this area play a critical role in disaster management and for protecting critical infrastructure essential for economic and community well-being. The foundation spatial datasets are assessed for disaster planning and mitigation purposes using data quality indicators such as resolution, accuracy, integrity, validity and audit trail.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Smart Card data from Automated Fare Collection system has been considered as a promising source of information for transit planning. However, literature has been limited to mining travel patterns from transit users and suggesting the potential of using this information. This paper proposes a method for mining spatial regular origins-destinations and temporal habitual travelling time from transit users. These travel regularity are discussed as being useful for transit planning. After reconstructing the travel itineraries, three levels of Density-Based Spatial Clustering of Application with Noise (DBSCAN) have been utilised to retrieve travel regularity of each of each frequent transit users. Analyses of passenger classifications and personal travel time variability estimation are performed as the examples of using travel regularity in transit planning. The methodology introduced in this paper is of interest for transit authorities in planning and managements

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bioacoustic data can provide an important base for environmental monitoring. To explore a large amount of field recordings collected, an automated similarity search algorithm is presented in this paper. A region of an audio defined by frequency and time bounds is provided by a user; the content of the region is used to construct a query. In the retrieving process, our algorithm will automatically scan through recordings to search for similar regions. In detail, we present a feature extraction approach based on the visual content of vocalisations – in this case ridges, and develop a generic regional representation of vocalisations for indexing. Our feature extraction method works best for bird vocalisations showing ridge characteristics. The regional representation method allows the content of an arbitrary region of a continuous recording to be described in a compressed format.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The mining equipment technology services sector is driven by a reactive and user-centered design approach, with a technological focus on incremental new product development. As Australia moves out of its sustained mining boom, companies need to rethink their strategic position, to become agile to stay relevant in an enigmatic market. This paper reports on the first five months on an embedded case study within an Australian, family-owned mining manufacturer. The first author is currently engaged in a longitudinal design led innovation project, as a catalyst to guide the company’s journey to design integration. The results find that design led innovation could act as a channel for highlighting and exploring company disconnections with the marketplace and offer a customer-centric catalyst for internal change. Data collected for this study is from 12 analysed semistructured interviews, a focus group and a reflective journal, over a five-month period. This paper explores limitations to design integration, and highlights opportunities to explore and leverage entrepreneurial characteristics to stay agile, broaden innovation and future-proof through the next commodity cycle in the mining industry.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A people-to-people matching system (or a match-making system) refers to a system in which users join with the objective of meeting other users with the common need. Some real-world examples of these systems are employer-employee (in job search networks), mentor-student (in university social networks), consume-to-consumer (in marketplaces) and male-female (in an online dating network). The network underlying in these systems consists of two groups of users, and the relationships between users need to be captured for developing an efficient match-making system. Most of the existing studies utilize information either about each of the users in isolation or their interaction separately, and develop recommender systems using the one form of information only. It is imperative to understand the linkages among the users in the network and use them in developing a match-making system. This study utilizes several social network analysis methods such as graph theory, small world phenomenon, centrality analysis, density analysis to gain insight into the entities and their relationships present in this network. This paper also proposes a new type of graph called “attributed bipartite graph”. By using these analyses and the proposed type of graph, an efficient hybrid recommender system is developed which generates recommendation for new users as well as shows improvement in accuracy over the baseline methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A5/1 is a shift register based stream cipher which provides privacy for the GSM system. In this paper, we analyse the loading of the secret key and IV during the initialisation process of A5/1. We demonstrate the existence of weak key-IV pairs in the A5/1 cipher due to this loading process; these weak key-IV pairs may generate one, two or three registers containing all-zero values, which may lead in turn to weak keystream sequences. In the case where two or three registers contain only zeros, we describe a distinguisher which leads to a complete decryption of the affected messages.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Trees are capable of portraying the semi-structured data which is common in web domain. Finding similarities between trees is mandatory for several applications that deal with semi-structured data. Existing similarity methods examine a pair of trees by comparing through nodes and paths of two trees, and find the similarity between them. However, these methods provide unfavorable results for unordered tree data and result in yielding NP-hard or MAX-SNP hard complexity. In this paper, we present a novel method that encodes a tree with an optimal traversing approach first, and then, utilizes it to model the tree with its equivalent matrix representation for finding similarity between unordered trees efficiently. Empirical analysis shows that the proposed method is able to achieve high accuracy even on the large data sets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Transit passenger market segmentation enables transit operators to target different classes of transit users to provide customized information and services. The Smart Card (SC) data, from Automated Fare Collection system, facilitates the understanding of multiday travel regularity of transit passengers, and can be used to segment them into identifiable classes of similar behaviors and needs. However, the use of SC data for market segmentation has attracted very limited attention in the literature. This paper proposes a novel methodology for mining spatial and temporal travel regularity from each individual passenger’s historical SC transactions and segments them into four segments of transit users. After reconstructing the travel itineraries from historical SC transactions, the paper adopts the Density-Based Spatial Clustering of Application with Noise (DBSCAN) algorithm to mine travel regularity of each SC user. The travel regularity is then used to segment SC users by an a priori market segmentation approach. The methodology proposed in this paper assists transit operators to understand their passengers and provide them oriented information and services.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Stream ciphers are symmetric key cryptosystems that are used commonly to provide confidentiality for a wide range of applications; such as mobile phone, pay TV and Internet data transmissions. This research examines the features and properties of the initialisation processes of existing stream ciphers to identify flaws and weaknesses, then presents recommendations to improve the security of future cipher designs. This research investigates well-known stream ciphers: A5/1, Sfinks and the Common Scrambling Algorithm Stream Cipher (CSA-SC). This research focused on the security of the initialisation process. The recommendations given are based on both the results in the literature and the work in this thesis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Business process analysis and process mining, particularly within the health care domain, remain under-utilised. Applied research that employs such techniques to routinely collected, health care data enables stakeholders to empirically investigate care as it is delivered by different health providers. However, cross-organisational mining and the comparative analysis of processes present a set of unique challenges in terms of ensuring population and activity comparability, visualising the mined models and interpreting the results. Without addressing these issues, health providers will find it difficult to use process mining insights, and the potential benefits of evidence-based process improvement within health will remain unrealised. In this paper, we present a brief introduction on the nature of health care processes; a review of the process mining in health literature; and a case study conducted to explore and learn how health care data, and cross-organisational comparisons with process mining techniques may be approached. The case study applies process mining techniques to administrative and clinical data for patients who present with chest pain symptoms at one of four public hospitals in South Australia. We demonstrate an approach that provides detailed insights into clinical (quality of patient health) and fiscal (hospital budget) pressures in health care practice. We conclude by discussing the key lessons learned from our experience in conducting business process analysis and process mining based on the data from four different hospitals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The study of the relationship between macroscopic traffic parameters, such as flow, speed and travel time, is essential to the understanding of the behaviour of freeway and arterial roads. However, the temporal dynamics of these parameters are difficult to model, especially for arterial roads, where the process of traffic change is driven by a variety of variables. The introduction of the Bluetooth technology into the transportation area has proven exceptionally useful for monitoring vehicular traffic, as it allows reliable estimation of travel times and traffic demands. In this work, we propose an approach based on Bayesian networks for analyzing and predicting the complex dynamics of flow or volume, based on travel time observations from Bluetooth sensors. The spatio-temporal relationship between volume and travel time is captured through a first-order transition model, and a univariate Gaussian sensor model. The two models are trained and tested on travel time and volume data, from an arterial link, collected over a period of six days. To reduce the computational costs of the inference tasks, volume is converted into a discrete variable. The discretization process is carried out through a Self-Organizing Map. Preliminary results show that a simple Bayesian network can effectively estimate and predict the complex temporal dynamics of arterial volumes from the travel time data. Not only is the model well suited to produce posterior distributions over single past, current and future states; but it also allows computing the estimations of joint distributions, over sequences of states. Furthermore, the Bayesian network can achieve excellent prediction, even when the stream of travel time observation is partially incomplete.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a ‘noisy’ environment such as contemporary social media, is to collect the pertinent information; be that information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or give an advantage to those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing with the passage of time, and both collection and analytic methodologies need to be continually adapted to respond to this changing information. While many of the data sets collected and analyzed are preformed, that is they are built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders. The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting which of those need to be immediately placed in front of responders, for manual filtering and possible action. The paper suggests two solutions for this, content analysis and user profiling. In the former case, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand, and the urgency of the information, whilst the latter attempts to identify those users who are either serving as amplifiers of information or are known as an authoritative source. Through these techniques, the information contained in a large dataset could be filtered down to match the expected capacity of emergency responders, and knowledge as to the core keywords or hashtags relating to the current event is constantly refined for future data collection. The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to particular prediction market; tennis betting. As increasing numbers of professional sports men and women create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form and emotions which have the potential to impact on future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, as with American Football (NFL) and Baseball (MLB) has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services are still niche operations, much of the value of information is lost by the time it reaches one of these services. The paper thus considers how information could be filtered from twitter user lists and hash tag or keyword monitoring, assessing the value of the source, information, and the prediction markets to which it may relate. The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect and make the tweets available for exploration and analysis. A strategy to respond to changes in the social movement is also required or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list is created — in a way, keyword creation is part strategy and part art. In this paper we describe strategies for the creation of a social media archive, specifically tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement’s presence in Twitter changes over time. We also discuss the opportunities and methods to extract data smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study. The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid in future data collection. The intention is that through the papers presented, and subsequent discussion, the panel will inform the wider research community not only on the objectives and limitations of data collection, live analytics, and filtering, but also on current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Modern health information systems can generate several exabytes of patient data, the so called "Health Big Data", per year. Many health managers and experts believe that with the data, it is possible to easily discover useful knowledge to improve health policies, increase patient safety and eliminate redundancies and unnecessary costs. The objective of this paper is to discuss the characteristics of Health Big Data as well as the challenges and solutions for health Big Data Analytics (BDA) – the process of extracting knowledge from sets of Health Big Data – and to design and evaluate a pipelined framework for use as a guideline/reference in health BDA.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Road networks are a national critical infrastructure. The road assets need to be monitored and maintained efficiently as their conditions deteriorate over time. The condition of one of such assets, road pavement, plays a major role in the road network maintenance programmes. Pavement conditions depend upon many factors such as pavement types, traffic and environmental conditions. This paper presents a data analytics case study for assessing the factors affecting the pavement deflection values measured by the traffic speed deflectometer (TSD) device. The analytics process includes acquisition and integration of data from multiple sources, data pre-processing, mining useful information from them and utilising data mining outputs for knowledge deployment. Data mining techniques are able to show how TSD outputs vary in different roads, traffic and environmental conditions. The generated data mining models map the TSD outputs to some classes and define correction factors for each class.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Term-based approaches can extract many features in text documents, but most include noise. Many popular text-mining strategies have been adapted to reduce noisy information from extracted features; however, text-mining techniques suffer from low frequency. The key issue is how to discover relevance features in text documents to fulfil user information needs. To address this issue, we propose a new method to extract specific features from user relevance feedback. The proposed approach includes two stages. The first stage extracts topics (or patterns) from text documents to focus on interesting topics. In the second stage, topics are deployed to lower level terms to address the low-frequency problem and find specific terms. The specific terms are determined based on their appearances in relevance feedback and their distribution in topics or high-level patterns. We test our proposed method with extensive experiments in the Reuters Corpus Volume 1 dataset and TREC topics. Results show that our proposed approach significantly outperforms the state-of-the-art models.