158 resultados para Data-stream balancing
em University of Queensland eSpace - Australia
Resumo:
In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. The sliding windows in natural form are defined as the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time constrained and filter based sliding windows. Our algorithm makes one pass on the data stream and maintains an E-approximate summary. It uses O((1)/(epsilon2) log(2) epsilonN) space where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding window queries and proved that our technique is applicable for flexible window settings. Our performance study indicates that the space required in practice is much less than the given theoretical bound and the algorithm supports high speed data streams.
Resumo:
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an is an element of-approximate summary can be maintained so that, given a quantile query d (phi, is an element of), the data item at rank [phi N] may be approximately obtained within the rank error precision is an element of N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different phi and is an element of poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
Resumo:
Collaborate Filtering is one of the most popular recommendation algorithms. Most Collaborative Filtering algorithms work with a static set of data. This paper introduces a novel approach to providing recommendations using Collaborative Filtering when user rating is received over an incoming data stream. In an incoming stream there are massive amounts of data arriving rapidly making it impossible to save all the records for later analysis. By dynamically building a decision tree for every item as data arrive, the incoming data stream is used effectively although an inevitable trade off between accuracy and amount of memory used is introduced. By adding a simple personalization step using a hierarchy of the items, it is possible to improve the predicted ratings made by each decision tree and generate recommendations in real-time. Empirical studies with the dynamically built decision trees show that the personalization step improves the overall predicted accuracy.
Resumo:
Online multimedia data needs to be encrypted for access control. To be capable of working on mobile devices such as pocket PC and mobile phones, lightweight video encryption algorithms should be proposed. The two major problems in these algorithms are that they are either not fast enough or unable to work on highly compressed data stream. In this paper, we proposed a new lightweight encryption algorithm based on Huffman error diffusion. It is a selective algorithm working on compressed data. By carefully choosing the most significant parts (MSP), high performance is achieved with proper security. Experimental results has proved the algorithm to be fast. secure: and compression-compatible.
Resumo:
Gauging data are available from numerous streams throughout Australia, and these data provide a basis for historical analysis of geomorphic change in stream channels in response to both natural phenomena and human activities. We present a simple method for analysis of these data, and a briefcase study of an application to channel change in the Tully River, in the humid tropics of north Queensland. The analysis suggests that this channel has narrowed and deepened, rather than aggraded: channel aggradation was expected, given the intensification of land use in the catchment, upstream of the gauging station. Limitations of the method relate to the time periods over which stream gauging occurred; the spatial patterns of stream gauging sites; the quality and consistency of data collection; and the availability of concurrent land-use histories on which to base the interpretation of the channel changes.
Resumo:
Chemical engineering education is challenged around the world by demands and rapid changes encompassing a wide range of technical and social drivers. Graduates must be prepared for practice in increasingly diverse workplace environments in which generic or transferable attributes such as communication and teamwork together with technical excellence are mandated by prospective employers and society at large. If academe is to successfully deliver on these graduate attributes, effective curriculum design needs to include appropriate educational processes as well as course content. Conventional teacher centred approaches, stand-alone courses and retro-fitted remedial modules have not delivered the desired outcomes. Development of the broader spectrum of attributes is more likely when students are engaged with realistic and relevant experiences that demand the integration and practice of these attributes in contexts that the students find meaningful. This paper describes and evaluates The University of Queensland's Project Centred Curriculum in Chemical Engineering (PCC), a programme-wide approach to meeting these requirements. PCC strategically integrates project-based learning with more traditional instruction. Data collected shows improved levels of student attainment of generic skills with institutional and nationally benchmarked indicators showing significant increases in student perceptions of teaching quality, and overall satisfaction with the undergraduate experience. Endorsements from Australian academic, professional and industry bodies also support the approach as more effectively aligning engineering education with professional practice requirements.
Resumo:
This paper describes an experiment in designing, implementing and testing a Transport layer cluster scheduling and dispatching architecture. The motivation for the experiment was the hypothesis that a Transport layer clustering solution may offer advantantages over the existing industry-standard Network layer and Data Link Layer approaches. The critical success factors initially established to guide and evaluate the experiment were reduced dispatcher work load, reduced dispatcher internal state memory requirements, distributed denial of service resilience, and cluster software design simplicity. The functional design stage of the experiment produced a Transport layer strategy for scheduling and load balancing based on the specification of two new TCP options. Implementation required the introduction of the newly specified TCP options into the Linux (2.4) kernel. The implementation produced an extended Linux Socket API to facilitate user-process access to the additional TCP capability. The testing stage of the experiment confirmed the operational efficiency of the solution.
Resumo:
In recent years many real time applications need to handle data streams. We consider the distributed environments in which remote data sources keep on collecting data from real world or from other data sources, and continuously push the data to a central stream processor. In these kinds of environments, significant communication is induced by the transmitting of rapid, high-volume and time-varying data streams. At the same time, the computing overhead at the central processor is also incurred. In this paper, we develop a novel filter approach, called DTFilter approach, for evaluating the windowed distinct queries in such a distributed system. DTFilter approach is based on the searching algorithm using a data structure of two height-balanced trees, and it avoids transmitting duplicate items in data streams, thus lots of network resources are saved. In addition, theoretical analysis of the time spent in performing the search, and of the amount of memory needed is provided. Extensive experiments also show that DTFilter approach owns high performance.
Resumo:
This document records the process of migrating eprints.org data to a Fez repository. Fez is a Web-based digital repository and workflow management system based on Fedora (http://www.fedora.info/). At the time of migration, the University of Queensland Library was using EPrints 2.2.1 [pepper] for its ePrintsUQ repository. Once we began to develop Fez, we did not upgrade to later versions of eprints.org software since we knew we would be migrating data from ePrintsUQ to the Fez-based UQ eSpace. Since this document records our experiences of migration from an earlier version of eprints.org, anyone seeking to migrate eprints.org data into a Fez repository might encounter some small differences. Moving UQ publication data from an eprints.org repository into a Fez repository (hereafter called UQ eSpace (http://espace.uq.edu.au/) was part of a plan to integrate metadata (and, in some cases, full texts) about all UQ research outputs, including theses, images, multimedia and datasets, in a single repository. This tied in with the plan to identify and capture the research output of a single institution, the main task of the eScholarshipUQ testbed for the Australian Partnership for Sustainable Repositories project (http://www.apsr.edu.au/). The migration could not occur at UQ until the functionality in Fez was at least equal to that of the existing ePrintsUQ repository. Accordingly, as Fez development occurred throughout 2006, a list of eprints.org functionality not currently supported in Fez was created so that programming of such development could be planned for and implemented.
Resumo:
The final-year project for Mechanical & Space Engineering students at UQ often involves the design and flight testing of an experiment. This report describes the design and use of a simple data logger that should be suitable for collecting data from the students' flight experiments. The exercise here was taken as far as the construction of a prototype device that is suitable for ground-based testing, say, the static firing of a hybrid rocket motor.
Resumo:
A combination of deductive reasoning, clustering, and inductive learning is given as an example of a hybrid system for exploratory data analysis. Visualization is replaced by a dialogue with the data.
Resumo:
This paper reports a comparative study of Australian and New Zealand leadership attributes, based on the GLOBE (Global Leadership and Organizational Behavior Effectiveness) program. Responses from 344 Australian managers and 184 New Zealand managers in three industries were analyzed using exploratory and confirmatory factor analysis. Results supported some of the etic leadership dimensions identified in the GLOBE study, but also found some emic dimensions of leadership for each country. An interesting finding of the study was that the New Zealand data fitted the Australian model, but not vice versa, suggesting asymmetric perceptions of leadership in the two countries.