129 resultados para Data Streams Distribution
em Queensland University of Technology - ePrints Archive
Resumo:
This paper presents a single pass algorithm for mining discriminative Itemsets in data streams using a novel data structure and the tilted-time window model. Discriminative Itemsets are defined as Itemsets that are frequent in one data stream and their frequency in that stream is much higher than the rest of the streams in the dataset. In order to deal with the data structure size, we propose a pruning process that results in the compact tree structure containing discriminative Itemsets. Empirical analysis shows the sound time and space complexity of the proposed method.
Resumo:
Technological advances have led to an influx of affordable hardware that supports sensing, computation and communication. This hardware is increasingly deployed in public and private spaces, tracking and aggregating a wealth of real-time environmental data. Although these technologies are the focus of several research areas, there is a lack of research dealing with the problem of making these capabilities accessible to everyday users. This thesis represents a first step towards developing systems that will allow users to leverage the available infrastructure and create custom tailored solutions. It explores how this notion can be utilized in the context of energy monitoring to improve conventional approaches. The project adopted a user-centered design process to inform the development of a flexible system for real-time data stream composition and visualization. This system features an extensible architecture and defines a unified API for heterogeneous data streams. Rather than displaying the data in a predetermined fashion, it makes this information available as building blocks that can be combined and shared. It is based on the insight that individual users have diverse information needs and presentation preferences. Therefore, it allows users to compose rich information displays, incorporating personally relevant data from an extensive information ecosystem. The prototype was evaluated in an exploratory study to observe its natural use in a real-world setting, gathering empirical usage statistics and conducting semi-structured interviews. The results show that a high degree of customization does not warrant sustained usage. Other factors were identified, yielding recommendations for increasing the impact on energy consumption.
Resumo:
Increasing penetration of photovoltaic (PV) as well as increasing peak load demand has resulted in poor voltage profile for some residential distribution networks. This paper proposes coordinated use of PV and Battery Energy Storage (BES) to address voltage rise and/or dip problems. The reactive capability of PV inverter combined with droop based BES system is evaluated for rural and urban scenarios (having different R/X ratios). Results show that reactive compensation from PV inverters alone is sufficient to maintain acceptable voltage profile in an urban scenario (low resistance feeder), whereas, coordinated PV and BES support is required for the rural scenario (high resistance feeder). Constant as well as variable droop based BES schemes are analyzed. The required BES sizing and associated cost to maintain the acceptable voltage profile under both schemes is presented. Uncertainties in PV generation and load are considered, with probabilistic estimation of PV generation and randomness in load modeled to characterize the effective utilization of BES. Actual PV generation data and distribution system network data is used to verify the efficacy of the proposed method.
Resumo:
This paper reviews the use of multi-agent systems to model the impacts of high levels of photovoltaic (PV) system penetration in distribution networks and presents some preliminary data obtained from the Perth Solar City high penetration PV trial. The Perth Solar City trial consists of a low voltage distribution feeder supplying 75 customers where 29 consumers have roof top photovoltaic systems. Data is collected from smart meters at each consumer premises, from data loggers at the transformer low voltage (LV) side and from a nearby distribution network SCADA measurement point on the high voltage side (HV) side of the transformer. The data will be used to progressively develop MAS models.
Resumo:
The effects of fish density distribution and effort distribution on the overall catchability coefficient are examined. Emphasis is also on how aggregation and effort distribution interact to affect overall catch rate [catch per unit effort (cpue)]. In particular, it is proposed to evaluate three indices, the catchability index, the knowledge parameter, and the aggregation index, to describe the effectiveness of targeting and the effects on overall catchability in the stock area. Analytical expressions are provided so that these indices can easily be calculated. The average of the cpue calculated from small units where fishing is random is a better index for measuring the stock abundance. The overall cpue, the ratio of lumped catch and effort, together with the average cpue, can be used to assess the effectiveness of targeting. The proposed methods are applied to the commercial catch and effort data from the Australian northern prawn fishery. The indices are obtained assuming a power law for the effort distribution as an approximation of targeting during the fishing operation. Targeting increased catchability in some areas by 10%, which may have important implications on management advice.
Resumo:
Amphibian is an 10’00’’ musical work which explores new musical interfaces and approaches to hybridising performance practices from the popular music, electronic dance music and computer music traditions. The work is designed to be presented in a range of contexts associated with the electro-acoustic, popular and classical music traditions. The work is for two performers using two synchronised laptops, an electric guitar and a custom designed gestural interface for vocal performers - the e-Mic (Extended Mic-stand Interface Controller). This interface was developed by one of the co-authors, Donna Hewitt. The e-Mic allows a vocal performer to manipulate the voice in real time through the capture of physical gestures via an array of sensors - pressure, distance, tilt - along with ribbon controllers and an X-Y joystick microphone mount. Performance data are then sent to a computer, running audio-processing software, which is used to transform the audio signal from the microphone. In this work, data is also exchanged between performers via a local wireless network, allowing performers to work with shared data streams. The duo employs the gestural conventions of guitarist and singer (i.e. 'a band' in a popular music context), but transform these sounds and gestures into new digital music. The gestural language of popular music is deliberately subverted and taken into a new context. The piece thus explores the nexus between the sonic and performative practices of electro acoustic music and intelligent electronic dance music (‘idm’). This work was situated in the research fields of new musical interfacing, interaction design, experimental music composition and performance. The contexts in which the research was conducted were live musical performance and studio music production. The work investigated new methods for musical interfacing, performance data mapping, hybrid performance and compositional practices in electronic music. The research methodology was practice-led. New insights were gained from the iterative experimental workshopping of gestural inputs, musical data mapping, inter-performer data exchange, software patch design, data and audio processing chains. In respect of interfacing, there were innovations in the design and implementation of a novel sensor-based gestural interface for singers, the e-Mic, one of the only existing gestural controllers for singers. This work explored the compositional potential of sharing real time performance data between performers and deployed novel methods for inter-performer data exchange and mapping. As regards stylistic and performance innovation, the work explored and demonstrated an approach to the hybridisation of the gestural and sonic language of popular music with recent ‘post-digital’ approaches to laptop based experimental music The development of the work was supported by an Australia Council Grant. Research findings have been disseminated via a range of international conference publications, recordings, radio interviews (ABC Classic FM), broadcasts, and performances at international events and festivals. The work was curated into the major Australian international festival, Liquid Architecture, and was selected by an international music jury (through blind peer review) for presentation at the International Computer Music Conference in Belfast, N. Ireland.
Resumo:
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. To learn to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing sematic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches had to deal with low frequency pattern issues. The measures used by the data mining technique (for example, “support” and “confidences”) to learn the profile have turned out to be not suitable for filtering. They can lead to a mismatch problem. This thesis uses the rough set-based reasoning (term-based) and pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. This system consists of two stages - topic filtering and pattern mining stages. The topic filtering stage is intended to minimize information overloading by filtering out the most likely irrelevant information based on the user profiles. A novel user-profiles learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory. The second stage (pattern mining) aims at solving the problem of the information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents were assigned higher scores by the ranking function. Because there is a relatively small amount of documents left after the first stage, the computational cost is markedly reduced; at the same time, pattern discoveries yield more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR bench-marking processes, using the latest version of the Reuters dataset, namely, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both the term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system outperforms significantly the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including the BM25, Support Vector Machines (SVM), and Pattern Taxonomy Model (PTM).
Resumo:
Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998), however this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that a MSHMM is a valid structure for multi-modal speaker identification
Resumo:
Currently, the GNSS computing modes are of two classes: network-based data processing and user receiver-based processing. A GNSS reference receiver station essentially contributes raw measurement data in either the RINEX file format or as real-time data streams in the RTCM format. Very little computation is carried out by the reference station. The existing network-based processing modes, regardless of whether they are executed in real-time or post-processed modes, are centralised or sequential. This paper describes a distributed GNSS computing framework that incorporates three GNSS modes: reference station-based, user receiver-based and network-based data processing. Raw data streams from each GNSS reference receiver station are processed in a distributed manner, i.e., either at the station itself or at a hosting data server/processor, to generate station-based solutions, or reference receiver-specific parameters. These may include precise receiver clock, zenith tropospheric delay, differential code biases, ambiguity parameters, ionospheric delays, as well as line-of-sight information such as azimuth and elevation angles. Covariance information for estimated parameters may also be optionally provided. In such a mode the nearby precise point positioning (PPP) or real-time kinematic (RTK) users can directly use the corrections from all or some of the stations for real-time precise positioning via a data server. At the user receiver, PPP and RTK techniques are unified under the same observation models, and the distinction is how the user receiver software deals with corrections from the reference station solutions and the ambiguity estimation in the observation equations. Numerical tests demonstrate good convergence behaviour for differential code bias and ambiguity estimates derived individually with single reference stations. With station-based solutions from three reference stations within distances of 22–103 km the user receiver positioning results, with various schemes, show an accuracy improvement of the proposed station-augmented PPP and ambiguity-fixed PPP solutions with respect to the standard float PPP solutions without station augmentation and ambiguity resolutions. Overall, the proposed reference station-based GNSS computing mode can support PPP and RTK positioning services as a simpler alternative to the existing network-based RTK or regionally augmented PPP systems.
Resumo:
Autonomous underwater vehicles (AUVs) are becoming commonplace in the study of inshore coastal marine habitats. Combined with shipboard systems, scientists are able to make in-situ measurements of water column and benthic properties. In CSIRO, autonomous gliders are used to collect water column data, while surface vessels are used to collect bathymetry information through the use of swath mapping, bottom grabs, and towed video systems. Although these methods have provided good data coverage for coastal and deep waters beyond 50m, there has been an increasing need for autonomous in-situ sampling in waters less than 50m deep. In addition, the collection of benthic and water column data has been conducted separately, requiring extensive post-processing to combine data streams. As such, a new AUV was developed for in-situ observations of both benthic habitat and water column properties in shallow waters. This paper provides an overview of the Starbug X AUV system, its operational characteristics including vision-based navigation and oceanographic sensor integration.
Resumo:
As network capacity has increased over the past decade, individuals and organisations have found it increasingly appealing to make use of remote services in the form of service-oriented architectures and cloud computing services. Data processed by remote services, however, is no longer under the direct control of the individual or organisation that provided the data, leaving data owners at risk of data theft or misuse. This paper describes a model by which data owners can control the distribution and use of their data throughout a dynamic coalition of service providers using digital rights management technology. Our model allows a data owner to establish the trustworthiness of every member of a coalition employed to process data, and to communicate a machine-enforceable usage policy to every such member.
Resumo:
Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop's data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
Resumo:
This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.
Resumo:
This review outlines current international patterns in prostate cancer incidence and mortality rates and survival, including recent trends and a discussion of the possible impact of prostate-specific antigen (PSA) testing on the observed data. Internationally, prostate cancer is the second most common cancer diagnosed among men (behind lung cancer), and is the sixth most common cause of cancer death among men. Prostate cancer is particularly prevalent in developed countries such as the United States and the Scandinavian countries, with about a six-fold difference between high-incidence and low-incidence countries. Interpretation of trends in incidence and survival are complicated by the increasing impact of PSA testing, particularly in more developed countries. As Western influences become more pronounced in less developed countries, prostate cancer incidence rates in those countries are tending to increase, even though the prevalence of PSA testing is relatively low. Larger proportions of younger men are being diagnosed with prostate cancer and living longer following diagnosis of prostate cancer, which has many implications for health systems. Decreasing mortality rates are becoming widespread among more developed countries, although it is not clear whether this is due to earlier diagnosis (PSA testing), improved treatment, or some combination of these or other factors.
Resumo:
In the past decade, the utilization of ambulance data to inform the prevalence of nonfatal heroin overdose has increased. These data can assist public health policymakers, law enforcement agencies, and health providers in planning and allocating resources. This study examined the 672 ambulance attendances at nonfatal heroin overdoses in Queensland, Australia, in 2000. Gender distribution showed a typical 70/30 male-to-female ratio. An equal number of persons with nonfatal heroin overdose were between 15 and 24 years of age and 25 and 34 years of age. Police were present in only 1 of 6 cases, and 28.1% of patients reported using drugs alone. Ambulance data are proving to be a valuable population-based resource for describing the incidence and characteristics of nonfatal heroin overdose episodes. Future studies could focus on the differences between nonfatal heroin overdose and fatal heroin overdose samples.