8 resultados para Data Streams Distribution

em University of Queensland eSpace - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Quantile computation has many applications including data mining and financial data analysis. It has been shown that an is an element of-approximate summary can be maintained so that, given a quantile query d (phi, is an element of), the data item at rank [phi N] may be approximately obtained within the rank error precision is an element of N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different phi and is an element of poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Collaborate Filtering is one of the most popular recommendation algorithms. Most Collaborative Filtering algorithms work with a static set of data. This paper introduces a novel approach to providing recommendations using Collaborative Filtering when user rating is received over an incoming data stream. In an incoming stream there are massive amounts of data arriving rapidly making it impossible to save all the records for later analysis. By dynamically building a decision tree for every item as data arrive, the incoming data stream is used effectively although an inevitable trade off between accuracy and amount of memory used is introduced. By adding a simple personalization step using a hierarchy of the items, it is possible to improve the predicted ratings made by each decision tree and generate recommendations in real-time. Empirical studies with the dynamically built decision trees show that the personalization step improves the overall predicted accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years many real time applications need to handle data streams. We consider the distributed environments in which remote data sources keep on collecting data from real world or from other data sources, and continuously push the data to a central stream processor. In these kinds of environments, significant communication is induced by the transmitting of rapid, high-volume and time-varying data streams. At the same time, the computing overhead at the central processor is also incurred. In this paper, we develop a novel filter approach, called DTFilter approach, for evaluating the windowed distinct queries in such a distributed system. DTFilter approach is based on the searching algorithm using a data structure of two height-balanced trees, and it avoids transmitting duplicate items in data streams, thus lots of network resources are saved. In addition, theoretical analysis of the time spent in performing the search, and of the amount of memory needed is provided. Extensive experiments also show that DTFilter approach owns high performance.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Data on the occurrence of species are widely used to inform the design of reserve networks. These data contain commission errors (when a species is mistakenly thought to be present) and omission errors (when a species is mistakenly thought to be absent), and the rates of the two types of error are inversely related. Point locality data can minimize commission errors, but those obtained from museum collections are generally sparse, suffer from substantial spatial bias and contain large omission errors. Geographic ranges generate large commission errors because they assume homogenous species distributions. Predicted distribution data make explicit inferences on species occurrence and their commission and omission errors depend on model structure, on the omission of variables that determine species distribution and on data resolution. Omission errors lead to identifying networks of areas for conservation action that are smaller than required and centred on known species occurrences, thus affecting the comprehensiveness, representativeness and efficiency of selected areas. Commission errors lead to selecting areas not relevant to conservation, thus affecting the representativeness and adequacy of reserve networks. Conservation plans should include an estimation of commission and omission errors in underlying species data and explicitly use this information to influence conservation planning outcomes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. The sliding windows in natural form are defined as the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time constrained and filter based sliding windows. Our algorithm makes one pass on the data stream and maintains an E-approximate summary. It uses O((1)/(epsilon2) log(2) epsilonN) space where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding window queries and proved that our technique is applicable for flexible window settings. Our performance study indicates that the space required in practice is much less than the given theoretical bound and the algorithm supports high speed data streams.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The apposition compound eyes of gonodactyloid stomatopods are divided into a ventral and a dorsal hemisphere by six equatorial rows of enlarged ommatidia, the mid-band (MB). Whereas the hemispheres are specialized for spatial vision, the MB consists of four dorsal rows of ommatidia specialized for colour vision and two ventral rows specialized for polarization vision. The eight retinula cell axons (RCAs) from each ommatidium project retinotopically onto one corresponding lamina cartridge, so that the three retinal data streams (spatial, colour and polarization) remain anatomically separated. This study investigates whether the retinal specializations are reflected in differences in the RCA arrangement within the corresponding lamina cartridges. We have found that, in all three eye regions, the seven short visual fibres (svfs) formed by retinula cells 1-7 (R1-R7) terminate at two distinct lamina levels, geometrically separating the terminals of photoreceptors sensitive to either orthogonal e-vector directions or different wavelengths of light. This arrangement is required for the establishment of spectral and polarization opponency mechanisms. The long visual fibres (lvfs) of the eighth retinula cells (R8) pass through the lamina and project retinotopically to the distal medulla externa. Differences between the three eye regions exist in the packing of svf terminals and in the branching patterns of the lvfs within the lamina. We hypothesize that the R8 cells of MB rows 1-4 are incorporated into the colour vision system formed by R1-R7, whereas the R8 cells of MB rows 5 and 6 form a separate neural channel from R1 to R7 for polarization processing.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, models integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and model prediction's scale assessments. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (0.83r(2)) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of. vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.