Digital Library

45 results for Data Streams Distribution

in University of Queensland eSpace - Australia

Approximate processing of massive continuous quantile queries over high-speed data streams

Relevance:

100.00%

Publisher:

Abstract:

Quantile computation has many applications including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query (φ, ε), the data item at rank ⌈φN⌉ may be approximately obtained within the rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
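The idea of answering a group of queries as one can be illustrated with a small sketch. It assumes, as a simplification that is not taken from the paper, that a query (φ, ε) accepts any answer whose normalized rank lies in [φ − ε, φ + ε], and that the summary's own error is ignored. Under that assumption, queries whose intervals share a common point can be served by one rank, and the classic greedy interval-stabbing rule yields the fewest such groups. The function name `cluster_queries` is hypothetical.

```python
def cluster_queries(queries):
    """Group (phi, eps) queries so each group shares one acceptable rank.

    Simplifying assumption (not from the paper): a query accepts any
    normalized rank in [phi - eps, phi + eps]. Greedy interval stabbing
    then gives the minimum number of groups.
    Returns a list of (shared_rank, [queries in the group]).
    """
    # Sort intervals by their right endpoint.
    intervals = sorted(
        ((phi - eps, phi + eps, (phi, eps)) for phi, eps in queries),
        key=lambda t: t[1],
    )
    clusters = []
    stab = None  # rank currently shared by the open group
    for lo, hi, query in intervals:
        if stab is None or lo > stab:
            # The current shared rank is outside this interval: open a new group
            # and place its shared rank at this interval's right endpoint.
            stab = hi
            clusters.append((stab, [query]))
        else:
            clusters[-1][1].append(query)
    return clusters


if __name__ == "__main__":
    groups = cluster_queries([(0.5, 0.1), (0.55, 0.1), (0.9, 0.05)])
    for rank, members in groups:
        print(rank, members)
```

In this toy input the first two queries overlap and fall into one group, while the third needs its own, so three registered queries cost two evaluations against the summary.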

Collaborative filtering on data streams

Relevance:

100.00%

Publisher:

Abstract:

Collaborative Filtering is one of the most popular recommendation algorithms. Most Collaborative Filtering algorithms work with a static set of data. This paper introduces a novel approach to providing recommendations using Collaborative Filtering when user ratings are received over an incoming data stream. In an incoming stream, massive amounts of data arrive rapidly, making it impossible to save all the records for later analysis. By dynamically building a decision tree for every item as data arrive, the incoming data stream is used effectively, although an inevitable trade-off between accuracy and the amount of memory used is introduced. By adding a simple personalization step using a hierarchy of the items, it is possible to improve the predicted ratings made by each decision tree and generate recommendations in real time. Empirical studies with the dynamically built decision trees show that the personalization step improves the overall predicted accuracy.

Filtering duplicate items over distributed data streams

Relevance:

100.00%

Publisher:

Abstract:

In recent years, many real-time applications have needed to handle data streams. We consider distributed environments in which remote data sources continuously collect data from the real world or from other data sources and push the data to a central stream processor. In these environments, transmitting rapid, high-volume and time-varying data streams induces significant communication, and it also incurs computing overhead at the central processor. In this paper, we develop a novel filter approach, called the DTFilter approach, for evaluating windowed distinct queries in such a distributed system. The DTFilter approach is based on a search algorithm that uses a data structure of two height-balanced trees. It avoids transmitting duplicate items in data streams, which saves substantial network resources. In addition, a theoretical analysis of the time spent in performing the search and of the amount of memory needed is provided. Extensive experiments also show that the DTFilter approach achieves high performance.
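To make the filtering idea concrete, here is a minimal sketch of time-windowed duplicate suppression at a remote source. It is not DTFilter: it swaps the two height-balanced trees for a hash map plus a queue, and it does not reproduce whatever guarantees DTFilter gives for windowed distinct queries at the central processor. The class name `WindowedDuplicateFilter` and its `offer` method are hypothetical.

```python
from collections import deque


class WindowedDuplicateFilter:
    """Suppress an item if an identical item arrived within the last `window` time units.

    Illustrative stand-in only (hash map + queue), not the DTFilter structure.
    """

    def __init__(self, window):
        self.window = window
        self.last_seen = {}   # item -> timestamp of its most recent arrival
        self.arrivals = deque()  # (timestamp, item) in arrival order

    def offer(self, item, ts):
        """Return True if the item should be transmitted, False if it is a duplicate."""
        # Expire arrivals that have fallen out of the window (ts - window, ts].
        while self.arrivals and self.arrivals[0][0] <= ts - self.window:
            old_ts, old_item = self.arrivals.popleft()
            # Forget the item only if this was its most recent arrival.
            if self.last_seen.get(old_item) == old_ts:
                del self.last_seen[old_item]
        is_new = item not in self.last_seen
        self.last_seen[item] = ts
        self.arrivals.append((ts, item))
        return is_new


if __name__ == "__main__":
    f = WindowedDuplicateFilter(window=10)
    for ts, item in [(0, "a"), (5, "a"), (6, "b"), (20, "a")]:
        print(ts, item, "send" if f.offer(item, ts) else "drop")
```

Timestamps are assumed to be non-decreasing. Memory grows with the number of arrivals inside the window, which is the cost a more compact structure would aim to reduce.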

Finding frequent itemsets in high-speed data streams

Relevance:

100.00%

Publisher:

Tradeoffs of different types of species occurrence data for use in systematic conservation planning

Relevance:

90.00%

Publisher:

Abstract:

Data on the occurrence of species are widely used to inform the design of reserve networks. These data contain commission errors (when a species is mistakenly thought to be present) and omission errors (when a species is mistakenly thought to be absent), and the rates of the two types of error are inversely related. Point locality data can minimize commission errors, but those obtained from museum collections are generally sparse, suffer from substantial spatial bias and contain large omission errors. Geographic ranges generate large commission errors because they assume homogeneous species distributions. Predicted distribution data make explicit inferences on species occurrence and their commission and omission errors depend on model structure, on the omission of variables that determine species distribution and on data resolution. Omission errors lead to identifying networks of areas for conservation action that are smaller than required and centred on known species occurrences, thus affecting the comprehensiveness, representativeness and efficiency of selected areas. Commission errors lead to selecting areas not relevant to conservation, thus affecting the representativeness and adequacy of reserve networks. Conservation plans should include an estimation of commission and omission errors in underlying species data and explicitly use this information to influence conservation planning outcomes.

Space efficient quantile summary for constrained sliding windows on a data stream

Relevance:

90.00%

Publisher:

Abstract:

In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. The sliding windows in natural form are defined as the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time-constrained and filter-based sliding windows. Our algorithm makes one pass on the data stream and maintains an ε-approximate summary. It uses O((1/ε²) log²(εN)) space, where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding window queries and prove that our technique is applicable for flexible window settings. Our performance study indicates that the space required in practice is much less than the given theoretical bound and that the algorithm supports high-speed data streams.
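As a point of reference for what an ε-approximate answer over a time-constrained window means, the sketch below keeps every item in the window (so it uses O(N) space, not the sublinear bound above) and checks whether a candidate answer has a rank within εN of ⌈φN⌉. It is an exact baseline for illustration, not the paper's summary structure; `TimeWindow`, `exact_quantile` and `is_eps_approximate` are hypothetical names.

```python
import math
from collections import deque


class TimeWindow:
    """Keep all items whose timestamp lies in (now - span, now]. Exact, O(N) space."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, value) in arrival order

    def add(self, ts, value):
        self.items.append((ts, value))
        # Drop items that are older than the time constraint allows.
        while self.items and self.items[0][0] <= ts - self.span:
            self.items.popleft()

    def values(self):
        return [v for _, v in self.items]


def exact_quantile(values, phi):
    """Return the item at 1-based rank ceil(phi * N) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]


def is_eps_approximate(values, answer, phi, eps):
    """True if some rank of `answer` is within eps * N of ceil(phi * N)."""
    ordered = sorted(values)
    n = len(ordered)
    target = math.ceil(phi * n)
    ranks = [i + 1 for i, v in enumerate(ordered) if v == answer]
    return any(abs(r - target) <= eps * n for r in ranks)


if __name__ == "__main__":
    w = TimeWindow(span=10)
    for ts in range(20):
        w.add(ts, ts)  # value equals timestamp, purely for illustration
    vals = w.values()
    print(exact_quantile(vals, 0.5))
    print(is_eps_approximate(vals, 15, 0.5, 0.1))
```

A real summary would return an answer that passes `is_eps_approximate` without storing every item; the baseline is only useful for testing such a summary on small inputs.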

Parallel processing and image analysis in the eyes of mantis shrimps

Relevance:

80.00%

Publisher:

Abstract:

The compound eyes of mantis shrimps, a group of tropical marine crustaceans, incorporate principles of serial and parallel processing of visual information that may be applicable to artificial imaging systems. Their eyes include numerous specializations for analysis of the spectral and polarizational properties of light, and include more photoreceptor classes for analysis of ultraviolet light, color, and polarization than occur in any other known visual system. This is possible because receptors in different regions of the eye are anatomically diverse and incorporate unusual structural features, such as spectral filters, not seen in other compound eyes. Unlike eyes of most other animals, eyes of mantis shrimps must move to acquire some types of visual information and to integrate color and polarization with spatial vision. Information leaving the retina appears to be processed into numerous parallel data streams leading into the central nervous system, greatly reducing the analytical requirements at higher levels. Many of these unusual features of mantis shrimp vision may inspire new sensor designs for machine vision.

Neuroarchitecture of the color and polarization vision system of the stomatopod Haptosquilla

Relevance:

80.00%

Publisher:

Abstract:

The apposition compound eyes of stomatopod crustaceans contain a morphologically distinct eye region specialized for color and polarization vision, called the mid-band. In two stomatopod superfamilies, the mid-band is constructed from six rows of enlarged ommatidia containing multiple photoreceptor classes for spectral and polarization vision. The aim of this study was to begin to analyze the underlying neuroarchitecture, the design of which might reveal clues as to how the visual system interprets and communicates to deeper levels of the brain the multiple channels of information supplied by the retina. Reduced silver methods were used to investigate the axon pathways from different retinal regions to the lamina ganglionaris and from there to the medulla externa, the medulla interna, and the medulla terminalis. A swollen band of neuropil, here termed the accessory lobe, projects across the equator of the lamina ganglionaris, the medulla externa, and the medulla interna and represents, structurally, the retina's mid-band. Serial semithin and ultrathin resin sections were used to reconstruct the projection of photoreceptor axons from the retina to the lamina ganglionaris. The eight axons originating from one ommatidium project to the same lamina cartridge. Seven short visual fibers end at two distinct levels in each lamina cartridge, thus geometrically separating the two channels of polarization and spectral information. The eighth visual fiber runs axially through the cartridge and terminates in the medulla externa. We conclude that spatial, color, and polarization information is divided into three parallel data streams from the retina to the central nervous system. (C) 2003 Wiley-Liss, Inc.

Photoreceptor projection and termination pattern in the lamina of gonodactyloid stomatopods (mantis shrimp)

Relevance:

80.00%

Publisher:

Abstract:

The apposition compound eyes of gonodactyloid stomatopods are divided into a ventral and a dorsal hemisphere by six equatorial rows of enlarged ommatidia, the mid-band (MB). Whereas the hemispheres are specialized for spatial vision, the MB consists of four dorsal rows of ommatidia specialized for colour vision and two ventral rows specialized for polarization vision. The eight retinula cell axons (RCAs) from each ommatidium project retinotopically onto one corresponding lamina cartridge, so that the three retinal data streams (spatial, colour and polarization) remain anatomically separated. This study investigates whether the retinal specializations are reflected in differences in the RCA arrangement within the corresponding lamina cartridges. We have found that, in all three eye regions, the seven short visual fibres (svfs) formed by retinula cells 1-7 (R1-R7) terminate at two distinct lamina levels, geometrically separating the terminals of photoreceptors sensitive to either orthogonal e-vector directions or different wavelengths of light. This arrangement is required for the establishment of spectral and polarization opponency mechanisms. The long visual fibres (lvfs) of the eighth retinula cells (R8) pass through the lamina and project retinotopically to the distal medulla externa. Differences between the three eye regions exist in the packing of svf terminals and in the branching patterns of the lvfs within the lamina. We hypothesize that the R8 cells of MB rows 1-4 are incorporated into the colour vision system formed by R1-R7, whereas the R8 cells of MB rows 5 and 6 form a separate neural channel from R1 to R7 for polarization processing.

Privacy and security enhanced offline oblivious transfer for massive data distribution

Relevance:

40.00%

Publisher:

Abstract:

Unauthorized accesses to digital contents are serious threats to international security and informatics. We propose an offline oblivious data distribution framework that preserves the sender's security and the receiver's privacy using tamper-proof smart cards. This framework provides persistent content protections from digital piracy and promises private content consumption.

Estimating Income Inequality in China Using Grouped Data and the Generalized Beta Distribution

Relevance:

40.00%

Publisher:

Abstract:

There are two main types of data sources of income distributions in China: household survey data and grouped data. Household survey data are typically available for isolated years and individual provinces. In comparison, aggregate or grouped data are typically available more frequently and usually have national coverage. In principle, grouped data allow investigation of the change of inequality over longer, continuous periods of time, and the identification of patterns of inequality across broader regions. Nevertheless, a major limitation of grouped data is that only mean (average) income and income shares of quintile or decile groups of the population are reported. Directly using grouped data reported in this format is equivalent to assuming that all individuals in a quintile or decile group have the same income. This potentially distorts the estimate of inequality within each region. The aim of this paper is to apply an improved econometric method designed to use grouped data to study income inequality in China. A generalized beta distribution is employed to model income inequality in China at various levels and periods of time. The generalized beta distribution is more general and flexible than the lognormal distribution that has been used in past research, and also relaxes the assumption of a uniform distribution of income within quintile and decile groups of populations. The paper studies the nature and extent of inequality in rural and urban China over the period 1978 to 2002. Income inequality in the whole of China is then modeled using a mixture of province-specific distributions. The estimated results are used to study the trends in national inequality, and to discuss the empirical findings in the light of economic reforms, regional policies, and globalization of the Chinese economy.
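The limitation of using grouped data directly can be seen in a short calculation. The sketch below computes a Gini coefficient from the income shares of equal-sized population groups by treating the Lorenz curve as piecewise linear, which is exactly the "everyone in a group has the same income" assumption the abstract describes. It therefore ignores within-group inequality and gives a lower bound; it is not the generalized beta method of the paper. The function name and the quintile shares in the usage example are hypothetical.

```python
def gini_from_group_shares(income_shares):
    """Gini coefficient from income shares of equal-sized population groups.

    Assumes every individual within a group has the same income, so the
    Lorenz curve is piecewise linear. This understates inequality because
    within-group differences are ignored.
    """
    k = len(income_shares)
    total = sum(income_shares)
    cumulative = 0.0
    area = 0.0  # area under the piecewise-linear Lorenz curve
    for share in sorted(income_shares):  # poorest group first
        previous = cumulative
        cumulative += share / total
        # Trapezoid over a population slice of width 1/k.
        area += (previous + cumulative) / (2 * k)
    return 1 - 2 * area


if __name__ == "__main__":
    # Hypothetical quintile income shares, poorest to richest (not real data).
    shares = [0.05, 0.10, 0.15, 0.25, 0.45]
    print(round(gini_from_group_shares(shares), 4))
```

Fitting a parametric distribution to the same shares, as the paper proposes, replaces the straight Lorenz segments with a smooth curve and so recovers some of the inequality that this calculation misses.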

Modelling pre-clearing vegetation distribution using GIS-integrated statistical, ecological and data models: A case study from the wet tropics of Northeastern Australia

Relevance:

40.00%

Publisher:

Abstract:

Traditional vegetation mapping methods use high-cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and model prediction scale assessments. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.

Distribution of transferrin saturation in an Australian population: Relevance to the early diagnosis of hemochromatosis

Relevance:

30.00%

Publisher:

Abstract:

Background & Aims: An elevated transferrin saturation is the earliest phenotypic abnormality in hereditary hemochromatosis. Determination of transferrin saturation remains the most useful noninvasive screening test for affected individuals, but there is debate as to the appropriate screening level. The aims of this study were to estimate the mean transferrin saturation in hemochromatosis heterozygotes and normal individuals and to evaluate potential transferrin saturation screening levels. Methods: Statistical mixture modeling was applied to data from a survey of asymptomatic Australians to estimate the mean transferrin saturation in hemochromatosis heterozygotes and normal individuals. To evaluate potential transferrin saturation screening levels, modeling results were compared with data from identified hemochromatosis heterozygotes and homozygotes. Results: After removal of hemochromatosis homozygotes, two populations of transferrin saturation were identified in asymptomatic Australians (P < 0.01). In men, 88.2% of the truncated sample had a lower mean transferrin saturation of 24.1%, whereas 11.8% had an increased mean transferrin saturation of 37.3%. Similar results were found in women. A transferrin saturation threshold of 45% identified 98% of homozygotes without misidentifying any normal individuals. Conclusions: The results confirm that hemochromatosis heterozygotes form a distinct transferrin saturation subpopulation and support the use of transferrin saturation as an inexpensive screening test for hemochromatosis. In practice, a fasting transferrin saturation of greater than or equal to 45% identifies virtually all affected homozygous subjects without necessitating further investigation of unaffected normal individuals.

Kinetic analysis of vascular marker distribution in perfused rat livers after regeneration following partial hepatectomy

Relevance:

30.00%

Publisher:

Abstract:

Background/Aims: Liver clearance models are based on information (or assumptions) on solute distribution kinetics within the microvasculatory system. The aim was to study albumin distribution kinetics in regenerated livers and in livers of normal adult rats. Methods: A novel mathematical model was used to evaluate the distribution space and the transit time dispersion of albumin in livers following regeneration after a two-thirds hepatectomy compared to livers of normal adult rats. Outflow curves of albumin measured after bolus injection in single-pass perfused rat livers were analyzed by correcting for the influence of catheters and fitting a long-tailed function to the data. Results: The curves were well described by the proposed model. The distribution volume and the transit time dispersion of albumin observed in the partial hepatectomy group were not significantly different from livers of normal adult rats. Conclusions: These findings suggest that the distribution space and the transit time dispersion of albumin (CV²) are relatively constant irrespective of the presence of rapid and extensive repair. This invariance of CV² implies, as a first approximation, a similar degree of intrasinusoidal mixing. The finding that a sum of two (instead of one) inverse Gaussian densities is an appropriate empirical function to describe the outflow curve of vascular indicators has consequences for an improved prediction of hepatic solute extraction.
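For readers unfamiliar with the empirical function mentioned in the conclusions, the sketch below evaluates a weighted sum of two inverse Gaussian densities. It uses the standard inverse Gaussian density written in terms of a mean transit time MT and a relative dispersion CV² (variance divided by MT²); whether the paper uses exactly this parameterization is an assumption, and the parameter values in the usage example are made up for illustration.

```python
import math


def inverse_gaussian_pdf(t, mt, cv2):
    """Inverse Gaussian density with mean `mt` and relative dispersion `cv2`.

    Equivalent to the usual IG(mu, lambda) form with mu = mt, lambda = mt / cv2.
    """
    if t <= 0:
        return 0.0
    scale = math.sqrt(mt / (2 * math.pi * cv2 * t ** 3))
    return scale * math.exp(-((t - mt) ** 2) / (2 * cv2 * mt * t))


def outflow_density(t, p, mt1, cv2_1, mt2, cv2_2):
    """Weighted sum of two inverse Gaussian densities, with weight p on the first."""
    return p * inverse_gaussian_pdf(t, mt1, cv2_1) + (1 - p) * inverse_gaussian_pdf(
        t, mt2, cv2_2
    )


if __name__ == "__main__":
    # Hypothetical parameters (seconds), chosen only to show a long-tailed curve.
    params = dict(p=0.7, mt1=10.0, cv2_1=0.3, mt2=30.0, cv2_2=0.5)
    for t in (2, 5, 10, 20, 40, 80):
        print(t, outflow_density(t, **params))
```

The second, slower component is what produces the long tail. With a single inverse Gaussian the tail of a measured outflow curve is typically underestimated, which is the motivation the abstract gives for the two-component form.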

Flush air data system calibration using numerical simulation

Relevance:

30.00%

Publisher:

Abstract:

The use of computational fluid dynamics simulations for calibrating a flush air data system is described. In particular, the flush air data system of the HYFLEX hypersonic vehicle is used as a case study. The HYFLEX air data system consists of nine pressure ports located flush with the vehicle nose surface, connected to onboard pressure transducers. After appropriate processing, surface pressure measurements can be converted into useful air data parameters. The processing algorithm requires an accurate pressure model, which relates air data parameters to the measured pressures. In the past, such pressure models have been calibrated using combinations of flight data, ground-based experimental results, and numerical simulation. We perform a calibration of the HYFLEX flush air data system using computational fluid dynamics simulations exclusively. The simulations are used to build an empirical pressure model that accurately describes the HYFLEX nose pressure distribution over a range of flight conditions. We believe that computational fluid dynamics provides a quick and inexpensive way to calibrate the air data system and is applicable to a broad range of flight conditions. When tested with HYFLEX flight data, the calibrated system is found to work well. It predicts vehicle angle of attack and angle of sideslip to accuracy levels that generally satisfy flight control requirements. Dynamic pressure is predicted to within the resolution of the onboard inertial measurement unit. We find that wind-tunnel experiments and flight data are not necessary to accurately calibrate the HYFLEX flush air data system for hypersonic flight.
