8 resultados para Data compression

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes-no Bloom filter, which as a data structure consisting of two parts: the yes-filter which is a standard Bloom filter and the no-filter which is another Bloom filter whose purpose is to represent those objects that were recognised incorrectly by the yes-filter (that is, to recognise the false positives of the yes-filter). By querying the no-filter after an object has been recognised by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with the standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses objects to include in the no-filter so that the no-filter recognises as many as possible false positives but no true positives, thus producing the most accurate yes-no Bloom filter among all yes-no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognised by the no-filter, with the constraint being that it should recognise no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large leading to intractable optimal solution. Considering the similarity of the ILP with the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed making use of a reduced ILP for the value function approximation. Numerical results show the ADP model works best comparing with a number of heuristics as well as the CPLEX built-in solver (B&B), and this is what can be recommended for use in yes-no Bloom filters. In a wider context of the study of lossy compression algorithms, our researchis an example showing how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

It is generally assumed that the variability of neuronal morphology has an important effect on both the connectivity and the activity of the nervous system, but this effect has not been thoroughly investigated. Neuroanatomical archives represent a crucial tool to explore structure–function relationships in the brain. We are developing computational tools to describe, generate, store and render large sets of three–dimensional neuronal structures in a format that is compact, quantitative, accurate and readily accessible to the neuroscientist. Single–cell neuroanatomy can be characterized quantitatively at several levels. In computer–aided neuronal tracing files, a dendritic tree is described as a series of cylinders, each represented by diameter, spatial coordinates and the connectivity to other cylinders in the tree. This ‘Cartesian’ description constitutes a completely accurate mapping of dendritic morphology but it bears little intuitive information for the neuroscientist. In contrast, a classical neuroanatomical analysis characterizes neuronal dendrites on the basis of the statistical distributions of morphological parameters, e.g. maximum branching order or bifurcation asymmetry. This description is intuitively more accessible, but it only yields information on the collective anatomy of a group of dendrites, i.e. it is not complete enough to provide a precise ‘blueprint’ of the original data. We are adopting a third, intermediate level of description, which consists of the algorithmic generation of neuronal structures within a certain morphological class based on a set of ‘fundamental’, measured parameters. This description is as intuitive as a classical neuroanatomical analysis (parameters have an intuitive interpretation), and as complete as a Cartesian file (the algorithms generate and display complete neurons). The advantages of the algorithmic description of neuronal structure are immense. If an algorithm can measure the values of a handful of parameters from an experimental database and generate virtual neurons whose anatomy is statistically indistinguishable from that of their real counterparts, a great deal of data compression and amplification can be achieved. Data compression results from the quantitative and complete description of thousands of neurons with a handful of statistical distributions of parameters. Data amplification is possible because, from a set of experimental neurons, many more virtual analogues can be generated. This approach could allow one, in principle, to create and store a neuroanatomical database containing data for an entire human brain in a personal computer. We are using two programs, L–NEURON and ARBORVITAE, to investigate systematically the potential of several different algorithms for the generation of virtual neurons. Using these programs, we have generated anatomically plausible virtual neurons for several morphological classes, including guinea pig cerebellar Purkinje cells and cat spinal cord motor neurons. These virtual neurons are stored in an online electronic archive of dendritic morphology. This process highlights the potential and the limitations of the ‘computational neuroanatomy’ strategy for neuroscience databases.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Written for communications and electronic engineers, technicians and students, this book begins with an introduction to data communications, and goes on to explain the concept of layered communications. Other chapters deal with physical communications channels, baseband digital transmission, analog data transmission, error control and data compression codes, physical layer standards, the data link layer, the higher layers of the protocol hierarchy, and local are networks (LANS). Finally, the book explores some likely future developments.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The resolution of remotely sensed data is becoming increasingly fine, and there are now many sources of data with a pixel size of 1 m x 1 m. This produces huge amounts of data that have to be stored, processed and transmitted. For environmental applications this resolution possibly provides far more data than are needed: data overload. This poses the question: how much is too much? We have explored two resolutions of data-20 in pixel SPOT data and I in pixel Computerized Airborne Multispectral Imaging System (CAMIS) data from Fort A. P. Hill (Virginia, USA), using the variogram of geostatistics. For both we used the normalized difference vegetation index (NDVI). Three scales of spatial variation were identified in both the SPOT and 1 in data: there was some overlap at the intermediate spatial scales of about 150 in and of 500 m-600 in. We subsampled the I in data and scales of variation of about 30 in and of 300 in were identified consistently until the separation between pixel centroids was 15 in (or 1 in 225pixels). At this stage, spatial scales of about 100m and 600m were described, which suggested that only now was there a real difference in the amount of spatial information available from an environmental perspective. These latter were similar spatial scales to those identified from the SPOT image. We have also analysed I in CAMIS data from Fort Story (Virginia, USA) for comparison and the outcome is similar.:From these analyses it seems that a pixel size of 20m is adequate for many environmental applications, and that if more detail is required the higher resolution data could be sub-sampled to a 10m separation between pixel centroids without any serious loss of information. This reduces significantly the amount of data that needs to be stored, transmitted and analysed and has important implications for data compression.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Empirical orthogonal function (EOF) analysis is a powerful tool for data compression and dimensionality reduction used broadly in meteorology and oceanography. Often in the literature, EOF modes are interpreted individually, independent of other modes. In fact, it can be shown that no such attribution can generally be made. This review demonstrates that in general individual EOF modes (i) will not correspond to individual dynamical modes, (ii) will not correspond to individual kinematic degrees of freedom, (iii) will not be statistically independent of other EOF modes, and (iv) will be strongly influenced by the nonlocal requirement that modes maximize variance over the entire domain. The goal of this review is not to argue against the use of EOF analysis in meteorology and oceanography; rather, it is to demonstrate the care that must be taken in the interpretation of individual modes in order to distinguish the medium from the message.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We investigate the “flux excess” effect, whereby open solar flux estimates from spacecraft increase with increasing heliocentric distance. We analyze the kinematic effect on these open solar flux estimates of large-scale longitudinal structure in the solar wind flow, with particular emphasis on correcting estimates made using data from near-Earth satellites. We show that scatter, but no net bias, is introduced by the kinematic “bunching effect” on sampling and that this is true for both compression and rarefaction regions. The observed flux excesses, as a function of heliocentric distance, are shown to be consistent with open solar flux estimates from solar magnetograms made using the potential field source surface method and are well explained by the kinematic effect of solar wind speed variations on the frozen-in heliospheric field. Applying this kinematic correction to the Omni-2 interplanetary data set shows that the open solar flux at solar minimum fell from an annual mean of 3.82 × 1016 Wb in 1987 to close to half that value (1.98 × 1016 Wb) in 2007, making the fall in the minimum value over the last two solar cycles considerably faster than the rise inferred from geomagnetic activity observations over four solar cycles in the first half of the 20th century.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This dissertation deals with aspects of sequential data assimilation (in particular ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy (EnKBF) filter is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. Therefore, we present a suitable integration scheme that handles the stiffening of the differential equations involved and doesn’t represent further computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in the ensemble space instead of in the state space. Advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation using of deterministic ensemble square root filters in highly nonlinear forecast models. Namely, an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can be reverted also due to nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble for different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part contains the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model. The RAW filter is an improvement to the widely popular Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion in the mean value of the function. Using statistical significance tests both at the local and field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time stepping scheme; hence, no retuning of the parameterizations is required. It is found the accuracy of the medium-term forecasts is increased by using the RAW filter.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The use of pulse compression techniques to improve the sensitivity of meteorological radars has become increasingly common in recent years. An unavoidable side-effect of such techniques is the formation of ‘range sidelobes’ which lead to spreading of information across several range gates. These artefacts are particularly troublesome in regions where there is a sharp gradient in the power backscattered to the antenna as a function of range. In this article we present a simple method for identifying and correcting range sidelobe artefacts. We make use of the fact that meteorological targets produce an echo which fluctuates at random, and that this echo, like a fingerprint, is unique to each range gate. By cross-correlating the echo time series from pairs of gates therefore we can identify whether information from one gate has spread into another, and hence flag regions of contamination. In addition we show that the correlation coefficients contain quantitative information about the fraction of power leaked from one range gate to another, and we propose a simple algorithm to correct the corrupted reflectivity profile.