Biblioteca Digital

The need to estimate a particular quantile of a distribution is an important problem that frequently arises in many computer vision and signal processing applications. For example, our work was motivated by the requirements of many semiautomatic surveillance analytics systems that detect abnormalities in close-circuit television footage using statistical models of low-level motion features. In this paper, we specifically address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We make the following several major contributions: 1) we highlight the limitations of approaches previously described in the literature that make them unsuitable for nonstationary streams; 2) we describe a novel principle for the utilization of the available storage space; 3) we introduce two novel algorithms that exploit the proposed principle in different ways; and 4) we present a comprehensive evaluation and analysis of the proposed algorithms and the existing methods in the literature on both synthetic data sets and three large real-world streams acquired in the course of operation of an existing commercial surveillance system. Our findings convincingly demonstrate that both of the proposed methods are highly successful and vastly outperform the existing alternatives. We show that the better of the two algorithms (data-aligned histogram) exhibits far superior performance in comparison with the previously described methods, achieving more than 10 times lower estimate errors on real-world data, even when its available working memory is an order of magnitude smaller.

Veja mais

On the axiomatic approach to the maximum entropy principle of inference

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent axiomatic derivations of the maximum entropy principle from consistency conditions are critically examined. We show that proper application of consistency conditions alone allows a wider class of functionals, essentially of the form ∝ dx p(x)[p(x)/g(x)] s , for some real numbers, to be used for inductive inference and the commonly used form − ∝ dx p(x)ln[p(x)/g(x)] is only a particular case. The role of the prior densityg(x) is clarified. It is possible to regard it as a geometric factor, describing the coordinate system used and it does not represent information of the same kind as obtained by measurements on the system in the form of expectation values.

Veja mais

Maximum-entropy calculation of the electron density at 4 A resolution of Pf1 filamentous bacteriophage

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A 4 A electron-density map of Pf1 filamentous bacterial virus has been calculated from x-ray fiber diffraction data by using the maximum-entropy method. This method produces a map that is free of features due to noise in the data and enables incomplete isomorphous-derivative phase information to be supplemented by information about the nature of the solution. The map shows gently curved (banana-shaped) rods of density about 70 A long, oriented roughly parallel to the virion axis but slewing by about 1/6th turn while running from a radius of 28 A to one of 13 A. Within these rods, there is a helical periodicity with a pitch of 5 to 6 A. We interpret these rods to be the helical subunits of the virion. The position of strongly diffracted intensity on the x-ray fiber pattern shows that the basic helix of the virion is right handed and that neighboring nearly parallel protein helices cross one another in an unusual negative sense.

Veja mais

On the solution of bivariate population balance equations for aggregation: X-discretization of space for expansion and contraction of computational domain

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new structured discretization of 2D space, named X-discretization, is proposed to solve bivariate population balance equations using the framework of minimal internal consistency of discretization of Chakraborty and Kumar [2007, A new framework for solution of multidimensional population balance equations. Chem. Eng. Sci. 62, 4112-4125] for breakup and aggregation of particles. The 2D space of particle constituents (internal attributes) is discretized into bins by using arbitrarily spaced constant composition radial lines and constant mass lines of slope -1. The quadrilaterals are triangulated by using straight lines pointing towards the mean composition line. The monotonicity of the new discretization makes is quite easy to implement, like a rectangular grid but with significantly reduced numerical dispersion. We use the new discretization of space to automate the expansion and contraction of the computational domain for the aggregation process, corresponding to the formation of larger particles and the disappearance of smaller particles by adding and removing the constant mass lines at the boundaries. The results show that the predictions of particle size distribution on fixed X-grid are in better agreement with the analytical solution than those obtained with the earlier techniques. The simulations carried out with expansion and/or contraction of the computational domain as population evolves show that the proposed strategy of evolving the computational domain with the aggregation process brings down the computational effort quite substantially; larger the extent of evolution, greater is the reduction in computational effort. (C) 2011 Elsevier Ltd. All rights reserved.

Veja mais

On maximum entropy and minimum KL-divergence optimization by Grobner basis methods

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we study constrained maximum entropy and minimum divergence optimization problems, in the cases where integer valued sufficient statistics exists, using tools from computational commutative algebra. We show that the estimation of parametric statistical models in this case can be transformed to solving a system of polynomial equations. We give an implicit description of maximum entropy models by embedding them in algebraic varieties for which we give a Grobner basis method to compute it. In the cases of minimum KL-divergence models we show that implicitization preserves specialization of prior distribution. This result leads us to a Grobner basis method to embed minimum KL-divergence models in algebraic varieties. (C) 2012 Elsevier Inc. All rights reserved.

Veja mais

Generative Maximum Entropy Learning for Multiclass Classification

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Maximum entropy approach to classification is very well studied in applied statistics and machine learning and almost all the methods that exists in literature are discriminative in nature. In this paper, we introduce a maximum entropy classification method with feature selection for large dimensional data such as text datasets that is generative in nature. To tackle the curse of dimensionality of large data sets, we employ conditional independence assumption (Naive Bayes) and we perform feature selection simultaneously, by enforcing a `maximum discrimination' between estimated class conditional densities. For two class problems, in the proposed method, we use Jeffreys (J) divergence to discriminate the class conditional densities. To extend our method to the multi-class case, we propose a completely new approach by considering a multi-distribution divergence: we replace Jeffreys divergence by Jensen-Shannon (JS) divergence to discriminate conditional densities of multiple classes. In order to reduce computational complexity, we employ a modified Jensen-Shannon divergence (JS(GM)), based on AM-GM inequality. We show that the resulting divergence is a natural generalization of Jeffreys divergence to a multiple distributions case. As far as the theoretical justifications are concerned we show that when one intends to select the best features in a generative maximum entropy approach, maximum discrimination using J-divergence emerges naturally in binary classification. Performance and comparative study of the proposed algorithms have been demonstrated on large dimensional text and gene expression datasets that show our methods scale up very well with large dimensional datasets.

Veja mais