977 results for approximate KNN query


Relevance:

20.00%

Abstract:

Monte Carlo algorithms often aim to draw from a distribution π by simulating a Markov chain with transition kernel P such that π is invariant under P. However, there are many situations in which it is impractical or impossible to draw from the transition kernel P. This is the case, for instance, with massive datasets, where it is prohibitively expensive to calculate the likelihood, and with intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace P by an approximation P̂. Using theory from the stability of Markov chains, we explore a variety of situations where it is possible to quantify how ‘close’ the chain given by the transition kernel P̂ is to the chain given by P. We apply these results to several examples from spatial statistics and network analysis.
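
To make the idea of "closeness" concrete, the sketch below compares a small exact kernel P with a perturbed approximation P_hat on a three-state space and measures the total variation distance between their stationary distributions. The matrices, the power-iteration routine, and the choice of total variation as the yardstick are all illustrative assumptions; the paper's stability-theory bounds are not reproduced here.

```python
def stationary(P, max_iter=10_000, tol=1e-13):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration (pi <- pi P), starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical "exact" kernel P and a perturbed approximation P_hat.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
P_hat = [[0.45, 0.55, 0.00],
         [0.25, 0.50, 0.25],
         [0.00, 0.50, 0.50]]

pi = stationary(P)          # exactly (1/4, 1/2, 1/4)
pi_hat = stationary(P_hat)  # exactly (10/43, 22/43, 11/43)
print(round(tv_distance(pi, pi_hat), 5))  # 0.01744
```

Both chains here are reversible birth-death chains, so their stationary distributions follow from detailed balance, which makes the numbers easy to check by hand.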

Relevance:

20.00%

Abstract:

This paper investigates the feasibility of using approximate Bayesian computation (ABC) to calibrate and evaluate complex individual-based models (IBMs). As ABC evolves, various versions are emerging, but here we explore only the most accessible version, rejection-ABC. Rejection-ABC involves running models a large number of times, with parameters drawn randomly from their prior distributions, and then retaining the simulations closest to the observations. Although ABC is well established in some fields, whether it will work with ecological IBMs is still uncertain. Rejection-ABC was applied to an existing 14-parameter earthworm energy budget IBM for which the available data consist of body mass growth and cocoon production in four experiments. ABC was able to narrow the posterior distributions of seven parameters, estimating credible intervals for each. ABC’s accepted values produced slightly better fits than the literature values did. The accuracy of the analysis was assessed using cross-validation and coverage, currently the best available tests. Of the seven unnarrowed parameters, ABC revealed that three were correlated with other parameters, while the remaining four were found not to be estimable given the available data. It is often desirable to compare models to see whether all component modules are necessary. Here we used ABC model selection to compare the full model with a simplified version that removed the earthworm’s movement and much of the energy budget. We are able to show that inclusion of the energy budget is necessary for a good fit to the data. We show how our methodology can inform future modelling cycles, and briefly discuss how more advanced versions of ABC may be applicable to IBMs.
We conclude that ABC has the potential to represent uncertainty in model structure, parameters and predictions, and to embed the often complex process of optimizing an IBM’s structure and parameters within an established statistical framework, thereby making the process more transparent and objective.
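
The rejection-ABC loop described above (draw parameters from the prior, run the model, keep the runs closest to the observations) can be sketched as follows. This is a toy stand-in, not the earthworm IBM: the Normal(theta, 1) model, the Uniform(-10, 10) prior, the distance function, and the 1% retention fraction are all hypothetical choices for illustration.

```python
import random
import statistics


def rejection_abc(observed, simulate, prior_sample, distance,
                  n_sims=5000, keep_frac=0.01, seed=0):
    """Rejection-ABC: draw parameters from the prior, simulate, and keep the
    parameter values whose simulations lie closest to the observations."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_sims):
        theta = prior_sample(rng)
        runs.append((distance(simulate(theta, rng), observed), theta))
    runs.sort(key=lambda run: run[0])  # closest simulations first
    n_keep = max(1, int(n_sims * keep_frac))
    return [theta for _, theta in runs[:n_keep]]


# Toy model (hypothetical): data are Normal(theta, 1), prior theta ~ Uniform(-10, 10).
obs_rng = random.Random(42)
observed = [obs_rng.gauss(3.0, 1.0) for _ in range(50)]

posterior = rejection_abc(
    observed,
    simulate=lambda theta, rng: [rng.gauss(theta, 1.0) for _ in range(50)],
    prior_sample=lambda rng: rng.uniform(-10.0, 10.0),
    distance=lambda sim, obs: abs(statistics.fmean(sim) - statistics.fmean(obs)),
)
print(len(posterior), statistics.fmean(posterior), statistics.stdev(posterior))
```

The accepted values form an approximate posterior sample; how far it has narrowed relative to the prior is the kind of evidence the abstract uses to call a parameter "narrowed" or "not estimable".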

Relevance:

20.00%

Abstract:

Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes-no Bloom filter, a data structure consisting of two parts: the yes-filter, which is a standard Bloom filter, and the no-filter, another Bloom filter whose purpose is to represent those objects that were recognised incorrectly by the yes-filter (that is, to recognise the false positives of the yes-filter). By querying the no-filter after an object has been recognised by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with a standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses the objects to include in the no-filter so that the no-filter recognises as many false positives as possible but no true positives, thus producing the most accurate yes-no Bloom filter among all yes-no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognised by the no-filter, subject to the constraint that it recognise no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem is normally large, which makes an exact optimal solution intractable. Considering the similarity of the ILP to the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed that uses a reduced ILP for the value function approximation. Numerical results show that the ADP model performs best compared with a number of heuristics as well as the CPLEX built-in solver (B&B), and it is therefore the approach recommended for use in yes-no Bloom filters.
In the wider context of the study of lossy compression algorithms, our research is an example showing how the arsenal of optimization methods can be applied to improve the accuracy of compressed data.
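
A minimal sketch of the yes-no query logic, assuming salted SHA-256 hashing and arbitrary filter sizes. The selection of false positives here is a simple greedy heuristic, swapped in for the paper's ILP/ADP optimisation: a candidate is added to the no-filter only if, afterwards, no stored key would be recognised by the no-filter, which preserves the "no true positives" constraint. The class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: an m-bit array and k hash positions per item."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [False] * m

    def positions(self, item):
        # k salted SHA-256 digests reduced mod m (an illustrative hashing choice).
        return {
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        }

    def add(self, item):
        for p in self.positions(item):
            self.bits[p] = True

    def __contains__(self, item):
        return all(self.bits[p] for p in self.positions(item))


class YesNoBloomFilter:
    """Yes-filter stores the keys; no-filter stores selected false positives."""

    def __init__(self, keys, m_yes, m_no, k):
        self.keys = set(keys)
        self.yes = BloomFilter(m_yes, k)
        self.no = BloomFilter(m_no, k)
        for key in self.keys:
            self.yes.add(key)

    def train_no_filter(self, candidates):
        """Greedy stand-in for the paper's ILP/ADP: insert a false positive into the
        no-filter only if no stored key would then be recognised by the no-filter."""
        key_positions = [self.no.positions(key) for key in self.keys]
        set_bits = {p for p, bit in enumerate(self.no.bits) if bit}
        added = 0
        for c in candidates:
            if c in self.keys or c not in self.yes:
                continue  # only genuine false positives of the yes-filter qualify
            trial = set_bits | self.no.positions(c)
            if any(kp <= trial for kp in key_positions):
                continue  # would make the no-filter reject a true key
            self.no.add(c)
            set_bits = trial
            added += 1
        return added

    def __contains__(self, item):
        # Accept only if the yes-filter recognises the item and the no-filter does not.
        return item in self.yes and item not in self.no


keys = [f"key{i}" for i in range(200)]
ynbf = YesNoBloomFilter(keys, m_yes=1000, m_no=300, k=3)
others = [f"other{i}" for i in range(5000)]
fp_yes = sum(x in ynbf.yes for x in others)
ynbf.train_no_filter(others)
fp_yes_no = sum(x in ynbf for x in others)
print(fp_yes, fp_yes_no)
```

A greedy pass like this processes candidates in arbitrary order and can miss better subsets, which is exactly the gap an exact ILP or an ADP approximation is meant to close.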

Relevance:

20.00%

Abstract:

Approximate Bayesian computation (ABC) is a popular family of algorithms that perform approximate parameter inference when numerical evaluation of the likelihood function is not possible but data can be simulated from the model. They return a sample of parameter values that produce simulations close to the observed dataset. A standard approach is to reduce the simulated and observed datasets to vectors of summary statistics and accept when the difference between these is below a specified threshold. ABC can also be adapted to perform model choice. In this article, we present a new software package for R, abctools, which provides methods for tuning ABC algorithms. This includes recent dimension-reduction algorithms to tune the choice of summary statistics, and coverage methods to tune the choice of threshold. We provide several illustrations of these routines on applications taken from the ABC literature.
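
The "summary statistics plus threshold" acceptance rule can be sketched as below. This does not use or imitate the abctools API; the two-parameter Normal model, the priors, the (mean, standard deviation) summary vector, and the threshold eps = 0.5 are hypothetical choices. Tuning which statistics go into `summary` and how large `eps` should be is the job the abstract assigns to abctools.

```python
import math
import random


def summary(data):
    """Hypothetical summary statistics: sample mean and sample standard deviation."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return (mean, sd)


def abc_with_threshold(observed, simulate, prior_sample, eps, n_sims=10_000, seed=1):
    """Accept theta when the Euclidean distance between simulated and observed
    summary-statistic vectors is below the threshold eps."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample(rng)
        if math.dist(summary(simulate(theta, rng)), s_obs) < eps:
            accepted.append(theta)
    return accepted


# Toy model (hypothetical): Normal(mu, sigma) data; mu ~ U(-5, 5), sigma ~ U(0.1, 5).
obs_rng = random.Random(7)
observed = [obs_rng.gauss(1.0, 2.0) for _ in range(40)]
accepted = abc_with_threshold(
    observed,
    simulate=lambda th, rng: [rng.gauss(th[0], th[1]) for _ in range(40)],
    prior_sample=lambda rng: (rng.uniform(-5.0, 5.0), rng.uniform(0.1, 5.0)),
    eps=0.5,
)
print(len(accepted))
```

A smaller eps gives a closer approximation to the true posterior but accepts fewer draws; a larger one does the reverse.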

Relevance:

20.00%

Abstract:

Non-linear methods for estimating variability in time series are currently in widespread use. Among such methods are approximate entropy (ApEn) and sample approximate entropy (SampEn). The applicability of ApEn and SampEn to data analysis is evident, and their use is increasing. However, consistency is a concern with these tools: the classification of the temporal organization of a data set might indicate that one series is less ordered than another when the opposite is true. As their proponents themselves have highlighted, ApEn and SampEn might give incorrect results because of this lack of consistency. In this study, we present a method that gains consistency by applying ApEn repeatedly over a wide range of combinations of window length and matching-error tolerance. The tool is called volumetric approximate entropy, vApEn. We analyze nine artificially generated prototypical time series with different degrees of temporal order (combinations of sine waves, logistic maps with different control parameter values, and random noise). While ApEn and SampEn clearly fail to consistently identify the temporal order of the sequences, vApEn does so correctly. To validate the tool, we performed shuffled and surrogate data analysis. Statistical analysis confirmed the consistency of the method. (C) 2008 Elsevier Ltd. All rights reserved.
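
For reference, a sketch of standard ApEn (Pincus' definition, Chebyshev distance, self-matches counted) together with a helper that evaluates it over a grid of window lengths m and tolerances r, which is the raw material the abstract says vApEn draws on. How vApEn combines the grid into a single "volumetric" value is defined in the paper and is not reproduced; the grid values and the use of an absolute tolerance r (rather than a multiple of the series' standard deviation) are assumptions.

```python
import math


def _phi(x, m, r):
    """Pincus' Phi^m(r): mean log-fraction of length-m templates within
    Chebyshev distance r of each template (self-matches included)."""
    n = len(x) - m + 1
    templates = [x[i:i + m] for i in range(n)]
    total = 0.0
    for t in templates:
        matches = sum(
            1 for u in templates
            if max(abs(a - b) for a, b in zip(t, u)) <= r
        )
        total += math.log(matches / n)
    return total / n


def apen(x, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) = Phi^m(r) - Phi^(m+1)(r)."""
    return _phi(x, m, r) - _phi(x, m + 1, r)


def apen_grid(x, ms=(1, 2, 3), rs=(0.1, 0.2, 0.3)):
    """ApEn evaluated over a grid of window lengths m and tolerances r."""
    return {(m, r): apen(x, m, r) for m in ms for r in rs}
```

A perfectly regular series gives ApEn close to zero, while white noise gives a large value; the consistency problem the abstract describes arises when two series swap order as m or r changes.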

Relevance:

20.00%

Abstract:

We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance. (C) 2008 Elsevier B.V. All rights reserved.
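
The Levenshtein (edit) distance mentioned as a comparison measure is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A standard two-row dynamic-programming sketch follows; it illustrates only that baseline measure, not the MCCB formulations, whose details are in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions turning a into b (two-row DP)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows reduces memory from O(len(a) * len(b)) to O(len(b)) without changing the result.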

Relevance:

20.00%

Abstract:

The exchange energy of an arbitrary collinear-spin many-body system in an external magnetic field is a functional of the spin-resolved charge and current densities, $E_x[n_\uparrow, n_\downarrow, j_\uparrow, j_\downarrow]$. Within the framework of density-functional theory (DFT), we show that the dependence of this functional on the four densities can be fully reconstructed from either of two extreme limits: a fully polarized system or a completely unpolarized system. Reconstruction from the limit of an unpolarized system yields a generalization of the Oliver-Perdew spin scaling relations from spin-DFT to current-DFT. Reconstruction from the limit of a fully polarized system is used to derive the high-field form of the local-spin-density approximation to current-DFT and to magnetic-field DFT.
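
For orientation, the spin-DFT relation being generalized is, in its standard Oliver-Perdew form, the exchange spin-scaling identity below, where $E_x[n]$ denotes the exchange energy functional of a spin-unpolarized system with density $n$. The current-DFT version involving $j_\uparrow$ and $j_\downarrow$ is derived in the paper and is not reproduced here.

$$E_x[n_\uparrow, n_\downarrow] = \tfrac{1}{2}\,E_x[2n_\uparrow] + \tfrac{1}{2}\,E_x[2n_\downarrow]$$

Because exchange acts only between electrons of like spin, the two spin channels decouple, so the polarized functional is fixed entirely by the unpolarized one; this is the sense in which the functional can be "reconstructed from the unpolarized limit".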

Relevance:

20.00%

Abstract:

In this work, we introduce a necessary sequential Approximate-Karush-Kuhn-Tucker (AKKT) condition for a point to be a solution of a continuous variational inequality, and we prove its relation to the Approximate Gradient Projection (AGP) condition of Garciga-Otero and Svaiter. We also prove that a slight variation of the AKKT condition is sufficient for convex problems, whether variational inequalities or optimization problems. Sequential necessary conditions are better suited to iterative methods than the usual pointwise conditions, which rely on constraint qualifications. The AKKT property holds at a solution regardless of whether a constraint qualification is satisfied, but when a weak one holds, the KKT conditions are guaranteed to be valid.
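
As a point of reference, the AKKT condition for a nonlinear program $\min f(x)$ subject to $g_i(x) \le 0$ and $h_j(x) = 0$ is commonly stated as follows: $x^*$ is an AKKT point if there exist sequences $x^k \to x^*$, $\lambda^k \ge 0$ and $\mu^k$ such that

$$\Bigl\|\nabla f(x^k) + \sum_i \lambda_i^k \nabla g_i(x^k) + \sum_j \mu_j^k \nabla h_j(x^k)\Bigr\| \to 0, \qquad \lambda_i^k = 0 \ \text{whenever } g_i(x^*) < 0 \text{ (for all sufficiently large } k\text{)}.$$

In the variational-inequality setting of the abstract, the operator value $F(x^k)$ plays the role of $\nabla f(x^k)$; the paper's precise statement may differ in detail. No constraint qualification is assumed, and the multiplier sequences may even be unbounded, which is what lets the condition hold at solutions where the pointwise KKT conditions can fail.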

Relevance:

20.00%

Abstract:

Depolymerization of cellulose in a homogeneous acidic medium is analyzed on the basis of an autocatalytic model of hydrolysis with positive feedback of acid production from the degraded biopolymer. The normalized number of scissions per cellulose chain, $S(t)/n^{\circ} = 1 - C(t)/C(0)$, follows a sigmoid behavior with reaction time $t$, and the cellulose concentration $C(t)$ decreases exponentially with a linear and cubic time dependence, $C(t) = C(0)\exp(-at - bt^{3})$, where $a$ and $b$ are model parameters that are easily determined from data analysis.
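
The two model expressions can be evaluated directly to see the sigmoid shape: the normalized scissions per chain equal 1 - C(t)/C(0), so C(0) cancels, and the cubic term makes the curve start slowly, accelerate, and saturate. The parameter values a and b below are hypothetical, chosen only to make the S-shape visible, not fitted values from the paper.

```python
import math


def cellulose_concentration(t, c0, a, b):
    """Model concentration C(t) = C(0) * exp(-a*t - b*t**3)."""
    return c0 * math.exp(-a * t - b * t ** 3)


def normalized_scissions(t, a, b):
    """Normalized scissions per chain, 1 - C(t)/C(0); C(0) cancels out."""
    return 1.0 - math.exp(-a * t - b * t ** 3)


# Hypothetical parameter values, chosen only to make the S-shape visible.
a, b = 0.01, 0.001
for t in (0, 2, 4, 6, 8, 10, 12):
    print(t, round(normalized_scissions(t, a, b), 3))
```

With a small linear rate a, the curve is dominated by the cubic term, which is what produces the slow start and autocatalytic acceleration.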