108 results for approximate membership extraction
Abstract:
Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes-no Bloom filter, which is a data structure consisting of two parts: the yes-filter, which is a standard Bloom filter, and the no-filter, which is another Bloom filter whose purpose is to represent those objects that were recognised incorrectly by the yes-filter (that is, to recognise the false positives of the yes-filter). By querying the no-filter after an object has been recognised by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with the standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses objects to include in the no-filter so that the no-filter recognises as many false positives as possible but no true positives, thus producing the most accurate yes-no Bloom filter among all yes-no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognised by the no-filter, with the constraint being that it should recognise no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large, which makes the optimal solution intractable. Considering the similarity of the ILP to the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed, making use of a reduced ILP for the value function approximation. Numerical results show that the ADP model performs best in comparison with a number of heuristics as well as the CPLEX built-in solver (B&B), and this is what can be recommended for use in yes-no Bloom filters.
In a wider context of the study of lossy compression algorithms, our research is an example showing how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.
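The two-part structure described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the ILP and ADP models are replaced here by a simple greedy pass that adds a false positive to the no-filter only if no true member becomes recognised by it. The class names, the salted SHA-256 hashing, and the parameters `yes_bits`, `no_bits` and `num_hashes` are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over string items (illustrative hashing scheme)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def positions(self, item):
        # Derive one bit position per hash from salted SHA-256 digests.
        result = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            result.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return result

    def add(self, item):
        for p in self.positions(item):
            self.bits[p] = True

    def __contains__(self, item):
        return all(self.bits[p] for p in self.positions(item))


class YesNoBloomFilter:
    """Yes-filter plus no-filter; the no-filter is filled greedily (assumption)."""

    def __init__(self, members, known_non_members, yes_bits, no_bits, num_hashes=3):
        members = list(members)
        self.yes = BloomFilter(yes_bits, num_hashes)
        self.no = BloomFilter(no_bits, num_hashes)
        for m in members:
            self.yes.add(m)
        for x in known_non_members:
            if x not in self.yes:
                continue  # the yes-filter already rejects x
            # Tentatively insert the false positive x into the no-filter.
            newly_set = [p for p in self.no.positions(x) if not self.no.bits[p]]
            for p in newly_set:
                self.no.bits[p] = True
            # Undo the insertion if the no-filter now recognises a true member.
            if any(m in self.no for m in members):
                for p in newly_set:
                    self.no.bits[p] = False

    def __contains__(self, item):
        # Accept only if the yes-filter recognises the item and the no-filter does not.
        return item in self.yes and item not in self.no
```

Because the no-filter is never allowed to recognise a true member, the combined structure keeps the no-false-negative property of a standard Bloom filter, and every item it accepts is also accepted by the yes-filter alone. The greedy order is arbitrary, so the selection is generally not the optimal one the abstract aims for.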
Abstract:
Approximate Bayesian computation (ABC) is a popular family of algorithms which perform approximate parameter inference when numerical evaluation of the likelihood function is not possible but data can be simulated from the model. They return a sample of parameter values which produce simulations close to the observed dataset. A standard approach is to reduce the simulated and observed datasets to vectors of summary statistics and accept when the difference between these is below a specified threshold. ABC can also be adapted to perform model choice. In this article, we present a new software package for R, abctools, which provides methods for tuning ABC algorithms. This includes recent dimension reduction algorithms to tune the choice of summary statistics, and coverage methods to tune the choice of threshold. We provide several illustrations of these routines on applications taken from the ABC literature.
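The standard accept/reject scheme described in this abstract can be sketched in a few lines of Python. This is a generic illustration and does not use or reproduce the abctools API (which is an R package); the function `abc_rejection`, the toy normal model, the uniform prior, and the choice of the sample mean as the only summary statistic are all assumptions made for the example.

```python
import math
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, summary, threshold, n_draws, rng):
    """Return the prior draws whose simulated summaries lie within `threshold`
    (Euclidean distance) of the observed summaries."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)              # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))  # simulate data and summarise it
        if math.dist(s_sim, s_obs) <= threshold:
            accepted.append(theta)
    return accepted


# Toy model (hypothetical): data are Normal(theta, 1) with unknown mean theta.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng, n=50):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    # A one-dimensional summary vector: the sample mean.
    return (statistics.fmean(data),)


rng = random.Random(0)
observed = simulate(2.0, rng)
posterior_sample = abc_rejection(
    observed, prior_sample, simulate, summary,
    threshold=0.2, n_draws=2000, rng=rng,
)
```

The two quantities the abstract says need tuning appear here as the `summary` function and the `threshold` argument: a smaller threshold gives a closer approximation to the posterior but fewer accepted draws.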
Abstract:
The tiger nut tuber of the Cyperus esculentus L. plant is an unusual storage system with similar amounts of starch and lipid. The extraction of its oil by both mechanical pressing and aqueous enzymatic extraction (AEE) was investigated, and the resulting products were examined. The effects of particle size and moisture content of the tuber on the yield of tiger nut oil with pressing were initially studied. Smaller particles were found to enhance oil yields, while a range of moisture content was observed to favour higher oil yields. When samples were first subjected to high pressures up to 700 MPa before pressing at 38 MPa, there was no increase in the oil yields. Incubating ground samples with a mixture of α-amylase, Alcalase, and Viscozyme (a mixture of cell wall degrading enzymes) as a pre-treatment increased the oil yield obtained by pressing, and 90% of the oil was recovered as a result. When aqueous enzymatic extraction was carried out on ground samples, the use of α-amylase, Alcalase, and Celluclast independently improved oil yields relative to extraction without enzymes by 34.5, 23.4 and 14.7%, respectively. A mixture of the three enzymes further augmented the oil yield, and different operational factors were individually studied for their effects on the process. These include time, total mixed enzyme concentration, linear agitation speed, and solid-liquid ratio. The largest oil yields were obtained with a solid-liquid ratio of 1:6, a mixed enzyme concentration of 1% (w/w) and a 6 h incubation time, although the longer time allowed for the formation of an emulsion. Using stationary samples during incubation surprisingly gave the highest oil yields, and this was observed to be as a result of gravity separation occurring during agitation. Furthermore, the use of high pressure processing up to 300 MPa as a pre-treatment enhanced oil yields, but additional pressure increments had a detrimental effect.
The quality of the oils recovered by both mechanical and aqueous enzymatic extraction, assessed by percentage free fatty acid (% FFA) and peroxide value (PV), reflected the good stability of the oils, with the highest % FFA being 1.8 and the highest PV 1.7. The fatty acid profiles of all oils also remained unchanged. The levels of tocopherols in the oils were enhanced by both enzyme-aided pressing (EAP) and high pressure processing before AEE. Analysis of the residual meals revealed DP 3 and DP 4 oligosaccharides in EAP samples, but their identity and quality would require further assessment.