980 resultados para Approximate dynamic programming
Resumo:
Optimal stochastic controller pushes the closed-loop behavior as close as possible to the desired one. The fully probabilistic design (FPD) uses probabilistic description of the desired closed loop and minimizes Kullback-Leibler divergence of the closed-loop description to the desired one. Practical exploitation of the fully probabilistic design control theory continues to be hindered by the computational complexities involved in numerically solving the associated stochastic dynamic programming problem. In particular very hard multivariate integration and an approximate interpolation of the involved multivariate functions. This paper proposes a new fully probabilistic contro algorithm that uses the adaptive critic methods to circumvent the need for explicitly evaluating the optimal value function, thereby dramatically reducing computational requirements. This is a main contribution of this short paper.
Resumo:
A novel approach of normal ECG recognition based on scale-space signal representation is proposed. The approach utilizes curvature scale-space signal representation used to match visual objects shapes previously and dynamic programming algorithm for matching CSS representations of ECG signals. Extraction and matching processes are fast and experimental results show that the approach is quite robust for preliminary normal ECG recognition.
Resumo:
Adaptive critic methods have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, nonlinear and nonstationary environments. In this study, a novel probabilistic dual heuristic programming (DHP) based adaptive critic controller is proposed. Distinct to current approaches, the proposed probabilistic (DHP) adaptive critic method takes uncertainties of forward model and inverse controller into consideration. Therefore, it is suitable for deterministic and stochastic control problems characterized by functional uncertainty. Theoretical development of the proposed method is validated by analytically evaluating the correct value of the cost function which satisfies the Bellman equation in a linear quadratic control problem. The target value of the critic network is then calculated and shown to be equal to the analytically derived correct value.
Resumo:
AMS subject classification: 68Q22, 90C90
Resumo:
This paper investigates the validity of a simplified equivalent reservoir representation of a multi-reservoir hydroelectric system for modelling its optimal operation for power maximization. This simplification, proposed by Arvanitidis and Rosing (IEEE Trans Power Appar Syst 89(2):319-325, 1970), imputes a potential energy equivalent reservoir with energy inflows and outflows. The hydroelectric system is also modelled for power maximization considering individual reservoir characteristics without simplifications. Both optimization models employed MINOS package for solution of the non-linear programming problems. A comparison between total optimized power generation over the planning horizon by the two methods shows that the equivalent reservoir is capable of producing satisfactory power estimates with less than 6% underestimation. The generation and total reservoir storage trajectories along the planning horizon obtained by equivalent reservoir method, however, presented significant discrepancies as compared to those found in the detailed modelling. This study is motivated by the fact that Brazilian generation system operations are based on the equivalent reservoir method as part of the power dispatch procedures. The potential energy equivalent reservoir is an alternative which eliminates problems with the dimensionality of state variables in a dynamic programming model.
Resumo:
Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence to structure alignments using a more conventional pair-wise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from a problem of nonconvergence, which can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.
Resumo:
Resources can be aggregated both within and between patches. In this article, we examine how aggregation at these different scales influences the behavior and performance of foragers. We developed an optimal foraging model of the foraging behavior of the parasitoid wasp Cotesia rubecula parasitizing the larvae of the cabbage butterfly Pieris rapae. The optimal behavior was found using stochastic dynamic programming. The most interesting and novel result is that the effect of resource aggregation within and between patches depends on the degree of aggregation both within and between patches as well as on the local host density in the occupied patch, but lifetime reproductive success depends only on aggregation within patches. Our findings have profound implications for the way in which we measure heterogeneity at different scales and model the response of organisms to spatial heterogeneity.
Resumo:
Dissertação apresentada para obtenção do grau de Doutor em Matemática na especialidade de Equações Diferenciais, pela Universidade Nova de Lisboa,Faculdade de Ciências e Tecnologia
Resumo:
We present an envelope theorem for establishing first-order conditions in decision problems involving continuous and discrete choices. Our theorem accommodates general dynamic programming problems, even with unbounded marginal utilities. And, unlike classical envelope theorems that focus only on differentiating value functions, we accommodate other endogenous functions such as default probabilities and interest rates. Our main technical ingredient is how we establish the differentiability of a function at a point: we sandwich the function between two differentiable functions from above and below. Our theory is widely applicable. In unsecured credit models, neither interest rates nor continuation values are globally differentiable. Nevertheless, we establish an Euler equation involving marginal prices and values. In adjustment cost models, we show that first-order conditions apply universally, even if optimal policies are not (S,s). Finally, we incorporate indivisible choices into a classic dynamic insurance analysis.
Resumo:
Avui en dia la biologia aporta grans quantitats de dades que només la informàtica pot tractar. Les aplicacions bioinformàtiques són la més important eina d’anàlisi i comparació que tenim per entendre la vida i aconseguir desxifrar aquestes dades. Aquest projecte centra el seu esforç en l’estudi de les aplicacions dedicades a l’alineament de seqüències genètiques, i més concretament a dos algoritmes, basats en programació dinàmica i òptims: el Needleman&Wunsch i el Smith&Waterman. Amb l’objectiu de millorar el rendiment d’aquests algoritmes per a alineaments de seqüències grans, proposem diferents versions d’implementació. Busquem millorar rendiments en temps i espai. Per a aconseguir millorar els resultats aprofitem el paral·lelisme. Els resultats dels anàlisis de les versions els comparem per obtenir les dades necessàries per valorar cost, guany i rendiment.
Resumo:
Classical treatments of problems of sequential mate choice assume that the distribution of the quality of potential mates is known a priori. This assumption, made for analytical purposes, may seem unrealistic, opposing empirical data as well as evolutionary arguments. Using stochastic dynamic programming, we develop a model that includes the possibility for searching individuals to learn about the distribution and in particular to update mean and variance during the search. In a constant environment, a priori knowledge of the parameter values brings strong benefits in both time needed to make a decision and average value of mate obtained. Knowing the variance yields more benefits than knowing the mean, and benefits increase with variance. However, the costs of learning become progressively lower as more time is available for choice. When parameter values differ between demes and/or searching periods, a strategy relying on fixed a priori information might lead to erroneous decisions, which confers advantages on the learning strategy. However, time for choice plays an important role as well: if a decision must be made rapidly, a fixed strategy may do better even when the fixed image does not coincide with the local parameter values. These results help in delineating the ecological-behavior context in which learning strategies may spread.
Resumo:
We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detect cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the Music Information Retrieval (MIR) community along in the past, as it provides a direct and objective way to evaluate music similarity algorithms.This article first presents a series of experiments carried outwith two state-of-the-art methods for cover song identification.We have studied several components of these (such as chroma resolution and similarity, transposition, beat tracking or Dynamic Time Warping constraints), in order to discover which characteristics would be desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best-performing ones are finally applied to the newly proposed method. Multipleevaluations of this one confirm a large increase in identificationaccuracy when comparing it with alternative state-of-the-artapproaches.
Resumo:
Revenue management practices often include overbooking capacity to account for customerswho make reservations but do not show up. In this paper, we consider the network revenuemanagement problem with no-shows and overbooking, where the show-up probabilities are specificto each product. No-show rates differ significantly by product (for instance, each itinerary andfare combination for an airline) as sale restrictions and the demand characteristics vary byproduct. However, models that consider no-show rates by each individual product are difficultto handle as the state-space in dynamic programming formulations (or the variable space inapproximations) increases significantly. In this paper, we propose a randomized linear program tojointly make the capacity control and overbooking decisions with product-specific no-shows. Weestablish that our formulation gives an upper bound on the optimal expected total profit andour upper bound is tighter than a deterministic linear programming upper bound that appearsin the existing literature. Furthermore, we show that our upper bound is asymptotically tightin a regime where the leg capacities and the expected demand is scaled linearly with the samerate. We also describe how the randomized linear program can be used to obtain a bid price controlpolicy. Computational experiments indicate that our approach is quite fast, able to scale to industrialproblems and can provide significant improvements over standard benchmarks.
Resumo:
We obtain a recursive formulation for a general class of contractingproblems involving incentive constraints. Under these constraints,the corresponding maximization (sup) problems fails to have arecursive solution. Our approach consists of studying the Lagrangian.We show that, under standard assumptions, the solution to theLagrangian is characterized by a recursive saddle point (infsup)functional equation, analogous to Bellman's equation. Our approachapplies to a large class of contractual problems. As examples, westudy the optimal policy in a model with intertemporal participationconstraints (which arise in models of default) and intertemporalcompetitive constraints (which arise in Ramsey equilibria).
Resumo:
This paper presents and estimates a dynamic choice model in the attribute space considering rational consumers. In light of the evidence of several state-dependence patterns, the standard attribute-based model is extended by considering a general utility function where pure inertia and pure variety-seeking behaviors can be explained in the model as particular linear cases. The dynamics of the model are fully characterized by standard dynamic programming techniques. The model presents a stationary consumption pattern that can be inertial, where the consumer only buys one product, or a variety-seeking one, where the consumer shifts among varied products.We run some simulations to analyze the consumption paths out of the steady state. Underthe hybrid utility assumption, the consumer behaves inertially among the unfamiliar brandsfor several periods, eventually switching to a variety-seeking behavior when the stationary levels are approached. An empirical analysis is run using scanner databases for three different product categories: fabric softener, saltine cracker, and catsup. Non-linear specifications provide the best fit of the data, as hybrid functional forms are found in all the product categories for most attributes and segments. These results reveal the statistical superiority of the non-linear structure and confirm the gradual trend to seek variety as the level of familiarity with the purchased items increases.