976 results for Value function


Relevance:

60.00% 60.00%

Publisher:

Abstract:

The network revenue management (RM) problem arises in airline, hotel, media, and other industries where the sale products use multiple resources. It can be formulated as a stochastic dynamic program, but the dynamic program is computationally intractable because of an exponentially large state space, and a number of heuristics have been proposed to approximate it. Notable amongst these, both for their revenue performance and for their theoretically sound basis, are approximate dynamic programming methods that approximate the value function by basis functions (both affine functions and piecewise-linear functions have been proposed for network RM) and decomposition methods that relax the constraints of the dynamic program to solve simpler dynamic programs (such as the Lagrangian relaxation methods). In this paper we show that these two seemingly distinct approaches coincide for the network RM dynamic program, i.e., the piecewise-linear approximation method and the Lagrangian relaxation method are one and the same.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The choice network revenue management model incorporates customer purchase behavior as a function of the offered products, and is the appropriate model for airline and hotel network revenue management, dynamic sales of bundles, and dynamic assortment optimization. The optimization problem is a stochastic dynamic program and is intractable. A certainty-equivalence relaxation of the dynamic program, called the choice deterministic linear program (CDLP), is usually used to generate dynamic controls. Recently, a compact linear programming formulation of this linear program was given for the multi-segment multinomial-logit (MNL) model of customer choice with non-overlapping consideration sets. Our objective is to obtain a tighter bound than this formulation while retaining the appealing properties of a compact linear programming representation. To this end, it is natural to consider the affine relaxation of the dynamic program. We first show that the affine relaxation is NP-complete even for a single-segment MNL model. Nevertheless, by analyzing the affine relaxation we derive a new compact linear program that approximates the dynamic programming value function better than CDLP, provably between the CDLP value and the affine relaxation, and often coming close to the latter in our numerical experiments. When the segment consideration sets overlap, we show that some strong equalities called product cuts developed for the CDLP remain valid for our new formulation. Finally, we perform extensive numerical comparisons on the various bounds to evaluate their performance.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

We characterize the value function of maximizing the total discounted utility of dividend payments for a compound Poisson insurance risk model when strictly positive transaction costs are included, leading to an impulse control problem. We illustrate that well known simple strategies can be optimal in the case of exponential claim amounts. Finally we develop a numerical procedure to deal with general claim amount distributions.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

We describe the version of the GPT planner to be used in the planning competition. This version, called mGPT, solves MDPs specified in the PPDDL language by extracting and using different classes of lower bounds, along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations of the MDP where alternative probabilistic effects of an action are mapped into different, independent, deterministic actions. The heuristic-search algorithms, on the other hand, use these lower bounds for focusing the updates and delivering a consistent value function over all states reachable from the initial state with the greedy policy.
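
The deterministic relaxation described in this abstract can be illustrated with a minimal sketch. It is not mGPT's implementation: the dictionary encoding of the MDP, the function names `determinize` and `lower_bound`, the toy `example_mdp`, and the use of Dijkstra's algorithm are all assumptions made for illustration. Each probabilistic outcome of an action becomes its own deterministic action, and the cheapest path in the resulting graph is a lower bound on the expected cost to reach the goal.

```python
import heapq
from math import inf

# Hypothetical encoding: state -> list of (cost, [(probability, next_state), ...]).
example_mdp = {
    "a": [(1.0, [(0.5, "goal"), (0.5, "a")])],  # succeeds half the time
}


def determinize(mdp):
    """Split every probabilistic outcome into an independent deterministic action."""
    det = {}
    for state, actions in mdp.items():
        det[state] = [
            (cost, nxt)
            for cost, outcomes in actions
            for prob, nxt in outcomes
            if prob > 0
        ]
    return det


def lower_bound(det, start, goal):
    """Cheapest deterministic path cost (Dijkstra), a lower bound on expected cost."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, state = heapq.heappop(heap)
        if state == goal:
            return d
        if d > dist.get(state, inf):
            continue  # stale heap entry
        for cost, nxt in det.get(state, []):
            nd = d + cost
            if nd < dist.get(nxt, inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return inf  # goal unreachable even in the relaxation


bound = lower_bound(determinize(example_mdp), "a", "goal")
```

In the toy problem the expected cost from `a` is 2 (a geometric number of attempts at cost 1 each), while the relaxation returns 1, so the bound is valid but optimistic.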

Relevance:

60.00% 60.00%

Publisher:

Abstract:

We analyze the behavior of a nonrenewable resource cartel that anticipates being forced, at some date in the future, to break-up into an oligopolistic market in which its members will then have to compete as rivals. Under reasonable assumptions about the value function of the individual firms in the oligopolistic equilibrium that follows the break-up, we show that the cartel will then produce more over the same interval of time than it would if there were no threat of dissolution, and that its rate of extraction is a decreasing function of the cartel's life; that there are circumstances under which the cartel will attach a negative marginal value to the resource stocks, in which case the rate of depletion will be increasing over time during the cartel phase; that, for a given date of dissolution, the equilibrium stocks allocated to the post-cartel phase will increase as a function of the total initial stocks, whereas those allocated to the cartel phase will increase at first, but begin decreasing beyond some level of the total initial stocks.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The philosophical implications of the 1979 Prospect Theory, in particular those concerning the introduction of a value function over outcomes and a weighting coefficient on probabilities, have to date never been explored. The aim of this work is to build a philosophical theory of the will from the results of Prospect Theory. To understand how this theory came to be developed, one must study Expected Utility Theory, of which it is the major critical culmination, that is, the axiomatizations of decision by Ramsey (1926), von Neumann and Morgenstern (1947), and finally Savage (1954), which constitute the foundations of classical decision theory. It was, among other things, the critique by economics and cognitive psychology of the independence principle and of the ordering and transitivity axioms that brought out the subjective representational elements from which Prospect Theory could be developed. These critiques were carried out by Allais (1953), Edwards (1954), Ellsberg (1961), and finally Slovic and Lichtenstein (1968); studying these articles makes it possible to understand how the transition from Expected Utility Theory to Prospect Theory took place. Following these analyses and that of Prospect Theory, the notion of a Decisional Reference System is introduced, which is the natural generalization of the concepts of value function and weighting coefficient derived from Prospect Theory. This system, whose operation is sometimes heuristic, serves to model decision-making in the element of representation; it is organized around three phases: aiming, editing, and evaluation.
From this structure, a new typology of decisions is proposed, along with a novel explanation of the phenomena of akrasia and procrastination based on the concepts of risk aversion and overvaluation of the present, both derived from Prospect Theory.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach when using RL has been to apply value-function-based algorithms, the system detailed here is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy-based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target-reaching task.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In Colombia, almost two decades after the creation of the private accounts regime, a reform was implemented that moves from a single-fund system to a multi-fund one. Reforms of this kind have been implemented in various European and Latin American countries. In light of classical theories, this reform improves individuals' welfare; however, the literature on the new behavioral theories suggests that individuals do not always make decisions consistent with the assumptions of classical theories. This work studies the reform in Colombia under some of the theories of behavioral finance. It finds that even when the member stays in the default option, or acts with loss aversion, they will obtain higher balances in their private accounts than they would under a single-fund system.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The strategic equilibrium of an N-person cooperative game with transferable utility is a system composed of a cover collection of subsets of N and a set of extended imputations attainable through such an equilibrium cover. The system describes a state of coalitional bargaining stability where every player has a bargaining alternative against any other player to support his corresponding equilibrium claim. Any coalition in the stable system may form and divide the characteristic value function of the coalition as prescribed by the equilibrium payoffs. If syndicates are allowed to form, a formed coalition may become a syndicate using the equilibrium payoffs as disagreement values in bargaining for a part of the complementary coalition's incremental value to the grand coalition when formed. The emergent well-known constant-sum derived game in partition function form is described in terms of parameters that result from incumbent binding agreements. The strategic equilibrium corresponding to the derived game gives an equal value claim to all players. This surprising result is alternatively explained in terms of strategic-equilibrium-based possible outcomes by a sequence of bargaining stages: when the binding agreements are in the right sequential order, von Neumann and Morgenstern (vN-M) non-discriminatory solutions emerge. In these solutions a branch preferred by a sufficient number of players is identified: the weaker players syndicate against the stronger player. This condition is referred to as the stronger player paradox. A strategic alternative available to the stronger player to overcome the anticipated undesirable results is to voluntarily lower his bargaining equilibrium claim. In doing so, the original strategic equilibrium is modified and vN-M discriminatory solutions may occur, but a different stronger player may also emerge who will eventually have to lower his equilibrium claim.
A sequence of such measures converges to the equal-opportunity-for-all vN-M solution anticipated by the strategic equilibrium of the partition function derived game.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In this paper, we employ techniques from artificial intelligence such as reinforcement learning and agent-based modeling as building blocks of a computational model for an economy based on conventions. First we model the interaction among firms in the private sector. These firms behave in an information environment based on conventions, meaning that a firm is likely to behave as its neighbors do if it observes that their actions lead to a good payoff. On the other hand, we propose the use of reinforcement learning as a computational model for the role of the government in the economy, as the agent that determines the fiscal policy, and whose objective is to maximize the growth of the economy. We present the implementation of a simulator of the proposed model based on SWARM that employs the SARSA(λ) algorithm combined with a multilayer perceptron as the function approximator for the action value function.
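
For readers unfamiliar with SARSA(λ), the following is a minimal sketch of the update rule only. It is not the simulator described in the abstract: it assumes a lookup table in place of the multilayer perceptron, accumulating eligibility traces, and a hypothetical five-state corridor environment in place of the economic model. The function name `sarsa_lambda` and all parameter values are illustrative.

```python
import random


def sarsa_lambda(n_states=5, episodes=200, alpha=0.1, gamma=0.95,
                 lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) on a toy corridor; reward 1 on reaching the last state."""
    rng = random.Random(seed)
    actions = (0, 1)  # 0 = move left, 1 = move right
    q = [[0.0, 0.0] for _ in range(n_states)]  # action value table Q(s, a)

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == n_states - 1
        return s2, (1.0 if done else 0.0), done

    def policy(s):
        # Epsilon-greedy with random tie-breaking.
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(q[s])
        return rng.choice([a for a in actions if q[s][a] == best])

    for _ in range(episodes):
        e = [[0.0, 0.0] for _ in range(n_states)]  # eligibility traces
        s, a, done = 0, policy(0), False
        while not done:
            s2, r, done = step(s, a)
            a2 = policy(s2)
            target = r if done else r + gamma * q[s2][a2]
            delta = target - q[s][a]  # on-policy TD error
            e[s][a] += 1.0  # accumulating trace
            for i in range(n_states):
                for j in actions:
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam  # decay every trace
            s, a = s2, a2
    return q


q_table = sarsa_lambda()
```

With a neural approximator, the table update would be replaced by a gradient step on the network weights, with one trace per weight rather than per state-action pair.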

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Recently, interest in developing applications with autonomous underwater vehicles (AUVs) has grown considerably. AUVs are attractive because of their size and the fact that they do not need a human operator to pilot them. Even so, it is impossible to compare, in terms of efficiency and flexibility, the skill of a human pilot with the limited operational capabilities offered by current AUVs. Using AUVs to cover large areas involves solving complex problems, especially if the robot is expected to react in real time to sudden changes in working conditions. For these reasons, the development of autonomous control systems aimed at improving these capabilities has become a priority. This thesis deals with the problem of decision making using AUVs. The work presented focuses on the study, design and application of behaviors for AUVs using reinforcement learning (RL) techniques. The main contribution of this thesis is the application of several RL techniques to improve the autonomy of underwater robots, with the final goal of demonstrating the feasibility of these algorithms for learning autonomous underwater tasks in real time. In RL, the robot tries to maximize a scalar reward obtained as a consequence of its interaction with the environment. The goal is to find an optimal policy that maps every possible state to the actions to execute in that state so as to maximize the total sum of rewards. Thus, this thesis investigates mainly two types of RL-based algorithms: value function (VF) methods and gradient-based methods (PG). The final experimental results show the underwater robot Ictineu in a real autonomous task of tracking underwater cables. To carry it out, an algorithm called the Actor-Critic (AC) method was designed, resulting from the fusion of VF methods with PG techniques.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes-no Bloom filter, which is a data structure consisting of two parts: the yes-filter, which is a standard Bloom filter, and the no-filter, which is another Bloom filter whose purpose is to represent those objects that were recognised incorrectly by the yes-filter (that is, to recognise the false positives of the yes-filter). By querying the no-filter after an object has been recognised by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with the standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses objects to include in the no-filter so that the no-filter recognises as many false positives as possible but no true positives, thus producing the most accurate yes-no Bloom filter among all yes-no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognised by the no-filter, with the constraint being that it should recognise no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large, which makes finding the optimal solution intractable. Considering the similarity of the ILP with the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed making use of a reduced ILP for the value function approximation. Numerical results show that the ADP model works best compared with a number of heuristics as well as the CPLEX built-in solver (B&B), and this is what can be recommended for use in yes-no Bloom filters.
In a wider context of the study of lossy compression algorithms, our research is an example showing how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.
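
The two-part structure described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the class names, the salted SHA-256 hashing scheme, and the parameters `m_yes`, `m_no` and `k` are hypothetical, and a simple greedy pass stands in for the ILP and ADP selection of false positives.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Standard Bloom filter over strings with m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits: List[bool] = [False] * m

    def _positions(self, item: str) -> Iterator[int]:
        # Salted SHA-256 digests give k positions (an illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


class YesNoBloomFilter:
    """Accept an item only if the yes-filter recognises it and the no-filter does not."""

    def __init__(self, m_yes: int, m_no: int, k: int) -> None:
        self.yes = BloomFilter(m_yes, k)
        self.no = BloomFilter(m_no, k)

    def build(self, members: List[str], candidates: Iterable[str]) -> None:
        for x in members:
            self.yes.add(x)
        # Greedy stand-in for the optimised selection: keep a false positive in
        # the no-filter only if no true member becomes recognised by it.
        for c in candidates:
            if c not in self.yes:
                continue  # already rejected by the yes-filter
            saved = list(self.no.bits)
            self.no.add(c)
            if any(x in self.no for x in members):
                self.no.bits = saved  # roll back: would reject a true positive

    def __contains__(self, item: str) -> bool:
        return item in self.yes and item not in self.no
```

The rollback step enforces the constraint stated in the abstract, that the no-filter recognises no true positives. The greedy order is arbitrary, so it will generally reject fewer false positives than an optimised selection would.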

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Bellman's methods for dynamic optimization constitute the present mainstream in economics. However, some results associated with optimal control can be particularly useful in certain problems. The purpose of this note is to present such an example. The value function derived in Lucas' (2000) shopping-time economy in Inflation and Welfare need not be concave, leading this author to develop numerical analyses to determine if consumer utility is in fact maximized along the balanced path constructed from the first-order conditions. We use Arrow's generalization of Mangasarian's results in optimal control theory and develop sufficient conditions for the problem. The analytical conclusions and the previous numerical results are compatible.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This work adds to Lucas (2000) by providing analytical solutions to two problems that are solved only numerically by the author. The first part uses a theorem in control theory (Arrow's sufficiency theorem) to provide sufficiency conditions to characterize the optimum in a shopping-time problem where the value function need not be concave. In the original paper the optimality of the first-order condition is characterized only by means of a numerical analysis. The second part of the paper provides a closed-form solution to the general-equilibrium expression of the welfare costs of inflation when the money demand is double logarithmic. This closed-form solution allows for the precise calculation of the difference between the general-equilibrium and Bailey's partial-equilibrium estimates of the welfare losses due to inflation. Again, in Lucas's original paper, the nonlinear differential equation underlying the general-equilibrium case is solved only numerically, and the subsequent assertion that the general-equilibrium welfare figures cannot be distinguished from those derived using Bailey's formula relies only on numerical simulations as well.