967 results for Asymptotically optimal policy
Abstract:
Pigeons and other animals soon learn to wait (pause) after food delivery on periodic-food schedules before resuming the food-rewarded response. Under most conditions the steady-state duration of the average waiting time, t, is a linear function of the typical interfood interval. We describe three experiments designed to explore the limits of this process. In all experiments, t was associated with one key color and the subsequent food delay, T, with another. In the first experiment, we compared the relation between t (waiting time) and T (food delay) under two conditions: when T was held constant, and when T was an inverse function of t. The pigeons could maximize the rate of food delivery under the first condition by setting t to a consistently short value; optimal behavior under the second condition required a linear relation with unit slope between t and T. Despite this difference in optimal policy, the pigeons in both cases showed the same linear relation, with slope less than one, between t and T. This result was confirmed in a second parametric experiment that added a third condition, in which T + t was held constant. Linear waiting appears to be an obligatory rule for pigeons. In a third experiment we arranged for a multiplicative relation between t and T (positive feedback), and produced either very short or very long waiting times as predicted by a quasi-dynamic model in which waiting time is strongly determined by the just-preceding food delay.
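The quasi-dynamic model described above can be sketched as a one-step update in which the next waiting time is a linear function of the just-preceding food delay. A minimal sketch, with illustrative parameter names `a`, `b`, `k` (not taken from the paper):

```python
def next_wait(a, b, prev_delay):
    """Linear-waiting rule: waiting time t is a linear function
    (slope b < 1 in the data) of the preceding food delay T."""
    return a + b * prev_delay

def simulate_feedback(a, b, k, t0, steps):
    """Positive-feedback schedule of the third experiment: the next
    food delay is a multiple k of the current wait, T_n = k * t_n,
    so t_{n+1} = a + b * k * t_n."""
    t = t0
    for _ in range(steps):
        t = next_wait(a, b, k * t)
    return t
```

With b·k < 1 the iteration converges to the fixed point a / (1 − b·k), giving very short waits; with b·k > 1 it grows without bound, giving very long waits — matching the bimodal outcome predicted for the multiplicative condition.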
Abstract:
In some supply chains, materials are ordered periodically according to local information. This paper investigates how to improve the performance of such a supply chain. Specifically, we consider a serial inventory system in which each stage implements a local reorder interval policy; i.e., each stage orders up to a local basestock level according to a fixed-interval schedule. A fixed cost is incurred for placing an order. Two improvement strategies are considered: (1) expanding the information flow by acquiring real-time demand information and (2) accelerating the material flow via flexible deliveries. The first strategy leads to a reorder interval policy with full information; the second strategy leads to a reorder point policy with local information. Both policies have been studied in the literature. Thus, to assess the benefit of these strategies, we analyze the local reorder interval policy. We develop a bottom-up recursion to evaluate the system cost and provide a method to obtain the optimal policy. A numerical study shows the following: Increasing the flexibility of deliveries lowers costs more than does expanding information flow; the fixed order costs and the system lead times are key drivers that determine the effectiveness of these improvement strategies. In addition, we find that inferring reorder intervals from the optimal batch sizes of the reorder point policy and the demand rate may lead to significant cost inefficiency. © 2010 INFORMS.
Abstract:
We assess different policies for reducing carbon dioxide emissions and promoting innovation and diffusion of renewable energy. We evaluate the relative performance of policies according to incentives provided for emissions reduction, efficiency, and other outcomes. We also assess how the nature of technological progress through learning and research and development (R&D), and the degree of knowledge spillovers, affects the desirability of different policies. Due to knowledge spillovers, optimal policy involves a portfolio of different instruments targeted at emissions, learning, and R&D. Although the relative cost of individual policies in achieving reductions depends on parameter values and the emissions target, in a numerical application to the U.S. electricity sector, the ranking is roughly as follows: (1) emissions price, (2) emissions performance standard, (3) fossil power tax, (4) renewables share requirement, (5) renewables subsidy, and (6) R&D subsidy. Nonetheless, an optimal portfolio of policies achieves emissions reductions at a significantly lower cost than any single policy. © 2007 Elsevier Inc. All rights reserved.
Abstract:
In this paper, we investigate the remanufacturing problem of pricing single-class used products (cores) in the face of random price-dependent returns and random demand. Specifically, we propose a dynamic pricing policy for the cores and then model the problem as a continuous-time Markov decision process. Our models are designed to address three objectives: finite horizon total cost minimization, infinite horizon discounted cost, and average cost minimization. Besides proving optimal policy uniqueness and establishing monotonicity results for the infinite horizon problem, we also characterize the structures of the optimal policies, which can greatly simplify the computational procedure. Finally, we use computational examples to assess the impacts of specific parameters on optimal price and reveal the benefits of a dynamic pricing policy. © 2013 Elsevier B.V. All rights reserved.
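The continuous-time formulation can be made concrete by uniformizing the chain and running discounted value iteration over a price grid. The sketch below is a much-simplified single-unit-transition version under assumed dynamics (return rate decreasing in price, fixed demand rate, linear holding cost, revenue per satisfied demand); the function name, rates, and costs are illustrative, not from the paper:

```python
import numpy as np

def price_policy(prices, ret_rate, dem_rate, hold, rev, N, gamma=0.95, iters=400):
    """Discounted value iteration for a uniformized core-pricing MDP.
    State s = cores on hand (0..N).  Posting price p attracts returns
    at rate ret_rate(p) (decreasing in p); remanufacturing demand
    consumes cores at rate dem_rate.  One-step cost: holding cost,
    plus acquisition spend, minus revenue from a satisfied demand."""
    Lam = max(ret_rate(p) for p in prices) + dem_rate  # uniformization rate
    V = np.zeros(N + 1)
    policy = np.zeros(N + 1)
    for _ in range(iters):
        newV = np.empty_like(V)
        for s in range(N + 1):
            best = np.inf
            for p in prices:
                pr_up = ret_rate(p) / Lam if s < N else 0.0   # full: returns rejected
                pr_down = dem_rate / Lam if s > 0 else 0.0    # empty: demand lost
                pr_stay = 1.0 - pr_up - pr_down
                step_cost = hold * s + p * pr_up - rev * pr_down
                cont = (pr_up * V[min(s + 1, N)]
                        + pr_down * V[max(s - 1, 0)]
                        + pr_stay * V[s])
                q = step_cost + gamma * cont
                if q < best:
                    best, policy[s] = q, p
            newV[s] = best
        V = newV
    return V, policy
```

Because the discounted Bellman operator is a contraction, successive iterates converge geometrically; the monotone price structure (higher prices at higher core levels) is the kind of property the paper establishes analytically to simplify computation.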
Abstract:
In remanufacturing, the supply of used products and the demand for remanufactured products are usually mismatched because of the great uncertainties on both sides. In this paper, we propose a dynamic pricing policy to balance this uncertain supply and demand. Specifically, we study a remanufacturer’s problem of pricing a single class of cores with random price-dependent returns and random demand for the remanufactured products with backlogs. We model this pricing task as a continuous-time Markov decision process, which addresses both the finite and infinite horizon problems, and provide managerial insights by analyzing the structural properties of the optimal policy. We then use several computational examples to illustrate the impacts of particular system parameters on pricing policy.
Abstract:
We consider the problem of self-healing in networks that are reconfigurable in the sense that they can change their topology during an attack. Our goal is to maintain connectivity in these networks, even in the presence of repeated adversarial node deletion, by carefully adding edges after each attack. We present a new algorithm, DASH, that provably ensures that: 1) the network stays connected even if an adversary deletes up to all nodes in the network; and 2) no node ever increases its degree by more than 2 log n, where n is the number of nodes initially in the network. DASH is fully distributed; adds new edges only among neighbors of deleted nodes; and has average latency and bandwidth costs that are at most logarithmic in n. DASH has these properties irrespective of the topology of the initial network, and is thus orthogonal and complementary to traditional topology-based approaches to defending against attack. We also prove lower bounds showing that DASH is asymptotically optimal in terms of minimizing maximum degree increase over multiple attacks. Finally, we present empirical results on power-law graphs that show that DASH performs well in practice, and that it significantly outperforms naive algorithms in reducing maximum degree increase.
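The core pattern — heal by adding edges only among neighbors of the deleted node — can be illustrated with a toy centralized version. This is a deliberately simplified sketch, not DASH itself (DASH is distributed and maintains extra bookkeeping to achieve its 2 log n degree bound); here the deleted node's neighbors are simply reconnected in a line:

```python
from collections import deque

def heal(adj, victim):
    """Delete `victim` from the graph (dict: node -> set of neighbors)
    and chain its former neighbors together, so any path that ran
    through `victim` can be rerouted along the chain."""
    nbrs = sorted(adj.pop(victim))
    for v in nbrs:
        adj[v].discard(victim)
    for u, v in zip(nbrs, nbrs[1:]):
        adj[u].add(v)
        adj[v].add(u)

def connected(adj):
    """BFS check that the remaining graph is a single component."""
    start = next(iter(adj))
    seen, q = {start}, deque([start])
    while q:
        u = q.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            q.append(v)
    return len(seen) == len(adj)
```

Healing a star graph at its center, for instance, leaves a path on the leaves: connectivity survives and no surviving node's degree exceeds 2, whereas the naive alternative of forming a clique on the neighbors would inflate degrees badly.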
Abstract:
This paper concerns randomized leader election in synchronous distributed networks. A distributed leader election algorithm is presented for complete n-node networks that runs in O(1) rounds and (with high probability) uses only O(√n log<sup>3/2</sup>n) messages to elect a unique leader (with high probability). When considering the "explicit" variant of leader election, where eventually every node knows the identity of the leader, our algorithm yields the asymptotically optimal bounds of O(1) rounds and O(n) messages. This algorithm is then extended to one solving leader election on any connected non-bipartite n-node graph G in O(τ(G)) time and O(τ(G)√n log<sup>3/2</sup>n) messages, where τ(G) is the mixing time of a random walk on G. The above result implies highly efficient (sublinear running time and messages) leader election algorithms for networks with small mixing times, such as expanders and hypercubes. In contrast, previous leader election algorithms had at least linear message complexity even in complete graphs. Moreover, super-linear message lower bounds are known for time-efficient deterministic leader election algorithms. Finally, we present an almost matching lower bound for randomized leader election, showing that Ω(√n) messages are needed for any leader election algorithm that succeeds with probability at least 1/e + ε, for any small constant ε > 0. We view our results as a step towards understanding the randomized complexity of leader election in distributed networks.
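The message savings come from having only a few self-nominated candidates speak, instead of all n nodes. A toy sequential simulation of that idea (the nomination probability c·ln n / n and the random-rank tie-breaking are illustrative simplifications; the actual algorithm is round-based and uses small random committees):

```python
import math
import random

def elect(n, c=3.0, rng=random):
    """Each node self-nominates with probability c*ln(n)/n, so only
    O(log n) candidates (in expectation) ever send messages; the
    candidate with the highest random rank wins.  Retries in the
    unlikely event that nobody nominates."""
    p = c * math.log(n) / n
    while True:
        cands = [v for v in range(n) if rng.random() < p]
        if cands:
            break
    leader = max(cands, key=lambda v: (rng.random(), v))
    return leader, cands
```

With about ln n candidates each informing all n nodes, this toy explicit variant costs O(n log n) messages — already far below the Θ(n²) of everyone-broadcasts, while the paper's algorithm achieves the optimal O(n).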
Abstract:
We analyze a finite horizon, single product, periodic review model in which pricing and production/inventory decisions are made simultaneously. Demands in different periods are random variables that are independent of each other and their distributions depend on the product price. Pricing and ordering decisions are made at the beginning of each period and all shortages are backlogged. Ordering cost includes both a fixed cost and a variable cost proportional to the amount ordered. The objective is to find an inventory policy and a pricing strategy maximizing expected profit over the finite horizon. We show that when the demand model is additive, the profit-to-go functions are k-concave and hence an (s,S,p) policy is optimal. In such a policy, the period inventory is managed based on the classical (s,S) policy and price is determined based on the inventory position at the beginning of each period. For more general demand functions, i.e., multiplicative plus additive functions, we demonstrate that the profit-to-go function is not necessarily k-concave and an (s,S,p) policy is not necessarily optimal. We introduce a new concept, the symmetric k-concave functions and apply it to provide a characterization of the optimal policy.
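The (s,S,p) policy itself is simple to state in code: replenishment follows the classical (s,S) rule, and the price is then read off the post-ordering inventory position. A minimal sketch — the decreasing price schedule `price_of` is a hypothetical placeholder; computing the truly optimal s, S, and prices requires the dynamic program analyzed in the paper:

```python
def s_S_p_decision(x, s, S, price_of):
    """One period of an (s,S,p) policy: order up to S when the
    inventory position x is at or below s, otherwise order nothing;
    then price as a function of the post-ordering position."""
    order = S - x if x <= s else 0
    return order, price_of(x + order)

# Hypothetical decreasing price schedule: charge less when well stocked.
price_of = lambda y: 5.0 - 0.1 * y
```

For example, with s = 2 and S = 10, a position of 1 triggers an order of 9 units and the low full-stock price, while a position of 5 orders nothing and prices higher.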
Abstract:
Traditional inventory models focus on risk-neutral decision makers, i.e., characterizing replenishment strategies that maximize expected total profit, or equivalently, minimize expected total cost over a planning horizon. In this paper, we propose a framework for incorporating risk aversion in multi-period inventory models as well as multi-period models that coordinate inventory and pricing strategies. In each case, we characterize the optimal policy for various measures of risk that have been commonly used in the finance literature. In particular, we show that the structure of the optimal policy for a decision maker with exponential utility functions is almost identical to the structure of the optimal risk-neutral inventory (and pricing) policies. Computational results demonstrate the importance of this approach not only to risk-averse decision makers, but also to risk-neutral decision makers with limited information on the demand distribution.
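For the exponential-utility case, the risk-adjusted objective reduces to a certainty equivalent that is easy to compute, which is one intuition for why the optimal policy keeps its risk-neutral structure. A small sketch for a discrete wealth lottery (the function name is illustrative):

```python
import math

def certainty_equivalent(outcomes, probs, lam):
    """Exponential utility u(w) = -exp(-lam * w) gives the certainty
    equivalent CE = -(1/lam) * ln E[exp(-lam * W)], where lam > 0 is
    the coefficient of absolute risk aversion."""
    ew = sum(p * math.exp(-lam * w) for w, p in zip(outcomes, probs))
    return -math.log(ew) / lam
```

A sure payoff is its own certainty equivalent, while a risky lottery with the same mean is valued strictly lower, and the gap shrinks as lam → 0 (the risk-neutral limit).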
Abstract:
Interest in developing applications with autonomous underwater vehicles (AUVs) has grown considerably in recent years. AUVs are attractive because of their size and because they do not need a human operator to pilot them. Even so, in terms of efficiency and flexibility, the ability of a human pilot cannot be compared with the limited operational capabilities offered by current AUVs. Using AUVs to cover large areas involves solving complex problems, especially if the robot is expected to react in real time to sudden changes in working conditions. For these reasons, the development of autonomous control systems aimed at improving these capabilities has become a priority. This thesis addresses the problem of decision making with AUVs. The work presented focuses on the study, design, and application of behaviors for AUVs using reinforcement learning (RL) techniques. The main contribution of this thesis is the application of several RL techniques to improve the autonomy of underwater robots, with the final goal of demonstrating the feasibility of these algorithms for learning autonomous underwater tasks in real time. In RL, the robot tries to maximize a scalar reward obtained as a consequence of its interaction with the environment. The goal is to find an optimal policy that maps every possible state to the action, to be executed in that state, that maximizes the total sum of rewards. This thesis therefore investigates mainly two families of RL algorithms: value-function (VF) methods and policy-gradient (PG) methods. The final experimental results show the underwater robot Ictineu performing a real autonomous underwater cable-tracking task. To carry it out, an Actor-Critic (AC) algorithm was designed, resulting from the fusion of VF methods with PG techniques.
Abstract:
In this paper, we show that when the government is able to transfer wealth between generations, regressive policies are no longer optimal. The optimal educational policy can be decentralized through appropriate Pigouvian taxes and credit provision, is not regressive, and provides equality of opportunities in education (in the sense of irrelevance of parental income for the amount of education). Moreover, in the presence of default, the optimal policy can be implemented through income-contingent payments.
Abstract:
A simple exercise on growth and inflationary financing of public expenditures is presented in this note. In a parameterized overlapping generations model where government expenses positively affect the growth rate of human capital, steady-state capital and output increase with inflation, reproducing the so-called Tobin effect. For large inflation rates, however, government authorities cannot affect real variables and there are only nominal effects. It is also shown that the optimal policy implies some inflation, but not growth maximization.
Abstract:
There are plenty of economic studies pointing out requirements, such as the absence of fiscal dominance, for an inflation targeting framework to be implemented in a successful (credible) way. Essays on how public targets could be used in the absence of such requirements are unusual. In this paper we appraise how central banks could use inflation targeting before sound economic fundamentals have been achieved. First, based on a concise framework where confidence crises and imperfect information are neglected, we conclude that a less ambitious (greater) target for inflation increases the credibility of the precommitment. The optimal target is higher than the one obtained using the Cukierman-Liviatan [7] model, where this credibility-increasing effect is not considered. Second, extending the model to make confidence crises possible, multiple equilibria become possible as well. In this case, setting greater targets for inflation may stimulate confidence crises and reduce the policymaker's credibility. On the other hand, multiple (bad) equilibria may be avoided. The optimal target depends on the likelihood of each equilibrium being selected. Finally, when common knowledge is perturbed, uniqueness is restored even in the presence of confidence crises, as in Morris-Shin [14]. The first result, i.e., that a less ambitious target for inflation increases the credibility of the precommitment, is also recovered. Adding a precise public signal, coordinated self-fulfilling actions and equilibrium multiplicity may still exist under some lack of common knowledge (as in Angeletos and Werning [1]). In this case, setting greater targets for inflation may again stimulate confidence crises, reducing the policymaker's credibility. From another aspect, multiple (bad) equilibria may be avoided. Optimal policy prescriptions depend on the likelihood of each equilibrium being selected.
Results also indicate that more precise public information may open the door to a bad equilibrium, contrary to the conventional wisdom that more central bank transparency is always good in an inflation targeting framework.
Abstract:
This work presents a solution to the problem of connection admission control and dynamic resource allocation in IEEE 802.16 networks by modeling a Markov Decision Process (MDP) that uses the concept of bandwidth degradation, which is based on the differentiated bandwidth requirements of the IEEE 802.16 service classes. For the MDP performance criterion, different rewards are assigned to each service class, thereby treating each flow differently. In this way, the optimal policy, obtained through a value iteration algorithm, can be evaluated with respect to aspects such as the average degradation level of the service classes, resource utilization, and the blocking probability of each service class as a function of the system load. The results obtained show that the proposed Markovian control method is able to prioritize the service classes considered most relevant to the system.
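The value iteration step used to obtain such an optimal policy can be sketched generically for a finite discounted MDP (the toy transition and reward arrays in the example are illustrative, not the 802.16 model):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[a, s, t]: probability of moving s -> t under action a;
    R[s, a]: one-step reward.  Returns the optimal value function
    and the greedy (optimal) deterministic policy."""
    S = R.shape[0]
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Bellman backup
        newV = Q.max(axis=1)
        if np.max(np.abs(newV - V)) < tol:
            return newV, Q.argmax(axis=1)
        V = newV
```

In the admission-control setting, states would encode the occupancy and degradation level of each service class and actions would be accept/reject (or degrade) decisions, with class-specific rewards producing the prioritization reported above.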
Abstract:
The femtocell concept aims to combine fixed-line broadband access with mobile telephony through the deployment of low-cost, low-power third and fourth generation base stations in subscribers' homes. While the self-configuration of femtocells is a plus, it can limit the quality of service (QoS) for users and reduce the efficiency of the network when based on outdated allocation parameters such as signal power level. To this end, this paper presents a proposal for optimized allocation of users on a co-channel macro-femto network that enables self-configuration and public access, aiming to maximize the quality of service of applications and to use the available energy more efficiently, following the concept of green networking. Thus, when the user needs to make a voice or data call, the mobile phone has to decide which network to connect to, using information on the number of connections, the QoS parameters (packet loss and throughput), and the signal power level of each network. For this purpose, the system is modeled as a Markov Decision Process, which is solved to obtain an optimal policy that can be applied on the mobile phone. The resulting policy is flexible, allowing different analyses, and adaptive to the specific characteristics defined by the telephone company. The results show that, compared to traditional QoS approaches, the policy proposed here can improve energy efficiency by up to 10%.