919 results for Stochastic dynamic programming (SDP)
Quality-optimization algorithm based on stochastic dynamic programming for MPEG DASH video streaming
Abstract:
In contrast to traditional push-based protocols, adaptive streaming techniques such as Dynamic Adaptive Streaming over HTTP (DASH) shift control to the client, which dynamically requests portions of the content at different quality levels to cope with limited and variable bandwidth while aiming to maximize the quality perceived by the user. Since the DASH adaptation logic at the client is not covered by the standard, we propose a solution based on Stochastic Dynamic Programming (SDP) techniques to find the optimal request policies that guarantee the users' Quality of Experience (QoE). Our algorithm is evaluated in a simulated streaming session and compared with other adaptation approaches. The results show that our proposal outperforms them in terms of QoE, requesting higher qualities on average.
Abstract:
The aim of this thesis is to price options on equity index futures with an application to standard options on S&P 500 futures traded on the Chicago Mercantile Exchange. Our methodology is based on stochastic dynamic programming, which can accommodate European as well as American options. The model accommodates dividends from the underlying asset. It also captures the optimal exercise strategy and the fair value of the option. This approach is an alternative to available numerical pricing methods such as binomial trees, finite differences, and ad-hoc numerical approximation techniques. Our numerical and empirical investigations demonstrate convergence, robustness, and efficiency. We use this methodology to value exchange-listed options. The European option premiums thus obtained are compared to Black's closed-form formula. They are accurate to four digits. The American option premiums also have a similar level of accuracy compared to premiums obtained using finite differences and binomial trees with a large number of time steps. The proposed model accounts for deterministic, seasonally varying dividend yield. In pricing futures options, we discover that what matters is the sum of the dividend yields over the life of the futures contract and not their distribution.
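The benchmark mentioned above, Black's closed-form formula for European options on futures, can be sketched as follows. This is a minimal illustration of the standard Black (1976) formula, not the thesis's own code; the function names `norm_cdf` and `black76` and the sample parameter values are assumptions made here for illustration.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black76(F, K, r, sigma, T, call=True):
    """Black (1976) price of a European option on a futures contract.

    F: futures price, K: strike, r: risk-free rate,
    sigma: volatility of the futures price, T: time to expiry in years.
    """
    d1 = (log(F / K) + 0.5 * sigma ** 2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    discount = exp(-r * T)
    if call:
        return discount * (F * norm_cdf(d1) - K * norm_cdf(d2))
    return discount * (K * norm_cdf(-d2) - F * norm_cdf(-d1))


# Hypothetical inputs, chosen only to exercise the function.
call_price = black76(F=1100.0, K=1050.0, r=0.03, sigma=0.2, T=0.5)
put_price = black76(F=1100.0, K=1050.0, r=0.03, sigma=0.2, T=0.5, call=False)
```

A numerical pricer for European futures options can be checked against this function, and put-call parity, `call - put = exp(-r*T) * (F - K)`, gives a quick sanity test of the formula itself.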
Abstract:
In this paper, we consider dynamic programming for election timing in a majoritarian parliamentary system such as Australia's, where the government has a constitutional right to call an early election. This right can give the government an advantage in remaining in power for as long as possible by calling an election when its popularity is high. On the other hand, the opposition's natural objective is to gain power, and it will apply controls termed "boosts", in the form of policy and economic responses, to reduce the chance of the government being re-elected. We explore equilibrium solutions to the government and opposition strategies in a political game using stochastic dynamic programming. Results are given in terms of the expected remaining life in power and the call and boost probabilities at each time and at any level of popularity.
Abstract:
With the recent development of advanced metering infrastructure, real-time pricing (RTP) schemes are anticipated to be introduced in future retail electricity markets. This paper proposes an algorithm for a home energy management scheduler (HEMS) to reduce the cost of energy consumption using RTP. The proposed algorithm works in three successive phases, namely real-time monitoring (RTM), stochastic scheduling (STS) and real-time control (RTC). In the RTM phase, the characteristics of available controllable appliances are monitored in real time and stored in the HEMS. In the STS phase, the HEMS computes an optimal policy using stochastic dynamic programming (SDP) to select a set of appliances to be controlled, with the objective of minimizing the total cost of energy consumption in a house. Finally, in the RTC phase, the HEMS initiates the control of the selected appliances. The proposed HEMS is unique in that it intrinsically considers uncertainties in RTP and in the power consumption patterns of various appliances. In the RTM phase, appliances are categorized according to their characteristics to ease the control process, thereby minimizing the number of control commands issued by the HEMS. Simulation results validate the proposed method.
Abstract:
The quality of environmental decisions is gauged according to the management objectives of a conservation project. Management objectives are generally about maximising some quantifiable measure of system benefit, for instance population growth rate. They can also be defined in terms of learning about the system in question; in such a case, actions would be chosen that maximise knowledge gain, for instance in experimental management sites. Learning about a system can also take place when managing practically. The adaptive management framework (Walters 1986) formally acknowledges this fact by evaluating learning in terms of how it will improve management of the system and therefore future system benefit. This is taken into account when ranking actions using stochastic dynamic programming (SDP). However, the benefits of any management action lie on a spectrum from pure system benefit, when there is nothing to be learned about the system, to pure knowledge gain. The current adaptive management framework does not permit management objectives to evaluate actions over the full range of this spectrum. By evaluating knowledge gain in units distinct from future system benefit, this whole spectrum of management objectives can be unlocked. This paper outlines six decision-making policies that differ across the spectrum from pure system benefit through to pure learning. The extensions to adaptive management presented here allow specification of the relative importance of learning compared to system benefit in management objectives. Such an extension means practitioners can be more specific in constructing conservation project objectives and can create policies for experimental management sites in the same framework as practical management sites.
Abstract:
Money is often a limiting factor in conservation, and attempting to conserve endangered species can be costly. Consequently, a framework for optimizing fiscally constrained conservation decisions for a single species is needed. In this paper we find the optimal budget allocation among isolated subpopulations of a threatened species to minimize local extinction probability. We solve the problem using stochastic dynamic programming, derive a useful and simple alternative guideline for allocating funds, and test its performance using forward simulation. The model considers subpopulations that persist in habitat patches of differing quality, which in our model is reflected in different relationships between money invested and extinction risk. We discover that, in most cases, subpopulations that are less efficient to manage should receive more money than those that are more efficient to manage, due to higher investment needed to reduce extinction risk. Our simple investment guideline performs almost as well as the exact optimal strategy. We illustrate our approach with a case study of the management of the Sumatran tiger, Panthera tigris sumatrae, in Kerinci Seblat National Park (KSNP), Indonesia. We find that different budgets should be allocated to the separate tiger subpopulations in KSNP. The subpopulation that is not at risk of extinction does not require any management investment. Based on the combination of risks of extinction and habitat quality, the optimal allocation for these particular tiger subpopulations is an unusual case: subpopulations that occur in higher-quality habitat (more efficient to manage) should receive more funds than the remaining subpopulation that is in lower-quality habitat. Because the yearly budget allocated to the KSNP for tiger conservation is small, to guarantee the persistence of all the subpopulations that are currently under threat we need to prioritize those that are easier to save. 
When allocating resources among subpopulations of a threatened species, the combined effects of differences in habitat quality, cost of action, and current subpopulation probability of extinction need to be integrated. We provide a useful guideline for allocating resources among isolated subpopulations of any threatened species. © 2010 by the Ecological Society of America.
Abstract:
1. Strategic searching for invasive pests presents a formidable challenge for conservation managers. Limited funding can necessitate choosing between surveying many sites cursorily, or focussing intensively on fewer sites. While existing knowledge may help to target more likely sites, e.g. with species distribution models (maps), this knowledge is not flawless and improving it also requires management investment. 2. In a rare example of trading off action against knowledge gain, we combine search coverage and accuracy, and its future improvement, within a single optimisation framework. More specifically, we examine under which circumstances managers should adopt one of two search-and-control strategies (cursory or focussed), and when they should divert funding to improving knowledge, making better predictive maps that benefit future searches. 3. We use a family of Receiver Operating Characteristic curves to reflect the quality of maps that direct search efforts. We demonstrate our framework by linking these to a logistic model of invasive spread such as that for the red imported fire ant Solenopsis invicta in south-east Queensland, Australia. 4. Cursory widespread searching is only optimal if the pest is already widespread or knowledge is poor; otherwise focussed searching exploiting the map is preferable. For longer management timeframes, eradication is more likely if funds are initially devoted to improving knowledge, even if this results in a short-term explosion of the pest population. 5. Synthesis and applications. By combining trade-offs between knowledge acquisition and utilization, managers can better focus, and justify, their spending to achieve optimal results in invasive control efforts. This framework can improve the efficiency of any ecological management that relies on predicting occurrence. © 2010 The Authors. Journal of Applied Ecology © 2010 British Ecological Society.
Abstract:
Relatively few studies have addressed water management and adaptation measures in the face of changing water balances due to climate change. The current work studies the impact of climate change on the performance of a multipurpose reservoir and derives adaptive policies for possible future scenarios. The method developed in this work is illustrated with a case study of the Hirakud reservoir on the Mahanadi river in Orissa, India, which is a multipurpose reservoir serving flood control, irrigation and power generation. Climate change effects on annual hydropower generation and four performance indices (reliability with respect to three reservoir functions, viz. hydropower, irrigation and flood control; and resiliency, vulnerability and deficit ratio with respect to hydropower) are studied. Outputs from three general circulation models (GCMs) for three scenarios each are downscaled to monsoon streamflow in the Mahanadi river for two future time slices, 2045-65 and 2075-95. Increased irrigation demands, rule curves dictated by the increased need for flood storage, and downscaled projections of streamflow from the ensemble of GCMs and scenarios are used for projecting future hydrologic scenarios. It is seen that hydropower generation and reliability with respect to hydropower and irrigation are likely to decrease in future in most scenarios, whereas the deficit ratio and vulnerability are likely to increase as a result of climate change if the standard operating policy (SOP) using current rule curves for flood protection is employed. An optimal monthly operating policy is then derived using stochastic dynamic programming (SDP) as an adaptive policy for mitigating the impacts of climate change on reservoir operation. The objective of this policy is to maximize reliabilities with respect to the multiple reservoir functions of hydropower, irrigation and flood control.
In variations of this adaptive policy, increasingly more weight is given to maximizing reliability with respect to hydropower for two extreme scenarios. It is seen that by marginally sacrificing reliability with respect to irrigation and flood control, hydropower reliability and generation can be increased for future scenarios. This suggests that reservoir rules for flood control may have to be revised in basins where climate change projects an increasing probability of droughts. However, it is also seen that power generation cannot be restored to current levels, due in part to the large projected increases in irrigation demand. This suggests that future water balance deficits may limit the success of adaptive policy options. © 2010 Elsevier Ltd. All rights reserved.
Abstract:
A real-time operational methodology has been developed for multipurpose reservoir operation for irrigation and hydropower generation with application to the Bhadra reservoir system in the state of Karnataka, India. The methodology consists of three phases of computer modelling. In the first phase, the optimal release policy for a given initial storage and inflow is determined using a stochastic dynamic programming (SDP) model. Streamflow forecasting using an adaptive AutoRegressive Integrated Moving Average (ARIMA) model constitutes the second phase. A real-time simulation model is developed in the third phase using the forecast inflows of phase 2 and the operating policy of phase 1. A comparison of the optimal monthly real-time operation with the historical operation demonstrates the relevance, applicability and the relative advantage of the proposed methodology.
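The first phase described above, deriving a release policy for each storage and inflow state, can be illustrated with a generic backward recursion. This is a minimal sketch and not the authors' model: the discretization, the Markov inflow matrix `P`, the squared-deviation-from-demand reward, and the name `reservoir_sdp` are all assumptions made here for illustration.

```python
import numpy as np


def reservoir_sdp(storages, inflows, P, demand, n_periods):
    """Finite-horizon SDP for a single reservoir (illustrative only).

    storages: discrete storage levels (assumed to include 0 so that some
              decision is always feasible)
    inflows:  discrete inflow classes for the current period
    P[k, m]:  probability that next period's inflow class is m given class k
    demand:   target release; reward is -(release - demand)**2 (assumed)
    Returns the first-period value table and the release policy per period.
    """
    storages = np.asarray(storages, dtype=float)
    inflows = np.asarray(inflows, dtype=float)
    P = np.asarray(P, dtype=float)
    n_s, n_q = len(storages), len(inflows)
    V = np.zeros((n_s, n_q))  # terminal value assumed to be zero
    policy = np.zeros((n_periods, n_s, n_q))
    for t in reversed(range(n_periods)):
        V_new = np.full((n_s, n_q), -np.inf)
        for i, s in enumerate(storages):
            for k, q in enumerate(inflows):
                # The decision is the end-of-period storage level.
                for j, s_next in enumerate(storages):
                    release = s + q - s_next  # mass balance
                    if release < 0:
                        continue  # cannot store more water than is available
                    # Immediate reward plus expected future value.
                    value = -(release - demand) ** 2 + P[k] @ V[j]
                    if value > V_new[i, k]:
                        V_new[i, k] = value
                        policy[t, i, k] = release
        V = V_new
    return V, policy


# Hypothetical two-class inflow example.
V, policy = reservoir_sdp(
    storages=[0, 1, 2, 3],
    inflows=[0, 2],
    P=[[0.7, 0.3], [0.4, 0.6]],
    demand=1.0,
    n_periods=12,
)
```

Here `policy[t, i, k]` is the release chosen in period `t` for storage level `i` and inflow class `k`; a real-time simulation of the kind described in the third phase would look up this table using forecast inflows.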
Abstract:
An integrated model is developed, based on seasonal inputs of reservoir inflow and rainfall in the irrigated area, to determine the optimal reservoir release policies and irrigation allocations to multiple crops. The model is conceptually made up of two modules. Module 1 is an intraseasonal allocation model that maximizes the sum of relative yields of all crops, for a given state of the system, using linear programming (LP). The module takes into account reservoir storage continuity, soil moisture balance, and crop root growth with time. Module 2 is a seasonal allocation model that derives the steady-state reservoir operating policy using stochastic dynamic programming (SDP). Reservoir storage, seasonal inflow, and seasonal rainfall are the state variables in the SDP. The objective in the SDP is to maximize the expected sum of relative yields of all crops in a year. The results of Module 1 and the transition probabilities of seasonal inflow and rainfall form the input for Module 2. The use of seasonal inputs coupled with the LP-SDP solution strategy in the present formulation helps relax the limitations of an earlier study while effecting additional improvements. The model is applied to an existing reservoir in Karnataka State, India.
Abstract:
In this paper we are concerned with finding the maximum throughput that a mobile ad hoc network can support. Even when nodes are stationary, the problem of determining the capacity region has long been known to be NP-hard. Mobility introduces an additional dimension of complexity because nodes now also have to decide when they should initiate route discovery. Since route discovery involves communication and computation overhead, it should not be invoked very often. On the other hand, mobility implies that routes are bound to become stale resulting in sub-optimal performance if routes are not updated. We attempt to gain some understanding of these effects by considering a simple one-dimensional network model. The simplicity of our model allows us to use stochastic dynamic programming (SDP) to find the maximum possible network throughput with ideal routing and medium access control (MAC) scheduling. Using the optimal value as a benchmark, we also propose and evaluate the performance of a simple threshold-based heuristic. Unlike the optimal policy which requires considerable state information, the heuristic is very simple to implement and is not overly sensitive to the threshold value used. We find empirical conditions for our heuristic to be near-optimal as well as network scenarios when our simple heuristic does not perform very well. We provide extensive numerical and simulation results for different parameter settings of our model.
Abstract:
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
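For readers unfamiliar with the algorithms named above, the tabular Q-learning update can be sketched on a toy problem. This is a minimal illustration, not taken from the paper: the three-state chain, the uniform random exploration, and the constant step size `alpha` are assumptions chosen for brevity. A constant step size is adequate here only because the toy environment is deterministic; convergence results of the kind the abstract describes concern appropriately decaying step sizes in stochastic environments.

```python
import random


def q_learning(n_episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical 3-state chain.

    States 0 and 1 are non-terminal; state 2 is terminal.
    Action 1 moves right, action 0 moves left (staying at 0 if already there).
    Entering the terminal state yields reward 1, every other step reward 0.
    """
    rng = random.Random(seed)
    n_states, terminal = 3, 2
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def step(s, a):
        s_next = min(s + 1, terminal) if a == 1 else max(s - 1, 0)
        reward = 1.0 if s_next == terminal else 0.0
        return s_next, reward

    for _ in range(n_episodes):
        s = 0
        while s != terminal:
            a = rng.randrange(2)  # explore uniformly at random (off-policy)
            s_next, r = step(s, a)
            # Q-learning target: reward plus discounted best next-state value.
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


Q = q_learning()
```

On this chain the optimal action values follow directly from the Bellman equation: moving right from state 1 is worth 1, moving right from state 0 is worth `gamma`, and either leftward move is worth `gamma ** 2`, so the learned table can be compared against them.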
Abstract:
The purpose of this expository article is to present a self-contained overview of some results on the characterization of the optimal value function of a stochastic target problem as a (discontinuous) viscosity solution of a certain dynamic programming PDE, and its application to the problem of hedging contingent claims in the presence of portfolio constraints and large investors.
Abstract:
We describe a general technique for determining upper bounds on maximal values (or lower bounds on minimal costs) in stochastic dynamic programs. In this approach, we relax the nonanticipativity constraints that require decisions to depend only on the information available at the time a decision is made and impose a "penalty" that punishes violations of nonanticipativity. In applications, the hope is that this relaxed version of the problem will be simpler to solve than the original dynamic program. The upper bounds provided by this dual approach complement lower bounds on values that may be found by simulating with heuristic policies. We describe the theory underlying this dual approach and establish weak duality, strong duality, and complementary slackness results that are analogous to the duality results of linear programming. We also study properties of good penalties. Finally, we demonstrate the use of this dual approach in an adaptive inventory control problem with an unknown and changing demand distribution and in valuing options with stochastic volatilities and interest rates. These are complex problems of significant practical interest that are quite difficult to solve to optimality. In these examples, our dual approach requires relatively little additional computation and leads to tight bounds on the optimal values. © 2010 INFORMS.
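The simplest instance of the idea above sets the penalty to zero, which gives the perfect-information bound: a decision maker who sees the whole sample path does at least as well as any nonanticipative policy. The sketch below illustrates this on a toy problem that is not one of the paper's examples: selling one item within `T` periods when offers are independent and uniform on [0, 1]. The function name `stopping_bounds`, the threshold heuristic, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def stopping_bounds(T=5, n_paths=20000, threshold=0.7, seed=0):
    """Lower bound, exact value, and zero-penalty upper bound for a toy
    optimal stopping problem: accept exactly one of T i.i.d. U(0, 1) offers.
    """
    rng = np.random.default_rng(seed)
    offers = rng.uniform(0.0, 1.0, size=(n_paths, T))

    # Lower bound: simulate a heuristic that accepts the first offer at or
    # above the threshold and is forced to accept the last offer otherwise.
    accept = offers[:, :-1] >= threshold
    first = np.where(accept.any(axis=1), accept.argmax(axis=1), T - 1)
    lower = offers[np.arange(n_paths), first].mean()

    # Upper bound: relax nonanticipativity with zero penalty, so on each
    # path the best offer is picked in hindsight.
    upper = offers.max(axis=1).mean()

    # Exact value by backward induction: with continuation value v, the
    # value one period earlier is E[max(X, v)] = (1 + v**2) / 2 for U(0, 1).
    v = 0.5  # the last offer must be accepted
    for _ in range(T - 1):
        v = (1.0 + v * v) / 2.0
    return lower, v, upper


lower, exact, upper = stopping_bounds()
```

Because both bounds are computed on the same sample paths, `lower <= upper` holds on every run, and the exact value lies between them up to sampling error. A well-chosen penalty, as studied in the abstract, would tighten the gap between the upper bound and the exact value.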