999 results for POLICY ITERATION


Relevance:

100.00%

Publisher:

Abstract:

The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) to the long-run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with a compact action space depending on the state variable. To do so, we first derive some important properties of a pseudo-Poisson equation associated with the problem. We then show that, under some classical hypotheses, the PIA converges to a solution satisfying the optimality equation, and that this optimal solution yields an optimal control strategy in feedback form for the average control problem of the continuous-time PDMP.

Relevance:

60.00%

Publisher:

Abstract:

This paper studies the average control problem of discrete-time Markov Decision Processes (MDPs for short) with general state space, Feller transition probabilities, and possibly non-compact control constraint sets A(x). Two hypotheses are considered: either the cost function c is strictly unbounded, or the multifunctions A_r(x) = {a ∈ A(x) : c(x, a) ≤ r} are upper-semicontinuous and compact-valued for each real r. For these two cases we provide new results on the existence of a solution to the average-cost optimality equality and inequality using the vanishing discount approach. We also study the convergence of the policy iteration approach under these conditions. It should be pointed out that, unlike much of the literature, we make no assumptions regarding the convergence or the continuity of the limit function generated by the sequence of relative differences of the alpha-discounted value functions and the Poisson equations. (C) 2012 Elsevier Inc. All rights reserved.
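For orientation, policy iteration for the average-cost criterion can be sketched on a small finite MDP. This is only an illustration under strong simplifying assumptions that the abstract does not make: finite state and action sets, and every stationary policy inducing a single recurrent class (unichain). The names `solve`, `evaluate`, `policy_iteration`, and the transition and cost tables `P` and `c` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(policy, P, c):
    """Policy evaluation: solve the Poisson equation
    rho + h(s) = c(s, a) + sum_j P(j | s, a) h(j), with a = policy[s] and h(0) = 0."""
    n = len(policy)
    A, b = [], []
    for s in range(n):
        a = policy[s]
        # Unknowns: x[0] = rho, x[j] = h(j) for j >= 1 (h(0) is fixed to 0).
        A.append([1.0] + [(1.0 if j == s else 0.0) - P[s][a][j] for j in range(1, n)])
        b.append(c[s][a])
    x = solve(A, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(P, c, max_iter=100):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    n = len(P)
    policy = [0] * n
    for _ in range(max_iter):
        rho, h = evaluate(policy, P, c)
        new_policy = []
        for s in range(n):
            q = [c[s][a] + sum(P[s][a][j] * h[j] for j in range(n))
                 for a in range(len(P[s]))]
            best = min(range(len(q)), key=lambda a: q[a])
            # Keep the current action on ties so the iteration cannot cycle.
            if q[policy[s]] <= q[best] + 1e-12:
                best = policy[s]
            new_policy.append(best)
        if new_policy == policy:
            return policy, rho, h
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")


# Hypothetical two-state, two-action example: P[s][a][j] and c[s][a].
P = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.5, 0.5], [0.9, 0.1]]]
c = [[1.0, 0.5],
     [3.0, 4.0]]

policy, rho, h = policy_iteration(P, c)
print(policy, rho, h)
```

In the general state space setting of the paper, neither the linear solve nor the minimization is this simple; the sketch only shows the evaluate-then-improve structure whose convergence the paper studies.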

Relevance:

60.00%

Publisher:

Abstract:

This thesis addresses the problem of scaling reinforcement learning to high-dimensional and complex tasks. Reinforcement learning here denotes a class of learning methods based on approximate dynamic programming that is used in particular in artificial intelligence and can be applied to the autonomous control of simulated agents or real hardware robots in dynamic and unpredictable environments. To this end, regression on samples is used to determine a function that solves an "optimality equation" (Bellman) and from which approximately optimal decisions can be derived. A major hurdle is the dimensionality of the state space, which is often high and therefore poorly suited to traditional grid-based approximation methods. The goal of this thesis is to make reinforcement learning applicable to (in principle arbitrarily) high-dimensional problems by means of non-parametric function approximation (more precisely, regularization networks). Regularization networks are a generalization of ordinary basis function networks that parameterize the sought solution through the data, so that the explicit choice of nodes/basis functions is no longer needed and the "curse of dimensionality" can be circumvented for high-dimensional inputs. At the same time, regularization networks are also linear approximators, which are technically easy to handle and for which the existing convergence results of reinforcement learning remain valid (unlike, for example, feed-forward neural networks). All these theoretical advantages are, however, offset by a very practical problem: the computational cost of using regularization networks inherently scales as O(n**3), where n is the number of data points.
This is particularly problematic because in reinforcement learning the learning process takes place online: the samples are generated by an agent/robot while it interacts with the environment. Adjustments to the solution must therefore be made immediately and with little computational effort. The contribution of this thesis is accordingly divided into two parts. In the first part, we formulate an efficient learning algorithm for regularization networks for solving general regression tasks, tailored specifically to the requirements of online learning. Our approach is based on recursive least squares, but can insert not only new data but also new basis functions into the existing model at constant time cost. This is made possible by the "subset of regressors" approximation, in which the kernel is approximated by a strongly reduced selection of training data, and by a greedy selection procedure that picks these basis elements directly from the data stream at run time. In the second part, we transfer this algorithm to approximate policy evaluation by means of least-squares based temporal-difference learning, and integrate this building block into an overall system for autonomously learning optimal behavior. Overall, we develop a highly data-efficient method that is particularly suited to learning problems from robotics with continuous and high-dimensional state spaces and stochastic state transitions. We do not rely on a model of the environment, work largely independently of the dimension of the state space, achieve convergence with relatively few agent-environment interactions, and, thanks to the efficient online algorithm, can also operate in time-critical real-time applications.
We demonstrate the performance of our approach on two realistic and complex application examples: the RoboCup-Keepaway problem and the control of a (simulated) octopus tentacle.
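To make the online-learning requirement concrete, here is a minimal sketch of plain recursive least squares, the starting point the abstract names. It assumes a fixed, hand-chosen feature vector and does not attempt the thesis's extension (adding basis functions on the fly via the subset-of-regressors approximation). The class name `RecursiveLeastSquares` and the `ridge` parameter are hypothetical.

```python
class RecursiveLeastSquares:
    """Online ridge regression on fixed features, updated one sample at a time."""

    def __init__(self, dim, ridge=1e-3):
        self.w = [0.0] * dim
        # P tracks the inverse of (ridge * I + sum of phi phi^T); it stays symmetric.
        self.P = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                  for i in range(dim)]

    def predict(self, phi):
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def update(self, phi, y):
        d = len(phi)
        P_phi = [sum(self.P[i][j] * phi[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(d))
        gain = [v / denom for v in P_phi]
        error = y - self.predict(phi)
        self.w = [wi + g * error for wi, g in zip(self.w, gain)]
        # Sherman-Morrison rank-one update of the inverse.
        self.P = [[self.P[i][j] - gain[i] * P_phi[j] for j in range(d)]
                  for i in range(d)]


# Fit y = 2x + 1 from a stream of samples with features [1, x].
model = RecursiveLeastSquares(dim=2, ridge=1e-6)
for x in range(10):
    model.update([1.0, float(x)], 2.0 * x + 1.0)
print(model.w)
```

Each update costs O(d**2) in the number of features d and does not grow with the number of samples seen, which is the property that makes this family of methods attractive when data arrive during agent-environment interaction.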

Relevance:

20.00%

Publisher:

Abstract:

The objective of this study is to examine the dynamics between fiscal policy, measured by public debt, and monetary policy, measured by the reaction function of a central bank. Changes in monetary policy due to deviations from its targets always generate fiscal impacts. We examine two policy reaction functions: the first related to inflation targets and the second related to economic growth targets. We find that the condition for stable equilibrium is more restrictive in the first case than in the second. We then apply our simulation model to Brazil and the United Kingdom and find that the equilibrium is unstable in the Brazilian case but stable in the UK case.

Relevance:

20.00%

Publisher:

Abstract:

This paper highlights the important changes in Brazilian foreign policy after Luiz Inacio Lula da Silva took power in 2002. The paper argues that one cannot claim there were deep changes in comparison with Cardoso's administration. However, evidence shows that new developments are under way in the design of a more active and clearer line of foreign action, which has led to institutional changes and to more incisive multilateral paths. This results both from the political profile of the direct operators of foreign policy and from the aims of the presidential diplomacy. The hypothesis examined in this paper is that Lula's administration has not fully broken with the practices of the previous administration, but that the aims of global and regional integration are being pursued more clearly and with a higher degree of activism. This becomes clear in three aspects of Brazilian foreign policy: the institutional framework, the practice of multilateralism, and the foreign policy towards the South, the three topics analyzed in this paper.

Relevance:

20.00%

Publisher:

Abstract:

Background: In a number of malaria endemic regions, tourists and travellers face a declining risk of travel-associated malaria, in part due to successful malaria control. Many millions of visitors to these regions are recommended, via national and international policy, to use chemoprophylaxis, which has a well-recognized morbidity profile. To evaluate whether current malaria chemoprophylactic policy for travellers is cost-effective when adjusted for endemic transmission risk and duration of exposure, a framework based on partial cost-benefit analysis was used. Methods: Using a three-component model combining a probability component, a cost component and a malaria risk component, the study estimated health costs avoided through use of chemoprophylaxis and costs of disease prevention (including adverse events and pre-travel advice for visits to five popular high and low malaria endemic regions) and malaria transmission risk using imported malaria cases and numbers of travellers to malarious countries. By calculating the minimal threshold malaria risk below which the economic costs of chemoprophylaxis are greater than the avoided health costs, we were able to identify the point at which chemoprophylaxis would be economically rational. Results: The threshold incidence at which malaria chemoprophylaxis policy becomes cost-effective for UK travellers is an accumulated risk of 1.13%, assuming a given set of cost parameters. The period a traveller needs to remain exposed to reach this accumulated risk varied from 30 to more than 365 days, depending on the region's intensity of malaria transmission. Conclusions: The cost-benefit analysis identified that chemoprophylaxis use was not a cost-effective policy for travellers to Thailand or the Amazon region of Brazil, but was cost-effective for travel to West Africa and for those staying longer than 45 days in India and Indonesia.
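The threshold reasoning can be illustrated with a simplified break-even calculation. This is not the study's three-component model: it assumes a single prophylaxis cost, a single cost per malaria case, and a constant, independent daily infection risk. The function names and all numeric inputs other than the 1.13% threshold are hypothetical.

```python
import math


def threshold_risk(prophylaxis_cost, malaria_case_cost, efficacy=1.0):
    """Accumulated risk at which avoided health costs equal the cost of prophylaxis."""
    return prophylaxis_cost / (efficacy * malaria_case_cost)


def days_to_reach(threshold, daily_risk):
    """Smallest whole number of days for which 1 - (1 - daily_risk)**days >= threshold."""
    return math.ceil(math.log(1.0 - threshold) / math.log(1.0 - daily_risk))


# Hypothetical costs: prophylaxis 50 per traveller, 5000 per malaria case.
print(threshold_risk(50.0, 5000.0))

# Days of exposure needed to reach the reported 1.13% accumulated risk,
# for a hypothetical daily risk of 0.1%.
print(days_to_reach(0.0113, 0.001))
```

Below the threshold, prophylaxis costs more than it saves under these assumptions; a lower daily risk pushes the break-even stay length out, which is consistent with the wide range of exposure periods the abstract reports.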

Relevance:

20.00%

Publisher:

Abstract:

This study presents a decision-making method for maintenance policy selection for power plant equipment. The method is based on risk analysis concepts. Its first step consists of identifying equipment that is critical for both power plant operational performance and availability, based on risk concepts. The second step involves proposing a potential maintenance policy that could be applied to critical equipment in order to increase its availability. The costs associated with each potential maintenance policy must be estimated, including the maintenance costs and the cost of failure, which measures the consequences of critical equipment failure for power plant operation. Once the failure probabilities and the costs of failure are estimated, a decision-making procedure is applied to select the best maintenance policy. The decision criterion is to minimize the equipment cost of failure, considering the costs and likelihood of occurrence of failure scenarios. The method is applied to the analysis of a lubrication oil system used in gas turbine journal bearings. The turbine has a nominal output of more than 150 MW and is installed in an open-cycle thermoelectric power plant. A design modification with the installation of a redundant oil pump is proposed to improve the availability of the lubricating oil system. (C) 2009 Elsevier Ltd. All rights reserved.
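The selection step can be sketched as an expected-cost comparison. This is one plausible reading of the decision criterion, not the study's actual procedure: each candidate policy is scored by its maintenance cost plus the probability-weighted cost of its failure scenarios. The option names, probabilities, and costs below are hypothetical.

```python
def expected_cost(option):
    """Maintenance cost plus the probability-weighted cost of each failure scenario."""
    return option["maintenance_cost"] + sum(
        probability * cost for probability, cost in option["failure_scenarios"]
    )


def select_policy(options):
    """Return the name of the option with the lowest expected cost."""
    return min(options, key=lambda name: expected_cost(options[name]))


# Hypothetical candidates: (failure probability, cost of failure) per scenario.
options = {
    "run_to_failure": {"maintenance_cost": 0.0,
                       "failure_scenarios": [(0.10, 500_000.0)]},
    "preventive": {"maintenance_cost": 20_000.0,
                   "failure_scenarios": [(0.04, 500_000.0)]},
    "redundant_pump": {"maintenance_cost": 30_000.0,
                       "failure_scenarios": [(0.005, 500_000.0)]},
}

for name in options:
    print(name, expected_cost(options[name]))
print(select_policy(options))
```

With real data, the failure probabilities would come from the risk analysis in the first step, and the ranking can change as those estimates are revised.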

Relevance:

20.00%

Publisher:

Abstract:

Rapid deforestation in the Brazilian Amazon, caused by economic, social, and policy factors, has focused global and national attention on protecting this valuable forest resource. In response, Brazil reformed its federal forest laws in 2006, creating new regulatory, development, and incentive policy instruments and institutions. Federal forestry responsibilities are maintained within the ministry of the environment; its regulatory agency responsibilities are divided among three different branches of the agency; many powers are delegated to states and municipalities; and a new private concession system is being developed. These reforms offer promise to improve forest protection and management in Brazil but must overcome significant institutional and social resistance for success.

Relevance:

20.00%

Publisher:

Abstract:

A framework for, and overview of, the key elements of language planning is presented, covering status planning, corpus planning, language-in-education planning, prestige planning and critical approaches to language planning. Within each of these areas, key articles outlining important recent directions are discussed, indicating the field's newfound sense of vitality.

Relevance:

20.00%

Publisher:

Abstract:

Background: This study used household survey data on the prevalence of child, parent and family variables to establish potential targets for a population-level intervention to strengthen parenting skills in the community. The goals of the intervention include decreasing child conduct problems, increasing parental self-efficacy and use of positive parenting strategies, decreasing coercive parenting, and increasing help-seeking, social support and participation in positive parenting programmes. Methods: A total of 4010 parents with a child under the age of 12 years completed a statewide telephone survey on parenting. Results: One in three parents reported that their child had a behavioural or emotional problem in the previous 6 months. Furthermore, 9% of children aged 2–12 years met criteria for oppositional defiant disorder. Parents who reported their child's behaviour to be difficult were more likely to perceive parenting as a negative experience (i.e. demanding, stressful and depressing). Parents with the greatest difficulties were mothers without partners who had low levels of confidence in their parenting roles. About 20% of parents reported being stressed and 5% reported being depressed in the 2 weeks prior to the survey. Parents with personal adjustment problems had lower levels of parenting confidence, and their children were more difficult to manage. Only one in four parents had participated in a parent education programme. Conclusions: Implications for the setting of population-level goals and targets for strengthening parenting skills are discussed.

Relevance:

20.00%

Publisher:

Abstract:

Except for a few large scale projects, language planners have tended to talk and argue among themselves rather than to see language policy development as an inherently political process. A comparison with a social policy example, taken from the United States, suggests that it is important to understand the problem and to develop solutions in the context of the political process, as this is where decisions will ultimately be made.

Relevance:

20.00%

Publisher:

Abstract:

Knowledge is a product of human social systems and, therefore, the foundations of the knowledge-based economy are social and cultural. Communication is central to knowledge creation and diffusion, and Public Policy in Knowledge-Based Economies highlights specific social and cultural conditions that can enhance the communication, use and creation of knowledge in a society. The purpose of this book is to illustrate how these social and cultural conditions are identified and analysed through new conceptual frameworks. Such frameworks are necessary to penetrate the surface features of knowledge-based economies (science and technology) and disclose what drives such economies. This book will provide policymakers, analysts and academics with the fundamental tools needed for the development of policy in this little-understood and emerging area.