692 results for critic


Relevance: 20.00%

Abstract:

It's hard to be dispassionate about Reyner Banham. For me, as for the plethora of other people with strong opinions about Banham, his writing is compelling, and one's connection to him as a figure quite personal. Frankly, he rocks. As a landscape architect, I gleaned most of my knowledge about Modern architecture from Banham. His Theory and Design in the First Machine Age, along with Rowe and Koetter's Collage City and Venturi's Complexity and Contradiction in Architecture, were by far the most influential books in my library. Later, as a budding "real scholar", I was disappointed to find that, while these authors had serious credibility, the writings themselves were regarded as "polemical" – when in fact what I admired most about them was their ability and willingness to make rough groupings and gross generalizations, and to offer fickle opinions. It spoke to me of a real personal engagement and an active, participatory reading of the architectural culture they discussed. They were at their best in their witty, cutting, but generally pithy, creative prose, such as Rowe's extrapolation of the modern citizen as the latest "noble savage", or Banham railing against conservative social advocates and their response to high-density housing: "those who had just re-discovered 'community' in the slums would fear megastructure as much as any other kind of large-scale renewal program, and would see to it that the people were never ready."

Any reader of Banham will be able to find a gem that relates, somehow, personally, to what they are doing right now. For Banham, it was all personal; the gaps in his scholarship, rather, were the dispassionate places: "Such bias is essential – an unbiased historian is a pointless historian – because history is an essentially critical activity, a constant re-scrutiny and rearrangement of the profession."

Reyner Banham: Historian of the Immediate Future, Nigel Whiteley's recent "intellectual biography" (The MIT Press, 2002), allowed me to revisit Banham's passionate mode of criticism and to consider what his legacy might be. The book examines Banham's body of work, grouped according to his various primary fascinations, as well as his relationship to contemporaneous theoretical movements such as postmodernism. His mode of practice, as a kind of creative critic, is also considered in some depth. While there are points where the book delves into Banham's personal life, on the whole Whiteley is very rigorous in considering and theorizing the work itself: more than 750 articles and twelve books. In academic terms, this is good practice. However, considering the entirely personal nature of Banham's writing, this separation seems artificial. Banham, as he himself noted, "didn't mind a gossip", and often when reading the book I was curious about what was happening to him at the time.

Banham's was an amazing type of intellectual practice, and one that academics (a term he hated) could do well to learn from. While Whiteley spends a lot of time arguing for Banham's practice to be regarded as serious scholarship, and makes strong points about both the role of the critic and the importance of journalism over scholarly publishing, I found myself wondering what his study looked like. What books he had in his library. Did he smoke when he wrote? What sort of teaching load did he have? He is an inspiration to design writers and thinkers, and I, personally, wanted to know how he did it.

Relevance: 20.00%

Abstract:

The recently developed single network adaptive critic (SNAC) design has been used in this study to design a power system stabiliser (PSS) for enhancing the small-signal stability of power systems over a wide range of operating conditions. PSS design is formulated as a discrete non-linear quadratic regulator problem. SNAC is then used to solve the resulting discrete-time optimal control problem. SNAC uses only a single critic neural network instead of the action-critic dual network architecture of typical adaptive critic designs. SNAC eliminates the iterative training loops between the action and critic networks and greatly simplifies the training procedure. The performance of the proposed PSS has been tested on a single machine infinite bus test system for various system and loading conditions. The proposed stabiliser, which is relatively easier to synthesise, consistently outperformed stabilisers based on conventional lead-lag and linear quadratic regulator designs.
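To make the formulation concrete, the kind of discrete-time optimal control problem a SNAC design solves can be summarized as follows. This is a generic sketch assuming control-affine dynamics and a quadratic cost; it is not the paper's exact PSS model.

```latex
% Schematic SNAC formulation (assumed control-affine dynamics, quadratic cost)
\begin{align*}
  &\text{dynamics:} \quad x_{k+1} = f(x_k) + g(x_k)\,u_k \\
  &\text{cost:} \quad J = \sum_{k=0}^{\infty} \tfrac{1}{2}\left( x_k^{\top} Q x_k + u_k^{\top} R u_k \right) \\
  &\text{costate recursion:} \quad \lambda_k = Q x_k + \left(\frac{\partial x_{k+1}}{\partial x_k}\right)^{\!\top} \lambda_{k+1} \\
  &\text{optimal control:} \quad u_k = -R^{-1} g(x_k)^{\top} \lambda_{k+1}
\end{align*}
% The single critic network learns the map x_k -> lambda_{k+1}; the
% optimal-control equation then yields u_k directly, so no separate
% action network is needed.
```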

Relevance: 20.00%

Abstract:

We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better-conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
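As a point of reference for the template these algorithms build on, here is a minimal one-step actor-critic with linear function approximation and a softmax policy. The toy environment, feature map, and step sizes are assumptions for illustration; the paper's natural-gradient variants are not reproduced here.

```python
import numpy as np

# Minimal one-step actor-critic: TD(0) critic plus policy-gradient actor.
# Environment, features, and step sizes are illustrative assumptions.
n_states, n_actions, gamma = 5, 2, 0.95
rng = np.random.default_rng(0)

def phi(s):                               # one-hot state features for the critic
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

w = np.zeros(n_states)                    # critic parameters: V(s) ~ w . phi(s)
theta = np.zeros((n_states, n_actions))   # actor parameters: softmax logits

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def env_step(s, a):                       # toy MDP dynamics (assumption)
    s2 = (s + 1) % n_states if a == 1 else int(rng.integers(n_states))
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

alpha_w, alpha_theta = 0.1, 0.01          # critic faster than actor
s = 0
for t in range(20000):
    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s2, r = env_step(s, a)
    delta = r + gamma * w @ phi(s2) - w @ phi(s)   # TD error
    w += alpha_w * delta * phi(s)                  # critic: TD(0) update
    grad_log = -p                                  # d log pi(a|s) / d theta[s]
    grad_log[a] += 1.0
    theta[s] += alpha_theta * delta * grad_log     # actor: policy-gradient step
    s = s2

print(policy(0))
```

Note the two step sizes: the critic learns faster than the actor, which foreshadows the two-timescale analysis the convergence proofs rely on.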

Relevance: 20.00%

Abstract:

Due to their non-stationarity, finite-horizon Markov decision processes (FH-MDPs) have one probability transition matrix per stage. Thus the curse of dimensionality affects FH-MDPs more severely than infinite-horizon MDPs. We propose two parametrized 'actor-critic' algorithms to compute optimal policies for FH-MDPs. Both algorithms use the two-timescale stochastic approximation technique, simultaneously performing gradient search in the parametrized policy space (the 'actor') on a slower timescale and learning the policy gradient (the 'critic') via a faster recursion. This is in contrast to methods where critic recursions learn the cost-to-go proper. We show convergence w.p. 1 to a set satisfying the necessary condition for constrained optima. The proposed parameterization is for FH-MDPs with compact action sets, although certain exceptions can be handled. Further, a third algorithm for stochastic control of stopping-time processes is presented. We explain why current policy evaluation methods do not work as the critic for the proposed actor recursion. Simulation results from flow control in communication networks attest to the performance advantages of all three algorithms.
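The two-timescale structure shared by these algorithms can be written schematically as a pair of coupled stochastic approximation recursions (generic notation, not the paper's exact updates):

```latex
% Generic two-timescale actor-critic recursions (schematic)
\begin{align*}
  w_{n+1}      &= w_n + b(n)\, h\!\left(w_n, \theta_n\right)
      && \text{(critic, fast timescale)} \\
  \theta_{n+1} &= \theta_n + a(n)\, \widehat{\nabla J}\!\left(\theta_n; w_n\right)
      && \text{(actor, slow timescale)}
\end{align*}
% Step sizes satisfy the usual conditions
% \sum_n a(n) = \sum_n b(n) = \infty, \sum_n (a(n)^2 + b(n)^2) < \infty,
% and a(n)/b(n) -> 0, so the critic effectively tracks the actor's
% current policy between actor updates.
```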

Relevance: 20.00%

Abstract:

This paper proposes a Single Network Adaptive Critic (SNAC) based Power System Stabilizer (PSS) for enhancing the small-signal stability of power systems over a wide range of operating conditions. SNAC uses only a single critic neural network instead of the action-critic dual network architecture of typical adaptive critic designs. SNAC eliminates the iterative training loops between the action and critic networks and greatly simplifies the training procedure. The performance of the proposed PSS has been tested on a Single Machine Infinite Bus test system for various system and loading conditions. The proposed stabilizer, which is relatively easier to synthesize, consistently outperformed stabilizers based on conventional lead-lag and linear quadratic regulator designs.

Relevance: 20.00%

Abstract:

We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy.
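To illustrate the SPSA idea the actor relies on, here is a minimal gradient-descent sketch using two-sided SPSA estimates. The quadratic objective and constants are assumptions; the paper's multi-timescale, constrained algorithm is not reproduced.

```python
import numpy as np

# Two-sided SPSA gradient estimate for a parameter vector theta.
# J stands in for a simulated cost; in the paper's setting the actor
# would also fold in Lagrange-multiplier terms for the constraints.
rng = np.random.default_rng(1)

def J(theta):                         # illustrative objective (assumption)
    return float(np.sum((theta - 1.0) ** 2))

def spsa_gradient(theta, c=0.1):
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Rademacher perturbation
    j_plus = J(theta + c * delta)
    j_minus = J(theta - c * delta)
    return (j_plus - j_minus) / (2.0 * c) * (1.0 / delta)

theta = np.zeros(4)
for n in range(1, 2001):
    a_n = 0.1 / n                     # slow, diminishing actor step size
    theta -= a_n * spsa_gradient(theta)
print(theta)                          # approaches the minimizer (1, 1, 1, 1)
```

The appeal of SPSA here is that all coordinates of the gradient are estimated from just two function evaluations, regardless of the parameter dimension.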

Relevance: 20.00%

Abstract:

Even though dynamic programming offers an optimal control solution in a state feedback form, the method is overwhelmed by computational and storage requirements. Approximate dynamic programming implemented with an Adaptive Critic (AC) neural network structure has evolved as a powerful alternative technique that obviates the need for excessive computations and storage requirements in solving optimal control problems. In this paper, an improvement to the AC architecture, called the "Single Network Adaptive Critic (SNAC)", is presented. This approach is applicable to a wide class of nonlinear systems where the optimal control (stationary) equation can be explicitly expressed in terms of the state and costate variables. The selection of this terminology is guided by the fact that it eliminates the use of one neural network (namely the action network) that is part of a typical dual-network AC setup. As a consequence, the SNAC architecture offers three potential advantages: a simpler architecture, a lower computational load, and elimination of the approximation error associated with the eliminated network. In order to demonstrate these benefits and the control synthesis technique using SNAC, two problems have been solved with the AC and SNAC approaches and their computational performances are compared. One of these problems is a real-life micro-electro-mechanical systems (MEMS) problem, which demonstrates that the SNAC technique is applicable to complex engineering systems.
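A minimal sketch of the SNAC training loop, assuming linear dynamics, a quadratic cost, and a linear critic (the paper's networks and examples are more general):

```python
import numpy as np

# Schematic SNAC training step for x_{k+1} = f(x_k) + g(x_k) u_k with cost
# 0.5 * (x'Qx + u'Ru). The critic maps x_k to the costate lambda_{k+1};
# the stationarity condition gives u_k in closed form, so no action
# network is needed. Dynamics, weights, and the linear critic are
# illustrative assumptions.
n = 2
Q, R_inv = np.eye(n), np.array([[1.0]])
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])

def f(x): return A @ x                      # df/dx = A for this toy system
def g(x): return B

W = np.zeros((n, n))                        # linear critic: lambda_{k+1} = W x_k
rng = np.random.default_rng(2)

for it in range(5000):
    x_k = rng.uniform(-1, 1, size=n)        # sample a training state
    lam_next = W @ x_k                      # critic output at x_k
    u_k = -(R_inv @ g(x_k).T @ lam_next)    # optimal control equation
    x_next = f(x_k) + g(x_k) @ u_k
    lam_next2 = W @ x_next                  # critic output at x_{k+1}
    target = Q @ x_next + A.T @ lam_next2   # costate equation at stage k+1
    W += 0.01 * np.outer(target - lam_next, x_k)  # move critic toward target

print(W)
```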

Relevance: 20.00%

Abstract:

The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two-timescale stochastic approximation. Convergence analysis, approximation issues and an example are studied.

Relevance: 20.00%

Abstract:

An optimal control law for a general nonlinear system can be obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation. However, it is difficult to obtain an analytical solution of this equation even for a moderately complex system. In this paper, we propose a continuous-time single network adaptive critic scheme for nonlinear control-affine systems where the optimal cost-to-go function is approximated using a parametric positive semi-definite function. Unlike earlier approaches, a continuous-time weight update law is derived from the HJB equation. The stability of the system is analysed during the evolution of weights using Lyapunov theory. The effectiveness of the scheme is demonstrated through simulation examples.
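For orientation, the standard continuous-time HJB structure for a control-affine system, which such a scheme exploits, is (generic notation, not the paper's parameterization):

```latex
% Standard HJB structure for \dot{x} = f(x) + g(x)u with running cost
% q(x) + \tfrac{1}{2} u^\top R u.
\begin{align*}
  0 &= \min_u \left[ q(x) + \tfrac{1}{2} u^{\top} R u
        + V_x^{\top}\left( f(x) + g(x)\,u \right) \right] \\
  u^{*} &= -R^{-1} g(x)^{\top} V_x
\end{align*}
% With V(x) approximated by a parametric positive semi-definite function,
% substituting u^* back into the HJB equation yields the residual from
% which a continuous-time weight update law can be derived.
```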

Relevance: 20.00%

Abstract:

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.

Relevance: 20.00%

Abstract:

We develop a simulation-based algorithm for finite-horizon Markov decision processes with finite state and action spaces. Illustrative numerical experiments with the proposed algorithm are shown for problems in flow control of communication networks and capacity switching in semiconductor fabrication.

Relevance: 20.00%

Abstract:

We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov decision processes with finite state and action spaces, with a discounted reward criterion. The algorithm is of the gradient ascent type and performs a search in the space of stationary randomized policies. The algorithm uses certain simultaneous deterministic perturbation stochastic approximation (SDPSA) gradient estimates for enhanced performance. We show an application of our algorithm on a problem of mortgage refinancing. Our algorithm obtains the optimal refinancing strategies in a computationally efficient manner.
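The deterministic-perturbation idea can be sketched as cycling through fixed ±1 vectors in place of random ones; rows of a Hadamard matrix with the all-ones column removed are one common construction. The objective and step sizes below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy.linalg import hadamard

# SPSA with a deterministic perturbation sequence: cycle through rows of a
# Hadamard matrix (all-ones column dropped) instead of drawing random +/-1
# vectors. Over a full cycle the cross-coordinate bias terms cancel because
# distinct Hadamard columns are orthogonal.
dim = 3
H = hadamard(4)                 # smallest power-of-2 order >= dim + 1
deltas = H[:, 1:dim + 1]        # drop the all-ones column; cycle all rows

def J(theta):                   # stand-in for a simulated reward (assumption)
    return -float(np.sum((theta - 0.5) ** 2))

theta = np.zeros(dim)
for n in range(1, 3001):
    delta = deltas[n % len(deltas)].astype(float)
    c, a_n = 0.1, 0.5 / n
    ghat = (J(theta + c * delta) - J(theta - c * delta)) / (2 * c) * (1 / delta)
    theta += a_n * ghat         # gradient ascent on the reward criterion
print(theta)                    # approaches (0.5, 0.5, 0.5)
```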

Relevance: 20.00%

Abstract:

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
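Schematically, the constrained average-cost problem and its Lagrangian relaxation take the following form (generic notation, one constraint shown for brevity):

```latex
% Constrained long-run average-cost MDP and its Lagrangian relaxation
\begin{align*}
  \min_{\theta}\; & J(\theta) = \lim_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}\!\left[ \sum_{t=0}^{T-1} c(s_t, a_t) \,\middle|\, \pi_\theta \right] \\
  \text{s.t.}\; & G(\theta) = \lim_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}\!\left[ \sum_{t=0}^{T-1} g(s_t, a_t) \,\middle|\, \pi_\theta \right] \le \alpha \\
  & L(\theta, \gamma) = J(\theta) + \gamma \left( G(\theta) - \alpha \right),
      \qquad \gamma \ge 0
\end{align*}
% The actor descends in theta while the multiplier gamma ascends on a
% slower timescale, driving the iterates toward a locally optimal
% feasible point.
```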

Relevance: 20.00%

Abstract:

To combine the advantages of both stability and optimality-based designs, a single network adaptive critic (SNAC) aided nonlinear dynamic inversion approach is presented in this paper. Here, the gains of a dynamic inversion controller are selected in such a way that the resulting controller behaves very close to a pre-synthesized SNAC controller in the output regulation sense. Because SNAC is based on optimal control theory, it makes the dynamic inversion controller operate nearly optimally. More importantly, it retains the two major benefits of dynamic inversion, namely (i) a closed-form expression for the controller and (ii) easy scalability to command tracking applications without knowing the reference commands a priori. An extended architecture is also presented that adapts online to system modeling and inversion errors, as well as reduced control effectiveness, thereby leading to enhanced robustness. The strengths of this hybrid method of applying SNAC to optimize a nonlinear dynamic inversion controller are demonstrated on a benchmark problem in robotics, namely a two-link robotic manipulator system.
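The dynamic inversion backbone being tuned can be sketched in its textbook form for a control-affine system with output y = h(x); the gain K is the quantity matched against the SNAC controller, and this is a generic relative-degree-one sketch rather than the paper's derivation:

```latex
% Textbook dynamic inversion for \dot{x} = f(x) + g(x)u, output y = h(x):
% impose first-order tracking-error dynamics and invert for the control.
\begin{align*}
  \dot{y}_{\text{des}} &= \dot{y}_{\text{ref}} - K \left( y - y_{\text{ref}} \right) \\
  u &= \left( \frac{\partial h}{\partial x}\, g(x) \right)^{-1}
       \left( \dot{y}_{\text{des}} - \frac{\partial h}{\partial x}\, f(x) \right)
\end{align*}
% Choosing K so that the closed-loop response mimics the pre-synthesized
% SNAC controller keeps the closed-form controller while inheriting
% near-optimal output regulation.
```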