987 results for Brazilian critic
Abstract:
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
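As a rough illustration of the SPSA gradient estimate mentioned in the abstract above, the following minimal sketch perturbs all parameters simultaneously with random ±1 signs and needs only two function evaluations per estimate. The function name `spsa_gradient`, the perturbation size `c`, and the quadratic test objective are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def spsa_gradient(f, theta, c=1e-2, rng=None):
    """Two-sided SPSA estimate of the gradient of f at theta (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    # Rademacher perturbation: each component is +1 or -1 with equal probability.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # Only two evaluations of f are needed, whatever the dimension of theta.
    diff = f(theta + c * delta) - f(theta - c * delta)
    return diff / (2.0 * c * delta)

def quadratic(x):
    # Hypothetical test objective with known gradient 2 * x.
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5])
# A single estimate is noisy; averaging many shows it is unbiased here.
estimates = np.array([spsa_gradient(quadratic, theta, rng=rng) for _ in range(20000)])
mean_estimate = estimates.mean(axis=0)
```

Each individual estimate is noisy, but its average approaches the true gradient `2 * theta` for this objective, which is why such estimates are usually paired with diminishing step sizes.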
Abstract:
Even though dynamic programming offers an optimal control solution in state feedback form, the method is overwhelmed by computational and storage requirements. Approximate dynamic programming implemented with an Adaptive Critic (AC) neural network structure has evolved as a powerful alternative technique that obviates the need for excessive computation and storage in solving optimal control problems. In this paper, an improvement to the AC architecture, called the "Single Network Adaptive Critic (SNAC)", is presented. This approach is applicable to a wide class of nonlinear systems where the optimal control (stationary) equation can be explicitly expressed in terms of the state and costate variables. The selection of this terminology is guided by the fact that it eliminates the use of one neural network (namely the action network) that is part of a typical dual-network AC setup. As a consequence, the SNAC architecture offers three potential advantages: a simpler architecture, a lower computational load, and elimination of the approximation error associated with the eliminated network. In order to demonstrate these benefits and the control synthesis technique using SNAC, two problems have been solved with the AC and SNAC approaches and their computational performances are compared. One of these problems is a real-life Micro-Electro-Mechanical-System (MEMS) problem, which demonstrates that the SNAC technique is applicable to complex engineering systems.
Abstract:
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two-timescale stochastic approximation. Convergence analysis, approximation issues, and an example are studied.
Abstract:
An optimal control law for a general nonlinear system can be obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation. However, it is difficult to obtain an analytical solution of this equation even for a moderately complex system. In this paper, we propose a continuous-time single network adaptive critic scheme for nonlinear control-affine systems in which the optimal cost-to-go function is approximated using a parametric positive semi-definite function. Unlike earlier approaches, a continuous-time weight update law is derived from the HJB equation. The stability of the system during the evolution of the weights is analysed using Lyapunov theory. The effectiveness of the scheme is demonstrated through simulation examples.
Abstract:
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
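To make the critic/actor split described above concrete, here is a minimal tabular sketch: a TD(0) critic supplies the TD error, and a softmax actor takes a stochastic gradient step weighted by that error, with a smaller step size for the actor than for the critic. It is not one of the four algorithms from the paper (those use function approximation and natural gradients); the toy two-state environment, the discounted setting, the step sizes, and the name `run_actor_critic` are all assumptions for illustration.

```python
import numpy as np

def softmax(prefs):
    z = np.exp(prefs - prefs.max())
    return z / z.sum()

def run_actor_critic(steps=20000, gamma=0.9, alpha_critic=0.1,
                     alpha_actor=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions = 2, 2
    v = np.zeros(n_states)                    # critic: tabular state values
    theta = np.zeros((n_states, n_actions))   # actor: softmax preferences
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = rng.choice(n_actions, p=pi)
        # Toy dynamics (assumed): action 1 pays reward 1, action 0 pays 0,
        # and the next state is drawn uniformly at random.
        r = float(a == 1)
        s_next = rng.integers(n_states)
        # TD error computed from the critic's current estimates.
        td_error = r + gamma * v[s_next] - v[s]
        # Critic update (larger step size, i.e. the faster timescale).
        v[s] += alpha_critic * td_error
        # Actor update: the gradient of log softmax is one_hot(a) - pi.
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta[s] += alpha_actor * td_error * grad_log_pi
        s = s_next
    return theta, v

theta, v = run_actor_critic()
policy = np.array([softmax(theta[s]) for s in range(2)])
```

In this toy problem the learned `policy` should put most of its probability on action 1 in both states, since that action always yields the higher reward.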
Abstract:
We develop a simulation-based algorithm for finite horizon Markov decision processes with finite state and action spaces. Illustrative numerical experiments with the proposed algorithm are shown for problems in flow control of communication networks and capacity switching in semiconductor fabrication.
Abstract:
We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov decision processes with finite state and action spaces, with a discounted reward criterion. The algorithm is of the gradient ascent type and performs a search in the space of stationary randomized policies. The algorithm uses certain simultaneous deterministic perturbation stochastic approximation (SDPSA) gradient estimates for enhanced performance. We show an application of our algorithm on a problem of mortgage refinancing. Our algorithm obtains the optimal refinancing strategies in a computationally efficient manner.
Abstract:
We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
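The Lagrange multiplier approach mentioned above can be sketched as follows. The symbols J, G_i, \alpha_i, \lambda_i and the step sizes a_n, b_n are notation assumed here for illustration; the abstract does not state the paper's exact formulation.

```latex
% Assumed problem: minimize J(\theta) subject to G_i(\theta) \le \alpha_i, \; i = 1, \dots, N.
L(\theta, \lambda) = J(\theta) + \sum_{i=1}^{N} \lambda_i \bigl( G_i(\theta) - \alpha_i \bigr),
\qquad \lambda_i \ge 0
% Primal descent in \theta and projected dual ascent in \lambda:
\theta_{n+1} = \theta_n - a_n \, \widehat{\nabla}_\theta L(\theta_n, \lambda_n),
\qquad
\lambda_{i,n+1} = \max\bigl( 0, \; \lambda_{i,n} + b_n \bigl( G_i(\theta_n) - \alpha_i \bigr) \bigr)
```

In schemes of this kind the multipliers are typically updated on a slower timescale than the policy parameters, so that the policy sees nearly fixed multipliers.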
Abstract:
To combine the advantages of both stability and optimality-based designs, a single network adaptive critic (SNAC) aided nonlinear dynamic inversion approach is presented in this paper. Here, the gains of a dynamic inversion controller are selected in such a way that the resulting controller behaves very close to a pre-synthesized SNAC controller in the output regulation sense. Because SNAC is based on optimal control theory, it makes the dynamic inversion controller operate nearly optimally. More importantly, it retains the two major benefits of dynamic inversion, namely (i) a closed-form expression of the controller and (ii) easy scalability to command tracking applications without knowing the reference commands a priori. An extended architecture is also presented in this paper that adapts online to system modeling and inversion errors, as well as reduced control effectiveness, thereby leading to enhanced robustness. The strengths of this hybrid method of applying SNAC to optimize a nonlinear dynamic inversion controller are demonstrated by considering a benchmark problem in robotics, namely a two-link robotic manipulator system. Copyright (C) 2013 John Wiley & Sons, Ltd.
Abstract:
Analyzes the aspects of the Brazilian political structure that contribute to failures in decision-making processes and to ineffective final outcomes of government policies, with emphasis on the institutional problems of the legislative branch. Covers the following topics: fragmentation of the party system, party disloyalty, the model of federalism, the filling of public offices, the extensive use of provisional measures (medidas provisórias), and dysfunctions of the legislative process.
Abstract:
This work starts from an analysis of interviews, articles, and other texts written by Glauber de Andrade Rocha between 1959 and 1979 in order to understand the trajectory of Brazilian cinema's development and its relations with national politics and the economy. Glauber Rocha, a filmmaker and one of the leading members of Brazilian Cinema Novo, became internationally known for a filmography marked by social criticism and by its political and didactic aims. His work as a film critic and his participation in debates on public policies to foster the film industry in Brazil would earn him the title of leader of the Cinema Novo movement and would establish him, throughout the 1960s and 1970s, as one of the country's most politically prominent artist-intellectuals.
Abstract:
This thesis aims to reflect on pedagogical and philosophical principles for science teaching at the intermediate stage of school education. Considering that both educational practice and scientific practice are social practices that mediate the production process, and that they therefore cannot be thought of outside a method that integrates them dialectically on the basis of determinants given in the field of political economy, we sought to investigate the status currently reserved for science within the framework of values introduced by neoliberal political economy, and the effects of these changes on what is prescribed for scientific training in Brazilian secondary education (ensino médio) since the last educational reform (LDBEN/1996). The aim was to highlight the connections that have been established between the processes of universalization of the commodity form and the changes introduced in the regime of knowledge production, which is increasingly shaped by the objectives and prescriptions of capital. Taking historical-dialectical materialism as its reference, the object of this thesis was outlined so as to reflect the process by which the production of science is constituted in two distinct spheres: that of macro-politics, presided over hegemonically by institutions linked to capital from the 1990s onward, and that of the epistemological relation underlying contemporary scientific practice, pointing out the correlation between these processes and their causal links. To account for these relations, a historical and philosophical investigation was carried out with the aim of showing how the concept of nature coined by the first scientists in the seventeenth century (the future matrix of the notion of natural sciences as it is understood in the curriculum today), grounded in a fixed distinction between judgments of fact and judgments of value, owes its content to a process that is ultimately economic and social. Through this critique, one can establish the links between political economy, the institutional side of science, and the universe of epistemology. It was concluded that there is a necessary relation between the new institutional register of knowledge production, guaranteed by a regulatory statute attuned to the demands of neoliberalism, and the new epistemological status, marked by an emphasis on the assumptions of naive scientific realism. This relation is projected onto science teaching in the form of an intensification of its technicist content, and among its characteristics we highlight two: 1) the concept of nature, taken in science teaching as a de-historicized abstraction; 2) the myth of scientific unicity, that is, the belief that there is only one science: the one that will formulate, in a single and unequivocal language, the truth of the real. Finally, we referred to two educational programs that, in our view, advance toward new forms of teaching insofar as they reflect the experience of a group of educators and students with the principles of polytechnic education: that of the Instituto de Educação Josué de Castro (IEJC/ITERRA) and that of the Escola Politécnica de Saúde Joaquim Venâncio (EPSJV/Fiocruz).
Abstract:
The length-weight relationship (LWR) parameters of 23 small pelagic fish species (belonging to 13 families) from the south-southeast Brazilian Exclusive Economic Zone in 1996 and 1997 are presented. The b values varied between 2.72 and 3.53. The samples for this study were collected during hydroacoustic surveys covering an area of 700 000 square km.
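For readers unfamiliar with LWR parameters: the relationship is conventionally written W = a * L^b, and a and b are commonly estimated by linear regression on log-transformed data. The sketch below shows that standard procedure; the function name `fit_lwr` and the synthetic, noise-free data are invented for illustration and are not from the survey described above.

```python
import numpy as np

def fit_lwr(lengths, weights):
    """Fit W = a * L**b by least squares on log10-transformed data."""
    log_l = np.log10(np.asarray(lengths, dtype=float))
    log_w = np.log10(np.asarray(weights, dtype=float))
    # log10(W) = log10(a) + b * log10(L): slope is b, intercept is log10(a).
    b, log_a = np.polyfit(log_l, log_w, 1)
    return 10.0 ** log_a, b

# Synthetic data generated with a = 0.01 and b = 3.0 (hypothetical values).
lengths_cm = np.array([8.0, 10.0, 12.5, 15.0, 20.0])
weights_g = 0.01 * lengths_cm ** 3.0
a_hat, b_hat = fit_lwr(lengths_cm, weights_g)
```

A value of b near 3 corresponds to isometric growth, so the reported range of 2.72 to 3.53 spans species on both sides of that reference value.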