Digital Library

974 results for dance critic

Continuous-time Single Network Adaptive Critic for Regulator Design of Nonlinear Control Affine Systems

Relevance:

20.00%

Publisher:

Abstract:

An optimal control law for a general nonlinear system can be obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation. However, it is difficult to obtain an analytical solution of this equation even for a moderately complex system. In this paper, we propose a continuous-time single network adaptive critic scheme for nonlinear control affine systems in which the optimal cost-to-go function is approximated using a parametric positive semi-definite function. Unlike earlier approaches, a continuous-time weight update law is derived from the HJB equation. The stability of the system during the evolution of the weights is analysed using Lyapunov theory. The effectiveness of the scheme is demonstrated through simulation examples.
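As a reference point, if one assumes an infinite-horizon quadratic cost (an assumption; the abstract does not state the cost functional), the HJB equation for a control-affine system ẋ = f(x) + g(x)u and the resulting optimal control take the standard form below. Here V is the optimal cost-to-go, and the weighting matrices Q ⪰ 0 and R ≻ 0 are introduced only for illustration. A critic of the kind described approximates V so that u* can be evaluated from the gradient of V.

```latex
\begin{align*}
\dot{x} &= f(x) + g(x)\,u, \qquad J = \int_{0}^{\infty} \left( x^\top Q x + u^\top R u \right) dt,\\
0 &= x^\top Q x + \nabla V(x)^\top f(x) - \tfrac{1}{4}\, \nabla V(x)^\top g(x) R^{-1} g(x)^\top \nabla V(x),\\
u^{*}(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^\top \nabla V(x).
\end{align*}
```

The second line follows by minimizing the Hamiltonian x^T Q x + u^T R u + ∇V(x)^T (f(x) + g(x)u) over u and substituting the minimizer given in the third line.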

Incremental natural-gradient actor-critic algorithms

Relevance:

20.00%

Publisher:

Abstract:

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
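A minimal tabular sketch of the basic actor-critic loop described above: a TD(0) critic estimates state values, and a softmax actor takes a stochastic policy-gradient step (ascent on expected reward, equivalently descent on its negative) using the TD error in place of the advantage. It uses the plain gradient rather than the natural gradient, and the corridor environment, step sizes and function names are hypothetical choices for illustration; it is not one of the paper's four algorithms.

```python
import math
import random

N_STATES = 4          # hypothetical corridor: states 0..3, state 3 is the terminal goal
LEFT, RIGHT = 0, 1
GAMMA = 0.9


def step(s, a):
    """Hypothetical toy environment: move left/right; reward 1 on reaching the goal."""
    s_next = max(0, s - 1) if a == LEFT else s + 1
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def softmax_probs(prefs):
    m = max(prefs)                      # subtract max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def train(episodes=500, alpha_critic=0.2, alpha_actor=0.1, seed=0):
    rng = random.Random(seed)
    v = [0.0] * N_STATES                            # critic: tabular state values
    theta = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: softmax action preferences
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:                 # cap episode length
            probs = softmax_probs(theta[s])
            a = RIGHT if rng.random() < probs[RIGHT] else LEFT
            s_next, r, done = step(s, a)
            # TD(0) error; the value of the terminal state is taken as zero
            target = r if done else r + GAMMA * v[s_next]
            delta = target - v[s]
            v[s] += alpha_critic * delta            # critic update (larger step size)
            # actor update: d log pi(a|s) / d theta[s][b] = 1[b == a] - pi(b|s)
            for b in (LEFT, RIGHT):
                indicator = 1.0 if b == a else 0.0
                theta[s][b] += alpha_actor * delta * (indicator - probs[b])
            s, t = s_next, t + 1
    return v, theta


v, theta = train()
policy_right = [softmax_probs(theta[s])[RIGHT] for s in range(N_STATES - 1)]
print([round(p, 2) for p in policy_right])
```

The critic's step size is larger than the actor's, loosely mirroring the two-timescale structure the abstract refers to; after training, the policy should prefer moving right in every non-terminal state.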

An Actor-Critic Algorithm for Finite Horizon Markov Decision Processes

Relevance:

20.00%

Publisher:

Abstract:

We develop a simulation based algorithm for finite horizon Markov decision processes with finite state and finite action space. Illustrative numerical experiments with the proposed algorithm are shown for problems in flow control of communication networks and capacity switching in semiconductor fabrication.

A Simultaneous Deterministic Perturbation Actor-Critic Algorithm with an Application to Optimal Mortgage Refinancing

Relevance:

20.00%

Publisher:

Abstract:

We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov decision processes with finite state and action spaces, with a discounted reward criterion. The algorithm is of the gradient ascent type and performs a search in the space of stationary randomized policies. The algorithm uses certain simultaneous deterministic perturbation stochastic approximation (SDPSA) gradient estimates for enhanced performance. We show an application of our algorithm on a problem of mortgage refinancing. Our algorithm obtains the optimal refinancing strategies in a computationally efficient manner.
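A hedged sketch of a two-sided gradient estimate driven by deterministic perturbations: the ±1 perturbation vectors cycle through the rows of a Hadamard matrix built by Sylvester's construction, so that over one full cycle the cross terms between coordinates cancel. This follows the general idea behind deterministic-perturbation SPSA rather than the paper's exact construction, and it is applied to a hypothetical smooth objective instead of the discounted reward of a randomized policy; all names are illustrative.

```python
def sylvester_hadamard(order):
    """Return a Hadamard matrix (list of rows of +/-1) via Sylvester's construction.

    `order` must be a power of two.
    """
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def perturbation_cycle(dim):
    """Hadamard rows truncated to `dim` columns; distinct columns remain orthogonal."""
    order = 1
    while order < dim:
        order *= 2
    return [row[:dim] for row in sylvester_hadamard(order)]


def two_sided_estimate(objective, theta, perturbation, c=0.01):
    """Two-measurement gradient estimate along a +/-1 perturbation vector."""
    plus = [t + c * d for t, d in zip(theta, perturbation)]
    minus = [t - c * d for t, d in zip(theta, perturbation)]
    slope = (objective(plus) - objective(minus)) / (2.0 * c)
    return [slope / d for d in perturbation]


def deterministic_perturbation_ascent(objective, theta, steps=400, step_size=0.1, c=0.01):
    """Gradient ascent cycling through the perturbations in a fixed order.

    The step size must be small relative to 1/len(theta) for this toy to be stable.
    """
    cycle = perturbation_cycle(len(theta))
    theta = list(theta)
    for n in range(steps):
        g = two_sided_estimate(objective, theta, cycle[n % len(cycle)], c)
        theta = [t + step_size * gi for t, gi in zip(theta, g)]
    return theta


# Hypothetical concave objective with its maximum at TARGET.
TARGET = [1.0, -2.0, 0.5]


def objective(theta):
    return -sum((t - m) ** 2 for t, m in zip(theta, TARGET))


theta_hat = deterministic_perturbation_ascent(objective, [0.0, 0.0, 0.0])
print([round(t, 4) for t in theta_hat])
```

For a quadratic objective such as this toy one, averaging the per-step estimates over one full cycle of perturbations recovers the exact gradient, because distinct Hadamard columns are orthogonal; with random perturbations that cancellation would hold only in expectation.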

An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Relevance:

20.00%

Publisher:

Abstract:

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
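The Lagrange multiplier idea can be illustrated on a hypothetical deterministic toy problem: minimize J(θ) = (θ − 3)² subject to G(θ) = θ ≤ 1. The parameter descends the Lagrangian L(θ, λ) = J(θ) + λ(G(θ) − 1) with a larger step size, while the multiplier ascends it with a smaller one and is projected onto λ ≥ 0. This sketches only the primal-dual mechanism, with no simulation, critic or function approximation, and is not the paper's algorithm.

```python
def lagrangian_primal_dual(iterations=5000, a=0.1, b=0.01):
    """Two-timescale primal-dual iteration on a hypothetical constrained toy problem.

    minimize J(theta) = (theta - 3)^2  subject to  G(theta) = theta <= 1
    Lagrangian: L(theta, lam) = J(theta) + lam * (G(theta) - 1)
    """
    theta, lam = 0.0, 0.0
    for _ in range(iterations):
        grad_theta = 2.0 * (theta - 3.0) + lam          # dL/dtheta
        constraint_violation = theta - 1.0              # dL/dlam = G(theta) - 1
        theta -= a * grad_theta                         # faster timescale: descend in theta
        lam = max(0.0, lam + b * constraint_violation)  # slower timescale: ascend, keep lam >= 0
    return theta, lam


theta_star, lam_star = lagrangian_primal_dual()
# Should approach the toy problem's KKT point: theta = 1, lambda = 4.
print(round(theta_star, 4), round(lam_star, 4))
```

The KKT point follows from stationarity, 2(θ − 3) + λ = 0, with the constraint active at θ = 1, which gives λ = 4. Updating the multiplier on the slower timescale lets θ track the minimizer of the Lagrangian for the current λ.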

Single network adaptive critic aided dynamic inversion for optimal regulation and command tracking with online adaptation for enhanced robustness

Relevance:

20.00%

Publisher:

Abstract:

To combine the advantages of both stability- and optimality-based designs, a single network adaptive critic (SNAC) aided nonlinear dynamic inversion approach is presented in this paper. Here, the gains of a dynamic inversion controller are selected so that the resulting controller closely matches the behavior of a pre-synthesized SNAC controller in the output regulation sense. Because SNAC is based on optimal control theory, it makes the dynamic inversion controller operate nearly optimally. More importantly, the approach retains the two major benefits of dynamic inversion, namely (i) a closed-form expression of the controller and (ii) easy scalability to command tracking applications without knowing the reference commands a priori. An extended architecture is also presented in this paper that adapts online to system modeling and inversion errors, as well as reduced control effectiveness, thereby leading to enhanced robustness. The strengths of this hybrid method of applying SNAC to optimize a nonlinear dynamic inversion controller are demonstrated on a benchmark problem in robotics, namely a two-link robotic manipulator system. Copyright (C) 2013 John Wiley & Sons, Ltd.

Solving sensor network coverage problems by distributed asynchronous actor critic methods

Relevance:

20.00%

Publisher:

Distributed multi-agent actor-critic algorithms with applications to stochastic path finding problems

Relevance:

20.00%

Publisher:

From literature to dance: Gertrude Stein's poetic prose in intersemiotic translation

Relevance:

20.00%

Publisher:

Abstract:

The scope of this thesis is the relationship between the poetic prose of the American writer Gertrude Stein, through her portraits and plays, and intersemiotic translations into contemporary dance. The analytical corpus brings together Gertrude Stein's portraits Orta or One Dancing, If I Told Him: A Completed Portrait of Picasso and A Valentine to Sherwood Anderson, and her plays Four Saints in Three Acts, Listen to Me and Three Sisters Who Are Not Sisters, with the dance works [5.sobre.o.mesmo], Shutters Shut, Always Now Slowly and ,e[dez episódios sobre a prosa topovisual de gertrude stein]. The nature of the two fields under comparison, literature & dance, required combining two lines of study tied to the performative and translational specificities of the selected objects: on one side, we follow directions that emerged from a specific offshoot of traditional Comparative Studies, namely Interarts Studies or Comparative Arts; on the other, Intermediality Studies, which are related to Media Studies. The comparative approach to the examples analyzed draws on Translation Studies, with special reference to Haroldo de Campos's notion of transcreation, and on the semiotics of Charles S. Peirce. In the first chapter, we define our theoretical approach; next, we present Gertrude Stein's work and the main properties that made it one of the principal literary and aesthetic references of the twentieth century; finally, we analyze the translations, paying special attention to the transcreation of Stein's perception of time and syntactic construction. We conclude by suggesting that translations into dance are modes of interpreting and reading literary texts, as well as radical forms of art or literary criticism.

Natural Belief-Critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems

Relevance:

20.00%

Publisher:

Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs

Relevance:

20.00%

Publisher:

Abstract:

This article presents a novel algorithm for learning parameters in statistical dialogue systems which are modeled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy that selects the system's responses based on the inferred state; and a reward function that specifies the desired behavior of the system. Ideally both the model parameters and the policy would be designed to maximize the cumulative reward. However, while there are many techniques available for learning the optimal policy, no good ways of learning the optimal model parameters that scale to real-world dialogue systems have been found yet. The presented algorithm, called the Natural Actor and Belief Critic (NABC), is a policy gradient method that offers a solution to this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected cumulative reward. The resulting gradient is then used to adapt both the prior distribution of the dialogue model parameters and the policy parameters. In addition, the article presents a variant of the NABC algorithm, called the Natural Belief Critic (NBC), which assumes that the policy is fixed and only the model parameters need to be estimated. The algorithms are evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximize the expected cumulative reward result in significantly improved performance compared to the baseline hand-crafted model parameters. The algorithms are also compared to optimization techniques using plain gradients and state-of-the-art random search algorithms. In all cases, the algorithms based on the natural gradient work significantly better. © 2011 ACM.

The double dance of agency: a socio-theoretic account of how machines and humans interact

Relevance:

20.00%

Publisher:

Abstract:

The nature of the relationship between information technology (IT) and organizations has been a long-standing debate in the Information Systems literature. Does IT shape organizations, or do people in organisations control how IT is used? To formulate the question a little differently: does agency (the capacity to make a difference) lie predominantly with machines (computer systems) or humans (organisational actors)? Many proposals for a middle way between the extremes of technological and social determinism have been advanced; in recent years researchers oriented towards social theories have focused on structuration theory and (lately) actor network theory. These two theories, however, adopt different and incompatible views of agency. Thus, structuration theory sees agency as exclusively a property of humans, whereas the principle of general symmetry in actor network theory implies that machines may also be agents. Drawing on critiques of both structuration theory and actor network theory, this paper develops a theoretical account of the interaction between human and machine agency: the double dance of agency. The account seeks to contribute to theorisation of the relationship between technology and organisation by recognizing both the different character of human and machine agency, and the emergent properties of their interplay.

Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

Relevance:

20.00%

Publisher:

Abstract:

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
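For reference, the continuous-time TD error of Doya (2000), which the abstract says the model extends to spiking neurons, is usually written as follows. Here τ is the time constant of reward discounting and V is the discounted future reward that the critic learns to predict. This is the rate-based form; the spiking implementation in the paper is not reproduced here.

```latex
\begin{align*}
V(t) &= \int_{t}^{\infty} e^{-(s - t)/\tau}\, r(s)\, ds,\\
\delta(t) &= r(t) - \frac{1}{\tau}\, V(t) + \dot{V}(t).
\end{align*}
```

Differentiating the first line gives dV/dt = V(t)/τ − r(t), so δ(t) is zero for the exact value function; for the critic's estimate, a nonzero δ(t) is the reward prediction error that drives learning in both critic and actor.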

from Webern to dance to Burton.

Relevance:

20.00%

Publisher:

Abstract:

Commissioned by the Concorde Ensemble. Paul Roe gave the premiere performance at the RHA Gallery, Dublin, on 23rd February 2014. The piece is informed by the choreography of Jiri Kylian, in particular two three-minute sections of his work No More Play, initially choreographed to a score by Webern (his Five Movements for String Quartet), hence the title.

Dance of the waves

Relevance:

20.00%

Publisher:

Abstract:

An Impressionistic piece. Dance of the Waves uses an expansive structure (A, A2, B, A2, Solos, D, C2, D). The piece is an atmospheric soundscape that evokes imagery of the ocean.
