33 results for continuous-time asymptotics
in Cambridge University Engineering Department Publications Database
Abstract:
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
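As a rough illustration of the quantity the critic is said to signal, the continuous-time TD error of Doya (2000) can be written as delta(t) = r(t) - V(t)/tau + dV/dt, where tau is the discount time constant. The sketch below evaluates it on sampled traces with a forward finite difference. It is not the spiking actor-critic model of the abstract; the function name, the list-based traces and the finite-difference scheme are assumptions made for illustration only.

```python
def continuous_td_error(reward, value, dt, tau):
    """Approximate delta(t) = r(t) - V(t)/tau + dV/dt on a sampled trace.

    reward, value: equal-length lists sampled every dt seconds (hypothetical inputs).
    tau: discount time constant of the value function.
    Returns one TD error per sample except the last, because the forward
    difference needs the next value sample.
    """
    if len(reward) != len(value):
        raise ValueError("reward and value traces must have the same length")
    deltas = []
    for k in range(len(value) - 1):
        dv_dt = (value[k + 1] - value[k]) / dt  # forward-difference estimate of dV/dt
        deltas.append(reward[k] - value[k] / tau + dv_dt)
    return deltas
```

For a constant reward r, the self-consistent prediction is V = tau * r, and the TD error then vanishes: there is nothing left for the critic to learn.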
Abstract:
The classes of continuous-time flows on R^{n×p} that induce the same flow on the set of p-dimensional subspaces of R^{n×p} are described. The power flow is briefly reviewed in this framework, and a subspace generalization of the Rayleigh quotient flow [Linear Algebra Appl. 368C, 2003, pp. 343-357] is proposed and analyzed. This new flow displays a property akin to deflation in finite time. © 2008 Yokohama Publishers.
Abstract:
In this paper, we describe models and algorithms for detection and tracking of group and individual targets. We develop two novel group dynamical models, within a continuous-time setting, that aim to mimic behavioural properties of groups. We also describe two possible ways of modeling interactions between closely spaced targets, using Markov Random Field (MRF) and repulsive forces. These can be combined with a group structure transition model to create realistic evolving group models. We use a Markov Chain Monte Carlo (MCMC)-Particles Algorithm to perform sequential inference. Computer simulations demonstrate the ability of the algorithm to detect and track targets within groups, as well as infer the correct group structure over time. ©2008 IEEE.
Abstract:
In this paper we present a new, compact derivation of state-space formulae for the so-called discretisation-based solution of the H∞ sampled-data control problem. Our approach is based on the established technique of continuous time-lifting, which is used to isometrically map the continuous-time, linear, periodically time-varying, sampled-data problem to a discrete-time, linear, time-invariant problem. State-space formulae are derived for the equivalent, discrete-time problem by solving a set of two-point, boundary-value problems. The formulae accommodate a direct feed-through term from the disturbance inputs to the controlled outputs of the original plant and are simple, requiring the computation of only a single matrix exponential. It is also shown that the resultant formulae can be easily re-structured to give a numerically robust algorithm for computing the state-space matrices. © 1997 Elsevier Science Ltd. All rights reserved.
Abstract:
Model compensation is a standard way of improving the robustness of speech recognition systems to noise. A number of popular schemes are based on vector Taylor series (VTS) compensation, which uses a linear approximation to represent the influence of noise on the clean speech. To compensate the dynamic parameters, the continuous time approximation is often used. This approximation uses a point estimate of the gradient, which fails to take into account that dynamic coefficients are a function of a number of consecutive static coefficients. In this paper, the accuracy of dynamic parameter compensation is improved by representing the dynamic features as a linear transformation of a window of static features. A modified version of VTS compensation is applied to the distribution of the window of static features and, importantly, their correlations. These compensated distributions are then transformed to distributions over standard static and dynamic features. With this improved approximation, it is also possible to obtain full-covariance corrupted speech distributions. This addresses the correlation changes that occur in noise. The proposed scheme outperformed the standard VTS scheme by 10% to 20% relative on a range of tasks. © 2006 IEEE.
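To make concrete the statement that a dynamic coefficient is a linear function of a window of consecutive static coefficients, the sketch below builds one row of such a linear transformation. It assumes the widely used regression formula for delta coefficients, delta_t = sum_{i=1..w} i * (x_{t+i} - x_{t-i}) / (2 * sum_{i=1..w} i^2); the abstract does not state which formula the paper uses, and the function names and window layout here are hypothetical.

```python
def delta_weights(half_width):
    """Weights of the linear map from 2*half_width + 1 static values to one delta.

    Assumes the common regression formula; the weight on x_{t+i} is
    i / (2 * sum_{j=1..half_width} j^2), for i = -half_width .. half_width.
    """
    norm = 2.0 * sum(j * j for j in range(1, half_width + 1))
    return [i / norm for i in range(-half_width, half_width + 1)]


def delta_feature(window, half_width):
    """Apply the weights to a window [x_{t-w}, ..., x_t, ..., x_{t+w}]."""
    weights = delta_weights(half_width)
    if len(window) != len(weights):
        raise ValueError("window must contain 2 * half_width + 1 static values")
    return sum(w * x for w, x in zip(weights, window))
```

Because the map is linear, a Gaussian over the stacked window of static features maps to a Gaussian over the dynamic feature, which is the property that compensating the window rather than a point estimate of the gradient relies on. On a linear ramp of static values, the delta equals the slope of the ramp.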