917 results for continuous-time asymptotics
Abstract:
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
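As a rough numerical sketch of the reward prediction error mentioned above, the snippet below assumes the continuous TD error has the form delta(t) = r(t) - V(t)/tau + dV/dt, as in Doya (2000), with the time derivative replaced by a forward difference. The function name `td_error`, the step `dt`, and the time constant `tau` are illustrative choices, not details taken from the abstract, and the spiking implementation is not modelled.

```python
def td_error(reward, v_now, v_next, dt, tau):
    """Discretised continuous-time TD error (assumed form, see lead-in).

    reward : instantaneous reward r(t)
    v_now  : critic's value estimate V(t)
    v_next : critic's value estimate V(t + dt)
    dt     : time step used for the forward difference
    tau    : reward discount time constant
    """
    v_dot = (v_next - v_now) / dt  # forward-difference estimate of dV/dt
    return reward - v_now / tau + v_dot


# With a constant reward r, the self-consistent value is V = r * tau,
# so a perfect critic produces zero TD error.
print(td_error(reward=2.0, v_now=1.0, v_next=1.0, dt=0.001, tau=0.5))  # 0.0

# A critic that overestimates the value yields a negative error.
print(td_error(reward=2.0, v_now=2.0, v_next=2.0, dt=0.001, tau=0.5))  # -2.0
```

A positive error would indicate that outcomes are better than predicted, a negative one that they are worse.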
Abstract:
The classes of continuous-time flows on R^{n×p} that induce the same flow on the set of p-dimensional subspaces of R^{n×p} are described. The power flow is briefly reviewed in this framework, and a subspace generalization of the Rayleigh quotient flow [Linear Algebra Appl. 368C, 2003, pp. 343-357] is proposed and analyzed. This new flow displays a property akin to deflation in finite time. © 2008 Yokohama Publishers.
Abstract:
A continuous-time 7th-order Butterworth Gm-C low-pass filter (LPF) with an on-chip automatic tuning circuit has been implemented for a direct-conversion DBS tuner in a 0.35 μm SiGe BiCMOS technology. The filter's -3 dB cutoff frequency f_0 can be tuned from 4 MHz to 40 MHz. A novel translinear transconductor (Gm) cell is used to implement the widely tunable, highly linear filter. The filter has -0.5 dB passband gain, 28 nV/√Hz input-referred noise, -2 dBVrms passband IIP3, and 24 dBVrms stopband IIP3. The I/Q LPFs with the tuning circuit draw 16 mA (with f_0 = 20 MHz) from a 3.3 V supply and occupy an area of 0.45 mm².
Abstract:
A continuous-time 7th-order Butterworth Gm-C low-pass filter (LPF) with an on-chip automatic tuning circuit has been implemented for a direct-conversion DBS tuner in 0.35 μm SiGe BiCMOS technology. The filter's -3 dB cutoff frequency f_0 can be tuned from 4 to 40 MHz. A novel on-chip automatic tuning scheme has been successfully realized to tune and lock the filter's cutoff frequency. Measurement results show that the filter has -0.5 dB passband gain, ±5% bandwidth accuracy, 30 nV/√Hz input-referred noise, -3 dBVrms passband IIP3, and 27 dBVrms stopband IIP3. The I/Q LPFs with the tuning circuit draw 13 mA (with f_0 = 20 MHz) from a 5 V supply and occupy 0.5 mm².
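To make the "-3 dB cutoff" of a 7th-order Butterworth response concrete, here is a minimal sketch of the ideal magnitude response |H(jf)| = 1 / sqrt(1 + (f/f_0)^(2n)). It assumes an ideal unity-gain prototype, so it does not reproduce the measured -0.5 dB passband gain or any circuit-level behaviour; the function name is hypothetical.

```python
import math


def butterworth_gain_db(f_hz, f0_hz, order=7):
    """Gain in dB of an ideal Butterworth low-pass filter at frequency f_hz.

    f0_hz is the -3 dB cutoff frequency; order is the filter order n.
    """
    ratio = f_hz / f0_hz
    return -10.0 * math.log10(1.0 + ratio ** (2 * order))


f0 = 20e6  # cutoff setting used for the quoted supply current, in Hz
print(round(butterworth_gain_db(f0, f0), 2))   # -3.01 (the -3 dB point)
print(butterworth_gain_db(2 * f0, f0) < -42)   # True: steep roll-off one octave above f0
```

Changing `f0` between 4e6 and 40e6 mimics the tuning range quoted above; the shape of the curve relative to `f0` stays the same.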
Abstract:
Gough, John (2004), 'Holevo-Ordering and the Continuous-Time Limit for Open Floquet Dynamics', Letters in Mathematical Physics 67(3), pp. 207-221. RAE2008
Abstract:
This paper analyzes a class of common-component allocation rules, termed no-holdback (NHB) rules, in continuous-review assemble-to-order (ATO) systems with positive lead times. The inventory of each component is replenished following an independent base-stock policy. In contrast to the usually assumed first-come-first-served (FCFS) component allocation rule in the literature, an NHB rule allocates a component to a product demand only if it will yield immediate fulfillment of that demand. We identify metrics as well as cost and product structures under which NHB rules outperform all other component allocation rules. For systems with certain product structures, we obtain key performance expressions and compare them to those under FCFS. For general product structures, we present performance bounds and approximations. Finally, we discuss the applicability of these results to more general ATO systems. © 2010 INFORMS.
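The allocation principle behind a no-holdback rule can be sketched as follows: components are committed to a demand only when every component the product needs is on hand, so nothing is reserved for a demand that must wait. This is a simplified illustration with hypothetical names and data; it ignores lead times, replenishment, and the ordering of backlogged demands that the paper analyzes.

```python
def nhb_allocate(inventory, requirement):
    """Try to fill one product demand under a no-holdback style rule.

    inventory   : dict mapping component name -> units on hand
    requirement : dict mapping component name -> units needed by the product
    Returns the updated inventory if the demand is filled immediately,
    or None if it cannot be filled (no component is held back for it).
    """
    can_fill = all(inventory.get(c, 0) >= q for c, q in requirement.items())
    if not can_fill:
        return None
    updated = dict(inventory)
    for component, qty in requirement.items():
        updated[component] = updated.get(component, 0) - qty
    return updated


stock = {"cpu": 1, "case": 0}
print(nhb_allocate(stock, {"cpu": 1, "case": 1}))  # None: the cpu stays available
print(nhb_allocate(stock, {"cpu": 1}))             # {'cpu': 0, 'case': 0}
```

Under FCFS, by contrast, the first demand would typically claim the cpu while waiting for a case, which could delay the second demand.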
Abstract:
The key problems in discussing stochastic monotonicity and duality for continuous-time Markov chains are to give criteria for existence and uniqueness and to construct the associated monotone processes in terms of their infinitesimal q-matrices. In their recent paper, Chen and Zhang [6] discussed these problems under the condition that the given q-matrix Q is conservative. The aim of this paper is to generalize their results to the case in which the given q-matrix Q is not necessarily conservative. New problems arise in removing the conservative assumption. The existence and uniqueness criteria for this general case are given in this paper. Another important problem, the construction of all stochastically monotone Q-processes, is also considered.
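For readers unfamiliar with the term, a q-matrix has non-negative off-diagonal entries and row sums that are at most zero; it is called conservative when every row sums to exactly zero. The sketch below checks this for a finite matrix given as a list of rows. The finite state space and the function name are simplifying assumptions; the setting of the paper is more general.

```python
def classify_q_matrix(q, tol=1e-12):
    """Classify a finite square matrix as a conservative or non-conservative q-matrix."""
    for i, row in enumerate(q):
        # Off-diagonal rates must be non-negative.
        if any(x < -tol for j, x in enumerate(row) if j != i):
            return "not a q-matrix"
        # Row sums must not exceed zero.
        if sum(row) > tol:
            return "not a q-matrix"
    if all(abs(sum(row)) <= tol for row in q):
        return "conservative"
    return "non-conservative"


print(classify_q_matrix([[-1.0, 1.0], [2.0, -2.0]]))  # conservative
print(classify_q_matrix([[-2.0, 1.0], [2.0, -2.0]]))  # non-conservative
```

In the second matrix the first row sums to -1, which can be read as a rate of leaving the state space from state 0.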
Abstract:
We derive necessary and sufficient conditions for the existence of bounded or summable solutions to systems of linear equations associated with Markov chains. This substantially extends a famous result of G. E. H. Reuter, which provides a convenient means of checking various uniqueness criteria for birth-death processes. Our result allows chains with much more general transition structures to be accommodated. One application is to give a new proof of an important result of M. F. Chen concerning upwardly skip-free processes. We then use our generalization of Reuter's lemma to prove new results for downwardly skip-free chains, such as the Markov branching process and several of its many generalizations. This permits us to establish uniqueness criteria for several models, including the general birth, death, and catastrophe process, extended branching processes, and asymptotic birth-death processes, the latter being neither upwardly skip-free nor downwardly skip-free.
Abstract:
It is shown how the fractional probability density diffusion equation for the diffusion limit of one-dimensional continuous-time random walks may be derived from a generalized Markovian Chapman-Kolmogorov equation. The non-Markovian behaviour is incorporated into the Markovian Chapman-Kolmogorov equation by postulating a Lévy-like distribution of waiting times as a kernel. The Chapman-Kolmogorov equation so generalised then takes on the form of a convolution integral. The dependence on the initial conditions typical of a non-Markovian process is treated by adding a time-dependent term involving the survival probability to the convolution integral. In the diffusion limit these two assumptions about the past history of the process are sufficient to reproduce anomalous diffusion and relaxation behaviour of the Cole-Cole type. The Green function in the diffusion limit is calculated using the fact that the characteristic function is the Mittag-Leffler function. Fourier inversion of the characteristic function yields the Green function in terms of a Wright function. The moments of the distribution function are evaluated from the Mittag-Leffler function using the properties of characteristic functions, and a relation between the powers of the second moment and higher-order even moments is derived. © 2004 Elsevier B.V. All rights reserved.
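The one-parameter Mittag-Leffler function referred to above is defined by the power series E_alpha(z) = sum over k >= 0 of z^k / Gamma(alpha*k + 1). A minimal sketch of a truncated-series evaluation follows; the function name and truncation limits are illustrative assumptions, and the plain series is only reliable for moderate |z| because of cancellation between large terms.

```python
import math


def mittag_leffler(alpha, z, max_terms=150):
    """Truncated power series for the one-parameter Mittag-Leffler function."""
    total = 0.0
    for k in range(max_terms):
        arg = alpha * k + 1.0
        if arg > 170.0:  # math.gamma overflows a float just above this
            break
        total += z ** k / math.gamma(arg)
    return total


# Sanity checks against known special cases:
# alpha = 1 gives exp(z), and alpha = 2 with z = -x^2 gives cos(x).
print(abs(mittag_leffler(1.0, -1.0) - math.exp(-1.0)) < 1e-12)  # True
print(abs(mittag_leffler(2.0, -1.0) - math.cos(1.0)) < 1e-12)   # True
```

For 0 < alpha < 1, evaluating the function at negative arguments gives a decay slower than the exponential obtained at alpha = 1, which is the kind of behaviour associated with anomalous relaxation.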
Abstract:
A conventional local model (LM) network consists of a set of affine local models blended together using appropriate weighting functions. Such networks have poor interpretability since the dynamics of the blended network are only weakly related to the underlying local models. In contrast, velocity-based LM networks employ strictly linear local models to provide a transparent framework for nonlinear modelling in which the global dynamics are a simple linear combination of the local model dynamics. A novel approach for constructing continuous-time velocity-based networks from plant data is presented. Key issues including continuous-time parameter estimation, correct realisation of the velocity-based local models and avoidance of the input derivative are all addressed. Application results are reported for the highly nonlinear simulated continuous stirred tank reactor process.