37 results for Reward based model
in Cambridge University Engineering Department Publications Database
Abstract:
A method for modelling and predicting the noise generated by the interaction between the unsteady wake shed from the rotor and a downstream row of stators in a modern ultra-high bypass ducted turbofan engine is described. An analytically-based model is developed to account for three main features of the problem. First, the way in which a typical unsteady wake disturbance from the rotor interacts with, and is distorted by, the mean swirling flow as it propagates downstream. The analysis allows for the inclusion of mean entropy gradients and entropy perturbations. Second, the effects of real stator-blade geometry and proper representation of the genuinely three-dimensional nature of the problem. Third, the propagation of the resulting noise back upstream in mean swirling flow. The analytical nature of the problem allows for the inclusion of all wake harmonics and enables the response at all blade passing frequencies to be determined. Example results are presented for an initial wake distribution corresponding to a genuine rotor configuration. Comparisons between numerical data and the asymptotic model for the wake evolution are made. Copyright © 2004 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.
Abstract:
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
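The abstract does not state the TD signal explicitly. As a minimal sketch, assuming the continuous-time form usually attributed to Doya (2000), delta(t) = r(t) - V(t)/tau + dV/dt, the error can be evaluated on a sampled trajectory as below. The function name, the time constant `tau`, and the finite-difference derivative are illustrative assumptions; the spiking actor-critic implementation described in the abstract is not reproduced here.

```python
import numpy as np

def continuous_td_error(reward, value, dt, tau):
    """Continuous-time TD error: delta(t) = r(t) - V(t)/tau + dV/dt.

    reward, value: samples of r(t) and V(t) on a uniform grid with step dt.
    tau: discount time constant (assumed notation, not taken from the abstract).
    """
    reward = np.asarray(reward, dtype=float)
    value = np.asarray(value, dtype=float)
    # Approximate dV/dt with finite differences on the uniform grid.
    dvalue_dt = np.gradient(value, dt)
    return reward - value / tau + dvalue_dt

# Sanity check: for a constant reward r, the exponentially discounted
# value is V = r * tau, so a perfect critic yields zero TD error.
tau = 2.0
dt = 0.01
r = np.full(500, 1.5)
v = r * tau
delta = continuous_td_error(r, v, dt, tau)
```

A critic whose prediction matches the discounted future reward produces a vanishing error, which is the condition the learning rule drives towards.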
Abstract:
Recent experiments have shown that spike-timing-dependent plasticity is influenced by neuromodulation. We derive theoretical conditions for successful learning of reward-related behavior for a large class of learning rules where Hebbian synaptic plasticity is conditioned on a global modulatory factor signaling reward. We show that all learning rules in this class can be separated into a term that captures the covariance of neuronal firing and reward and a second term that presents the influence of unsupervised learning. The unsupervised term, which is, in general, detrimental for reward-based learning, can be suppressed if the neuromodulatory signal encodes the difference between the reward and the expected reward, but only if the expected reward is calculated for each task and stimulus separately. If several tasks are to be learned simultaneously, the nervous system needs an internal critic that is able to predict the expected reward for arbitrary stimuli. We show that, with a critic, reward-modulated spike-timing-dependent plasticity is capable of learning motor trajectories with a temporal resolution of tens of milliseconds. The relation to temporal difference learning, the relevance of block-based learning paradigms, and the limitations of learning with a critic are discussed.
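The separation into a covariance term and an unsupervised term can be illustrated numerically with the identity E[R·H] = Cov(R, H) + E[R]·E[H]. The sketch below is a toy illustration under stated assumptions, not the derivation from the paper: `hebb` is a hypothetical per-trial Hebbian term, `reward` is a synthetic reward correlated with it, and a single stimulus is assumed so that the mean reward serves as the expected reward.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 10_000

# Hypothetical per-trial Hebbian term H and a reward R correlated with it.
hebb = rng.normal(loc=1.0, scale=0.5, size=n_trials)
reward = 0.8 * hebb + rng.normal(scale=0.3, size=n_trials) + 2.0

# Average weight change when plasticity is gated by the raw reward.
mean_update = np.mean(reward * hebb)

# Decomposition: covariance term plus an "unsupervised" bias term.
covariance = np.mean((reward - reward.mean()) * (hebb - hebb.mean()))
unsupervised_bias = reward.mean() * hebb.mean()

# Gating by (reward - expected reward) leaves only the covariance term.
baseline_update = np.mean((reward - reward.mean()) * hebb)
```

With several stimuli or tasks, a single global mean would no longer remove the bias, which is why the abstract calls for a critic that predicts the expected reward per stimulus.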
Abstract:
Copyright © 2014 John Wiley & Sons, Ltd. A field programmable gate array (FPGA) based model predictive controller for two phases of spacecraft rendezvous is presented. Linear time-varying prediction models are used to accommodate elliptical orbits, and a variable prediction horizon is used to facilitate finite time completion of the longer range manoeuvres, whilst a fixed and receding prediction horizon is used for fine-grained tracking at close range. The resulting constrained optimisation problems are solved using a primal-dual interior point algorithm. The majority of the computational demand is in solving a system of simultaneous linear equations at each iteration of this algorithm. To accelerate these operations, a custom circuit is implemented, using a combination of Mathworks HDL Coder and Xilinx System Generator for DSP, and used as a peripheral to a MicroBlaze soft-core processor on the FPGA, on which the remainder of the system is implemented. Certain logic that can be hard-coded for fixed-size problems is implemented to be configurable online, in order to accommodate the varying problem sizes associated with the variable prediction horizon. The system is demonstrated in closed-loop by linking the FPGA with a simulation of the spacecraft dynamics running in Simulink on a PC, using Ethernet. Timing comparisons indicate that the custom implementation is substantially faster than pure embedded software-based interior point methods running on the same MicroBlaze and could be competitive with a pure custom hardware implementation.
Abstract:
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
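For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. The sketch below is a generic illustration of that definition; it is not the NIST scoring tool, and it assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the dog sat" against the reference "the cat sat" gives one substitution over three reference words.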
Abstract:
The objective of the present study is to assess the capabilities of a recently developed mechanism-based model for inelastic deformation and damage in structural ceramics. In addition to conventional lattice plasticity, the model accounts for microcrack growth and coalescence as well as granular flow following comminution. The assessment is made through a coupled experimental/computational study of the indentation response of a commercial armor ceramic. The experiments include examinations of subsurface damage zones along with measurements of residual surface profiles and residual near-surface stresses. Extensive finite element computations are conducted in parallel. Comparisons between experiment and simulation indicate that the most discriminating metric in the assessment is the spatial extent of subsurface damage following indentation. Residual stresses provide additional validation. In contrast, surface profiles of indents are dictated largely by lattice plasticity and thus provide minimal additional insight into the inelastic deformation resulting from microcracking or granular flow. A satisfactory level of correlation is obtained using property values that are either measured directly or estimated from physically based arguments, without undue reliance on adjustable (nonphysical) parameters. © 2011 The American Ceramic Society.
Abstract:
This paper addresses the need for computer support in aerospace design. A review of current design methodologies and computer support tools is presented and the need for further support in aerospace design, particularly in the early formative stages of the design process, is discussed. A parameter-based model of design, founded on the assumption that a design process can be constructed from a predefined set of tasks, is proposed for aerospace design. This is supported by knowledge of possible tasks in which the confidence in key design parameters is used as a basis for identifying, or signposting, the next task. A prototype implementation of the signposting model, for use in the design of helicopter rotor blades, is described and results from trials of the tool are presented. Further areas of research are discussed.
Abstract:
The Silent Aircraft airframe has a flying wing design with a large wing planform and a propulsion system embedded in the rear of the airframe with intake on the upper surface of the wing. In the present paper, boundary element calculations are presented to evaluate acoustic shielding at low frequencies. Besides the three-dimensional geometry of the Silent Aircraft airframe, a few two-dimensional problems are considered that provide some physical insight into the shielding calculations. Mean flow refraction effects due to forward flight motion are accounted for by a simple time transformation that decouples the mean-flow and acoustic-field calculations. It is shown that a significant amount of shielding can be obtained in the shadow region where there is no direct line of sight between the source and observer. The boundary element solutions are restricted to low frequencies. We have used a simple physically-based model to extend the solution to higher frequencies. Based on this model, using a monopole acoustic source, we predict at least an 18 dBA reduction in the overall sound pressure level of forward-propagating fan noise because of shielding.
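To put the 18 dBA figure in perspective, a level change in decibels can be converted to linear ratios with the standard definitions: 10·log10 for power-like quantities and 20·log10 for pressure amplitude. The helper names below are illustrative, and the conversion applies to the A-weighted overall level rather than to any single frequency.

```python
def db_to_power_ratio(delta_db):
    """Ratio of acoustic power (or intensity) for a level change in dB."""
    return 10 ** (delta_db / 10)

def db_to_pressure_ratio(delta_db):
    """Ratio of RMS sound pressure for a level change in dB."""
    return 10 ** (delta_db / 20)

# An 18 dB reduction in level:
power_ratio = db_to_power_ratio(-18.0)        # about 0.016 of the original power
pressure_ratio = db_to_pressure_ratio(-18.0)  # about 0.126 of the original pressure
```

In other words, an 18 dB reduction corresponds to roughly a 63-fold drop in acoustic power and an 8-fold drop in RMS pressure.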