68 results for Wireless Sensor and Actuator Networks. Simulation. Reinforcement Learning. Routing Techniques

in the Cambridge University Engineering Department Publications Database


Relevance:

100.00%

Publisher:

Abstract:

The role dopamine plays in decision-making has important theoretical, empirical and clinical implications. Here, we examined its precise contribution by exploiting the lesion deficit model afforded by Parkinson's disease. We studied patients in a two-stage reinforcement learning task while they were ON and OFF dopamine replacement medication. Contrary to expectation, we found that dopaminergic drug state (ON or OFF) did not impact learning. Instead, the critical factor was drug state during the performance phase, with patients ON medication choosing correctly significantly more frequently than those OFF medication. This effect was independent of drug state during initial learning and appears to reflect a facilitation of generalization of learnt information. This inference is bolstered by our observation that neural activity in nucleus accumbens and ventromedial prefrontal cortex, measured during simultaneously acquired functional magnetic resonance imaging, represented learnt stimulus values during performance. This effect was expressed solely during the ON state, with activity in these regions correlating with better performance. Our data indicate that dopamine modulation of nucleus accumbens and ventromedial prefrontal cortex exerts a specific effect on choice behaviour distinct from pure learning. The findings are in keeping with substantial other evidence that certain aspects of learning are unaffected by dopamine lesions or depletion, and that dopamine plays a key role in performance that may be distinct from its role in learning. © 2012 The Author.

Relevance:

100.00%

Publisher:

Abstract:

Reinforcement learning techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally, the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement learning algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
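
The core idea of optimising model parameters by natural-gradient ascent on expected reward can be illustrated, very roughly, in a few lines. The sketch below is not the authors' algorithm: `run_episode`, `target` and the Gaussian parametrisation are hypothetical stand-ins for the dialogue simulator and the dialogue-model parameters, and the update is a plain REINFORCE estimate preconditioned by an empirical Fisher matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -0.5, 0.3])  # hypothetical "true" model parameters

def run_episode(theta):
    """Toy stand-in for one simulated dialogue: behaviour is sampled from
    N(theta, I); the reward is higher the closer the sample is to `target`."""
    x = theta + rng.normal(size=theta.shape)   # sample behaviour
    score = x - theta                          # grad_theta log N(x; theta, I)
    reward = -np.sum((x - target) ** 2)
    return score, reward

theta = np.zeros(3)
for _ in range(300):
    scores, rewards = zip(*(run_episode(theta) for _ in range(32)))
    S, r = np.stack(scores), np.array(rewards)
    grad = S.T @ (r - r.mean()) / len(r)           # REINFORCE-style estimate
    fisher = S.T @ S / len(r) + 1e-3 * np.eye(3)   # empirical Fisher matrix
    theta += 0.2 * np.linalg.solve(fisher, grad)   # natural-gradient step
# theta drifts towards `target`, the reward-maximising model parameters
```

Preconditioning by the Fisher matrix is what makes the step "natural": the update becomes invariant to how the model happens to be parametrised, which matters when the parameters are probabilities in a user behaviour model rather than free weights.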

Relevance:

100.00%

Publisher:

Abstract:

The contribution described in this paper is an algorithm for learning nonlinear reference-tracking control policies given no prior knowledge of the dynamical system and limited interaction with the system during the learning process. Concepts from the fields of reinforcement learning, Bayesian statistics and classical control have been brought together in the formulation of this algorithm, which can be viewed as a form of indirect self-tuning regulator. On the task of reference tracking using a simulated inverted pendulum, it was shown to yield generally improved performance over the best controller derived from the standard linear quadratic method, using only 30 s of total interaction with the system. Finally, the algorithm was shown to work on the simulated double pendulum, proving its ability to solve nontrivial control tasks. © 2011 IEEE.
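
For context, the linear quadratic baseline mentioned above can be written in a few lines. This is a hedged sketch of a generic discrete-time LQR design for an inverted pendulum linearised about the upright position; the physical constants, costs and discretisation are assumptions, not values from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Pendulum linearised about the upright equilibrium; all values illustrative.
g, l, dt = 9.81, 1.0, 0.05                      # gravity, pole length, step
A = np.array([[1.0, dt],                        # state: [angle, angular rate]
              [(g / l) * dt, 1.0]])             # Euler-discretised dynamics
B = np.array([[0.0], [dt]])                     # input drives the angular rate
Q, R = np.diag([10.0, 1.0]), np.array([[0.1]])  # state and input costs

P = solve_discrete_are(A, B, Q, R)              # solve the Riccati equation
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal feedback gain

x = np.array([0.2, 0.0])                        # small initial tilt
for _ in range(100):
    u = -K @ x                                  # u = -Kx regulates x to zero
    x = A @ x + B @ u                           # simulate the linearised plant
```

The learnt policy in the paper has to beat the best such gain despite never being given A and B, which is what makes 30 s of interaction a striking budget.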

Relevance:

100.00%

Publisher:

Abstract:

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have taken first steps towards bridging the gap between the two approaches, but face two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
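
The continuous TD error at the heart of this architecture is easy to state in rate-based form, even though the paper's implementation uses spiking neurons: delta(t) = r(t) - V(t)/tau + dV/dt, following Doya (2000). The sketch below is a much-simplified, non-spiking illustration of the same actor-critic idea on a toy one-dimensional task; the dynamics, features and learning rates are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, tau, eta = 0.01, 1.0, 0.1
centres = np.linspace(0.0, 1.0, 10)

def features(x):
    """Radial-basis features over a one-dimensional state in [0, 1]."""
    return np.exp(-((x - centres) ** 2) / 0.02)

w_critic = np.zeros(10)     # linear value-function (critic) weights
w_actor = np.zeros(10)      # linear policy (actor) weights

x = 0.0
V_prev = w_critic @ features(x)
for _ in range(20000):
    phi = features(x)
    noise = 0.3 * rng.normal()                 # exploration noise
    u = w_actor @ phi + noise                  # continuous action
    x = float(np.clip(x + u * dt, 0.0, 1.0))   # toy dynamics: velocity control
    r = 1.0 if x > 0.9 else 0.0                # reward near the goal state
    V = w_critic @ features(x)
    delta = r - V / tau + (V - V_prev) / dt    # continuous-time TD error
    w_critic += eta * delta * phi * dt         # critic: learn the value map
    w_actor += eta * delta * noise * phi * dt  # actor: reinforce noise that
    V_prev = V                                 # raised the TD error
```

The same delta drives both weight updates, which is the point of the paper's argument: a single broadcast neuromodulatory signal, interpreted as dopamine, suffices to train actor and critic simultaneously.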

Relevance:

100.00%

Publisher:

Abstract:

Wireless Sensor Networks (WSNs) which utilise IEEE 802.15.4 technology operate primarily in the globally compatible 2.4 GHz ISM band. However, the wireless propagation channel in this crowded band is notoriously variable and unpredictable, and it has a significant impact on the coverage range and quality of the radio links between the wireless nodes. The use of Frequency Diversity (FD) therefore has the potential to ameliorate this situation. In this paper, the possible benefits of using FD in a tunnel environment are quantified by performing accurate propagation measurements using modified and calibrated off-the-shelf 802.15.4-based sensor motes in the disused Aldwych underground railway tunnel. The objective of this investigation is to characterise the performance of FD in this confined environment. Cross-correlation coefficients are calculated from samples of the received power on a number of frequency channels gathered during the field measurements. The low measured values of the cross-correlation coefficients indicate that applying FD at 2.4 GHz will improve link performance in a WSN deployed in a tunnel. This finding closely matches results obtained by running a computational simulation of the tunnel radio propagation using a 2D Finite-Difference Time-Domain (FDTD) method. © 2009 IEEE.
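
The analysis step described above reduces to computing a Pearson cross-correlation between received-power time series taken on different channels. A brief sketch follows; the RSSI arrays are synthetic placeholders for the field measurements, and the 0.7 threshold is a common rule of thumb from the diversity literature, not a figure from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
rssi_ch11 = -60 + 6 * rng.normal(size=n)       # dBm samples, channel 11
rssi_ch26 = -62 + 6 * rng.normal(size=n)       # dBm samples, channel 26

rho = np.corrcoef(rssi_ch11, rssi_ch26)[0, 1]  # Pearson cross-correlation
print(f"cross-correlation coefficient: {rho:+.3f}")
# |rho| well below ~0.7 suggests largely decorrelated fading, so hopping to
# another channel is likely to recover a link degraded by a deep fade.
```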

Relevance:

100.00%

Publisher:

Abstract:

Node placement plays a significant role in the effective and successful deployment of Wireless Sensor Networks (WSNs), i.e., in meeting design goals such as cost effectiveness, coverage, connectivity, lifetime and data latency. In this paper, we propose a new strategy to assist in the placement of Relay Nodes (RNs) for a WSN monitoring underground tunnel infrastructure. By applying, for the first time, an accurate empirical mean path loss propagation model along with a well-fitted fading distribution model specifically defined for the tunnel environment, we address the RN placement problem with guaranteed levels of radio link performance. The simulation results show that the choice of appropriate path loss and fading distribution models for a typical environment is vital in determining the number and positions of RNs. Furthermore, we adapt a two-tier clustering multi-hop framework in which the first tier of the RN placement is modelled as the minimum set cover problem, and the second-tier placement is solved using the search-and-find algorithm. The implementation of the proposed scheme is evaluated by simulation, and it lays the foundations for further work in WSN planning for underground tunnel applications. © 2010 IEEE.
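
The first-tier formulation maps naturally onto the classic greedy approximation for minimum set cover. The sketch below is a generic version of that greedy step, not the paper's code; the candidate sites and their coverage sets are hypothetical placeholders for sets that would be derived from the path loss and fading models.

```python
def greedy_set_cover(universe, coverage):
    """Pick candidate RN sites until every sensor in `universe` is covered."""
    uncovered, chosen = set(universe), []
    while uncovered:
        # Greedy rule: take the site covering the most uncovered sensors.
        best = max(coverage, key=lambda s: len(coverage[s] & uncovered))
        if not coverage[best] & uncovered:
            raise ValueError("remaining sensors cannot be covered")
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

sensors = {"s1", "s2", "s3", "s4", "s5"}
candidate_sites = {                      # illustrative coverage sets only
    "rn_a": {"s1", "s2"},
    "rn_b": {"s2", "s3", "s4"},
    "rn_c": {"s4", "s5"},
}
print(greedy_set_cover(sensors, candidate_sites))  # e.g. ['rn_b', ...]
```

The greedy heuristic is a standard ln(n)-approximation for set cover, which is why it is a reasonable choice when exact minimisation is intractable for larger tunnel deployments.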

Relevance:

100.00%

Publisher:

Abstract:

Developments in Micro-Electro-Mechanical Systems (MEMS), wireless communication systems and ad-hoc networking have created new opportunities to improve asset management, not only during the operational phase but throughout an asset's lifecycle, by improving the quality of information obtained about two key aspects of an asset: its location and its condition. In this paper, we present our experience, and the lessons learnt, from building a prototype condition monitoring platform that demonstrates and evaluates the use of COTS wireless sensor networks, with the aim of improving asset management by providing accurate and real-time information. © 2010 IEEE.