996 results for population reinforcement
Abstract:
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally, the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
Abstract:
The Yunnan snub-nosed monkey (Rhinopithecus bieti), an endangered species in China, has received more protection in theory than in practice. It is therefore on the verge of extinction. The population of the species was estimated at fewer than 2,000 individuals spread across 19 distinct groups. It was confirmed that the monkey was confined to the Yunling Mountain System, the area between the Yangtze River (Changjiang, aka Jinshajiang) to the east and the Mekong River (Lancangjiang) to the west. We further concluded that a lowland belt to the east, about 100 km long and 20-30 km wide, was not suitable habitat for the monkeys, and appeared to serve as the natural ecogeologic barrier for the species. Our results indicated that the southern limit of the distribution was at Longma (26°14'N), and that the northern limit of the distribution was at Xiaochangdu (29°20'N). The distribution area of the species was substantially smaller than previously estimated. There were substantial ecological differences between the southern and northern parts of the species range. The monkey was found only in fir-larch forest.
Abstract:
The contribution described in this paper is an algorithm for learning nonlinear, reference-tracking control policies given no prior knowledge of the dynamical system and limited interaction with the system through the learning process. Concepts from the fields of reinforcement learning, Bayesian statistics and classical control have been brought together in the formulation of this algorithm, which can be viewed as a form of indirect self-tuning regulator. On the task of reference tracking using a simulated inverted pendulum, it was shown to yield generally improved performance over the best controller derived from the standard linear quadratic method, using only 30 s of total interaction with the system. Finally, the algorithm was shown to work on the simulated double pendulum, proving its ability to solve nontrivial control tasks. © 2011 IEEE.
Abstract:
Chapter 6 A Population Perspective on Mobile Phone Related Tasks M. Bradley, S. Waller, J. Goodman-Deane, I. Hosking, R. Tenneti, P.M. Langdon and P.J. Clarkson 6.1 Introduction For design to be truly inclusive, it needs to take into ...
Abstract:
Chapter 11 Intrinsic Motivation and Design of ICT for ... Information and Communication Technology (ICT) systems provide an increasingly promising platform with which to improve the efficiency and effectiveness of healthcare, ...
Abstract:
Post-earthquake structural safety evaluations are currently performed manually by a team of certified inspectors and/or structural engineers. This process is time-consuming and costly, keeping owners and occupants from returning to their businesses and homes. Automating these evaluations would enable faster, and potentially more consistent, relief and response processes. To do this, the detection of exposed reinforcing steel is of utmost significance. This paper presents a novel method of detecting exposed reinforcement in concrete columns for the purpose of advancing practices of structural and safety evaluation of buildings after earthquakes. Under this method, the binary image of the reinforcing area is first isolated using a state-of-the-art adaptive thresholding technique. Next, the ribbed regions of the reinforcement are detected by way of binary template matching. Finally, vertical and horizontal profiling are applied to the processed image in order to filter out any superfluous pixels and to take into consideration the size of reinforcement bars in relation to that of the structural element within which they reside. The final result is the combined binary image, disclosing only the regions containing rebar, overlaid on top of the original image. The method is tested on a set of images from the January 2010 earthquake in Haiti. Preliminary test results indicate that most exposed reinforcement could be properly detected in images of moderately-to-severely damaged concrete columns.
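The profiling step described above can be illustrated with a minimal sketch. It assumes a generic projection-profile filter (row and column sums of a binary mask); the function name `profile_filter` and both thresholds are hypothetical, and the abstract does not specify the paper's actual procedure or parameters.

```python
def profile_filter(mask, min_row_count, min_col_count):
    """Zero out foreground pixels that lie in sparse rows or columns.

    mask is a list of rows of 0/1 values (a binary image).
    The thresholds are illustrative assumptions, not values from the paper.
    """
    # Horizontal profile: number of foreground pixels in each row.
    row_profile = [sum(row) for row in mask]
    # Vertical profile: number of foreground pixels in each column.
    col_profile = [sum(col) for col in zip(*mask)]
    return [
        [
            pixel
            if row_profile[r] >= min_row_count and col_profile[c] >= min_col_count
            else 0
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(mask)
    ]


mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],  # isolated pixel: its row and column are both sparse
    [0, 1, 1, 0],
]
filtered = profile_filter(mask, min_row_count=2, min_col_count=2)
```

Here the isolated pixel in the third row is discarded, while the two dense columns, standing in for a bar-like region, are kept.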
Abstract:
The world is at the threshold of emerging technologies, where new systems in construction, materials, and civil and architectural design are poised to make the world better from a structural and construction perspective. Exciting developments, too numerous to name individually, take place every year, affecting design considerations and construction practices. This edited book brings together modern methods and advances in structural engineering and construction, fulfilling the mission of ISEC Conferences, which is to enhance communication and understanding between structural and construction engineers for successful design and construction of engineering projects. The articles in this book are those accepted for publication and presentation at the 6th International Structural Engineering and Construction Conference in Zurich. The 6th ISEC Conference in Zurich, Switzerland, follows the overwhelming reception and success of previous ISEC conferences in Las Vegas, USA in 2009; Melbourne, Australia in 2007; Shunan, Japan in 2005; Rome, Italy in 2003; and Honolulu, USA in 2001. Many topics are covered in this book, ranging from legal affairs and contracting, to innovations and risk analysis in infrastructure projects, analysis and design of structural systems, materials, architecture, and construction. The articles here are a lasting testimony to the excellent research being undertaken around the world. These articles provide a platform for the exchange of ideas, research efforts and networking in the structural engineering and construction communities. We congratulate and thank the authors for these articles, which were selected after intensive peer review, and our gratitude extends to all reviewers and members of the International Technical Committee. It is their combined contributions that have made this book a reality.
Abstract:
An object in the peripheral visual field is more difficult to recognize when surrounded by other objects. This phenomenon is called "crowding". Crowding places a fundamental constraint on human vision that limits performance on numerous tasks. It has been suggested that crowding results from spatial feature integration necessary for object recognition. However, in the absence of convincing models, this theory has remained controversial. Here, we present a quantitative and physiologically plausible model for spatial integration of orientation signals, based on the principles of population coding. Using simulations, we demonstrate that this model coherently accounts for fundamental properties of crowding, including critical spacing, "compulsory averaging", and a foveal-peripheral anisotropy. Moreover, we show that the model predicts increased responses to correlated visual stimuli. Altogether, these results suggest that crowding has little immediate bearing on object recognition but is a by-product of a general, elementary integration mechanism in early vision aimed at improving signal quality.
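As an illustration of how pooling orientation signals in a population code could produce "compulsory averaging", here is a minimal sketch. It is not the authors' model: the tuning curve shape, the `kappa` width parameter, the 18-unit population and the population-vector decoder are all assumptions chosen for simplicity.

```python
import math


def population_response(theta_deg, preferred_deg, kappa=2.0):
    """Responses of orientation-tuned units to a stimulus at theta_deg.

    Uses a von Mises-like tuning curve over the 180-degree orientation
    space (hence the doubled angle). kappa is a hypothetical width parameter.
    """
    return [
        math.exp(kappa * (math.cos(2 * math.radians(theta_deg - p)) - 1))
        for p in preferred_deg
    ]


def decode_orientation(responses, preferred_deg):
    """Population-vector decoding on doubled angles, result in [0, 180)."""
    x = sum(r * math.cos(2 * math.radians(p)) for r, p in zip(responses, preferred_deg))
    y = sum(r * math.sin(2 * math.radians(p)) for r, p in zip(responses, preferred_deg))
    return (math.degrees(math.atan2(y, x)) / 2) % 180


# Hypothetical population: 18 units with preferred orientations every 10 degrees.
preferred = [i * 10 for i in range(18)]
target = population_response(40, preferred)
flanker = population_response(60, preferred)

# Spatial integration modelled as simple summation of the two response profiles.
pooled = [t + f for t, f in zip(target, flanker)]
decoded = decode_orientation(pooled, preferred)
```

With equal-strength target and flanker signals, the decoded orientation falls midway between the two stimulus orientations, which is the qualitative signature of averaging.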
Abstract:
The role dopamine plays in decision-making has important theoretical, empirical and clinical implications. Here, we examined its precise contribution by exploiting the lesion deficit model afforded by Parkinson's disease. We studied patients in a two-stage reinforcement learning task, while they were ON and OFF dopamine replacement medication. Contrary to expectation, we found that dopaminergic drug state (ON or OFF) did not impact learning. Instead, the critical factor was drug state during the performance phase, with patients ON medication choosing correctly significantly more frequently than those OFF medication. This effect was independent of drug state during initial learning and appears to reflect a facilitation of generalization for learnt information. This inference is bolstered by our observation that neural activity in nucleus accumbens and ventromedial prefrontal cortex, measured during simultaneously acquired functional magnetic resonance imaging, represented learnt stimulus values during performance. This effect was expressed solely during the ON state with activity in these regions correlating with better performance. Our data indicate that dopamine modulation of nucleus accumbens and ventromedial prefrontal cortex exerts a specific effect on choice behaviour distinct from pure learning. The findings are in keeping with the substantial other evidence that certain aspects of learning are unaffected by dopamine lesions or depletion, and that dopamine plays a key role in performance that may be distinct from its role in learning. © 2012 The Author.
Abstract:
Successful inclusive product design requires knowledge about the capabilities, needs and aspirations of potential users and should cater for the different scenarios in which people will use products, systems and services. This should include: the individual at home; in the workplace; for businesses; and for products in these contexts. It needs to reflect the development of theory, tools and techniques as research moves on. It must also draw in wider psychological, social, and economic considerations in order to gain a more accurate understanding of users' interactions with products and technology. However, recent research suggests that although a number of national disability surveys have been carried out, no such knowledge currently exists as information to support the design of products, systems and services for heterogeneous users. This paper outlines the strategy behind specific inclusive design research that is aimed at creating the foundations for measuring inclusion in product designs. A key outcome of this future research will be specifying and operationalising capability, and psychological, social and economic context measures for inclusive design. This paper proposes a framework for capturing such information, describes an early pilot study, and makes recommendations for better practice.
Abstract:
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
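For orientation, the reward prediction error at the centre of this abstract can be sketched numerically. The snippet assumes the continuous-time TD error of Doya (2000), delta(t) = r(t) - V(t)/tau + dV/dt, and approximates the derivative with a finite difference. The function name and the example values are hypothetical, and this is not the spiking-neuron implementation the paper describes.

```python
def continuous_td_error(reward, v_now, v_prev, dt, tau):
    """Finite-difference estimate of delta(t) = r(t) - V(t)/tau + dV/dt.

    reward : instantaneous reward rate r(t)
    v_now  : critic's value estimate V(t)
    v_prev : critic's value estimate V(t - dt)
    dt     : time step used to approximate dV/dt (assumed small)
    tau    : discount time constant
    """
    dv_dt = (v_now - v_prev) / dt
    return reward - v_now / tau + dv_dt


# A constant value estimate that exactly matches a steady reward rate
# (V = tau * r) gives zero prediction error.
steady = continuous_td_error(reward=1.0, v_now=2.0, v_prev=2.0, dt=0.1, tau=2.0)

# An unexpected reward with no change in the value estimate gives a positive error.
surprise = continuous_td_error(reward=1.0, v_now=0.0, v_prev=0.0, dt=0.1, tau=2.0)
```

In an actor-critic arrangement, this single scalar would be broadcast to both the critic (to improve its value estimate) and the actor (to reinforce or weaken the action just taken).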