17 results for reinforcement
in CentAUR: Central Archive University of Reading - UK
Abstract:
Perirhinal cortex in monkeys has been thought to be involved in visual associative learning. The authors examined rats' ability to make associations between visual stimuli in a visual secondary reinforcement task. Rats learned 2-choice visual discriminations for secondary visual reinforcement. They showed significant learning of discriminations before any primary reinforcement. Following bilateral perirhinal cortex lesions, rats continued to learn visual discriminations for visual secondary reinforcement at the same rate as before surgery. Thus, this study does not support a critical role of perirhinal cortex in learning for visual secondary reinforcement. Contrasting this result with other positive results, the authors suggest that the role of perirhinal cortex is in "within-object" associations and that it plays a much lesser role in stimulus-stimulus associations between objects.
Abstract:
The question of whether and how tropical Indian Ocean dipole or zonal mode (IOZM) interannual variability is independent of El Nino-Southern Oscillation (ENSO) variability in the Pacific is addressed by comparing twin 200-yr runs of a coupled climate model. The first is a reference simulation; the second has ENSO-scale variability suppressed by a constraint on the tropical Pacific wind stress. The IOZM can exist in the model without ENSO, and the composite evolution of the main anomalies in the Indian Ocean is virtually identical in the two simulations. Its growth depends on a positive feedback between anomalous equatorial easterly winds, upwelling equatorial and coastal Kelvin waves that reduce the thermocline depth and sea surface temperature off the coast of Sumatra, and the atmospheric dynamical response to the subsequently reduced convection. Two IOZM triggers are found in the boreal spring. The first is an anomalous Hadley circulation over the eastern tropical Indian Ocean and Maritime Continent, with an early northward penetration of the Southern Hemisphere southeasterly trades. This situation grows out of cooler sea surface temperatures in the southeastern tropical Indian Ocean left behind by a reinforcement of the late austral summer winds. The second trigger is a consequence of a zonal shift in the center of convection associated with a developing El Nino, a Walker cell anomaly. The first trigger is the only one present in the constrained simulation and is similar to the evolution of anomalies in 1994, when the IOZM occurred in the absence of a Pacific El Nino state. The presence of these two triggers, the first independent of ENSO and the second phase-locking the IOZM to El Nino, explains both the existence of IOZM events when Pacific conditions are neutral and the significant correlation between the IOZM and El Nino.
Abstract:
Over many years, researchers at the University of Reading have developed simple mobile robots that explore an environment they perceive through simple ultrasonic sensors. Information from these sensors has allowed the robots to learn the simple task of moving around while avoiding dynamic obstacles, using a static set of fuzzy automata; the choice of that set has been criticised as arbitrary. This paper considers how a dynamic set of automata can overcome this criticism. In addition, a new reinforcement learning function is outlined that scales to different numbers and types of sensors. The innovations compare favourably with earlier work.
Abstract:
The authors consider the problem of a robot manipulator operating in a noisy workspace. The manipulator is required to move from an initial position P(i) to a final position P(f). P(i) is assumed to be completely defined, whereas P(f) is obtained by a sensing operation and is assumed to be fixed but unknown. The authors' approach to this problem uses three learning algorithms: the discretized linear reward-penalty (DLR-P) automaton, the linear reward-penalty (LR-P) automaton and a nonlinear reinforcement scheme. An automaton is placed at each joint of the robot and, acting as a decision maker, plans the trajectory based on noisy measurements of P(f).
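For readers unfamiliar with the linear reward-penalty (LR-P) automaton mentioned above, a minimal sketch of its textbook update rule follows. It is not the authors' trajectory planner: the stationary environment, its penalty probabilities and the step sizes a and b are hypothetical, and the discretized (DLR-P) and nonlinear schemes are not shown. On a reward, the chosen action's probability moves towards 1; on a penalty, part of its mass is redistributed evenly over the other actions.

```python
import random
from typing import List


def lrp_update(p: List[float], action: int, rewarded: bool, a: float, b: float) -> List[float]:
    """One linear reward-penalty (L_R-P) update of the action-probability vector p.

    a is the reward step size and b the penalty step size; a == b gives the
    classic symmetric L_R-P scheme.
    """
    r = len(p)
    new_p = []
    for j, pj in enumerate(p):
        if rewarded:
            # Reward: shift probability mass towards the chosen action.
            new_p.append(pj + a * (1.0 - pj) if j == action else (1.0 - a) * pj)
        else:
            # Penalty: shift mass away from the chosen action, spreading it
            # evenly over the other r - 1 actions.
            new_p.append((1.0 - b) * pj if j == action else b / (r - 1) + (1.0 - b) * pj)
    return new_p


def run_automaton(penalty_probs: List[float], steps: int = 5000,
                  a: float = 0.01, b: float = 0.01, seed: int = 0) -> List[float]:
    """Simulate one automaton in a hypothetical stationary random environment.

    penalty_probs[k] is the assumed probability that action k is penalised,
    standing in for noisy feedback such as a noisy measurement of P(f).
    """
    rng = random.Random(seed)
    r = len(penalty_probs)
    p = [1.0 / r] * r  # start with no preference between actions
    for _ in range(steps):
        action = rng.choices(range(r), weights=p)[0]
        rewarded = rng.random() >= penalty_probs[action]
        p = lrp_update(p, action, rewarded, a, b)
    return p


p = run_automaton([0.2, 0.8])
print([round(x, 2) for x in p])  # most of the mass ends up on action 0
```

With a = b the scheme is expedient rather than ε-optimal: the probabilities settle around a mixture that favours the less-penalised action instead of converging to it.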
Abstract:
Peak congestion on the European grid may significantly raise system costs because of the need for higher-marginal-cost generation, more expensive system balancing and growing investment in grid reinforcement. Time-of-use rates, incentives, real-time pricing and other programmes, usually grouped under Demand Side Management (DSM), could significantly reduce prices, limit carbon emissions from dirty power plants and improve the integration of renewable energy sources. Unlike previous studies of the elasticity of residential electricity demand under flat tariffs, this study does not investigate the known, relatively inelastic relationship between demand and prices. Rather, it assesses how occupancy levels vary across European countries. This reflects the reality of demand loads, which are determined predominantly by the timing of human activities (e.g. travelling to work, taking children to school) rather than by prices. To this end, two types of occupancy elasticity are estimated: baseline occupancy elasticity and peak occupancy elasticity. These represent the intrinsic elasticity associated with the activities of single residential end-users in 15 European countries. The study uses occupancy time-series data from the Harmonised European Time Use Survey database to build European occupancy curves, identify peak occupancy periods, draw time-use demand curves for video and TV watching, and estimate national occupancy elasticity levels for single-occupant households. The findings on occupancy elasticities point to possible DSM strategies based on occupancy levels rather than prices.
Abstract:
In order to fabricate a biomimetic skin for an octopus-inspired robot, a new process was developed based on mechanical properties measured from real octopus skin. Various knitted nylon textiles were tested, and one made of 10-denier nylon was chosen as the reinforcement. A combination of Ecoflex 0030 and 0010 silicone rubbers was used as the matrix of the composite to obtain the right stiffness for the skin-analogue system. The open-mould fabrication process allows air bubbles to escape easily, and the artificial skin produced was thin and waterproof. Material properties of the biomimetic skin were characterised using static tensile and instrumented scissors cutting tests. The Young’s moduli of the artificial skin are 0.08 MPa and 0.13 MPa in the longitudinal and transverse directions, respectively, which are much lower than those of octopus skin. The strength and fracture toughness of the artificial skin, on the other hand, are higher than those of real octopus skin. Conically shaped skin prototypes for covering the robotic arm unit were manufactured and tested. The biomimetic skin prototype was stiff enough to maintain its conical shape when filled with water. The driving force for elongation was reduced significantly compared with previous prototypes.
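The abstract does not say how the moduli were extracted from the tensile tests. A common approach, assumed here, is to take the least-squares slope of engineering stress against engineering strain over the initial, approximately linear part of the curve, once per test direction. The strain cut-off `max_strain` and the data below are hypothetical; soft silicone composites are usually nonlinear, so the value obtained depends on that cut-off.

```python
from typing import Sequence


def youngs_modulus(strain: Sequence[float], stress_mpa: Sequence[float],
                   max_strain: float = 0.1) -> float:
    """Least-squares slope of stress vs strain over the region strain <= max_strain.

    Returns the modulus in the units of stress_mpa (MPa here). max_strain is an
    assumed cut-off for the approximately linear region.
    """
    pts = [(e, s) for e, s in zip(strain, stress_mpa) if e <= max_strain]
    if len(pts) < 2:
        raise ValueError("need at least two points in the linear region")
    n = len(pts)
    mean_e = sum(e for e, _ in pts) / n
    mean_s = sum(s for _, s in pts) / n
    sxx = sum((e - mean_e) ** 2 for e, _ in pts)
    sxy = sum((e - mean_e) * (s - mean_s) for e, s in pts)
    if sxx == 0.0:
        raise ValueError("strain values in the linear region must not all be equal")
    return sxy / sxx


# Hypothetical tensile-test data (engineering strain, engineering stress in MPa).
strain = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.30]
stress = [0.0, 0.0016, 0.0032, 0.0048, 0.0064, 0.0080, 0.0300]
print(round(youngs_modulus(strain, stress), 3))  # 0.08
```

The last point lies outside the assumed linear region and is ignored, so the fitted slope reflects only the initial stiffness.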
Abstract:
We examined the maturation of decision-making from early adolescence to mid-adulthood using fMRI during a variant of the Iowa gambling task. We have previously shown that performance in this task relies on sensitivity to accumulating negative outcomes in ventromedial PFC and dorsolateral PFC. Here, we further formalize outcome evaluation (as driven by prediction errors [PE], using a reinforcement learning model) and examine its development. Task performance improved significantly during adolescence, stabilizing in adulthood. Performance relied on a greater impact of negative than of positive PEs, and the relative impact of the two matured from adolescence into adulthood. Adolescents also showed increased exploratory behavior, expressed as a propensity to shift responding between options independently of outcome quality, whereas adults showed no systematic shifting patterns. The correlation between PE representation and improved performance strengthened with age for activation in ventral and dorsal PFC, ventral striatum, and temporal and parietal cortices. There was a medial-lateral distinction in the prefrontal substrates of effective PE utilization between adults and adolescents: increased utilization of negative PEs, a hallmark of successful performance in the task, was associated with increased activation in ventromedial PFC in adults, but with decreased activation in ventrolateral PFC and striatum in adolescents. These results suggest that adults and adolescents engage qualitatively distinct neural and psychological processes during decision-making, the development of which is not exclusively dependent on reward-processing maturation.
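The abstract does not give the model's equations. One common way to let negative and positive PEs have different impact is a delta-rule (Rescorla-Wagner-style) update with separate learning rates; the sketch below assumes that form. The names `alpha_pos` and `alpha_neg` and the outcome sequence are hypothetical, not the fitted model or the task's actual payoffs.

```python
from typing import List, Tuple


def update_value(q: float, outcome: float, alpha_pos: float,
                 alpha_neg: float) -> Tuple[float, float]:
    """Delta-rule value update with separate learning rates for positive and negative PEs."""
    pe = outcome - q  # prediction error: obtained minus expected outcome
    alpha = alpha_pos if pe >= 0 else alpha_neg
    return q + alpha * pe, pe


def learn_option_value(outcomes: List[float], alpha_pos: float, alpha_neg: float,
                       q0: float = 0.0) -> float:
    """Apply the update over a sequence of outcomes from one option; return the final value."""
    q = q0
    for outcome in outcomes:
        q, _ = update_value(q, outcome, alpha_pos, alpha_neg)
    return q


# Hypothetical "risky" option: frequent small gains, occasional large losses.
outcomes = [1.0, 1.0, 1.0, 1.0, -5.0] * 20
loss_sensitive = learn_option_value(outcomes, alpha_pos=0.1, alpha_neg=0.3)
gain_sensitive = learn_option_value(outcomes, alpha_pos=0.3, alpha_neg=0.1)
print(loss_sensitive < gain_sensitive)  # True
```

A learner that weights negative PEs more heavily ends up valuing the risky option lower, which is the kind of asymmetry the abstract links to successful performance.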
Abstract:
Contrary to the widespread belief that people are positively motivated by reward incentives, some studies have shown that performance-based extrinsic reward can actually undermine a person's intrinsic motivation to engage in a task. This “undermining effect” has timely practical implications, given the burgeoning of performance-based incentive systems in contemporary society. It also presents a theoretical challenge for economic and reinforcement learning theories, which tend to assume that monetary incentives monotonically increase motivation. Despite the practical and theoretical importance of this provocative phenomenon, however, little is known about its neural basis. Herein we induced the behavioral undermining effect using a newly developed task, and we tracked its neural correlates using functional MRI. Our results show that performance-based monetary reward indeed undermines intrinsic motivation, as assessed by the number of voluntary engagements in the task. We found that activity in the anterior striatum and the prefrontal areas decreased along with this behavioral undermining effect. These findings suggest that the corticobasal ganglia valuation system underlies the undermining effect through the integration of extrinsic reward value and intrinsic task value.
Abstract:
Position in the social hierarchy can influence brain dopamine function and cocaine reinforcement in nonhuman primates during early cocaine exposure. With prolonged exposure, however, initial differences in rates of cocaine self-administration between dominant and subordinate monkeys dissipate. The present studies used a choice procedure to assess the relative reinforcing strength of cocaine in group-housed male cynomolgus monkeys with extensive cocaine self-administration histories. Responding was maintained under a concurrent fixed-ratio 50 schedule of food and cocaine (0.003-0.1 mg/kg per injection) presentation. Responding on the cocaine-associated lever increased as a function of cocaine dose in all monkeys. Although response distribution was similar across social rank when saline or relatively low or high cocaine doses were the alternative to food, planned t tests indicated that cocaine choice was significantly greater in subordinate monkeys when choice was between an intermediate dose (0.01 mg/kg) and food. When a between-session progressive-ratio procedure was used to increase response requirements for the preferred reinforcer (either cocaine or food), choice of that reinforcer decreased in all monkeys. The average response requirement that produced a shift in response allocation from the cocaine-associated lever to the food-associated lever was higher in subordinates across cocaine doses, an effect that trended toward significance (p = 0.053). These data indicate that despite an extensive history of cocaine self-administration, most subordinate monkeys were more sensitive to the relative reinforcing strength of cocaine than dominant monkeys.
Abstract:
Extinction following positively reinforced operant conditioning reduces response frequency, at least in part through the aversive or frustrative effects of non-reinforcement. According to J.A. Gray's theory, non-reinforcement activates the behavioural inhibition system, which in turn causes anxiety. As predicted, anxiolytic drugs, including benzodiazepines, affect the operant extinction process. Recent studies have shown that reducing GABA-mediated neurotransmission retards extinction of aversive conditioning. In a series of studies, we have shown that anxiolytic compounds that potentiate GABA facilitate extinction of positively reinforced fixed-ratio operant behaviour in C57Bl/6 male mice. This effect does not occur in the early stages of extinction, nor does it depend on cumulative effects of the compound administered. Potentiation of GABA at later stages increases sensitivity to the extinction contingency and facilitates the inhibition of behaviour that is no longer required. The GABAergic hypnotic zolpidem has the same selective effects on operant extinction in this procedure, and these effects are not due to sedative action. There is evidence across our series of experiments that different GABA-A receptor subtypes are involved in extinction facilitation and anxiolysis. Consequently, this procedure may not be an appropriate model for anxiolytic drug action, but it may be a useful technique for analysing the neural bases of extinction and for designing therapeutic interventions in humans, where failure to extinguish inappropriate behaviours can lead to pathological conditions such as post-traumatic stress disorder.
Abstract:
Relatively little is known about the role of the inhibitory neurotransmitter gamma-aminobutyric acid (GABA) in extinction of appetitively motivated tasks. The benzodiazepine (BZ) chlordiazepoxide (CDP) was administered during extinction and re-acquisition of lever pressing by mice following food-reinforced discrete-trial fixed-ratio 5 (FR-5) training. Typical FR behaviour was established during baseline training and persisted for several extinction sessions. There were 15 extinction sessions in all, followed by six re-acquisition sessions in which food reinforcement was re-introduced. In a 2x2x2 between-group design, CDP (15 mg/kg) or vehicle injections were given before one of three blocks: the last two food reinforcement sessions plus the first 10 extinction sessions, the final five extinction sessions, or the six re-acquisition sessions. Initially CDP had no effect on the rate of extinction, but after several extinction sessions it significantly facilitated it. Surprisingly, if CDP was administered only after several sessions of extinction, it immediately produced facilitation. Thus the delayed effects of CDP are not due to drug accumulation. These data suggest that some neural change must occur before CDP can affect extinction processes. In re-acquisition sessions, CDP facilitated the reinstatement of food-reinforced lever pressing. Implications for neural and behavioural accounts of operant extinction are discussed.
Abstract:
The role of Distribution Network Operators (DNOs) is becoming more difficult as electric vehicles and electric heating penetrate the network and increase demand. As a result, it becomes harder for distribution network infrastructure to remain within its operating constraints. Energy storage is a potential alternative to conventional network reinforcement such as upgrading cables and transformers. The research presented in this paper shows that, owing to the volatile nature of the low-voltage (LV) network, the control approach used for energy storage has a significant impact on performance. This paper presents and compares control methodologies for energy storage whose objective is the greatest possible peak demand reduction across the day from a pre-specified storage device. The results show the benefits and drawbacks of specific types of control for a storage device connected to a single phase of an LV network, using aggregated demand profiles based on real smart meter data from individual homes. The research demonstrates an important relationship between how predictable an aggregation is and the control methodology best suited to achieving the objective.
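The abstract compares several control methodologies without specifying them. One simple baseline, assumed here rather than taken from the paper, is a fixed threshold computed from a perfect forecast: bisect for the lowest demand level the device can hold the feeder to, then discharge whatever exceeds it. The function names, the half-hourly default interval and the demand values are hypothetical, and charging, losses and forecast error are ignored.

```python
from typing import List


def peak_shaving_threshold(demand_kw: List[float], energy_kwh: float, power_kw: float,
                           dt_h: float = 0.5, tol: float = 1e-6) -> float:
    """Lowest demand threshold a storage device can hold the feeder to over one day.

    Assumes a perfect forecast, a device that starts full, no recharging
    during the peak and lossless operation (all hypothetical simplifications).
    """

    def feasible(threshold: float) -> bool:
        excess = [max(0.0, d - threshold) for d in demand_kw]
        # Discharge power and total energy must both stay within the device ratings.
        return max(excess) <= power_kw and sum(excess) * dt_h <= energy_kwh

    # The daily peak itself is always feasible (no discharge needed); the sketch
    # never shaves below the daily minimum demand.
    lo, hi = min(demand_kw), max(demand_kw)
    while hi - lo > tol:  # bisection works because feasibility is monotone in the threshold
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


def discharge_schedule(demand_kw: List[float], threshold: float) -> List[float]:
    """Discharge power per interval: whatever demand exceeds the threshold."""
    return [max(0.0, d - threshold) for d in demand_kw]


# Hypothetical hourly demand (kW) and a 4 kWh / 10 kW device.
demand = [2.0, 2.0, 6.0, 8.0, 6.0, 2.0]
t = peak_shaving_threshold(demand, energy_kwh=4.0, power_kw=10.0, dt_h=1.0)
print(round(t, 3))  # 5.333
```

Because the threshold is fixed in advance from the forecast, the actual peak can exceed it whenever the real profile deviates, which gives some intuition for why the predictability of an aggregation matters when choosing a control method.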
Abstract:
The ability to change an established stimulus–behavior association based on feedback is critical for adaptive social behaviors. This ability has been examined in reversal learning tasks, where participants first learn a stimulus–response association (e.g., select a particular object to get a reward) and then need to alter their response when reinforcement contingencies change. Although substantial evidence demonstrates that the OFC is a critical region for reversal learning, previous studies have not distinguished reversal learning for emotional associations from neutral associations. The current study examined whether OFC plays similar roles in emotional versus neutral reversal learning. The OFC showed greater activity during reversals of stimulus–outcome associations for negative outcomes than for neutral outcomes. Similar OFC activity was also observed during reversals involving positive outcomes. Furthermore, OFC activity is more inversely correlated with amygdala activity during negative reversals than during neutral reversals. Overall, our results indicate that the OFC is more activated by emotional than neutral reversal learning and that OFC's interactions with the amygdala are greater for negative than neutral reversal learning.