89 results for Reinforcement
in Queensland University of Technology - ePrints Archive
Abstract:
The load–frequency control (LFC) problem has long been one of the major subjects in power system research. In practice, LFC systems use proportional–integral (PI) controllers. However, since these controllers are designed using a linear model, the non-linearities of the system are not accounted for, and they cannot deliver good dynamic performance over a wide range of operating conditions in a multi-area power system. Because a multi-area power system is distributed by nature, a strategy for solving this problem using a multi-agent reinforcement learning (MARL) approach is presented. It consists of two agents in each power area: the estimator agent provides the area control error (ACE) signal based on frequency bias estimation, and the controller agent uses reinforcement learning to control the power system, with genetic algorithm optimisation used to tune its parameters. This method does not depend on any knowledge of the system and admits considerable flexibility in defining the control objective. In addition, finding the ACE signal from the frequency bias estimation improves LFC performance, and using MARL realises parallel computation, leading to a high degree of scalability. To illustrate the accuracy of the proposed approach, a three-area power system example is given with two scenarios.
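As a rough illustration of the signal the estimator agent supplies, the sketch below computes an area control error from a tie-line power deviation and a frequency deviation using the conventional textbook form ACE = ΔP_tie + β·Δf. The abstract does not state the formula, so this form, the function name, and all numeric values are assumptions for illustration, not the authors' method.

```python
def area_control_error(delta_p_tie: float, delta_f: float, beta: float) -> float:
    """Textbook ACE for one control area (assumed form, not taken from the paper).

    delta_p_tie: deviation of net tie-line power export from schedule (p.u. MW)
    delta_f:     frequency deviation from nominal (Hz)
    beta:        frequency bias factor (p.u. MW/Hz), here standing in for an estimate
    """
    return delta_p_tie + beta * delta_f


# Hypothetical values for three areas: (delta_p_tie, delta_f, estimated beta)
areas = [(0.02, -0.05, 0.425), (-0.01, -0.05, 0.425), (-0.01, -0.05, 0.425)]

# One ACE value per area; each area's controller agent would act on its own value.
ace = [area_control_error(p, f, b) for p, f, b in areas]
```

Because each area needs only its own deviations and bias estimate, the per-area values can be computed independently, which matches the parallel, per-area structure the abstract describes.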
Abstract:
In an open railway access market, it is feasible to achieve higher cost recovery in price negotiation by applying the principles of price discrimination. The price negotiation can be modeled as a revenue optimization problem. In this paper, we present a pricing negotiation model based on reinforcement learning. A negotiated-price setting technique based on agent learning is introduced, and the feasible applications of the proposed method for open railway access market simulation are discussed.
Abstract:
Partially Grouted Reinforced Masonry (PGRM) shear walls perform well in places where cyclonic wind pressure dominates the design. Their out-of-plane flexural performance is better understood than their in-plane shear behaviour; in particular, it is not clear whether PGRM shear walls act as unreinforced masonry (URM) walls embedded with discrete reinforced grouted cores or as integral systems of reinforced masonry (RM) with wider spacing of reinforcement. With a view to understanding the in-plane response of PGRM shear walls, ten full-scale, single-leaf clay block walls were constructed and tested under monotonic and cyclic in-plane loading. Based on the displacement ductility and stiffness degradation factors derived from the complete lateral load – lateral displacement curves, it is shown that where the spacing of the vertical reinforcement is less than 2000 mm, the walls behave as an integral system of RM; for spacing greater than 2000 mm, the walls behave similarly to URM, with no significant benefit from the reinforced cores.
Abstract:
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of $\tilde{O}(HS\sqrt{AT})$. We also relate the span to various diameter-like quantities associated with the MDP, demonstrating how our results improve on previous regret bounds.
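To make the quantities in the bound concrete, the short sketch below computes the span of a bias vector (its maximum entry minus its minimum entry) and evaluates the leading term H·S·√(A·T) of a bound of order Õ(HS√(AT)). The bias values and problem sizes are made up for illustration, and constants and logarithmic factors hidden by the Õ notation are ignored.

```python
import math


def span(bias):
    """Span of a bias vector: largest entry minus smallest entry."""
    return max(bias) - min(bias)


def regret_bound_order(H, S, A, T):
    """Leading term H * S * sqrt(A * T); constants and log factors are omitted."""
    return H * S * math.sqrt(A * T)


# Hypothetical bias vector for a 4-state MDP (illustrative values only).
bias = [0.0, 1.5, -0.5, 2.0]
H = span(bias)

# Order of the bound for 2 actions over 10,000 steps.
order = regret_bound_order(H, S=len(bias), A=2, T=10_000)
```

The term grows with the square root of T, so the average regret per step shrinks as the horizon lengthens.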
Abstract:
This study investigated the effect of a fear-based personality trait, as conceptualised in Gray’s revised reinforcement sensitivity theory (RST) by the strength of the fight/flight/freeze system (FFFS), on young people’s driving simulator performance under induced psychosocial stress. Seventy-one young drivers completed the Jackson-5 questionnaire of RST traits, followed by a psychosocial stress or relaxation induction procedure (random allocation to groups) and then a city driving simulator task. Some support was found for the hypothesis that higher FFFS sensitivity would result in poorer driving performance under stress, in terms of significantly poorer hazard responses, possibly due to an increased attentional focus on the aversive cues inherent in the stress induction leaving reduced attentional capacity for the driving task. These results suggest that stress may lead to riskier driving behaviour in individuals with fearful RST personality styles.
Abstract:
In this paper, a new comprehensive planning methodology is proposed for implementing distribution network reinforcement. Load growth, voltage profile, distribution line loss, and reliability are considered in this procedure. A time-segmentation technique is employed to reduce the computational load. The options considered range from supporting load growth using the traditional approach of upgrading conventional equipment in the distribution network through to the use of dispatchable distributed generators (DDG). The objective function is composed of the construction cost, loss cost, and reliability cost. As constraints, the bus voltages and the feeder currents should be maintained within standard levels, and the DDG output power should not fall below a given fraction of its rated power, for efficiency reasons. A hybrid optimization method, called modified discrete particle swarm optimization, is employed to solve this nonlinear and discrete optimization problem. The optimized solution based on planning capacitors together with tap-changing transformers and line upgrading is compared with the solution obtained when DDGs are included in the optimization.
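The objective and constraints described in this abstract can be sketched as a cost function plus a feasibility check that an optimizer such as a particle swarm method could call for each candidate plan. Everything below is a hypothetical illustration: the class and field names, the voltage limits of 0.95 to 1.05 p.u., and the 30% minimum DDG output ratio are assumptions, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class PlanEvaluation:
    """Results of evaluating one candidate reinforcement plan (hypothetical fields)."""
    construction_cost: float
    loss_cost: float
    reliability_cost: float
    bus_voltages_pu: list
    feeder_currents_a: list
    feeder_ratings_a: list
    ddg_outputs_kw: list
    ddg_ratings_kw: list


def total_cost(p: PlanEvaluation) -> float:
    """Objective: construction cost + loss cost + reliability cost."""
    return p.construction_cost + p.loss_cost + p.reliability_cost


def is_feasible(p: PlanEvaluation, v_min=0.95, v_max=1.05, min_ddg_ratio=0.3) -> bool:
    """Check the three constraint families named in the abstract (limits assumed)."""
    voltages_ok = all(v_min <= v <= v_max for v in p.bus_voltages_pu)
    currents_ok = all(i <= r for i, r in zip(p.feeder_currents_a, p.feeder_ratings_a))
    ddg_ok = all(out >= min_ddg_ratio * rated
                 for out, rated in zip(p.ddg_outputs_kw, p.ddg_ratings_kw))
    return voltages_ok and currents_ok and ddg_ok


# Made-up evaluation of one candidate plan.
plan = PlanEvaluation(
    construction_cost=100.0, loss_cost=50.0, reliability_cost=25.0,
    bus_voltages_pu=[1.00, 0.97, 0.96],
    feeder_currents_a=[180.0, 95.0], feeder_ratings_a=[200.0, 120.0],
    ddg_outputs_kw=[400.0], ddg_ratings_kw=[1000.0],
)
```

An optimizer would keep the feasible plan with the lowest `total_cost`; how infeasible candidates are handled (rejection, repair, or a penalty term) is a design choice the abstract does not specify.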
Abstract:
Impaction bone grafting for reconstitution of bone stock in revision hip surgery has been used for nearly 30 years. We used this technique, in combination with a cemented acetabular component, in the acetabula of 304 hips in 292 patients revised for aseptic loosening between 1995 and 2001. The only additional supports used were stainless steel meshes placed against the medial wall or laterally around the acetabular rim to contain the graft. All Paprosky grades of defect were included. Clinical and radiographic outcomes were collected in surviving patients at a minimum of 10 years following the index operation. Mean follow-up was 12.4 years (SD 1.5; range 10.0-16.0). Kaplan-Meier survivorship with revision for aseptic loosening as the endpoint was 85.9% (95% CI 81.0 to 90.8%) at 13.5 years. Clinical scores for pain relief remained satisfactory, and there was no difference in clinical scores between cups that appeared stable and those that appeared loose radiographically.
Abstract:
Rationale: Anabolic steroids are drugs of abuse. However, the potential for addiction remains unclear. Testosterone induces conditioned place preference in rats and oral self-administration in hamsters. Objectives: To determine if male rats and hamsters consume testosterone by intravenous (IV) or intracerebroventricular (ICV) self-administration. Methods: With each nose-poke in the active hole during daily 4-h tests in an operant conditioning chamber, gonad-intact adult rats and hamsters received 50 µg testosterone in an aqueous solution of β-cyclodextrin via jugular cannula. The inactive nose-poke hole served as a control. Additional hamsters received vehicle infusions. Results: Rats (n=7) expressed a significant preference for the active nose-poke hole (10.0±2.8 responses/4 h) over the inactive hole (4.7±1.2 responses/4 h). Similarly, during 16 days of testosterone self-administration IV, hamsters (n=9) averaged 11.7±2.9 responses/4 h and 6.3±1.1 responses/4 h in the active and inactive nose-poke holes, respectively. By contrast, vehicle controls (n=8) failed to develop a preference for the active nose-poke hole (6.5±0.5 and 6.4±0.3 responses/4 h). Hamsters (n=8) also self-administered 1 µg testosterone ICV (active hole: 39.8±6.0 nose-pokes/4 h; inactive hole: 22.6±7.1 nose-pokes/4 h). When testosterone was replaced with vehicle, nose-poking in the active hole declined from 31.1±7.6 to 11.9±3.2 responses/4 h within 6 days. Likewise, reversing active and inactive holes increased nose-poking in the previously inactive hole from 9.1±1.9 to 25.6±5.4 responses/4 h. However, reducing the testosterone dose from 1 µg to 0.2 µg per 1 µl injection did not change nose-poking. Conclusions: Compared with other drugs of abuse, testosterone reinforcement is modest. Nonetheless, these data support the hypothesis that testosterone is reinforcing.
Abstract:
Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment.
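One common way to score curiosity in a learned Markov model is the information gained from a single observed transition, measured as the KL divergence between the model's predictive distribution after and before the update. The sketch below uses simple transition counts with additive smoothing. The abstract does not say which measure or model update the authors use, so the function names, the smoothing prior, and the counts are all assumptions for illustration.

```python
import math


def predictive(counts, prior=1.0):
    """Smoothed next-state distribution from transition counts for one state-action pair."""
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]


def information_gain(counts, observed_next_state, prior=1.0):
    """KL(after || before) in nats for one newly observed transition."""
    before = predictive(counts, prior)
    updated = list(counts)
    updated[observed_next_state] += 1
    after = predictive(updated, prior)
    return sum(q * math.log(q / p) for q, p in zip(after, before))


# A never-tried state-action pair versus one already observed 50 times.
novel = information_gain([0, 0, 0], observed_next_state=1)
familiar = information_gain([50, 0, 0], observed_next_state=0)
```

Under this measure, an agent that maximizes expected information gain is drawn toward poorly modelled state-action pairs and loses interest as its counts accumulate, since `novel` exceeds `familiar`.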