114 results for reinforcement


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Learning is often understood as an organism's gradual acquisition of the association between a given sensory stimulus and the correct motor response. Mathematically, this corresponds to regressing a mapping between the set of observations and the set of actions. Recently, however, it has been shown both in cognitive and motor neuroscience that humans are not only able to learn particular stimulus-response mappings, but are also able to extract abstract structural invariants that facilitate generalization to novel tasks. Here we show how such structure learning can enhance facilitation in a sensorimotor association task performed by human subjects. Using regression and reinforcement learning models we show that the observed facilitation cannot be explained by these basic models of learning stimulus-response associations. We show, however, that the observed data can be explained by a hierarchical Bayesian model that performs structure learning. In line with previous results from cognitive tasks, this suggests that hierarchical Bayesian inference might provide a common framework to explain both the learning of specific stimulus-response associations and the learning of abstract structures that are shared by different task environments.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Over the past decade, a variety of user models have been proposed for user simulation-based reinforcement-learning of dialogue strategies. However, the strategies learned with these models are rarely evaluated in actual user trials and it remains unclear how the choice of user model affects the quality of the learned strategy. In particular, the degree to which strategies learned with a user model generalise to real user populations has not been investigated. This paper presents a series of experiments that qualitatively and quantitatively examine the effect of the user model on the learned strategy. Our results show that the performance and characteristics of the strategy are in fact highly dependent on the user model. Furthermore, a policy trained with a poor user model may appear to perform well when tested with the same model, but fail when tested with a more sophisticated user model. This raises significant doubts about the current practice of learning and evaluating strategies with the same user model. The paper further investigates a new technique for testing and comparing strategies directly on real human-machine dialogues, thereby avoiding any evaluation bias introduced by the user model. © 2005 IEEE.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Effective dialogue management is critically dependent on the information that is encoded in the dialogue state. In order to deploy reinforcement learning for policy optimization, dialogue must be modeled as a Markov Decision Process. This requires that the dialogue state encode all relevant information obtained during the dialogue prior to that state. This can be achieved by combining the user goal, the dialogue history, and the last user action to form the dialogue state. In addition, to gain robustness to input errors, dialogue must be modeled as a Partially Observable Markov Decision Process (POMDP) and hence, a distribution over all possible states must be maintained at every dialogue turn. This poses a potential computational limitation since there can be a very large number of dialogue states. The Hidden Information State model provides a principled way of ensuring tractability in a POMDP-based dialogue model. The key feature of this model is the grouping of user goals into partitions that are dynamically built during the dialogue. In this article, we extend this model further to incorporate the notion of complements. This allows for a more complex user goal to be represented, and it enables an effective pruning technique to be implemented that preserves the overall system performance within a limited computational resource more effectively than existing approaches. © 2011 ACM.
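The requirement that a distribution over all possible states be maintained at every turn can be illustrated with a generic discrete belief update. This is a minimal sketch of a textbook Bayes filter, not the Hidden Information State model itself; the function name `belief_update`, the dictionary layout, and the two example states are assumptions made for illustration only.

```python
def belief_update(belief, transition, observation_likelihood):
    """One turn of a discrete Bayes filter over dialogue states (illustrative).

    belief: dict mapping state -> probability before the turn
    transition: dict mapping (prev_state, next_state) -> P(next | prev, action)
    observation_likelihood: dict mapping next_state -> P(observation | next_state)
    """
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next from any previous state.
        predicted = sum(transition.get((s, s_next), 0.0) * belief[s] for s in states)
        # Correction step: weight by how well s_next explains the observation.
        unnormalised[s_next] = observation_likelihood.get(s_next, 0.0) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical two-state example: the user goal does not change between turns,
# and the (noisy) recogniser output favours "hotel".
prior = {"hotel": 0.5, "restaurant": 0.5}
stay_put = {("hotel", "hotel"): 1.0, ("restaurant", "restaurant"): 1.0}
likelihood = {"hotel": 0.8, "restaurant": 0.2}
posterior = belief_update(prior, stay_put, likelihood)
```

The cost of each update grows with the square of the number of states, which is the tractability concern the abstract raises and the reason for grouping user goals into partitions.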

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A case study is presented of the response of two buildings to the construction of a 12 m diameter tunnel excavated by conventional methods in Italy. The tunnel was constructed by reinforcing the tunnel face and the area around the crown prior to excavation and installation of the temporary sprayed concrete lining and the permanent reinforced concrete lining. Reflective prisms, placed at first floor level around the perimeter of the building facades, allowed building settlements to be measured. Ground settlements between the two buildings were measured using BRE type settlement studs. Extensive protective measures were adopted to maintain stability of the tunnel excavation and to reduce ground movements. The number of horizontal jet grout columns installed into the tunnel face was reduced over the course of the project. Results from CPT tests indicate that the undrained shear strength at the tunnel axis is around 120 kPa. SPT and unconsolidated undrained (UU) triaxial tests indicate lower strengths of around 80 kPa, although this may be due to sample disturbance.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The embodied energy (EE) and gas emissions of four design alternatives for an embankment retaining wall system are analyzed for a hypothetical highway construction project. The airborne emissions considered are carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), sulphur oxides (SOx), and nitrogen oxides (NOx). The process stages considered in this study are the initial materials production, transportation of construction machineries and materials, machinery operation during installation, and machinery depreciations. The objectives are (1) to determine whether there are statistically significant differences among the structural alternatives; (2) to understand the relative proportions of impacts for the process stages within each design; (3) to contextualize the impacts to other aspects in life by comparing the computed EE values to household energy consumption and car emission values; and (4) to examine the validity of the adopted EE as an environmental impact indicator through comparison with the amount of gas emissions. For the project considered in this study, the calculated results indicate that propped steel sheet pile wall and minipile wall systems have less embodied energy and gas emissions than cantilever steel tubular wall and secant concrete pile wall systems. The difference in CO2 emission for the retaining wall of 100 m length between the most and least environmentally preferable wall design is equivalent to an average 2.0 L family car being driven for 6.2 million miles (or 62 cars with a mileage of 10,000 miles/year for 10 years). The impacts in construction are generally notable and careful consideration and optimization of designs will reduce such impacts. The use of recycled steel or steel pile as reinforcement bar is effective in reducing the environmental impact. The embodied energy value of a given design is correlated to the amount of gas emissions. © 2011 American Society of Civil Engineers.
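The car-mileage equivalence quoted in the abstract can be checked with a short calculation. The sketch below only reproduces the arithmetic stated in the text; the function name `fleet_mileage` is a hypothetical helper introduced for illustration.

```python
def fleet_mileage(cars, miles_per_year, years):
    """Total distance driven by a fleet of identical cars, in miles."""
    return cars * miles_per_year * years


# Figures taken from the abstract: 62 cars, 10,000 miles/year, 10 years.
total_miles = fleet_mileage(cars=62, miles_per_year=10_000, years=10)
print(f"{total_miles:,} miles")  # 6,200,000 miles, i.e. 6.2 million miles
```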

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper describes an experimental study of a new form of prestressed concrete beam. Aramid Fiber Reinforced Polymers (AFRPs) are used to provide compression confinement in the form of interlocking circular spirals, while external tendons are made from parallel-lay aramid ropes. The response shows that the confinement of the compression flange significantly increases the ductility of the beam, allowing much better utilization of the fiber strength. The failure of the beam is characterized by rupture of spiral confinement reinforcement.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A small low air-speed wind turbine blade case study is used to demonstrate the effectiveness of a materials and design selection methodology described by Monroy Aceves et al. (2008) [24] for composite structures. The blade structure comprises a shell of uniform thickness and a unidirectional reinforcement. The shell outer geometry is fixed by aerodynamic considerations. A wide range of lay-ups are considered for the shell and reinforcement. Structural analysis is undertaken using the finite element method. Results are incorporated into a database for analysis using material selection software. A graphical selection stage is used to identify the lightest blade meeting appropriate design constraints. The proposed solution satisfies the design requirements and improves on the prototype benchmark by reducing the mass by almost 50%. The flexibility of the selection software in allowing identification of trends in the results and modifications to the selection criteria is demonstrated. Introducing a safety factor of two on the material failure stresses increases the mass by only 11%. The case study demonstrates that the proposed design methodology is useful in preliminary design where a very wide range of cases should be considered using relatively simple analysis. © 2011 Elsevier Ltd.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The compressive behaviour of finite unidirectional composites with a region of misaligned reinforcement is investigated via finite element analyses. Models with and without fibre bending stiffness are compared, confirming that compressive strength is accurately predicted without modelling fibre bending stiffness for real composite components which typically have waviness defects of several millimetres wavelength. Various defect parameters are investigated. Results confirm the well-known sensitivity of compressive strength to misalignment angle, and also show that compressive strength falls rapidly with the proportion of laminate width covered by the wavy region. A simple empirical equation is proposed to model the effect of a single patch of waviness in finite specimens. Other parameters such as length and position of the wavy region are found to have a smaller effect on compressive strength. The modelling approach is finally adapted to model distributed waviness and thus determine the compressive strength of composites with realistic waviness defects. © 2011 Elsevier Ltd. All rights reserved.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a dialogue policy robust to speech understanding errors to be learnt. However, a major challenge in POMDP policy learning is to maintain tractability, so the use of approximation is inevitable. We propose applying Gaussian Processes in reinforcement learning of optimal POMDP dialogue policies, in order (1) to make the learning process faster and (2) to obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail dialogue task and then apply this method to a real-world tourist information dialogue task. © 2010 Association for Computational Linguistics.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Statistical dialogue models have required a large number of dialogues to optimise the dialogue policy, relying on the use of a simulated user. This results in a mismatch between training and live conditions, and significant development costs for the simulator, thereby mitigating many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment to learn a policy for a real-world task directly from human interaction using rewards provided by users. It shows that a usable policy can be learnt in just a few hundred dialogues without needing a user simulator, using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning for several thousand dialogues and highlights the need for robustness to noisy rewards. © 2011 IEEE.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This study investigates the structural behavior of precracked reinforced concrete (RC) T-beams strengthened in shear with externally bonded carbon fiber-reinforced polymer (CFRP) sheets. It reports on seven tests on unstrengthened and strengthened RC T-beams, identifying the influence of load history, beam depth, and percentage of longitudinal steel reinforcement on the structural behavior. The experimental results indicate that the contributions of the external CFRP sheets to the shear force capacity can be significant and depend on most of the investigated variables. This study also investigates the accuracy of the prediction of the fiber-reinforced polymer (FRP) contribution in ACI 440.2R-08, UK Concrete Society TR55, and fib Bulletin 14 design guidelines for shear strengthening. A comparison of predicted values with experimental results indicates that the guidelines can overestimate the shear contribution of the externally bonded FRP system. © 2012, American Concrete Institute.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The optimization of dialogue policies using reinforcement learning (RL) is now an accepted part of the state of the art in spoken dialogue systems (SDS). Yet, it is still the case that the commonly used training algorithms for SDS require a large number of dialogues and hence most systems still rely on artificial data generated by a user simulator. Optimization is therefore performed off-line before releasing the system to real users. Gaussian Processes (GP) for RL have recently been applied to dialogue systems. One advantage of GP is that they compute an explicit measure of uncertainty in the value function estimates computed during learning. In this paper, a class of novel learning strategies is described which use uncertainty to control exploration on-line. Comparisons between several exploration schemes show that significant improvements to learning speed can be obtained and that rapid and safe online optimisation is possible, even on a complex task. Copyright © 2011 ISCA.
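The idea of using value-function uncertainty to control exploration can be sketched in a few lines. The example below is a generic optimistic selection rule (mean plus a multiple of the standard deviation) and is an assumption for illustration; it is not necessarily one of the exploration schemes compared in the paper. The function name `select_action`, the parameter `beta`, and the action names are hypothetical.

```python
def select_action(estimates, beta=1.0):
    """Pick the action with the highest optimistic value estimate.

    estimates: dict mapping action -> (mean, std) of its estimated value,
               e.g. as produced by a Gaussian process posterior
    beta: weight on uncertainty; 0 gives purely greedy selection
    """
    if not estimates:
        raise ValueError("no actions to choose from")
    # Uncertain actions receive a bonus, so they are tried before being ruled out.
    return max(estimates, key=lambda a: estimates[a][0] + beta * estimates[a][1])


# Hypothetical posterior over two system actions.
q_posterior = {"confirm": (1.0, 0.1), "request": (0.8, 0.5)}
greedy_choice = select_action(q_posterior, beta=0.0)      # "confirm"
exploring_choice = select_action(q_posterior, beta=1.0)   # "request"
```

With `beta=0.0` the rule exploits the current mean estimates; a larger `beta` directs the system towards actions whose values are still poorly known, and the bonus shrinks automatically as the estimated uncertainty falls.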