124 resultados para GFRP reinforcement
Resumo:
This article presents a novel algorithm for learning parameters in statistical dialogue systems which are modeled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy that selects the system's responses based on the inferred state; and a reward function that specifies the desired behavior of the system. Ideally both the model parameters and the policy would be designed to maximize the cumulative reward. However, while there are many techniques available for learning the optimal policy, no good ways of learning the optimal model parameters that scale to real-world dialogue systems have been found yet. The presented algorithm, called the Natural Actor and Belief Critic (NABC), is a policy gradient method that offers a solution to this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected cumulative reward. The resulting gradient is then used to adapt both the prior distribution of the dialogue model parameters and the policy parameters. In addition, the article presents a variant of the NABC algorithm, called the Natural Belief Critic (NBC), which assumes that the policy is fixed and only the model parameters need to be estimated. The algorithms are evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximize the expected cumulative reward result in significantly improved performance compared to the baseline hand-crafted model parameters. The algorithms are also compared to optimization techniques using plain gradients and state-of-the-art random search algorithms. In all cases, the algorithms based on the natural gradient work significantly better. © 2011 ACM.
Resumo:
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally the parameters of this dialogue model should be also optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
Resumo:
The contribution described in this paper is an algorithm for learning nonlinear, reference tracking, control policies given no prior knowledge of the dynamical system and limited interaction with the system through the learning process. Concepts from the field of reinforcement learning, Bayesian statistics and classical control have been brought together in the formulation of this algorithm which can be viewed as a form of indirect self tuning regulator. On the task of reference tracking using a simulated inverted pendulum it was shown to yield generally improved performance on the best controller derived from the standard linear quadratic method using only 30 s of total interaction with the system. Finally, the algorithm was shown to work on the simulated double pendulum proving its ability to solve nontrivial control tasks. © 2011 IEEE.
Resumo:
Post-earthquake structural safety evaluations are currently performed manually by a team of certified inspectors and/or structural engineers. This process is time-consuming and costly, keeping owners and occupants from returning to their businesses and homes. Automating these evaluations would enable faster, and potentially more consistent, relief and response processes. In order to do this, the detection of exposed reinforcing steel is of utmost significance. This paper presents a novel method of detecting exposed reinforcement in concrete columns for the purpose of advancing practices of structural and safety evaluation of buildings after earthquakes. Under this method, the binary image of the reinforcing area is first isolated using a state-of-the-art adaptive thresholding technique. Next, the ribbed regions of the reinforcement are detected by way of binary template matching. Finally, vertical and horizontal profiling are applied to the processed image in order to filter out any superfluous pixels and take into consideration the size of reinforcement bars in relation to that of the structural element within which they reside. The final result is the combined binary image disclosing only the regions containing rebar overlaid on top of the original image. The method is tested on a set of images from the January 2010 earthquake in Haiti. Preliminary test results convey that most exposed reinforcement could be properly detected in images of moderately-to-severely damaged concrete columns.
Resumo:
The world is at the threshold of emerging technologies, where new systems in construction, materials, and civil and architectural design are poised to make the world better from a structural and construction perspective. Exciting developments, that are too many to name individually, take place yearly, affecting design considerations and construction practices. This edited book brings together modern methods and advances in structural engineering and construction, fulfilling the mission of ISEC Conferences, which is to enhance communication and understanding between structural and construction engineers for successful design and construction of engineering projects. The articles in this book are those accepted for publication and presentation at the 6th International Structural Engineering and Construction Conference in Zurich. The 6th ISEC Conference in Zurich, Switzerland, follows the overwhelming reception and success of previous ISEC conference in Las Vegas, USA in 2009; Melbourne, Australia in 2007; Shunan, Japan in 2005; Rome, Italy in 2003; and Honolulu, USA in 2001. Many topics are covered in this book, ranging from legal affairs and contracting, to innovations and risk analysis in infrastructure projects, analysis and design of structural systems, materials, architecture, and construction. The articles here are a lasting testimony to the excellent research being undertaken around the world. These articles provide a platform for the exchange of ideas, research efforts and networking in the structural engineering and construction communities. We congratulate and thank the authors for these articles that were selected after intensive peer-review, and our gratitude extends to all reviewers and members of the International Technical Committee. It is their combined contributions that have made this book a reality.