704 resultados para Learning in robotics
Resumo:
This thesis addresses the Batch Reinforcement Learning methods in Robotics. This sub-class of Reinforcement Learning has shown promising results and has been the focus of recent research. Three contributions are proposed that aim to extend the state-of-art methods allowing for a faster and more stable learning process, such as required for learning in Robotics. The Q-learning update-rule is widely applied, since it allows to learn without the presence of a model of the environment. However, this update-rule is transition-based and does not take advantage of the underlying episodic structure of collected batch of interactions. The Q-Batch update-rule is proposed in this thesis, to process experiencies along the trajectories collected in the interaction phase. This allows a faster propagation of obtained rewards and penalties, resulting in faster and more robust learning. Non-parametric function approximations are explored, such as Gaussian Processes. This type of approximators allows to encode prior knowledge about the latent function, in the form of kernels, providing a higher level of exibility and accuracy. The application of Gaussian Processes in Batch Reinforcement Learning presented a higher performance in learning tasks than other function approximations used in the literature. Lastly, in order to extract more information from the experiences collected by the agent, model-learning techniques are incorporated to learn the system dynamics. In this way, it is possible to augment the set of collected experiences with experiences generated through planning using the learned models. Experiments were carried out mainly in simulation, with some tests carried out in a physical robotic platform. The obtained results show that the proposed approaches are able to outperform the classical Fitted Q Iteration.
Resumo:
Darrerament, l'interès pel desenvolupament d'aplicacions amb robots submarins autònoms (AUV) ha crescut de forma considerable. Els AUVs són atractius gràcies al seu tamany i el fet que no necessiten un operador humà per pilotar-los. Tot i això, és impossible comparar, en termes d'eficiència i flexibilitat, l'habilitat d'un pilot humà amb les escasses capacitats operatives que ofereixen els AUVs actuals. L'utilització de AUVs per cobrir grans àrees implica resoldre problemes complexos, especialment si es desitja que el nostre robot reaccioni en temps real a canvis sobtats en les condicions de treball. Per aquestes raons, el desenvolupament de sistemes de control autònom amb l'objectiu de millorar aquestes capacitats ha esdevingut una prioritat. Aquesta tesi tracta sobre el problema de la presa de decisions utilizant AUVs. El treball presentat es centra en l'estudi, disseny i aplicació de comportaments per a AUVs utilitzant tècniques d'aprenentatge per reforç (RL). La contribució principal d'aquesta tesi consisteix en l'aplicació de diverses tècniques de RL per tal de millorar l'autonomia dels robots submarins, amb l'objectiu final de demostrar la viabilitat d'aquests algoritmes per aprendre tasques submarines autònomes en temps real. En RL, el robot intenta maximitzar un reforç escalar obtingut com a conseqüència de la seva interacció amb l'entorn. L'objectiu és trobar una política òptima que relaciona tots els estats possibles amb les accions a executar per a cada estat que maximitzen la suma de reforços totals. Així, aquesta tesi investiga principalment dues tipologies d'algoritmes basats en RL: mètodes basats en funcions de valor (VF) i mètodes basats en el gradient (PG). Els resultats experimentals finals mostren el robot submarí Ictineu en una tasca autònoma real de seguiment de cables submarins. Per portar-la a terme, s'ha dissenyat un algoritme anomenat mètode d'Actor i Crític (AC), fruit de la fusió de mètodes VF amb tècniques de PG.
Resumo:
One of the most important characteristics of intelligent activity is the ability to change behaviour according to many forms of feedback. Through learning an agent can interact with its environment to improve its performance over time. However, most of the techniques known that involves learning are time expensive, i.e., once the agent is supposed to learn over time by experimentation, the task has to be executed many times. Hence, high fidelity simulators can save a lot of time. In this context, this paper describes the framework designed to allow a team of real RoboNova-I humanoids robots to be simulated under USARSim environment. Details about the complete process of modeling and programming the robot are given, as well as the learning methodology proposed to improve robot's performance. Due to the use of a high fidelity model, the learning algorithms can be widely explored in simulation before adapted to real robots. © 2008 Springer-Verlag Berlin Heidelberg.
Resumo:
Image-to-image (i2i) translation networks can generate fake images beneficial for many applications in augmented reality, computer graphics, and robotics. However, they require large scale datasets and high contextual understanding to be trained correctly. In this thesis, we propose strategies for solving these problems, improving performances of i2i translation networks by using domain- or physics-related priors. The thesis is divided into two parts. In Part I, we exploit human abstraction capabilities to identify existing relationships in images, thus defining domains that can be leveraged to improve data usage efficiency. We use additional domain-related information to train networks on web-crawled data, hallucinate scenarios unseen during training, and perform few-shot learning. In Part II, we instead rely on physics priors. First, we combine realistic physics-based rendering with generative networks to boost outputs realism and controllability. Then, we exploit naive physical guidance to drive a manifold reorganization, which allowed generating continuous conditions such as timelapses.
Resumo:
In a local production system (LPS), besides external economies, the interaction, cooperation, and learning are indicated by the literature as complementary ways of enhancing the LPS's competitiveness and gains. In Brazil, the greater part of LPSs, mostly composed by small enterprises, displays incipient relationships and low levels of interaction and cooperation among their actors. The size of the participating enterprises itself for specificities that engender organizational constraints, which, in turn, can have a considerable impact on their relationships and learning dynamics. For that reason, it is the purpose of this article to present an analysis of interaction, cooperation, and learning relationships among several types of actors pertaining to an LPS in the farming equipment and machinery sector, bearing in mind the specificities of small enterprises. To this end, the fieldwork carried out in this study aimed at: (i) investigating external and internal knowledge sources conducive to learning and (ii) identifying and analyzing motivating and inhibiting factors related to specificities of small enterprises in order to bring the LPS members closer together and increase their cooperation and interaction. Empirical evidence shows that internal aspects of the enterprises, related to management and infrastructure, can have a strong bearing on their joint actions, interaction and learning processes.
Resumo:
We propose and analyze two different Bayesian online algorithms for learning in discrete Hidden Markov Models and compare their performance with the already known Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of generalization we draw learning curves in simplified situations for these algorithms and compare their performances.
Resumo:
This paper is concerned with the use of scientific visualization methods for the analysis of feedforward neural networks (NNs). Inevitably, the kinds of data associated with the design and implementation of neural networks are of very high dimensionality, presenting a major challenge for visualization. A method is described using the well-known statistical technique of principal component analysis (PCA). This is found to be an effective and useful method of visualizing the learning trajectories of many learning algorithms such as back-propagation and can also be used to provide insight into the learning process and the nature of the error surface.
Resumo:
This special issue represents a further exploration of some issues raised at a symposium entitled “Functional magnetic resonance imaging: From methods to madness” presented during the 15th annual Theoretical and Experimental Neuropsychology (TENNET XV) meeting in Montreal, Canada in June, 2004. The special issue’s theme is methods and learning in functional magnetic resonance imaging (fMRI), and it comprises 6 articles (3 reviews and 3 empirical studies). The first (Amaro and Barker) provides a beginners guide to fMRI and the BOLD effect (perhaps an alternative title might have been “fMRI for dummies”). While fMRI is now commonplace, there are still researchers who have yet to employ it as an experimental method and need some basic questions answered before they venture into new territory. This article should serve them well. A key issue of interest at the symposium was how fMRI could be used to elucidate cerebral mechanisms responsible for new learning. The next 4 articles address this directly, with the first (Little and Thulborn) an overview of data from fMRI studies of category-learning, and the second from the same laboratory (Little, Shin, Siscol, and Thulborn) an empirical investigation of changes in brain activity occurring across different stages of learning. While a role for medial temporal lobe (MTL) structures in episodic memory encoding has been acknowledged for some time, the different experimental tasks and stimuli employed across neuroimaging studies have not surprisingly produced conflicting data in terms of the precise subregion(s) involved. The next paper (Parsons, Haut, Lemieux, Moran, and Leach) addresses this by examining effects of stimulus modality during verbal memory encoding. Typically, BOLD fMRI studies of learning are conducted over short time scales, however, the fourth paper in this series (Olson, Rao, Moore, Wang, Detre, and Aguirre) describes an empirical investigation of learning occurring over a longer than usual period, achieving this by employing a relatively novel technique called perfusion fMRI. This technique shows considerable promise for future studies. The final article in this special issue (de Zubicaray) represents a departure from the more familiar cognitive neuroscience applications of fMRI, instead describing how neuroimaging studies might be conducted to both inform and constrain information processing models of cognition.
Resumo:
This paper reports on the outcomes of the first stage of a longitudinal study that focused on the transformational change process being undertaken within the Supply Chain and Operations Area of a major Australian food manufacturing company. Organizational learning is an essential prerequisite for any successful change process and an organization's ability to learn is dependent on the existence of an environment within the organization that nurtures learning and the presence of key enablers that facilitate the learning process. An organization's capacity to learn can be enhanced through its ability to form and sustain collaborative relationships with its chain partners. The results show that an environment that supports organizational learning is being developed through consultative leadership and the empowerment of individuals within a culture that supports innovation and cross-functional teamwork but demands responsibility and accountability. The impact of these changes within the Supply Chain and Operations Area is evident in the significant improvement in the Area's productivity and efficiency levels over the past twelve months. The company's endeavours to engage its major supply chain partners in the learning process have been limited by the turmoil within the company. However the company has involved its supply chain partners in a series of mutually beneficial projects that have improved communication and built trust thereby laying the foundations for more collaborative chain relationships.