Biblioteca Digital

984 resultados para Reinforcement-Learning

Estratégias baseadas em aprendizado para coordenação de uma frota de robôs em tarefas cooperativas

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In multi-robot systems, both control architecture and work strategy represent a challenge for researchers. It is important to have a robust architecture that can be easily adapted to requirement changes. It is also important that work strategy allows robots to complete tasks efficiently, considering that robots interact directly in environments with humans. In this context, this work explores two approaches for robot soccer team coordination for cooperative tasks development. Both approaches are based on a combination of imitation learning and reinforcement learning. Thus, in the first approach was developed a control architecture, a fuzzy inference engine for recognizing situations in robot soccer games, a software for narration of robot soccer games based on the inference engine and the implementation of learning by imitation from observation and analysis of others robotic teams. Moreover, state abstraction was efficiently implemented in reinforcement learning applied to the robot soccer standard problem. Finally, reinforcement learning was implemented in a form where actions are explored only in some states (for example, states where an specialist robot system used them) differently to the traditional form, where actions have to be tested in all states. In the second approach reinforcement learning was implemented with function approximation, for which an algorithm called RBF-Sarsa($lambda$) was created. In both approaches batch reinforcement learning algorithms were implemented and imitation learning was used as a seed for reinforcement learning. Moreover, learning from robotic teams controlled by humans was explored. The proposal in this work had revealed efficient in the robot soccer standard problem and, when implemented in other robotics systems, they will allow that these robotics systems can efficiently and effectively develop assigned tasks. These approaches will give high adaptation capabilities to requirements and environment changes.

Classificação de padrões através de um comitê de máquinas aprimorado por aprendizagem por reforço

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Reinforcement learning is a machine learning technique that, although finding a large number of applications, maybe is yet to reach its full potential. One of the inadequately tested possibilities is the use of reinforcement learning in combination with other methods for the solution of pattern classification problems. It is well documented in the literature the problems that support vector machine ensembles face in terms of generalization capacity. Algorithms such as Adaboost do not deal appropriately with the imbalances that arise in those situations. Several alternatives have been proposed, with varying degrees of success. This dissertation presents a new approach to building committees of support vector machines. The presented algorithm combines Adaboost algorithm with a layer of reinforcement learning to adjust committee parameters in order to avoid that imbalances on the committee components affect the generalization performance of the final hypothesis. Comparisons were made with ensembles using and not using the reinforcement learning layer, testing benchmark data sets widely known in area of pattern classification

Roteamento em Redes de Sensores Sem Fios Com Base Em Aprendizagem Por Reforço

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The use of wireless sensor and actuator networks in industry has been increasing past few years, bringing multiple benefits compared to wired systems, like network flexibility and manageability. Such networks consists of a possibly large number of small and autonomous sensor and actuator devices with wireless communication capabilities. The data collected by sensors are sent directly or through intermediary nodes along the network to a base station called sink node. The data routing in this environment is an essential matter since it is strictly bounded to the energy efficiency, thus the network lifetime. This work investigates the application of a routing technique based on Reinforcement Learning s Q-Learning algorithm to a wireless sensor network by using an NS-2 simulated environment. Several metrics like energy consumption, data packet delivery rates and delays are used to validate de proposal comparing it with another solutions existing in the literature

Estado do conhecimento: recuperação da aprendizagem e do reforço escolar na rede estadual paulista (1999 a 2009)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Lernen mit einer Hybride aus Lernendem Klassifizierendem System und Selbstorganisierender Karte

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Im Forschungsgebiet der Künstlichen Intelligenz, insbesondere im Bereich des maschinellen Lernens, hat sich eine ganze Reihe von Verfahren etabliert, die von biologischen Vorbildern inspiriert sind. Die prominentesten Vertreter derartiger Verfahren sind zum einen Evolutionäre Algorithmen, zum anderen Künstliche Neuronale Netze. Die vorliegende Arbeit befasst sich mit der Entwicklung eines Systems zum maschinellen Lernen, das Charakteristika beider Paradigmen in sich vereint: Das Hybride Lernende Klassifizierende System (HCS) wird basierend auf dem reellwertig kodierten eXtended Learning Classifier System (XCS), das als Lernmechanismus einen Genetischen Algorithmus enthält, und dem Wachsenden Neuralen Gas (GNG) entwickelt. Wie das XCS evolviert auch das HCS mit Hilfe eines Genetischen Algorithmus eine Population von Klassifizierern - das sind Regeln der Form [WENN Bedingung DANN Aktion], wobei die Bedingung angibt, in welchem Bereich des Zustandsraumes eines Lernproblems ein Klassifizierer anwendbar ist. Beim XCS spezifiziert die Bedingung in der Regel einen achsenparallelen Hyperquader, was oftmals keine angemessene Unterteilung des Zustandsraumes erlaubt. Beim HCS hingegen werden die Bedingungen der Klassifizierer durch Gewichtsvektoren beschrieben, wie die Neuronen des GNG sie besitzen. Jeder Klassifizierer ist anwendbar in seiner Zelle der durch die Population des HCS induzierten Voronoizerlegung des Zustandsraumes, dieser kann also flexibler unterteilt werden als beim XCS. Die Verwendung von Gewichtsvektoren ermöglicht ferner, einen vom Neuronenadaptationsverfahren des GNG abgeleiteten Mechanismus als zweites Lernverfahren neben dem Genetischen Algorithmus einzusetzen. Während das Lernen beim XCS rein evolutionär erfolgt, also nur durch Erzeugen neuer Klassifizierer, ermöglicht dies dem HCS, bereits vorhandene Klassifizierer anzupassen und zu verbessern. Zur Evaluation des HCS werden mit diesem verschiedene Lern-Experimente durchgeführt. Die Leistungsfähigkeit des Ansatzes wird in einer Reihe von Lernproblemen aus den Bereichen der Klassifikation, der Funktionsapproximation und des Lernens von Aktionen in einer interaktiven Lernumgebung unter Beweis gestellt.

An Actor-Critic based controller for glucose regulation in type 1 diabetes

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A novel adaptive approach for glucose control in individuals with type 1 diabetes under sensor-augmented pump therapy is proposed. The controller, is based on Actor-Critic (AC) learning and is inspired by the principles of reinforcement learning and optimal control theory. The main characteristics of the proposed controller are (i) simultaneous adjustment of both the insulin basal rate and the bolus dose, (ii) initialization based on clinical procedures, and (iii) real-time personalization. The effectiveness of the proposed algorithm in terms of glycemic control has been investigated in silico in adults, adolescents and children under open-loop and closed-loop approaches, using announced meals with uncertainties in the order of ±25% in the estimation of carbohydrates. The results show that glucose regulation is efficient in all three groups of patients, even with uncertainties in the level of carbohydrates in the meal. The percentages in the A+B zones of the Control Variability Grid Analysis (CVGA) were 100% for adults, and 93% for both adolescents and children. The AC based controller seems to be a promising approach for the automatic adjustment of insulin infusion in order to improve glycemic control. After optimization of the algorithm, the controller will be tested in a clinical trial.

Adaptive Algorithms for Personalized Diabetes Treatment

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dynamic systems, especially in real-life applications, are often determined by inter-/intra-variability, uncertainties and time-varying components. Physiological systems are probably the most representative example in which population variability, vital signal measurement noise and uncertain dynamics render their explicit representation and optimization a rather difficult task. Systems characterized by such challenges often require the use of adaptive algorithmic solutions able to perform an iterative structural and/or parametrical update process towards optimized behavior. Adaptive optimization presents the advantages of (i) individualization through learning of basic system characteristics, (ii) ability to follow time-varying dynamics and (iii) low computational cost. In this chapter, the use of online adaptive algorithms is investigated in two basic research areas related to diabetes management: (i) real-time glucose regulation and (ii) real-time prediction of hypo-/hyperglycemia. The applicability of these methods is illustrated through the design and development of an adaptive glucose control algorithm based on reinforcement learning and optimal control and an adaptive, personalized early-warning system for the recognition and alarm generation against hypo- and hyperglycemic events.

Evolving and coevolving computer go players using neuroevolution.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Go game is ancient very complex game with simple rules which still is a challenge for the AI.This work cover some neuroevolution techniques used in reinforcement learning applied to the GO game as SANE (Symbiotic Adaptive Neuro-Evolution) and presents a variation to this method with the intention of evolving better strategies in the game. The computer Go player based in SANE is evolved againts a knowed player which creates some problem as determinism for which is proposed the co-evolution. Finally, it is introduced an algorithm to co-evolve two populations of neurons to evolve better computer Go players.

Arquitectura de Control para la Conducción Autónoma de Vehículos en Entornos Urbanos y Autovías

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Hoy en día, el desarrollo tecnológico en el campo de los sistemas inteligentes de transporte (ITS por sus siglas en inglés) ha permitido dotar a los vehículos con diversos sistemas de ayuda a la conducción (ADAS, del inglés advanced driver assistance system), mejorando la experiencia y seguridad de los pasajeros, en especial del conductor. La mayor parte de estos sistemas están pensados para advertir al conductor sobre ciertas situaciones de riesgo, como la salida involuntaria del carril o la proximidad de obstáculos en el camino. No obstante, también podemos encontrar sistemas que van un paso más allá y son capaces de cooperar con el conductor en el control del vehículo o incluso relegarlos de algunas tareas tediosas. Es en este último grupo donde se encuentran los sistemas de control electrónico de estabilidad (ESP - Electronic Stability Program), el antibloqueo de frenos (ABS - Anti-lock Braking System), el control de crucero (CC - Cruise Control) y los más recientes sistemas de aparcamiento asistido. Continuando con esta línea de desarrollo, el paso siguiente consiste en la supresión del conductor humano, desarrollando sistemas que sean capaces de conducir un vehículo de forma autónoma y con un rendimiento superior al del conductor. En este trabajo se presenta, en primer lugar, una arquitectura de control para la automatización de vehículos. Esta se compone de distintos componentes de hardware y software, agrupados de acuerdo a su función principal. El diseño de la arquitectura parte del trabajo previo desarrollado por el Programa AUTOPIA, aunque introduce notables aportaciones en cuanto a la eficiencia, robustez y escalabilidad del sistema. Ahondando un poco más en detalle, debemos resaltar el desarrollo de un algoritmo de localización basado en enjambres de partículas. Este está planteado como un método de filtrado y fusión de la información obtenida a partir de los distintos sensores embarcados en el vehículo, entre los que encontramos un receptor GPS (Global Positioning System), unidades de medición inercial (IMU – Inertial Measurement Unit) e información tomada directamente de los sensores embarcados por el fabricante, como la velocidad de las ruedas y posición del volante. Gracias a este método se ha conseguido resolver el problema de la localización, indispensable para el desarrollo de sistemas de conducción autónoma. Continuando con el trabajo de investigación, se ha estudiado la viabilidad de la aplicación de técnicas de aprendizaje y adaptación al diseño de controladores para el vehículo. Como punto de partida se emplea el método de Q-learning para la generación de un controlador borroso lateral sin ningún tipo de conocimiento previo. Posteriormente se presenta un método de ajuste on-line para la adaptación del control longitudinal ante perturbaciones impredecibles del entorno, como lo son los cambios en la inclinación del camino, fricción de las ruedas o peso de los ocupantes. Para finalizar, se presentan los resultados obtenidos durante un experimento de conducción autónoma en carreteras reales, el cual se llevó a cabo en el mes de Junio de 2012 desde la población de San Lorenzo de El Escorial hasta las instalaciones del Centro de Automática y Robótica (CAR) en Arganda del Rey. El principal objetivo tras esta demostración fue validar el funcionamiento, robustez y capacidad de la arquitectura propuesta para afrontar el problema de la conducción autónoma, bajo condiciones mucho más reales a las que se pueden alcanzar en las instalaciones de prueba. ABSTRACT Nowadays, the technological advances in the Intelligent Transportation Systems (ITS) field have led the development of several driving assistance systems (ADAS). These solutions are designed to improve the experience and security of all the passengers, especially the driver. For most of these systems, the main goal is to warn drivers about unexpected circumstances leading to risk situations such as involuntary lane departure or proximity to other vehicles. However, other ADAS go a step further, being able to cooperate with the driver in the control of the vehicle, or even overriding it on some tasks. Examples of this kind of systems are the anti-lock braking system (ABS), cruise control (CC) and the recently commercialised assisted parking systems. Within this research line, the next step is the development of systems able to replace the human drivers, improving the control and therefore, the safety and reliability of the vehicles. First of all, this dissertation presents a control architecture design for autonomous driving. It is made up of several hardware and software components, grouped according to their main function. The design of this architecture is based on the previous works carried out by the AUTOPIA Program, although notable improvements have been made regarding the efficiency, robustness and scalability of the system. It is also remarkable the work made on the development of a location algorithm for vehicles. The proposal is based on the emulation of the behaviour of biological swarms and its performance is similar to the well-known particle filters. The developed method combines information obtained from different sensors, including GPS, inertial measurement unit (IMU), and data from the original vehicle’s sensors on-board. Through this filtering algorithm the localization problem is properly managed, which is critical for the development of autonomous driving systems. The work deals also with the fuzzy control tuning system, a very time consuming task when done manually. An analysis of learning and adaptation techniques for the development of different controllers has been made. First, the Q-learning –a reinforcement learning method– has been applied to the generation of a lateral fuzzy controller from scratch. Subsequently, the development of an adaptation method for longitudinal control is presented. With this proposal, a final cruise control controller is able to deal with unpredictable environment disturbances, such as road slope, wheel’s friction or even occupants’ weight. As a testbed for the system, an autonomous driving experiment on real roads is presented. This experiment was carried out on June 2012, driving from San Lorenzo de El Escorial up to the Center for Automation and Robotics (CAR) facilities in Arganda del Rey. The main goal of the demonstration was validating the performance, robustness and viability of the proposed architecture to deal with the problem of autonomous driving under more demanding conditions than those achieved on closed test tracks.

Cooperative off-policy prediction of markov decision processes in adaptive networks

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We apply diffusion strategies to propose a cooperative reinforcement learning algorithm, in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable to learn off-policy even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain of cooperation in the form of more stability and less bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents is different from the target policy under evaluation.

Auto-Emergencia de Comunicación Sintáctica en Entornos Estáticos y Dinámicos para Grupos de Robots mediante Evolución Gramatical y Aprendizaje por Refuerzo

Relevância:

60.00% 60.00%

Publicador:

Resumo:

La idea de dotar a un grupo de robots o agentes artificiales de un lenguaje ha sido objeto de intenso estudio en las ultimas décadas. Como no podía ser de otra forma los primeros intentos se enfocaron hacia el estudio de la emergencia de vocabularios compartidos convencionalmente por el grupo de robots. Las ventajas que puede ofrecer un léxico común son evidentes, como también lo es que un lenguaje con una estructura más compleja, en la que se pudieran combinar palabras, sería todavía más beneficioso. Surgen así algunas propuestas enfocadas hacia la emergencia de un lenguaje consensuado que muestre una estructura sintáctica similar al lenguaje humano, entre las que se encuentra este trabajo. Tomar el lenguaje humano como modelo supone adoptar algunas de las hipótesis y teorías que disciplinas como la filosofía, la psicología o la lingüística entre otras se han encargado de proponer. Según estas aproximaciones teóricas el lenguaje presenta una doble dimension formal y funcional. En base a su dimensión formal parece claro que el lenguaje sigue unas reglas, por lo que el uso de una gramática se ha considerado esencial para su representación, pero también porque las gramáticas son un dispositivo muy sencillo y potente que permite generar fácilmente estructuras simbólicas. En cuanto a la dimension funcional se ha tenido en cuenta la teoría quizá más influyente de los últimos tiempos, que no es otra que la Teoría de los Actos del Habla. Esta teoría se basa en la idea de Wittgenstein por la que el significado reside en el uso del lenguaje, hasta el punto de que éste se entiende como una manera de actuar y de comportarse, en definitiva como una forma de vida. Teniendo presentes estas premisas en esta tesis se pretende experimentar con modelos computacionales que permitan a un grupo de robots alcanzar un lenguaje común de manera autónoma, simplemente mediante interacciones individuales entre los robots, en forma de juegos de lenguaje. Para ello se proponen tres modelos distintos de lenguaje: • Un modelo basado en gramáticas probabilísticas y aprendizaje por refuerzo en el que las interacciones y el uso del lenguaje son claves para su emergencia y que emplea una gramática generativa estática y diseñada de antemano. Este modelo se aplica a dos grupos distintos: uno formado exclusivamente por robots y otro que combina robots y un humano, de manera que en este segundo caso se plantea un aprendizaje supervisado por humanos. • Un modelo basado en evolución gramatical que permite estudiar no solo el consenso sintáctico, sino también cuestiones relativas a la génesis del lenguaje y que emplea una gramática universal a partir de la cual los robots pueden evolucionar por sí mismos la gramática más apropiada según la situación lingüística que traten en cada momento. • Un modelo basado en evolución gramatical y aprendizaje por refuerzo que toma aspectos de los anteriores y amplia las posibilidades de los robots al permitir desarrollar un lenguaje que se adapta a situaciones lingüísticas dinámicas que pueden cambiar en el tiempo y también posibilita la imposición de restricciones de orden muy frecuentes en las estructuras sintácticas complejas. Todos los modelos implican un planteamiento descentralizado y auto-organizado, de manera que ninguno de los robots es el dueño del lenguaje y todos deben cooperar y colaborar de forma coordinada para lograr el consenso sintáctico. En cada caso se plantean experimentos que tienen como objetivo validar los modelos propuestos, tanto en lo relativo al éxito en la emergencia del lenguaje como en lo relacionado con cuestiones paralelas de importancia, como la interacción hombre-máquina o la propia génesis del lenguaje. ABSTRACT The idea of giving a language to a group of robots or artificial agents has been the subject of intense study in recent decades. The first attempts have focused on the development and emergence of a conventionally shared vocabulary. The advantages that can provide a common vocabulary are evident and therefore a more complex language that combines words would be even more beneficial. Thus some proposals are put forward towards the emergence of a consensual language with a sintactical structure in similar terms to the human language. This work follows this trend. Taking the human language as a model means taking some of the assumptions and theories that disciplines such as philosophy, psychology or linguistics among others have provided. According to these theoretical positions language has a double formal and functional dimension. Based on its formal dimension it seems clear that language follows rules, so that the use of a grammar has been considered essential for representation, but also because grammars are a very simple and powerful device that easily generates these symbolic structures. As for the functional dimension perhaps the most influential theory of recent times, the Theory of Speech Acts has been taken into account. This theory is based on the Wittgenstein’s idea about that the meaning lies in the use of language, to the extent that it is understood as a way of acting and behaving. Having into account these issues this work implements some computational models in order to test if they allow a group of robots to reach in an autonomous way a shared language by means of individual interaction among them, that is by means of language games. Specifically, three different models of language for robots are proposed: • A reinforcement learning based model in which interactions and language use are key to its emergence. This model uses a static probabilistic generative grammar which is designed beforehand. The model is applied to two different groups: one formed exclusively by robots and other combining robots and a human. Therefore, in the second case the learning process is supervised by the human. • A model based on grammatical evolution that allows us to study not only the syntactic consensus, but also the very genesis of language. This model uses a universal grammar that allows robots to evolve for themselves the most appropriate grammar according to the current linguistic situation they deal with. • A model based on grammatical evolution and reinforcement learning that takes aspects of the previous models and increases their possibilities. This model allows robots to develop a language in order to adapt to dynamic language situations that can change over time and also allows the imposition of syntactical order restrictions which are very common in complex syntactic structures. All models involve a decentralized and self-organized approach so that none of the robots is the language’s owner and everyone must cooperate and work together in a coordinated manner to achieve syntactic consensus. In each case experiments are presented in order to validate the proposed models, both in terms of success about the emergence of language and it relates to the study of important parallel issues, such as human-computer interaction or the very genesis of language.

Speech technology in the year 2001.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper introduces the session "Technology in the Year 2001" and is the first of four papers dealing with the future of human-machine communication by voice. In looking to the future it is important to recognize both the difficulties of technological forecasting and the frailties of the technology as it exists today--frailties that are manifestations of our limited scientific understanding of human cognition. The technology to realize truly advanced applications does not yet exist and cannot be supported by our presently incomplete science of speech. To achieve this long-term goal, the authors advocate a fundamental research program using a cybernetic approach substantially different from more conventional synthetic approaches. In a cybernetic approach, feedback control systems will allow a machine to adapt to a linguistically rich environment using reinforcement learning.

Sound source localisation through active audition

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a novel method for enabling a robot to determine the direction to a sound source through interacting with its environment. The method uses a new neural network, the Parameter-Less Self-Organizing Map algorithm, and reinforcement learning to achieve rapid and accurate response.

Neural networks for time-varying data

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper reviews some basic issues and methods involved in using neural networks to respond in a desired fashion to a temporally-varying environment. Some popular network models and training methods are introduced. A speech recognition example is then used to illustrate the central difficulty of temporal data processing: learning to notice and remember relevant contextual information. Feedforward network methods are applicable to cases where this problem is not severe. The application of these methods are explained and applications are discussed in the areas of pure mathematics, chemical and physical systems, and economic systems. A more powerful but less practical algorithm for temporal problems, the moving targets algorithm, is sketched and discussed. For completeness, a few remarks are made on reinforcement learning.

Bayesian invariant measurements of generalisation

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry. It is coherent and invariant. For each sample a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is given by a projection of the ideal estimate. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework is applicable to both Bayesian and non-Bayesian methods, with arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.

«
1
2
...
6
7
8
9
10
11
12
...
65
66
»