984 results for Reinforcement-Learning


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Spiking Neural Networks (SNNs) are bio-inspired Artificial Neural Networks (ANNs) utilizing discrete spiking signals, akin to neuron communication in the brain, making them ideal for real-time and energy-efficient Cyber-Physical Systems (CPSs). This thesis explores their potential in Structural Health Monitoring (SHM), leveraging low-cost MEMS accelerometers for early damage detection in motorway bridges. The study focuses on Long Short-Term SNNs (LSNNs), although their complex learning processes pose challenges. Comparing LSNNs with other ANN models and training algorithms for SHM, findings indicate LSNNs' effectiveness in damage identification, comparable to ANNs trained using traditional methods. Additionally, an optimized embedded LSNN implementation demonstrates a 54% reduction in execution time, but with longer pre-processing due to spike-based encoding. Furthermore, SNNs are applied in UAV obstacle avoidance, trained directly using a Reinforcement Learning (RL) algorithm with event-based input from a Dynamic Vision Sensor (DVS). Performance evaluation against Convolutional Neural Networks (CNNs) highlights SNNs' superior energy efficiency, showing a 6x decrease in energy consumption. The study also investigates embedded SNN implementations' latency and throughput in real-world deployments, emphasizing their potential for energy-efficient monitoring systems. This research contributes to advancing SHM and UAV obstacle avoidance through SNNs' efficient information processing and decision-making capabilities within CPS domains.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the economics and game-theory literature there is an open debate on whether automated pricing algorithms can give rise to anticompetitive behavior. The aim of this thesis is to develop an actor-critic reinforcement learning model with entropy regularization to set prices in a dynamic game of oligopolistic competition with continuous prices. The model I propose consistently exhibits cooperative behavior supported by punishment mechanisms that discourage deviation from the equilibrium reached at convergence. The behavior of this model during learning and after convergence also helps to interpret the actions taken by tabular Q-learning and other pricing algorithms under similar conditions. The results are robust to changes in the number of competing agents and to the type of deviation from the equilibrium obtained at convergence, with deviations to higher prices also being punished.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis focuses on the design of a flexible, dynamic and innovative telecommunication system for future 6G applications in vehicular communications. The system is based on drones acting as mobile base stations in an urban scenario to cope with increasing traffic demand and avoid network congestion. In particular, Reinforcement Learning algorithms are used to let the drone learn autonomously how to behave in a scenario full of obstacles, with the goal of tracking and serving the maximum number of moving vehicles while minimizing the energy consumed to perform its tasks. This project opens the door to a new way of applying and developing telecommunications in an urban scenario by combining them with the rising field of Artificial Intelligence.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Classical and operant conditioning principles, such as the behavioral discrepancy-derived assumption that reinforcement always selects antecedent stimulus and response relations, have been studied at the neural level, mainly by observing the strengthening of neuronal responses or synaptic connections. A review of the literature on the neural basis of behavior provided extensive scientific data that indicate a synthesis between the two conditioning processes based mainly on stimulus control in learning tasks. The resulting analysis revealed the following aspects. Dopamine acts as a behavioral discrepancy signal in the midbrain pathway of positive reinforcement, leading toward the nucleus accumbens. Dopamine modulates both types of conditioning in the Aplysia mollusk and in mammals. In vivo and in vitro mollusk preparations show convergence of both types of conditioning in the same motor neuron. Frontal cortical neurons are involved in behavioral discrimination in reversal and extinction procedures, and these neurons preferentially deliver glutamate through conditioned stimulus or discriminative stimulus pathways. Discriminative neural responses can reliably precede operant movements and can also be common to stimuli that share complex symbolic relations. The present article discusses convergent and divergent points between conditioning paradigms at the neural level of analysis to advance our knowledge on reinforcement.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

This paper investigates how to improve action selection for online policy learning in robotic scenarios using reinforcement learning (RL) algorithms. Since finding control policies with any RL algorithm can be very time consuming, we propose combining RL algorithms with heuristic functions that select promising actions during the learning process. To this end, we investigate the use of heuristics to increase the rate of convergence of RL algorithms and contribute a new learning algorithm, Heuristically Accelerated Q-learning (HAQL), which incorporates heuristics for action selection into the Q-learning algorithm. Experimental results on robot navigation show that even very simple heuristic functions significantly speed up learning.
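The idea of biasing Q-learning's action selection with a heuristic can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the additive form `Q + xi * H`, the weight `xi`, the epsilon-greedy exploration, and all function names below are assumptions made for illustration only. The sketch shows the heuristic influencing which action is tried while the value update stays the ordinary tabular Q-learning update.

```python
import random


def haql_select_action(q_row, h_row, xi=1.0, epsilon=0.1, rng=random):
    """Choose an action index for one state.

    q_row: learned Q-values, one per action.
    h_row: heuristic values, one per action (hypothetical, user-supplied).
    xi:    assumed weight controlling how strongly the heuristic biases choice.
    """
    n_actions = len(q_row)
    # Explore with probability epsilon, as in plain epsilon-greedy Q-learning.
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    # Otherwise exploit, ranking actions by Q-value plus weighted heuristic.
    scores = [q + xi * h for q, h in zip(q_row, h_row)]
    return max(range(n_actions), key=scores.__getitem__)


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; the heuristic does not enter here."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

With all Q-values still at zero, a heuristic that marks one action as promising decides the greedy choice, which is how a simple heuristic can shorten early exploration in this sketch.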

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Electricity markets are complex environments involving a large number of different entities that act in a dynamic setting to obtain the best advantages and profits. MASCEM is a multi-agent electricity market simulator that models market players and simulates their operation in the market. Market players are entities with specific characteristics and objectives, making their decisions and interacting with other players. MASCEM is integrated with ALBidS, a system that provides several dynamic strategies for agents’ behavior. This paper presents a method that aims to enhance the ability of ALBidS to endow market players with adequate strategic bidding capabilities, allowing them to obtain the highest possible gains from the market. This method uses a reinforcement learning algorithm to learn from experience how to choose the best from a set of possible actions. These actions are defined according to the most probable points of bidding success. To accelerate convergence, a simulated annealing based algorithm is included.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

The very particular characteristics of electricity markets require in-depth study of the interactions between the players involved. MASCEM is a market simulator developed to allow the study of electricity market negotiations. This paper presents a new proposal for defining MASCEM players’ strategies for negotiating in the market. The proposed methodology is implemented as a multiagent system that uses reinforcement learning algorithms to let players perceive changes in the environment and adapt the formulation of their bids according to their needs, using a set of different techniques at their disposal. This paper also presents a methodology for defining players’ models based on the history of their past actions, interpreting how their choices are affected by past experience and competition.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Electricity markets are complex environments with very particular characteristics. A critical issue is the constant change they are subject to. This is a result of the restructuring of electricity markets, which was carried out to increase competitiveness but also greatly increased the complexity and unpredictability of those markets. The constant growth in market unpredictability amplified the need for market participants to foresee market behaviour. The need to understand the market mechanisms, and how the interaction of the involved players affects market outcomes, contributed to the growing use of simulation tools. Multi-agent based software is particularly well suited to analyzing dynamic and adaptive systems with complex interactions among their constituents, such as electricity markets. This dissertation presents ALBidS (Adaptive Learning strategic Bidding System), a multiagent system created to provide decision support to market negotiating players. This system is integrated with the MASCEM electricity market simulator, so that its advantage in supporting a market player can be tested using cases based on real market data. ALBidS considers several methodologies based on very distinct approaches to provide alternative suggestions of the best actions for the supported player to perform. The approach used as the player’s actual action is selected by reinforcement learning algorithms, which decide, for each situation, simulation circumstance and context, which proposed action has the highest chance of success. Some of the considered approaches are supported by a mechanism that creates profiles of competitor players. These profiles are built according to their observed past actions and reactions when faced with specific situations, such as success and failure. The system’s context awareness and its analysis of simulation circumstances, both in terms of result performance and execution time adaptation, are complementary mechanisms that endow ALBidS with further adaptation and learning capabilities.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Electricity markets are complex environments involving a large number of different entities that act in a dynamic setting to obtain the best advantages and profits. MASCEM (Multi-Agent System for Competitive Electricity Markets) is a multi-agent electricity market simulator that models market players and simulates their operation in the market. Market players are entities with specific characteristics and objectives, making their decisions and interacting with other players. This paper presents a methodology to provide decision support to electricity market negotiating players. The methodology allows different strategic approaches for electricity market negotiations to be integrated, and the most appropriate one to be chosen at each time for each negotiation context. It is integrated in ALBidS (Adaptive Learning strategic Bidding System), a multiagent system that provides decision support to MASCEM's negotiating agents so that they can properly achieve their goals. ALBidS uses artificial intelligence methodologies and data analysis algorithms to provide effective adaptive learning capabilities to such negotiating entities. The main contribution is a methodology that combines several distinct strategies to build action proposals, so that the best can be chosen at each time, depending on the context and simulation circumstances. The selection process includes reinforcement learning algorithms, a mechanism for analyzing negotiation contexts, a mechanism for managing the system's balance between efficiency and effectiveness, and a mechanism for defining competitor players' profiles.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Artificial Intelligence has been applied to dynamic games for many years. The ultimate goal is to create responses in virtual entities that display human-like reasoning in the definition of their behaviors. However, virtual entities that can be mistaken for real persons are still far from being fully achieved. This paper presents an adaptive learning based methodology for the definition of players’ profiles, with the purpose of supporting the decisions of virtual entities. The proposed methodology is based on reinforcement learning algorithms, which are responsible for choosing, over time and as experience is gathered, the most appropriate from a set of different learning approaches. These learning approaches have very distinct natures, from mathematical to artificial intelligence and data analysis methodologies, so that the methodology is prepared for very distinct situations. It is thus equipped with a variety of tools, each of which can be useful for a particular situation. The proposed methodology is first tested on two simpler computer versus human player games: the rock-paper-scissors game and a penalty-shootout simulation. Finally, the methodology is applied to the definition of action profiles of electricity market players, who compete in a dynamic game-like environment in which the main goal is to achieve the highest possible profits in the market.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Reinforcement learning (RL) is a very suitable technique for robot learning, as it can learn in unknown environments and with real-time computation. The main difficulties in adapting classic RL algorithms to robotic systems are the generalization problem and the correct observation of the Markovian state. This paper attempts to solve the generalization problem by proposing the semi-online neural Q-learning algorithm (SONQL). The algorithm uses the classic Q-learning technique with two modifications. First, a neural network (NN) approximates the Q-function, allowing the use of continuous states and actions. Second, a database of the most representative learning samples accelerates and stabilizes convergence. The term semi-online refers to the fact that the algorithm uses past learning samples as well as the current one. Nevertheless, the algorithm is able to learn in real time while the robot is interacting with the environment. The paper shows simulated results with the "mountain-car" benchmark and real results with an underwater robot performing a target-following behavior.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Utilizing the well-known Ultimatum Game, this note presents the following phenomenon. If we start with simple stimulus-response agents, learning through naive reinforcement, and then grant them some introspective capabilities, we get outcomes that are not closer but farther away from the fully introspective game-theoretic approach. The cause of this is the following: there is an asymmetry in the information that agents can deduce from their experience, and this leads to a bias in their learning process.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Utilizing the well-known Ultimatum Game, this note presents the following phenomenon. If we start with simple stimulus-response agents, learning through naive reinforcement, and then grant them some introspective capabilities, we get outcomes that are not closer but farther away from the fully introspective game-theoretic approach. The cause of this is the following: there is an asymmetry in the information that agents can deduce from their experience, and this leads to a bias in their learning process.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

When individuals learn by trial-and-error, they perform randomly chosen actions and then reinforce those actions that led to a high payoff. However, individuals do not always have to physically perform an action in order to evaluate its consequences. Rather, they may be able to mentally simulate actions and their consequences without actually performing them. Such fictitious learners can select actions with high payoffs without making long chains of trial-and-error learning. Here, we analyze the evolution of an n-dimensional cultural trait (or artifact) by learning, in a payoff landscape with a single optimum. We derive the stochastic learning dynamics of the distance to the optimum in trait space when choice between alternative artifacts follows the standard logit choice rule. We show that for both trial-and-error and fictitious learners, the learning dynamics stabilize at an approximate distance of $\sqrt{n}/(2\lambda_e)$ away from the optimum, where $\lambda_e$ is an effective learning performance parameter depending on the learning rule under scrutiny. Individual learners are thus unlikely to reach the optimum when traits are complex (n large), and so face a barrier to further improvement of the artifact. We show, however, that this barrier can be significantly reduced in a large population of learners performing payoff-biased social learning, in which case $\lambda_e$ becomes proportional to population size. Overall, our results illustrate the effects of errors in learning, levels of cognition, and population size for the evolution of complex cultural traits. © 2013 Elsevier Inc. All rights reserved.
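The logit choice rule mentioned in this abstract can be sketched in a few lines. This is the generic textbook form, in which each alternative is chosen with probability proportional to the exponential of its payoff times a sensitivity parameter. The function name and the parameter `lam` are illustrative assumptions, and the sketch does not reproduce the paper's model or its effective learning performance parameter.

```python
import math


def logit_choice_probs(payoffs, lam):
    """Return choice probabilities proportional to exp(lam * payoff).

    payoffs: payoff of each alternative.
    lam:     sensitivity to payoff differences (assumed name);
             lam = 0 gives uniform choice, large lam approaches argmax.
    """
    # Subtract the maximum payoff before exponentiating for numerical
    # stability; this leaves the resulting probabilities unchanged.
    top = max(payoffs)
    weights = [math.exp(lam * (p - top)) for p in payoffs]
    total = sum(weights)
    return [w / total for w in weights]
```

A small `lam` makes choices noisy, which is one way to think about errors in learning keeping a learner away from the payoff optimum.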

Relevance:

70.00% 70.00%

Publisher:

Abstract:

In order to understand the development of non-genetically encoded actions during an animal's lifespan, it is necessary to analyze the dynamics and evolution of learning rules producing behavior. Owing to the intrinsic stochastic and frequency-dependent nature of learning dynamics, these rules are often studied in evolutionary biology via agent-based computer simulations. In this paper, we show that stochastic approximation theory can help to qualitatively understand learning dynamics and formulate analytical models for the evolution of learning rules. We consider a population of individuals repeatedly interacting during their lifespan, and where the stage game faced by the individuals fluctuates according to an environmental stochastic process. Individuals adjust their behavioral actions according to learning rules belonging to the class of experience-weighted attraction learning mechanisms, which includes standard reinforcement and Bayesian learning as special cases. We use stochastic approximation theory in order to derive differential equations governing action play probabilities, which turn out to have qualitative features of mutator-selection equations. We then perform agent-based simulations to find the conditions where the deterministic approximation is closest to the original stochastic learning process for standard 2-action 2-player fluctuating games, where interaction between learning rules and preference reversal may occur. Finally, we analyze a simplified model for the evolution of learning in a producer-scrounger game, which shows that the exploration rate can interact in a non-intuitive way with other features of co-evolving learning rules. Overall, our analyses illustrate the usefulness of applying stochastic approximation theory in the study of animal learning.
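For readers unfamiliar with experience-weighted attraction (EWA) learning, the following is a minimal sketch of one update step in the commonly cited Camerer and Ho form. It is an assumption that this form matches the variant used in the paper; the function name and the parameter names `phi`, `rho` and `delta` are illustrative. Each action keeps an attraction that decays over time and is reinforced by the payoff it earned, or would have earned if it was not the action played.

```python
def ewa_update(attractions, n_prev, chosen, payoffs,
               phi=0.9, rho=0.9, delta=0.5):
    """Perform one EWA update and return (new_attractions, new_experience).

    attractions: current attraction of each action.
    n_prev:      experience weight from the previous round.
    chosen:      index of the action actually played.
    payoffs:     payoff each action would have yielded this round.
    phi:         decay of past attractions.
    rho:         decay of the experience weight.
    delta:       weight given to payoffs of actions that were not played.
    """
    n_new = rho * n_prev + 1.0
    updated = []
    for j, (attraction, payoff) in enumerate(zip(attractions, payoffs)):
        # The played action is reinforced fully, unplayed ones by delta.
        weight = 1.0 if j == chosen else delta
        updated.append((phi * n_prev * attraction + weight * payoff) / n_new)
    return updated, n_new
```

Setting `delta=0` reinforces only the action actually played, as in simple reinforcement learning, while `delta=1` also credits unplayed actions with their forgone payoffs, as belief-based rules effectively do.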