984 results for Reinforcement-Learning
Recent modeling of spike-timing-dependent plasticity indicates that plasticity involves as a third factor a local dendritic potential, besides pre- and postsynaptic firing times. We present a simple compartmental neuron model together with a non-Hebbian, biologically plausible learning rule for dendritic synapses where plasticity is modulated by these three factors. In functional terms, the rule seeks to minimize discrepancies between somatic firings and a local dendritic potential. Such prediction errors can arise in our model from stochastic fluctuations as well as from synaptic input, which directly targets the soma. Depending on the nature of this direct input, our plasticity rule subserves supervised or unsupervised learning. When a reward signal modulates the learning rate, reinforcement learning results. Hence a single plasticity rule supports diverse learning paradigms.
Machine and Statistical Learning techniques are used in almost all online advertisement systems. The problem of discovering which content is in greater demand (e.g. receives more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (i.e., bandits with covariates, side information or associative reinforcement learning) associate, to each specific content, several features that define the “context” in which it appears (e.g. user, web page, time, region). This problem can be studied in the stochastic/statistical setting by means of the conditional probability paradigm using Bayes’ theorem. However, for very large contextual information and/or real-time constraints, the exact calculation of Bayes’ rule is computationally infeasible. In this article, we present a method that is able to handle large contextual information for learning in contextual-bandit problems. This method was tested in the Challenge on Yahoo! dataset at ICML2012’s Workshop “new Challenges for Exploration & Exploitation 3”, obtaining second place. Its basic exploration policy is deterministic in the sense that for the same input data (as a time-series) the same results are obtained. We address the deterministic exploration vs. exploitation issue, explaining the way in which the proposed method deterministically finds an effective dynamic trade-off based solely on the input data, in contrast to other methods that use a random number generator.
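To illustrate what a deterministic exploration policy can look like, the sketch below uses the standard UCB1 index. This is not the method of the abstract above, which is not specified here; it is a swapped-in, context-free technique. The names `ucb1_select`, `run_ucb1` and `click_log` are hypothetical. Because no random number generator is involved and ties are broken by arm index, replaying the same log always yields the same sequence of choices.

```python
import math

def ucb1_select(counts, rewards, t):
    """Pick an arm deterministically: untried arms first, then the highest UCB1 index."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        rewards[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    # max() returns the first maximum, so ties go to the lowest arm index.
    return max(range(len(counts)), key=lambda a: scores[a])

def run_ucb1(click_log, n_arms):
    """Replay a fixed log; click_log[t][arm] is the 0/1 reward that arm gives at step t."""
    counts = [0] * n_arms
    rewards = [0.0] * n_arms
    choices = []
    for t, row in enumerate(click_log, start=1):
        arm = ucb1_select(counts, rewards, t)
        counts[arm] += 1
        rewards[arm] += row[arm]
        choices.append(arm)
    return choices
```

The exploration bonus shrinks as an arm is tried more often, so the trade-off between exploration and exploitation is driven only by the observed data.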
The integration of distributed and ubiquitous intelligence has emerged over the last years as the mainspring of transformative advancements in mobile radio networks. As we approach the era of “mobile for intelligence”, next-generation wireless networks are poised to undergo significant and profound changes. Notably, the overarching challenge that lies ahead is the development and implementation of integrated communication and learning mechanisms that will enable the realization of autonomous mobile radio networks. The ultimate pursuit of eliminating human-in-the-loop constitutes an ambitious challenge, necessitating a meticulous delineation of the fundamental characteristics that artificial intelligence (AI) should possess to effectively achieve this objective. This challenge represents a paradigm shift in the design, deployment, and operation of wireless networks, where conventional, static configurations give way to dynamic, adaptive, and AI-native systems capable of self-optimization, self-sustainment, and learning. This thesis aims to provide a comprehensive exploration of the fundamental principles and practical approaches required to create autonomous mobile radio networks that seamlessly integrate communication and learning components. The first chapter of this thesis introduces the notion of Predictive Quality of Service (PQoS) and adaptive optimization and expands upon the challenge to achieve adaptable, reliable, and robust network performance in dynamic and ever-changing environments. The subsequent chapter delves into the revolutionary role of generative AI in shaping next-generation autonomous networks. This chapter emphasizes achieving trustworthy uncertainty-aware generation processes with the use of approximate Bayesian methods and aims to show how generative AI can improve generalization while reducing data communication costs. Finally, the thesis embarks on the topic of distributed learning over wireless networks. 
Distributed learning and its variants, including multi-agent reinforcement learning systems and federated learning, have the potential to meet the scalability demands of modern data-driven applications, enabling efficient and collaborative model training across dynamic scenarios while ensuring data privacy and reducing communication overhead.
In recent times, a significant research effort has been focused on how deformable linear objects (DLOs) can be manipulated for real-world applications such as the assembly of wiring harnesses for the automotive and aerospace sectors. This remains an open topic because of the difficulty of accurately modelling the behaviour of these objects and simulating tasks that involve their manipulation across a variety of scenarios. These problems have led to the development of data-driven approaches in which machine learning techniques are exploited to obtain reliable solutions. However, this approach makes the solution difficult to extend, since the learning must be repeated almost from scratch whenever the scenario changes. It follows that some model-based methodology must be introduced to generalize the results and reduce the training effort accordingly. The objective of this thesis is to develop a solution for DLO manipulation to assemble a wiring harness for the automotive sector, based on the adaptation of a base trajectory set by means of reinforcement learning methods. The idea is to create trajectory planning software capable of solving the proposed task, reducing where possible the learning time, which is done in real time, while still delivering suitable performance and reliability. The solution has been implemented on a collaborative 7-DOF Panda robot at the Laboratory of Automation and Robotics of the University of Bologna. Experimental results are reported showing how the robot is capable of optimizing the manipulation of the DLOs, gaining experience as the task is repeated, while showing a high success rate from the very beginning of the learning phase.
Of all the learning paradigms identified to date, Reinforcement Learning is of particular interest and applicability to the many processes around us: from the solitary probe exploring the most remote planet, through the expert program that learns to support medical decisions from acquired experience, to the toy dog that delights a child by interacting with the child and adapting to his or her tastes, and a whole new world that surrounds us and increasingly calls on us to do more and better in this area. Since the concept of reinforcement learning emerged, different methods have been proposed to realize it, each addressing specific aspects. Two distinct but complementary strands stand out as key characteristics of the reinforcement learning process: gaining experience by exploring the state space, and exploiting the knowledge obtained through that experience. This dissertation sets out to select some of the most promising methods proposed for both exploration and exploitation, implement each of them on a modular platform that allows the use of intelligent agents to be simulated, and, by applying them to different configurations of standard environments, generate functional statistics from which conclusions can be drawn that reflect, among other aspects, their comparative efficiency and effectiveness under specific conditions.
Electricity markets are complex environments with very particular characteristics. MASCEM is a market simulator developed to allow deep studies of the interactions between the players that take part in the electricity market negotiations. This paper presents a new proposal for the definition of MASCEM players’ strategies to negotiate in the market. The proposed methodology is multiagent based, using reinforcement learning algorithms to provide players with the capabilities to perceive the changes in the environment, while adapting their bid formulation according to their needs, using a set of different techniques that are at their disposal.
Electricity markets are complex environments, involving numerous entities trying to obtain the best advantages and profits while limited by power-network characteristics and constraints. The restructuring and consequent deregulation of electricity markets introduced a new economic dimension to the power industry. Some observers have criticized the restructuring process, however, because it has failed to improve market efficiency and has complicated the assurance of reliability and fairness of operations. To study and understand this type of market, we developed the Multiagent Simulator of Competitive Electricity Markets (MASCEM) platform based on multiagent simulation. The MASCEM multiagent model includes players with strategies for bid definition, acting in forward, day-ahead, and balancing markets and considering both simple and complex bids. Our goal with MASCEM was to simulate as many market models and player types as possible. This approach makes MASCEM both a short- and medium-term simulation as well as a tool to support long-term decisions, such as those taken by regulators. This article proposes a new methodology integrated in MASCEM for bid definition in electricity markets. This methodology uses reinforcement learning algorithms to let players perceive changes in the environment, thus helping them react to the dynamic environment and adapt their bids accordingly.
Electricity markets are complex environments with very particular characteristics. MASCEM is a market simulator developed to allow deep studies of the interactions between the players that take part in the electricity market negotiations. This paper presents a new proposal for the definition of MASCEM players’ strategies to negotiate in the market. The proposed methodology is multiagent based, using reinforcement learning algorithms to provide players with the capabilities to perceive the changes in the environment, while adapting their bid formulation according to their needs, using a set of different techniques that are at their disposal. Each agent knows a different method for defining a market strategy; the main agent chooses the best among them and provides it to the market player that requests it, to be used in the market. This paper also presents a methodology to manage the efficiency/effectiveness balance of this method, to guarantee that the degradation of the simulator processing times remains proportionate.
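The idea of a main agent that picks the best of several strategy agents can be sketched as follows. This is only an illustration under stated assumptions, not the MASCEM implementation: the class `StrategySelector`, the reward definition (negative absolute error between a proposed price and the realized market price) and the learning rate are all hypothetical.

```python
class StrategySelector:
    """Hypothetical main agent: keeps one learned value per strategy agent."""

    def __init__(self, strategy_names, learning_rate=0.3):
        self.alpha = learning_rate
        self.value = {name: 0.0 for name in strategy_names}

    def best(self):
        """Return the strategy with the highest learned value."""
        return max(self.value, key=self.value.get)

    def update(self, proposals, market_price):
        """Reinforce each strategy by how close its proposed price was to the market price."""
        for name, proposed in proposals.items():
            reward = -abs(proposed - market_price)  # assumed reward: smaller error is better
            self.value[name] += self.alpha * (reward - self.value[name])
```

After each market session the selector would be updated with every strategy's proposal, and the next request from a player would be answered with `best()`.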
Final Master's project submitted for the degree of Master in Communication Networks and Multimedia Engineering
Optimization in current decision support systems is strongly interdisciplinary, tied to the need to integrate different techniques and paradigms to solve complex real-world problems, for many of which computing optimal solutions is intractable. Heuristic search methods are known to yield good results within an acceptable time frame. They often require their parameters to be tuned in order to obtain good results. In this sense, learning strategies can improve a system's performance by giving it the ability to learn, for example, which optimization technique is most suitable for solving a particular class of problems, or which parameterization of a given algorithm is most suitable in a given scenario. Some of the optimization methods most widely used for real-world problems resulted from adapting ideas from several research areas, mainly inspired by nature (Meta-heuristics). Selecting a Meta-heuristic to solve a given problem is itself an optimization problem. Hyper-heuristics arise in this context as efficient methodologies for selecting or generating heuristics (or Meta-heuristics) to solve NP-hard optimization problems. This dissertation aims to contribute to the problem of selecting Meta-heuristics and their parameterization. To that end, it describes the specification of a Hyper-heuristic for selecting nature-inspired techniques to solve the task scheduling problem in manufacturing systems, based on previous experience. The Hyper-heuristic module developed uses a reinforcement learning algorithm (Q-Learning), which gives the system the ability to automatically select the Meta-heuristic to use in the optimization process, as well as its parameterization.
Finally, computational tests are carried out to evaluate the influence of the Hyper-heuristic on the performance of the AutoDynAgents scheduling system. As a general conclusion, the results obtained show a significant advantage in system performance when the Q-Learning-based Hyper-heuristic is introduced.
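A minimal sketch of how tabular Q-Learning could drive the choice of a Meta-heuristic is given below. It is not the AutoDynAgents module. The action names, the state labels and the reward are hypothetical, and only the textbook Q-Learning update and an epsilon-greedy choice are shown.

```python
import random
from collections import defaultdict

# Hypothetical action set: one entry per candidate Meta-heuristic.
ACTIONS = ["genetic_algorithm", "ant_colony", "particle_swarm"]

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else take the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

# The Q-table maps (state, action) pairs to values and starts at zero.
q_table = defaultdict(float)
```

In a scheduling setting, the state might describe the class of problem instance and the reward might reflect the quality of the schedule obtained; both are assumptions made for this illustration.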
The teaching-learning process in the natural sciences is challenged to adapt to students' current needs and lifestyle. For teachers, this challenge means innovating pedagogical strategies oriented toward conceptual change. Schools may be equipped with various technological resources, but their integration into the teaching-learning process, as innovative practices that promote effective use by students, is still limited. It falls to the 21st-century teacher to increase the use of Innovation and Communication Technologies in their teaching practice, giving the students in our classrooms, who were born in the digital era, greater motivation and enhancing their learning. The content “The Nervous System”, covered only in the 9th grade and included in the organizing theme “Living Better on Earth”, involves complex scientific terms, and addressing it in an educational context is always a great challenge for teachers. In this context, the following research problem was stated: will a podcast used as a learning aid for the content segment “The Nervous System” lead to an improvement in the school results of 9th-grade students? This study used a mixed methodology involving 19 students, allowing the use of several methods and different ways of obtaining and analyzing data. It is a case study, and the results show the impact of using an educational podcast as a learning aid for one content item, “The Nervous System”. Teaching and learning are no longer confined to the classroom and can take place anywhere and at any time, according to the students' choice and respecting each one's learning pace.
Listening to the podcast proved to be a tool that facilitates autonomous work, since it serves as pedagogical reinforcement and/or a differentiated teaching strategy, functioning as a clarifying aid through which students resolve their doubts, which they would not be able to do on their own. It also makes it possible to recreate the classroom environment when explaining the content.
Performance monitoring, ERN, CRN, Pe, Memory, List learning, Emotion, IAPS, N2, Reinforcement Learning Hypothesis, Conflict Monitoring Hypothesis
Agent-based computational economics is becoming widely used in practice. This paper explores the consistency of some of its standard techniques. We focus in particular on prevailing wholesale electricity trading simulation methods. We include different supply and demand representations and propose the Experience-Weighted Attractions method to include several behavioural algorithms. We compare the results across assumptions and to economic theory predictions. The match is good under best-response and reinforcement learning but not under fictitious play. The simulations perform well under flat and upward-sloping supply bidding, and also for plausible demand elasticity assumptions. Learning is influenced by the number of bids per plant and the initial conditions. The overall conclusion is that agent-based simulation assumptions are far from innocuous. We link their performance to underlying features, and identify those that are better suited to model wholesale electricity markets.
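For readers unfamiliar with experience-weighted attraction learning, the sketch below shows the commonly cited update of Camerer and Ho and a logit choice rule. It is a generic illustration, not the paper's simulation code. The function names and default parameter values are assumptions. With `delta = 0` only the chosen strategy is reinforced (reinforcement-style learning); with `delta = 1` every strategy is updated by the payoff it would have earned (belief-style learning, as in fictitious play).

```python
import math

def ewa_update(attractions, n_prev, chosen, payoffs, phi=0.9, delta=0.5, rho=0.9):
    """One EWA step; payoffs[j] is the payoff strategy j would have earned this round."""
    n_new = rho * n_prev + 1.0  # experience weight
    updated = []
    for j, (a_prev, pay) in enumerate(zip(attractions, payoffs)):
        # The chosen strategy gets full weight; foregone payoffs are weighted by delta.
        weight = 1.0 if j == chosen else delta
        updated.append((phi * n_prev * a_prev + weight * pay) / n_new)
    return updated, n_new

def choice_probabilities(attractions, lam=1.0):
    """Logit (softmax) response with sensitivity lam, shifted for numerical stability."""
    m = max(lam * a for a in attractions)
    exps = [math.exp(lam * a - m) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]
```

Sweeping `delta` between 0 and 1 is one way a single simulation can cover several behavioural assumptions.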
Learning what to approach, and what to avoid, involves assigning value to environmental cues that predict positive and negative events. Studies in animals indicate that the lateral habenula encodes the previously learned negative motivational value of stimuli. However, involvement of the habenula in dynamic trial-by-trial aversive learning has not been assessed, and the functional role of this structure in humans remains poorly characterized, in part, due to its small size. Using high-resolution functional neuroimaging and computational modeling of reinforcement learning, we demonstrate positive habenula responses to the dynamically changing values of cues signaling painful electric shocks, which predict behavioral suppression of responses to those cues across individuals. By contrast, negative habenula responses to monetary reward cue values predict behavioral invigoration. Our findings show that the habenula plays a key role in an online aversive learning system and in generating associated motivated behavior in humans.
An assortment of human behaviors is thought to be driven by rewards including reinforcement learning, novelty processing, learning, decision making, economic choice, incentive motivation, and addiction. In each case the ventral tegmental area/ventral striatum (nucleus accumbens) (VTAVS) system has been implicated as a key structure by functional imaging studies, mostly on the basis of standard, univariate analyses. Here we propose that standard functional magnetic resonance imaging analysis needs to be complemented by methods that take into account the differential connectivity of the VTAVS system in the different behavioral contexts in order to describe reward-based processes more appropriately. We first consider the wider network for reward processing as it emerged from animal experimentation. Subsequently, an example of a method to assess functional connectivity is given. Finally, we illustrate the usefulness of such analyses by examples regarding reward valuation, reward expectation and the role of reward in addiction.