984 results for Reinforcement-Learning


Relevance:

60.00%

Publisher:

Abstract:

Previous work has shown that robot navigation systems that employ an architecture based upon the idiotypic network theory of the immune system have an advantage over control techniques that rely on reinforcement learning only. This is thought to be a result of intelligent behaviour selection on the part of the idiotypic robot. In this paper an attempt is made to imitate idiotypic dynamics by creating controllers that use reinforcement with a number of different probabilistic schemes to select robot behaviour. The aims are to show that the idiotypic system is not merely performing some kind of periodic random behaviour selection, and to try to gain further insight into the processes that govern the idiotypic mechanism. Trials are carried out using simulated Pioneer robots that undertake navigation exercises. Results show that a scheme that boosts the probability of selecting highly-ranked alternative behaviours to 50% during stall conditions comes closest to achieving the properties of the idiotypic system, but remains unable to match it in terms of all-round performance.
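
The stall-boosting scheme described above can be illustrated with a minimal sketch. This is not the paper's controller: the function name `select_behaviour`, the score list, and the choice of two "highly-ranked alternatives" are assumptions made only for illustration. Normally the top-scoring behaviour is selected; while the robot is stalled, one of the next-best behaviours is picked with probability 0.5.

```python
import random

def select_behaviour(scores, stalled, rng, boost=0.5, n_alternatives=2):
    """Return the index of the behaviour to run (higher score = better).

    Hypothetical sketch: when stalled, a highly-ranked alternative is
    chosen with probability `boost`; otherwise the top behaviour is used.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    alternatives = ranked[1:1 + n_alternatives]
    if stalled and alternatives and rng.random() < boost:
        return rng.choice(alternatives)
    return best

rng = random.Random(0)
scores = [0.2, 0.9, 0.5, 0.7]  # hypothetical reinforcement scores per behaviour
picks = [select_behaviour(scores, True, rng) for _ in range(10_000)]
# Fraction of stalled steps on which an alternative replaced the top behaviour
alt_fraction = sum(p != 1 for p in picks) / len(picks)
```

With `stalled=False` the function always returns the top-ranked behaviour, so the boost only changes behaviour selection during stall conditions.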

Relevance:

60.00%

Publisher:

Abstract:

The objective of reservoir engineering is to manage oil production fields so as to maximize hydrocarbon production subject to economic and physical restrictions. Deciding on a production strategy is a complex activity that involves several variables. A smart system that assists in optimizing the options for developing the field is therefore very useful in the day-to-day work of reservoir engineers. This paper proposes the development of an intelligent decision-support system for optimizing production strategies in oil fields. The intelligence of this system will be implemented with reinforcement learning, a powerful tool for multi-stage decision problems. The proposed system will allow the specialist to obtain, in a timely manner, an optimal (or near-optimal) alternative for the development of an oil field known

Relevance:

60.00%

Publisher:

Abstract:

Combinatorial optimization problems are typically tackled by the branch-and-bound paradigm. We propose to learn a variable selection policy for branch-and-bound in mixed-integer linear programming, by imitation learning on a diversified variant of the strong branching expert rule. We encode states as bipartite graphs and parameterize the policy as a graph convolutional neural network. Experiments on a series of synthetic problems demonstrate that our approach produces policies that can improve upon expert-designed branching rules on large problems, and generalize to instances significantly larger than seen during training.
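
To make the state encoding concrete, here is a minimal sketch of turning a small mixed-integer linear program into a bipartite graph. It assumes the problem is given as `min c^T x` subject to `A x <= b`; the function name `milp_to_bipartite` and the node feature names are hypothetical, and the features used in the paper are richer than the ones shown here.

```python
def milp_to_bipartite(A, b, c):
    """Encode min c^T x s.t. A x <= b as a bipartite graph.

    One node per constraint, one node per variable, and one edge
    (constraint i, variable j, coefficient) for every nonzero A[i][j].
    """
    constraint_nodes = [{"rhs": rhs} for rhs in b]
    variable_nodes = [{"obj": coef} for coef in c]
    edges = [
        (i, j, a_ij)
        for i, row in enumerate(A)
        for j, a_ij in enumerate(row)
        if a_ij != 0
    ]
    return constraint_nodes, variable_nodes, edges

# Hypothetical toy instance: 2 constraints, 3 variables
A = [[1, 2, 0],
     [0, 1, 3]]
b = [4, 5]
c = [1, 1, 2]
cons, vars_, edges = milp_to_bipartite(A, b, c)
```

A graph convolutional policy would then pass messages along these edges and output a score per variable node, from which the branching variable is chosen.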

Relevance:

60.00%

Publisher:

Abstract:

Nowadays, application domains such as smart cities, agriculture, and intelligent transportation require communication technologies that combine long transmission range with energy efficiency in order to meet their capabilities and constraints. In addition, interest in Unmanned Aerial Vehicles (UAVs) providing wireless connectivity in such scenarios has increased substantially in recent years thanks to their flexible deployment. The first chapters of this thesis deal with LoRaWAN and Narrowband-IoT (NB-IoT), which recent trends identify as the most promising Low Power Wide Area Network technologies. While LoRaWAN is an open protocol that has gained a lot of interest thanks to its simplicity and energy efficiency, NB-IoT has been introduced by 3GPP as a radio access technology for massive machine-type communications, inheriting legacy LTE characteristics. This thesis offers an overview of the two, comparing them in terms of selected performance indicators. In particular, LoRaWAN technology is assessed both via simulations and experiments, considering different network architectures and solutions to improve its performance (e.g., a new Adaptive Data Rate algorithm). NB-IoT is then introduced to identify which technology is more suitable depending on the application considered. The second part of the thesis introduces the use of UAVs as flying base stations, denoted as Unmanned Aerial Base Stations (UABSs), which are considered one of the key pillars of 6G to offer service for a number of applications. To this end, the performance of an NB-IoT network is assessed considering a UABS following predefined trajectories. Then, machine learning algorithms based on reinforcement learning and meta-learning are considered to optimize the trajectory as well as the radio resource management techniques the UABS may rely on in order to provide service to both static (IoT sensors) and dynamic (vehicles) users. Finally, some experimental projects based on the technologies mentioned so far are presented.

Relevance:

60.00%

Publisher:

Abstract:

Let’s put ourselves in the shoes of an energy company. Our fleet of electricity production plants mainly includes gas, hydroelectric and waste-to-energy plants. We have also sold contracts for the supply of gas and electricity. For each year we have to plan the trading of the volumes needed by the plants and customers: it is better to fix the price of these volumes in advance with so-called forward contracts than to wait for the delivery months and expose ourselves to price uncertainty. The challenge is to keep uncertainty under control in a market that has never shown such extreme scenarios as in recent years: a pandemic, a worsening climate crisis and a war that is affecting economies around the world have made the energy market more volatile than ever. How can decisions be made in such uncertain contexts? There is an optimization problem: given a year, we need to choose the optimal schedule for trading volumes, to meet the needs of our portfolio at the best prices, taking into account the liquidity constraints given by the market and the risk constraints imposed by the company. Algorithms are needed to generate market scenarios over a finite time horizon, that is, a probability distribution that gives a view of all the dates between now and the end of the year of interest. Algorithms are also needed to solve the optimization problem: we have proposed more than one and compared them, starting with a very simple one that ignores part of the complexity, moving on to a scenario-based approach, and finally a reinforcement learning approach.

Relevance:

60.00%

Publisher:

Abstract:

Today we live in an age where the internet and artificial intelligence allow us to search for information through impressive amounts of data, opening up revolutionary new ways to make sense of reality and understand our world. However, there is still room for improvement in exploiting the full potential of large amounts of explainable information by distilling it automatically into an intuitive and user-centred explanation. For instance, different people (or artificial agents) may search for and request different types of information in a different order, so it is unlikely that a short explanation can suffice for all needs in the most generic case. Moreover, dumping a large portion of explainable information in a one-size-fits-all representation may also be sub-optimal, as the needed information may be scarce and dispersed across hundreds of pages. The aim of this work is to investigate how to automatically generate (user-centred) explanations from heterogeneous and large collections of data, with a focus on the concept of explanation in a broad sense, as a critical artefact for intelligence, regardless of whether it is human or robotic. Our approach builds on and extends Achinstein’s philosophical theory of explanations, where explaining is an illocutionary (i.e., broad but relevant) act of usefully answering questions. Specifically, we provide the theoretical foundations of Explanatory Artificial Intelligence (YAI), formally defining a user-centred explanatory tool and the space of all possible explanations, or explanatory space, generated by it. We present empirical results in support of our theory, showcasing the implementation of YAI tools and strategies for assessing explainability. To justify and evaluate the proposed theories and models, we considered case studies at the intersection of artificial intelligence and law, particularly European legislation. Our tools helped produce better explanations of software documentation and legal texts for humans and of complex regulations for reinforcement learning agents.

Relevance:

60.00%

Publisher:

Abstract:

The basal ganglia (BG) are a group of subcortical nuclei located at the base of the telencephalon and in the upper part of the mesencephalon. The function of the BG is to control and regulate the activity of the motor and premotor cortical areas so that movements can be executed smoothly, but they are also involved in numerous other motor and cognitive processes. The aim of this thesis is to simulate the behaviour of the BG with a new neurocomputational model. The model was evaluated under various conditions, baseline and altered, to illustrate standard cases (healthy subjects) and pathological cases (dopamine depletion in Parkinson's patients, or dopamine overmedication through levodopa) during a probabilistic reversal learning (RL) task. The tonic dopamine parameters and the noise applied to the subjects and to the data were varied to simulate the model during the “one-choice task” found in the literature. The results indicate that the model behaves in a way fully comparable with results in the literature, demonstrating its validity and a correct use of the parameters and of the learning rule. The same could not be said for the second phase of the RL task, in which the rule is reversed: the subjects do not learn in a manner consistent with the data in the literature, and instead appear reluctant to identify the new winning stimulus. This result is probably due to several factors: the number of epochs used for the test is small, leaving the subjects little time to learn the new rule; and the learning rule used in the reversal phase may not be the best choice for making the subjects more exploratory towards the proposed choices. These limitations are starting points for future development of the model and its operation, using different learning rules that are more effective for the different contexts of action.
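
For readers unfamiliar with the task, a probabilistic reversal learning experiment can be sketched with a simple delta-rule learner and softmax choice. This is explicitly not the thesis's neurocomputational basal ganglia model: the function names, the reward probability `p_win`, the learning rate `alpha`, and the inverse temperature `beta` are all assumptions chosen for illustration.

```python
import math
import random

def delta_update(q, reward, alpha):
    """Move the value estimate q a fraction alpha towards the reward."""
    return q + alpha * (reward - q)

def softmax_choice(q, beta, rng):
    """Choose stimulus 0 or 1 with a two-option softmax over values q."""
    p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
    return 0 if rng.random() < p0 else 1

def run_reversal(n_trials=200, reversal_at=100, p_win=0.8,
                 alpha=0.1, beta=3.0, seed=0):
    """Simulate one subject; reward contingencies swap at `reversal_at`."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    history = []
    for t in range(n_trials):
        # Stimulus 0 is the winning one before the reversal, stimulus 1 after
        probs = (p_win, 1 - p_win) if t < reversal_at else (1 - p_win, p_win)
        choice = softmax_choice(q, beta, rng)
        reward = 1.0 if rng.random() < probs[choice] else 0.0
        q[choice] = delta_update(q[choice], reward, alpha)
        history.append((choice, reward, tuple(q)))
    return history

history = run_reversal()
```

In a sketch like this, a low `beta` makes the simulated subject more exploratory, which is one way to probe the reluctance to switch after the reversal that the abstract reports.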

Relevance:

60.00%

Publisher:

Abstract:

In the metal industry, and more specifically in forging, scrap material is a crucial issue, and reducing it is an important goal. Not only would this help companies become more environmentally friendly and sustainable, but it would also reduce energy use and lower costs. At the same time, Industry 4.0 techniques and advances in Artificial Intelligence (AI), especially in the field of Deep Reinforcement Learning (DRL), may play an important role in achieving this objective. This document presents the thesis work, a contribution to the SmartForge project, that was performed during a semester abroad at Karlstad University (Sweden). The project aims to solve the aforementioned problem through a business case at the company Bharat Forge Kilsta, located in Karlskoga (Sweden). The thesis work includes the design and subsequent development of an event-driven architecture with microservices, to support the processing of data coming from sensors installed in the company's industrial plant, and finally the implementation of an algorithm based on DRL techniques to control the electrical power used in the plant.

Relevance:

60.00%

Publisher:

Abstract:

Nowadays, recommender systems (RS) play a key role in managing information overload, particularly in areas such as e-commerce, music and cinema. However, despite their well-intentioned goal, in recent years there has been a growing awareness of their involvement in creating unwanted effects on society, such as popularity bias or filter bubbles. This thesis is an attempt to investigate the role of RS and their stakeholders in creating such effects. A simulation study will be performed using EcoAgent, an RL-based multi-stakeholder recommendation system, in a simulation environment that captures the key interactions among users, suppliers and the recommender system, in order to identify possible unhealthy scenarios for stakeholders. In particular, we focus on analyzing the document catalog to see how the diversity of topics that users have access to varies during interactions. Finally, two post-processing methods will be defined on EcoAgent, one reactive and one proactive, which allow us to manipulate the agent’s behavior in order to study whether and how the topic distribution of documents is affected by content providers and by the fairness of the system.

Relevance:

40.00%

Publisher:

Abstract:

The inferior colliculus is a primary relay for the processing of auditory information in the brainstem. The inferior colliculus is also part of the so-called brain aversion system, as animals learn to switch off the electrical stimulation of this structure. The purpose of the present study was to determine whether associative learning occurs between aversion induced by electrical stimulation of the inferior colliculus and visual and auditory warning stimuli. Rats implanted with electrodes into the central nucleus of the inferior colliculus were placed inside an open field, and thresholds for the escape response to electrical stimulation of the inferior colliculus were determined. The rats were then placed inside a shuttle-box and submitted to a two-way avoidance paradigm. Electrical stimulation of the inferior colliculus at the escape threshold (98.12 ± 6.15 µA, peak-to-peak) was used as negative reinforcement and light or tone as the warning stimulus. Each session consisted of 50 trials and was divided into two segments of 25 trials in order to determine the learning rate of the animals during the sessions. The rats learned to avoid the inferior colliculus stimulation when light was used as the warning stimulus (13.25 ± 0.60 s and 8.63 ± 0.93 s for latencies and 12.5 ± 2.04 and 19.62 ± 1.65 for frequencies in the first and second halves of the sessions, respectively, P<0.01 in both cases). No significant changes in latencies (14.75 ± 1.63 and 12.75 ± 1.44 s) or frequencies of responses (8.75 ± 1.20 and 11.25 ± 1.13) were seen when tone was used as the warning stimulus (P>0.05 in both cases). Taken together, the present results suggest that rats learn to avoid the inferior colliculus stimulation when light is used as the warning stimulus. However, this learning process does not occur when the neutral stimulus used is an acoustic one. Electrical stimulation of the inferior colliculus may disturb the signal transmission of the stimulus to be conditioned from the inferior colliculus to higher brain structures such as the amygdala.

Relevance:

40.00%

Publisher:

Abstract:

Perirhinal cortex in monkeys has been thought to be involved in visual associative learning. The authors examined rats' ability to make associations between visual stimuli in a visual secondary reinforcement task. Rats learned 2-choice visual discriminations for secondary visual reinforcement. They showed significant learning of discriminations before any primary reinforcement. Following bilateral perirhinal cortex lesions, rats continued to learn visual discriminations for visual secondary reinforcement at the same rate as before surgery. Thus, this study does not support a critical role of perirhinal cortex in learning for visual secondary reinforcement. Contrasting this result with other positive results, the authors suggest that the role of perirhinal cortex is in "within-object" associations and that it plays a much lesser role in stimulus-stimulus associations between objects.

Relevance:

30.00%

Publisher:

Abstract:

Ecologically and evolutionarily oriented research on learning has traditionally been carried out on vertebrates and bees. While less sophisticated than those animals, fruit flies (Drosophila) are capable of several forms of learning and have the advantage of a short generation time, which makes them an ideal system for experimental evolution studies. This review summarizes the insights into evolutionary questions about learning gained in the last decade from evolutionary experiments on Drosophila. These experiments demonstrate that Drosophila have the genetic potential to evolve substantially improved learning performance in ecologically relevant learning tasks. In at least one set of selected populations, the improved learning generalized to a task other than the one used to impose selection, involving a different behavior, different stimuli, and a different sensory channel for the aversive reinforcement. This improvement in learning ability was associated with a reduction in other fitness-related traits, such as larval competitive ability and lifespan, pointing to evolutionary trade-offs of improved learning. These trade-offs were confirmed by other evolutionary experiments in which a reduction in learning performance was observed as a correlated response to selection for tolerance to larval nutritional stress or for delayed aging. Such trade-offs could be one reason why fruit flies have not fully used up their evolutionary potential for learning ability. Finally, another evolutionary experiment with Drosophila provided the first direct evidence for the long-standing idea that learning can under some circumstances accelerate, and in others slow down, genetically based evolutionary change. These results demonstrate the usefulness of fruit flies as a model system to address evolutionary questions about learning.

Relevance:

30.00%

Publisher:

Abstract:

Background: One characteristic of post-traumatic stress disorder (PTSD) is an inability to adapt to a safe environment, i.e. to change behavior when predictions of adverse outcomes are not met. Recent studies have also indicated that PTSD patients have altered pain processing, with hyperactivation of the putamen and insula to aversive stimuli (Geuze et al, 2007). The present study examined neuronal responses to aversive and predicted aversive events. Methods: Twenty-four trauma-exposed non-PTSD controls and nineteen subjects with PTSD underwent fMRI during a partial reinforcement fear conditioning paradigm, with a mild electric shock as the unconditioned stimulus (UCS). Three conditions were analyzed: actual presentations of the UCS, events when a UCS was expected but omitted (CS+), and events when the UCS was neither expected nor delivered (CS-). Results: The UCS evoked significant alterations in the pain matrix consisting of the brainstem, the midbrain, the thalamus, the insula, the anterior and middle cingulate and the contralateral somatosensory cortex. PTSD subjects displayed bilaterally elevated putamen activity to the electric shock, as compared to controls. In trials when the UCS was expected but omitted, significant activations were observed in the brainstem, the midbrain, the anterior insula and the anterior cingulate. PTSD subjects displayed similar activations, but also elevated activations in the amygdala and the posterior insula. Conclusions: These results indicate altered fear and safety learning in PTSD, and neuronal activations are further explored in terms of functional connectivity using psychophysiological interaction analyses.

Relevance:

30.00%

Publisher:

Abstract:

The EVS4CSCL project starts in the context of a Computer Supported Collaborative Learning (CSCL) environment. Previous UOC projects created a generic CSCL platform (CLPL) to facilitate the development of CSCL applications. A discussion forum (DF) was the first application developed on this framework. This discussion forum differed from other products on the market because of its focus on the learning process. The DF covered the specification and elaboration phases of the discussion learning process, but it lacked support for the consensus phase. The consensus phase in a learning environment is not something to be achieved but something to be tested. Such tests are commonly carried out with Electronic Voting System (EVS) tools, but a consensus test is not an assessment test: we do not evaluate our students by their answers but by their discussion activity. Our educational EVS would be used as a discussion catalyst, proposing a discussion about the results after an initial query, or it would be used after a discussion period to show how the discussion changed the students' minds (consensus). It should also be used by the teacher as a quick way to find out where a student needs some reinforcement. That is important in a distance-learning environment, where there is no direct contact between teacher and student and it is difficult to detect learning gaps. In an educational environment, assessment is a must, and the EVS will provide direct assessment through peer usefulness evaluation and teacher marks on every query created, and indirect assessment from statistics on user activity.

Relevance:

30.00%

Publisher:

Abstract:

The influence of proximal olfactory cues on place learning and memory was tested in two different spatial tasks. Rats were trained to find a hole leading to their home cage or a single food source in an array of petri dishes. The two apparatuses differed both by the type of reinforcement (return to the home cage or food reward) and the local characteristics of the goal (masked holes or salient dishes). In both cases, the goal was in a fixed location relative to distant visual landmarks and could be marked by a local olfactory cue. Thus, the position of the goal was defined by two sets of redundant cues, each of which was sufficient to allow the discrimination of the goal location. These experiments were conducted with two strains of hooded rats (Long-Evans and PVG), which show different speeds of acquisition in place learning tasks. They revealed that the presence of an olfactory cue marking the goal facilitated learning of its location and that the facilitation persisted after the removal of the cue. Thus, the proximal olfactory cue appeared to potentiate learning and memory of the goal location relative to distant environmental cues. This facilitating effect was only detected when the expression of spatial memory was not already optimal, i.e., during the early phase of acquisition. It was not limited to a particular strain.