882 resultados para reinforcement learning,cryptography,machine learning,deep learning,Deep Q-Learning (DQN),AES


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Adaptive critic methods have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, nonlinear and nonstationary environments. In this study, a novel probabilistic dual heuristic programming (DHP) based adaptive critic controller is proposed. Distinct to current approaches, the proposed probabilistic (DHP) adaptive critic method takes uncertainties of forward model and inverse controller into consideration. Therefore, it is suitable for deterministic and stochastic control problems characterized by functional uncertainty. Theoretical development of the proposed method is validated by analytically evaluating the correct value of the cost function which satisfies the Bellman equation in a linear quadratic control problem. The target value of the critic network is then calculated and shown to be equal to the analytically derived correct value.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

When visual sensor networks are composed of cameras which can adjust the zoom factor of their own lens, one must determine the optimal zoom levels for the cameras, for a given task. This gives rise to an important trade-off between the overlap of the different cameras’ fields of view, providing redundancy, and image quality. In an object tracking task, having multiple cameras observe the same area allows for quicker recovery, when a camera fails. In contrast having narrow zooms allow for a higher pixel count on regions of interest, leading to increased tracking confidence. In this paper we propose an approach for the self-organisation of redundancy in a distributed visual sensor network, based on decentralised multi-objective online learning using only local information to approximate the global state. We explore the impact of different zoom levels on these trade-offs, when tasking omnidirectional cameras, having perfect 360-degree view, with keeping track of a varying number of moving objects. We further show how employing decentralised reinforcement learning enables zoom configurations to be achieved dynamically at runtime according to an operator’s preference for maximising either the proportion of objects tracked, confidence associated with tracking, or redundancy in expectation of camera failure. We show that explicitly taking account of the level of overlap, even based only on local knowledge, improves resilience when cameras fail. Our results illustrate the trade-off between maintaining high confidence and object coverage, and maintaining redundancy, in anticipation of future failure. Our approach provides a fully tunable decentralised method for the self-organisation of redundancy in a changing environment, according to an operator’s preferences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Beamforming is a technique widely used in various fields. With the aid of an antenna array, the beamforming aims to minimize the contribution of unknown interferents directions, while capturing the desired signal in a given direction. In this thesis are proposed beamforming techniques using Reinforcement Learning (RL) through the Q-Learning algorithm in antennas array. One proposal is to use RL to find the optimal policy selection between the beamforming (BF) and power control (PC) in order to better leverage the individual characteristics of each of them for a certain amount of Signal to Interference plus noise Ration (SINR). Another proposal is to use RL to determine the optimal policy between blind beamforming algorithm of CMA (Constant Modulus Algorithm) and DD (Decision Direct) in multipath environments. Results from simulations showed that the RL technique could be effective in achieving na optimal of switching between different techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Free energy calculations are a computational method for determining thermodynamic quantities, such as free energies of binding, via simulation.

Currently, due to computational and algorithmic limitations, free energy calculations are limited in scope.

In this work, we propose two methods for improving the efficiency of free energy calculations.

First, we expand the state space of alchemical intermediates, and show that this expansion enables us to calculate free energies along lower variance paths.

We use Q-learning, a reinforcement learning technique, to discover and optimize paths at low computational cost.

Second, we reduce the cost of sampling along a given path by using sequential Monte Carlo samplers.

We develop a new free energy estimator, pCrooks (pairwise Crooks), a variant on the Crooks fluctuation theorem (CFT), which enables decomposition of the variance of the free energy estimate for discrete paths, while retaining beneficial characteristics of CFT.

Combining these two advancements, we show that for some test models, optimal expanded-space paths have a nearly 80% reduction in variance relative to the standard path.

Additionally, our free energy estimator converges at a more consistent rate and on average 1.8 times faster when we enable path searching, even when the cost of path discovery and refinement is considered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A Bayesian optimisation algorithm for a nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurse's assignment. When a human scheduler works, he normally builds a schedule systematically following a set of rules. After much practice, the scheduler gradually masters the knowledge of which solution parts go well with others. He can identify good parts and is aware of the solution quality even if the scheduling process is not yet completed, thus having the ability to finish a schedule by using flexible, rather than fixed, rules. In this paper, we design a more human-like scheduling algorithm, by using a Bayesian optimisation algorithm to implement explicit learning from past solutions. A nurse scheduling problem from a UK hospital is used for testing. Unlike our previous work that used Genetic Algorithms to implement implicit learning [1], the learning in the proposed algorithm is explicit, i.e. we identify and mix building blocks directly. The Bayesian optimisation algorithm is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated by using the corresponding conditional probabilities, until all variables have been generated, i.e. in our case, new rule strings have been obtained. Sets of rule strings are generated in this way, some of which will replace previous strings based on fitness. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again using the current set of promising rule strings. For clarity, consider the following toy example of scheduling five nurses with two rules (1: random allocation, 2: allocate nurse to low-cost shifts). In the beginning of the search, the probabilities of choosing rule 1 or 2 for each nurse is equal, i.e. 50%. After a few iterations, due to the selection pressure and reinforcement learning, we experience two solution pathways: Because pure low-cost or random allocation produces low quality solutions, either rule 1 is used for the first 2-3 nurses and rule 2 on remainder or vice versa. In essence, Bayesian network learns 'use rule 2 after 2-3x using rule 1' or vice versa. It should be noted that for our and most other scheduling problems, the structure of the network model is known and all variables are fully observed. In this case, the goal of learning is to find the rule values that maximize the likelihood of the training data. Thus, learning can amount to 'counting' in the case of multinomial distributions. For our problem, we use our rules: Random, Cheapest Cost, Best Cover and Balance of Cost and Cover. In more detail, the steps of our Bayesian optimisation algorithm for nurse scheduling are: 1. Set t = 0, and generate an initial population P(0) at random; 2. Use roulette-wheel selection to choose a set of promising rule strings S(t) from P(t); 3. Compute conditional probabilities of each node according to this set of promising solutions; 4. Assign each nurse using roulette-wheel selection based on the rules' conditional probabilities. A set of new rule strings O(t) will be generated in this way; 5. Create a new population P(t+1) by replacing some rule strings from P(t) with O(t), and set t = t+1; 6. If the termination conditions are not met (we use 2000 generations), go to step 2. Computational results from 52 real data instances demonstrate the success of this approach. They also suggest that the learning mechanism in the proposed approach might be suitable for other scheduling problems. Another direction for further research is to see if there is a good constructing sequence for individual data instances, given a fixed nurse scheduling order. If so, the good patterns could be recognized and then extracted as new domain knowledge. Thus, by using this extracted knowledge, we can assign specific rules to the corresponding nurses beforehand, and only schedule the remaining nurses with all available rules, making it possible to reduce the solution space. Acknowledgements The work was funded by the UK Government's major funding agency, Engineering and Physical Sciences Research Council (EPSRC), under grand GR/R92899/01. References [1] Aickelin U, "An Indirect Genetic Algorithm for Set Covering Problems", Journal of the Operational Research Society, 53(10): 1118-1126,

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Jerne's idiotypic network theory postulates that the immune response involves inter-antibody stimulation and suppression as well as matching to antigens. The theory has proved the most popular Artificial Immune System (AIS) model for incorporation into behavior-based robotics but guidelines for implementing idiotypic selection are scarce. Furthermore, the direct effects of employing the technique have not been demonstrated in the form of a comparison with non-idiotypic systems. This paper aims to address these issues. A method for integrating an idiotypic AIS network with a Reinforcement Learning based control system (RL) is described and the mechanisms underlying antibody stimulation and suppression are explained in detail. Some hypotheses that account for the network advantage are put forward and tested using three systems with increasing idiotypic complexity. The basic RL, a simplified hybrid AIS-RL that implements idiotypic selection independently of derived concentration levels and a full hybrid AIS-RL scheme are examined. The test bed takes the form of a simulated Pioneer robot that is required to navigate through maze worlds detecting and tracking door markers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A Bayesian optimisation algorithm for a nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurse's assignment. When a human scheduler works, he normally builds a schedule systematically following a set of rules. After much practice, the scheduler gradually masters the knowledge of which solution parts go well with others. He can identify good parts and is aware of the solution quality even if the scheduling process is not yet completed, thus having the ability to finish a schedule by using flexible, rather than fixed, rules. In this paper, we design a more human-like scheduling algorithm, by using a Bayesian optimisation algorithm to implement explicit learning from past solutions. A nurse scheduling problem from a UK hospital is used for testing. Unlike our previous work that used Genetic Algorithms to implement implicit learning [1], the learning in the proposed algorithm is explicit, i.e. we identify and mix building blocks directly. The Bayesian optimisation algorithm is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated by using the corresponding conditional probabilities, until all variables have been generated, i.e. in our case, new rule strings have been obtained. Sets of rule strings are generated in this way, some of which will replace previous strings based on fitness. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again using the current set of promising rule strings. For clarity, consider the following toy example of scheduling five nurses with two rules (1: random allocation, 2: allocate nurse to low-cost shifts). In the beginning of the search, the probabilities of choosing rule 1 or 2 for each nurse is equal, i.e. 50%. After a few iterations, due to the selection pressure and reinforcement learning, we experience two solution pathways: Because pure low-cost or random allocation produces low quality solutions, either rule 1 is used for the first 2-3 nurses and rule 2 on remainder or vice versa. In essence, Bayesian network learns 'use rule 2 after 2-3x using rule 1' or vice versa. It should be noted that for our and most other scheduling problems, the structure of the network model is known and all variables are fully observed. In this case, the goal of learning is to find the rule values that maximize the likelihood of the training data. Thus, learning can amount to 'counting' in the case of multinomial distributions. For our problem, we use our rules: Random, Cheapest Cost, Best Cover and Balance of Cost and Cover. In more detail, the steps of our Bayesian optimisation algorithm for nurse scheduling are: 1. Set t = 0, and generate an initial population P(0) at random; 2. Use roulette-wheel selection to choose a set of promising rule strings S(t) from P(t); 3. Compute conditional probabilities of each node according to this set of promising solutions; 4. Assign each nurse using roulette-wheel selection based on the rules' conditional probabilities. A set of new rule strings O(t) will be generated in this way; 5. Create a new population P(t+1) by replacing some rule strings from P(t) with O(t), and set t = t+1; 6. If the termination conditions are not met (we use 2000 generations), go to step 2. Computational results from 52 real data instances demonstrate the success of this approach. They also suggest that the learning mechanism in the proposed approach might be suitable for other scheduling problems. Another direction for further research is to see if there is a good constructing sequence for individual data instances, given a fixed nurse scheduling order. If so, the good patterns could be recognized and then extracted as new domain knowledge. Thus, by using this extracted knowledge, we can assign specific rules to the corresponding nurses beforehand, and only schedule the remaining nurses with all available rules, making it possible to reduce the solution space. Acknowledgements The work was funded by the UK Government's major funding agency, Engineering and Physical Sciences Research Council (EPSRC), under grand GR/R92899/01. References [1] Aickelin U, "An Indirect Genetic Algorithm for Set Covering Problems", Journal of the Operational Research Society, 53(10): 1118-1126,

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Jerne's idiotypic network theory postulates that the immune response involves inter-antibody stimulation and suppression as well as matching to antigens. The theory has proved the most popular Artificial Immune System (AIS) model for incorporation into behavior-based robotics but guidelines for implementing idiotypic selection are scarce. Furthermore, the direct effects of employing the technique have not been demonstrated in the form of a comparison with non-idiotypic systems. This paper aims to address these issues. A method for integrating an idiotypic AIS network with a Reinforcement Learning based control system (RL) is described and the mechanisms underlying antibody stimulation and suppression are explained in detail. Some hypotheses that account for the network advantage are put forward and tested using three systems with increasing idiotypic complexity. The basic RL, a simplified hybrid AIS-RL that implements idiotypic selection independently of derived concentration levels and a full hybrid AIS-RL scheme are examined. The test bed takes the form of a simulated Pioneer robot that is required to navigate through maze worlds detecting and tracking door markers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Previous work has shown that robot navigation systems that employ an architecture based upon the idiotypic network theory of the immune system have an advantage over control techniques that rely on reinforcement learning only. This is thought to be a result of intelligent behaviour selection on the part of the idiotypic robot. In this paper an attempt is made to imitate idiotypic dynamics by creating controllers that use reinforcement with a number of different probabilistic schemes to select robot behaviour. The aims are to show that the idiotypic system is not merely performing some kind of periodic random behaviour selection, and to try to gain further insight into the processes that govern the idiotypic mechanism. Trials are carried out using simulated Pioneer robots that undertake navigation exercises. Results show that a scheme that boosts the probability of selecting highly-ranked alternative behaviours to 50% during stall conditions comes closest to achieving the properties of the idiotypic system, but remains unable to match it in terms of all round performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of reservoir engineering is to manage fields of oil production in order to maximize the production of hydrocarbons according to economic and physical restrictions. The deciding of a production strategy is a complex activity involving several variables in the process. Thus, a smart system, which assists in the optimization of the options for developing of the field, is very useful in day-to-day of reservoir engineers. This paper proposes the development of an intelligent system to aid decision making, regarding the optimization of strategies of production in oil fields. The intelligence of this system will be implemented through the use of the technique of reinforcement learning, which is presented as a powerful tool in problems of multi-stage decision. The proposed system will allow the specialist to obtain, in time, a great alternative (or near-optimal) for the development of an oil field known

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Let’s put ourselves in the shoes of an energy company. Our fleet of electricity production plants mainly includes gas, hydroelectric and waste-to-energy plants. We also sold contracts for the supply of gas and electricity. For each year we have to plan the trading of the volumes needed by the plants and customers: better to fix the price of these volumes in advance with the so-called forward contracts, instead of waiting for the delivery months, exposing ourselves to price uncertainty. Here’s the thing: trying to keep uncertainty under control in a market that has never shown such extreme scenarios as in recent years: a pandemic, a worsening climate crisis and a war that is affecting economies around the world have made the energy market more volatile than ever. How to make decisions in such uncertain contexts? There is an optimization problem: given a year, we need to choose the optimal planning of volume trading times, to meet the needs of our portfolio at the best prices, taking into account the liquidity constraints given by the market and the risk constraints imposed by the company. Algorithms are needed for the generation of market scenarios over a finite time horizon, that is, a probabilistic distribution that allows a view of all the dates between now and the end of the year of interest. Algorithms are needed to solve the optimization problem: we have proposed more than one and compared them; a very simple one, which avoids considering part of the complexity, moving on to a scenario approach and finally a reinforcement learning approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today we live in an age where the internet and artificial intelligence allow us to search for information through impressive amounts of data, opening up revolutionary new ways to make sense of reality and understand our world. However, it is still an area of improvement to exploit the full potential of large amounts of explainable information by distilling it automatically in an intuitive and user-centred explanation. For instance, different people (or artificial agents) may search for and request different types of information in a different order, so it is unlikely that a short explanation can suffice for all needs in the most generic case. Moreover, dumping a large portion of explainable information in a one-size-fits-all representation may also be sub-optimal, as the needed information may be scarce and dispersed across hundreds of pages. The aim of this work is to investigate how to automatically generate (user-centred) explanations from heterogeneous and large collections of data, with a focus on the concept of explanation in a broad sense, as a critical artefact for intelligence, regardless of whether it is human or robotic. Our approach builds on and extends Achinstein’s philosophical theory of explanations, where explaining is an illocutionary (i.e., broad but relevant) act of usefully answering questions. Specifically, we provide the theoretical foundations of Explanatory Artificial Intelligence (YAI), formally defining a user-centred explanatory tool and the space of all possible explanations, or explanatory space, generated by it. We present empirical results in support of our theory, showcasing the implementation of YAI tools and strategies for assessing explainability. To justify and evaluate the proposed theories and models, we considered case studies at the intersection of artificial intelligence and law, particularly European legislation. Our tools helped produce better explanations of software documentation and legal texts for humans and complex regulations for reinforcement learning agents.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

I gangli della base (BG) sono un gruppo di nuclei subcorticali che si trovano alla base del telencefalo e nella parte superiore del mesencefalo. La funzione dei BG è il controllo e la regolazione delle attività delle aree corticali motorie e premotorie in modo che i movimenti possano essere eseguiti fluidamente, ma sono coinvolti in numerosi altri processi motori e cognitivi. L’obiettivo che si pone questo lavoro di tesi è di simulare il comportamento dei BG attraverso un nuovo modello neurocomputazionale. È stato valutato il funzionamento del modello in varie condizioni, di base ed alterate, per illustrare casi standard (soggetti sani) e casi patologici (deplezione di dopamina nei pazienti Parkinson o ipermedicazione della dopamina tramite levodopa) durante un compito di probabilistic reversal learning (RL). Sono stati variati parametri di dopamina tonica e di rumore applicato ai soggetti e ai dati per simulare il modello durante il “one-choice task” presente in letteratura. I risultati raccolti indicano come il modello funzioni in maniera del tutto confrontabile con i risultati in letteratura, dimostrando la sua validità ed un utilizzo corretto dei parametri e della regola di apprendimento. Non è stato possibile dire altrettanto per seconda fase del RL, in cui la regola viene invertita: i soggetti non risultano apprendere in maniera coerente rispetto ai dati in letteratura, ma risultano restii all’individuazione del nuovo stimolo vincente. Tale risultato è da ricondursi probabilmente ad alcuni fattori: il numero di epoche utilizzate per il test sono esigue, lasciando ai soggetti poco tempo per apprendere la nuova regola; la regola di apprendimento usata nel reversal potrebbe non rappresentare la scelta migliore per rendere i soggetti più esplorativi nei confronti delle scelte proposte. Tali limiti sono spunti per futuri sviluppi del modello e del suo funzionamento, utilizzando regole di apprendimento diverse e più efficaci rispetto ai diversi contesti di azione.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, Recommender systems play a key role in managing information overload, particularly in areas such as e-commerce, music and cinema. However, despite their good-natured goal, in recent years there has been a growing awareness of their involvement in creating unwanted effects on society, such as creating biases of popularity or filter bubble. This thesis is an attempt to investigate the role of RS and its stakeholders in creating such effects. A simulation study will be performed using EcoAgent, an RL-based multi-stakeholder recommendation system, in a simulation environment that captures key user interactions, suppliers and the recommender system in order to identify possible unhealthy scenarios for stakeholders. In particular, we focus on analyzing the document catalog to see how the diversity of topics that users have access to varies during interactions. Finally, some post-processing methods will be defined on EcoAgent, one reactive and one proactive, which allows us to manipulate the agent’s behavior in order to study whether and how the topic distribution of documents is affected by content providers and by the fairness of the system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)