14 results for reinforcement learning

in Deakin Research Online - Australia


Relevance:

100.00%

Publisher:

Abstract:

In this paper, a control approach based on reinforcement learning is presented for a robot to complete a dynamic task in an unknown environment. First, a temporal difference-based reinforcement learning algorithm and its evaluation function are used to make the robot learn from trial and error as well as from experience. Second, simulations are carried out to adjust the parameters of the learning algorithm and to determine an optimal policy using models of the robot. Last, the effectiveness of the presented approach is demonstrated by balancing an inverted pendulum in the unknown environment.
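As a rough sketch of the temporal-difference idea the abstract describes, the evaluation function can be maintained with tabular TD(0) updates over trial-and-error episodes. The state discretisation, placeholder dynamics, reward scheme, and parameter values below are illustrative assumptions, not the paper's setup.

```python
import random

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference update of the evaluation function V."""
    td_error = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * td_error
    return td_error

# Trial-and-error episodes over a toy discretised pendulum state space.
V = {}
random.seed(0)
for episode in range(100):
    s = 0  # start in the upright bin
    for t in range(50):
        s_next = random.choice([-1, 0, 1])  # placeholder dynamics, not a model
        r = 1.0 if s_next == 0 else 0.0     # reward for staying balanced
        td0_update(V, s, r, s_next)
        s = s_next
```

With repeated trials the learned values concentrate on states that keep the pendulum balanced; a real implementation would replace the placeholder dynamics with the robot model the abstract mentions.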

Relevance:

100.00%

Publisher:

Abstract:

An intelligent agent-based scheduling system, consisting of a reinforcement learning agent and a simulation model, has been developed and tested on a classic scheduling problem. The production facility studied is a multiproduct serial line subject to stochastic failure. The agent's goal is to minimise total production costs through selection of job sequence and batch size. To explore the state space, the agent used reinforcement learning. By applying an independent inventory control policy for each product, the agent successfully identified optimal operating policies for a real production facility.
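A minimal sketch of how such an agent might learn job choices with epsilon-greedy Q-learning, using the negated production cost as reward; the state and action encodings and the parameter values are illustrative assumptions, not the system's actual design.

```python
import random

def choose_action(Q, s, actions, eps=0.1):
    """Pick an action (e.g. a batch size): explore with probability eps, else greedy."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

def q_update(Q, s, a, cost, s_next, actions, alpha=0.2, gamma=0.95):
    """Q-learning update with negative cost as reward, so the agent minimises cost."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (-cost + gamma * best_next - old)
```

In the study, the transition and cost would come from the discrete event simulation model; here they would be supplied by whatever simulator the agent interacts with.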

Relevance:

100.00%

Publisher:

Abstract:

A reinforcement learning agent has been developed to determine optimal operating policies in a multi-part serial line. The agent interacts with a discrete event simulation model of a stochastic production facility. This study identifies issues important to the simulation developer who wishes to optimise a complex simulation or develop a robust operating policy. Critical parameters pertinent to 'tuning' an agent quickly and enabling it to rapidly learn the system were investigated.

Relevance:

100.00%

Publisher:

Abstract:

Traditional optimisation methods are incapable of capturing the complexity of today's dynamic manufacturing systems. A new methodology, integrating simulation models and intelligent learning agents, was successfully applied to identify solutions to a fundamental scheduling problem. The robustness of this approach was then demonstrated through a series of real-world industrial applications.

Relevance:

100.00%

Publisher:

Abstract:

Developing an effective memetic algorithm that integrates the Particle Swarm Optimization (PSO) algorithm and a local search method is a difficult task. The challenging issues include when the local search method should be called, the frequency of calling the local search method, and which particle should undergo the local search operations. Motivated by this challenge, we introduce a new Reinforcement Learning-based Memetic Particle Swarm Optimization (RLMPSO) model. Each particle is subject to five operations under the control of the Reinforcement Learning (RL) algorithm, i.e., exploration, convergence, high-jump, low-jump, and fine-tuning. These operations are executed by the particle according to the action generated by the RL algorithm. The proposed RLMPSO model is evaluated using four uni-modal and multi-modal benchmark problems, six composite benchmark problems, five shifted and rotated benchmark problems, as well as two benchmark application problems. The experimental results show that RLMPSO is useful, and it outperforms a number of state-of-the-art PSO-based algorithms.
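The five operations are named in the abstract; one way an RL policy could choose among them per particle is sketched below. The epsilon-greedy rule and the improvement-based reward are assumptions for illustration, not the paper's exact formulation.

```python
import random

# The five particle operations named in the RLMPSO abstract.
OPERATIONS = ["exploration", "convergence", "high-jump", "low-jump", "fine-tuning"]

def select_operation(q, eps=0.1):
    """Epsilon-greedy choice of the next operation for a particle."""
    if random.random() < eps:
        return random.choice(OPERATIONS)
    return max(OPERATIONS, key=lambda op: q[op])

def update_q(q, op, improvement, alpha=0.3):
    """Credit the chosen operation with the fitness improvement it produced."""
    q[op] += alpha * (improvement - q[op])

q = {op: 0.0 for op in OPERATIONS}
update_q(q, "fine-tuning", 1.0)  # fine-tuning improved fitness this step
```

Operations that repeatedly improve fitness accumulate value and are chosen more often, which is the general mechanism by which an RL controller can schedule when (and on which particle) local refinement happens.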

Relevance:

70.00%

Publisher:

Abstract:

This study contributes to work in baggage handling system (BHS) control, specifically dynamic bag routing. Although studies in BHS agent-based control have examined the need for intelligent control, there has not been an effort to explore the dynamic routing problem. As such, this study provides additional insight into how agents can learn to route in a BHS. This study describes a BHS status-based routing algorithm that applies learning methods to select the criteria on which routing decisions are based. Although numerous studies have identified the need for dynamic routing, little analytic attention has been paid to intelligent agents that learn routing tables, rather than to the manual creation of routing rules. We address this issue by demonstrating the ability of agents to learn how to route based on bag status, a robust method that is able to function in a variety of different BHS designs.
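The idea of learning a routing table from experience rather than hand-writing rules can be sketched in a Q-routing style, where each node learns an estimated delivery time per (destination, next hop). The toy topology, default estimates, and delay rewards below are assumptions for illustration.

```python
# Toy conveyor graph: node -> list of neighbouring nodes.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def route_update(Q, node, dest, next_hop, delay, alpha=0.5):
    """Update the estimated delivery time for sending a bag to dest via next_hop."""
    if next_hop == dest:
        remaining = 0.0  # bag has arrived
    else:
        remaining = min(Q.get((next_hop, dest, n), 10.0) for n in GRAPH[next_hop])
    old = Q.get((node, dest, next_hop), 10.0)
    Q[(node, dest, next_hop)] = old + alpha * (delay + remaining - old)

def best_next_hop(Q, node, dest):
    """The learned routing table: forward to the neighbour with lowest estimate."""
    return min(GRAPH[node], key=lambda n: Q.get((node, dest, n), 10.0))
```

As bags traverse the system, the table adapts to observed delays, which is the kind of status-driven behaviour the study's agents learn rather than having it coded by hand.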

Relevance:

70.00%

Publisher:

Abstract:

This research investigated the problem of path planning in complex conveyor networks. A reinforcement learning approach was applied to derive a control strategy for routing traffic. The derived strategy was verified in real world systems and was found to improve network performance by prioritising traffic flows and balancing network load.

Relevance:

70.00%

Publisher:

Abstract:

Cocaine addiction involves persistent deficits to unlearn previously rewarded response options, potentially due to neuroadaptations in learning-sensitive regions. Cocaine-targeted prefrontal systems have been consistently associated with reinforcement learning and reversal deficits, but more recent interspecies research has raised awareness about the contribution of the cerebellum to cocaine addiction and reversal. We aimed at investigating the link between cocaine use, reversal learning and prefrontal, insula and cerebellar gray matter in cocaine-dependent individuals (CDIs) varying on levels of cocaine exposure in comparison with healthy controls (HCs). Twenty CDIs and 21 HCs performed a probabilistic reversal learning task (PRLT) and were subsequently scanned in a 3-Tesla magnetic resonance imaging scanner. In the PRLT, subjects progressively learn to respond to one predominantly reinforced stimulus, and then must learn to respond according to the opposite, previously irrelevant, stimulus-reward pairing. Performance measures were errors after reversal (reversal cost), and probability of maintaining response after errors. Voxel-based morphometry was conducted to investigate the association between gray matter volume in the regions of interest and cocaine use and PRLT performance. Severity of cocaine use correlated with gray matter volume reduction in the left cerebellum (lobule VIII), while greater reversal cost was correlated with gray matter volume reduction in a partially overlapping cluster (lobules VIIb and VIII). Right insula/inferior frontal gyrus correlated with probability of maintaining response after errors. Severity of cocaine use detrimentally impacted reversal learning and cerebellar gray matter.

Relevance:

60.00%

Publisher:

Abstract:

A two-stage hybrid model for data classification and rule extraction is proposed. The first stage uses a Fuzzy ARTMAP (FAM) classifier with Q-learning (known as QFAM) for incremental learning of data samples, while the second stage uses a Genetic Algorithm (GA) for rule extraction from QFAM. Given a new data sample, the resulting hybrid model, known as QFAM-GA, is able to provide a prediction of the target class of the data sample as well as a fuzzy if-then rule to explain the prediction. To reduce the network complexity, a pruning scheme using Q-values is applied to reduce the number of prototypes generated by QFAM. A 'don't care' technique is employed to minimize the number of input features using the GA. A number of benchmark problems are used to evaluate the effectiveness of QFAM-GA in terms of test accuracy, noise tolerance, and model complexity (number of rules and total rule length). The results are comparable with, if not better than, those of many other models reported in the literature. The main significance of this research is a usable and useful intelligent model (i.e., QFAM-GA) for data classification in noisy conditions with the capability of yielding a set of explanatory rules with minimum antecedents. In addition, QFAM-GA is able to maximize accuracy and minimize model complexity simultaneously. The empirical outcome positively demonstrates the potential impact of QFAM-GA in the practical environment, i.e., providing an accurate prediction with a concise justification of that prediction to domain users, therefore allowing them to adopt QFAM-GA as a useful decision support tool in assisting their decision-making processes.
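The Q-value-based pruning step can be sketched very simply: prototypes whose learned Q-values indicate little contribution to correct predictions are dropped. The flat list layout and the threshold are illustrative assumptions, not QFAM's actual data structures.

```python
def prune_prototypes(prototypes, q_values, threshold=0.0):
    """Keep only prototypes whose Q-value exceeds the threshold;
    the rest are removed to reduce network complexity."""
    return [p for p, q in zip(prototypes, q_values) if q > threshold]

kept = prune_prototypes(["p1", "p2", "p3"], [0.8, -0.2, 0.1])
```

Fewer surviving prototypes means fewer and shorter extracted rules, which is the complexity reduction the abstract targets before the GA stage runs.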

Relevance:

40.00%

Publisher:

Abstract:

In an evolutionary model, players from a given population meet randomly in pairs each instant to play a coordination game. At each instant, the learning model used is determined via some replicator dynamics that respects payoff fitness. We allow for two such models: a belief-based best-response model that uses a costly predictor, and a costless reinforcement-based one. This generates dynamics over the choice of learning models and the consequent choices of endogenous variables. We report conditions under which the long run outcomes are efficient (or inefficient) and they support the exclusive use of either of the models (or their co-existence).
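A discrete-time sketch of the replicator step over the two learning models helps fix ideas: the share of belief-based learners grows or shrinks with its payoff net of the predictor cost, relative to the population average. The payoff numbers, cost, and step size are illustrative assumptions.

```python
def replicator_step(x, pi_belief, pi_reinf, cost, dt=0.1):
    """One replicator step for x, the population share using the belief-based model.
    Its fitness is its payoff net of the predictor cost; the reinforcement-based
    model is costless."""
    f_belief = pi_belief - cost
    f_reinf = pi_reinf
    avg = x * f_belief + (1 - x) * f_reinf
    return x + dt * x * (f_belief - avg)

# When the predictor's cost outweighs its payoff edge, its share declines.
x_next = replicator_step(0.5, pi_belief=1.0, pi_reinf=0.9, cost=0.3)
```

Iterating this map under payoffs generated by the coordination game is what produces the long-run outcomes the abstract characterises: exclusive use of one model, or co-existence.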

Relevance:

30.00%

Publisher:

Abstract:

Incremental learning allows modification of developed concepts without the need for prior knowledge of all data. An incremental algorithm is developed to address the problems of memory size, forgetting and concept drift. The evidence-based forgetting procedure minimizes the concept size and maintains its consistency with respect to the incoming data. An age value associated with the data determines its reinforcement or removal from a concept.
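The age mechanism can be sketched as follows: each stored item carries an age that is reset (reinforced) when new data supports it and incremented otherwise, and items whose age exceeds a threshold are forgotten. The exact match test and threshold are illustrative assumptions.

```python
def update_concept(concept, sample, max_age=3):
    """concept: dict mapping stored item -> age.
    Reinforce items matched by the sample, age the rest, forget stale ones."""
    for item in list(concept):
        if item == sample:
            concept[item] = 0          # reinforcement: evidence resets the age
        else:
            concept[item] += 1         # no support from this sample
            if concept[item] > max_age:
                del concept[item]      # forgetting: evidence too stale
    concept.setdefault(sample, 0)      # admit the new observation
    return concept
```

This keeps the concept small and consistent with recent data, which is how the procedure copes with memory limits and concept drift.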