885 resultados para Q-learning algorithm
Resumo:
This paper reviews a study that was done with hearing and hearing impaired children to test the effectiveness of self-instructional programs and whether the results can be correlated with Educational Quotient and Intelligence Quotient.
Resumo:
A neural network enhanced proportional, integral and derivative (PID) controller is presented that combines the attributes of neural network learning with a generalized minimum-variance self-tuning control (STC) strategy. The neuro PID controller is structured with plant model identification and PID parameter tuning. The plants to be controlled are approximated by an equivalent model composed of a simple linear submodel to approximate plant dynamics around operating points, plus an error agent to accommodate the errors induced by linear submodel inaccuracy due to non-linearities and other complexities. A generalized recursive least-squares algorithm is used to identify the linear submodel, and a layered neural network is used to detect the error agent in which the weights are updated on the basis of the error between the plant output and the output from the linear submodel. The procedure for controller design is based on the equivalent model, and therefore the error agent is naturally functioned within the control law. In this way the controller can deal not only with a wide range of linear dynamic plants but also with those complex plants characterized by severe non-linearity, uncertainties and non-minimum phase behaviours. Two simulation studies are provided to demonstrate the effectiveness of the controller design procedure.
Resumo:
A self-learning simulated annealing algorithm is developed by combining the characteristics of simulated annealing and domain elimination methods. The algorithm is validated by using a standard mathematical function and by optimizing the end region of a practical power transformer. The numerical results show that the CPU time required by the proposed method is about one third of that using conventional simulated annealing algorithm.
Resumo:
This paper aims to provide an improved NSGA-II (Non-Dominated Sorting Genetic Algorithm-version II) which incorporates a parameter-free self-tuning approach by reinforcement learning technique, called Non-Dominated Sorting Genetic Algorithm Based on Reinforcement Learning (NSGA-RL). The proposed method is particularly compared with the classical NSGA-II when applied to a satellite coverage problem. Furthermore, not only the optimization results are compared with results obtained by other multiobjective optimization methods, but also guarantee the advantage of no time-spending and complex parameter tuning.
Resumo:
Artificial pancreas is in the forefront of research towards the automatic insulin infusion for patients with type 1 diabetes. Due to the high inter- and intra-variability of the diabetic population, the need for personalized approaches has been raised. This study presents an adaptive, patient-specific control strategy for glucose regulation based on reinforcement learning and more specifically on the Actor-Critic (AC) learning approach. The control algorithm provides daily updates of the basal rate and insulin-to-carbohydrate (IC) ratio in order to optimize glucose regulation. A method for the automatic and personalized initialization of the control algorithm is designed based on the estimation of the transfer entropy (TE) between insulin and glucose signals. The algorithm has been evaluated in silico in adults, adolescents and children for 10 days. Three scenarios of initialization to i) zero values, ii) random values and iii) TE-based values have been comparatively assessed. The results have shown that when the TE-based initialization is used, the algorithm achieves faster learning with 98%, 90% and 73% in the A+B zones of the Control Variability Grid Analysis for adults, adolescents and children respectively after five days compared to 95%, 78%, 41% for random initialization and 93%, 88%, 41% for zero initial values. Furthermore, in the case of children, the daily Low Blood Glucose Index reduces much faster when the TE-based tuning is applied. The results imply that automatic and personalized tuning based on TE reduces the learning period and improves the overall performance of the AC algorithm.
Resumo:
We describe how to use a Granular Linguistic Model of a Phenomenon (GLMP) to assess e-learning processes. We apply this technique to evaluate algorithm learning using the GRAPHs learning environment.
Resumo:
A formalism for describing the dynamics of Genetic Algorithms (GAs) using method s from statistical mechanics is applied to the problem of generalization in a perceptron with binary weights. The dynamics are solved for the case where a new batch of training patterns is presented to each population member each generation, which considerably simplifies the calculation. The theory is shown to agree closely to simulations of a real GA averaged over many runs, accurately predicting the mean best solution found. For weak selection and large problem size the difference equations describing the dynamics can be expressed analytically and we find that the effects of noise due to the finite size of each training batch can be removed by increasing the population size appropriately. If this population resizing is used, one can deduce the most computationally efficient size of training batch each generation. For independent patterns this choice also gives the minimum total number of training patterns used. Although using independent patterns is a very inefficient use of training patterns in general, this work may also prove useful for determining the optimum batch size in the case where patterns are recycled.
Resumo:
To solve multi-objective problems, multiple reward signals are often scalarized into a single value and further processed using established single-objective problem solving techniques. While the field of multi-objective optimization has made many advances in applying scalarization techniques to obtain good solution trade-offs, the utility of applying these techniques in the multi-objective multi-agent learning domain has not yet been thoroughly investigated. Agents learn the value of their decisions by linearly scalarizing their reward signals at the local level, while acceptable system wide behaviour results. However, the non-linear relationship between weighting parameters of the scalarization function and the learned policy makes the discovery of system wide trade-offs time consuming. Our first contribution is a thorough analysis of well known scalarization schemes within the multi-objective multi-agent reinforcement learning setup. The analysed approaches intelligently explore the weight-space in order to find a wider range of system trade-offs. In our second contribution, we propose a novel adaptive weight algorithm which interacts with the underlying local multi-objective solvers and allows for a better coverage of the Pareto front. Our third contribution is the experimental validation of our approach by learning bi-objective policies in self-organising smart camera networks. We note that our algorithm (i) explores the objective space faster on many problem instances, (ii) obtained solutions that exhibit a larger hypervolume, while (iii) acquiring a greater spread in the objective space.
Resumo:
An eMathTeacher [Sánchez-Torrubia 2007a] is an eLearning on line self assessment tool that help students to active learning math algorithms by themselves, correcting their mistakes and providing them with clues to find the right solution. The tool presented in this paper is an example of this new concept on Computer Aided Instruction (CAI) resources and has been implemented as a Java applet and designed as an auxiliary instrument for both classroom teaching and individual practicing of Fleury’s algorithm. This tool, included within a set of eMathTeacher tools, has been designed as educational complement of Graph Algorithm active learning for first course students. Its characteristics of visualization, simplicity and interactivity, make this tutorial a great value pedagogical instrument.
Resumo:
The purpose of this mixed methods study was to understand physics Learning Assistants' (LAs) views on reflective teaching, expertise in teaching, and LA program teaching experience and to determine if views predicted level of reflection evident in writing. Interviews were conducted in Phase One, Q methodology was used in Phase Two, and level of reflection in participants' writing was assessed using a rubric based on Hatton and Smith's (1995) "Criteria for the Recognition of Evidence for Different Types of Reflective Writing" in Phase Three. Interview analysis revealed varying perspectives on content knowledge, pedagogical knowledge, and experience in relation to expertise in teaching. Participants revealed that they engaged in reflection on their teaching, believed reflection helps teachers improve, and found peer reflection beneficial. Participants believed teaching experience in the LA program provided preparation for teaching, but that more preparation was needed to teach. Three typologies emerged in Phase Two. Type One LAs found participation in the LA program rewarding and believed expertise in teaching does not require expertise in content or pedagogy, but it develops over time from reflection. Type Two LAs valued reflection, but not writing reflections, felt the LA program teaching experience helped them decide on non-teaching careers and helped them confront gaps in their physics knowledge. Type Three LAs valued reflection, believed expertise in content and pedagogy are necessary for expert teaching, and felt LA program teaching experience increased their likelihood of becoming teachers, but did not prepare them for teaching. Writing assignments submitted in Phase Three were categorized as 19% descriptive writing, 60% descriptive reflections, and 21% dialogic reflections. No assignments were categorized as critical reflection. Using ordinal logistic regression, typologies that emerged in Phase Two were not found to be predictors for the level of reflection evident in the writing assignments. In conclusion, viewpoints of physics LAs were revealed, typologies among them were discovered, and their writing gave evidence of their ability to reflect on teaching. These findings may benefit faculty and staff in the LA program by helping them better understand the views of physics LAs and how to assess their various forms of reflection.
Resumo:
Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods \cite{korhonen2exact, nie2014advances} tackle the problem by using $k$-trees to learn the optimal Bayesian network with tree-width up to $k$. Finding the best $k$-tree, however, is computationally intractable. In this paper, we propose a sampling method to efficiently find representative $k$-trees by introducing an informative score function to characterize the quality of a $k$-tree. To further improve the quality of the $k$-trees, we propose a probabilistic hill climbing approach that locally refines the sampled $k$-trees. The proposed algorithm can efficiently learn a quality Bayesian network with tree-width at most $k$. Experimental results demonstrate that our approach is more computationally efficient than the exact methods with comparable accuracy, and outperforms most existing approximate methods.
Resumo:
We propose and analyze two different Bayesian online algorithms for learning in discrete Hidden Markov Models and compare their performance with the already known Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of generalization we draw learning curves in simplified situations for these algorithms and compare their performances.