30 results for: reinforcement learning, cryptography, machine learning, deep learning, Deep Q-Learning (DQN), AES

in Aston University Research Archive


Relevance: 100.00%

Abstract:

To solve multi-objective problems, multiple reward signals are often scalarized into a single value and further processed using established single-objective problem-solving techniques. While the field of multi-objective optimization has made many advances in applying scalarization techniques to obtain good solution trade-offs, the utility of applying these techniques in the multi-objective multi-agent learning domain has not yet been thoroughly investigated. Agents learn the value of their decisions by linearly scalarizing their reward signals at the local level, while acceptable system-wide behaviour emerges. However, the non-linear relationship between the weighting parameters of the scalarization function and the learned policy makes the discovery of system-wide trade-offs time-consuming. Our first contribution is a thorough analysis of well-known scalarization schemes within the multi-objective multi-agent reinforcement learning setup. The analysed approaches intelligently explore the weight space in order to find a wider range of system trade-offs. In our second contribution, we propose a novel adaptive weight algorithm which interacts with the underlying local multi-objective solvers and allows for better coverage of the Pareto front. Our third contribution is the experimental validation of our approach by learning bi-objective policies in self-organising smart camera networks. We note that our algorithm (i) explores the objective space faster on many problem instances, (ii) obtains solutions that exhibit a larger hypervolume, and (iii) achieves a greater spread in the objective space.
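To illustrate the two quantities this abstract relies on, the sketch below shows linear scalarization of a reward vector and the two-objective hypervolume of a set of solutions. It is a minimal sketch under stated assumptions (both objectives are maximised, and the reference point is chosen by the user), not the authors' adaptive weight algorithm; all function names are hypothetical. It also shows a known limitation of sweeping fixed linear weights: a Pareto-optimal point lying in a non-convex region of the front is never selected, whatever the weights.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def linear_scalarize(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a reward vector into one scalar with a weighted sum."""
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(w * r for w, r in zip(weights, rewards))


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by `ref` (both objectives maximised)."""
    # Ignore points that do not strictly improve on the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective downwards, adding one rectangular
    # slice each time the second objective improves on the best seen so far.
    pts.sort(reverse=True)
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def sweep_weights(points: List[Point], steps: int = 10) -> List[Point]:
    """Distinct points chosen by maximising w*f1 + (1-w)*f2 over a grid of weights w."""
    chosen: List[Point] = []
    for i in range(steps + 1):
        w = i / steps
        best = max(points, key=lambda p: linear_scalarize(p, (w, 1.0 - w)))
        if best not in chosen:
            chosen.append(best)
    return chosen


if __name__ == "__main__":
    print(hypervolume_2d([(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)], (0.0, 0.0)))  # 6.0
    # (0.4, 0.4) is Pareto-optimal but lies in a non-convex region of this front.
    print(sweep_weights([(0.0, 1.0), (0.4, 0.4), (1.0, 0.0)]))
```

With the hypothetical front above, the weight sweep only ever returns the two extreme points, which is one reason the choice and adaptation of weights matters for coverage of the Pareto front.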

Relevance: 100.00%

Abstract:

This paper reviews some basic issues and methods involved in using neural networks to respond in a desired fashion to a temporally varying environment. Some popular network models and training methods are introduced. A speech recognition example is then used to illustrate the central difficulty of temporal data processing: learning to notice and remember relevant contextual information. Feedforward network methods are applicable to cases where this problem is not severe. These methods are explained, and their applications are discussed in the areas of pure mathematics, chemical and physical systems, and economic systems. A more powerful but less practical algorithm for temporal problems, the moving targets algorithm, is sketched and discussed. For completeness, a few remarks are made on reinforcement learning.

Relevance: 100.00%

Abstract:

The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry. It is coherent and invariant. For each sample a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is given by a projection of the ideal estimate. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework is applicable to both Bayesian and non-Bayesian methods, with arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.

Relevance: 100.00%

Abstract:

Dynamic Optimization Problems (DOPs) have been widely studied using Evolutionary Algorithms (EAs). Yet, a clear and rigorous definition of DOPs is lacking in the Evolutionary Dynamic Optimization (EDO) community. In this paper, we propose a unified definition of DOPs based on the idea of multiple-decision-making discussed in the Reinforcement Learning (RL) community. We draw a connection between EDO and RL by arguing that both are studying DOPs according to our definition. We point out that existing EDO or RL research has mainly focused on certain types of DOPs. A conceptualized benchmark problem, aimed at the systematic study of various DOPs, is then developed. Experimental studies on the benchmark reveal that EDO and RL methods are specialized in certain types of DOPs and, more importantly, that new algorithms for DOPs can be developed by combining the strengths of both EDO and RL methods.

Relevance: 100.00%

Abstract:

This paper introduces a new technique for optimizing the trading strategy of brokers that autonomously trade in retail and wholesale markets. Simultaneous optimization of retail and wholesale strategies has been considered by existing studies as intractable. Therefore, each of these strategies is optimized separately and their interdependence is generally ignored, with resulting broker agents not aiming for a globally optimal retail and wholesale strategy. In this paper, we propose a novel formalization, based on a semi-Markov decision process (SMDP), which globally and simultaneously optimizes retail and wholesale strategies. The SMDP is solved using hierarchical reinforcement learning (HRL) in multi-agent environments. To address the curse of dimensionality, which arises when applying SMDP and HRL to complex decision problems, we propose an efficient knowledge transfer approach. This enables the reuse of learned trading skills in order to speed up the learning in new markets, at the same time as making the broker transportable across market environments. The proposed SMDP-broker has been thoroughly evaluated in two well-established multi-agent simulation environments within the Trading Agent Competition (TAC) community. Analysis of controlled experiments shows that this broker can outperform the top TAC-brokers. Moreover, our broker is able to perform well in a wide range of environments by re-using knowledge acquired in previously experienced settings.

Relevance: 100.00%

Abstract:

Smart grid technologies have given rise to a liberalised and decentralised electricity market, enabling energy providers and retailers to have a better understanding of the demand side and its response to pricing signals. This paper puts forward a reinforcement-learning-powered tool aiding an electricity retailer to define the tariff prices it offers, in a bid to optimise its retail strategy. In a competitive market, an energy retailer aims to simultaneously increase the number of contracted customers and its profit margin. We abstract the retailer's problem of deciding on a tariff price as a semi-Markov decision problem (SMDP). A hierarchical reinforcement learning approach, MaxQ value function decomposition, is applied to solve the SMDP through interactions with the market. To evaluate our trading strategy, we developed a retailer agent (termed AstonTAC) that uses the proposed SMDP framework to act in an open multi-agent simulation environment, the Power Trading Agent Competition (Power TAC). An evaluation and analysis of the 2013 Power TAC finals show that AstonTAC successfully selects sell prices that attract as many customers as necessary to maximise the profit margin. Moreover, during the competition, AstonTAC was the only retailer agent performing well across all retail market settings.
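Both this abstract and the SMDP-broker abstract above rest on semi-Markov decision processes, in which a single action may last several time steps. As a rough illustration, the sketch below shows the standard SMDP Q-learning update: the reward collected while an action runs for tau steps is discounted step by step, and the bootstrap term is discounted by gamma**tau. It is not MaxQ value function decomposition, which builds a hierarchy of subtasks on top of this kind of update; the state names, tariff actions, alpha and gamma are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def smdp_q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    rewards: Sequence[float],
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one SMDP Q-learning update for an action that ran len(rewards) steps."""
    tau = len(rewards)
    # Reward accumulated while the temporally extended action was executing,
    # discounted from the step at which the action started.
    accumulated = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Bootstrap from the best action in the state where the action terminated,
    # discounted by gamma**tau because tau steps have elapsed.
    best_next = max(q[(next_state, a)] for a in actions)
    target = accumulated + (gamma ** tau) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q: QTable = defaultdict(float)
    tariff_actions = ["lower_price", "keep_price", "raise_price"]  # hypothetical
    q[("high_demand", "keep_price")] = 2.0
    # A hypothetical "keep_price" decision that lasted two time slots.
    print(smdp_q_update(q, "low_demand", "keep_price", [1.0, 1.0],
                        "high_demand", tariff_actions, alpha=1.0, gamma=0.5))  # 2.0
```

The only difference from one-step Q-learning is the variable duration tau, which is what lets decisions such as holding a tariff for several periods be learned as single actions.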

Relevance: 100.00%

Abstract:

Adaptive critic methods have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, nonlinear and nonstationary environments. In this study, a novel probabilistic dual heuristic programming (DHP) based adaptive critic controller is proposed. Unlike current approaches, the proposed probabilistic DHP adaptive critic method takes into account the uncertainties of the forward model and the inverse controller. It is therefore suitable for deterministic and stochastic control problems characterized by functional uncertainty. The theoretical development of the proposed method is validated by analytically evaluating, for a linear quadratic control problem, the correct value of the cost function that satisfies the Bellman equation. The target value of the critic network is then calculated and shown to be equal to the analytically derived correct value.
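The abstract validates the method against the analytically known cost of a linear quadratic problem. For intuition, the sketch below computes that cost for a hypothetical scalar discrete-time system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2: the cost-to-go is V(x) = p*x^2, where p solves the scalar discrete algebraic Riccati equation, and the Bellman residual under the optimal feedback u = -k*x is zero up to rounding. This is classical LQR, not the probabilistic DHP controller; a DHP critic would approximate the derivative dV/dx, which here is 2*p*x. The coefficients a, b, q and r are illustrative assumptions.

```python
import math


def solve_scalar_riccati(a: float, b: float, q: float, r: float,
                         max_iters: int = 10_000, tol: float = 1e-12) -> float:
    """Iterate p <- q + a^2 p - (a b p)^2 / (r + b^2 p) until it reaches a fixed point."""
    p = q
    for _ in range(max_iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati iteration did not converge")


def optimal_gain(a: float, b: float, r: float, p: float) -> float:
    """Feedback gain k of the optimal control law u = -k * x."""
    return a * b * p / (r + b * b * p)


def bellman_residual(a: float, b: float, q: float, r: float, p: float,
                     x: float = 1.0) -> float:
    """V(x) - [q x^2 + r u^2 + V(a x + b u)] with V(x) = p x^2 and u = -k x."""
    u = -optimal_gain(a, b, r, p) * x
    x_next = a * x + b * u
    return p * x * x - (q * x * x + r * u * u + p * x_next * x_next)


if __name__ == "__main__":
    p = solve_scalar_riccati(a=1.0, b=1.0, q=1.0, r=1.0)
    # For a = b = q = r = 1 the fixed point satisfies p^2 - p - 1 = 0.
    print(p, (1 + math.sqrt(5)) / 2)
    print(bellman_residual(1.0, 1.0, 1.0, 1.0, p, x=3.0))  # close to zero
```

Because the exact value is available in closed form here, it can serve as a target against which a learned critic is checked, which is the role the linear quadratic problem plays in the abstract.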

Relevance: 100.00%

Abstract:

When visual sensor networks are composed of cameras that can adjust the zoom factor of their own lens, one must determine the optimal zoom levels for the cameras for a given task. This gives rise to an important trade-off between the overlap of the different cameras’ fields of view, which provides redundancy, and image quality. In an object tracking task, having multiple cameras observe the same area allows for quicker recovery when a camera fails. In contrast, narrow zooms allow for a higher pixel count on regions of interest, leading to increased tracking confidence. In this paper we propose an approach for the self-organisation of redundancy in a distributed visual sensor network, based on decentralised multi-objective online learning that uses only local information to approximate the global state. We explore the impact of different zoom levels on these trade-offs when tasking omnidirectional cameras, which have a perfect 360-degree view, with keeping track of a varying number of moving objects. We further show how employing decentralised reinforcement learning enables zoom configurations to be achieved dynamically at runtime according to an operator’s preference for maximising either the proportion of objects tracked, the confidence associated with tracking, or redundancy in expectation of camera failure. We show that explicitly taking account of the level of overlap, even when based only on local knowledge, improves resilience when cameras fail. Our results illustrate the trade-off between maintaining high confidence and object coverage on the one hand, and maintaining redundancy in anticipation of future failure on the other. Our approach provides a fully tunable decentralised method for the self-organisation of redundancy in a changing environment, according to an operator’s preferences.

Relevance: 100.00%

Abstract:

A recommender system is a specific type of intelligent system that exploits historical user ratings on items and/or auxiliary information to recommend items to users. It plays a critical role in a wide range of online shopping, e-commerce and social networking applications. Collaborative filtering (CF) is the most popular approach used in recommender systems, but it suffers from the complete cold start (CCS) problem, where no rating records are available, and the incomplete cold start (ICS) problem, where only a small number of rating records are available for some new items or users in the system. In this paper, we propose two recommendation models to solve the CCS and ICS problems for new items, based on a framework that tightly couples a CF approach and a deep learning neural network. A specific deep neural network, SADE, is used to extract the content features of the items. The state-of-the-art CF model timeSVD++, which models and utilizes the temporal dynamics of user preferences and item features, is modified to take the content features into account when predicting ratings for cold start items. Extensive experiments on a large Netflix dataset of movie ratings show that our proposed recommendation models largely outperform the baseline models for rating prediction of cold start items. The two proposed models are also evaluated and compared on ICS items, and a flexible scheme of model retraining and switching is proposed to deal with the transition of items from cold start to non-cold start status. The experimental results on Netflix movie recommendation show that tightly coupling a CF approach and a deep learning neural network is feasible and very effective for cold start item recommendation. The design is general and can be applied to many other recommender systems for online shopping and social networking applications. Solving the cold start item problem can substantially improve user experience and trust in recommender systems, and effectively promote cold start items.

Relevance: 100.00%

Abstract:

Recommender systems (RS) are used by many social networking applications and online e-commerce services. Collaborative filtering (CF) is one of the most popular approaches used for RS. However, the traditional CF approach suffers from sparsity and cold start problems. In this paper, we propose a hybrid recommendation model to address the cold start problem, which uses item content features learned by a deep learning neural network and applies them to the timeSVD++ CF model. Extensive experiments are run on a large Netflix rating dataset for movies. The results show that the proposed hybrid recommendation model provides good predictions for cold start items, and performs better than four existing recommendation models for rating non-cold start items.
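timeSVD++ belongs to a family of latent-factor models whose core prediction is a global mean plus a user bias and an item bias plus the inner product of user and item factor vectors; timeSVD++ adds implicit feedback and time-dependent terms on top of this. The sketch below shows only that core rule, together with one hypothetical way of supplying item factors for a cold start item from a content feature vector (a fixed linear map W). The map W, the feature values and the zero bias assumed for an unrated item are illustrative assumptions, not the SADE/timeSVD++ coupling described in these papers.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def predict_rating(mu: float, b_u: float, b_i: float,
                   p_u: Sequence[float], q_i: Sequence[float]) -> float:
    """Core latent-factor prediction: mu + b_u + b_i + q_i . p_u."""
    return mu + b_u + b_i + dot(p_u, q_i)


def item_factors_from_content(w: Sequence[Sequence[float]],
                              features: Sequence[float]) -> List[float]:
    """Hypothetical mapping of content features to latent item factors: q_i = W f_i."""
    return [dot(row, features) for row in w]


if __name__ == "__main__":
    # Hypothetical setting: 2 latent factors, 3 content features.
    w = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
    q_new = item_factors_from_content(w, [0.4, 0.2, 0.9])
    # Cold start item: no ratings, so its learned bias is assumed to be 0.
    print(predict_rating(mu=3.5, b_u=0.2, b_i=0.0, p_u=[0.5, -0.5], q_i=q_new))
```

The point of the sketch is only that, once an item has factors derived from its content, the usual rating formula can be applied even though the item has never been rated.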

Relevance: 80.00%

Abstract:

Original Paper. European Journal of Information Systems (2001) 10, 135–146; doi:10.1057/palgrave.ejis.3000394

Organisational learning—a critical systems thinking discipline

P Panagiotidis (1, 3) and J S Edwards (2, 4)

(1) Deloitte and Touche, Athens, Greece
(2) Aston Business School, Aston University, Aston Triangle, Birmingham, B4 7ET, UK
Correspondence: Dr J S Edwards, Aston Business School, Aston University, Aston Triangle, Birmingham, B4 7ET, UK. E-mail: j.s.edwards@aston.ac.uk
(3) Petros Panagiotidis is Manager responsible for the Process and Systems Integrity Services of Deloitte and Touche in Athens, Greece. He has a BSc in Business Administration and an MSc in Management Information Systems from Western International University, Phoenix, Arizona, USA; an MSc in Business Systems Analysis and Design from City University, London, UK; and a PhD degree from Aston University, Birmingham, UK. His doctorate was in Business Systems Analysis and Design. His principal interests now are in the ERP/DSS field, where he serves as project leader and project risk management leader in the implementation of SAP and JD Edwards/Cognos in various major clients in the telecommunications and manufacturing sectors. In addition, he is responsible for the development and application of knowledge management systems and activity-based costing systems.
(4) John S Edwards is Senior Lecturer in Operational Research and Systems at Aston Business School, Birmingham, UK. He holds MA and PhD degrees (in mathematics and operational research respectively) from Cambridge University. His principal research interests are in knowledge management and decision support, especially methods and processes for system development. He has written more than 30 research papers on these topics, and two books, Building Knowledge-based Systems and Decision Making with Computers, both published by Pitman. Current research work includes the effect of scale of operations on knowledge management, interfacing expert systems with simulation models, process modelling in law and legal services, and a study of the use of artificial intelligence techniques in management accounting.

Abstract: This paper deals with the application of critical systems thinking in the domain of organisational learning and knowledge management. Its viewpoint is that deep organisational learning only takes place when the business systems' stakeholders reflect on their actions and thus inquire about their purpose(s) in relation to the business system and the other stakeholders they perceive to exist. This is done by reflecting both on the sources of motivation and/or deception that are contained in their purpose, and also on the sources of collective motivation and/or deception that are contained in the business system's purpose. The development of an organisational information system that captures, manages and institutionalises meaningful information (a knowledge management system) cannot be separated from organisational learning practices, since it should be the result of these very practices. Although Senge's five disciplines provide a useful starting point in looking at organisational learning, we argue for a critical systems approach, instead of an uncritical Systems Dynamics one that concentrates only on the organisational learning practices. We proceed to outline a methodology called Business Systems Purpose Analysis (BSPA) that offers a participatory structure for team and organisational learning, upon which the stakeholders can take legitimate action that is based on the force of the better argument. In addition, the organisational learning process in BSPA leads to the development of an intrinsically motivated information organisational system that allows for the institutionalisation of the learning process itself in the form of an organisational knowledge management system. This could be a specific application, or something as wide-ranging as an Enterprise Resource Planning (ERP) implementation. Examples of the use of BSPA in two ERP implementations are presented.

Relevance: 80.00%

Abstract:

Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. Most existing systems concentrate either on mining algorithms or on visualization techniques. Although visual methods developed in information visualization have been helpful, improved understanding of a large, complex, high-dimensional dataset requires an effective projection of the dataset onto a lower-dimensional (2D or 3D) manifold. This paper introduces a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualization domain. The framework follows Shneiderman’s mantra to provide an effective user interface. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection methods, such as Generative Topographic Mapping (GTM) and Hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, billboarding, and user interaction facilities, to provide an integrated visual data mining framework. Results on a real-life high-dimensional dataset from the chemoinformatics domain are also reported and discussed. Projection results of GTM are analytically compared with the projection results from other traditional projection methods, and it is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework.

Relevance: 80.00%

Abstract:

Background - The literature is not univocal about the effects of Peer Review (PR) within the context of constructivist learning. Due to the predominant focus on using PR as an assessment tool, rather than a constructivist learning activity, and because most studies implicitly assume that the benefits of PR are limited to the reviewee, little is known about the effects upon students who are required to review their peers. Much of the theoretical debate in the literature is focused on explaining how and why constructivist learning is beneficial. At the same time, these discussions are marked by an underlying presupposition of a causal relationship between reviewing and deep learning. Objectives - The purpose of the study is to investigate whether the writing of PR feedback causes students to benefit in terms of: perceived utility of statistics, actual use of statistics, better understanding of statistical concepts and associated methods, changed attitudes towards market risks, and outcomes of decisions that were made. Methods - We conducted a randomized experiment, assigning students randomly to receive PR or non-PR treatments, and used two cohorts with different time spans. The paper discusses the experimental design and all the software components that we used to support the learning process: Reproducible Computing technology, which allows students to reproduce or re-use statistical results from peers, Collaborative PR, and an AI-enhanced Stock Market Engine. Results - The results establish that the writing of PR feedback messages causes students to experience benefits in terms of Behavior, Non-Rote Learning, and Attitudes, provided the sequence of PR activities is maintained for a sufficiently long period.

Relevance: 80.00%

Abstract:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevance: 80.00%

Abstract:

Assessment criteria are increasingly incorporated into teaching, making it important to clarify the pedagogic status of the qualities to which they refer. We reviewed theory and evidence about the extent to which four core criteria for student writing (critical thinking, use of language, structuring, and argument) refer to the outcomes of three types of learning: generic skills learning, a deep approach to learning, and complex learning. The analysis showed that all four of the core criteria describe, to some extent, properties of text resulting from using skills, but none qualifies fully as a description of the outcomes of applying generic skills. Most also describe certain aspects of the outcomes of taking a deep approach to learning. Critical thinking and argument correspond most closely to the outcomes of complex learning. At lower levels of performance, use of language and structuring describe the outcomes of applying transferable skills. At higher levels of performance, they describe the outcomes of taking a deep approach to learning. We propose that the type of learning required to meet the core criteria is most usefully and accurately conceptualized as the learning of complex skills, and that this provides a conceptual framework for maximizing the benefits of using assessment criteria as part of teaching. © 2006 Taylor & Francis.