903 results for Optimal Control Problems
Abstract:
This thesis addresses the problem of scaling reinforcement learning to high-dimensional and complex tasks. Reinforcement learning here denotes a class of learning methods based on approximate dynamic programming that is used in particular in artificial intelligence and can be employed for the autonomous control of simulated agents or real hardware robots in dynamic and unpredictable environments. To this end, regression from samples is used to determine a function that solves an "optimality equation" (Bellman) and from which approximately optimal decisions can be derived. A major hurdle is the dimensionality of the state space, which is often high and therefore poorly suited to traditional grid-based approximation methods. The goal of this thesis is to make reinforcement learning applicable to, in principle, arbitrarily high-dimensional problems by means of nonparametric function approximation (more precisely, regularization networks). Regularization networks are a generalization of ordinary basis function networks that parameterize the sought solution by the data, so that the explicit choice of nodes/basis functions is no longer required and the "curse of dimensionality" can be circumvented for high-dimensional inputs. At the same time, regularization networks are also linear approximators that are technically easy to handle and for which the existing convergence results of reinforcement learning remain valid (unlike, for example, feed-forward neural networks). All these theoretical advantages are offset, however, by a very practical problem: the computational cost of using regularization networks inherently scales as O(n**3), where n is the number of data points. 
This is particularly problematic because in reinforcement learning the learning process takes place online: the samples are generated by an agent/robot while it interacts with the environment. Adjustments to the solution must therefore be made immediately and with little computational effort. The contribution of this thesis is accordingly divided into two parts. In the first part, we formulate an efficient learning algorithm for regularization networks for solving general regression tasks, tailored specifically to the requirements of online learning. Our approach is based on recursive least squares, but can insert not only new data but also new basis functions into the existing model at constant time cost. This is made possible by the "Subset of Regressors" approximation, in which the kernel is approximated by a strongly reduced selection of training data, and by a greedy selection procedure that picks these basis elements directly from the data stream at run time. In the second part, we transfer this algorithm to approximate policy evaluation by means of least-squares-based temporal-difference learning, and integrate this building block into an overall system for the autonomous learning of optimal behavior. Overall, we develop a highly data-efficient method that is particularly suited to learning problems from robotics with continuous and high-dimensional state spaces and stochastic state transitions. We do not rely on a model of the environment, work largely independently of the dimension of the state space, achieve convergence with relatively few agent-environment interactions, and, thanks to the efficient online algorithm, can also operate in time-critical real-time applications. 
We demonstrate the performance of our approach on two realistic and complex application examples: the RoboCup Keepaway problem and the control of a (simulated) octopus tentacle.
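As an illustration of the recursive least-squares idea this abstract builds on, the following is a minimal sketch of a plain RLS update for a linear model with a fixed set of features. It is not the thesis's algorithm: it does not add basis functions online and does not use the Subset of Regressors approximation. The function name `rls_update`, the forgetting factor `lam`, and the synthetic data are assumptions made for this example only.

```python
import numpy as np

def rls_update(w, P, x, y, lam=1.0):
    """One recursive least-squares step for a linear model y ~ w @ x.

    w   : current weight vector, shape (d,)
    P   : current inverse correlation matrix, shape (d, d)
    lam : forgetting factor (1.0 means no forgetting); hypothetical default
    """
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector
    w = w + k * (y - w @ x)            # correct weights by the prediction error
    P = (P - np.outer(k, Px)) / lam    # rank-one update of the inverse matrix
    return w, P

# Synthetic, noise-free data stream (illustrative only).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
P = 1e6 * np.eye(3)                    # large initial P acts as a weak prior
for _ in range(200):
    x = rng.normal(size=3)
    y = true_w @ x
    w, P = rls_update(w, P, x, y)
```

Each update costs O(d**2) in the number of features d and does not depend on how many samples have been seen, which is the property that makes this family of methods attractive for online learning compared with re-solving an O(n**3) batch problem.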
Abstract:
MultiProcessor Systems-on-Chip (MPSoC) are the core of today's and next-generation computing platforms. Their relevance in the global market continuously increases, and they play an important role both in everyday products (e.g. smartphones, tablets, laptops, cars) and in strategic market sectors such as aviation, defense, robotics and medicine. Despite the remarkable performance improvements of recent years, processor manufacturers have had to deal with issues, commonly called “Walls”, that have hindered processor development. After the famous “Power Wall”, which limited the maximum frequency of a single core and marked the birth of the modern multiprocessor system-on-chip, the “Thermal Wall” and the “Utilization Wall” are the current key limiters of performance improvements. The former concerns the damaging effects of high chip temperature caused by the dissipation of large power densities, whereas the latter refers to the impossibility of fully exploiting the computing power of the processor due to limitations on the power and temperature budgets. In this thesis we address these challenges by developing efficient and reliable solutions that maximize performance while keeping the maximum temperature below a fixed critical threshold and saving energy. This has been made possible by exploiting the Model Predictive Control (MPC) paradigm, which solves a constrained optimization problem to find the optimal control decisions for the future interval. A fully distributed MPC-based thermal controller with far lower complexity than a centralized one has been developed. The control feasibility, and interesting properties that simplify the control design, have been proved by studying a partial differential equation thermal model. Finally, the controller has been efficiently included in more complex control schemes able to minimize energy consumption and deal with mixed-criticality tasks.
Abstract:
Traditionally, the study of internal combustion engine operation has focused on steady-state performance. However, the daily driving schedule of automotive engines is inherently related to unsteady conditions. There are various operating conditions experienced by (diesel) engines that can be classified as transient. Besides the variation of the engine operating point, in terms of engine speed and torque, the warm-up phase can also be considered a transient condition. Chapter 2 deals with this thermal transient condition; more precisely, the main issue is the performance of a Selective Catalytic Reduction (SCR) system during the cold start and warm-up phases of the engine. The purpose of this work is to investigate and identify optimal exhaust line heating strategies that provide fast activation of the catalytic reactions in the SCR system. Chapters 3 and 4 focus on the dynamic behavior of the engine under typical driving conditions. The common approach to dynamic optimization involves the solution of a single optimal-control problem. However, this approach requires the availability of models that are valid throughout the whole engine operating range and actuator ranges. In addition, the result of the optimization is meaningful only if the model is very accurate. Chapter 3 proposes a methodology to circumvent those demanding requirements: an iteration between transient measurements to refine a purpose-built model and a dynamic optimization which is constrained to the model validity region. Moreover, all numerical methods required to implement this procedure are presented. Chapter 4 proposes an approach to derive a transient feedforward control system in an automated way. It relies on optimal control theory to solve a dynamic optimization problem for fast transients. From the optimal solutions, the relevant information is extracted and stored in maps spanned by the engine speed and the torque gradient.
Abstract:
In power-electronics-based microgrids, the computational requirements needed to implement an optimized online control strategy can be prohibitive. The work presented in this dissertation proposes a generalized method for deriving geometric manifolds in a dc microgrid, based on the a priori computation of the optimal reactions and trajectories for classes of events in the microgrid. The proposed states are the stored energies in all the energy storage elements of the dc microgrid and the power flowing into them. It is anticipated that calculating a large enough set of dissimilar transient scenarios will also span many scenarios not specifically used to develop the surface. These geometric manifolds will then be used as reference surfaces in any type of controller, such as a sliding mode hysteretic controller. The presence of switched power converters in microgrids involves different control actions for different system events. The control of the switch states of the converters is essential for steady-state and transient operation. A digital memory look-up based controller that uses a hysteretic sliding mode control strategy is an effective technique to generate the proper switch states for the converters. An example dc microgrid with three dc-dc boost converters and resistive loads is considered for this work. The geometric manifolds are successfully generated for transient events, such as step changes in the loads and the sources. The surfaces corresponding to a specific case of step change in the loads are then used as reference surfaces in an EEPROM for experimentally validating the control strategy. The required switch states corresponding to this specific transient scenario are programmed in the EEPROM as a memory table. This controls the switching of the dc-dc boost converters and drives the system states to the reference manifold. 
In this work, it is shown that this strategy effectively controls the system for a transient condition such as step changes in the loads for the example case.
Abstract:
This paper addresses the problem of optimal constant continuous low-thrust transfer in the context of the restricted two-body problem (R2BP). Using Pontryagin's principle, the problem is formulated as a two-point boundary value problem (TPBVP) for a Hamiltonian system. Lie transforms obtained through the Deprit method allow us to obtain the canonical mapping of the phase flow as a series in terms of the order of magnitude of the thrust applied. The reachable set of states starting from a given initial condition under the optimal control policy is obtained analytically. In addition, a particular optimal transfer can be computed as the solution of a non-linear algebraic equation. The use of Lie series and Lie transforms in trajectory optimization problems for satellites propelled by low-thrust engines is investigated.
Abstract:
Tactile sensors play an important role in robotic manipulation when performing dexterous and complex tasks. This paper presents a novel control framework to perform dexterous manipulation with multi-fingered robotic hands using feedback data from tactile and visual sensors. This control framework permits the definition of new visual controllers which allow path tracking of the object motion, taking into account both the dynamics model of the robot hand and the grasping force of the fingertips under a hybrid control scheme. In addition, the proposed general method employs optimal control to obtain the desired behaviour in the joint space of the fingers based on a specified cost function which determines how the control effort is distributed over the joints of the robotic hand. Finally, the authors present experimental verification on a real robotic manipulation system for some of the controllers derived from the control framework.
Abstract:
Control Engineering is an essential part of university electrical engineering education. Normally, a control course requires considerable mathematical as well as engineering knowledge and is consequently regarded as a difficult course by many undergraduate students. From the academic point of view, how to help the students to improve their learning of the control engineering knowledge is therefore an important task which requires careful planning and innovative teaching methods. Traditionally, the didactic teaching approach has been used to teach the students the concepts needed to solve control problems. This approach is commonly adopted in many mathematics intensive courses; however it generally lacks reflection from the students to improve their learning. This paper addresses the practice of action learning and context-based learning models in teaching university control courses. This context-based approach has been practised in teaching several control engineering courses in a university with promising results, particularly in view of student learning performances.
Abstract:
We consider an inversion-based neurocontroller for solving control problems of uncertain nonlinear systems. Classical approaches do not use uncertainty information in the neural network models. In this paper we show how we can exploit knowledge of this uncertainty to our advantage by developing a novel robust inverse control method. Simulations on a nonlinear uncertain second order system illustrate the approach.
Abstract:
We introduce a novel inversion-based neuro-controller for solving control problems involving uncertain nonlinear systems that could also compensate for multi-valued systems. The approach uses recent developments in neural networks, especially in the context of modelling statistical distributions, which are applied to forward and inverse plant models. Provided that certain conditions are met, an estimate of the intrinsic uncertainty for the outputs of neural networks can be obtained using the statistical properties of networks. More generally, multicomponent distributions can be modelled by the mixture density network. In this work a novel robust inverse control approach is obtained based on importance sampling from these distributions. This importance sampling provides a structured and principled approach to constrain the complexity of the search space for the ideal control law. The performance of the new algorithm is illustrated through simulations with example systems.
Abstract:
This paper presents a general methodology for estimating and incorporating uncertainty in the controller and forward models for noisy nonlinear control problems. Conditional distribution modeling in a neural network context is used to estimate uncertainty around the prediction of neural network outputs. The developed methodology circumvents the dynamic programming problem by using the predicted neural network uncertainty to localize the possible control solutions to consider. A nonlinear multivariable system with different delays between the input-output pairs is used to demonstrate the successful application of the developed control algorithm. The proposed method is suitable for redundant control systems and allows us to model strongly non-Gaussian distributions of control signals as well as processes with hysteresis.
Abstract:
Flow control in computer communication systems is generally a multi-layered structure, consisting of several mechanisms operating independently at different levels. Evaluation of the performance of networks in which different flow control mechanisms act simultaneously is an important area of research, and is examined in depth in this thesis. This thesis presents the modelling of a finite resource computer communication network equipped with three levels of flow control, based on closed queueing network theory. The flow control mechanisms considered are: end-to-end control of virtual circuits, network access control of external messages at the entry nodes, and hop-level control between nodes. The model is solved by a heuristic technique, based on an equivalent reduced network and heuristic extensions to the mean value analysis algorithm. The method has significant computational advantages, and overcomes the limitations of the exact methods. It can be used to solve large network models with finite buffers and many virtual circuits. The model and its heuristic solution are validated by simulation. The interactions between the three levels of flow control are investigated. A queueing model is developed for the admission delay on virtual circuits with end-to-end control, in which messages arrive from independent Poisson sources. The selection of the optimum window limit is considered. Several advanced network access schemes are postulated to improve the network performance as well as that of selected traffic streams, and numerical results are presented. A model for the dynamic control of input traffic is developed. Based on Markov decision theory, an optimal control policy is formulated. Numerical results are given and throughput-delay performance is shown to be better with dynamic control than with static control.
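For readers unfamiliar with mean value analysis, the sketch below shows the textbook exact MVA recursion for a single-class closed queueing network with load-independent, infinite-buffer stations. It is only the baseline that the thesis's heuristic extends; the finite-buffer and multi-circuit extensions described in the abstract are not reproduced here. The function name `mva` and the example service demands are assumptions for illustration.

```python
def mva(demands, n_customers):
    """Exact mean value analysis for a single-class closed network.

    demands     : service demand at each station (visit ratio x service time)
    n_customers : number of circulating customers (e.g. a window size)
    Returns (throughput, mean queue length per station).
    """
    queue = [0.0] * len(demands)       # mean queue lengths with 0 customers
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # An arriving customer sees the queue lengths of the network
        # with one customer fewer (arrival theorem).
        resid = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / sum(resid)    # Little's law on the whole network
        queue = [throughput * r for r in resid]
    return throughput, queue

# Two identical stations, two customers (illustrative values).
x, q = mva([1.0, 1.0], 2)
```

In a window flow-control setting, `n_customers` would play the role of the window limit on a virtual circuit, so sweeping it gives a throughput-versus-window curve under these simplifying assumptions.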
Abstract:
This thesis reviews the existing manufacturing control techniques and identifies their practical drawbacks when applied in a high-variety, low- and medium-volume environment. It argues that the significant drawbacks inherent in such systems could impair their application in such a manufacturing environment. The key weaknesses identified in these systems were: the capacity-insensitive nature of Material Requirements Planning (MRP); the centralised approach to planning and control applied in Manufacturing Resources Planning (MRP II); the fact that Kanban can only be used in repetitive environments; the inability of Optimised Productivity Techniques (OPT) to deal with transient bottlenecks, etc. On the other hand, cellular systems offer advantages in simplifying the control problems of manufacturing, and the thesis reviews systems designed for cellular manufacturing including Distributed Manufacturing Resources Planning (DMRP) and Flexible Manufacturing System (FMS) controllers. It argues that a newly developed cellular manufacturing control methodology, which is fully automatic, capacity sensitive and responsive, has the potential to resolve the core manufacturing control problems discussed above. Its development is envisaged within the framework of a DMRP environment, in which each cell is provided with its own MRP II system and decision-making capability. It is a cellular-based closed-loop control system, which revolves around a single-level Bill-Of-Materials (BOM) structure and hence provides better linkage between shop-level scheduling activities and relevant entries in the MPS. This provides a better prospect of responding rapidly to changes in the status of manufacturing resources and incoming enquiries. Moreover, it also permits automatic evaluation of capacity and due date constraints and hence facilitates the automation of the MPS within such a system. 
A prototype cellular manufacturing control model was developed to demonstrate the underlying principles and operational logic of the cellular manufacturing control methodology, based on the above concept. This was shown to offer significant advantages from the perspective of operational planning and control. Results of relevant tests showed that the model is capable of producing reasonable due dates and of automating the MPS. The overall performance of the model proved satisfactory and acceptable.
Abstract:
The main research theme of this project is the study of neural networks for the control of uncertain and non-linear control systems. This involves the control of continuous-time, discrete-time, hybrid and stochastic systems with input, state or output constraints while ensuring good performance. A large part of this project is devoted to opening the frontiers between several mathematical and engineering approaches in order to tackle complex but very common non-linear control problems. The objectives are: 1. To design and develop procedures for neural network enhanced self-tuning adaptive non-linear control systems; 2. To design, as a general procedure, a neural network generalised minimum variance self-tuning controller for non-linear dynamic plants (integration of neural network mapping with generalised minimum variance self-tuning controller strategies); 3. To develop a software package to evaluate control system performance using Matlab, Simulink and the Neural Network toolbox. An adaptive control algorithm utilising a recurrent network as a model of a partially unknown non-linear plant with unmeasurable state is proposed. It appears that structured recurrent neural networks can provide conveniently parameterised dynamic models for many non-linear systems for use in adaptive control. Properties of static neural networks, which enabled the successful design of stable adaptive control in the state feedback case, are also identified. A survey of the existing results is presented which puts them in a systematic framework showing their relation to classical self-tuning adaptive control and the application of neural control to SISO/MIMO control. Simulation results demonstrate that the self-tuning design methods may be practically applicable to a reasonably large class of unknown linear and non-linear dynamic control systems.
Abstract:
This work introduces a novel inversion-based neurocontroller for solving control problems involving uncertain nonlinear systems which could also compensate for multi-valued systems. The approach uses recent developments in neural networks, especially in the context of modelling statistical distributions, which are applied to forward and inverse plant models. Provided that certain conditions are met, an estimate of the intrinsic uncertainty for the outputs of neural networks can be obtained using the statistical properties of networks. More generally, multicomponent distributions can be modelled by the mixture density network. Based on importance sampling from these distributions a novel robust inverse control approach is obtained. This importance sampling provides a structured and principled approach to constrain the complexity of the search space for the ideal control law. The developed methodology circumvents the dynamic programming problem by using the predicted neural network uncertainty to localise the possible control solutions to consider. Convergence of the output error for the proposed control method is verified by using a Lyapunov function. Several simulation examples are provided to demonstrate the efficiency of the developed control method. The manner in which such a method is extended to nonlinear multi-variable systems with different delays between the input-output pairs is considered and demonstrated through simulation examples.
Abstract:
Online communities of consumption (OCCs) represent highly diverse groups of consumers whose interests are not always aligned. Social control in OCCs aims to effectively manage problems arising from this heterogeneity. Extant literature on social control in OCCs is fragmented as some studies focus on the principles of social control, while others focus on the implementation. Moreover, the domain is undertheorized. This article integrates the disparate literature on social control in OCCs providing a first unified conceptualization of the topic. The authors conceptualize social control as a system, or configuration, of moderation practices. Moderation practices are executed during interactions operating under different governance structures (market, hierarchy, and clan) and serving different purposes (interaction initiation, maintenance, and termination). From this conceptualization, important areas of future research emerge and research questions are developed. The framework also serves as a community management tool for OCC managers, enabling the diagnosis of social control problems and the elaboration of strategies and tactics to address them.