989 resultados para Locally optimal reforms
Resumo:
We consider a wireless sensor network whose main function is to detect certain infrequent alarm events, and to forward alarm packets to a base station, using geographical forwarding. The nodes know their locations, and they sleep-wake cycle, waking up periodically but not synchronously. In this situation, when a node has a packet to forward to the sink, there is a trade-off between how long this node waits for a suitable neighbor to wake up and the progress the packet makes towards the sink once it is forwarded to this neighbor. Hence, in choosing a relay node, we consider the problem of minimizing average delay subject to a constraint on the average progress. By constraint relaxation, we formulate this next hop relay selection problem as a Markov decision process (MDP). The exact optimal solution (BF (Best Forward)) can be found, but is computationally intensive. Next, we consider a mathematically simplified model for which the optimal policy (SF (Simplified Forward)) turns out to be a simple one-step-look-ahead rule. Simulations show that SF is very close in performance to BF, even for reasonably small node density. We then study the end-to-end performance of SF in comparison with two extremal policies: Max Forward (MF) and First Forward (FF), and an end-to-end delay minimising policy proposed by Kim et al. 1]. We find that, with appropriate choice of one hop average progress constraint, SF can be tuned to provide a favorable trade-off between end-to-end packet delay and the number of hops in the forwarding path.
Resumo:
Well known tariff reform rules that are guaranteed to increase welfare will not necessarily increase market access, while rules that are guaranteed to increase market access will not necessarily increase welfare. The present paper proposes a new set of tariff reforms that can achieve both objectives at the same time.
Resumo:
The policy reform literature is primarily concerned with the construction of reforms that yield welfare gains. By contrast, this paper’s contribution is to develop a theoretical concept for which the focus is upon the sizes of welfare gains accruing from policy reforms rather than upon their signs. In undertaking this task, and by focusing on tariff reforms, we introduce the concept of a steepest ascent policy reform, which is a locally optimal reform in the sense that it achieves the highest marginal gain in utility of any feasible local reform. We argue that this reform presents itself as a natural benchmark for the evaluation of the welfare effectiveness of other popular tariff reforms such as the proportional tariff reduction and the concertina rules, since it provides the maximal welfare gain of all possible local reforms. We derive properties of the steepest ascent tariff reform, construct an index to measure the relative welfare effectiveness of any given tariff reform, determine conditions under which proportional and concertina reforms are locally optimal and provide illustrative examples.
Resumo:
Consideramos o problema de controlo óptimo de tempo mínimo para sistemas de controlo mono-entrada e controlo afim num espaço de dimensão finita com condições inicial e final fixas, onde o controlo escalar toma valores num intervalo fechado. Quando aplicamos o método de tiro a este problema, vários obstáculos podem surgir uma vez que a função de tiro não é diferenciável quando o controlo é bang-bang. No caso bang-bang os tempos conjugados são teoricamente bem definidos para este tipo de sistemas de controlo, contudo os algoritmos computacionais directos disponíveis são de difícil aplicação. Por outro lado, no caso suave o conceito teórico e prático de tempos conjugados é bem conhecido, e ferramentas computacionais eficazes estão disponíveis. Propomos um procedimento de regularização para o qual as soluções do problema de tempo mínimo correspondente dependem de um parâmetro real positivo suficientemente pequeno e são definidas por funções suaves em relação à variável tempo, facilitando a aplicação do método de tiro simples. Provamos, sob hipóteses convenientes, a convergência forte das soluções do problema regularizado para a solução do problema inicial, quando o parâmetro real tende para zero. A determinação de tempos conjugados das trajectórias localmente óptimas do problema regularizado enquadra-se na teoria suave conhecida. Provamos, sob hipóteses adequadas, a convergência do primeiro tempo conjugado do problema regularizado para o primeiro tempo conjugado do problema inicial bang-bang, quando o parâmetro real tende para zero. Consequentemente, obtemos um algoritmo eficiente para a computação de tempos conjugados no caso bang-bang.
Resumo:
We present a method for determining the globally optimal on-line learning rule for a soft committee machine under a statistical mechanics framework. This rule maximizes the total reduction in generalization error over the whole learning process. A simple example demonstrates that the locally optimal rule, which maximizes the rate of decrease in generalization error, may perform poorly in comparison.
Resumo:
We present a method for determining the globally optimal on-line learning rule for a soft committee machine under a statistical mechanics framework. This work complements previous results on locally optimal rules, where only the rate of change in generalization error was considered. We maximize the total reduction in generalization error over the whole learning process and show how the resulting rule can significantly outperform the locally optimal rule.
Resumo:
This paper presents a mapping and navigation system for a mobile robot, which uses vision as its sole sensor modality. The system enables the robot to navigate autonomously, plan paths and avoid obstacles using a vision based topometric map of its environment. The map consists of a globally-consistent pose-graph with a local 3D point cloud attached to each of its nodes. These point clouds are used for direction independent loop closure and to dynamically generate 2D metric maps for locally optimal path planning. Using this locally semi-continuous metric space, the robot performs shortest path planning instead of following the nodes of the graph --- as is done with most other vision-only navigation approaches. The system exploits the local accuracy of visual odometry in creating local metric maps, and uses pose graph SLAM, visual appearance-based place recognition and point clouds registration to create the topometric map. The ability of the framework to sustain vision-only navigation is validated experimentally, and the system is provided as open-source software.
Resumo:
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
This paper addresses the problem of automated multiagent search in an unknown environment. Autonomous agents equipped with sensors carry out a search operation in a search space, where the uncertainty, or lack of information about the environment, is known a priori as an uncertainty density distribution function. The agents are deployed in the search space to maximize single step search effectiveness. The centroidal Voronoi configuration, which achieves a locally optimal deployment, forms the basis for the proposed sequential deploy and search strategy. It is shown that with the proposed control law the agent trajectories converge in a globally asymptotic manner to the centroidal Voronoi configuration. Simulation experiments are provided to validate the strategy. Note to Practitioners-In this paper, searching an unknown region to gather information about it is modeled as a problem of using search as a means of reducing information uncertainty about the region. Moreover, multiple automated searchers or agents are used to carry out this operation optimally. This problem has many applications in search and surveillance operations using several autonomous UAVs or mobile robots. The concept of agents converging to the centroid of their Voronoi cells, weighted with the uncertainty density, is used to design a search strategy named as sequential deploy and search. Finally, the performance of the strategy is validated using simulations.
Resumo:
We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point.
Resumo:
This article considers a class of deploy and search strategies for multi-robot systems and evaluates their performance. The application framework used is deployment of a system of autonomous mobile robots equipped with required sensors in a search space to gather information. The lack of information about the search space is modelled as an uncertainty density distribution. The agents are deployed to maximise single-step search effectiveness. The centroidal Voronoi configuration, which achieves a locally optimal deployment, forms the basis for sequential deploy and search (SDS) and combined deploy and search (CDS) strategies. Completeness results are provided for both search strategies. The deployment strategy is analysed in the presence of constraints on robot speed and limit on sensor range for the convergence of trajectories with corresponding control laws responsible for the motion of robots. SDS and CDS strategies are compared with standard greedy and random search strategies on the basis of time taken to achieve reduction in the uncertainty density below a desired level. The simulation experiments reveal several important issues related to the dependence of the relative performances of the search strategies on parameters such as the number of robots, speed of robots and their sensor range limits.
Resumo:
In this paper a generalisation of the Voronoi partition is used for locational optimisation of facilities having different service capabilities and limited range or reach. The facilities can be stationary, such as base stations in a cellular network, hospitals, schools, etc., or mobile units, such as multiple unmanned aerial vehicles, automated guided vehicles, etc., carrying sensors, or mobile units carrying relief personnel and materials. An objective function for optimal deployment of the facilities is formulated, and its critical points are determined. The locally optimal deployment is shown to be a generalised centroidal Voronoi configuration in which the facilities are located at the centroids of the corresponding generalised Voronoi cells. The problem is formulated for more general mobile facilities, and formal results on the stability, convergence and spatial distribution of the proposed control laws responsible for the motion of the agents carrying facilities, under some constraints on the agents' speed and limit on the sensor range, are provided. The theoretical results are supported with illustrative simulation results.
Resumo:
The current power grid is on the cusp of modernization due to the emergence of distributed generation and controllable loads, as well as renewable energy. On one hand, distributed and renewable generation is volatile and difficult to dispatch. On the other hand, controllable loads provide significant potential for compensating for the uncertainties. In a future grid where there are thousands or millions of controllable loads and a large portion of the generation comes from volatile sources like wind and solar, distributed control that shifts or reduces the power consumption of electric loads in a reliable and economic way would be highly valuable.
Load control needs to be conducted with network awareness. Otherwise, voltage violations and overloading of circuit devices are likely. To model these effects, network power flows and voltages have to be considered explicitly. However, the physical laws that determine power flows and voltages are nonlinear. Furthermore, while distributed generation and controllable loads are mostly located in distribution networks that are multiphase and radial, most of the power flow studies focus on single-phase networks.
This thesis focuses on distributed load control in multiphase radial distribution networks. In particular, we first study distributed load control without considering network constraints, and then consider network-aware distributed load control.
Distributed implementation of load control is the main challenge if network constraints can be ignored. In this case, we first ignore the uncertainties in renewable generation and load arrivals, and propose a distributed load control algorithm, Algorithm 1, that optimally schedules the deferrable loads to shape the net electricity demand. Deferrable loads refer to loads whose total energy consumption is fixed, but energy usage can be shifted over time in response to network conditions. Algorithm 1 is a distributed gradient decent algorithm, and empirically converges to optimal deferrable load schedules within 15 iterations.
We then extend Algorithm 1 to a real-time setup where deferrable loads arrive over time, and only imprecise predictions about future renewable generation and load are available at the time of decision making. The real-time algorithm Algorithm 2 is based on model-predictive control: Algorithm 2 uses updated predictions on renewable generation as the true values, and computes a pseudo load to simulate future deferrable load. The pseudo load consumes 0 power at the current time step, and its total energy consumption equals the expectation of future deferrable load total energy request.
Network constraints, e.g., transformer loading constraints and voltage regulation constraints, bring significant challenge to the load control problem since power flows and voltages are governed by nonlinear physical laws. Remarkably, distribution networks are usually multiphase and radial. Two approaches are explored to overcome this challenge: one based on convex relaxation and the other that seeks a locally optimal load schedule.
To explore the convex relaxation approach, a novel but equivalent power flow model, the branch flow model, is developed, and a semidefinite programming relaxation, called BFM-SDP, is obtained using the branch flow model. BFM-SDP is mathematically equivalent to a standard convex relaxation proposed in the literature, but numerically is much more stable. Empirical studies show that BFM-SDP is numerically exact for the IEEE 13-, 34-, 37-, 123-bus networks and a real-world 2065-bus network, while the standard convex relaxation is numerically exact for only two of these networks.
Theoretical guarantees on the exactness of convex relaxations are provided for two types of networks: single-phase radial alternative-current (AC) networks, and single-phase mesh direct-current (DC) networks. In particular, for single-phase radial AC networks, we prove that a second-order cone program (SOCP) relaxation is exact if voltage upper bounds are not binding; we also modify the optimal load control problem so that its SOCP relaxation is always exact. For single-phase mesh DC networks, we prove that an SOCP relaxation is exact if 1) voltage upper bounds are not binding, or 2) voltage upper bounds are uniform and power injection lower bounds are strictly negative; we also modify the optimal load control problem so that its SOCP relaxation is always exact.
To seek a locally optimal load schedule, a distributed gradient-decent algorithm, Algorithm 9, is proposed. The suboptimality gap of the algorithm is rigorously characterized and close to 0 for practical networks. Furthermore, unlike the convex relaxation approach, Algorithm 9 ensures a feasible solution. The gradients used in Algorithm 9 are estimated based on a linear approximation of the power flow, which is derived with the following assumptions: 1) line losses are negligible; and 2) voltages are reasonably balanced. Both assumptions are satisfied in practical distribution networks. Empirical results show that Algorithm 9 obtains 70+ times speed up over the convex relaxation approach, at the cost of a suboptimality within numerical precision.
Resumo:
Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non sta tionarity by carrying a round-robin scheduling of the action selection and execution. That this learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.