265 results for Locally optimal reforms
in Indian Institute of Science - Bangalore - India
Abstract:
We consider a wireless sensor network whose main function is to detect certain infrequent alarm events, and to forward alarm packets to a base station, using geographical forwarding. The nodes know their locations, and they sleep-wake cycle, waking up periodically but not synchronously. In this situation, when a node has a packet to forward to the sink, there is a trade-off between how long this node waits for a suitable neighbor to wake up and the progress the packet makes towards the sink once it is forwarded to this neighbor. Hence, in choosing a relay node, we consider the problem of minimizing average delay subject to a constraint on the average progress. By constraint relaxation, we formulate this next-hop relay selection problem as a Markov decision process (MDP). The exact optimal solution (BF (Best Forward)) can be found, but is computationally intensive. Next, we consider a mathematically simplified model for which the optimal policy (SF (Simplified Forward)) turns out to be a simple one-step-look-ahead rule. Simulations show that SF is very close in performance to BF, even for reasonably small node density. We then study the end-to-end performance of SF in comparison with two extremal policies, Max Forward (MF) and First Forward (FF), and an end-to-end delay minimising policy proposed by Kim et al. [1]. We find that, with an appropriate choice of the one-hop average progress constraint, SF can be tuned to provide a favorable trade-off between end-to-end packet delay and the number of hops in the forwarding path.
Abstract:
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
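The SPSA gradient estimate mentioned in this abstract can be illustrated with a minimal sketch. This is the generic two-sided SPSA estimator with Rademacher perturbations, not the authors' actor update; the function names, the perturbation size `c`, and the averaging helper are assumptions made for illustration.

```python
import random

def spsa_gradient(f, theta, c=0.1, rng=random):
    """Two-sided SPSA estimate of the gradient of f at theta (illustrative)."""
    # Rademacher (+1/-1) perturbation, one entry per parameter.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    # Only two evaluations of f are needed, whatever the dimension.
    diff = f(plus) - f(minus)
    return [diff / (2.0 * c * d) for d in delta]

def averaged_spsa_gradient(f, theta, samples, c=0.1, seed=0):
    """Average several independent SPSA estimates to reduce their variance."""
    rng = random.Random(seed)
    total = [0.0] * len(theta)
    for _ in range(samples):
        g = spsa_gradient(f, theta, c, rng)
        total = [s + gi for s, gi in zip(total, g)]
    return [s / samples for s in total]
```

A single estimate is noisy in more than one dimension; in a stochastic approximation scheme the averaging is usually done implicitly by the diminishing step sizes rather than by an explicit helper like the one above.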
Abstract:
This paper addresses the problem of automated multiagent search in an unknown environment. Autonomous agents equipped with sensors carry out a search operation in a search space, where the uncertainty, or lack of information about the environment, is known a priori as an uncertainty density distribution function. The agents are deployed in the search space to maximize single-step search effectiveness. The centroidal Voronoi configuration, which achieves a locally optimal deployment, forms the basis for the proposed sequential deploy and search strategy. It is shown that with the proposed control law the agent trajectories converge in a globally asymptotic manner to the centroidal Voronoi configuration. Simulation experiments are provided to validate the strategy. Note to Practitioners: In this paper, searching an unknown region to gather information about it is modeled as a problem of using search as a means of reducing information uncertainty about the region. Moreover, multiple automated searchers or agents are used to carry out this operation optimally. This problem has many applications in search and surveillance operations using several autonomous UAVs or mobile robots. The concept of agents converging to the centroids of their Voronoi cells, weighted with the uncertainty density, is used to design a search strategy named sequential deploy and search. Finally, the performance of the strategy is validated using simulations.
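The idea of agents moving to the density-weighted centroids of their Voronoi cells can be sketched with a discrete Lloyd-type iteration. This is a rough illustration on the unit square, not the control law of the paper: the grid discretisation, the function names, and the `density` callback are all assumptions.

```python
def lloyd_step(agents, samples, density):
    """Move each agent to the density-weighted centroid of its Voronoi cell."""
    sums = [[0.0, 0.0, 0.0] for _ in agents]  # weighted x, weighted y, mass
    for (x, y) in samples:
        # The nearest agent owns the sample point (discrete Voronoi partition).
        owner = min(range(len(agents)),
                    key=lambda i: (agents[i][0] - x) ** 2 + (agents[i][1] - y) ** 2)
        w = density(x, y)
        sums[owner][0] += w * x
        sums[owner][1] += w * y
        sums[owner][2] += w
    # An agent whose cell carries no mass stays where it is.
    return [(sx / m, sy / m) if m > 0 else a
            for a, (sx, sy, m) in zip(agents, sums)]

def deploy(agents, density, grid=41, iterations=50):
    """Iterate Lloyd steps over a grid of sample points on the unit square."""
    samples = [((i + 0.5) / grid, (j + 0.5) / grid)
               for i in range(grid) for j in range(grid)]
    for _ in range(iterations):
        agents = lloyd_step(agents, samples, density)
    return agents
```

With a uniform density and two agents started on the horizontal midline, the iteration settles near a left/right split of the square, which is one locally optimal configuration among several.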
Abstract:
We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point.
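The Lagrange multiplier treatment of the inequality constraints can be written out as follows. This is a generic sketch, not the paper's exact formulation: the symbols $J$ (long-run average cost), $G_k$ (long-run average constraint functions), $\alpha_k$ (constraint thresholds), $\hat{G}_k(n)$ (a running estimate of $G_k$) and $b(n)$ (a slow-timescale step size) are all assumed notation.

```latex
\[
L(\theta,\lambda) \;=\; J(\theta) + \sum_{k=1}^{N} \lambda_k \bigl( G_k(\theta) - \alpha_k \bigr),
\qquad \lambda_k \ge 0,
\]
\[
\lambda_k(n+1) \;=\; \max\bigl\{ 0,\; \lambda_k(n) + b(n)\,\bigl( \hat{G}_k(n) - \alpha_k \bigr) \bigr\}.
\]
```

In schemes of this kind the policy parameter $\theta$ is typically updated by descent on $L$ on a faster timescale, while the multipliers ascend on the slower one, so that a limit point is a local saddle point of $L$.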
Abstract:
This article considers a class of deploy and search strategies for multi-robot systems and evaluates their performance. The application framework used is deployment of a system of autonomous mobile robots equipped with required sensors in a search space to gather information. The lack of information about the search space is modelled as an uncertainty density distribution. The agents are deployed to maximise single-step search effectiveness. The centroidal Voronoi configuration, which achieves a locally optimal deployment, forms the basis for sequential deploy and search (SDS) and combined deploy and search (CDS) strategies. Completeness results are provided for both search strategies. The deployment strategy is analysed in the presence of constraints on robot speed and limit on sensor range for the convergence of trajectories with corresponding control laws responsible for the motion of robots. SDS and CDS strategies are compared with standard greedy and random search strategies on the basis of time taken to achieve reduction in the uncertainty density below a desired level. The simulation experiments reveal several important issues related to the dependence of the relative performances of the search strategies on parameters such as the number of robots, speed of robots and their sensor range limits.
Abstract:
In this paper a generalisation of the Voronoi partition is used for locational optimisation of facilities having different service capabilities and limited range or reach. The facilities can be stationary, such as base stations in a cellular network, hospitals, schools, etc., or mobile units, such as multiple unmanned aerial vehicles, automated guided vehicles, etc., carrying sensors, or mobile units carrying relief personnel and materials. An objective function for optimal deployment of the facilities is formulated, and its critical points are determined. The locally optimal deployment is shown to be a generalised centroidal Voronoi configuration in which the facilities are located at the centroids of the corresponding generalised Voronoi cells. The problem is formulated for more general mobile facilities, and formal results on the stability, convergence and spatial distribution of the proposed control laws responsible for the motion of the agents carrying facilities, under some constraints on the agents' speed and limit on the sensor range, are provided. The theoretical results are supported with illustrative simulation results.
Abstract:
Delaunay and Gabriel graphs are widely studied geometric proximity structures. Motivated by applications in wireless routing, relaxed versions of these graphs known as Locally Delaunay Graphs (LDGs) and Locally Gabriel Graphs (LGGs) have been proposed. We propose another generalization of LGGs called Generalized Locally Gabriel Graphs (GLGGs) in the context when certain edges are forbidden in the graph. Unlike a Gabriel Graph, there is no unique LGG or GLGG for a given point set because no edge is necessarily included or excluded. This property allows us to choose an LGG/GLGG that optimizes a parameter of interest in the graph. We show that computing an edge maximum GLGG for a given problem instance is NP-hard and also APX-hard. We also show that computing an LGG on a given point set with dilation ≤ k is NP-hard. Finally, we give an algorithm to verify whether a given geometric graph G = (V, E) is a valid LGG.
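A direct way to check the LGG property can be sketched as below. The abstract does not state the definition or the authors' verification algorithm, so this assumes the commonly used condition: for every edge (u, v), the open disk with diameter uv contains no neighbor of u or of v. The function names are hypothetical, and this brute-force check makes no claim about matching the running time of the paper's algorithm.

```python
def in_open_diametral_disk(p, u, v):
    """True if p lies strictly inside the disk whose diameter is segment uv."""
    # p is strictly inside exactly when the angle u-p-v is obtuse (Thales).
    return (u[0] - p[0]) * (v[0] - p[0]) + (u[1] - p[1]) * (v[1] - p[1]) < 0

def is_locally_gabriel(points, edges):
    """Check the assumed LGG condition for a graph given as index pairs."""
    neighbors = {i: set() for i in range(len(points))}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    for a, b in edges:
        # Only neighbors of the two endpoints can invalidate the edge.
        for w in (neighbors[a] | neighbors[b]) - {a, b}:
            if in_open_diametral_disk(points[w], points[a], points[b]):
                return False
    return True
```

Because only neighbors are tested, a point lying inside the disk of an edge does not invalidate that edge unless it is adjacent to one of the endpoints, which is what distinguishes the local condition from the ordinary Gabriel graph.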
Abstract:
Wireless ad hoc networks transmit information from a source to a destination via multiple hops in order to save energy and, thus, increase the lifetime of battery-operated nodes. The energy savings can be especially significant in cooperative transmission schemes, where several nodes cooperate during one hop to forward the information to the next node along a route to the destination. Finding the best multi-hop transmission policy in such a network, which determines the nodes that are involved in each hop, is a very important problem, but also a very difficult one, especially when the physical wireless channel behavior is to be accounted for and exploited. We model the above optimization problem for randomly fading channels as a decentralized control problem: the channel observations available at each node define the information structure, while the control policy is defined by the power and phase of the signal transmitted by each node. In particular, we consider the problem of computing an energy-optimal cooperative transmission scheme in a wireless network for two different channel fading models: (i) slow fading channels, where the channel gains of the links remain the same for a large number of transmissions, and (ii) fast fading channels, where the channel gains of the links change quickly from one transmission to another. For slow fading, we consider a factored class of policies (corresponding to local cooperation between nodes), and show that the computation of an optimal policy in this class is equivalent to a shortest path computation on an induced graph, whose edge costs can be computed in a decentralized manner using only locally available channel state information (CSI). For fast fading, both CSI acquisition and data transmission consume energy. Hence, we need to jointly optimize over both of these; we cast this as a large stochastic optimization problem. We then jointly optimize over a set of CSI functions of the local channel states, and a corresponding factored class of control policies.
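The shortest path computation that the slow-fading case reduces to can be sketched with Dijkstra's algorithm. This is a generic, centralized illustration: the adjacency-list layout and the edge costs are assumptions, and the construction of the induced graph from channel state information is not shown.

```python
import heapq

def shortest_path(graph, source, target):
    """graph: dict mapping node -> list of (neighbor, nonnegative edge cost)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        if node == target:
            break
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to recover the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

In the setting of the abstract each edge cost would be the energy of one cooperative hop; here the costs are simply given numbers.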
Abstract:
We examine the effect of subdividing the potential barrier along the reaction coordinate on Kramers' escape rate for a model potential. Using the known supersymmetric potential approach, we show the existence of an optimal number of subdivisions that maximizes the rate.
Abstract:
The problem of optimal scheduling of the generation of a hydro-thermal power system that is faced with a shortage of energy is studied. The deterministic version of the problem is first analyzed, and the results are then extended to cases where the loads and the hydro inflows are random variables.
Abstract:
This paper deals with the optimal load flow problem in a fixed-head hydrothermal electric power system. Equality constraints on the volume of water available for active power generation at the hydro plants as well as inequality constraints on the reactive power generation at the voltage controlled buses are imposed. Conditions for optimal load flow are derived and a successive approximation algorithm for solving the optimal generation schedule is developed. Computer implementation of the algorithm is discussed, and the results obtained from the computer solution of test systems are presented.
Abstract:
Systems of learning automata have been studied by various researchers to evolve useful strategies for decision making under uncertainty. Considered in this paper is a class of hierarchical systems of learning automata where the system gets responses from its environment at each level of the hierarchy. A classification of such sequential learning tasks based on the complexity of the learning problem is presented. It is shown that none of the existing algorithms can perform in the most general type of hierarchical problem. An algorithm for learning the globally optimal path in this general setting is presented, and its convergence is established. This algorithm needs information transfer from the lower levels to the higher levels. Using the methodology of estimator algorithms, this model can be generalized to accommodate other kinds of hierarchical learning tasks.
Abstract:
We consider the problem of estimating the optimal parameter trajectory over a finite time interval in a parameterized stochastic differential equation (SDE), and propose a simulation-based algorithm for this purpose. Towards this end, we consider a discretization of the SDE over finite time instants and reformulate the problem as one of finding an optimal parameter at each of these instants. A stochastic approximation algorithm based on the smoothed functional technique is adapted to this setting for finding the optimal parameter trajectory. A proof of convergence of the algorithm is presented and results of numerical experiments over two different settings are shown. The algorithm is seen to exhibit good performance. We also present extensions of our framework to the case of finding optimal parameterized feedback policies for controlled SDE and present numerical results in this scenario as well.
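The discretised setting, with one parameter value per time instant, can be illustrated with a short simulation sketch. The abstract does not say which discretisation is used, so the Euler-Maruyama scheme below is an assumption, as are the function names and the scalar state; the smoothed functional optimisation itself is not shown.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, thetas, dt, rng):
    """Simulate dX = drift(X, theta) dt + diffusion(X, theta) dW on a uniform
    time grid, using one parameter value per time step (illustrative)."""
    path = [x0]
    x = x0
    for theta in thetas:
        # Brownian increment over a step of length dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x, theta) * dt + diffusion(x, theta) * dw
        path.append(x)
    return path
```

A simulation-based optimiser would call a routine like this repeatedly with perturbed parameter sequences `thetas` and compare the resulting sample costs.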
Abstract:
In this paper, we consider the bi-criteria single machine scheduling problem of n jobs with a learning effect. The two objectives considered are the total completion time (TC) and the total absolute differences in completion times (TADC). The aim is to find a sequence that performs well with respect to both objectives. In an earlier study, a method of solving the bi-criteria transportation problem was presented. In this paper, we apply the methodology for solving the bi-criteria transportation problem to our bi-criteria single machine scheduling problem with a learning effect, and obtain the set of optimal sequences. Numerical examples are presented to illustrate the applicability of the method and its ease of understanding.
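The two objectives can be evaluated for a given job sequence with a few lines of code. The abstract does not specify the learning-effect model, so this sketch assumes the common position-based form in which the job in position r takes p * r**a with a <= 0; the function names are hypothetical.

```python
def completion_times(sequence, a):
    """Completion times when the job in position r takes p * r**a (a <= 0)."""
    times, t = [], 0.0
    for r, p in enumerate(sequence, start=1):
        t += p * r ** a
        times.append(t)
    return times

def tc_and_tadc(sequence, a):
    """Return (TC, TADC) for processing times listed in sequence order."""
    c = completion_times(sequence, a)
    tc = sum(c)
    # TADC sums |C_i - C_j| over all unordered pairs of jobs.
    tadc = sum(abs(c[i] - c[j])
               for i in range(len(c)) for j in range(i + 1, len(c)))
    return tc, tadc
```

For example, with processing times 1, 2, 3 and no learning (a = 0), the completion times are 1, 3, 6, so TC = 10 and TADC = 2 + 5 + 3 = 10.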
Abstract:
In a letter, Rau proposed a new method for designing state-feedback controllers using eigenvalue sensitivity matrices. However, there appears to be a conceptual mistake in the procedure, or else it is unduly restricted in its applicability. In particular the equation $-BR^{-1}B^{T}K = \Delta A$, in which $K$ is a positive-definite symmetric matrix.