844 results for MARKOV DECISION-PROCESSES


Relevance:

80.00%

Publisher:

Abstract:

We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
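
As a point of reference, the classical randomized pairwise gossip scheme (not the reinforcement-learning alternative proposed above) can be sketched as follows. This is a minimal sketch assuming that one edge of a connected graph is activated per step and both endpoints replace their values with the pairwise mean; the function name `pairwise_gossip`, the ring topology, and the initial values are hypothetical. Because each symmetric update preserves the sum of the values, this version converges to the true average on a connected graph; the convergence difficulty described in the abstract concerns the asynchronous stochastic-approximation variant, which this sketch does not model.

```python
import random

def pairwise_gossip(values, edges, steps, seed=0):
    """Randomized pairwise gossip: at each step one edge (i, j) is chosen
    uniformly at random and both endpoints move to their mean."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)
        mean = (x[i] + x[j]) / 2.0
        x[i] = x[j] = mean  # the sum of x is unchanged by this update
    return x

if __name__ == "__main__":
    # Hypothetical ring of 5 nodes with arbitrary initial measurements (average 4.0).
    initial = [1.0, 4.0, 2.0, 8.0, 5.0]
    ring = [(k, (k + 1) % 5) for k in range(5)]
    final = pairwise_gossip(initial, ring, steps=2000)
    print(final)  # all entries close to 4.0
```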

Abstract:

In this paper, we consider an intrusion detection application for Wireless Sensor Networks. We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially observable Markov decision process (POMDP) with continuous state-action spaces, in a manner similar to Fuemmeler and Veeravalli (IEEE Trans Signal Process 56(5), 2091-2101, 2008). However, unlike their formulation, we consider infinite-horizon discounted and average cost objectives as performance criteria. For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales while employing function approximation. Feature-based representations and function approximation are necessary to handle the curse of dimensionality associated with the underlying POMDP. Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation estimate on the faster timescale, while the Q-value parameter (arising from a linear function approximation architecture for the Q-values) is updated on the slower timescale in a manner similar to an on-policy temporal difference algorithm. The feature selection scheme employed in each of our algorithms manages the energy and tracking components in a manner that assists the search for the optimal sleep-scheduling policy. For the sake of comparison, in both the discounted and average settings, we also develop a function approximation analogue of the Q-learning algorithm. This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees. Finally, we also adapt our algorithms to include a stochastic iterative estimation scheme for the intruder's mobility model, which is useful in settings where the latter is not known.
Our simulation results on a synthetic 2-dimensional network setting suggest that our algorithms result in better tracking accuracy at the cost of only a few additional sensors, in comparison to a recent prior work.
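
The one-simulation simultaneous perturbation stochastic approximation (SPSA) estimate perturbs all parameters at once with a random ±1 vector Δ and estimates each gradient component from a single cost evaluation as f(θ + δΔ)/(δΔ_i). The sketch below illustrates only that estimator, assuming Rademacher perturbations and a hypothetical quadratic cost `quadratic_cost`; the helper name `one_sim_spsa` is also hypothetical. It averages many estimates purely to show that the estimator is unbiased here, whereas the algorithm described above would use one estimate per iteration inside a two-timescale recursion that is not reproduced.

```python
import random

def one_sim_spsa(f, theta, delta, rng):
    """One-simulation SPSA gradient estimate:
    g_i = f(theta + delta * Delta) / (delta * Delta_i), with Delta_i in {-1, +1}."""
    perturb = [rng.choice((-1.0, 1.0)) for _ in theta]
    value = f([t + delta * d for t, d in zip(theta, perturb)])
    return [value / (delta * d) for d in perturb]

def quadratic_cost(theta):
    # Hypothetical cost whose gradient is (2 * theta_0, 6 * theta_1).
    return theta[0] ** 2 + 3.0 * theta[1] ** 2

if __name__ == "__main__":
    rng = random.Random(1)
    theta = [1.0, -0.5]  # true gradient here is (2.0, -3.0)
    n = 200_000
    sums = [0.0, 0.0]
    for _ in range(n):
        g = one_sim_spsa(quadratic_cost, theta, delta=0.1, rng=rng)
        sums = [s + gi for s, gi in zip(sums, g)]
    print([s / n for s in sums])  # approximately [2.0, -3.0]
```

A single estimate is very noisy, because the term f(θ)/(δΔ_i) only vanishes in expectation (E[1/Δ_i] = 0 for ±1 perturbations); this is one reason such estimates are typically paired with small, decreasing step sizes.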

Abstract:

The aim in this paper is to allocate the "sleep time" of the individual sensors in an intrusion detection application so that the energy consumption of the sensors is reduced, while keeping the tracking error to a minimum. We propose two novel reinforcement learning (RL) based algorithms that attempt to minimize a certain long-run average cost objective. Both our algorithms incorporate feature-based representations to handle the curse of dimensionality associated with the underlying partially observable Markov decision process (POMDP). Further, the feature selection scheme used in our algorithms intelligently manages the energy cost and tracking cost factors, which in turn assists the search for the optimal sleeping policy. We also extend these algorithms to a setting where the intruder's mobility model is not known, by incorporating a stochastic iterative scheme for estimating the mobility model. The simulation results on a synthetic 2-D network setting are encouraging.
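
A stochastic iterative estimate of the intruder's mobility model could, for example, take the form of a stochastic-approximation update of an estimated transition matrix from observed moves. The sketch below assumes that the intruder moves among a finite set of cells according to an unknown Markov chain whose transitions are observed directly, and uses a step size of 1/n per row; both assumptions are ours, and the scheme in the paper (which operates under partial observability) may differ. The function `update_mobility_estimate` and the example chain are hypothetical.

```python
import random

def update_mobility_estimate(p_hat, counts, i, j):
    """Stochastic-approximation step for row i after observing a move i -> j:
    p_hat[i] <- p_hat[i] + a_n * (e_j - p_hat[i]) with step size a_n = 1/n,
    where n is the number of moves observed so far out of cell i."""
    counts[i] += 1
    step = 1.0 / counts[i]
    row = p_hat[i]
    for k in range(len(row)):
        target = 1.0 if k == j else 0.0
        row[k] += step * (target - row[k])  # row stays a probability vector

if __name__ == "__main__":
    # Hypothetical true mobility model on 3 cells (unknown to the estimator).
    true_p = [[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]]
    n_cells = len(true_p)
    rng = random.Random(0)
    p_hat = [[1.0 / n_cells] * n_cells for _ in range(n_cells)]  # uniform initial guess
    counts = [0] * n_cells
    state = 0
    for _ in range(50_000):
        nxt = rng.choices(range(n_cells), weights=true_p[state])[0]
        update_mobility_estimate(p_hat, counts, state, nxt)
        state = nxt
    for row in p_hat:
        print([round(v, 2) for v in row])
```

With the 1/n step size the recursion reproduces the empirical transition frequencies exactly; a constant step size would instead weight recent moves more heavily, which may suit a slowly changing mobility model.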

Abstract:

In geographical forwarding of packets in a large wireless sensor network (WSN) with sleep-wake cycling nodes, we are interested in the local decision problem faced by a node that has "custody" of a packet and has to choose one among a set of next-hop relay nodes to forward the packet toward the sink. Each relay is associated with a "reward" that summarizes the benefit of forwarding the packet through that relay. We seek a solution to this local problem, the idea being that such a solution, if adopted by every node, could provide a reasonable heuristic for the end-to-end forwarding problem. Toward this end, we propose a local relay selection problem consisting of a forwarding node and a collection of relay nodes, with the relays waking up sequentially at random times. At each relay wake-up instant, the forwarder can choose to probe a relay to learn its reward value, based on which the forwarder can then decide whether to stop (and forward its packet to the chosen relay) or to continue to wait for further relays to wake up. The forwarder's objective is to select a relay so as to minimize a combination of waiting delay, reward, and probing cost. The local decision problem can be considered as a variant of the asset selling problem studied in the operations research literature. We formulate the local problem as a Markov decision process (MDP) and characterize the solution in terms of stopping sets and probing sets. We provide results illustrating the structure of the stopping sets, namely, the (lower bound) threshold and the stage independence properties. Regarding the probing sets, we make an interesting conjecture that these sets are characterized by upper bounds. Through simulation experiments, we provide valuable insights into the performance of the optimal local forwarding and its use as an end-to-end forwarding heuristic.
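
The lower-bound threshold form of a stopping set can be illustrated on a much simpler relative of this problem: a finite-horizon, full-information asset selling problem with no probing decision. The sketch assumes N relays that wake up one per stage, i.i.d. rewards from a known discrete distribution that are observed for free, and a fixed cost c per additional stage of waiting, with the forwarder maximizing reward minus waiting cost. These modeling choices and the name `stopping_thresholds` are hypothetical and are not the paper's model (which also has random wake-up times and probing costs). Backward induction gives the value of continuing at each stage, and stopping is optimal exactly when the observed reward is at least that value.

```python
def stopping_thresholds(rewards, probs, n_relays, wait_cost):
    """Backward induction for a finite-horizon asset-selling style problem.
    Returns thresholds t_k for stages k = 0..n_relays-2: at stage k, forward to
    the current relay iff its reward >= t_k. At the last stage the forwarder
    must forward, so no threshold is needed there."""
    expected = sum(r * p for r, p in zip(rewards, probs))
    v_next = expected  # value before observing the last relay's reward
    thresholds = []
    for _ in range(n_relays - 1):
        cont = v_next - wait_cost  # value of waiting for the next relay
        thresholds.append(cont)
        # value before observing this stage's reward: E[max(reward, cont)]
        v_next = sum(max(r, cont) * p for r, p in zip(rewards, probs))
    thresholds.reverse()  # thresholds[k] now refers to stage k
    return thresholds

if __name__ == "__main__":
    # Hypothetical example: rewards 0..3 equally likely, 4 relays, wait cost 0.1.
    th = stopping_thresholds([0.0, 1.0, 2.0, 3.0], [0.25] * 4, 4, 0.1)
    print([round(t, 3) for t in th])  # [2.075, 1.85, 1.4]
```

In this toy model the thresholds are nonincreasing in the stage index, so the forwarder becomes less selective as fewer relays remain; the stage independence reported in the abstract relies on features of the paper's model that the sketch leaves out.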

Abstract:

Optimal control of traffic lights at junctions, or traffic signal control (TSC), is essential for reducing the average delay experienced by road users amid the rapid increase in vehicle usage. In this paper, we formulate the TSC problem as a discounted cost Markov decision process (MDP) and apply multi-agent reinforcement learning (MARL) algorithms to obtain dynamic TSC policies. We model each traffic signal junction as an independent agent. An agent decides the signal duration of its phases in a round-robin (RR) manner using multi-agent Q-learning with either ε-greedy or UCB [3] based exploration strategies. It updates its Q-factors based on the cost feedback signal received from its neighbouring agents. This feedback signal can be easily constructed and is shown to be effective in minimizing the average delay of the vehicles in the network. We show through simulations in VISSIM that our algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm [15] over two real road networks.
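
The two exploration strategies named above can be sketched for a single agent whose Q-factors are costs (lower is better). The sketch assumes a tabular Q-table whose actions are candidate phase durations, an exploration probability ε, and a UCB bonus of the common form c·sqrt(ln t / n(a)) subtracted from the cost estimate; the exact UCB variant used in reference [3], the constant c, and the helper names are hypothetical. The update is a standard discounted-cost Q-learning step that treats the neighbour-based feedback simply as a scalar one-step cost.

```python
import math
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Explore uniformly with probability epsilon, else pick the minimum-cost action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return min(range(len(q_row)), key=lambda a: q_row[a])

def ucb_min_cost(q_row, counts, t, c=1.0):
    """Optimistic choice for cost minimization: try untried actions first,
    then minimize the cost estimate minus a confidence bonus."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return min(range(len(q_row)),
               key=lambda a: q_row[a] - c * math.sqrt(math.log(t) / counts[a]))

def q_update(q, state, action, cost, next_state, alpha, gamma):
    """Discounted-cost Q-learning step: the target uses the min over next actions."""
    target = cost + gamma * min(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

For example, with cost estimates (5, 3, 4), visit counts (10, 10, 1) and t = 21, `ucb_min_cost` picks the rarely tried third action (index 2), whereas greedy selection (ε = 0) picks the second (index 1).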

Abstract:

This paper considers the problem of receive antenna selection (AS) in a multiple-antenna communication system having a single radio-frequency (RF) chain. The AS decisions are based on noisy channel estimates obtained using known pilot symbols embedded in the data packets. The goal here is to minimize the average packet error rate (PER) by exploiting the known temporal correlation of the channel. As the underlying channels are only partially observed using the pilot symbols, the problem of AS for PER minimization is cast into a partially observable Markov decision process (POMDP) framework. Under mild assumptions, the optimality of a myopic policy is established for the two-state channel case. Moreover, two heuristic AS schemes are proposed based on a weighted combination of the estimated channel states on the different antennas. These schemes utilize the continuous valued received pilot symbols to make the AS decisions, and are shown to offer performance comparable to the POMDP approach, which requires one to quantize the channel and observations to a finite set of states. The performance improvement offered by the POMDP solution and the proposed heuristic solutions relative to existing AS training-based approaches is illustrated using Monte Carlo simulations.
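
For a two-state (good/bad) channel, the belief that an antenna's channel is in the good state summarizes the partially observed history, and a myopic policy selects the antenna with the best one-step outlook. The sketch assumes each antenna's channel is an independent two-state Markov chain with known transition probabilities, that only the selected antenna is observed, that a lower PER corresponds to the good state, and, for simplicity, that the observation is a binary good/bad estimate with a known error probability, whereas the paper uses noisy pilot-based channel estimates. All function names and parameters are hypothetical.

```python
def predict(belief_good, p_gg, p_bg):
    """One-step prediction: P(good next) = b * P(g->g) + (1 - b) * P(b->g)."""
    return belief_good * p_gg + (1.0 - belief_good) * p_bg

def correct(belief_good, observed_good, err):
    """Bayes update with a binary channel estimate that is wrong w.p. err."""
    like_good = (1.0 - err) if observed_good else err
    like_bad = err if observed_good else (1.0 - err)
    num = like_good * belief_good
    return num / (num + like_bad * (1.0 - belief_good))

def myopic_select(beliefs):
    """Myopic rule: pick the antenna most likely to be in the good state."""
    return max(range(len(beliefs)), key=lambda a: beliefs[a])

def step(beliefs, chosen, observed_good, p_gg, p_bg, err):
    """Correct the selected antenna with its observation, then predict all antennas."""
    updated = list(beliefs)
    updated[chosen] = correct(updated[chosen], observed_good, err)
    return [predict(b, p_gg, p_bg) for b in updated]

if __name__ == "__main__":
    # Hypothetical parameters: two antennas, P(g->g)=0.9, P(b->g)=0.2, 10% estimate error.
    beliefs = [0.5, 0.5]
    chosen = myopic_select(beliefs)  # ties broken toward the lower index
    beliefs = step(beliefs, chosen, True, 0.9, 0.2, 0.1)
    print(chosen, beliefs)
```

Here the selected antenna's belief rises to 0.9 after a "good" estimate and is then predicted forward to 0.83, while the unobserved antenna drifts toward the stationary value and reaches 0.55.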

Abstract:

A person walks along a line (which could be an idealisation of a forest trail, for example), placing relays as he walks, in order to create a multihop network connecting a sensor at a point along the line to a sink at the start of the line. The potential placement points are equally spaced along the line, and at each such location the decision to place or not to place a relay is based on link quality measurements to the previously placed relays. The location of the sensor is unknown a priori, and is discovered as the deployment agent walks. In this paper, we extend our earlier work on this class of problems to include the objective of achieving a 2-connected multihop network. We propose a network cost objective that is additive over the deployed relays, and accounts for possible alternate routing over the multiple available paths. As in our earlier work, the problem is formulated as a Markov decision process. Placement algorithms are obtained for two source location models, which yield a discounted cost MDP and an average cost MDP. In each case we obtain structural results for an optimal policy, and perform a numerical study that provides insights into the advantages and disadvantages of multi-connectivity. We validate the results of the numerical study experimentally in a forest-like environment.

Abstract:

Modern robots are increasingly expected to function in uncertain and dynamically challenging environments, often in proximity with humans. In addition, wide-scale adoption of robots requires on-the-fly adaptability of software for diverse applications. These requirements strongly suggest the need to adopt formal representations of high-level goals and safety specifications, especially as temporal logic formulas. This approach allows the use of formal verification techniques for controller synthesis that can give guarantees for safety and performance. Robots operating in unstructured environments also face limited sensing capability, so correctly inferring a robot's progress toward a high-level goal can be challenging.

This thesis develops new algorithms for synthesizing discrete controllers in partially known environments under specifications represented as linear temporal logic (LTL) formulas. It is inspired by recent developments in finite abstraction techniques for hybrid systems and motion planning problems. The robot and its environment are assumed to have a finite abstraction as a Partially Observable Markov Decision Process (POMDP), which is a powerful model class capable of representing a wide variety of problems. However, synthesizing controllers that satisfy LTL goals over POMDPs is a challenging problem that has received only limited attention.

This thesis proposes tractable, approximate algorithms for the control synthesis problem using Finite State Controllers (FSCs). Using FSCs to control finite POMDPs allows the closed-loop system to be analyzed as a finite global Markov chain. The thesis explicitly shows how the transient and steady-state behavior of the global Markov chains can be related to two different criteria with respect to satisfaction of LTL formulas. First, maximizing the probability of LTL satisfaction is related to an optimization problem over a parametrization of the FSC. Analytic expressions for the gradients are derived, which allow the use of first-order optimization techniques.
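
One common way to form the global Markov chain of a POMDP controlled by an FSC is over pairs (s, g) of POMDP state and controller memory node, summing over actions and observations. The sketch assumes a convention in which the FSC picks its action from the current node, the observation depends on the new state, and the node update depends on the current node and the observation; the thesis's parametrization may differ (for instance, by choosing action and next node jointly), and the product with an LTL automaton is omitted. The array layout and the helper names are hypothetical.

```python
import numpy as np

def global_markov_chain(T, O, omega, kappa):
    """Transition matrix of the closed loop over pairs (s, g).

    T[a, s, s2]     : POMDP state transition probabilities
    O[s2, o]        : observation probabilities in the new state
    omega[g, a]     : FSC action selection in memory node g
    kappa[g, o, g2] : FSC memory update after observing o
    """
    n_s, n_g = T.shape[1], omega.shape[0]
    # P[s, g, s2, g2] = sum_a sum_o omega[g, a] T[a, s, s2] O[s2, o] kappa[g, o, g2]
    P = np.einsum("ga,ast,to,goh->sgth", omega, T, O, kappa)
    return P.reshape(n_s * n_g, n_s * n_g)  # row index = s * n_g + g

def random_stochastic(rng, *shape):
    """Hypothetical helper: random array normalized over its last axis."""
    x = rng.random(shape)
    return x / x.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_s, n_a, n_o, n_g = 3, 2, 2, 2
    T = random_stochastic(rng, n_a, n_s, n_s)
    O = random_stochastic(rng, n_s, n_o)
    omega = random_stochastic(rng, n_g, n_a)
    kappa = random_stochastic(rng, n_g, n_o, n_g)
    P = global_markov_chain(T, O, omega, kappa)
    print(P.shape, P.sum(axis=1))  # (6, 6) with every row summing to 1
```

Because the result is an ordinary stochastic matrix, standard Markov chain tools (stationary distributions, absorption probabilities) apply directly to the closed-loop system.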

The second criterion encourages rapid and frequent visits to a restricted set of states over infinite executions. It is formulated as a constrained optimization problem with a discounted long-term reward objective through a novel use of a fundamental equation for Markov chains, the Poisson equation. A new constrained policy iteration technique is proposed to solve the resulting dynamic program, which also provides a way to escape local maxima.
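
For a finite, irreducible Markov chain with transition matrix P and reward vector r, the Poisson equation h + g·1 = r + P h links the long-run average reward g = πr (π being the stationary distribution) to a relative value vector h that is unique up to an additive constant. The sketch below solves it with NumPy through the fundamental matrix (I − P + 1πᵀ)⁻¹ under the normalization πh = 0. It illustrates only the equation itself, not the thesis's constrained policy iteration or its discounted formulation; the helper names and the example chain are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi^T (I - P) = 0 with sum(pi) = 1 (P assumed irreducible)."""
    n = P.shape[0]
    A = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def solve_poisson(P, r):
    """Return (g, h) with h + g * 1 = r + P h and pi @ h = 0."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    g = float(pi @ r)  # long-run average reward
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))  # fundamental matrix
    h = Z @ (r - g * np.ones(n))
    return g, h

if __name__ == "__main__":
    # Hypothetical two-state chain: reward 1 in state 0, 0 in state 1.
    P = np.array([[0.9, 0.1], [0.5, 0.5]])
    r = np.array([1.0, 0.0])
    g, h = solve_poisson(P, r)
    print(g, h)
```

For this chain π = (5/6, 1/6), so g = 5/6 and h = (5/18, −25/18); the normalization πh = 0 holds because πZ = π.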

The algorithms proposed in the thesis are applied to the task planning and execution challenges faced during the DARPA Autonomous Robotic Manipulation - Software challenge.

Abstract:

This study analyzes the economic, political, and socio-environmental transformations resulting from the installation of large-scale projects in traditional fishing territories, focusing on the experience of the fishing community of Ilha da Madeira (Sepetiba Bay, Itaguaí, RJ) from the installation of Cia Ingá Mercantil (1964) to the present day. Across the successive cycles of industrialization, it identifies the endogenous and exogenous factors that contribute to the vulnerability or sustainability of artisanal fishing and of the environment, and it points out aspects of this experience that may serve as a reference for other fishing communities facing similar problems. We introduce the problem by contextualizing artisanal fishing in Brazil: its policies, the regulation of the activity, and the organization of fishers. Turning to artisanal fishing in the state of Rio de Janeiro, we highlight the socio-environmental conflicts arising from the installation of industrial complexes in territories traditionally occupied by fishers, in particular the conflicts over the installation of the Porto de Açu in São João da Barra (RJ) and of the gas pipelines for the oil refinery in Guanabara Bay. We then examine the topic in depth through a case study of Ilha da Madeira, Sepetiba Bay, Itaguaí (RJ). This territory, traditionally occupied by fishers, was plunged into a socio-environmental crisis in the 1960s and has since undergone several transformations: a radical alteration of the landscape, environmental degradation, and the stifling of fishing activity. The evidence comes from bibliographic and documentary research, photographic records, and above all oral history. Through interviews with key informants, we recovered personal memories and, in doing so, part of the history of the territory.
We characterize the landscape, the life and work of the fishers, and the local culture (traditions, customs, values, and material and symbolic aspects) in the period before the arrival of industry, when Ilha da Madeira was in fact an island. In their narratives, the interviewees recounted the succession of tragic events that followed the installation of Ingá up to the present day. These events are grouped into cycles that make up the socio-environmental crisis in the territory. The study portrays environmental injustice and the vulnerability of a fishing community whose experience serves as a warning to other traditional communities. We stress the importance of links between local movements and actors beyond the local level, pointing to the need to democratize decision-making processes and to share the management of common-pool resources. We also point to the urgency of overcoming the paradigm that dissociates development, nature, and society and that strengthens a logic of production which, by imposing itself as hegemonic, stifles all other forms of organizing work.

Abstract:

This paper deals with the resource allocation problem of maximizing users' perceived quality in wireless channels with time-varying capacity. First, we model the subjective quality-aware scheduling problem in the framework of Markov decision processes. Then, since obtaining the optimal solution of this model is infeasible, we propose a simple scheduling index rule with a closed-form expression, derived using a methodology based on Whittle's approach. Finally, we analyze the performance of the resulting scheduling rule in several relevant scenarios, concluding that it outperforms the most popular existing resource allocation strategies.

Abstract:

During the two presidential terms of Luiz Inácio Lula da Silva (2003-2010), intra- and extra-bureaucratic pressures and systemic causes led to a more pronounced erosion of the historically insular condition of the Ministry of Foreign Affairs (MRE). The participation of entities other than Itamaraty in shaping foreign policy, notably in its implementation, gives rise to new cooperative agendas and decision-making processes. Actors in the federal bureaucracy, such as the ministries, voice preferences that influence the inter-bureaucratic game and are able to build bridges with the diplomatic institution, the decision-making unit par excellence. From an intra-bureaucratic perspective, the rise of the autonomist current of action and thought, relative to the pragmatic institutionalists, enables choices of international insertion such as the strengthening of the South-South perspective, which includes partnerships with Africa; this indicates that opinions within the MRE are not monolithic. This dynamic is present in, and necessary for understanding, Brazilian Cooperation for International Development (CBDI), a type of Brazilian South-South Cooperation (CSS) in which Technical, Scientific and Technological Cooperation (CTC&T) in food security is one of the most active and complex modalities. Adopted as a foreign policy instrument during the ascendancy of the autonomists, a current influenced by members of the Workers' Party (Partido dos Trabalhadores), cooperation in food security had the African continent as its primary locus. Grounded in the internationalization of domestic public policies, the sharing of knowledge on the agendas of fighting hunger, fighting poverty, and agrarian development is a phenomenon that derives from opening the black box of the state, which supports the argument that the levels of analysis are correlated.
The various cooperative initiatives with partners on the other side of the South Atlantic, permeated by a rhetorical component promoting a less asymmetric international order, beneath which also lies the pursuit of the direct and indirect interests of diplomatic policymakers, are related to the broader guidelines of the foreign policy articulated during the period studied in this dissertation.

Abstract:

Production responsiveness refers to the ability of a production system to achieve its operational goals in the presence of supplier, internal and customer disturbances, where disturbances are those sources of change which occur independently of the system's intentions. A set of audit tools for assessing the responsiveness of production operations is being prepared as part of an EPSRC funded investigation. These tools are based on the idea that the ability to respond is linked to: the nature of the disturbances or changes requiring a response; their impact on production goals; and the inherent response capabilities of the operation. These response capabilities include information gathering and processing (to detect disturbances and production conditions), decision processes (which initiate system responses to disturbances) and various types of process flexibilities and buffers (which provide the physical means of dealing with disturbances). The paper discusses concepts and issues associated with production responsiveness, describes the audit tools that have been developed and illustrates their use in the context of a steel manufacturing plant.

Abstract:

The partially observable Markov decision process (POMDP) provides a popular framework for modelling spoken dialogue. This paper describes how the expectation propagation (EP) algorithm can be used to learn the parameters of the POMDP user model. Various special probability factors applicable to this task are presented, which allow the parameters to be learned when the structure of the dialogue is complex. No annotations, neither the true dialogue state nor the true semantics of user utterances, are required. Parameters optimised using the proposed techniques are shown to improve performance in both offline transcription experiments and simulated dialogue management. ©2010 IEEE.

Abstract:

Effective dialogue management is critically dependent on the information that is encoded in the dialogue state. In order to deploy reinforcement learning for policy optimization, dialogue must be modeled as a Markov Decision Process. This requires that the dialogue state encode all relevant information obtained during the dialogue prior to that state. This can be achieved by combining the user goal, the dialogue history, and the last user action to form the dialogue state. In addition, to gain robustness to input errors, dialogue must be modeled as a Partially Observable Markov Decision Process (POMDP), and hence a distribution over all possible states must be maintained at every dialogue turn. This poses a potential computational limitation, since there can be a very large number of dialogue states. The Hidden Information State model provides a principled way of ensuring tractability in a POMDP-based dialogue model. The key feature of this model is the grouping of user goals into partitions that are dynamically built during the dialogue. In this article, we extend this model further to incorporate the notion of complements. This allows more complex user goals to be represented, and it enables an effective pruning technique that preserves overall system performance within limited computational resources more effectively than existing approaches. © 2011 ACM.

Abstract:

A set of audit tools is being prepared for assessing the response capability of a production operation, as part of an EPSRC-funded investigation into improving the responsiveness of manufacturing production systems. These tools are based on the idea that the ability to respond is linked to i) the nature of the disturbances or changes requiring a response, ii) their impact on production goals and iii) the decision processes which initiate system responses to disturbances.