905 resultados para Reinforcement Learning,resource-constrained devices,iOS devices,on-device machine learning
Resumo:
We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad~hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.
Resumo:
This paper proposes a hybrid coordination method for behavior-based control architectures. The hybrid method takes advantages of the robustness and modularity in competitive approaches as well as optimized trajectories in cooperative ones. This paper shows the feasibility of applying this hybrid method with a 3D-navigation to an autonomous underwater vehicle (AUV). The behaviors are learnt online by means of reinforcement learning. A continuous Q-learning implemented with a feed-forward neural network is employed. Realistic simulations were carried out. The results obtained show the good performance of the hybrid method on behavior coordination as well as the convergence of the behaviors
Resumo:
This paper presents a hybrid behavior-based scheme using reinforcement learning for high-level control of autonomous underwater vehicles (AUVs). Two main features of the presented approach are hybrid behavior coordination and semi on-line neural-Q_learning (SONQL). Hybrid behavior coordination takes advantages of robustness and modularity in the competitive approach as well as efficient trajectories in the cooperative approach. SONQL, a new continuous approach of the Q_learning algorithm with a multilayer neural network is used to learn behavior state/action mapping online. Experimental results show the feasibility of the presented approach for AUVs
Resumo:
This paper proposes a field application of a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot in cable tracking task. The learning system is characterized by using a direct policy search method for learning the internal state/action mapping. Policy only algorithms may suffer from long convergence times when dealing with real robotics. In order to speed up the process, the learning phase has been carried out in a simulated environment and, in a second step, the policy has been transferred and tested successfully on a real robot. Future steps plan to continue the learning process on-line while on the real robot while performing the mentioned task. We demonstrate its feasibility with real experiments on the underwater robot ICTINEU AUV
Resumo:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach, when using RL, has been to apply value function based algorithms, the system here detailed is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target reaching task
Resumo:
Darrerament, l'interès pel desenvolupament d'aplicacions amb robots submarins autònoms (AUV) ha crescut de forma considerable. Els AUVs són atractius gràcies al seu tamany i el fet que no necessiten un operador humà per pilotar-los. Tot i això, és impossible comparar, en termes d'eficiència i flexibilitat, l'habilitat d'un pilot humà amb les escasses capacitats operatives que ofereixen els AUVs actuals. L'utilització de AUVs per cobrir grans àrees implica resoldre problemes complexos, especialment si es desitja que el nostre robot reaccioni en temps real a canvis sobtats en les condicions de treball. Per aquestes raons, el desenvolupament de sistemes de control autònom amb l'objectiu de millorar aquestes capacitats ha esdevingut una prioritat. Aquesta tesi tracta sobre el problema de la presa de decisions utilizant AUVs. El treball presentat es centra en l'estudi, disseny i aplicació de comportaments per a AUVs utilitzant tècniques d'aprenentatge per reforç (RL). La contribució principal d'aquesta tesi consisteix en l'aplicació de diverses tècniques de RL per tal de millorar l'autonomia dels robots submarins, amb l'objectiu final de demostrar la viabilitat d'aquests algoritmes per aprendre tasques submarines autònomes en temps real. En RL, el robot intenta maximitzar un reforç escalar obtingut com a conseqüència de la seva interacció amb l'entorn. L'objectiu és trobar una política òptima que relaciona tots els estats possibles amb les accions a executar per a cada estat que maximitzen la suma de reforços totals. Així, aquesta tesi investiga principalment dues tipologies d'algoritmes basats en RL: mètodes basats en funcions de valor (VF) i mètodes basats en el gradient (PG). Els resultats experimentals finals mostren el robot submarí Ictineu en una tasca autònoma real de seguiment de cables submarins. Per portar-la a terme, s'ha dissenyat un algoritme anomenat mètode d'Actor i Crític (AC), fruit de la fusió de mètodes VF amb tècniques de PG.
Resumo:
We discuss the development and performance of a low-power sensor node (hardware, software and algorithms) that autonomously controls the sampling interval of a suite of sensors based on local state estimates and future predictions of water flow. The problem is motivated by the need to accurately reconstruct abrupt state changes in urban watersheds and stormwater systems. Presently, the detection of these events is limited by the temporal resolution of sensor data. It is often infeasible, however, to increase measurement frequency due to energy and sampling constraints. This is particularly true for real-time water quality measurements, where sampling frequency is limited by reagent availability, sensor power consumption, and, in the case of automated samplers, the number of available sample containers. These constraints pose a significant barrier to the ubiquitous and cost effective instrumentation of large hydraulic and hydrologic systems. Each of our sensor nodes is equipped with a low-power microcontroller and a wireless module to take advantage of urban cellular coverage. The node persistently updates a local, embedded model of flow conditions while IP-connectivity permits each node to continually query public weather servers for hourly precipitation forecasts. The sampling frequency is then adjusted to increase the likelihood of capturing abrupt changes in a sensor signal, such as the rise in the hydrograph – an event that is often difficult to capture through traditional sampling techniques. Our architecture forms an embedded processing chain, leveraging local computational resources to assess uncertainty by analyzing data as it is collected. A network is presently being deployed in an urban watershed in Michigan and initial results indicate that the system accurately reconstructs signals of interest while significantly reducing energy consumption and the use of sampling resources. We also expand our analysis by discussing the role of this approach for the efficient real-time measurement of stormwater systems.
Resumo:
On-line learning methods have been applied successfully in multi-agent systems to achieve coordination among agents. Learning in multi-agent systems implies in a non-stationary scenario perceived by the agents, since the behavior of other agents may change as they simultaneously learn how to improve their actions. Non-stationary scenarios can be modeled as Markov Games, which can be solved using the Minimax-Q algorithm a combination of Q-learning (a Reinforcement Learning (RL) algorithm which directly learns an optimal control policy) and the Minimax algorithm. However, finding optimal control policies using any RL algorithm (Q-learning and Minimax-Q included) can be very time consuming. Trying to improve the learning time of Q-learning, we considered the QS-algorithm. in which a single experience can update more than a single action value by using a spreading function. In this paper, we contribute a Minimax-QS algorithm which combines the Minimax-Q algorithm and the QS-algorithm. We conduct a series of empirical evaluation of the algorithm in a simplified simulator of the soccer domain. We show that even using a very simple domain-dependent spreading function, the performance of the learning algorithm can be improved.
Resumo:
Shared attention is a type of communication very important among human beings. It is sometimes reserved for the more complex form of communication being constituted by a sequence of four steps: mutual gaze, gaze following, imperative pointing and declarative pointing. Some approaches have been proposed in Human-Robot Interaction area to solve part of shared attention process, that is, the most of works proposed try to solve the first two steps. Models based on temporal difference, neural networks, probabilistic and reinforcement learning are methods used in several works. In this article, we are presenting a robotic architecture that provides a robot or agent, the capacity of learning mutual gaze, gaze following and declarative pointing using a robotic head interacting with a caregiver. Three learning methods have been incorporated to this architecture and a comparison of their performance has been done to find the most adequate to be used in real experiment. The learning capabilities of this architecture have been analyzed by observing the robot interacting with the human in a controlled environment. The experimental results show that the robotic head is able to produce appropriate behavior and to learn from sociable interaction.
Resumo:
The discovery of binary dendritic events such as local NMDA spikes in dendritic subbranches led to the suggestion that dendritic trees could be computationally equivalent to a 2-layer network of point neurons, with a single output unit represented by the soma, and input units represented by the dendritic branches. Although this interpretation endows a neuron with a high computational power, it is functionally not clear why nature would have preferred the dendritic solution with a single but complex neuron, as opposed to the network solution with many but simple units. We show that the dendritic solution has a distinguished advantage over the network solution when considering different learning tasks. Its key property is that the dendritic branches receive an immediate feedback from the somatic output spike, while in the corresponding network architecture the feedback would require additional backpropagating connections to the input units. Assuming a reinforcement learning scenario we formally derive a learning rule for the synaptic contacts on the individual dendritic trees which depends on the presynaptic activity, the local NMDA spikes, the somatic action potential, and a delayed reinforcement signal. We test the model for two scenarios: the learning of binary classifications and of precise spike timings. We show that the immediate feedback represented by the backpropagating action potential supplies the individual dendritic branches with enough information to efficiently adapt their synapses and to speed up the learning process.
Resumo:
The discovery of binary dendritic events such as local NMDA spikes in dendritic subbranches led to the suggestion that dendritic trees could be computationally equivalent to a 2-layer network of point neurons, with a single output unit represented by the soma, and input units represented by the dendritic branches. Although this interpretation endows a neuron with a high computational power, it is functionally not clear why nature would have preferred the dendritic solution with a single but complex neuron, as opposed to the network solution with many but simple units. We show that the dendritic solution has a distinguished advantage over the network solution when considering different learning tasks. Its key property is that the dendritic branches receive an immediate feedback from the somatic output spike, while in the corresponding network architecture the feedback would require additional backpropagating connections to the input units. Assuming a reinforcement learning scenario we formally derive a learning rule for the synaptic contacts on the individual dendritic trees which depends on the presynaptic activity, the local NMDA spikes, the somatic action potential, and a delayed reinforcement signal. We test the model for two scenarios: the learning of binary classifications and of precise spike timings. We show that the immediate feedback represented by the backpropagating action potential supplies the individual dendritic branches with enough information to efficiently adapt their synapses and to speed up the learning process.
Resumo:
OBJECTIVES: To assess the frequency of and risk factors for discordant responses at 6 months on highly active antiretroviral therapy (HAART) in previously treatment-naive HIV patients from resource-limited countries. METHODS: The Antiretroviral Therapy in Low-Income Countries Collaboration is a network of clinics providing care and treatment to HIV-infected patients in Africa, Latin America, and Asia. Patients who initiated therapy between 1996 and 2004, were aged 16 years or older, and had a baseline CD4 cell count were included in this analysis. Responses were defined based on plasma viral load (PVL) and CD4 cell count at 6 months as complete virologic and immunologic (VR(+)IR(+)), virologic only (VR(+)IR(-)), immunologic only (VR(-)IR(+)), and nonresponse (VR(-)IR(-)). Multinomial logistic regression was used to assess the association between therapy responses and clinical and demographic variables. RESULTS: Of the 3111 patients eligible for analysis, 1914 had available information at 6 months of therapy: 1074 (56.1%) were VR(+)IR(+), 364 (19.0%) were VR(+)IR(-), 283 (14.8%) were (VR(-)IR(+)), and 193 (10.1%) were VR(-)IR(-). OF THE 3111 patients eligible for analysis, 1914 had available information at 6 months of therapy: 1074 (56.1%) were VRIR, 364 (19.0%) were VRIR, 283 (14.8%) were (VRIR), and 193 (10.1%) were VRIR. Compared with complete responders, virologic-only responders were older, had a higher baseline CD4 cell count, had a lower baseline PVL, and were more likely to have received a nonstandard HAART regimen; immunologic-only responders were younger, had a lower baseline CD4 cell count, had a higher baseline PVL, and were more likely to have received a protease inhibitor-based regimen. CONCLUSIONS: The frequency of and risk factors for discordant responses were comparable to those observed in developed countries. Longer follow-up is needed to assess the long-term impact of discordant responses on mortality in these resource-limited settings.
Resumo:
AIMS: To compare the gender distribution of HIV-infected adults receiving highly active antiretroviral treatment (HAART) in resource-constrained settings with estimates of the gender distribution of HIV infection; to describe the clinical characteristics of women and men receiving HAART. METHODS: The Antiretroviral Therapy in Lower-Income Countries, ART-LINC Collaboration is a network of clinics providing HAART in Africa, Latin America, and Asia. We compared UNAIDS data on the gender distribution of HIV infection with the proportions of women and men receiving HAART in the ART-LINC Collaboration. RESULTS: Twenty-nine centers in 13 countries participated. Among 33,164 individuals, 19,989 (60.3%) were women. Proportions of women receiving HAART in ART-LINC centers were similar to, or higher than, UNAIDS estimates of the proportions of HIV-infected women in all but two centers. There were fewer women receiving HAART than expected from UNAIDS data in one center in Uganda and one center in India. Taking into account heterogeneity across cohorts, women were younger than men, less likely to have advanced HIV infection, and more likely to be anemic at HAART initiation. CONCLUSIONS: Women in resource-constrained settings are not necessarily disadvantaged in their access to HAART. More attention needs to be paid to ensuring that HIV-infected men are seeking care and starting HAART.