17 resultados para Reinforcement Learning,resource-constrained devices,iOS devices,on-device machine learning


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of wireless sensor and actuator networks in industry has been increasing past few years, bringing multiple benefits compared to wired systems, like network flexibility and manageability. Such networks consists of a possibly large number of small and autonomous sensor and actuator devices with wireless communication capabilities. The data collected by sensors are sent directly or through intermediary nodes along the network to a base station called sink node. The data routing in this environment is an essential matter since it is strictly bounded to the energy efficiency, thus the network lifetime. This work investigates the application of a routing technique based on Reinforcement Learning s Q-Learning algorithm to a wireless sensor network by using an NS-2 simulated environment. Several metrics like energy consumption, data packet delivery rates and delays are used to validate de proposal comparing it with another solutions existing in the literature

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Techniques of optimization known as metaheuristics have achieved success in the resolution of many problems classified as NP-Hard. These methods use non deterministic approaches that reach very good solutions which, however, don t guarantee the determination of the global optimum. Beyond the inherent difficulties related to the complexity that characterizes the optimization problems, the metaheuristics still face the dilemma of xploration/exploitation, which consists of choosing between a greedy search and a wider exploration of the solution space. A way to guide such algorithms during the searching of better solutions is supplying them with more knowledge of the problem through the use of a intelligent agent, able to recognize promising regions and also identify when they should diversify the direction of the search. This way, this work proposes the use of Reinforcement Learning technique - Q-learning Algorithm - as exploration/exploitation strategy for the metaheuristics GRASP (Greedy Randomized Adaptive Search Procedure) and Genetic Algorithm. The GRASP metaheuristic uses Q-learning instead of the traditional greedy-random algorithm in the construction phase. This replacement has the purpose of improving the quality of the initial solutions that are used in the local search phase of the GRASP, and also provides for the metaheuristic an adaptive memory mechanism that allows the reuse of good previous decisions and also avoids the repetition of bad decisions. In the Genetic Algorithm, the Q-learning algorithm was used to generate an initial population of high fitness, and after a determined number of generations, where the rate of diversity of the population is less than a certain limit L, it also was applied to supply one of the parents to be used in the genetic crossover operator. Another significant change in the hybrid genetic algorithm is the proposal of a mutually interactive cooperation process between the genetic operators and the Q-learning algorithm. In this interactive/cooperative process, the Q-learning algorithm receives an additional update in the matrix of Q-values based on the current best solution of the Genetic Algorithm. The computational experiments presented in this thesis compares the results obtained with the implementation of traditional versions of GRASP metaheuristic and Genetic Algorithm, with those obtained using the proposed hybrid methods. Both algorithms had been applied successfully to the symmetrical Traveling Salesman Problem, which was modeled as a Markov decision process

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a new paradigm for collective learning in multi-agent systems (MAS) as a solution to the problem in which several agents acting over the same environment must learn how to perform tasks, simultaneously, based on feedbacks given by each one of the other agents. We introduce the proposed paradigm in the form of a reinforcement learning algorithm, nominating it as reinforcement learning with influence values. While learning by rewards, each agent evaluates the relation between the current state and/or action executed at this state (actual believe) together with the reward obtained after all agents that are interacting perform their actions. The reward is a result of the interference of others. The agent considers the opinions of all its colleagues in order to attempt to change the values of its states and/or actions. The idea is that the system, as a whole, must reach an equilibrium, where all agents get satisfied with the obtained results. This means that the values of the state/actions pairs match the reward obtained by each agent. This dynamical way of setting the values for states and/or actions makes this new reinforcement learning paradigm the first to include, naturally, the fact that the presence of other agents in the environment turns it a dynamical model. As a direct result, we implicitly include the internal state, the actions and the rewards obtained by all the other agents in the internal state of each agent. This makes our proposal the first complete solution to the conceptual problem that rises when applying reinforcement learning in multi-agent systems, which is caused by the difference existent between the environment and agent models. With basis on the proposed model, we create the IVQ-learning algorithm that is exhaustive tested in repetitive games with two, three and four agents and in stochastic games that need cooperation and in games that need collaboration. This algorithm shows to be a good option for obtaining solutions that guarantee convergence to the Nash optimum equilibrium in cooperative problems. Experiments performed clear shows that the proposed paradigm is theoretical and experimentally superior to the traditional approaches. Yet, with the creation of this new paradigm the set of reinforcement learning applications in MAS grows up. That is, besides the possibility of applying the algorithm in traditional learning problems in MAS, as for example coordination of tasks in multi-robot systems, it is possible to apply reinforcement learning in problems that are essentially collaborative

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work describes the study, the analysis, the project methodology and the constructive details of a high frequency DC/AC resonant series converter using sequential commutation techniques for the excitation of an inductive coupled thermal plasma torch. The aim of this thesis is to show the new modulation technique potentialities and to present a technological option for the high-frequency electronic power converters development. The resonant converter operates at 50 kW output power under a 400 kHz frequency and it is constituted by inverter cells using ultra-fast IGBT devices. In order to minimize the turn-off losses, the inverter cells operates in a ZVS mode referred by a modified PLL loop that maintains this condition stable, despite the load variations. The sequential pulse gating command strategy used it allows to operate the IGBT devices on its maximum power limits using the derating and destressing current scheme, as well as it propitiates a frequency multiplication of the inverters set. The output converter is connected to a series resonant circuit constituted by the applicator ICTP torch, a compensation capacitor and an impedance matching RF transformer. At the final, are presented the experimental results and the many tests achieved in laboratory as form to validate the proposed new technique

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The metaheuristics techiniques are known to solve optimization problems classified as NP-complete and are successful in obtaining good quality solutions. They use non-deterministic approaches to generate solutions that are close to the optimal, without the guarantee of finding the global optimum. Motivated by the difficulties in the resolution of these problems, this work proposes the development of parallel hybrid methods using the reinforcement learning, the metaheuristics GRASP and Genetic Algorithms. With the use of these techniques, we aim to contribute to improved efficiency in obtaining efficient solutions. In this case, instead of using the Q-learning algorithm by reinforcement learning, just as a technique for generating the initial solutions of metaheuristics, we use it in a cooperative and competitive approach with the Genetic Algorithm and GRASP, in an parallel implementation. In this context, was possible to verify that the implementations in this study showed satisfactory results, in both strategies, that is, in cooperation and competition between them and the cooperation and competition between groups. In some instances were found the global optimum, in others theses implementations reach close to it. In this sense was an analyze of the performance for this proposed approach was done and it shows a good performance on the requeriments that prove the efficiency and speedup (gain in speed with the parallel processing) of the implementations performed

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The semiconductor technologies evolutions leads devices to be developed with higher processing capability. Thus, those components have been used widely in more fields. Many industrial environment such as: oils, mines, automotives and hospitals are frequently using those devices on theirs process. Those industries activities are direct related to environment and health safe. So, it is quite important that those systems have extra safe features yield more reliability, safe and availability. The reference model eOSI that will be presented by this work is aimed to allow the development of systems under a new view perspective which can improve and make simpler the choice of strategies for fault tolerant. As a way to validate the model na architecture FPGA-based was developed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This Dissertation aims to provide a communication mechanism between Digital TV viewers and interaction devices, such as robots, for example, placed on the environment from which a TV program is being live broadcasted. Such communication mechanism has the objective to allow viewers controll the Interaction Devices through their TV devices, using the broadcast channel present in Interactive Digital TV systems, and receive data from the devices by the broadcast channel. This system was projected as a middlewaer system where the Interaction Devices in the TV program set are interconnected, creating a Interactive Device Network. With this approach, the system is capable of manage the devices on the network, controlling the flow of coming and leaving elements, in a transparent way for the viewers. The system yet allows the Interaction Devices communicate each other, with a integrated communication channel with no worries about the physical communication layer

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Relevant researches have been growing on electric machine without mancal or bearing and that is generally named bearingless motor or specifically, mancal motor. In this paper it is made an introductory presentation about bearingless motor and its peripherical devices with focus on the design and implementation of sensors and interfaces needed to control rotor radial positioning and rotation of the machine. The signals from the machine are conditioned in analogic inputs of DSP TMS320F2812 and used in the control program. This work has a purpose to elaborate and build a system with sensors and interfaces suitable to the input and output of DSP TMS320F2812 to control a mancal motor, bearing in mind the modularity, simplicity of circuits, low number of power used, good noise imunity and good response frequency over 10 kHz. The system is tested at a modified ordinary induction motor of 3,7 kVA to be used with a bearingless motor with divided coil

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In multi-robot systems, both control architecture and work strategy represent a challenge for researchers. It is important to have a robust architecture that can be easily adapted to requirement changes. It is also important that work strategy allows robots to complete tasks efficiently, considering that robots interact directly in environments with humans. In this context, this work explores two approaches for robot soccer team coordination for cooperative tasks development. Both approaches are based on a combination of imitation learning and reinforcement learning. Thus, in the first approach was developed a control architecture, a fuzzy inference engine for recognizing situations in robot soccer games, a software for narration of robot soccer games based on the inference engine and the implementation of learning by imitation from observation and analysis of others robotic teams. Moreover, state abstraction was efficiently implemented in reinforcement learning applied to the robot soccer standard problem. Finally, reinforcement learning was implemented in a form where actions are explored only in some states (for example, states where an specialist robot system used them) differently to the traditional form, where actions have to be tested in all states. In the second approach reinforcement learning was implemented with function approximation, for which an algorithm called RBF-Sarsa($lambda$) was created. In both approaches batch reinforcement learning algorithms were implemented and imitation learning was used as a seed for reinforcement learning. Moreover, learning from robotic teams controlled by humans was explored. The proposal in this work had revealed efficient in the robot soccer standard problem and, when implemented in other robotics systems, they will allow that these robotics systems can efficiently and effectively develop assigned tasks. These approaches will give high adaptation capabilities to requirements and environment changes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reinforcement learning is a machine learning technique that, although finding a large number of applications, maybe is yet to reach its full potential. One of the inadequately tested possibilities is the use of reinforcement learning in combination with other methods for the solution of pattern classification problems. It is well documented in the literature the problems that support vector machine ensembles face in terms of generalization capacity. Algorithms such as Adaboost do not deal appropriately with the imbalances that arise in those situations. Several alternatives have been proposed, with varying degrees of success. This dissertation presents a new approach to building committees of support vector machines. The presented algorithm combines Adaboost algorithm with a layer of reinforcement learning to adjust committee parameters in order to avoid that imbalances on the committee components affect the generalization performance of the final hypothesis. Comparisons were made with ensembles using and not using the reinforcement learning layer, testing benchmark data sets widely known in area of pattern classification

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of reservoir engineering is to manage fields of oil production in order to maximize the production of hydrocarbons according to economic and physical restrictions. The deciding of a production strategy is a complex activity involving several variables in the process. Thus, a smart system, which assists in the optimization of the options for developing of the field, is very useful in day-to-day of reservoir engineers. This paper proposes the development of an intelligent system to aid decision making, regarding the optimization of strategies of production in oil fields. The intelligence of this system will be implemented through the use of the technique of reinforcement learning, which is presented as a powerful tool in problems of multi-stage decision. The proposed system will allow the specialist to obtain, in time, a great alternative (or near-optimal) for the development of an oil field known

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Neste trabalho é proposto um novo algoritmo online para o resolver o Problema dos k-Servos (PKS). O desempenho desta solução é comparado com o de outros algoritmos existentes na literatura, a saber, os algoritmos Harmonic e Work Function, que mostraram ser competitivos, tornando-os parâmetros de comparação significativos. Um algoritmo que apresente desempenho eficiente em relação aos mesmos tende a ser competitivo também, devendo, obviamente, se provar o referido fato. Tal prova, entretanto, foge aos objetivos do presente trabalho. O algoritmo apresentado para a solução do PKS é baseado em técnicas de aprendizagem por reforço. Para tanto, o problema foi modelado como um processo de decisão em múltiplas etapas, ao qual é aplicado o algoritmo Q-Learning, um dos métodos de solução mais populares para o estabelecimento de políticas ótimas neste tipo de problema de decisão. Entretanto, deve-se observar que a dimensão da estrutura de armazenamento utilizada pela aprendizagem por reforço para se obter a política ótima cresce em função do número de estados e de ações, que por sua vez é proporcional ao número n de nós e k de servos. Ao se analisar esse crescimento (matematicamente, ) percebe-se que o mesmo ocorre de maneira exponencial, limitando a aplicação do método a problemas de menor porte, onde o número de nós e de servos é reduzido. Este problema, denominado maldição da dimensionalidade, foi introduzido por Belmann e implica na impossibilidade de execução de um algoritmo para certas instâncias de um problema pelo esgotamento de recursos computacionais para obtenção de sua saída. De modo a evitar que a solução proposta, baseada exclusivamente na aprendizagem por reforço, seja restrita a aplicações de menor porte, propõe-se uma solução alternativa para problemas mais realistas, que envolvam um número maior de nós e de servos. Esta solução alternativa é hierarquizada e utiliza dois métodos de solução do PKS: a aprendizagem por reforço, aplicada a um número reduzido de nós obtidos a partir de um processo de agregação, e um método guloso, aplicado aos subconjuntos de nós resultantes do processo de agregação, onde o critério de escolha do agendamento dos servos é baseado na menor distância ao local de demanda

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The state has changed over time in order to meet a society with increasingly stringent demands. Techniques of private means begin to be employed in an attempt to overcome the dysfunctions entrenched bureaucracy, making the machine faster. By federal law by the People Management Skills was established as a reference for the administration of Human Resources of the public sector in an attempt to develop professionally servers, based mainly on the three pillars of the model: the knowledge, skills and attitudes. This thesis aims at understanding, in the view of employees, the perceived impacts on the organizational changes occurring in the Department of Administration and Human Resources of the State of Rio Grande do Norte in order to implement a People Management Skills-based. It is a simple case study, characterized by the research during a certain period of time, collecting data in a real environment of an organization, in this case SEARH/RN. The procedures used in collecting data were the literature review, documental research and field research. We used a qualitative approach with exploratory and descriptive approach. Every reform was implemented in the institution and reported from there analyzed the impacts observed by the servers. As a result we observed a considerable advance in institutional activities, mainly relating to physical structure / organizational and human resource policies, with minor advances on labor policies, in much the result of the guiding focus of the reform on SEARH/RN. The impacts in total were more positive than negative and direct paths to improvement in public organizations. Making a general analysis of the modernization program implemented in SEARH/RN, we can conclude that there was a distinct change in all dimensions studied, mostly pointing out positive aspects, and contrary to the opinion of some authors, who claim to be very difficult to implement reforms in public organizations, since they are highly institutionalized environments. What was found was a big organization, with gaps and weaknesses, but with a much larger number of hits and recognition from institutional actors

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work the use of coconut fiber (coir) and bamboo shafts as reinforcement of soil-cement was studied, in order to obtain an alternative material to make stakes for fences in rural properties. The main objective was to study the effect of the addition of reinforcement to the soil-cement matrix. The effect of humidity on the mechanical properties was also analyzed. The soil-cement mortar was composed by a mixture, in equal parts, of soil and river sand, 14% in weight of cement and 10 % in weight of water. As reinforcement, different combinations of (a) coconut fiber with 15 mm mean length (0,3 %, 0,6 % and 1,2 % in weight) and (b) bamboo shafts, also in crescent quantities (2, 4 and 8 shafts per specimen) were used. For each combination 6 specimens were made and these were submitted to three point flexural test after 28 days of cure. In order to evaluate the effect of humidity, 1 specimen from each of the coconut fiber reinforced combination was immersed in water 24 hours prior to flexural test. The results of the tests carried out indicated that the addition of the reinforcement affected negatively the mechanical resistance and, on the other hand, increased the tenacity and the ductility of the material.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Beamforming is a technique widely used in various fields. With the aid of an antenna array, the beamforming aims to minimize the contribution of unknown interferents directions, while capturing the desired signal in a given direction. In this thesis are proposed beamforming techniques using Reinforcement Learning (RL) through the Q-Learning algorithm in antennas array. One proposal is to use RL to find the optimal policy selection between the beamforming (BF) and power control (PC) in order to better leverage the individual characteristics of each of them for a certain amount of Signal to Interference plus noise Ration (SINR). Another proposal is to use RL to determine the optimal policy between blind beamforming algorithm of CMA (Constant Modulus Algorithm) and DD (Decision Direct) in multipath environments. Results from simulations showed that the RL technique could be effective in achieving na optimal of switching between different techniques.