905 results for Reinforcement Learning, resource-constrained devices, iOS devices, on-device machine learning


Relevance:

100.00%

Publisher:

Abstract:

The re-entrant flow shop scheduling problem (RFSP) is NP-hard and has attracted the attention of both researchers and industry. Existing approaches attempt to minimize the makespan of the RFSP without considering the interdependency between the resource constraints and the re-entrant probability. This paper proposes a multi-level genetic algorithm (GA) that encodes the correlated re-entrant possibility and the production mode in a multi-level chromosome, and incorporates a repair operator to revise infeasible solutions by resolving resource conflicts. With the objective of minimizing the makespan, ANOVA is used to fine-tune the GA parameter settings. Experiments show that the proposed approach finds near-optimal schedules more effectively than a simulated annealing algorithm on both small and large problem instances. © 2013 Published by Elsevier Ltd.
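The chromosome-plus-repair mechanism can be illustrated with a minimal sketch. Everything here is a simplifying assumption of mine (a 3-job, 2-machine toy instance, a single re-entrant job, and a duplicate-gene repair standing in for the paper's resource-conflict repair), not the paper's algorithm:

```python
import random

# Toy instance (assumed): PROC[j][m] = processing time of job j on machine m;
# jobs in REENTRANT traverse the machine sequence a second time.
PROC = [[3, 2], [2, 4], [4, 1]]
REENTRANT = {1}

def makespan(order):
    """Makespan of a permutation schedule; re-entrant jobs repeat their route."""
    ready = [0, 0]                        # machine availability times
    for j in order:
        t = 0                             # completion time of job j so far
        for _ in range(2 if j in REENTRANT else 1):
            for m, p in enumerate(PROC[j]):
                t = max(t, ready[m]) + p
                ready[m] = t
    return max(ready)

def repair(child, n):
    """Repair operator: replace duplicated genes with the missing jobs
    so the chromosome becomes a feasible permutation again."""
    seen, missing = set(), [j for j in range(n) if j not in child]
    out = []
    for g in child:
        out.append(missing.pop() if g in seen else g)
        seen.add(g)
    return out

def ga(n=3, gens=40, pop=12, seed=0):
    """Steady-state GA: tournament selection, one-point crossover, repair, swap mutation."""
    rng = random.Random(seed)
    P = [rng.sample(range(n), n) for _ in range(pop)]
    for _ in range(gens):
        p1 = min(rng.sample(P, 3), key=makespan)
        p2 = min(rng.sample(P, 3), key=makespan)
        cut = rng.randrange(1, n)
        child = repair(p1[:cut] + p2[cut:], n)
        if rng.random() < 0.2:            # swap mutation
            i, j = rng.sample(range(n), 2)
            child[i], child[j] = child[j], child[i]
        P.append(child)
        P.remove(max(P, key=makespan))    # drop the worst; the best survives
    return min(P, key=makespan)
```

On this toy instance every permutation has a makespan between 16 and 19; the repair step is what keeps crossover offspring feasible.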

Relevance:

100.00%

Publisher:

Abstract:

* This research was partially supported by the Latvian Science Foundation under grant No.02-86d.

Relevance:

100.00%

Publisher:

Abstract:

Using data from the 2004 wave of the Afrobarometer survey, this study examines correlates of household hardship in three countries of sub-Saharan Africa: Tanzania, Zambia, and Zimbabwe. Findings provide partial support for the hypothesized relationship. Specifically, poverty reduction initiatives and informal assistance are associated with reduced hardship while civic engagement is related to an increase in household hardship. We also note that certain demographic characteristics are linked to hardship. Policy and practice implications are suggested. © The Author(s) 2011.

Relevance:

100.00%

Publisher:

Abstract:

Traditional heuristic approaches to the examination timetabling problem normally use a stochastic method during optimization to select the next examination to consider within the neighbourhood search process. This paper presents a technique in which the stochastic method is augmented with information from a weighted list gathered during the initial adaptive construction phase, with the purpose of intelligently directing examination selection. In addition, a reinforcement learning technique is adapted to identify the portions of the weighted list that offer the greatest potential for overall solution improvement. The technique is tested on the 2007 International Timetabling Competition datasets, with solutions generated within the time frame specified by the competition organizers. The results are better than those of the competition winner on seven of the twelve datasets, and competitive on the remaining five. The paper also shows experimentally how reinforcement learning improves upon our previous technique.
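A toy sketch of the idea, under assumptions of mine (the segment granularity and the ε-greedy value update are not from the paper): a roulette wheel biases selection toward heavily weighted examinations, while a small learner estimates which segment of the weighted list yields the most improvement.

```python
import random

def roulette(weights, rng):
    """Pick an index with probability proportional to its weight."""
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= r:
            return i
    return len(weights) - 1

class SegmentBandit:
    """Epsilon-greedy value estimates over equal segments of the weighted list;
    the reward would be the solution improvement observed after perturbing an
    examination drawn from that segment."""
    def __init__(self, n_segments, eps=0.1, alpha=0.2, seed=0):
        self.q = [0.0] * n_segments
        self.eps, self.alpha = eps, alpha
        self.rng = random.Random(seed)

    def pick(self):
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.q))      # explore
        return max(range(len(self.q)), key=lambda i: self.q[i])  # exploit

    def update(self, seg, reward):
        # incremental average: move the estimate toward the observed reward
        self.q[seg] += self.alpha * (reward - self.q[seg])
```

In a full solver, `roulette` would replace uniform examination selection and `SegmentBandit.pick` would choose which slice of the weighted list to draw from.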

Relevance:

100.00%

Publisher:

Abstract:

This thesis addresses batch reinforcement learning methods in robotics. This sub-class of reinforcement learning has shown promising results and has been the focus of recent research. Three contributions are proposed that aim to extend state-of-the-art methods, allowing for the faster and more stable learning process required for learning in robotics. The Q-learning update rule is widely applied, since it allows learning without a model of the environment. However, this update rule is transition-based and does not take advantage of the underlying episodic structure of the collected batch of interactions. The Q-Batch update rule is proposed in this thesis to process experiences along the trajectories collected in the interaction phase. This allows faster propagation of the obtained rewards and penalties, resulting in faster and more robust learning. Non-parametric function approximators, such as Gaussian processes, are also explored. This type of approximator can encode prior knowledge about the latent function in the form of kernels, providing greater flexibility and accuracy. Applying Gaussian processes to batch reinforcement learning yielded higher performance on learning tasks than other function approximators used in the literature. Lastly, in order to extract more information from the experiences collected by the agent, model-learning techniques are incorporated to learn the system dynamics. In this way, the set of collected experiences can be augmented with experiences generated through planning with the learned models. Experiments were carried out mainly in simulation, with some tests on a physical robotic platform. The results show that the proposed approaches outperform classical Fitted Q Iteration.
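The contrast between transition-based and trajectory-based updates can be shown with a small sketch. This is my own simplified reading of the idea, not the thesis's exact Q-Batch rule: each episode is swept backwards, so a terminal reward reaches the start state in a single pass over the batch, whereas a forward transition-by-transition sweep would need one pass per step of the chain.

```python
from collections import defaultdict

def make_q():
    """Tabular Q-function with a default value of 0 for unseen (s, a) pairs."""
    return defaultdict(lambda: defaultdict(float))

def q_batch_update(Q, episodes, alpha=0.5, gamma=0.9):
    """Trajectory-based sweep: process each episode's transitions in reverse,
    so rewards collected late in an episode propagate to earlier states
    within one pass over the batch."""
    for episode in episodes:
        for s, a, r, s_next, done in reversed(episode):
            target = r if done else r + gamma * max(Q[s_next].values(), default=0.0)
            Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

On a three-step chain with a reward only at the end, one backward sweep already gives the initial state a positive value; a forward sweep over the same transitions would leave it at zero.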

Relevance:

100.00%

Publisher:

Abstract:

That humans and animals learn from interaction with the environment is a foundational idea underlying nearly all theories of learning and intelligence. Learning that certain outcomes are associated with specific actions or stimuli (both internal and external) is at the very core of the capacity to adapt behaviour to environmental change. In the present work, appetitive and aversive reinforcement learning paradigms are used to investigate the fronto-striatal loops and behavioural correlates of adaptive and maladaptive reinforcement learning processes, aiming at a deeper understanding of how cortical and subcortical substrates interact with each other and with other brain systems to support learning. By combining a wide variety of neuroscientific approaches, including behavioural and psychophysiological methods, EEG, and neuroimaging techniques, these studies aim to clarify and advance knowledge of the neural bases and computational mechanisms of reinforcement learning, in both normal and neurologically impaired populations.
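The computational core that such paradigms probe can be stated compactly. The following is a minimal Rescorla-Wagner-style sketch of my own, not a model from these studies: the value of a stimulus is updated in proportion to the reward prediction error on each trial.

```python
def rescorla_wagner(rewards, alpha=0.2):
    """Trial-by-trial value learning driven by the reward prediction error
    delta = r - V; returns the (delta, V) trace across trials."""
    v, trace = 0.0, []
    for r in rewards:
        delta = r - v          # prediction error: surprise about the outcome
        v += alpha * delta     # value moves a fraction alpha toward the outcome
        trace.append((delta, v))
    return trace
```

Over repeated rewarded trials the prediction error shrinks toward zero as the value estimate approaches the delivered reward, which is the hallmark behaviour these learning paradigms measure.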

Relevance:

100.00%

Publisher:

Abstract:

Today reinforcement learning has proven remarkably effective across many areas of machine learning, such as games, speech recognition, and many others. We therefore decided to apply reinforcement learning to allocation problems, both because this is a research area not yet studied with this technique and because the formulation of these problems encompasses a broad set of sub-problems with similar characteristics, so that a solution for one of them extends to each of these sub-problems. In this project we built an application called Service Broker which, through reinforcement learning, learns how to distribute the execution of tasks over asynchronous, distributed workers. The analogy is that of a cloud data center, which has internal resources (possibly distributed across the server farm), receives tasks from its clients, and executes them on those resources. The goal of the application, and thus of the data center, is to allocate these tasks so as to minimize the execution cost. Furthermore, in order to test the reinforcement learning agents we developed, an environment (a simulator) was created that made it possible to focus on developing the components the agents need, rather than also having to deal with the implementation details required in a real data center, such as communication with the various nodes and its latency. The results obtained confirmed the theory studied, achieving better performance than some of the classical task-allocation methods.
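As a toy illustration of the broker's learning problem, heavily simplified by me (three workers with fixed, assumed per-task costs, and a stateless ε-greedy learner rather than the full Service Broker agent): the reward is the negative execution cost, so minimizing cost amounts to maximizing return.

```python
import random
from collections import defaultdict

COSTS = [3.0, 1.0, 2.0]   # assumed per-task execution cost of each worker

def train_broker(episodes=200, eps=0.1, alpha=0.3, seed=0):
    """Epsilon-greedy value learning over workers: dispatch a task, observe
    the (negative-cost) reward, and move that worker's estimate toward it."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        if rng.random() < eps:
            w = rng.randrange(len(COSTS))                      # explore
        else:
            w = max(range(len(COSTS)), key=lambda i: q[i])     # exploit
        reward = -COSTS[w]
        q[w] += alpha * (reward - q[w])
    return q
```

After training, the greedy choice settles on the cheapest worker; a real broker would additionally condition on state such as current worker load and queue lengths.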

Relevance:

100.00%

Publisher:

Abstract:

The first part of my work presents a study of an initial "from scratch" solution developed by Andrej Karpathy. Two improvements of mine follow: the first modifies the code of the previous solution directly and introduces, as an additional objective for the network in the early stages of play, the interception of the ball by the paddle, improving the initial training; the second is my own implementation using more complex algorithms, which represent the state of the art on Atari games and yield much faster training of the network.
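For context, the core of such a from-scratch agent is the REINFORCE policy-gradient update. The sketch below is a deliberately tiny stand-in of mine (a two-action task instead of Pong, all names assumed); the auxiliary interception objective described above would correspond to adding a shaping bonus to `r` early in training.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

def reinforce(steps=500, lr=0.1, seed=0):
    """REINFORCE on a toy task: sample an action from a softmax policy and
    nudge the logits along grad log pi(a) scaled by the reward."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]                  # one logit per action
    for _ in range(steps):
        p = softmax(theta)
        a = 0 if rng.random() < p[0] else 1
        r = 1.0 if a == 1 else 0.0      # only action 1 is rewarded
        # grad of log pi(a) wrt theta_k is (1[k == a] - p[k])
        for k in range(2):
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - p[k])
    return softmax(theta)
```

Because unrewarded actions produce no update here, the policy drifts monotonically toward the rewarded action; a shaping bonus simply makes early, sparse-reward episodes contribute gradient as well.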

Relevance:

100.00%

Publisher:

Abstract:

In the economics and game-theory literature there is an open debate on whether anticompetitive behaviour can emerge from algorithms that set market prices automatically. The goal of this thesis is to develop an actor-critic reinforcement learning model with entropy regularization that sets prices in a dynamic oligopolistic competition game with continuous prices. The proposed model consistently exhibits cooperative behaviour supported by punishment mechanisms that discourage deviation from the equilibrium reached at convergence. The behaviour of this model during learning and after convergence also helps interpret the actions taken by tabular Q-learning and other pricing algorithms under similar conditions. The results are robust to variation in the number of competing agents and in the type of deviation from the equilibrium reached at convergence, punishing deviations to higher prices as well.
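To make the ingredients concrete, here is a deliberately stripped-down sketch of mine, not the thesis's model: a single seller against a fixed linear demand curve, a discrete price grid instead of continuous prices, a softmax actor, a running-average baseline as a crude critic, and an explicit entropy-regularization gradient.

```python
import math
import random

PRICES = [0.5, 1.0, 1.5, 2.0]           # assumed discrete price grid

def profit(p):
    return p * max(0.0, 2.0 - p)        # assumed linear demand: quantity = 2 - p

def train(steps=2000, lr=0.05, beta=0.01, seed=0):
    """Entropy-regularized policy gradient: ascend adv * grad log pi plus
    beta times the entropy gradient, which keeps the policy exploratory."""
    rng = random.Random(seed)
    theta = [0.0] * len(PRICES)
    baseline = 0.0                      # running-average reward as the "critic"
    for _ in range(steps):
        m = max(theta)
        exp_t = [math.exp(t - m) for t in theta]
        z = sum(exp_t)
        pi = [x / z for x in exp_t]
        a = rng.choices(range(len(PRICES)), weights=pi)[0]
        r = profit(PRICES[a])
        adv = r - baseline
        baseline += 0.05 * (r - baseline)
        entropy = -sum(p * math.log(p) for p in pi)
        for k in range(len(theta)):
            ind = 1.0 if k == a else 0.0
            grad_pg = adv * (ind - pi[k])                     # policy-gradient term
            grad_ent = -pi[k] * (math.log(pi[k]) + entropy)   # d entropy / d theta_k
            theta[k] += lr * (grad_pg + beta * grad_ent)
    return pi
```

With demand 2 - p, profit peaks at a price of 1.0, so the policy mass should concentrate there; the entropy term slows the collapse to a deterministic price, which in the multi-agent setting is what sustains exploration of punishment strategies.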

Relevance:

100.00%

Publisher:

Abstract:

BACKGROUND: Gastric banding still represents one of the most widely used bariatric procedures. It provides acceptable weight loss in many patients but has frequent long-term complications. Because different types of bands may lead to different results, we designed a randomized study comparing the Lapband® with the SAGB®; we hereby report the long-term results. METHODS: Between December 1998 and June 2002, 180 morbidly obese patients were randomized to the Lapband® or the SAGB®. Weight loss, long-term morbidity, and need for reoperation were evaluated. RESULTS: Long-term weight loss did not differ between the two bands. Patients who maintained their band had an acceptable long-term weight loss of between 50 and 60% EBMIL. In both groups, about half the patients developed long-term complications, with about 50% requiring major redo surgery. There was no difference in the overall rates of long-term complications or failures between the two groups, but patients with a Lapband® were significantly more prone to band slippage/pouch dilatation (13.3 versus 0%, p < 0.001). CONCLUSIONS: Although gastric banding leads to acceptable weight loss in the absence of complications, the long-term complication and major reoperation rates are very high, independently of the type of band used or the operative technique. Gastric banding leads to relatively poor overall long-term results and therefore should not be considered the procedure of choice for the treatment of morbid obesity. Patients should be informed of the limited overall weight loss and the very high complication rates.

Relevance:

100.00%

Publisher:

Abstract:

This interdisciplinary doctoral thesis investigates an (Al)GaN semiconductor surface modification, with the goal of producing an improved interface between the material and the dielectric. Owing to surface states, GaN-based HEMT structures typically show large threshold-voltage shifts. Interface-modification work to date has focused chiefly on removing contaminants such as oxygen or carbon. The wet-chemical surface treatments are carried out before deposition of the dielectric, but they cannot remove the contaminants completely. In this work, modifications of the surface in aqueous solutions, in gases, and in plasma are analysed. Detailed investigations show that the inert (0001) c-plane of the surface hardly reacts; it is mainly the less polar r- and m-planes that do. This can be observed clearly in defect etching as well as in thermal oxidation. Plasma treatments represent a further approach to surface modification. Here the surface termination is changed by nucleophilic substitution with Lewis bases such as fluoride, chloride, or oxide, which increases the electronegativity difference between the metal and the anion relative to the metal-nitrogen bond. At the same time, this increases the potential difference of the Schottky contact. Oxygen and fluorine possess the thermal stability needed to remain on the (Al)GaN surface during a silicon nitride deposition. Oxygen coverage variations at the surface are always reduced to about 6-8% in NH3 at 700 °C, the conditions required for the deposition; such interfaces therefore show no change in threshold-voltage measurements.
In contrast, the fluorinated surface shows completely new electrical behaviour: a new dominant surface donor with fast trapping and detrapping behaviour is found. The energy level of this new, stable donor lies about 0.5 eV deeper in the band gap than the original energy levels of the surface states. Physico-chemical surface and interface analyses with XPS, AES, or SIMS do not allow an unambiguous conclusion as to whether the fluorine is actually still present at the interface after the Si3N4 deposition, or whether it simply induced a more stable surface reconstruction in which it is not itself involved. In either case, the new donor is present at a concentration of 4×10¹³ cm⁻². This density corresponds to a surface concentration of about 1%, which is exactly at the detection limit of the spectroscopic methods. The electrical surface properties, however, are clearly changed by the surface modification and enable an interface that can potentially be optimized further.