921 results for reinforcement classes
Abstract:
This paper presents an improved NSGA-II (Non-Dominated Sorting Genetic Algorithm, version II) that incorporates a parameter-free self-tuning approach based on reinforcement learning, called the Non-Dominated Sorting Genetic Algorithm Based on Reinforcement Learning (NSGA-RL). The proposed method is compared in particular with the classical NSGA-II on a satellite coverage problem. Furthermore, the optimization results are compared with those obtained by other multiobjective optimization methods, and the method has the additional advantage of requiring no time-consuming and complex parameter tuning.
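As an illustration of the non-dominated sorting step that gives NSGA-II its name, the following is a minimal sketch assuming minimization of every objective. It shows only the standard ranking of candidate solutions into Pareto fronts, not the reinforcement-learning self-tuning described in the abstract. The function names and the sample points are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Rank points into fronts of indices; front 0 is the Pareto front."""
    n = len(points)
    dominated = [[] for _ in range(n)]  # dominated[i]: indices that point i dominates
    counts = [0] * n                    # counts[i]: how many points dominate point i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1

    fronts: List[List[int]] = []
    current = [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        following = []
        for i in current:
            for j in dominated[i]:
                counts[j] -= 1          # remove the influence of the front just peeled off
                if counts[j] == 0:
                    following.append(j)
        current = following
    return fronts


# Hypothetical two-objective sample, both objectives minimized.
sample = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
fronts = non_dominated_sort(sample)
```

In this sample the first three points are mutually non-dominated, while (3, 3) and (4, 4) fall into later fronts because (2, 2) dominates both.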
Abstract:
Evidence from appetitive Pavlovian and instrumental conditioning studies suggests that the amygdala is involved in the modulation of responses correlated with motivational states and, therefore, in the modulation of processes probably underlying reinforcement omission effects. The present study aimed to clarify whether the mechanisms related to reinforcement omission effects of different magnitudes depend on the basolateral complex and central nucleus of the amygdala. Rats were trained on a signaled fixed-interval 12 s schedule with a limited hold of 6 s, in which correct responses were always followed by one of two reinforcement magnitudes. Bilateral lesions of the basolateral complex and central nucleus were made after acquisition of stable performance. After postoperative recovery, training was changed from a 100% to a 50% reinforcement schedule. The results showed that lesions of the basolateral complex and central nucleus did not eliminate or reduce reinforcement omission effects, but did interfere with them. After reinforcement omission, responding in both the basolateral complex and central nucleus lesioned groups was higher than in their respective sham-lesioned groups. Thus, the lesioned rats were more sensitive to the omission effect. Moreover, the basolateral complex lesions prevented the magnitude effect on reinforcement omission effects: basolateral complex lesioned rats showed no differential performance following omission of larger and smaller reinforcement magnitudes. Thus, the basolateral complex is involved in incentive processes related to the omission of different reinforcement magnitudes. Therefore, it is possible that reinforcement omission effects are modulated by brain circuitry that involves the amygdala. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
The aim of the present study is to assess factors associated with time spent in class among working college students. Eighty-two working students aged 21 to 26 years participated in this study. They were enrolled in an evening course at the University of Sao Paulo, Brazil. Participants answered a questionnaire on living and working conditions. For seven consecutive days, they wore an actigraph and filled out daily activity diaries (including time spent in classes) and the Karolinska Sleepiness Scale every three hours from waking until bedtime. Linear regression analyses were performed to assess the variables associated with time spent in classes. The results showed that gender, sleep length, excessive sleepiness, alcoholic beverage consumption (during workdays) and working hours were factors associated with time spent in class. Thus, those who spent less time in class were male, slept longer hours, reported excessive sleepiness on Saturdays, worked longer hours, and reported alcohol consumption. The combined effects of long work hours (>40 h/week) and reduced sleep length may affect lifestyles and academic performance. Future studies should look at the adverse health effects induced by reduced sleep duration, even among working students who spent more time attending evening classes.
Abstract:
The reinforcement omission effect (ROE) has been attributed to both motivational and attentional consequences of surprising reinforcement omission. Recent evidence suggests that the basolateral complex of the amygdala is involved in motivational components related to reinforcement value, whereas the central nucleus of the amygdala is involved in the processing of the attentional consequences of surprise. This study was designed to verify whether the mechanisms involved in the ROE depend on the integrity of either the basolateral amygdala complex or central nucleus of the amygdala. The ROE was evaluated in rats with lesions of either the central nucleus or basolateral complex of the amygdala and trained on a fixed-interval schedule procedure (Experiment 1) and fixed-interval with limited hold signaled schedule procedure (Experiment 2). The results of Experiment 1 showed that sham-operated rats and rats with lesions of either the central nucleus or basolateral area displayed the ROE. In contrast, in Experiment 2, subjects with lesions of the central nucleus or basolateral complex of the amygdala exhibited a smaller ROE compared with sham-operated subjects. Thus, the effects of selective lesions of amygdala subregions on the ROE in rats depended on the training procedure. Furthermore, the absence of differences between the lesioned groups in either experiment did not allow the dissociation of attentional or motivational components of the ROE with functions of specific areas of the amygdala. Thus, results did not show a functional double-dissociation between the central nucleus and basolateral area in the ROE.
Abstract:
This paper presents a method for designing concrete membrane elements with an orthogonal reinforcement mesh that are subject to compressive stress. Design methods, in general, define how to quantify the reinforcement necessary to resist the tensile stress and verify whether the compression in the concrete is within the strength limit. If the compression in the membrane is excessive, it is possible to use reinforcement subject to compression. However, there is little information in the literature about how to design reinforcement for these cases. To address this, the paper presents a procedure that uses a model based on Baumann's [1] criteria. The strength limits used herein are those recommended by CEB [3]; however, a model is proposed in which this limit varies according to the tensile strain that occurs perpendicular to the compression. This resistance model is based on concepts proposed by Vecchio and Collins [2].
Abstract:
This thesis deals with the development of a function approximator and its use in methods for learning discrete and continuous actions: 1. A general function approximator, Locally Weighted Interpolating Growing Neural Gas (LWIGNG), is developed on the basis of a Growing Neural Gas (GNG). The topological neighborhood in the neuron structure is used to interpolate between neighboring neurons and to compute the approximation by local weighting. The performance of the approach, in particular with respect to changing target functions and changing input distributions, is demonstrated in various experiments. 2. For learning discrete actions, the LWIGNG method is combined with Q-learning to form the Q-LWIGNG method. This requires modifying the underlying GNG algorithm, because the input data in action learning arrive in a particular order. Q-LWIGNG achieves very good results on the pole-balancing and mountain-car problems and good results on the acrobot problem. 3. For learning continuous actions, a REINFORCE algorithm is combined with LWIGNG to form the ReinforceGNG method. An actor-critic architecture is used to learn from delayed rewards. LWIGNG approximates both the state-value function and the policy, which is represented in the form of situation-dependent parameters of a normal distribution. ReinforceGNG is successfully applied to learning movements for a simulated two-wheeled robot that has to intercept a rolling ball under certain conditions.
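For orientation, the Q-learning component mentioned in point 2 can be sketched in its plain tabular form. This is only the textbook update rule, assuming a small discrete state and action set stored in nested dictionaries. It does not reproduce the LWIGNG approximator, and the state names, action names, learning rate and discount factor are hypothetical.

```python
from typing import Dict

QTable = Dict[str, Dict[str, float]]


def q_update(q: QTable, state: str, action: str, reward: float,
             next_state: str, alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply one Q-learning step and return the new estimate.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())      # greedy value of the successor state
    target = reward + gamma * best_next          # bootstrapped one-step return
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state example.
table: QTable = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_update(table, "s0", "right", reward=1.0, next_state="s1")
```

A function approximator such as the one described in the abstract would replace the dictionary lookup with a learned estimate of Q(s, a).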
Abstract:
We deal with five problems arising in the field of logistics: the Asymmetric TSP (ATSP), the TSP with Time Windows (TSPTW), the VRP with Time Windows (VRPTW), the Multi-Trip VRP (MTVRP), and the Two-Echelon Capacitated VRP (2E-CVRP). The ATSP requires finding a least-cost Hamiltonian tour in a digraph. We survey models and classical relaxations, and describe the most effective exact algorithms from the literature. A survey and analysis of the polynomial formulations is provided. The considered algorithms and formulations are experimentally compared on benchmark instances. The TSPTW requires finding, in a weighted digraph, a least-cost Hamiltonian tour visiting each vertex within a given time window. We propose a new exact method, based on new tour relaxations and dynamic programming. Computational results on benchmark instances show that the proposed algorithm outperforms the state-of-the-art exact methods. In the VRPTW, a fleet of identical capacitated vehicles located at a depot must be optimally routed to supply customers with known demands and time window constraints. Different column generation bounding procedures and an exact algorithm are developed. The new exact method closed four of the five open Solomon instances. The MTVRP is the problem of optimally routing capacitated vehicles located at a depot to supply customers without exceeding maximum driving time constraints. Two set-partitioning-like formulations of the problem are introduced. Lower bounds are derived and embedded into an exact solution method that can solve benchmark instances with up to 120 customers. The 2E-CVRP requires designing the optimal routing plan to deliver goods from a depot to customers by using intermediate depots. The objective is to minimize the sum of routing and handling costs. A new mathematical formulation is introduced. Valid lower bounds and an exact method are derived. Computational results on benchmark instances show that the new exact algorithm outperforms the state-of-the-art exact methods.
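To make the TSPTW concrete, here is a minimal dynamic-programming sketch over (visited set, last vertex) states. It rests on simplifying assumptions that are not taken from the abstract: the cost of a tour is its completion time, waiting at a vertex until its window opens is allowed, and the depot (vertex 0) has no closing time. Under these assumptions an earlier arrival is never worse, so one value per state suffices. It is not the tour-relaxation method proposed in the thesis, and the instance data are hypothetical.

```python
from itertools import combinations
from math import inf
from typing import List, Optional, Tuple


def tsptw_makespan(travel: List[List[float]],
                   windows: List[Tuple[float, float]]) -> Optional[float]:
    """Earliest return time to depot 0 over tours serving every vertex in its window.

    Returns None if no feasible tour exists.
    """
    n = len(travel)
    # best[(mask, j)]: earliest service start at j after serving exactly the
    # customers in mask (depot excluded), with j served last.
    best = {}
    for j in range(1, n):
        start = max(travel[0][j], windows[j][0])   # wait if arriving early
        if start <= windows[j][1]:
            best[(1 << j, j)] = start

    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask ^ (1 << j)
                earliest, latest = windows[j]
                value = inf
                for i in subset:
                    if i == j or (prev_mask, i) not in best:
                        continue
                    start = max(best[(prev_mask, i)] + travel[i][j], earliest)
                    if start <= latest:
                        value = min(value, start)
                if value < inf:
                    best[(mask, j)] = value

    full = (1 << n) - 2                            # every customer, depot bit cleared
    finish = [best[(full, j)] + travel[j][0]
              for j in range(1, n) if (full, j) in best]
    return min(finish) if finish else None


# Hypothetical asymmetric instance with four vertices.
travel = [[0, 2, 9, 10],
          [1, 0, 6, 4],
          [15, 7, 0, 8],
          [6, 3, 12, 0]]
wide = [(0, 100)] * 4
tight = [(0, 100), (0, 3), (0, 100), (0, 100)]     # vertex 1 must be served by time 3
```

With wide windows the sketch reduces to a plain asymmetric TSP. The tight window on vertex 1 forces that vertex to be visited first and rules out the otherwise shortest tour. The number of states grows exponentially with the number of vertices, so this form is only practical for small instances.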
Abstract:
The present thesis is concerned with certain aspects of differential and pseudodifferential operators on infinite dimensional spaces. We aim to generalize classical operator theoretical concepts of pseudodifferential operators on finite dimensional spaces to the infinite dimensional case. At first we summarize some facts about the canonical Gaussian measures on infinite dimensional Hilbert space riggings. Considering the natural unitary group actions in $L^2(H_-,\gamma)$ given by weighted shifts and multiplication with $e^{i\langle t,\cdot\rangle_0}$, we obtain a unitary equivalence $F$ between them. In this sense $F$ can be considered as an abstract Fourier transform. We show that $F$ coincides with the Fourier-Wiener transform. Using the Fourier-Wiener transform we define pseudodifferential operators in Weyl and Kohn-Nirenberg form on our Hilbert space rigging. In the case of this Gaussian measure $\gamma$ we discuss several possible Laplacians, at first the Ornstein-Uhlenbeck operator and then pseudodifferential operators with negative definite symbol. In the second case, these operators are generators of $L^2_\gamma$-sub-Markovian semigroups and $L^2_\gamma$-Dirichlet forms. In 1992 Gramsch, Ueberberg and Wagner described a construction of generalized Hörmander classes by commutator methods. Following this concept and the classical finite dimensional description of $\Psi_{\rho,\delta}^0$ ($0\leq\delta\leq\rho\leq 1$, $\delta<1$) in the $C^*$-algebra $L(L^2)$ by Beals and Cordes, we construct in both cases generalized Hörmander classes, which are $\Psi^*$-algebras. These classes act on a scale of Sobolev spaces generated by our Laplacian. In the case of the Ornstein-Uhlenbeck operator, we prove that a large class of continuous pseudodifferential operators considered by Albeverio and Dalecky in 1998 is contained in our generalized Hörmander class. Furthermore, in the case of a Laplacian with negative definite symbol, we develop a symbolic calculus for our operators. We show some Fredholm criteria for them and prove that these Fredholm operators are hypoelliptic. Moreover, in the finite dimensional case, using the Gaussian measure instead of the Lebesgue measure, the index of these Fredholm operators is still given by Fedosov's formula. Considering an infinite dimensional Heisenberg group rigging, we discuss the connection of some representations of the Heisenberg group to pseudodifferential operators on infinite dimensional spaces. We use these connections to calculate the spectrum of pseudodifferential operators and to construct generalized Hörmander classes given by smooth elements which are spectrally invariant in $L^2(H_-,\gamma)$. Finally, given a topological space $X$ with Borel measure $\mu$, a locally compact group $G$ and a representation $B$ of $G$ in the group of all homeomorphisms of $X$, we construct a Borel measure $\mu_s$ on $X$ which is invariant under $B(G)$.
Abstract:
This thesis addresses the problem of scaling reinforcement learning to high-dimensional and complex tasks. Reinforcement learning here denotes a class of learning methods based on approximate dynamic programming that is used in particular in artificial intelligence and can be employed for the autonomous control of simulated agents or real hardware robots in dynamic and uncertain environments. To this end, regression on samples is used to determine a function that solves an "optimality equation" (Bellman) and from which approximately optimal decisions can be derived. A major obstacle is the dimensionality of the state space, which is often high and therefore poorly suited to traditional grid-based approximation methods. The goal of this thesis is to make reinforcement learning applicable to problems of, in principle, arbitrarily high dimension by means of nonparametric function approximation (more precisely, regularization networks). Regularization networks are a generalization of ordinary basis function networks that parameterize the sought solution through the data, so that the explicit choice of nodes/basis functions is no longer needed and the "curse of dimensionality" can be circumvented for high-dimensional inputs. At the same time, regularization networks are linear approximators, which are technically easy to handle and for which the existing convergence results of reinforcement learning remain valid (unlike, for example, feed-forward neural networks). All these theoretical advantages are offset by a very practical problem: the computational cost of using regularization networks inherently scales as O(n^3), where n is the number of data points. This is particularly problematic because in reinforcement learning the learning process takes place online: the samples are generated by an agent/robot while it interacts with the environment. Adjustments to the solution must therefore be made immediately and at low computational cost. The contribution of this thesis is accordingly divided into two parts. In the first part, we formulate an efficient learning algorithm for regularization networks for solving general regression tasks, tailored specifically to the requirements of online learning. Our approach is based on recursive least squares, but can insert not only new data but also new basis functions into the existing model at constant time cost. This is made possible by the "subset of regressors" approximation, in which the kernel is approximated by a strongly reduced selection of training data, and by a greedy selection procedure that picks these basis elements directly from the data stream at run time. In the second part, we transfer this algorithm to approximate policy evaluation by means of least-squares-based temporal-difference learning, and integrate this building block into an overall system for the autonomous learning of optimal behavior. Overall, we develop a highly data-efficient method that is particularly suitable for learning problems from robotics with continuous and high-dimensional state spaces as well as stochastic state transitions. We do not depend on a model of the environment, work largely independently of the dimension of the state space, achieve convergence with relatively few agent-environment interactions, and, thanks to the efficient online algorithm, can also operate in time-critical real-time applications. We demonstrate the performance of our approach on two realistic and complex application examples: the RoboCup Keepaway problem and the control of a (simulated) octopus tentacle.
Abstract:
The thesis applies ICC (Implicit Computational Complexity) techniques to the probabilistic polynomial complexity classes in order to obtain an implicit characterization of them. The main contribution lies in the implicit characterization of the class PP (Probabilistic Polynomial Time), giving a syntactical characterization of PP and a static complexity analyser able to recognise whether an imperative program computes in probabilistic polynomial time. The thesis is divided into two parts. The first part focuses on solving the problem by creating a prototype functional language (a probabilistic variation of lambda calculus with bounded recursion) that is sound and complete with respect to probabilistic polynomial time. The second part instead reverses the problem and develops a feasible way to verify whether a program written in a prototype imperative programming language runs in probabilistic polynomial time. This thesis is one of the first steps for Implicit Computational Complexity over probabilistic classes. There are still hard open problems to investigate and try to solve. Many theoretical aspects are strongly connected with these topics, and I expect that ICC and probabilistic classes will receive wide attention in the future.
Abstract:
A permutation is said to avoid a pattern if it does not contain any subsequence which is order-isomorphic to it. Donald Knuth, in the first volume of his celebrated book "The Art of Computer Programming", observed that the permutations that can be computed (or, equivalently, sorted) by some particular data structures can be characterized in terms of pattern avoidance. In more recent years, the topic was reopened several times, although often in terms of sortable permutations rather than computable ones. The idea of sorting permutations by using one of Knuth's devices suggests looking for a deterministic procedure that decides, in linear time, whether there exists a sequence of operations that converts a given permutation into the identity. In this thesis we show that, for the stack and the restricted deques, there exists a unique way to implement such a procedure. Moreover, we use these sorting procedures to create new sorting algorithms, and we prove some unexpected commutation properties between these procedures and the base step of bubblesort. We also show that the permutations that can be sorted by a combination of the base steps of bubblesort and its dual can be expressed, once again, in terms of pattern avoidance. In the final chapter we give an alternative proof of some enumerative results, in particular for the classes of permutations that can be sorted by the two restricted deques. It is well known that the permutations that can be sorted through a restricted deque are counted by the Schröder numbers. In the thesis, we show how the deterministic sorting procedures yield a bijection between sortable permutations and Schröder paths.
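For the stack, the deterministic linear-time procedure can be sketched as the classical greedy rule: before pushing an element, pop every smaller element off the stack. Knuth's result is that this succeeds exactly on the permutations avoiding the pattern 231. The sketch below covers the single stack only, not the restricted deques, and the function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence


def stack_sort(perm: Sequence[int]) -> List[int]:
    """Pass perm once through a stack using the greedy pop-before-push rule."""
    stack: List[int] = []
    output: List[int] = []
    for value in perm:
        # Pop everything smaller than the incoming value; it can never be
        # output in the right order once a larger value sits on top of it.
        while stack and stack[-1] < value:
            output.append(stack.pop())
        stack.append(value)
    while stack:
        output.append(stack.pop())
    return output


def is_stack_sortable(perm: Sequence[int]) -> bool:
    """True if the greedy procedure outputs the identity permutation."""
    return stack_sort(perm) == sorted(perm)


def contains_231(perm: Sequence[int]) -> bool:
    """Brute-force check for a subsequence order-isomorphic to 231."""
    n = len(perm)
    return any(perm[k] < perm[i] < perm[j]
               for i in range(n)
               for j in range(i + 1, n)
               for k in range(j + 1, n))


def count_sortable(n: int) -> int:
    """Number of stack-sortable permutations of 1..n (by enumeration)."""
    return sum(is_stack_sortable(p) for p in permutations(range(1, n + 1)))
```

Each element is pushed once and popped once, so the procedure runs in linear time. Enumerating small cases with `count_sortable` gives the Catalan numbers, in the same way that the abstract cites the Schröder numbers for restricted deques.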
Abstract:
The present work belongs to the context of sustainability assessment in the construction field and is aimed at estimating and analyzing the life cycle cost of the existing reinforced concrete bridge “Viadotto delle Capre” over its entire life. This was accomplished through comprehensive data collection and evaluation of the results. In detail, the economic analysis of the project is performed. The work investigates possible design alternatives for maintenance/rehabilitation and end-of-life operations when structural, functional, economic and environmental requirements have to be fulfilled. The economic impact of different design options for the given reinforced concrete bridge has been assessed, whereupon the most economically, structurally and environmentally efficient scenario was chosen. The Integrated Life-Cycle Analysis procedure and Environmental Impact Assessment are also discussed in this work. The aim of this thesis is to illustrate that Life Cycle Cost analysis, as part of the Life Cycle Assessment approach, can be used effectively to drive the design and management strategy of new and existing structures. The final objective of this contribution is to show how an economic analysis can influence decision-making in the definition of the most sustainable design alternatives. Designers can monitor the economic impact of different design strategies in order to identify the most appropriate option.
Abstract:
In this working paper I aim to contribute to future research on social stratification in Africa by examining the theoretical implications and empirical challenges of the concepts "elite" and "middle class". These concepts stem from partly competing theoretical traditions. Moreover, social scientists and historians have used them differently at different times and with reference to different regions. Thus, scholars of Africa have mostly regarded social formations that were categorized as middle class in other parts of the world as elites, and to some extent still do so today. Elite and middle class are, however, not only terms of social science research but also categories of social and political practice. The way people use these terms to describe themselves or others in turn affects social science discourses, and vice versa. The working paper addresses both aspects: the history of the theoretical debates on elite and middle class, and what we can learn from empirical studies about the contested self-positionings of social actors and about their changing conceptions and practices of being elite or middle class. Because I am convinced that future research on social stratification in Africa can benefit enormously from a historically and regionally comparative perspective, this working paper analyzes not only studies of African elites and middle classes but also a wealth of studies on the history of the middle classes in Europe and North America and on the new middle classes in the Global South.
Abstract:
The discovery of binary dendritic events such as local NMDA spikes in dendritic subbranches led to the suggestion that dendritic trees could be computationally equivalent to a 2-layer network of point neurons, with a single output unit represented by the soma, and input units represented by the dendritic branches. Although this interpretation endows a neuron with a high computational power, it is functionally not clear why nature would have preferred the dendritic solution with a single but complex neuron, as opposed to the network solution with many but simple units. We show that the dendritic solution has a distinguished advantage over the network solution when considering different learning tasks. Its key property is that the dendritic branches receive an immediate feedback from the somatic output spike, while in the corresponding network architecture the feedback would require additional backpropagating connections to the input units. Assuming a reinforcement learning scenario we formally derive a learning rule for the synaptic contacts on the individual dendritic trees which depends on the presynaptic activity, the local NMDA spikes, the somatic action potential, and a delayed reinforcement signal. We test the model for two scenarios: the learning of binary classifications and of precise spike timings. We show that the immediate feedback represented by the backpropagating action potential supplies the individual dendritic branches with enough information to efficiently adapt their synapses and to speed up the learning process.