5 results for Policy reference framework

in ArchiMeD - Elektronische Publikationen der Universität Mainz - Germany


Relevance:

80.00%

Abstract:

This thesis addresses the problem of scaling reinforcement learning to high-dimensional and complex tasks. Reinforcement learning here denotes a class of learning methods based on approximate dynamic programming that is used particularly in artificial intelligence and can serve for the autonomous control of simulated agents or real hardware robots in dynamic and uncertain environments. To this end, regression on samples is used to determine a function that solves an "optimality equation" (Bellman) and from which approximately optimal decisions can be derived. A major hurdle is the dimensionality of the state space, which is often high and therefore poorly suited to traditional grid-based approximation methods. The goal of this thesis is to make reinforcement learning applicable to high-dimensional problems (in principle of arbitrary dimension) by means of nonparametric function approximation (more precisely, regularization networks). Regularization networks are a generalization of ordinary basis function networks that parameterize the sought solution through the data, so that the explicit choice of nodes/basis functions is no longer needed and the "curse of dimensionality" can be circumvented for high-dimensional inputs. At the same time, regularization networks are also linear approximators, which are technically easy to handle and for which the existing convergence results of reinforcement learning remain valid (unlike, for example, feed-forward neural networks). Set against all these theoretical advantages, however, is a very practical problem: the computational cost of using regularization networks inherently scales as O(n^3), where n is the number of data points.
This is especially problematic because in reinforcement learning the learning process takes place online: the samples are generated by an agent/robot while it interacts with the environment. Adjustments to the solution must therefore be made immediately and with little computational effort. The contribution of this thesis is accordingly divided into two parts. In the first part, we formulate an efficient learning algorithm for regularization networks for solving general regression tasks, tailored specifically to the requirements of online learning. Our approach is based on the procedure of recursive least squares, but can insert not only new data but also new basis functions into the existing model at constant time cost. This is made possible by the "subset of regressors" approximation, in which the kernel is approximated by a strongly reduced selection of training data, and by a greedy selection procedure that picks these basis elements directly from the data stream at run time. In the second part, we transfer this algorithm to approximate policy evaluation by means of least-squares-based temporal-difference learning and integrate this building block into an overall system for the autonomous learning of optimal behavior. Overall, we develop a highly data-efficient method that is particularly suited to learning problems from robotics with continuous and high-dimensional state spaces and stochastic state transitions. We do not depend on a model of the environment, work largely independently of the dimension of the state space, achieve convergence with relatively few agent-environment interactions, and, thanks to the efficient online algorithm, can also operate in time-critical real-time applications.
We demonstrate the performance of our approach on two realistic and complex application examples: the RoboCup Keepaway problem and the control of a (simulated) octopus tentacle.
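
As a rough illustration of the recursive least-squares idea that the abstract builds on, the following minimal sketch updates a plain linear model one sample at a time. It is not the thesis algorithm: it omits the kernel, the "subset of regressors" approximation, and the greedy basis selection, and the names `rls_update`, `fit_stream`, and `prior_scale` are assumptions introduced here for illustration only.

```python
def rls_update(w, P, x, y):
    """One recursive least-squares step for a linear model y ~ w . x.

    w: current weights, P: current inverse-covariance estimate (symmetric),
    x: new input vector, y: new target. Returns the updated (w, P).
    """
    n = len(x)
    # Px = P x; because P is symmetric this is also (x^T P)^T.
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    gain = [Px[i] / denom for i in range(n)]
    error = y - sum(w[i] * x[i] for i in range(n))
    w_new = [w[i] + gain[i] * error for i in range(n)]
    # Sherman-Morrison rank-one update: P - (P x x^T P) / (1 + x^T P x).
    P_new = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return w_new, P_new


def fit_stream(samples, dim, prior_scale=1e6):
    """Process (x, y) pairs one at a time; prior_scale is a hypothetical default."""
    w = [0.0] * dim
    P = [[prior_scale if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for x, y in samples:
        w, P = rls_update(w, P, x, y)
    return w
```

Each update costs O(d^2) for d features, regardless of how many samples have already been seen, which is the property that makes this family of methods attractive for online learning.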

Relevance:

30.00%

Abstract:

In this thesis we further develop the functional renormalization group (RG) approach to quantum field theory (QFT) based on the effective average action (EAA) and on the exact flow equation that it satisfies. The EAA is a generalization of the standard effective action that interpolates smoothly between the bare action for k → ∞ and the standard effective action for k → 0. In this way, the problem of performing the functional integral is converted into the problem of integrating the exact flow of the EAA from the UV to the IR. The EAA formalism deals naturally with several different aspects of a QFT. One aspect is related to the discovery of non-Gaussian fixed points of the RG flow that can be used to construct continuum limits. In particular, the EAA framework is a useful setting to search for Asymptotically Safe theories, i.e. theories valid up to arbitrarily high energies. A second aspect in which the EAA proves useful is non-perturbative calculations. In fact, the exact flow that it satisfies is a valuable starting point for devising new approximation schemes. In the first part of this thesis we review and extend the formalism; in particular, we derive the exact RG flow equation for the EAA and the related hierarchy of coupled flow equations for the proper vertices. We show how standard perturbation theory emerges as a particular way to iteratively solve the flow equation, if the starting point is the bare action. Next, we explore both technical and conceptual issues by means of three different applications of the formalism: to QED, to general non-linear sigma models (NLσM), and to matter fields on curved spacetimes. In the main part of this thesis we construct the EAA for non-abelian gauge theories and for quantum Einstein gravity (QEG), using the background field method to implement the coarse-graining procedure in a gauge-invariant way.
We propose a new truncation scheme where the EAA is expanded in powers of the curvature or field strength. Crucial to the practical use of this expansion is the development of new techniques to manage functional traces, such as the algorithm proposed in this thesis. This allows one to project the flow of all terms in the EAA which are analytic in the fields. As an application we show how the low-energy effective action for quantum gravity emerges as the result of integrating the RG flow. In any treatment of theories with local symmetries that introduces a reference scale, the question of preserving gauge invariance along the flow emerges as predominant. In the EAA framework this problem is dealt with using the background field formalism. This comes at the cost of enlarging the theory space where the EAA lives to the space of functionals of both fluctuation and background fields. In this thesis, we study how the identities dictated by the symmetries are modified by the introduction of the cutoff, and we study so-called bimetric truncations of the EAA that contain both fluctuation and background couplings. In particular, we confirm the existence of a non-Gaussian fixed point for QEG, which is at the heart of the Asymptotic Safety scenario in quantum gravity, in the enlarged bimetric theory space where the running of the cosmological constant and of Newton's constant is influenced by fluctuation couplings.
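
For orientation, the exact flow equation mentioned in the abstract is commonly written, for a generic field φ, in the form below. The notation is an assumption based on standard convention rather than on the abstract itself: Γ_k is the EAA, Γ_k^{(2)} its second functional derivative, R_k the cutoff kernel, and t = ln k. The gauge-theory and gravity versions treated in the thesis carry additional structure that is not shown here.

```latex
\partial_t \Gamma_k[\varphi]
  = \frac{1}{2}\,\mathrm{Tr}\!\left[
      \left( \Gamma_k^{(2)}[\varphi] + R_k \right)^{-1} \partial_t R_k
    \right],
\qquad t = \ln k .
```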

Relevance:

30.00%

Abstract:

This dissertation consists of six chapters and contributes to research in the fields of financial market policy and monetary policy. The second chapter shows the interrelation between money market tensions and the stability of the financial system. Drawing on the theoretical literature, various determinants of an aggregate liquidity demand function are presented. The third chapter examines the information content of the results of the main refinancing operations for the European money market. Our results show that since the financial crisis the information content of the main refinancing operations has changed in two respects. In the fourth chapter we examine the effectiveness of monetary policy in steering European money market interest rates during the financial crisis. The results point to a considerable divergence between interest rates and expectations about future monetary policy. We further find that the ECB's unconventional measures are responsible for a decline in Euribor rates of up to 60 basis points. The fifth chapter deals with the functioning of the special monetary policy instruments of the Swiss National Bank.

Relevance:

30.00%

Abstract:

Coupled-cluster (CC) theory is one of the most successful approaches in high-accuracy quantum chemistry. The present thesis makes a number of contributions to the determination of molecular properties and excitation energies within the CC framework. The multireference CC (MRCC) method proposed by Mukherjee and coworkers (Mk-MRCC) has been benchmarked within the singles and doubles approximation (Mk-MRCCSD) for molecular equilibrium structures. It is demonstrated that Mk-MRCCSD yields reliable results for multireference cases where single-reference CC methods fail. At the same time, the present work also illustrates that Mk-MRCC still suffers from a number of theoretical problems and sometimes gives rise to results of unsatisfactory accuracy. To determine polarizability tensors and excitation spectra in the MRCC framework, the Mk-MRCC linear-response function has been derived together with the corresponding linear-response equations. Pilot applications show that Mk-MRCC linear-response theory suffers from a severe problem when applied to the calculation of dynamic properties and excitation energies: The Mk-MRCC sufficiency conditions give rise to a redundancy in the Mk-MRCC Jacobian matrix, which entails an artificial splitting of certain excited states. This finding has established a new paradigm in MRCC theory, namely that a convincing method should not only yield accurate energies, but ought to allow for the reliable calculation of dynamic properties as well. In the context of single-reference CC theory, an analytic expression for the dipole Hessian matrix, a third-order quantity relevant to infrared spectroscopy, has been derived and implemented within the CC singles and doubles approximation. The advantages of analytic derivatives over numerical differentiation schemes are demonstrated in some pilot applications.

Relevance:

30.00%

Abstract:

The study opens with an introduction that sets out its objectives. The second part describes the physical setting of the study area and the climatic characteristics of Libya. In the third part, observed temporal and spatial climate change in Libya was investigated through the trends of temperature, precipitation, relative humidity and cloud amount over the periods 1946-2000, 1946-1975, and 1976-2000, and the results were compared with those at the global scale. The fourth part identified the natural and human causes of climate change, concentrating on the greenhouse effect. The potential impacts of climate change on Libya were examined in the fifth chapter. As a case study, desertification of Jifara Plain was studied in the sixth part. In the seventh chapter, projections and mitigation of climate change and desertification were discussed. Finally, the main results and recommendations of the study were summarized.
To meet the objectives outlined above, the following methods and approaches were used: a simple linear regression analysis was computed to detect the trends of climatic parameters over time; a trend test based on a trend-to-noise ratio was applied to detect linear or non-linear trends; the non-parametric Mann-Kendall test for trend was used to reveal the behavior of the trends and their significance; PCA was applied to construct the all-Libya trends of climatic parameters; the aridity index after Walter-Lieth was used to identify humid and arid months in Libya; the correlation coefficient (after Pearson) was used to detect teleconnections linking sunspot numbers, NAOI, SOI, and GHGs to global warming and climate change in Libya; the aridity index after De Martonne was used to elaborate the trends of aridity in Jifara Plain; Geographical Information System and Remote Sensing techniques were applied to clarify the illustrations and to monitor desertification of Jifara Plain using the available satellite images MSS, TM, ETM+ and Shuttle Radar Topography Mission (SRTM). The results are explained by 88 tables, 96 figures and 10 photos. Temporal and spatial temperature changes in Libya indicated remarkably different annual and seasonal trends over the long observation period 1946-2000 and the short observation periods 1946-1975 and 1976-2000. Trends of mean annual temperature were positive at all study stations except one from 1946-2000, negative trends prevailed at most stations from 1946-1975, while strongly positive trends were computed at all study stations from 1976-2000, corresponding with the global warming trend. Positive trends of mean minimum temperatures were observed at all reference stations from 1946-2000 and 1976-2000, while negative trends prevailed at most stations over the period 1946-1975.
For mean maximum temperature, positive trends were shown from 1946-2000 and from 1976-2000 at most stations, while most trends were negative from 1946-1975. Minimum temperatures increased at roughly twice the rate of maximum temperatures or more at most stations. With respect to seasonal temperature, warming mostly occurred in summer and autumn, in contrast to the global observations, which identify warming mostly in winter and spring in both study periods. Precipitation across Libya is characterized by scanty and sporadic totals, as well as high intensities and very high spatial and temporal variability. From 1946-2000, large inter-annual and intra-annual variability was observed. Positive trends of annual precipitation totals were observed from 1946-2000, and negative trends from 1976-2000 at most stations. Variability of seasonal precipitation over Libya is more striking from 1976-2000 than from 1951-1975, indicating a growing magnitude of climate change in more recent times. Negative trends of mean annual relative humidity were computed at eight stations, while positive trends prevailed at seven stations from 1946-2000. For the short observation period 1976-2000, positive trends were computed at most stations. Annual cloud amount totals decreased at most study stations in Libya over both long and short periods. Remarkably large spatial variations of climate change were observed from north to south over Libya. Causes of climate change were discussed, showing a high correlation between increasing temperature over Libya and CO2 emissions; a weakly positive correlation between precipitation and the North Atlantic Oscillation index; a negative correlation between temperature and sunspot numbers; and a negative correlation between precipitation over Libya and the Southern Oscillation Index. The years 1992 and 1993 were shown to be the coldest of the 1990s, resulting from the eruption of Mount Pinatubo in 1991.
Libya is affected by climate change in many ways, in particular in crop production and food security, water resources, human health, population settlement and biodiversity. But the effects of climate change depend on its magnitude and the rate at which it occurs. Jifara Plain, located in northwestern Libya, has been seriously exposed to desertification as a result of climate change, landforms, overgrazing, over-cultivation and population growth. Soils have been degraded, vegetation cover has disappeared, and groundwater wells have been drying up in many parts. The effect of desertification on Jifara Plain appears through reduced soil fertility and crop productivity, leading to long-term declines in agricultural yields, livestock yields, plant standing biomass, and plant biodiversity. Desertification also has significant implications for the livestock industry and the national economy. Desertification accelerates migration from rural and nomadic areas to urban areas as the land cannot support the original inhabitants. In the absence of major shifts in policy, economic growth, energy prices, and consumer trends, climate change in Libya and desertification of Jifara Plain are expected to continue in the future. Libya has cooperated with the United Nations and other international organizations. It has signed and ratified a number of international and regional agreements which effectively established a policy framework for actions to mitigate climate change and combat desertification. Libya has implemented several laws and legislative acts, with a number of ancillary and supplementary rules for regulation. Despite the current efforts and ongoing projects being undertaken in Libya in the field of climate change and desertification, urgent actions and projects are needed to mitigate climate change and combat desertification in the near future.
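
Two of the methods named in the abstract above, the Mann-Kendall trend test and the De Martonne aridity index, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's own implementation: the Mann-Kendall variance here omits the correction for tied values, the aridity index uses the annual form P / (T + 10), and the function names are introduced here for illustration.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, p): the S statistic, the normal-approximation Z score,
    and the two-sided p-value.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p


def de_martonne_index(annual_precip_mm, mean_annual_temp_c):
    """Annual De Martonne aridity index: P / (T + 10); lower means more arid."""
    return annual_precip_mm / (mean_annual_temp_c + 10.0)
```

A positive Z with a small p-value indicates a significant upward trend, a negative Z a downward one. Real station series often contain ties, in which case the variance term needs the usual tie correction.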