997 resultados para Partial Observability


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0,1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple-agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward. ©2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Cyber-physical systems integrate computation, networking, and physical processes. Substantial research challenges exist in the design and verification of such large-scale, distributed sensing, ac- tuation, and control systems. Rapidly improving technology and recent advances in control theory, networked systems, and computer science give us the opportunity to drastically improve our approach to integrated flow of information and cooperative behavior. Current systems rely on text-based spec- ifications and manual design. Using new technology advances, we can create easier, more efficient, and cheaper ways of developing these control systems. This thesis will focus on design considera- tions for system topologies, ways to formally and automatically specify requirements, and methods to synthesize reactive control protocols, all within the context of an aircraft electric power system as a representative application area.

This thesis consists of three complementary parts: synthesis, specification, and design. The first section focuses on the synthesis of central and distributed reactive controllers for an aircraft elec- tric power system. This approach incorporates methodologies from computer science and control. The resulting controllers are correct by construction with respect to system requirements, which are formulated using the specification language of linear temporal logic (LTL). The second section addresses how to formally specify requirements and introduces a domain-specific language for electric power systems. A software tool automatically converts high-level requirements into LTL and synthesizes a controller.

The final sections focus on design space exploration. A design methodology is proposed that uses mixed-integer linear programming to obtain candidate topologies, which are then used to synthesize controllers. The discrete-time control logic is then verified in real-time by two methods: hardware and simulation. Finally, the problem of partial observability and dynamic state estimation is ex- plored. Given a set placement of sensors on an electric power system, measurements from these sensors can be used in conjunction with control logic to infer the state of the system.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Using a deictic representation is believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Target tracking with bearing-only sensors is a challenging problem when the target moves dynamically in complex scenarios. Besides the partial observability of such sensors, they have limited field of views, occlusions can occur, etc. In those cases, cooperative approaches with multiple tracking robots are interesting, but the different sources of uncertain information need to be considered appropriately in order to achieve better estimates. Even though there exist probabilistic filters that can estimate the position of a target dealing with incertainties, bearing-only measurements bring usually additional problems with initialization and data association. In this paper, we propose a multi-robot triangulation method with a dynamic baseline that can triangulate bearing-only measurements in a probabilistic manner to produce 3D observations. This method is combined with a decentralized stochastic filter and used to tackle those initialization and data association issues. The approach is validated with simulations and field experiments where a team of aerial and ground robots with cameras track a dynamic target.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Excessive labor turnover may be considered, to a great extent, an undesirable feature of a given economy. This follows from considerations such as underinvestment in human capital by firms. Understanding the determinants and the evolution of turnover in a particular labor market is therefore of paramount importance, including policy considerations. The present paper proposes an econometric analysis of turnover in the Brazilian labor market, based on a partial observability bivariate probit model. This model considers the interdependence of decisions taken by workers and firms, helping to elucidate the causes that lead each of them to end an employment relationship. The Employment and Unemployment Survey (PED) conducted by the State System of Data Analysis (SEADE) and by the Inter-Union Department of Statistics and Socioeconomic Studies (DIEESE) provides data at the individual worker level, allowing for the estimation of the joint probabilities of decisions to quit or stay on the job on the worker’s side, and to maintain or fire the employee on the firm’s side, during a given time period. The estimated parameters relate these estimated probabilities to the characteristics of workers, job contracts, and to the potential macroeconomic determinants in different time periods. The results confirm the theoretical prediction that the probability of termination of an employment relationship tends to be smaller as the worker acquires specific skills. The results also show that the establishment of a formal employment relationship reduces the probability of a quit decision by the worker, and also the firm’s firing decision in non-industrial sectors. With regard to the evolution of quit probability over time, the results show that an increase in the unemployment rate inhibits quitting, although this tends to wane as the unemployment rate rises.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Enhanced Scan design can significantly improve the fault coverage for two pattern delay tests at the cost of exorbitantly high area overhead. The redundant flip-flops introduced in the scan chains have traditionally only been used to launch the two-pattern delay test inputs, not to capture tests results. This paper presents a new, much lower cost partial Enhanced Scan methodology with both improved controllability and observability. Facilitating observation of some hard to observe internal nodes by capturing their response in the already available and underutilized redundant flip-flops improves delay fault coverage with minimal or almost negligible cost. Experimental results on ISCAS'89 benchmark circuits show significant improvement in TDF fault coverage for this new partial enhance scan methodology.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dans cette thèse, je me suis interessé à l’identification partielle des effets de traitements dans différents modèles de choix discrets avec traitements endogènes. Les modèles d’effets de traitement ont pour but de mesurer l’impact de certaines interventions sur certaines variables d’intérêt. Le type de traitement et la variable d’intérêt peuvent être défini de manière générale afin de pouvoir être appliqué à plusieurs différents contextes. Il y a plusieurs exemples de traitement en économie du travail, de la santé, de l’éducation, ou en organisation industrielle telle que les programmes de formation à l’emploi, les techniques médicales, l’investissement en recherche et développement, ou l’appartenance à un syndicat. La décision d’être traité ou pas n’est généralement pas aléatoire mais est basée sur des choix et des préférences individuelles. Dans un tel contexte, mesurer l’effet du traitement devient problématique car il faut tenir compte du biais de sélection. Plusieurs versions paramétriques de ces modèles ont été largement étudiées dans la littérature, cependant dans les modèles à variation discrète, la paramétrisation est une source importante d’identification. Dans un tel contexte, il est donc difficile de savoir si les résultats empiriques obtenus sont guidés par les données ou par la paramétrisation imposée au modèle. Etant donné, que les formes paramétriques proposées pour ces types de modèles n’ont généralement pas de fondement économique, je propose dans cette thèse de regarder la version nonparamétrique de ces modèles. Ceci permettra donc de proposer des politiques économiques plus robustes. La principale difficulté dans l’identification nonparamétrique de fonctions structurelles, est le fait que la structure suggérée ne permet pas d’identifier un unique processus générateur des données et ceci peut être du soit à la présence d’équilibres multiples ou soit à des contraintes sur les observables. Dans de telles situations, les méthodes d’identifications traditionnelles deviennent inapplicable d’où le récent développement de la littérature sur l’identification dans les modèles incomplets. Cette littérature porte une attention particuliere à l’identification de l’ensemble des fonctions structurelles d’intérêt qui sont compatibles avec la vraie distribution des données, cet ensemble est appelé : l’ensemble identifié. Par conséquent, dans le premier chapitre de la thèse, je caractérise l’ensemble identifié pour les effets de traitements dans le modèle triangulaire binaire. Dans le second chapitre, je considère le modèle de Roy discret. Je caractérise l’ensemble identifié pour les effets de traitements dans un modèle de choix de secteur lorsque la variable d’intérêt est discrète. Les hypothèses de sélection du secteur comprennent le choix de sélection simple, étendu et généralisé de Roy. Dans le dernier chapitre, je considère un modèle à variable dépendante binaire avec plusieurs dimensions d’hétérogéneité, tels que les jeux d’entrées ou de participation. je caractérise l’ensemble identifié pour les fonctions de profits des firmes dans un jeux avec deux firmes et à information complète. Dans tout les chapitres, l’ensemble identifié des fonctions d’intérêt sont écrites sous formes de bornes et assez simple pour être estimées à partir des méthodes d’inférence existantes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Reduced order multi-functional observer design for multi-input multi-utput (MIMO) linear time-invariant (LTI) systems with constant delayed inputs is studied. This research is useful in the input estimation of LTI systems with actuator delay, as well as system monitoring and fault detection of these systems. Two approaches for designing an asymptotically stable functional observer for the system are proposed: delay-dependent and delay-free. The delay-dependent observer is infinite-dimensional, while the delay-free structure is finite-dimensional. Moreover, since the delay-free observer does not require any information on the time delay, it is more practical in real applications. However, the delay-dependent observer contains less restrictive assumptions and covers more variety of systems. The proposed observer design schemes are novel, simple to implement, and have improved numerical features compared to some of the other available approaches to design (unknown-input) functional observers. In addition, the proposed observers usually possess lower order than ordinary Luenberger observers, and the design schemes do not need the observability or detectability requirements of the system. The necessary and sufficient conditions of the existence of an asymptoticobserver in each scenario are explored. The extensions of the proposed observers to systems with multiple delayed-inputs are also discussed. Several numerical examples and simulation results are employed to support our theories.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The accuracy of data derived from linked-segment models depends on how well the system has been represented. Previous investigations describing the gait of persons with partial foot amputation did not account for the unique anthropometry of the residuum or the inclusion of a prosthesis and footwear in the model and, as such, are likely to have underestimated the magnitude of the peak joint moments and powers. This investigation determined the effect of inaccuracies in the anthropometric input data on the kinetics of gait. Toward this end, a geometric model was developed and validated to estimate body segment parameters of various intact and partial feet. These data were then incorporated into customized linked-segment models, and the kinetic data were compared with that obtained from conventional models. Results indicate that accurate modeling increased the magnitude of the peak hip and knee joint moments and powers during terminal swing. Conventional inverse dynamic models are sufficiently accurate for research questions relating to stance phase. More accurate models that account for the anthropometry of the residuum, prosthesis, and footwear better reflect the work of the hip extensors and knee flexors to decelerate the limb during terminal swing phase.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Examined the social adaptation of 32 children in grades 3–6 with mild intellectual disability: 13 Ss were partially integrated into regular primary school classes and 19 Ss were full-time in separate classes. Sociometric status was assessed using best friend and play rating measures. Consistent with previous research, children with intellectual disability were less socially accepted than were a matched group of 32 children with no learning disabilities. Children in partially integrated classes received more play nominations than those in separate classes, but had no greater acceptance as a best friend. On teachers' reports, disabled children had higher levels of inappropriate social behaviours, but there was no significant difference in appropriate behaviours. Self-assessments by integrated children were more negative than those by children in separate classes, and their peer-relationship satisfaction was lower. Ratings by disabled children of their satisfaction with peer relationships were associated with ratings of appropriate social skills by themselves and their teachers, and with self-ratings of negative behaviour. The study confirmed that partial integration can have negative consequences for children with an intellectual disability.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we consider the numerical solution of a fractional partial differential equation with Riesz space fractional derivatives (FPDE-RSFD) on a finite domain. Two types of FPDE-RSFD are considered: the Riesz fractional diffusion equation (RFDE) and the Riesz fractional advection–dispersion equation (RFADE). The RFDE is obtained from the standard diffusion equation by replacing the second-order space derivative with the Riesz fractional derivative of order αset membership, variant(1,2]. The RFADE is obtained from the standard advection–dispersion equation by replacing the first-order and second-order space derivatives with the Riesz fractional derivatives of order βset membership, variant(0,1) and of order αset membership, variant(1,2], respectively. Firstly, analytic solutions of both the RFDE and RFADE are derived. Secondly, three numerical methods are provided to deal with the Riesz space fractional derivatives, namely, the L1/L2-approximation method, the standard/shifted Grünwald method, and the matrix transform method (MTM). Thirdly, the RFDE and RFADE are transformed into a system of ordinary differential equations, which is then solved by the method of lines. Finally, numerical results are given, which demonstrate the effectiveness and convergence of the three numerical methods.