66 results for ACTION SELECTION
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
This paper proposes a field application of a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot in a cable tracking task. The learning system uses a direct policy search method to learn the internal state/action mapping. Policy-only algorithms may suffer from long convergence times when applied to real robots. To speed up the process, the learning phase was carried out in a simulated environment and, in a second step, the policy was transferred to and tested successfully on a real robot. Future work will continue the learning process online on the real robot while it performs the task. We demonstrate the feasibility of the approach with real experiments on the underwater robot ICTINEU AUV.
Abstract:
Autonomous underwater vehicles (AUVs) represent a challenging control problem with complex, noisy dynamics. Both the continuous scientific advances in underwater robotics and the increasing number and complexity of subsea missions call for the automation of submarine processes. This paper proposes a high-level control system for solving the action selection problem of an autonomous robot. The system uses reinforcement learning direct policy search (RLDPS) methods to learn the internal state/action mapping of some behaviors. We demonstrate its feasibility with simulated experiments using the model of our underwater robot URIS in a target following task.
Abstract:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach in RL has been to apply value-function-based algorithms, the system detailed here uses direct policy search methods. Rather than approximating a value function, these methods approximate a policy using an independent function approximator with its own parameters, trying to maximize the expected future reward. The policy-based algorithm presented in this paper is used to learn the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target reaching task.
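The idea of adjusting policy parameters directly, without a value function, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-dimensional target reaching task, the one-parameter logistic policy, the REINFORCE-style update, and the names `p_right`, `run_episode`, and `reinforce`.

```python
import math
import random


def p_right(theta, x, target):
    """Return (P(step right), feature) for a one-parameter logistic policy."""
    f = (target > x) - (target < x)           # sign of the position error
    z = max(-30.0, min(30.0, theta * f))      # clip so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z)), f


def run_episode(theta, rng, start=0, target=3, horizon=10):
    """Sample one episode; return a list of (score, reward) pairs."""
    x, steps = start, []
    for _ in range(horizon):
        p, f = p_right(theta, x, target)
        a = 1 if rng.random() < p else -1     # sample an action from the policy
        score = ((a == 1) - p) * f            # d/dtheta log pi(a | x), exact while |theta| < 30
        x += a
        steps.append((score, -abs(x - target)))  # reward: negative distance to target
    return steps


def reinforce(episodes=2000, lr=0.001, seed=0):
    """Hypothetical REINFORCE loop with a running-mean baseline."""
    rng = random.Random(seed)
    theta, baseline = 0.0, 0.0
    for k in range(1, episodes + 1):
        steps = run_episode(theta, rng)
        ret_to_go, grad = 0.0, 0.0
        for score, reward in reversed(steps):
            ret_to_go += reward               # return from this step onward
            grad += score * (ret_to_go - baseline)
        theta += lr * grad                    # gradient ascent on expected return
        baseline += (ret_to_go - baseline) / k  # running mean of episode returns
    return theta
```

Calling `reinforce()` returns the learned parameter; a positive value means the policy prefers stepping toward the target. The baseline is only a variance-reduction choice for this sketch.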
Abstract:
We conduct a laboratory experiment to study how advice affects the gender gap in the entry into a real-effort tournament. Our experiment is motivated by the concerns raised by approaching the gender gap through affirmative action. Advice is given by subjects who have already had some experience with the participation decision. We show that advice improves the entry decision of subjects, in that forgone earnings due to wrong entry decisions fall significantly. This is mainly driven by significantly increased entry of strong-performing women, who also become significantly more confident, and reduced entry of weak-performing men.
Abstract:
In Drosophila, the insulin-signaling pathway controls some life history traits, such as fertility and lifespan, and it is considered to be the main metabolic pathway involved in establishing adult body size. Several observations concerning variation in body size in the Drosophila genus are suggestive of its adaptive character. Genes encoding proteins in this pathway are, therefore, good candidates to have experienced adaptive changes and to reveal the footprint of positive selection. The Drosophila insulin-like peptides (DILPs) are the ligands that trigger the insulin-signaling cascade. In Drosophila melanogaster, there are several peptides that are structurally similar to the single mammalian insulin peptide. The footprint of recent adaptive changes on nucleotide variation can be unveiled through the analysis of polymorphism and divergence. With this aim, we have surveyed nucleotide sequence variation at the dilp1-7 genes in a natural population of D. melanogaster. The comparison of polymorphism in D. melanogaster and divergence from D. simulans at different functional classes of the dilp genes provided no evidence of adaptive protein evolution after the split of the D. melanogaster and D. simulans lineages. However, our survey of polymorphism at the dilp gene regions of D. melanogaster has provided some evidence for the action of positive selection at or near these genes. The regions encompassing the dilp1-4 genes and the dilp6 gene stand out as likely affected by recent adaptive events.
Abstract:
Markowitz portfolio theory (1952) has induced research into the efficiency of portfolio management. This paper studies existing nonparametric efficiency measurement approaches for single-period portfolio selection from a theoretical perspective and generalises currently used efficiency measures into the full mean-variance space. To this end, we introduce the efficiency improvement possibility function (a variation on the shortage function), study its axiomatic properties in the context of the Markowitz efficient frontier, and establish a link to the indirect mean-variance utility function. This framework allows distinguishing between portfolio efficiency and allocative efficiency. Furthermore, it permits retrieving information about the revealed risk aversion of investors. The efficiency improvement possibility function thus provides a more general framework for gauging the efficiency of portfolio management using nonparametric frontier envelopment methods based on quadratic optimisation.
Abstract:
This comment corrects the errors in the estimation process that appear in Martins (2001). The first error is in the parametric probit estimation, as the previously presented results do not maximize the log-likelihood function. At the global maximum, more variables become significant. As for the semiparametric estimation method, the kernel function used in Martins (2001) can take on both positive and negative values, which implies that the participation probability estimates may lie outside the interval [0,1]. We have solved the problem by applying local smoothing in the kernel estimation, as suggested by Klein and Spady (1993).
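Why a kernel that takes negative values can push a probability estimate outside [0,1] can be seen in a small sketch. The abstract does not say which kernel Martins (2001) used, so the Gaussian-based fourth-order kernel below, the two data points, and the function names are assumptions chosen only for illustration. The sketch shows the problem, not the local-smoothing remedy.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: a second-order kernel, positive everywhere."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fourth_order_kernel(u):
    """Gaussian-based fourth-order kernel; negative whenever |u| > sqrt(3)."""
    return 0.5 * (3.0 - u * u) * gaussian_kernel(u)


def kernel_probability(x0, xs, ys, h, kernel):
    """Kernel-weighted average of binary outcomes ys at the point x0."""
    weights = [kernel((x0 - x) / h) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Two hypothetical observations: a participant at index 0, a non-participant at index 2.
xs, ys = [0.0, 2.0], [1, 0]
p_gauss = kernel_probability(0.0, xs, ys, 1.0, gaussian_kernel)
p_fourth = kernel_probability(0.0, xs, ys, 1.0, fourth_order_kernel)
```

Here `p_gauss` stays strictly between 0 and 1, while `p_fourth` exceeds 1 because the non-participant receives a negative weight, which shrinks the denominator below the numerator.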
Abstract:
We study whether selection affects motivation. In our experiment, subjects first answer a personality questionnaire. They then play a 3-person game. One of the three players decides between an outside option, which assigns her a positive amount but leaves the other two empty-handed, and allowing one of the other two players to distribute a pie. Treatments differ in the procedure by which distributive power is assigned: to a randomly determined or to a knowingly selected partner. Before making her decision, the selecting player could consult the personality questionnaires of the other two players. Results show that knowingly selected players keep less for themselves than randomly selected ones and reward the selecting player more generously.
Abstract:
This paper studies collective choice rules whose outcomes consist of a collection of simultaneous decisions, each one of which is the only concern of some group of individuals in society. The need for such rules arises in different contexts, including the establishment of jurisdictions, the location of multiple public facilities, or the election of representative committees. We define a notion of allocation consistency requiring that each partial aspect of the global decision taken by society as a whole should be ratified by the group of agents who are directly concerned with this particular aspect. We investigate the possibility of designing envy-free, allocation-consistent rules, and we explore whether such rules may also respect the Condorcet criterion.
Abstract:
We study competition in experimental markets in which two incumbents face entry by three other firms. Our treatments vary with respect to three factors: sequential vs. block (simultaneous) entry, the cost functions of entrants, and the amount of time during which incumbents are protected from entry. Before entry, incumbents are able to collude in all cases. When all firms' costs are the same, entry always leads consumer surplus and profits to their equilibrium levels. When entrants are more efficient than incumbents, entry leads consumer surplus to equilibrium. However, total profits remain below equilibrium because the inefficient incumbents produce too much and the efficient entrants produce too little. Market behavior is satisfactory from the consumers' standpoint, but does not yield adequate signals to other potential entrants. These results are not affected by whether entry is simultaneous or sequential. The length of the incumbency phase does have some subtle effects.
Abstract:
This paper analyzes a contest game with heterogeneous players in which heterogeneity could be the consequence of past discrimination. Depending on the normative perception of the heterogeneity, there are two policy options to tackle it: either it is ignored and the contestants are treated equally, or affirmative action is implemented to compensate discriminated players. The consequences of these two policy options are analyzed for a simple two-person contest game, and it is shown that the frequently criticized trade-off between affirmative action and total effort does not exist: instead, affirmative action fosters effort incentives. A generalization to the n-person case and to a case with a partially informed contest designer yields the same result if the participation level is similar under each policy.
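The direction of this result can be illustrated with a textbook two-player lottery (Tullock) contest. The abstract does not state the paper's contest success function, so the functional form, the bias weights `a1` and `a2`, the cost values, and the function name below are all assumptions made for illustration. Player i wins with probability a_i x_i / (a_1 x_1 + a_2 x_2) and pays c_i per unit of effort x_i; the closed-form efforts follow from the two first-order conditions.

```python
def equilibrium_efforts(prize, c1, c2, a1=1.0, a2=1.0):
    """Interior Nash equilibrium efforts of a biased two-player lottery contest.

    Assumed model: payoff_i = prize * a_i*x_i / (a1*x1 + a2*x2) - c_i * x_i.
    """
    denom = (a1 * c2 + a2 * c1) ** 2
    x1 = prize * a1 * a2 * c2 / denom
    x2 = prize * a1 * a2 * c1 / denom
    return x1, x2


# Player 2 has twice the marginal cost of player 1 (hypothetical values).
equal = sum(equilibrium_efforts(1.0, 1.0, 2.0))                 # equal treatment
leveled = sum(equilibrium_efforts(1.0, 1.0, 2.0, a1=1.0, a2=2.0))  # weights offset costs
```

With these numbers, total effort is 1/3 under equal treatment and 3/8 when the weights offset the cost disadvantage, so in this sketch leveling the contest raises total effort rather than lowering it.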
Abstract:
The productive characteristics of migrating individuals (emigrant selection) affect welfare. The empirical estimation of the degree of selection suffers from a lack of complete and nationally representative data. This paper uses a new and better dataset to address both issues: the ENET (Mexican Labor Survey), which identifies emigrants right before they leave and allows a direct comparison to non-migrants. This dataset presents a relevant dichotomy: it shows, on average, negative selection for Mexican emigrants to the United States for the period 2000-2004, together with positive selection in Mexican emigration out of rural Mexico to the United States in the same period. Three theories that could explain this dichotomy are tested. Whereas higher skill prices in Mexico than in the US are enough to explain negative selection in urban Mexico, their combination with network effects and wealth constraints is required to account for positive selection in rural Mexico.
Abstract:
This paper examines the extent to which Mexican emigrants to the United States are negatively selected, that is, have lower skills than individuals who remain in Mexico. Previous studies have been limited by the lack of nationally representative longitudinal data. This one uses a newly available household survey, which identifies emigrants before they leave and allows a direct comparison to non-migrants. I find that, on average, US-bound Mexican emigrants from 2000 to 2004 earn a lower wage and have fewer years of schooling than individuals who remain in Mexico, evidence of negative selection. This supports the original hypothesis of Borjas (AER, 1987) and argues against recent findings, notably those of Chiquiar and Hanson (JPE, 2005). The discrepancy with the latter is primarily due to an under-count of unskilled migrants in US sources and secondarily to the omission of unobservables in their methodology.