Biblioteca Digital

34 resultados para Regret.

em Queensland University of Technology - ePrints Archive

A Stochastic View of Optimal Regret through Minimax Duality

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation, since it can be viewed as the gap in Jensen's inequality for a concave functional--the minimizer over the player's actions of expected loss--defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary. Peter L. Bartlett, Alexander Rakhlin

Closing the gap between bandit and full-information online optimization : high-probability regret bound

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We demonstrate a modification of the algorithm of Dani et al for the online linear optimization problem in the bandit setting, which allows us to achieve an O( \sqrt{T ln T} ) regret bound in high probability against an adaptive adversary, as opposed to the in expectation result against an oblivious adversary of Dani et al. We obtain the same dependence on the dimension as that exhibited by Dani et al. The results of this paper rest firmly on those of Dani et al and the remarkable technique of Auer et al for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.

Optimistic linear programming gives logarithmic regret for irreducible MDPs

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). OLP uses its experience so far to estimate the MDP. It chooses actions by optimistically maximizing estimated future rewards over a set of next-state transition probabilities that are close to the estimates, a computation that corresponds to solving linear programs. We show that the total expected reward obtained by OLP up to time T is within C(P) log T of the reward obtained by the optimal policy, where C(P) is an explicit, MDP-dependent constant. OLP is closely related to an algorithm proposed by Burnetas and Katehakis with four key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities, the proof of the regret bound is simpler, but our regret bound is a constant factor larger than the regret of their algorithm. OLP is also similar in flavor to an algorithm recently proposed by Auer and Ortner. But OLP is simpler and its regret bound has a better dependence on the size of the MDP.

High-probability regret bounds for bandit online linear optimization

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O ∗ ( √ T) against an adaptive adversary. This improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n 3/2) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.

Self-gifting guilt : an examination of self-gifting motivations and post-purchase regret

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Purpose Self-gifting is a performative process in which consumers purchase products for themselves. The literature to date remains silent on a determination and connection between the extents of post-purchase regret resulting from self-gifting behavior. The purpose of this paper is to examine identification and connection of self-gifting antecedents, self-gifting and the effect on post purchase regret. Design/methodology/approach This study claims the two antecedents of hedonistic shopping and indulgence drive self-gifting behaviors and the attendant regret. A total of 307 shoppers responded to a series of statements concerning the relationships between antecedents of self-gifting behavior and the effect on post-purchase regret. Self-gifting is a multi-dimensional construct, consisting of therapeutic, celebratory, reward and hedonistic imports. Confirmatory factor analysis and AMOS path modeling enabled examination of relationships between the consumer traits of hedonistic shopping and indulgence and the four self-gifting concepts. Findings Hedonic and indulgent shoppers engage in self-gifting for different reasons. A strong and positive relationship was identified between hedonic shoppers and reward, hedonic, therapeutic and celebratory self-gift motivations. hedonic shoppers aligned with indulgent shoppers who also engaged the four self-gifting concepts. The only regret concerning purchase of self-gifts was evident in the therapeutic and celebratory self-gift motivations. Research limitations/implications A major limitation was the age range specification of 18 to 45 years which meant the omission of older generations of regular and experienced shoppers. This study emphasizes the importance of variations in self-gift behaviors and of post-purchase consumer regret. Originality/value This research is the first examination of an hedonic attitude to shopping and indulgent antecedents to self-gift purchasing, the concepts of self-gift motivations and their effect on post-purchase regret.

The Pareto regret frontier

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each finite horizon and asymptotically.

Blackwell approachability and no-regret learning are equivalent

Relevância:

20.00% 20.00%

Publicador:

Examining psychosocial influences on speeding in Australian and Chinese contexts : a social learning approach

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Speeding remains a significant contributing factor to road trauma internationally, despite increasingly sophisticated speed management strategies being adopted around the world. Increases in travel speed are associated with increases in crash risk and crash severity. As speed choice is a voluntary behaviour, driver perceptions are important to our understanding of speeding and, importantly, to designing effective behavioural countermeasures. The four studies conducted in this program of research represent a comprehensive approach to examining psychosocial influences on driving speeds in two countries that are at very different levels of road safety development: Australia and China. Akers’ social learning theory (SLT) was selected as the theoretical framework underpinning this research and guided the development of key research hypotheses. This theory was chosen because of its ability to encompass psychological, sociological, and criminological perspectives in understanding behaviour, each of which has relevance to speeding. A mixed-method design was used to explore the personal, social, and legal influences on speeding among car drivers in Queensland (Australia) and Beijing (China). Study 1 was a qualitative exploration, via focus group interviews, of speeding among 67 car drivers recruited from south east Queensland. Participants were assigned to groups based on their age and gender, and additionally, according to whether they self-identified as speeding excessively or rarely. This study aimed to elicit information about how drivers conceptualise speeding as well as the social and legal influences on driving speeds. The findings revealed a wide variety of reasons and circumstances that appear to be used as personal justifications for exceeding speed limits. Driver perceptions of speeding as personally and socially acceptable, as well as safe and necessary were common. Perceptions of an absence of danger associated with faster driving speeds were evident, particularly with respect to driving alone. An important distinction between the speed-based groups related to the attention given to the driving task. Rare speeders expressed strong beliefs about the need to be mindful of safety (self and others) while excessive speeders referred to the driving task as automatic, an absent-minded endeavour, and to speeding as a necessity in order to remain alert and reduce boredom. For many drivers in this study, compliance with speed limits was expressed as discretionary rather than mandatory. Social factors, such as peer and parental influence were widely discussed in Study 1 and perceptions of widespread community acceptance of speeding were noted. In some instances, the perception that ‘everybody speeds’ appeared to act as one rationale for the need to raise speed limits. Self-presentation, or wanting to project a positive image of self was noted, particularly with respect to concealing speeding infringements from others to protect one’s image as a trustworthy and safe driver. The influence of legal factors was also evident. Legal sanctions do not appear to influence all drivers to the same extent. For instance, fear of apprehension appeared to play a role in reducing speeding for many, although previous experiences of detection and legal sanctions seemed to have had limited influence on reducing speeding among some drivers. Disregard for sanctions (e.g., driving while suspended), fraudulent demerit point use, and other strategies to avoid detection and punishment were widely and openly discussed. In Study 2, 833 drivers were recruited from roadside service stations in metropolitan and regional locations in Queensland. A quantitative research strategy assessed the relative contribution of personal, social, and legal factors to recent and future self-reported speeding (i.e., frequency of speeding and intentions to speed in the future). Multivariate analyses examining a range of factors drawn from SLT revealed that factors including self-identity (i.e., identifying as someone who speeds), favourable definitions (attitudes) towards speeding, personal experiences of avoiding detection and punishment for speeding, and perceptions of family and friends as accepting of speeding were all significantly associated with greater self-reported speeding. Study 3 was an exploratory, qualitative investigation of psychosocial factors associated with speeding among 35 Chinese drivers who were recruited from the membership of a motoring organisation and a university in Beijing. Six focus groups were conducted to explore similar issues to those examined in Study 1. The findings of Study 3 revealed many similarities with respect to the themes that arose in Australia. For example, there were similarities regarding personal justifications for speeding, such as the perception that posted limits are unreasonably low, the belief that individual drivers are able to determine safe travel speeds according to personal comfort with driving fast, and the belief that drivers possess adequate skills to control a vehicle at high speed. Strategies to avoid detection and punishment were also noted, though they appeared more widespread in China and also appeared, in some cases, to involve the use of a third party, a topic that was not reported by Australian drivers. Additionally, higher perceived enforcement tolerance thresholds were discussed by Chinese participants. Overall, the findings indicated perceptions of a high degree of community acceptance of speeding and a perceived lack of risk associated with speeds that were well above posted speed limits. Study 4 extended the exploratory research phase in China with a quantitative investigation involving 299 car drivers recruited from car washes in Beijing. Results revealed a relatively inexperienced sample with less than 5 years driving experience, on average. One third of participants perceived that the certainty of penalties when apprehended was low and a similar proportion of Chinese participants reported having previously avoided legal penalties when apprehended for speeding. Approximately half of the sample reported that legal penalties for speeding were ‘minimally to not at all’ severe. Multivariate analyses revealed that past experiences of avoiding detection and punishment for speeding, as well as favourable attitudes towards speeding, and perceptions of strong community acceptance of speeding were most strongly associated with greater self-reported speeding in the Chinese sample. Overall, the results of this research make several important theoretical contributions to the road safety literature. Akers’ social learning theory was found to be robust across cultural contexts with respect to speeding; similar amounts of variance were explained in self-reported speeding in the quantitative studies conducted in Australia and China. Historically, SLT was devised as a theory of deviance and posits that deviance and conformity are learned in the same way, with the balance of influence stemming from the ways in which behaviour is rewarded and punished (Akers, 1998). This perspective suggests that those who speed and those who do not are influenced by the same mechanisms. The inclusion of drivers from both ends of the ‘speeding spectrum’ in Study 1 provided an opportunity to examine the wider utility of SLT across the full range of the behaviour. One may question the use of a theory of deviance to investigate speeding, a behaviour that could, arguably, be described as socially acceptable and prevalent. However, SLT seemed particularly relevant to investigating speeding because of its inclusion of association, imitation, and reinforcement variables which reflect the breadth of factors already found to be potentially influential on driving speeds. In addition, driving is a learned behaviour requiring observation, guidance, and practice. Thus, the reinforcement and imitation concepts are particularly relevant to this behaviour. Finally, current speed management practices are largely enforcement-based and rely on the principles of behavioural reinforcement captured within the reinforcement component of SLT. Thus, the application of SLT to a behaviour such as speeding offers promise in advancing our understanding of the factors that influence speeding, as well as extending our knowledge of the application of SLT. Moreover, SLT could act as a valuable theoretical framework with which to examine other illegal driving behaviours that may not necessarily be seen as deviant by the community (e.g., mobile phone use while driving). This research also made unique contributions to advancing our understanding of the key components and the overall structure of Akers’ social learning theory. The broader SLT literature is lacking in terms of a thorough structural understanding of the component parts of the theory. For instance, debate exists regarding the relevance of, and necessity for including broader social influences in the model as captured by differential association. In the current research, two alternative SLT models were specified and tested in order to better understand the nature and extent of the influence of differential association on behaviour. Importantly, the results indicated that differential association was able to make a unique contribution to explaining self-reported speeding, thereby negating the call to exclude it from the model. The results also demonstrated that imitation was a discrete theoretical concept that should also be retained in the model. The results suggest a need to further explore and specify mechanisms of social influence in the SLT model. In addition, a novel approach was used to operationalise SLT variables by including concepts drawn from contemporary social psychological and deterrence-based research to enhance and extend the way that SLT variables have traditionally been examined. Differential reinforcement was conceptualised according to behavioural reinforcement principles (i.e., positive and negative reinforcement and punishment) and incorporated concepts of affective beliefs, anticipated regret, and deterrence-related concepts. Although implicit in descriptions of SLT, little research has, to date, made use of the broad range of reinforcement principles to understand the factors that encourage or inhibit behaviour. This approach has particular significance to road user behaviours in general because of the deterrence-based nature of many road safety countermeasures. The concept of self-identity was also included in the model and was found to be consistent with the definitions component of SLT. A final theoretical contribution was the specification and testing of a full measurement model prior to model testing using structural equation modelling. This process is recommended in order to reduce measurement error by providing an examination of the psychometric properties of the data prior to full model testing. Despite calls for such work for a number of decades, the current work appears to be the only example of a full measurement model of SLT. There were also a number of important practical implications that emerged from this program of research. Firstly, perceptions regarding speed enforcement tolerance thresholds were highlighted as a salient influence on driving speeds in both countries. The issue of enforcement tolerance levels generated considerable discussion among drivers in both countries, with Australian drivers reporting lower perceived tolerance levels than Chinese drivers. It was clear that many drivers used the concept of an enforcement tolerance in determining their driving speed, primarily with the desire to drive faster than the posted speed limit, yet remaining within a speed range that would preclude apprehension by police. The quantitative results from Studies 2 and 4 added support to these qualitative findings. Together, the findings supported previous research and suggested that a travel speed may not be seen as illegal until that speed reaches a level over the prescribed enforcement tolerance threshold. In other words, the enforcement tolerance appears to act as a ‘de facto’ speed limit, replacing the posted limit in the minds of some drivers. The findings from the two studies conducted in China (Studies 2 and 4) further highlighted the link between perceived enforcement tolerances and a ‘de facto’ speed limit. Drivers openly discussed driving at speeds that were well above posted speed limits and some participants noted their preference for driving at speeds close to ‘50% above’ the posted limit. This preference appeared to be shaped by the perception that the same penalty would be imposed if apprehended, irrespective of what speed they travelling (at least up to 50% above the limit). Further research is required to determine whether the perceptions of Chinese drivers are mainly influenced by the Law of the People’s Republic of China or by operational practices. Together, the findings from both studies in China indicate that there may be scope to refine enforcement tolerance levels, as has happened in other jurisdictions internationally over time, in order to reduce speeding. Any attempts to do so would likely be assisted by the provision of information about the legitimacy and purpose of speed limits as well as risk factors associated with speeding because these issues were raised by Chinese participants in the qualitative research phase. Another important practical implication of this research for speed management in China is the way in which penalties are determined. Chinese drivers described perceptions of unfairness and a lack of transparency in the enforcement system because they were unsure of the penalty that they would receive if apprehended. Steps to enhance the perceived certainty and consistency of the system to promote a more equitable approach to detection and punishment would appear to be welcomed by the general driving public and would be more consistent with the intended theoretical (deterrence) basis that underpins the current speed enforcement approach. The use of mandatory, fixed penalties may assist in this regard. In many countries, speeding attracts penalties that are dependent on the severity of the offence. In China, there may be safety benefits gained from the introduction of a similar graduated scale of speeding penalties and fixed penalties might also help to address the issue of uncertainty about penalties and related perceptions of unfairness. Such advancements would be in keeping with the principles of best practice for speed management as identified by the World Health Organisation. Another practical implication relating to legal penalties, and applicable to both cultural contexts, relates to the issues of detection and punishment avoidance. These two concepts appeared to strongly influence speeding in the current samples. In Australia, detection avoidance strategies reported by participants generally involved activities that are not illegal (e.g., site learning and remaining watchful for police vehicles). The results from China were similar, although a greater range of strategies were reported. The most common strategy reported in both countries for avoiding detection when speeding was site learning, or familiarisation with speed camera locations. However, a range of illegal practices were also described by Chinese drivers (e.g., tampering with or removing vehicle registration plates so as to render the vehicle unidentifiable on camera and use of in-vehicle radar detectors). With regard to avoiding punishment when apprehended, a range of strategies were reported by drivers from both countries, although a greater range of strategies were reported by Chinese drivers. As the results of the current research indicated that detection avoidance was strongly associated with greater self-reported speeding in both samples, efforts to reduce avoidance opportunities are strongly recommended. The practice of randomly scheduling speed camera locations, as is current practice in Queensland, offers one way to minimise site learning. The findings of this research indicated that this practice should continue. However, they also indicated that additional strategies are needed to reduce opportunities to evade detection. The use of point-to-point speed detection (also known as sectio

Implicit online learning

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Online learning algorithms have recently risen to prominence due to their strong theoretical guarantees and an increasing number of practical applications for large-scale data analysis problems. In this paper, we analyze a class of online learning algorithms based on fixed potentials and nonlinearized losses, which yields algorithms with implicit update rules. We show how to efficiently compute these updates, and we prove regret bounds for the algorithms. We apply our formulation to several special cases where our approach has benefits over existing online learning methods. In particular, we provide improved algorithms and bounds for the online metric learning problem, and show improved robustness for online linear prediction problems. Results over a variety of data sets demonstrate the advantages of our framework.

Optimal strategies and minimax lower bounds for online convex games

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A number of learning problems can be cast as an Online Convex Game: on each round, a learner makes a prediction x from a convex set, the environment plays a loss function f, and the learner’s long-term goal is to minimize regret. Algorithms have been proposed by Zinkevich, when f is assumed to be convex, and Hazan et al., when f is assumed to be strongly convex, that have provably low regret. We consider these two settings and analyze such games from a minimax perspective, proving minimax strategies and lower bounds in each case. These results prove that the existing algorithms are essentially optimal.

Matrix regularization techniques for online multitask learning

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we examine the problem of prediction with expert advice in a setup where the learner is presented with a sequence of examples coming from different tasks. In order for the learner to be able to benefit from performing multiple tasks simultaneously, we make assumptions of task relatedness by constraining the comparator to use a lesser number of best experts than the number of tasks. We show how this corresponds naturally to learning under spectral or structural matrix constraints, and propose regularization techniques to enforce the constraints. The regularization techniques proposed here are interesting in their own right and multitask learning is just one application for the ideas. A theoretical analysis of one such regularizer is performed, and a regret bound that shows benefits of this setup is reported.

Adaptive online gradient descent

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions and of Hazan et al for strongly convex functions, achieving intermediate rates between [square root T] and [log T]. Furthermore, we show strong optimality of the algorithm. Finally, we provide an extension of our results to general norms.

REGAL : a regularization based algorithm for reinforcement learning in weakly communicating MDPs

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of ~ O(HS p AT ). We also relate the span to various diameter-like quantities associated with the MDP, demonstrating how our results improve on previous regret bounds.

Walking when intoxicated : an investigation of the factors which influence individuals' drink walking intentions

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Annually, in Australia, 10-15% of all road-related fatalities involve pedestrians. Of those pedestrians fatally injured, approximately 45% were walking while intoxicated or 'drink walking'. Drink walking is increasing in prevalence and younger persons may be especially prone to engage in this behaviour and, thus, are at heightened risk of being injured or killed. Presently, limited research is available regarding the factors which influence individuals to drink walk. This study explored young people's (17-25 years) intentions to drink walk, using an extended Theory of Planned Behaviour (TPB). Participants (N = 215), completed a self-report questionnaire which assessed the standard TPB constructs (attitude, subjective norm, perceived behavioural control) as well as the extended constructs of risk perception, anticipated regret, and past behaviour. It was hypothesised that the standard TPB constructs would significantly predict individuals' reported intentions to drink walk and that the additional constructs would predict intentions over and above the TPB constructs. The TPB variables significantly predicted 63.2% of the variance in individuals' reported intentions to drink walk, and the additional variables, combined, explained a further 6.1% of the variance. Of the additional constructs, anticipated regret and past behaviour, but not risk perception, were significant predictors of drink walking intentions. As one of the first studies to provide a theoretically-based investigation of factors influencing individuals' drink walking intentions, the current study's findings have potentially significant implications for understanding young people's decisions to drink walk and the design of future countermeasures to ultimately reduce this behaviour.

Emotions, crime and justice [Onati International Series in Law and Society, Number 1]

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The return of emotions to debates about crime and criminal justice has been a striking development of recent decades across many jurisdictions. This has been registered in the return of shame to justice procedures, a heightened focus on victims and their emotional needs, fear of crime as a major preoccupation of citizens and politicians, and highly emotionalised public discourses on crime and justice. But how can we best make sense of these developments? Do we need to create "emotionally intelligent" justice systems, or are we messing recklessly with the rational foundations of liberal criminal justice? This volume brings together leading criminologists and sociologists from across the world in a much needed conversation about how to re-calibrate reason and emotion in crime and justice today. The contributions range from the micro-analysis of emotions in violent encounters to the paradoxes and tensions that arise from the emotionalisation of criminal justice in the public sphere. They explore the emotional labour of workers in police and penal institutions, the justice experiences of victims and offenders, and the role of vengeance, forgiveness and regret in the aftermath of violence and conflict resolution. The result is a set of original essays which offer a fresh and timely perspective on problems of crime and justice in contemporary liberal democracies.

«
1
2
3
»