Digital Library

38 results for Minimax-regret

in Queensland University of Technology - ePrints Archive

Second-order quantile methods for experts and combinatorial games

Relevance: 60.00%

Abstract:

We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to formalize such adaptivity are second-order regret bounds and quantile bounds. The underlying notions of 'easy data', which may be paraphrased as "the learning problem has small variance" and "multiple decisions are useful", are synergetic. But even though there are sophisticated algorithms that exploit one of the two, no existing algorithm is able to adapt to both. In this paper we outline a new method for obtaining such adaptive algorithms, based on a potential function that aggregates a range of learning rates (which are essential tuning parameters). By choosing the right prior we construct efficient algorithms and show that they reap both benefits by proving the first bounds that are both second-order and incorporate quantiles.
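For context on what a learning rate does in prediction with expert advice, here is a minimal sketch of the standard exponential weights (Hedge) forecaster with a single fixed learning rate. This is the classical baseline, not the potential-based method the abstract describes; the function name `hedge` and its arguments are illustrative assumptions.

```python
import math

def hedge(loss_rounds, eta):
    """Run exponential weights (Hedge) with a fixed learning rate eta.

    loss_rounds: list of per-round loss vectors, one loss in [0, 1] per expert.
    Returns (learner's cumulative expected loss, regret to the best expert).
    """
    n_experts = len(loss_rounds[0])
    cumulative = [0.0] * n_experts  # cumulative loss of each expert so far
    learner_loss = 0.0
    for losses in loss_rounds:
        # Weights are proportional to exp(-eta * cumulative loss); the minimum
        # is subtracted first purely for numerical stability.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the learner when it samples an expert from probs.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, learner_loss - min(cumulative)
```

With N experts and T rounds, the textbook tuning eta = sqrt(8 ln N / T) guarantees regret at most sqrt((T / 2) ln N). That worst-case tuning ignores how easy the data is, which is the limitation that adaptive bounds aim to remove.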

A Stochastic View of Optimal Regret through Minimax Duality

Relevance: 40.00%

Abstract:

We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation, since it can be viewed as the gap in Jensen's inequality for a concave functional--the minimizer over the player's actions of expected loss--defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary. Peter L. Bartlett, Alexander Rakhlin

Optimal strategies and minimax lower bounds for online convex games

Relevance: 30.00%

Abstract:

A number of learning problems can be cast as an Online Convex Game: on each round, a learner makes a prediction x from a convex set, the environment plays a loss function f, and the learner’s long-term goal is to minimize regret. Algorithms have been proposed by Zinkevich, when f is assumed to be convex, and Hazan et al., when f is assumed to be strongly convex, that have provably low regret. We consider these two settings and analyze such games from a minimax perspective, proving minimax strategies and lower bounds in each case. These results prove that the existing algorithms are essentially optimal.
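The protocol described above can be sketched as projected online gradient descent, the style of algorithm usually associated with Zinkevich's analysis. This is an illustrative sketch rather than the paper's minimax strategy: the convex set is assumed to be a Euclidean ball, and the names `project` and `online_gradient_descent` are hypothetical.

```python
import math

def project(x, radius=1.0):
    """Euclidean projection of x onto the ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]

def online_gradient_descent(gradients, dim, step_size, radius=1.0):
    """Play x_t, observe the gradient of f_t at x_t, then take a projected step.

    gradients: list of functions; each maps a point to the gradient of that
               round's loss at the point.
    step_size: function t -> eta_t, with rounds numbered from 1.
    Returns the list of points played, one per round.
    """
    x = [0.0] * dim
    played = []
    for t, grad in enumerate(gradients, start=1):
        played.append(x)          # the learner commits to x before seeing f_t
        g = grad(x)
        eta = step_size(t)
        x = project([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return played
```

A step size decaying like 1/sqrt(t) is the usual choice for general convex losses. For H-strongly convex losses the usual choice is 1/(H t), which is what leads to logarithmic rather than square-root regret.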

Closing the gap between bandit and full-information online optimization : high-probability regret bound

Relevance: 20.00%

Abstract:

We demonstrate a modification of the algorithm of Dani et al. for the online linear optimization problem in the bandit setting, which allows us to achieve an O(\sqrt{T \ln T}) regret bound with high probability against an adaptive adversary, as opposed to the in-expectation result against an oblivious adversary of Dani et al. We obtain the same dependence on the dimension as that exhibited by Dani et al. The results of this paper rest firmly on those of Dani et al. and the remarkable technique of Auer et al. for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.

Optimistic linear programming gives logarithmic regret for irreducible MDPs

Relevance: 20.00%

Abstract:

We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). OLP uses its experience so far to estimate the MDP. It chooses actions by optimistically maximizing estimated future rewards over a set of next-state transition probabilities that are close to the estimates, a computation that corresponds to solving linear programs. We show that the total expected reward obtained by OLP up to time T is within C(P) log T of the reward obtained by the optimal policy, where C(P) is an explicit, MDP-dependent constant. OLP is closely related to an algorithm proposed by Burnetas and Katehakis with four key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities, the proof of the regret bound is simpler, but our regret bound is a constant factor larger than the regret of their algorithm. OLP is also similar in flavor to an algorithm recently proposed by Auer and Ortner. But OLP is simpler and its regret bound has a better dependence on the size of the MDP.

High-probability regret bounds for bandit online linear optimization

Relevance: 20.00%

Abstract:

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O^*(\sqrt{T}) against an adaptive adversary. This improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n^{3/2}) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.

Self-gifting guilt : an examination of self-gifting motivations and post-purchase regret

Relevance: 20.00%

Abstract:

Purpose: Self-gifting is a performative process in which consumers purchase products for themselves. The literature to date remains silent on a determination and connection between the extents of post-purchase regret resulting from self-gifting behavior. The purpose of this paper is to examine identification and connection of self-gifting antecedents, self-gifting and the effect on post-purchase regret.
Design/methodology/approach: This study claims the two antecedents of hedonistic shopping and indulgence drive self-gifting behaviors and the attendant regret. A total of 307 shoppers responded to a series of statements concerning the relationships between antecedents of self-gifting behavior and the effect on post-purchase regret. Self-gifting is a multi-dimensional construct, consisting of therapeutic, celebratory, reward and hedonistic imports. Confirmatory factor analysis and AMOS path modeling enabled examination of relationships between the consumer traits of hedonistic shopping and indulgence and the four self-gifting concepts.
Findings: Hedonic and indulgent shoppers engage in self-gifting for different reasons. A strong and positive relationship was identified between hedonic shoppers and reward, hedonic, therapeutic and celebratory self-gift motivations. Hedonic shoppers aligned with indulgent shoppers who also engaged the four self-gifting concepts. The only regret concerning purchase of self-gifts was evident in the therapeutic and celebratory self-gift motivations.
Research limitations/implications: A major limitation was the age range specification of 18 to 45 years which meant the omission of older generations of regular and experienced shoppers. This study emphasizes the importance of variations in self-gift behaviors and of post-purchase consumer regret.
Originality/value: This research is the first examination of an hedonic attitude to shopping and indulgent antecedents to self-gift purchasing, the concepts of self-gift motivations and their effect on post-purchase regret.

The Pareto regret frontier

Relevance: 20.00%

Abstract:

Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each finite horizon and asymptotically.

Asymptotic minimax robust and misspecified Lorden quickest change detection for dependent stochastic processes

Relevance: 20.00%

Abstract:

The quick detection of an abrupt unknown change in the conditional distribution of a dependent stochastic process has numerous applications. In this paper, we pose a minimax robust quickest change detection problem for cases where there is uncertainty about the post-change conditional distribution. Our minimax robust formulation is based on the popular Lorden criterion of optimal quickest change detection. Under a condition on the set of possible post-change distributions, we show that the widely known cumulative sum (CUSUM) rule is asymptotically minimax robust under our Lorden minimax robust formulation as a false alarm constraint becomes more strict. We also establish general asymptotic bounds on the detection delay of misspecified CUSUM rules (i.e. CUSUM rules that are designed with post-change distributions that differ from those of the observed sequence). We exploit these bounds to compare the delay performance of asymptotically minimax robust, asymptotically optimal, and other misspecified CUSUM rules. In simulation examples, we illustrate that asymptotically minimax robust CUSUM rules can provide better detection delay performance at greatly reduced computation effort compared to competing generalised likelihood ratio procedures.
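As a reminder of what a CUSUM rule computes, here is a minimal sketch for the simplest textbook case: independent Gaussian observations whose mean shifts from mu0 to mu1. The abstract concerns dependent processes and uncertain post-change distributions, so this i.i.d. setting, and the names `cusum_alarm_time` and `gaussian_mean_shift_llr`, are simplifying assumptions made for illustration.

```python
def gaussian_mean_shift_llr(mu0, mu1, sigma):
    """Return y -> log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2)."""
    def llr(y):
        return (mu1 - mu0) * (y - (mu0 + mu1) / 2.0) / (sigma * sigma)
    return llr

def cusum_alarm_time(observations, log_likelihood_ratio, threshold):
    """Return the first time (1-indexed) the CUSUM statistic reaches the threshold.

    The statistic accumulates log-likelihood ratios and is clipped at zero,
    so evidence against a change never builds up a negative balance.
    Returns None if no alarm is raised.
    """
    stat = 0.0
    for t, y in enumerate(observations, start=1):
        stat = max(0.0, stat + log_likelihood_ratio(y))
        if stat >= threshold:
            return t
    return None
```

A misspecified rule in the sense of the abstract corresponds to passing a `log_likelihood_ratio` built from a post-change model that differs from the one actually generating the data. Raising `threshold` lowers the false alarm rate at the cost of a longer detection delay.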

Blackwell approachability and no-regret learning are equivalent

Relevance: 20.00%

Optimal strategies and minimax lower bounds for online convex games [Technical Report No. UCB/EECS-2008-19]

Relevance: 20.00%

Efficient minimax strategies for square loss games

Relevance: 20.00%

Abstract:

We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance. We derive the minimax solutions for the case where the prediction and action spaces are the simplex (this setup is sometimes called the Brier game) and the \ell_2 ball (this setup is related to Gaussian density estimation). We show that in both cases the value of each sub-game is a quadratic function of a simple statistic of the state, with coefficients that can be efficiently computed using an explicit recurrence relation. The resulting deterministic minimax strategy and randomized maximin strategy are linear functions of the statistic.

Online hidden Markov model parameter estimation and minimax robust quickest change detection in uncertain stochastic processes

Relevance: 20.00%

Abstract:

Stochastic (or random) processes are inherent to numerous fields of human endeavour including engineering, science, and business and finance. This thesis presents multiple novel methods for quickly detecting and estimating uncertainties in several important classes of stochastic processes. The significance of these novel methods is demonstrated by employing them to detect aircraft manoeuvres in video signals in the important application of autonomous mid-air collision avoidance.

Examining psychosocial influences on speeding in Australian and Chinese contexts : a social learning approach

Relevance: 10.00%

Abstract:

Speeding remains a significant contributing factor to road trauma internationally, despite increasingly sophisticated speed management strategies being adopted around the world. Increases in travel speed are associated with increases in crash risk and crash severity. As speed choice is a voluntary behaviour, driver perceptions are important to our understanding of speeding and, importantly, to designing effective behavioural countermeasures. The four studies conducted in this program of research represent a comprehensive approach to examining psychosocial influences on driving speeds in two countries that are at very different levels of road safety development: Australia and China. Akers’ social learning theory (SLT) was selected as the theoretical framework underpinning this research and guided the development of key research hypotheses. This theory was chosen because of its ability to encompass psychological, sociological, and criminological perspectives in understanding behaviour, each of which has relevance to speeding. A mixed-method design was used to explore the personal, social, and legal influences on speeding among car drivers in Queensland (Australia) and Beijing (China). Study 1 was a qualitative exploration, via focus group interviews, of speeding among 67 car drivers recruited from south east Queensland. Participants were assigned to groups based on their age and gender, and additionally, according to whether they self-identified as speeding excessively or rarely. This study aimed to elicit information about how drivers conceptualise speeding as well as the social and legal influences on driving speeds. The findings revealed a wide variety of reasons and circumstances that appear to be used as personal justifications for exceeding speed limits. Driver perceptions of speeding as personally and socially acceptable, as well as safe and necessary were common. 
Perceptions of an absence of danger associated with faster driving speeds were evident, particularly with respect to driving alone. An important distinction between the speed-based groups related to the attention given to the driving task. Rare speeders expressed strong beliefs about the need to be mindful of safety (self and others) while excessive speeders referred to the driving task as automatic, an absent-minded endeavour, and to speeding as a necessity in order to remain alert and reduce boredom. For many drivers in this study, compliance with speed limits was expressed as discretionary rather than mandatory. Social factors, such as peer and parental influence were widely discussed in Study 1 and perceptions of widespread community acceptance of speeding were noted. In some instances, the perception that ‘everybody speeds’ appeared to act as one rationale for the need to raise speed limits. Self-presentation, or wanting to project a positive image of self was noted, particularly with respect to concealing speeding infringements from others to protect one’s image as a trustworthy and safe driver. The influence of legal factors was also evident. Legal sanctions do not appear to influence all drivers to the same extent. For instance, fear of apprehension appeared to play a role in reducing speeding for many, although previous experiences of detection and legal sanctions seemed to have had limited influence on reducing speeding among some drivers. Disregard for sanctions (e.g., driving while suspended), fraudulent demerit point use, and other strategies to avoid detection and punishment were widely and openly discussed. In Study 2, 833 drivers were recruited from roadside service stations in metropolitan and regional locations in Queensland. A quantitative research strategy assessed the relative contribution of personal, social, and legal factors to recent and future self-reported speeding (i.e., frequency of speeding and intentions to speed in the future). 
Multivariate analyses examining a range of factors drawn from SLT revealed that factors including self-identity (i.e., identifying as someone who speeds), favourable definitions (attitudes) towards speeding, personal experiences of avoiding detection and punishment for speeding, and perceptions of family and friends as accepting of speeding were all significantly associated with greater self-reported speeding. Study 3 was an exploratory, qualitative investigation of psychosocial factors associated with speeding among 35 Chinese drivers who were recruited from the membership of a motoring organisation and a university in Beijing. Six focus groups were conducted to explore similar issues to those examined in Study 1. The findings of Study 3 revealed many similarities with respect to the themes that arose in Australia. For example, there were similarities regarding personal justifications for speeding, such as the perception that posted limits are unreasonably low, the belief that individual drivers are able to determine safe travel speeds according to personal comfort with driving fast, and the belief that drivers possess adequate skills to control a vehicle at high speed. Strategies to avoid detection and punishment were also noted, though they appeared more widespread in China and also appeared, in some cases, to involve the use of a third party, a topic that was not reported by Australian drivers. Additionally, higher perceived enforcement tolerance thresholds were discussed by Chinese participants. Overall, the findings indicated perceptions of a high degree of community acceptance of speeding and a perceived lack of risk associated with speeds that were well above posted speed limits. Study 4 extended the exploratory research phase in China with a quantitative investigation involving 299 car drivers recruited from car washes in Beijing. Results revealed a relatively inexperienced sample with less than 5 years driving experience, on average. 
One third of participants perceived that the certainty of penalties when apprehended was low and a similar proportion of Chinese participants reported having previously avoided legal penalties when apprehended for speeding. Approximately half of the sample reported that legal penalties for speeding were ‘minimally to not at all’ severe. Multivariate analyses revealed that past experiences of avoiding detection and punishment for speeding, as well as favourable attitudes towards speeding, and perceptions of strong community acceptance of speeding were most strongly associated with greater self-reported speeding in the Chinese sample. Overall, the results of this research make several important theoretical contributions to the road safety literature. Akers’ social learning theory was found to be robust across cultural contexts with respect to speeding; similar amounts of variance were explained in self-reported speeding in the quantitative studies conducted in Australia and China. Historically, SLT was devised as a theory of deviance and posits that deviance and conformity are learned in the same way, with the balance of influence stemming from the ways in which behaviour is rewarded and punished (Akers, 1998). This perspective suggests that those who speed and those who do not are influenced by the same mechanisms. The inclusion of drivers from both ends of the ‘speeding spectrum’ in Study 1 provided an opportunity to examine the wider utility of SLT across the full range of the behaviour. One may question the use of a theory of deviance to investigate speeding, a behaviour that could, arguably, be described as socially acceptable and prevalent. However, SLT seemed particularly relevant to investigating speeding because of its inclusion of association, imitation, and reinforcement variables which reflect the breadth of factors already found to be potentially influential on driving speeds. 
In addition, driving is a learned behaviour requiring observation, guidance, and practice. Thus, the reinforcement and imitation concepts are particularly relevant to this behaviour. Finally, current speed management practices are largely enforcement-based and rely on the principles of behavioural reinforcement captured within the reinforcement component of SLT. Thus, the application of SLT to a behaviour such as speeding offers promise in advancing our understanding of the factors that influence speeding, as well as extending our knowledge of the application of SLT. Moreover, SLT could act as a valuable theoretical framework with which to examine other illegal driving behaviours that may not necessarily be seen as deviant by the community (e.g., mobile phone use while driving). This research also made unique contributions to advancing our understanding of the key components and the overall structure of Akers’ social learning theory. The broader SLT literature is lacking in terms of a thorough structural understanding of the component parts of the theory. For instance, debate exists regarding the relevance of, and necessity for including broader social influences in the model as captured by differential association. In the current research, two alternative SLT models were specified and tested in order to better understand the nature and extent of the influence of differential association on behaviour. Importantly, the results indicated that differential association was able to make a unique contribution to explaining self-reported speeding, thereby negating the call to exclude it from the model. The results also demonstrated that imitation was a discrete theoretical concept that should also be retained in the model. The results suggest a need to further explore and specify mechanisms of social influence in the SLT model. 
In addition, a novel approach was used to operationalise SLT variables by including concepts drawn from contemporary social psychological and deterrence-based research to enhance and extend the way that SLT variables have traditionally been examined. Differential reinforcement was conceptualised according to behavioural reinforcement principles (i.e., positive and negative reinforcement and punishment) and incorporated concepts of affective beliefs, anticipated regret, and deterrence-related concepts. Although implicit in descriptions of SLT, little research has, to date, made use of the broad range of reinforcement principles to understand the factors that encourage or inhibit behaviour. This approach has particular significance to road user behaviours in general because of the deterrence-based nature of many road safety countermeasures. The concept of self-identity was also included in the model and was found to be consistent with the definitions component of SLT. A final theoretical contribution was the specification and testing of a full measurement model prior to model testing using structural equation modelling. This process is recommended in order to reduce measurement error by providing an examination of the psychometric properties of the data prior to full model testing. Despite calls for such work for a number of decades, the current work appears to be the only example of a full measurement model of SLT. There were also a number of important practical implications that emerged from this program of research. Firstly, perceptions regarding speed enforcement tolerance thresholds were highlighted as a salient influence on driving speeds in both countries. The issue of enforcement tolerance levels generated considerable discussion among drivers in both countries, with Australian drivers reporting lower perceived tolerance levels than Chinese drivers. 
It was clear that many drivers used the concept of an enforcement tolerance in determining their driving speed, primarily with the desire to drive faster than the posted speed limit, yet remaining within a speed range that would preclude apprehension by police. The quantitative results from Studies 2 and 4 added support to these qualitative findings. Together, the findings supported previous research and suggested that a travel speed may not be seen as illegal until that speed reaches a level over the prescribed enforcement tolerance threshold. In other words, the enforcement tolerance appears to act as a ‘de facto’ speed limit, replacing the posted limit in the minds of some drivers. The findings from the two studies conducted in China (Studies 3 and 4) further highlighted the link between perceived enforcement tolerances and a ‘de facto’ speed limit. Drivers openly discussed driving at speeds that were well above posted speed limits and some participants noted their preference for driving at speeds close to ‘50% above’ the posted limit. This preference appeared to be shaped by the perception that the same penalty would be imposed if apprehended, irrespective of what speed they were travelling (at least up to 50% above the limit). Further research is required to determine whether the perceptions of Chinese drivers are mainly influenced by the Law of the People’s Republic of China or by operational practices. Together, the findings from both studies in China indicate that there may be scope to refine enforcement tolerance levels, as has happened in other jurisdictions internationally over time, in order to reduce speeding. Any attempts to do so would likely be assisted by the provision of information about the legitimacy and purpose of speed limits as well as risk factors associated with speeding because these issues were raised by Chinese participants in the qualitative research phase. 
Another important practical implication of this research for speed management in China is the way in which penalties are determined. Chinese drivers described perceptions of unfairness and a lack of transparency in the enforcement system because they were unsure of the penalty that they would receive if apprehended. Steps to enhance the perceived certainty and consistency of the system to promote a more equitable approach to detection and punishment would appear to be welcomed by the general driving public and would be more consistent with the intended theoretical (deterrence) basis that underpins the current speed enforcement approach. The use of mandatory, fixed penalties may assist in this regard. In many countries, speeding attracts penalties that are dependent on the severity of the offence. In China, there may be safety benefits gained from the introduction of a similar graduated scale of speeding penalties and fixed penalties might also help to address the issue of uncertainty about penalties and related perceptions of unfairness. Such advancements would be in keeping with the principles of best practice for speed management as identified by the World Health Organisation. Another practical implication relating to legal penalties, and applicable to both cultural contexts, relates to the issues of detection and punishment avoidance. These two concepts appeared to strongly influence speeding in the current samples. In Australia, detection avoidance strategies reported by participants generally involved activities that are not illegal (e.g., site learning and remaining watchful for police vehicles). The results from China were similar, although a greater range of strategies were reported. The most common strategy reported in both countries for avoiding detection when speeding was site learning, or familiarisation with speed camera locations. 
However, a range of illegal practices were also described by Chinese drivers (e.g., tampering with or removing vehicle registration plates so as to render the vehicle unidentifiable on camera and use of in-vehicle radar detectors). With regard to avoiding punishment when apprehended, a range of strategies were reported by drivers from both countries, although a greater range of strategies were reported by Chinese drivers. As the results of the current research indicated that detection avoidance was strongly associated with greater self-reported speeding in both samples, efforts to reduce avoidance opportunities are strongly recommended. The practice of randomly scheduling speed camera locations, as is current practice in Queensland, offers one way to minimise site learning. The findings of this research indicated that this practice should continue. However, they also indicated that additional strategies are needed to reduce opportunities to evade detection. The use of point-to-point speed detection (also known as sectio

Implicit online learning

Relevance: 10.00%

Abstract:

Online learning algorithms have recently risen to prominence due to their strong theoretical guarantees and an increasing number of practical applications for large-scale data analysis problems. In this paper, we analyze a class of online learning algorithms based on fixed potentials and nonlinearized losses, which yields algorithms with implicit update rules. We show how to efficiently compute these updates, and we prove regret bounds for the algorithms. We apply our formulation to several special cases where our approach has benefits over existing online learning methods. In particular, we provide improved algorithms and bounds for the online metric learning problem, and show improved robustness for online linear prediction problems. Results over a variety of data sets demonstrate the advantages of our framework.
