737 results for risk minimization
in the Queensland University of Technology - ePrints Archive
Abstract:
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
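A minimal sketch of the maximal-discrepancy penalty described above, assuming a scikit-learn logistic regression as a tractable surrogate for exact empirical risk minimization over the model class (exact ERM under the 0-1 loss is generally intractable); the synthetic data and the function name `maximal_discrepancy` are illustrative and not taken from the paper.

```python
# Illustrative sketch only: a logistic regression (a convex surrogate for the
# 0-1 loss) stands in for the empirical risk minimization step.
import numpy as np
from sklearn.linear_model import LogisticRegression

def maximal_discrepancy(X, y):
    """Approximate max_f [err(f, first half) - err(f, second half)].

    As described above, the maximizer can be searched for by empirical risk
    minimization on a copy of the data in which the labels of the first half
    are flipped: minimizing error there is the same as maximizing the
    discrepancy between the two halves.
    """
    n = len(y)
    half = n // 2
    y_flipped = y.copy()
    y_flipped[:half] = 1 - y_flipped[:half]        # flip labels on the first half

    clf = LogisticRegression(max_iter=1000).fit(X, y_flipped)  # surrogate ERM
    pred = clf.predict(X)

    err_first = np.mean(pred[:half] != y[:half])   # error on first half, true labels
    err_second = np.mean(pred[half:] != y[half:])  # error on second half, true labels
    return err_first - err_second                  # data-based complexity penalty

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
print(maximal_discrepancy(X, y))
```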
Abstract:
Recent research on multiple kernel learning has led to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper we present a unifying optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically, using a Rademacher complexity bound on the generalization error, and empirically in a set of experiments.
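As a concrete illustration of combining kernels in regularized risk minimization, the sketch below feeds a fixed convex combination of a linear and an RBF kernel into a support vector machine via a precomputed Gram matrix. This is not the paper's unifying criterion or its dual; in genuine multiple kernel learning the weights `beta` would be learned jointly with the classifier, whereas here they are fixed by hand purely for illustration.

```python
# Minimal sketch: a fixed convex combination of two base kernels fed to an
# SVM through a precomputed Gram matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def combined_kernel(A, B, beta=(0.5, 0.5), gamma=0.1):
    """K = beta_1 * K_linear + beta_2 * K_rbf (a convex combination)."""
    return beta[0] * linear_kernel(A, B) + beta[1] * rbf_kernel(A, B, gamma=gamma)

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(combined_kernel(X_tr, X_tr), y_tr)           # train on the combined Gram matrix
acc = clf.score(combined_kernel(X_te, X_tr), y_te)   # test kernel: rows = test, cols = train
print(f"test accuracy with the combined kernel: {acc:.3f}")
```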
Abstract:
A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. In this paper we tackle the problem of adaptivity to this condition in the context of model selection, in a general learning framework. In fact, we consider a weaker version of this condition that allows one to take into account that learning within a small model can be much easier than within a large one. Requiring this "strong margin adaptivity" makes the model selection problem more challenging. We first prove, in a general framework, that some penalization procedures (including local Rademacher complexities) exhibit this adaptivity when the models are nested. Contrary to previous results, this holds with penalties that depend only on the data. Our second main result is that strong margin adaptivity is not always possible when the models are not nested: for every model selection procedure (even a randomized one), there is a problem for which it fails to exhibit strong margin adaptivity.
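The sketch below illustrates the general shape of such data-dependent penalization: model selection over nested classes by minimizing empirical risk plus a penalty estimated from the data alone. As a simpler stand-in for the local Rademacher complexities analyzed above, the penalty is a Monte Carlo estimate of the global empirical Rademacher complexity obtained by fitting each class to random labels; the model classes, data, and constants are illustrative assumptions, not the paper's procedure.

```python
# Sketch of data-driven penalized model selection over nested model classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def empirical_error(clf, X, y):
    return np.mean(clf.predict(X) != y)

def rademacher_penalty(X, n_rounds=20, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    For random signs sigma, sup_f (1/n) sum_i sigma_i f(X_i) equals
    1 - 2 * (best training error on the random labels); a logistic regression
    approximates that best fit.
    """
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_rounds):
        sigma = rng.choice([0, 1], size=len(X))            # 0/1 encoding of random signs
        clf = LogisticRegression(max_iter=1000).fit(X, sigma)
        vals.append(1 - 2 * empirical_error(clf, X, sigma))
    return float(np.mean(vals))

# Nested models: logistic regression on the first k features, k increasing.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

best = None
for k in (1, 2, 5, 10):
    clf = LogisticRegression(max_iter=1000).fit(X[:, :k], y)
    score = empirical_error(clf, X[:, :k], y) + rademacher_penalty(X[:, :k])
    if best is None or score < best[1]:
        best = (k, score)
print(f"selected model: first {best[0]} features (penalized risk {best[1]:.3f})")
```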
Abstract:
Background: There is evidence that certain mutations in the double-strand break repair gene ataxia-telangiectasia mutated (ATM) act in a dominant-negative manner to increase the risk of breast cancer. There are also some reports suggesting that the amino acid substitution variants T2119C Ser707Pro and C3161G Pro1054Arg may be associated with breast cancer risk. We investigate the breast cancer risk associated with these two nonconservative amino acid substitution variants using a large Australian population-based case–control study.
Methods: The polymorphisms were genotyped in more than 1300 cases and 600 controls using 5' exonuclease assays. Case–control comparisons of genotype distributions were performed using logistic regression.
Results: The 2119C variant was rare, occurring at frequencies of 1.4% and 1.3% in cases and controls, respectively (P = 0.8). There was no difference in genotype distribution between cases and controls (P = 0.8), and the TC genotype was not associated with an increased risk of breast cancer (adjusted odds ratio = 1.08, 95% confidence interval = 0.59–1.97, P = 0.8). Similarly, the 3161G variant was no more common in cases than in controls (2.9% versus 2.2%, P = 0.2), there was no difference in genotype distribution between cases and controls (P = 0.1), and the CG genotype was not associated with an increased risk of breast cancer (adjusted odds ratio = 1.30, 95% confidence interval = 0.85–1.98, P = 0.2). This lack of evidence for an association persisted within groups defined by family history of breast cancer or by age.
Conclusion: The 2119C and 3161G amino acid substitution variants are not associated with moderate or high risks of breast cancer in Australian women.
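For readers unfamiliar with the statistical machinery, the sketch below reproduces the form of analysis reported above: a logistic regression of case status on carrier genotype (adjusted for age), whose exponentiated coefficient gives an adjusted odds ratio with a 95% confidence interval. The data are synthetic, the variable names are placeholders, and this does not reproduce the study's results.

```python
# Sketch of an adjusted odds-ratio calculation from a case-control dataset
# using statsmodels; all data below are synthetic placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1900
df = pd.DataFrame({
    "case":    rng.choice([0, 1], size=n, p=[0.32, 0.68]),  # roughly mirrors ~1300 cases / ~600 controls
    "carrier": rng.choice([0, 1], size=n, p=[0.97, 0.03]),  # rare variant genotype (e.g. TC or CG)
    "age":     rng.normal(55, 10, size=n),
})

X = sm.add_constant(df[["carrier", "age"]])   # intercept + genotype + age covariate
fit = sm.Logit(df["case"], X).fit(disp=0)

or_carrier = np.exp(fit.params["carrier"])                 # adjusted odds ratio
ci_low, ci_high = np.exp(fit.conf_int().loc["carrier"])    # 95% confidence interval
print(f"adjusted OR for carriers: {or_carrier:.2f} "
      f"(95% CI {ci_low:.2f}-{ci_high:.2f}, P = {fit.pvalues['carrier']:.2f})")
```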