906 resultados para Scale Sensitive Loss Function


Relevância:

100.00% 100.00%

Publicador:

Resumo:

When training Support Vector Machines (SVMs) over non-separable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by ${f w} = {f 0}$, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unboundedspolyhedron, which we characterize in terms of its extreme points and rays.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a study where the energy loss function of Ta2O5, initially derived in the optical limit for a limited region of excitation energies from reflection electron energy loss spectroscopy (REELS) measurements, was improved and extended to the whole momentum and energy excitation region through a suitable theoretical analysis using the Mermin dielectric function and requiring the fulfillment of physically motivated restrictions, such as the f- and KK-sum rules. The material stopping cross section (SCS) and energy-loss straggling measured for 300–2000 keV proton and 200–6000 keV helium ion beams by means of Rutherford backscattering spectrometry (RBS) were compared to the same quantities calculated in the dielectric framework, showing an excellent agreement, which is used to judge the reliability of the Ta2O5 energy loss function. Based on this assessment, we have also predicted the inelastic mean free path and the SCS of energetic electrons in Ta2O5.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.

Latent class models for the joint distribution of multivariate categorical, such as the PARAFAC decomposition, data play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation. Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs). Its convexity ensures that the sample average of this surrogate loss can be efficiently minimized. We study its statistical properties. We show that minimizing the expected surrogate loss—the φ-risk—also minimizes the risk. We also study the rate at which the φ-risk approaches its minimum value. We show that fast rates are possible when the conditional probability P(Y=1|X) is unlikely to be close to certain critical values.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multivariate volatility forecasts are an important input in many financial applications, in particular portfolio optimisation problems. Given the number of models available and the range of loss functions to discriminate between them, it is obvious that selecting the optimal forecasting model is challenging. The aim of this thesis is to thoroughly investigate how effective many commonly used statistical (MSE and QLIKE) and economic (portfolio variance and portfolio utility) loss functions are at discriminating between competing multivariate volatility forecasts. An analytical investigation of the loss functions is performed to determine whether they identify the correct forecast as the best forecast. This is followed by an extensive simulation study examines the ability of the loss functions to consistently rank forecasts, and their statistical power within tests of predictive ability. For the tests of predictive ability, the model confidence set (MCS) approach of Hansen, Lunde and Nason (2003, 2011) is employed. As well, an empirical study investigates whether simulation findings hold in a realistic setting. In light of these earlier studies, a major empirical study seeks to identify the set of superior multivariate volatility forecasting models from 43 models that use either daily squared returns or realised volatility to generate forecasts. This study also assesses how the choice of volatility proxy affects the ability of the statistical loss functions to discriminate between forecasts. Analysis of the loss functions shows that QLIKE, MSE and portfolio variance can discriminate between multivariate volatility forecasts, while portfolio utility cannot. An examination of the effective loss functions shows that they all can identify the correct forecast at a point in time, however, their ability to discriminate between competing forecasts does vary. That is, QLIKE is identified as the most effective loss function, followed by portfolio variance which is then followed by MSE. The major empirical analysis reports that the optimal set of multivariate volatility forecasting models includes forecasts generated from daily squared returns and realised volatility. Furthermore, it finds that the volatility proxy affects the statistical loss functions’ ability to discriminate between forecasts in tests of predictive ability. These findings deepen our understanding of how to choose between competing multivariate volatility forecasts.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The determination of settlement of shallow foundations on cohesionless soil is an important task in geotechnical engineering. Available methods for the determination of settlement are not reliable. In this study, the support vector machine (SVM), a novel type of learning algorithm based on statistical theory, has been used to predict the settlement of shallow foundations on cohesionless soil. SVM uses a regression technique by introducing an ε – insensitive loss function. A thorough sensitive analysis has been made to ascertain which parameters are having maximum influence on settlement. The study shows that SVM has the potential to be a useful and practical tool for prediction of settlement of shallow foundation on cohesionless soil.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We address the problem of detecting cells in biological images. The problem is important in many automated image analysis applications. We identify the problem as one of clustering and formulate it within the framework of robust estimation using loss functions. We show how suitable loss functions may be chosen based on a priori knowledge of the noise distribution. Specifically, in the context of biological images, since the measurement noise is not Gaussian, quadratic loss functions yield suboptimal results. We show that by incorporating the Huber loss function, cells can be detected robustly and accurately. To initialize the algorithm, we also propose a seed selection approach. Simulation results show that Huber loss exhibits better performance compared with some standard loss functions. We also provide experimental results on confocal images of yeast cells. The proposed technique exhibits good detection performance even when the signal-to-noise ratio is low.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider some problems of the calculus of variations on time scales. On the beginning our attention is paid on two inverse extremal problems on arbitrary time scales. Firstly, using the Euler-Lagrange equation and the strengthened Legendre condition, we derive a general form for a variation functional that attains a local minimum at a given point of the vector space. Furthermore, we prove a necessary condition for a dynamic integro-differential equation to be an Euler-Lagrange equation. New and interesting results for the discrete and quantum calculus are obtained as particular cases. Afterwards, we prove Euler-Lagrange type equations and transversality conditions for generalized infinite horizon problems. Next we investigate the composition of a certain scalar function with delta and nabla integrals of a vector valued field. Euler-Lagrange equations in integral form, transversality conditions, and necessary optimality conditions for isoperimetric problems, on an arbitrary time scale, are proved. In the end, two main issues of application of time scales in economic, with interesting results, are presented. In the former case we consider a firm that wants to program its production and investment policies to reach a given production rate and to maximize its future market competitiveness. The model which describes firm activities is studied in two different ways: using classical discretizations; and applying discrete versions of our result on time scales. In the end we compare the cost functional values obtained from those two approaches. The latter problem is more complex and relates to rate of inflation, p, and rate of unemployment, u, which inflict a social loss. Using known relations between p, u, and the expected rate of inflation π, we rewrite the social loss function as a function of π. We present this model in the time scale framework and find an optimal path π that minimizes the total social loss over a given time interval.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The optimum quality that can be asymptotically achieved in the estimation of a probability p using inverse binomial sampling is addressed. A general definition of quality is used in terms of the risk associated with a loss function that satisfies certain assumptions. It is shown that the limit superior of the risk for p asymptotically small has a minimum over all (possibly randomized) estimators. This minimum is achieved by certain non-randomized estimators. The model includes commonly used quality criteria as particular cases. Applications to the non-asymptotic regime are discussed considering specific loss functions, for which minimax estimators are derived.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La familia de algoritmos de Boosting son un tipo de técnicas de clasificación y regresión que han demostrado ser muy eficaces en problemas de Visión Computacional. Tal es el caso de los problemas de detección, de seguimiento o bien de reconocimiento de caras, personas, objetos deformables y acciones. El primer y más popular algoritmo de Boosting, AdaBoost, fue concebido para problemas binarios. Desde entonces, muchas han sido las propuestas que han aparecido con objeto de trasladarlo a otros dominios más generales: multiclase, multilabel, con costes, etc. Nuestro interés se centra en extender AdaBoost al terreno de la clasificación multiclase, considerándolo como un primer paso para posteriores ampliaciones. En la presente tesis proponemos dos algoritmos de Boosting para problemas multiclase basados en nuevas derivaciones del concepto margen. El primero de ellos, PIBoost, está concebido para abordar el problema descomponiéndolo en subproblemas binarios. Por un lado, usamos una codificación vectorial para representar etiquetas y, por otro, utilizamos la función de pérdida exponencial multiclase para evaluar las respuestas. Esta codificación produce un conjunto de valores margen que conllevan un rango de penalizaciones en caso de fallo y recompensas en caso de acierto. La optimización iterativa del modelo genera un proceso de Boosting asimétrico cuyos costes dependen del número de etiquetas separadas por cada clasificador débil. De este modo nuestro algoritmo de Boosting tiene en cuenta el desbalanceo debido a las clases a la hora de construir el clasificador. El resultado es un método bien fundamentado que extiende de manera canónica al AdaBoost original. El segundo algoritmo propuesto, BAdaCost, está concebido para problemas multiclase dotados de una matriz de costes. Motivados por los escasos trabajos dedicados a generalizar AdaBoost al terreno multiclase con costes, hemos propuesto un nuevo concepto de margen que, a su vez, permite derivar una función de pérdida adecuada para evaluar costes. Consideramos nuestro algoritmo como la extensión más canónica de AdaBoost para este tipo de problemas, ya que generaliza a los algoritmos SAMME, Cost-Sensitive AdaBoost y PIBoost. Por otro lado, sugerimos un simple procedimiento para calcular matrices de coste adecuadas para mejorar el rendimiento de Boosting a la hora de abordar problemas estándar y problemas con datos desbalanceados. Una serie de experimentos nos sirven para demostrar la efectividad de ambos métodos frente a otros conocidos algoritmos de Boosting multiclase en sus respectivas áreas. En dichos experimentos se usan bases de datos de referencia en el área de Machine Learning, en primer lugar para minimizar errores y en segundo lugar para minimizar costes. Además, hemos podido aplicar BAdaCost con éxito a un proceso de segmentación, un caso particular de problema con datos desbalanceados. Concluimos justificando el horizonte de futuro que encierra el marco de trabajo que presentamos, tanto por su aplicabilidad como por su flexibilidad teórica. Abstract The family of Boosting algorithms represents a type of classification and regression approach that has shown to be very effective in Computer Vision problems. Such is the case of detection, tracking and recognition of faces, people, deformable objects and actions. The first and most popular algorithm, AdaBoost, was introduced in the context of binary classification. Since then, many works have been proposed to extend it to the more general multi-class, multi-label, costsensitive, etc... domains. Our interest is centered in extending AdaBoost to two problems in the multi-class field, considering it a first step for upcoming generalizations. In this dissertation we propose two Boosting algorithms for multi-class classification based on new generalizations of the concept of margin. The first of them, PIBoost, is conceived to tackle the multi-class problem by solving many binary sub-problems. We use a vectorial codification to represent class labels and a multi-class exponential loss function to evaluate classifier responses. This representation produces a set of margin values that provide a range of penalties for failures and rewards for successes. The stagewise optimization of this model introduces an asymmetric Boosting procedure whose costs depend on the number of classes separated by each weak-learner. In this way the Boosting procedure takes into account class imbalances when building the ensemble. The resulting algorithm is a well grounded method that canonically extends the original AdaBoost. The second algorithm proposed, BAdaCost, is conceived for multi-class problems endowed with a cost matrix. Motivated by the few cost-sensitive extensions of AdaBoost to the multi-class field, we propose a new margin that, in turn, yields a new loss function appropriate for evaluating costs. Since BAdaCost generalizes SAMME, Cost-Sensitive AdaBoost and PIBoost algorithms, we consider our algorithm as a canonical extension of AdaBoost to this kind of problems. We additionally suggest a simple procedure to compute cost matrices that improve the performance of Boosting in standard and unbalanced problems. A set of experiments is carried out to demonstrate the effectiveness of both methods against other relevant Boosting algorithms in their respective areas. In the experiments we resort to benchmark data sets used in the Machine Learning community, firstly for minimizing classification errors and secondly for minimizing costs. In addition, we successfully applied BAdaCost to a segmentation task, a particular problem in presence of imbalanced data. We conclude the thesis justifying the horizon of future improvements encompassed in our framework, due to its applicability and theoretical flexibility.