26 results for outliers


Relevance:

10.00%

Publisher:

Abstract:

Robust regression in statistics leads to challenging optimization problems. Here, we study one such problem, in which the objective is non-smooth, non-convex and expensive to evaluate. We examine the numerical performance of several derivative-free optimization algorithms with the aim of computing robust multivariate estimators. Our experiments demonstrate that the existing algorithms often fail to deliver optimal solutions. We introduce three new methods based on Powell's derivative-free algorithm. The proposed methods are reliable and can be used when processing very large data sets containing outliers.
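The kind of objective described here can be sketched in a few lines: a least-trimmed-squares style loss, which is non-smooth and non-convex, minimised with SciPy's implementation of Powell's derivative-free method. This is only an illustrative toy, not the three methods proposed in the paper; the data, trimming level `h`, and starting point are all assumptions.

```python
# Illustrative sketch: an LTS-style robust regression objective
# (non-smooth, non-convex) minimised with Powell's derivative-free
# method via SciPy. Data and trimming level h are made up.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)
y[:10] += 20.0  # gross outliers

def lts_objective(beta, h=60):
    # Sum of the h smallest squared residuals; the sorting step makes
    # the objective non-smooth in beta.
    r2 = np.sort((y - X @ beta) ** 2)
    return r2[:h].sum()

x0 = np.linalg.lstsq(X, y, rcond=None)[0]  # contaminated least-squares start
res = minimize(lts_objective, x0, method="Powell")
print(res.x)  # close to beta_true despite the outliers
```

Because the trimmed objective ignores the largest residuals, the outliers do not pull the fit the way they would under ordinary least squares.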

Relevance:

10.00%

Publisher:

Abstract:

Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares (LTS) criterion, in which up to half of the data is treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods are well established in linear regression, but have only recently been applied to nonlinear regression. In this work, we examine the problem of fitting artificial neural networks (ANNs) to contaminated data using the LTS criterion. We introduce a penalized LTS criterion which prevents unnecessary removal of valid data. Training ANNs this way leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression.
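One simple way to see how a penalty can prevent unnecessary trimming is a capped squared loss: each point contributes its squared residual, capped at a fixed penalty, so a point is effectively trimmed only when keeping it costs more than the penalty, and valid data with small residuals are never removed. This capped form is a stand-in assumption for illustration; the paper's exact penalized LTS criterion is not reproduced here.

```python
# Hypothetical capped-loss illustration of penalized trimming; NOT the
# paper's exact criterion. Each residual contributes at most `penalty`.
import numpy as np

def penalized_trimmed_loss(residuals, penalty):
    # A point is trimmed (pays the flat penalty) only when its squared
    # residual exceeds the penalty, so small-residual points stay in.
    r2 = np.asarray(residuals, dtype=float) ** 2
    return np.minimum(r2, penalty).sum()

clean = [0.1, -0.2, 0.05]
with_outlier = clean + [50.0]
print(penalized_trimmed_loss(clean, penalty=1.0))         # all points kept
print(penalized_trimmed_loss(with_outlier, penalty=1.0))  # outlier capped at 1
```

With the cap, the gross residual of 50 contributes only the penalty of 1 to the loss, while no valid point is discarded.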

Relevance:

10.00%

Publisher:

Abstract:

We address a limitation of sparse-representation-based classification with group information for multi-pose face recognition. First, we observe that the key issue in such classification problems lies in the choice of the norm applied to the residual vectors, which measure the fitness of each class. We then point out that a limitation of current sparse representation classification algorithms is their reliance on the ℓ2 norm, which does not match the data statistics, as the residual values may be markedly non-Gaussian. We propose a simple but effective solution using an ℓp norm, and explain theoretically and numerically why such a norm suppresses outliers and can therefore significantly improve classification performance, achieving results comparable to state-of-the-art algorithms on some challenging datasets.
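The effect of the residual norm can be shown with two synthetic residual vectors: one class fits mediocrely everywhere, the other fits very well except for one grossly corrupted entry (an outlier, e.g. an occluded region). The sparse-coding step is omitted and the residuals and the value of p are made-up illustrations.

```python
# Residual comparison under l2 versus a small-p lp score. With p < 1
# a single huge residual entry no longer dominates the comparison.
import numpy as np

def lp_score(r, p):
    # Sum of |r_i|^p; monotone in the lp norm for fixed p, smaller = better fit.
    return np.sum(np.abs(r) ** p)

res_a = np.full(20, 0.3)                 # mediocre fit everywhere
res_b = np.full(20, 0.01)                # excellent fit ...
res_b[0] = 5.0                           # ... except one outlier entry

for p in (2.0, 0.25):
    winner = "A" if lp_score(res_a, p) < lp_score(res_b, p) else "B"
    print(f"p={p}: class {winner} wins")
```

Under ℓ2 the outlier entry makes class A win; with p = 0.25 the outlier is suppressed and class B, the genuinely better fit, wins.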

Relevance:

10.00%

Publisher:

Abstract:

Density-based means have recently been proposed as a method for dealing with outliers in the stream processing of data. Derived from a weighted arithmetic mean whose variable weights depend on the location of all data samples, these functions are not monotonic and hence cannot be classified as aggregation functions. In this article we establish the weak monotonicity of this class of averaging functions and use it to derive robust generalisations of these means. Specifically, we find that, as originally proposed, density-based means are robust only to isolated outliers. However, by using penalty-based formalisms of averaging functions and applying more sophisticated and robust density estimators, we are able to define a broader family of density-based means that are more effective at filtering both isolated and clustered outliers.
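A minimal sketch of the underlying idea: a weighted arithmetic mean whose weights are kernel density estimates at each sample, so an isolated point in a low-density region receives little weight. The Gaussian kernel, bandwidth, and data here are illustrative assumptions, not the estimators developed in the article.

```python
# Minimal density-based mean: weights come from a Gaussian kernel
# density estimate at each sample, down-weighting isolated outliers.
import numpy as np

def density_based_mean(x, bandwidth=1.0):
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]
    weights = np.exp(-((d / bandwidth) ** 2)).sum(axis=1)
    return np.sum(weights * x) / np.sum(weights)

rng = np.random.default_rng(0)
data = np.append(5.0 + 0.1 * rng.standard_normal(20), 100.0)
print(np.mean(data))             # pulled far from 5 by the outlier
print(density_based_mean(data))  # stays close to the dense cluster
```

Note that the weight of each input depends on where the other inputs lie, which is exactly why the function is not monotonic in the usual aggregation-function sense.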

Relevance:

10.00%

Publisher:

Abstract:

The averaging behaviour of aggregation functions depends on the fundamental property of monotonicity with respect to all arguments. Unfortunately, this property is restrictive and excludes many important averaging functions from the theoretical framework. We propose a definition of weakly monotone averaging functions that encompasses the averaging aggregation functions in a framework with many commonly used non-monotonic means. Weakly monotonic averages are robust to outliers and noise, making them extremely important in practical applications. We show that several robust estimators of location are in fact weakly monotone, and we provide sufficient conditions for weak monotonicity of the Lehmer and Gini means and some mixture functions. In particular, we show that mixture functions with Gaussian kernels, which arise frequently in image and signal processing applications, are weakly monotonic averages. Our concept of weak monotonicity provides a sound theoretical and practical basis for understanding both monotone and non-monotone averaging functions within the same framework. This allows us to relate these previously disparate areas of research and gain a deeper understanding of averaging aggregation methods.
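A small numerical sketch of one of the functions discussed: a mixture function M(x) = Σ w(x_i) x_i / Σ w(x_i) with a Gaussian weighting kernel. Weak monotonicity means the value does not decrease when the same shift is added to every input, even though the function need not be monotone in each input separately. The kernel centre and width are illustrative assumptions.

```python
# Mixture function with a Gaussian weighting kernel, checked for the
# weak monotonicity property on a sample input: M(x + a*1) >= M(x).
import numpy as np

def gaussian_mixture_mean(x, centre=0.5, width=0.2):
    x = np.asarray(x, dtype=float)
    w = np.exp(-((x - centre) / width) ** 2)  # Gaussian weights
    return np.sum(w * x) / np.sum(w)

x = np.array([0.4, 0.5, 0.6])
m1 = gaussian_mixture_mean(x)
m2 = gaussian_mixture_mean(x + 0.05)  # same shift added to every input
print(m1, m2)
```

Here shifting all three inputs upward does not decrease the output, consistent with weak monotonicity; increasing only a single input can still decrease the output, which is why the function is not a standard aggregation function.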

Relevance:

10.00%

Publisher:

Abstract:

Weak monotonicity was recently proposed as a relaxation of the monotonicity condition for averaging aggregation, and weakly monotone functions were shown to have desirable properties when averaging data corrupted with outliers or noise. We extend the study of weakly monotone averages by analyzing their ϕ-transforms, and we establish the weak monotonicity of several classes of averaging functions, in particular the Gini means and mixture operators. Mixture operators with Gaussian weighting functions are shown to be weakly monotone for a broad range of their parameters. This study assists in identifying averaging functions suitable for data analysis and image processing tasks in the presence of outliers.
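The ϕ-transform of an average M is M_ϕ(x) = ϕ⁻¹(M(ϕ(x₁), …, ϕ(xₙ))). A standard example, not specific to this paper: transforming the arithmetic mean with ϕ = log recovers the geometric mean.

```python
# phi-transform of an averaging function: apply phi to the inputs,
# average, then invert. With M = arithmetic mean and phi = log this
# yields the geometric mean.
import numpy as np

def phi_transform(mean_fn, phi, phi_inv):
    return lambda x: phi_inv(mean_fn(phi(np.asarray(x, dtype=float))))

geometric_mean = phi_transform(np.mean, np.log, np.exp)
print(geometric_mean([1.0, 4.0, 16.0]))  # 4.0, the cube root of 64
```

The question the paper studies is which transforms ϕ preserve weak monotonicity of the underlying average.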

Relevance:

10.00%

Publisher:

Abstract:

Identifying the parameters of a model so that it best fits an observed set of data points is fundamental to the majority of problems in computer vision. This task is particularly demanding when portions of the data have been corrupted by gross outliers: measurements that are not explained by the assumed distributions. In this paper we present a novel method that uses the Least Quantile of Squares (LQS) estimator, a well-known but computationally demanding high-breakdown estimator with several appealing theoretical properties. The proposed method is a meta-algorithm, based on the well-established principles of proximal splitting, that allows the use of LQS estimators while retaining computational efficiency. Implementing the method is straightforward, as the majority of the resulting sub-problems can be solved using existing standard bundle-adjustment packages. Preliminary experiments on synthetic and real image data demonstrate the impressive practical performance of our method compared to existing robust estimators used in computer vision.
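The LQS objective itself is easy to state: minimise the q-th quantile of the squared residuals rather than their sum. The naive grid-search minimiser below, on a one-parameter line fit with made-up data, is only there to show the objective's high-breakdown behaviour; the paper's contribution is an efficient proximal-splitting meta-algorithm, which is not reproduced.

```python
# Least Quantile of Squares objective on a 1-D slope fit: minimise the
# median (q = 0.5) of squared residuals. Brute-force grid search is for
# illustration only; it does not scale like the paper's meta-algorithm.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 3.0 * x + 0.01 * rng.standard_normal(50)
y[:15] += 10.0  # 30% gross outliers

def lqs_objective(slope, q=0.5):
    return np.quantile((y - slope * x) ** 2, q)

slopes = np.linspace(-10, 10, 2001)
best = slopes[np.argmin([lqs_objective(s) for s in slopes])]
print(best)  # near the true slope 3 despite 30% contamination
```

Because the median of the squared residuals ignores the largest half of them, 30% contamination leaves the minimiser essentially unchanged, which is the high-breakdown property mentioned above.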

Relevance:

10.00%

Publisher:

Abstract:

Certain tasks in image processing require the preservation of fine image details while applying a broad operation to the image, such as image reduction, filtering, or smoothing. In such cases, the objects of interest are typically represented by small, spatially cohesive clusters of pixels which are to be preserved or removed, depending on the requirements. When images are corrupted by noise or contain intensity variations generated by imaging sensors, identifying these clusters within the intensity space is problematic, as they are corrupted by outliers. This paper presents a novel approach to accounting for the spatial organization of the pixels and to measuring the compactness of pixel clusters, based on the construction of fuzzy measures with specific properties: monotonicity with respect to cluster size; invariance with respect to translation, reflection, and rotation; and discrimination between pixel sets of fixed cardinality with different spatial arrangements. We present construction methods based on Sugeno-type fuzzy measures, minimum spanning trees, and fuzzy measure decomposition, and demonstrate their application to generating fuzzy measures on real and artificial images.
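One ingredient named above, the minimum spanning tree, can serve as a compactness proxy on its own: a tight pixel cluster has a short MST, so total MST edge length discriminates between pixel sets of the same cardinality with different spatial arrangements. This sketch shows only that ingredient; the normalisation and the actual fuzzy-measure construction are omitted, and the pixel sets are made up.

```python
# MST-based compactness sketch: total minimum-spanning-tree length of a
# pixel set. Same cardinality, different arrangements, different scores.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_length(pixels):
    d = squareform(pdist(np.asarray(pixels, dtype=float)))
    return minimum_spanning_tree(d).sum()

compact = [(0, 0), (0, 1), (1, 0), (1, 1)]    # tight 2x2 block
scattered = [(0, 0), (0, 9), (9, 0), (9, 9)]  # same cardinality, spread out
print(mst_length(compact), mst_length(scattered))
```

The MST length is also invariant under translation, reflection, and rotation of the pixel set, which matches the invariance properties required of the fuzzy measures.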

Relevance:

10.00%

Publisher:

Abstract:

An interval type-2 fuzzy logic system is introduced for cancer diagnosis using mass spectrometry-based proteomic data. The fuzzy system is coupled with a feature extraction procedure that combines the wavelet transform with the Wilcoxon ranking test. The proposed feature extraction generates feature sets that serve as inputs to the type-2 fuzzy classifier. The uncertainty, noise, and outliers common in proteomic data motivate the use of a type-2 fuzzy system. Tabu search is applied for structure learning of the fuzzy classifier. Experiments are performed on two benchmark proteomic datasets for the prediction of ovarian and pancreatic cancer, and the results show that both the proposed feature extraction and the type-2 fuzzy classifier outperform competing methods. The proposed approach is therefore helpful to clinicians and practitioners, as it can be implemented as a medical decision support system in practice.
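The Wilcoxon ranking step can be sketched in isolation: score each feature by a rank-sum test between the two classes and keep the most discriminative ones. The wavelet stage and the type-2 fuzzy classifier are not reproduced, and the data below are synthetic stand-ins for spectra, not the benchmark datasets.

```python
# Wilcoxon rank-sum feature ranking sketch on synthetic two-class data.
# Feature 2 is constructed to separate the classes; the ranking should
# therefore single it out with the smallest p-value.
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(2)
healthy = rng.normal(0.0, 1.0, size=(30, 5))
cancer = rng.normal(0.0, 1.0, size=(30, 5))
cancer[:, 2] += 3.0  # only feature 2 actually differs between classes

pvals = [ranksums(healthy[:, j], cancer[:, j]).pvalue for j in range(5)]
best_feature = int(np.argmin(pvals))
print(best_feature)
```

Being rank-based, the test is itself insensitive to the outliers the abstract mentions, which is presumably part of its appeal for noisy proteomic data.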

Relevance:

10.00%

Publisher:

Abstract:

Averaging is ubiquitous in many sciences, engineering, and everyday practice. The notions of the arithmetic, geometric, and harmonic means developed by the ancient Greeks are in widespread use today. When thinking of an average, most people would use the arithmetic mean, "the average", or perhaps its weighted version, which associates the inputs with degrees of importance. While this is certainly the simplest and most intuitive averaging function, its use is often not warranted. For example, when averaging interest rates, it is the geometric and not the arithmetic mean which is the right method. On the other hand, the arithmetic mean can be strongly biased by a few extreme inputs, and hence can convey a false impression. This is why real estate markets report the median and not the average price (which could be skewed by one or a few outliers), and why judges' marks in some Olympic sports are trimmed of the smallest and largest values.
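The examples in this paragraph, in code: the geometric mean for growth rates, and the median and a trimmed mean as outlier-resistant alternatives to the arithmetic mean. The figures are made up for illustration.

```python
# Standard-library demonstration of the averaging examples above.
import statistics

# Averaging growth rates: compounding makes the geometric mean correct.
rates = [1.10, 1.50]                     # +10% then +50%
print(statistics.fmean(rates))           # arithmetic: 1.30
print(statistics.geometric_mean(rates))  # ~1.2845; squaring it
# reproduces the true two-period growth factor 1.10 * 1.50 = 1.65.

# One extreme sale biases the mean; the median and a trimmed mean resist it.
prices = [300, 320, 340, 360, 5000]
print(statistics.fmean(prices))          # dominated by the one outlier
print(statistics.median(prices))         # 340
trimmed = statistics.fmean(sorted(prices)[1:-1])  # drop smallest and largest
print(trimmed)
```

Trimming the extremes before averaging is exactly the device attributed to the Olympic judging example.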

Relevance:

10.00%

Publisher:

Abstract:

Although the hyper-plane-based One-Class Support Vector Machine (OCSVM) and the hyper-sphere-based Support Vector Data Description (SVDD) algorithms have been shown to be very effective in detecting outliers, their performance on noisy and unlabeled training data has not been widely studied. Moreover, only a few heuristic approaches have been proposed to set the parameters of these methods in an unsupervised manner. In this paper, we propose two unsupervised methods for estimating the optimal parameter settings for training OCSVM and SVDD models, based on analysing the structure of the data. We show that our heuristics are substantially faster than existing parameter estimation approaches, while their accuracy is comparable to that of supervised parameter learning methods such as grid search with cross-validation on labeled data. In addition, our proposed approaches can be used to prepare a labeled data set for an OCSVM or an SVDD from unlabeled data.
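To illustrate what "estimating parameter settings from the structure of the data" can look like, here are two generic unsupervised heuristics: an RBF kernel width from the median pairwise distance, and a candidate outlier fraction (the role played by OCSVM's ν) from k-nearest-neighbour distances. These are common textbook-style heuristics, not the two methods proposed in the paper, and the data and thresholds are made up.

```python
# Generic unsupervised hyper-parameter heuristics for one-class models.
# NOT the paper's estimation procedure; illustrative assumptions only.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(95, 2)),     # inliers
               rng.uniform(8.0, 12.0, size=(5, 2))])   # 5% outliers

pairwise = pdist(X)
gamma = 1.0 / (2.0 * np.median(pairwise) ** 2)  # RBF "median heuristic"

d = squareform(pairwise)
knn = np.sort(d, axis=1)[:, 5]                  # distance to 5th neighbour
cut = np.median(knn) + 3.0 * np.std(knn)        # points beyond this look isolated
nu = max(np.mean(knn > cut), 1.0 / len(X))      # estimated outlier fraction
print(gamma, nu)
```

On this synthetic set the k-NN heuristic recovers roughly the planted 5% contamination rate, which could then seed ν for an OCSVM or the trade-off parameter of an SVDD.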