284 resultados para Outliers


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Os princípios e a metodologia estatística estão hoje em dia bastante consolidados. As questões práticas sobre a análise dos dados são, em vista das disponibilidades computacionais, cada vez mais importantes. É fundamental o estudo de valores discordantes, por exemplo em problemas de modelação estatística. Aqui tem particular relevo o uso de software “identificando” observações. Em sintonia com um básico estudo de teoria dos outliers todos este s problemas serão abordados.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Electricity spot prices have always been a demanding data set for time series analysis, mostly because of the non-storability of electricity. This feature, making electric power unlike the other commodities, causes outstanding price spikes. Moreover, the last several years in financial world seem to show that ’spiky’ behaviour of time series is no longer an exception, but rather a regular phenomenon. The purpose of this paper is to seek patterns and relations within electricity price outliers and verify how they affect the overall statistics of the data. For the study techniques like classical Box-Jenkins approach, series DFT smoothing and GARCH models are used. The results obtained for two geographically different price series show that patterns in outliers’ occurrence are not straightforward. Additionally, there seems to be no rule that would predict the appearance of a spike from volatility, while the reverse effect is quite prominent. It is concluded that spikes cannot be predicted based only on the price series; probably some geographical and meteorological variables need to be included in modeling.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis Entitled Bayesian inference in Exponential and pareto populations in the presence of outliers. The main theme of the present thesis is focussed on various estimation problems using the Bayesian appraoch, falling under the general category of accommodation procedures for analysing Pareto data containing outlier. In Chapter II. the problem of estimation of parameters in the classical Pareto distribution specified by the density function. In Chapter IV. we discuss the estimation of (1.19) when the sample contain a known number of outliers under three different data generating mechanisms, viz. the exchangeable model. Chapter V the prediction of a future observation based on a random sample that contains one contaminant. Chapter VI is devoted to the study of estimation problems concerning the exponential parameters under a k-outlier model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. The first paper of this series examined the effects of the former on the variogram and this paper examines the effects of asymmetry arising from outliers. Simulated annealing was used to create normally distributed random fields of different size that are realizations of known processes described by variograms with different nugget:sill ratios. These primary data sets were then contaminated with randomly located and spatially aggregated outliers from a secondary process to produce different degrees of asymmetry. Experimental variograms were computed from these data by Matheron's estimator and by three robust estimators. The effects of standard data transformations on the coefficient of skewness and on the variogram were also investigated. Cross-validation was used to assess the performance of models fitted to experimental variograms computed from a range of data contaminated by outliers for kriging. The results showed that where skewness was caused by outliers the variograms retained their general shape, but showed an increase in the nugget and sill variances and nugget:sill ratios. This effect was only slightly more for the smallest data set than for the two larger data sets and there was little difference between the results for the latter. Overall, the effect of size of data set was small for all analyses. The nugget:sill ratio showed a consistent decrease after transformation to both square roots and logarithms; the decrease was generally larger for the latter, however. Aggregated outliers had different effects on the variogram shape from those that were randomly located, and this also depended on whether they were aggregated near to the edge or the centre of the field. The results of cross-validation showed that the robust estimators and the removal of outliers were the most effective ways of dealing with outliers for variogram estimation and kriging. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An ensemble forecast is a collection of runs of a numerical dynamical model, initialized with perturbed initial conditions. In modern weather prediction for example, ensembles are used to retrieve probabilistic information about future weather conditions. In this contribution, we are concerned with ensemble forecasts of a scalar quantity (say, the temperature at a specific location). We consider the event that the verification is smaller than the smallest, or larger than the largest ensemble member. We call these events outliers. If a K-member ensemble accurately reflected the variability of the verification, outliers should occur with a base rate of 2/(K + 1). In operational forecast ensembles though, this frequency is often found to be higher. We study the predictability of outliers and find that, exploiting information available from the ensemble, forecast probabilities for outlier events can be calculated which are more skilful than the unconditional base rate. We prove this analytically for statistically consistent forecast ensembles. Further, the analytical results are compared to the predictability of outliers in an operational forecast ensemble by means of model output statistics. We find the analytical and empirical results to agree both qualitatively and quantitatively.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Multibiometrics aims at improving biometric security in presence of spoofing attempts, but exposes a larger availability of points of attack. Standard fusion rules have been shown to be highly sensitive to spoofing attempts – even in case of a single fake instance only. This paper presents a novel spoofing-resistant fusion scheme proposing the detection and elimination of anomalous fusion input in an ensemble of evidence with liveness information. This approach aims at making multibiometric systems more resistant to presentation attacks by modeling the typical behaviour of human surveillance operators detecting anomalies as employed in many decision support systems. It is shown to improve security, while retaining the high accuracy level of standard fusion approaches on the latest Fingerprint Liveness Detection Competition (LivDet) 2013 dataset.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Este trabalho tem como objetivo propor um exame sistemático do chamado prêmio do risco soberano dos títulos emitidos pelo governo brasileiro que permita a categorização dos fatores que possam ser entendidos como geradores do conceito de risco soberano.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Outliers são observações que parecem ser inconsistentes com as demais. Também chamadas de valores atípicos, extremos ou aberrantes, estas inconsistências podem ser causadas por mudanças de política ou crises econômicas, ondas inesperadas de frio ou calor, erros de medida ou digitação, entre outras. Outliers não são necessariamente valores incorretos, mas, quando provenientes de erros de medida ou digitação, podem distorcer os resultados de uma análise e levar o pesquisador à conclusões equivocadas. O objetivo deste trabalho é estudar e comparar diferentes métodos para detecção de anormalidades em séries de preços do Índice de Preços ao Consumidor (IPC), calculado pelo Instituto Brasileiro de Economia (IBRE) da Fundação Getulio Vargas (FGV). O IPC mede a variação dos preços de um conjunto fixo de bens e serviços componentes de despesas habituais das famílias com nível de renda situado entre 1 e 33 salários mínimos mensais e é usado principalmente como um índice de referência para avaliação do poder de compra do consumidor. Além do método utilizado atualmente no IBRE pelos analistas de preços, os métodos considerados neste estudo são: variações do Método do IBRE, Método do Boxplot, Método do Boxplot SIQR, Método do Boxplot Ajustado, Método de Cercas Resistentes, Método do Quartil, do Quartil Modificado, Método do Desvio Mediano Absoluto e Algoritmo de Tukey. Tais métodos foram aplicados em dados pertencentes aos municípios Rio de Janeiro e São Paulo. Para que se possa analisar o desempenho de cada método, é necessário conhecer os verdadeiros valores extremos antecipadamente. Portanto, neste trabalho, tal análise foi feita assumindo que os preços descartados ou alterados pelos analistas no processo de crítica são os verdadeiros outliers. O Método do IBRE é bastante correlacionado com os preços alterados ou descartados pelos analistas. Sendo assim, a suposição de que os preços alterados ou descartados pelos analistas são os verdadeiros valores extremos pode influenciar os resultados, fazendo com que o mesmo seja favorecido em comparação com os demais métodos. No entanto, desta forma, é possível computar duas medidas através das quais os métodos são avaliados. A primeira é a porcentagem de acerto do método, que informa a proporção de verdadeiros outliers detectados. A segunda é o número de falsos positivos produzidos pelo método, que informa quantos valores precisaram ser sinalizados para um verdadeiro outlier ser detectado. Quanto maior for a proporção de acerto gerada pelo método e menor for a quantidade de falsos positivos produzidos pelo mesmo, melhor é o desempenho do método. Sendo assim, foi possível construir um ranking referente ao desempenho dos métodos, identificando o melhor dentre os analisados. Para o município do Rio de Janeiro, algumas das variações do Método do IBRE apresentaram desempenhos iguais ou superiores ao do método original. Já para o município de São Paulo, o Método do IBRE apresentou o melhor desempenho. Em trabalhos futuros, espera-se testar os métodos em dados obtidos por simulação ou que constituam bases largamente utilizadas na literatura, de forma que a suposição de que os preços descartados ou alterados pelos analistas no processo de crítica são os verdadeiros outliers não interfira nos resultados.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Pós-graduação em Ciência da Computação - IBILCE

Relevância:

20.00% 20.00%

Publicador:

Resumo:

[EN]In this paper the authors show that techniques employed in the prediction of chaotic time series" can also be applied to detection of outliers. A definition of outlier" lS provided and a theorem on hypothesis testing is also proved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Outliers are objects that show abnormal behavior with respect to their context or that have unexpected values in some of their parameters. In decision-making processes, information quality is of the utmost importance. In specific applications, an outlying data element may represent an important deviation in a production process or a damaged sensor. Therefore, the ability to detect these elements could make the difference between making a correct and an incorrect decision. This task is complicated by the large sizes of typical databases. Due to their importance in search processes in large volumes of data, researchers pay special attention to the development of efficient outlier detection techniques. This article presents a computationally efficient algorithm for the detection of outliers in large volumes of information. This proposal is based on an extension of the mathematical framework upon which the basic theory of detection of outliers, founded on Rough Set Theory, has been constructed. From this starting point, current problems are analyzed; a detection method is proposed, along with a computational algorithm that allows the performance of outlier detection tasks with an almost-linear complexity. To illustrate its viability, the results of the application of the outlier-detection algorithm to the concrete example of a large database are presented.