971 results for Generalized extreme value distribution
Abstract:
Parametric VaR (Value-at-Risk) is widely used due to its simplicity and easy calculation. However, the normality assumption, often used in the estimation of the parametric VaR, does not provide satisfactory estimates for risk exposure. Therefore, this study suggests a method for computing the parametric VaR based on goodness-of-fit tests using the empirical distribution function (EDF) for extreme returns, and compares the feasibility of this method for the banking sector in an emerging market and in a developed one. The paper also discusses possible theoretical contributions in related fields like enterprise risk management (ERM). © 2013 Elsevier Ltd.
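The contrast the abstract draws can be sketched numerically: a normal-quantile VaR versus an extreme-value alternative fitted to block maxima of losses. This is a generic illustration, not the paper's EDF goodness-of-fit method; the simulated data, block size, and tail level are all assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
returns = rng.standard_t(df=4, size=2000) * 0.01  # simulated heavy-tailed daily returns

alpha = 0.01  # 1% tail probability

# Normal parametric VaR: a quantile of the fitted normal distribution.
mu, sigma = returns.mean(), returns.std(ddof=1)
var_normal = -(mu + sigma * stats.norm.ppf(alpha))

# Extreme-value alternative: fit a GEV to block maxima of losses and map the
# per-observation tail level to the block level (block size m -> 1 - m*alpha).
m = 20
losses = -returns[: len(returns) // m * m].reshape(-1, m)
block_max_loss = losses.max(axis=1)
c, loc, scale = stats.genextreme.fit(block_max_loss)
var_gev = stats.genextreme.ppf(1 - m * alpha, c, loc=loc, scale=scale)

print(f"Normal VaR(1%): {var_normal:.4f}  GEV VaR(1%): {var_gev:.4f}")
```

The GEV quantile typically sits above the normal one for heavy-tailed returns, which is the shortfall of the normality assumption the abstract points to.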
Abstract:
Ng and Kotz (1995) introduced a distribution that provides greater flexibility for modeling extremes. We define and study a new class of distributions, called the Kummer beta generalized family, that extends the normal, Weibull, gamma and Gumbel distributions, among several other well-known distributions. Some special models are discussed. The ordinary moments of any distribution in the new family can be expressed as linear functions of probability weighted moments of the baseline distribution. We examine the asymptotic distributions of the extreme values. We derive the density function of the order statistics, mean absolute deviations and entropies. We use maximum likelihood estimation to fit the distributions in the new class and illustrate its potential with an application to a real data set.
Abstract:
Cordeiro and de Castro [G. M. Cordeiro and M. de Castro, A new family of generalized distributions, J. Statist. Comput. Simul. 81 (2011), pp. 883-898] proposed a new generalized distribution (denoted here with the prefix 'Kw-G' (Kumaraswamy-G)) with two extra positive parameters for any continuous baseline distribution G. They studied some of its mathematical properties and presented special sub-models. We derive a simple representation for the Kw-G density function as a linear combination of exponentiated-G distributions. Some new distributions are proposed as sub-models of this family, for example, the Kw-Chen [Z. A. Chen, A new two-parameter lifetime distribution with bathtub shape or increasing failure rate function, Statist. Probab. Lett. 49 (2000), pp. 155-161], Kw-XTG [M. Xie, Y. Tang, and T. N. Goh, A modified Weibull extension with bathtub failure rate function, Reliab. Eng. System Safety 76 (2002), pp. 279-285] and Kw-Flexible Weibull [M. Bebbington, C. D. Lai, and R. Zitikis, A flexible Weibull extension, Reliab. Eng. System Safety 92 (2007), pp. 719-726]. New properties of the Kw-G distribution are derived, including asymptotes, shapes, moments, the moment generating function, mean deviations, Bonferroni and Lorenz curves, reliability, Rényi entropy and Shannon entropy. New properties of the order statistics are investigated. We discuss the estimation of the parameters by maximum likelihood. We provide two applications to real data sets and discuss a bivariate extension of the Kw-G distribution.
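For reference, the Kw-G construction has a compact closed form: for a baseline cdf G with pdf g, the Kw-G cdf is F(x) = 1 - (1 - G(x)^a)^b with density f(x) = a b g(x) G(x)^(a-1) (1 - G(x)^a)^(b-1), a, b > 0. A minimal sketch with a normal baseline (the choice of baseline is illustrative only):

```python
import numpy as np
from scipy import stats

def kw_g_pdf(x, a, b, baseline=stats.norm):
    """Kumaraswamy-G density for a given scipy baseline distribution."""
    G, g = baseline.cdf(x), baseline.pdf(x)
    return a * b * g * G**(a - 1) * (1 - G**a)**(b - 1)

def kw_g_cdf(x, a, b, baseline=stats.norm):
    """Kumaraswamy-G cdf: F(x) = 1 - (1 - G(x)^a)^b."""
    return 1 - (1 - baseline.cdf(x)**a)**b

# a = b = 1 recovers the baseline distribution exactly.
x = np.linspace(-3, 3, 7)
assert np.allclose(kw_g_pdf(x, 1, 1), stats.norm.pdf(x))
```

Passing a different scipy frozen distribution as `baseline` gives the corresponding Kw-G sub-model (e.g. a Weibull baseline for Kw-Weibull).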
Abstract:
The climatic conditions of mountain habitats are greatly influenced by topography. Large differences in microclimate occur with small changes in elevation, and this complex interaction is an important determinant of mountain plant distributions. In spite of this, elevation is not often considered as a relevant predictor in species distribution models (SDMs) for mountain plants. Here, we evaluated the importance of including elevation as a predictor in SDMs for mountain plant species. We generated two sets of SDMs for each of 73 plant species that occur in the Pacific Northwest of North America; one set of models included elevation as a predictor variable and the other set did not. AUC scores indicated that omitting elevation as a predictor resulted in a negligible reduction of model performance. However, further analysis revealed that the omission of elevation resulted in large over-predictions of species' niche breadths; this effect was most pronounced for species that occupy the highest elevations. In addition, the inclusion of elevation as a predictor constrained the effects of other predictors that spuriously affected the outcome of the models generated without elevation. Our results demonstrate that the inclusion of elevation as a predictor variable improves the quality of SDMs for high-elevation plant species. Because of the negligible AUC score penalty for over-predicting niche breadth, our results support the notion that AUC scores alone should not be used as a measure of model quality. More generally, our results illustrate the importance of selecting biologically relevant predictor variables when constructing SDMs.
Abstract:
High-resolution quantitative diatom data are tabulated for the early part of the late Pliocene (3.25 to 2.08 Ma) at DSDP Site 580 in the northwestern Pacific. Sample spacing averages 11 k.y. between 3.1 and 2.8 Ma, but increases to 14 to 19 k.y. prior to 3.1 Ma and after 2.8 Ma. Q-mode factor analysis of the middle Pliocene assemblage reveals four factors which explain 92.4% of the total variance of the 47 samples studied between 3.25 and 2.55 Ma. Three of the factors are closely related to modern subarctic, transitional, and subtropical elements, while the fourth factor, which is dominated by Coscinodiscus marginatus and the extinct Pliocene species Neodenticula kamtschatica, appears to correspond to a middle Pliocene precursor of the subarctic water mass. Knowledge of the modern and generalized Pliocene paleoclimatic relationships of various diatom taxa is used to generate a paleoclimate curve ("Twt") based on the ratio of warm-water (subtropical) to cold-water diatoms, with warm-water transitional taxa (Thalassionema nitzschioides, Thalassiosira oestrupii, and Coscinodiscus radiatus) factored into the equation at an intermediate (0.5) value. The "Twt" ratios at more southerly DSDP Sites 579 and 578 are consistently higher (warmer) than those at Site 580 throughout the Pliocene, suggesting the validity of the ratio as a paleoclimatic index. Diatom paleoclimatic data reveal a middle Pliocene (3.1 to 3.0 Ma) warm interval at Site 580 during which paleotemperatures may have exceeded maximum Holocene values by 3° to 5.5°C at least three times. This middle Pliocene warm interval is also recognized by planktic foraminifers in the North Atlantic, and it appears to correspond with generally depleted oxygen isotope values, suggesting polar warming.
The diatom "Twt" curve for Site 580 compares fairly well with radiolarian and silicoflagellate paleoclimatic curves for Site 580, planktic foraminiferal sea-surface temperature estimates for the North Atlantic, and benthic oxygen isotope curves for the late Pliocene, although higher-resolution studies on paired samples are required to test the correspondence of these various paleoclimatic indices.
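The exact "Twt" equation is not reproduced in the abstract; one plausible reading of the description (warm-water counts set against cold-water counts, with warm-water transitional taxa weighted at 0.5) can be sketched as follows. The function name and the precise weighting scheme are assumptions, not the paper's formula.

```python
def twt_ratio(n_warm, n_transitional, n_cold, w=0.5):
    """Warm/cold diatom ratio with transitional taxa at intermediate weight w.

    Returns a value in [0, 1]; higher values indicate warmer assemblages.
    """
    warm = n_warm + w * n_transitional
    return warm / (warm + n_cold)

# A sample dominated by warm-water taxa yields a ratio near 1.
print(twt_ratio(n_warm=80, n_transitional=10, n_cold=5))  # → 0.9444...
```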
Abstract:
This layer is a georeferenced raster image of the historic paper map entitled: Map of the city of New Orleans : showing proposed water distribution system, [by] Sewerage and Water Board New Orleans, LA.; Geo. G. Earl, genl. sup't. It was published by the Sewerage and Water Board New Orleans in 1902. Scale [ca. 1:50,900]. The image inside the map neatline is georeferenced to the surface of the earth and fit to the Louisiana State Plane Coordinate System, South NAD83 (in Feet) (Fipszone 1702). All map collar and inset information is also available as part of the raster image, including any inset maps, profiles, statistical tables, directories, text, illustrations, or other information associated with the principal map. This map shows water distribution features such as existing and proposed water mains (with sizes), suction pipes, and water purification station sites. Also shows other features such as roads, canals, levees, drainage, cemeteries, Parish boundaries, and more. Shaded to show built-up and unbuilt areas for construction. This layer is part of a selection of digitally scanned and georeferenced historic maps from The Harvard Map Collection as part of the Imaging the Urban Environment project. Maps selected for this project represent major urban areas and cities of the world, at various time periods. These maps typically portray both natural and manmade features at a large scale. The selection represents a range of regions, originators, ground condition dates, scales, and purposes.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
The statistical distribution, when determined from an incomplete set of constraints, is shown to be suitable as a host for encrypted information. We design an encoding/decoding scheme to embed such a distribution with hidden information. The encryption security is based on the extreme instability of the encoding procedure. The essential feature of the proposed system lies in the fact that the key for retrieving the code is generated by random perturbations of very small value. The security of the proposed encryption relies on the secure exchange of the secret key. Hence, it appears to be a good complement to the quantum key distribution protocol. © 2005 Elsevier B.V. All rights reserved.
Abstract:
Fuzzy data envelopment analysis (DEA) models emerged as a class of DEA models that accounts for imprecise inputs and outputs of decision making units (DMUs). Although several approaches for solving fuzzy DEA models have been developed, they have drawbacks, ranging from insufficient discrimination power to simplistic numerical examples that handle only triangular or symmetrical fuzzy numbers. To address these drawbacks, this paper proposes using the concept of expected value in a generalized DEA (GDEA) model. This allows the unification of three models - the fuzzy expected CCR, fuzzy expected BCC, and fuzzy expected FDH models - and enables these models to handle both symmetrical and asymmetrical fuzzy numbers. We also explore the role of the fuzzy GDEA model as a ranking method and compare it to existing super-efficiency evaluation models. Our proposed model is always feasible, while infeasibility problems remain in certain cases under existing super-efficiency models. In order to illustrate the performance of the proposed method, it is first tested using two established numerical examples and compared with the results obtained from alternative methods. A third example on energy dependency among 23 European Union (EU) member countries is further used to validate and describe the efficacy of our approach under asymmetric fuzzy numbers.
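One common expected-value operator for defuzzifying such inputs and outputs is the midpoint of Heilpern's expected interval; the paper's exact operator may differ, and the function names below are illustrative. It handles symmetric and asymmetric fuzzy numbers alike:

```python
def expected_value_triangular(a, m, b):
    """E[(a, m, b)] = (a + 2m + b) / 4 for a triangular fuzzy number
    with support [a, b] and mode m (midpoint of Heilpern's expected interval)."""
    return (a + 2 * m + b) / 4

def expected_value_trapezoidal(a, b, c, d):
    """E[(a, b, c, d)] = (a + b + c + d) / 4 for a trapezoidal fuzzy number
    with support [a, d] and core [b, c]."""
    return (a + b + c + d) / 4

# An asymmetric triangular number (1, 2, 6) has expected value 2.75,
# pulled right of the mode 2 by the longer right tail.
print(expected_value_triangular(1, 2, 6))  # → 2.75
```

Replacing each fuzzy input/output with its expected value reduces the fuzzy GDEA program to a crisp linear program, which is one way such models remain feasible for both symmetric and asymmetric data.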
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even after the huge increases in n typically seen in many fields. The tendency in some areas of industry to dispense with traditional statistical analysis on the grounds that "n = all" is thus of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and it is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is the design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms and for characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
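The latent-structure factorization described above can be sketched directly: a latent class (nonnegative PARAFAC) model writes the joint pmf tensor as a mixture of rank-one products of per-variable probability vectors. The dimensions and the three-variable einsum below are illustrative choices, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, k = 3, 4, 2            # 3 categorical variables, 4 levels each, 2 latent classes

nu = rng.dirichlet(np.ones(k))                 # latent class weights, sum to 1
lam = rng.dirichlet(np.ones(d), size=(p, k))   # lam[j, h] = pmf of variable j in class h

# Assemble the d x d x d joint probability tensor as a sum of k rank-one terms
# (einsum signature hardcoded for the p = 3 case).
pi = np.zeros((d,) * p)
for h in range(k):
    pi += nu[h] * np.einsum('i,j,l->ijl', lam[0, h], lam[1, h], lam[2, h])

assert np.isclose(pi.sum(), 1.0)   # a valid joint pmf
```

The nonnegative rank of `pi` is at most k, which is the kind of quantity Chapter 2 relates to the support of a log-linear model.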
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and we provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed-form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rates and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
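The divergence being bounded here has, for two multivariate Gaussians, a standard closed form (textbook material, not the thesis's derivation): KL(N0 || N1) = 0.5 [tr(S1^-1 S0) + (m1 - m0)^T S1^-1 (m1 - m0) - k + ln(det S1 / det S0)].

```python
import numpy as np

def kl_gaussians(m0, S0, m1, S1):
    """Closed-form KL(N(m0, S0) || N(m1, S1)) for k-dimensional Gaussians."""
    k = len(m0)
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)   # log-determinants for numerical stability
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k
                  + logdet1 - logdet0)

# Identical Gaussians have zero divergence.
m, S = np.zeros(2), np.eye(2)
assert np.isclose(kl_gaussians(m, S, m, S), 0.0)
```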
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
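The truncated-normal sampler referenced above is, in its textbook form, the Albert-Chib data augmentation scheme for probit regression. A generic sketch under a flat prior on beta follows; this is not the thesis's implementation, and all names are illustrative.

```python
import numpy as np
from scipy import stats

def probit_gibbs(X, y, n_iter=500, seed=0):
    """Albert-Chib data augmentation Gibbs sampler for probit regression."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    XtX_inv = np.linalg.inv(X.T @ X)          # posterior covariance under a flat prior
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # 1. Sample latent z_i ~ N(x_i'beta, 1), truncated by the sign of y_i,
        #    via the inverse-cdf method.
        mean = X @ beta
        u = rng.uniform(size=n)
        lo = stats.norm.cdf(-mean)            # P(z_i < 0 | mean_i)
        q = np.where(y == 1, lo + u * (1 - lo), u * lo)
        z = mean + stats.norm.ppf(np.clip(q, 1e-12, 1 - 1e-12))
        # 2. Sample beta | z ~ N((X'X)^-1 X'z, (X'X)^-1).
        b_hat = XtX_inv @ X.T @ z
        beta = rng.multivariate_normal(b_hat, XtX_inv)
        draws[t] = beta
    return draws
```

The slow-mixing regime the abstract analyzes corresponds to `y` containing very few successes relative to `n`, where consecutive draws of `beta` become highly autocorrelated.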
Abstract:
In this study we propose using the distribution of a performance measure, rather than its point value, to rank hedge funds. The Generalized Sharpe Ratio and other similar measures that take into account the higher-order moments of portfolio return distributions are commonly used to evaluate hedge fund performance. The literature in this field has reported no significant difference in rankings between performance measures that take, and those that do not take, into account higher moments of the distribution. Our approach provides a much more powerful way to differentiate between hedge funds' performance. We use a semi-nonparametric density based on Gram-Charlier expansions to forecast the conditional distribution of hedge fund returns and its corresponding performance measure distribution. Through a forecasting exercise we show the advantages of our technique relative to the more traditional point performance measures.
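The Gram-Charlier (Type A) expansion underlying such semi-nonparametric densities corrects the standard normal density with Hermite-polynomial terms in skewness and excess kurtosis. A minimal sketch follows; the paper's full specification (e.g. any positivity transformation of the expansion) may differ.

```python
import numpy as np
from scipy import stats

def gram_charlier_pdf(x, gamma1, gamma2):
    """Gram-Charlier Type A density for standardized returns:
    phi(x) * [1 + (gamma1/6) He3(x) + (gamma2/24) He4(x)],
    where gamma1 is skewness and gamma2 is excess kurtosis."""
    he3 = x**3 - 3 * x                  # probabilists' Hermite polynomial He3
    he4 = x**4 - 6 * x**2 + 3           # probabilists' Hermite polynomial He4
    return stats.norm.pdf(x) * (1 + gamma1 / 6 * he3 + gamma2 / 24 * he4)

# With zero skewness and zero excess kurtosis, the expansion is exactly normal.
x = np.linspace(-4, 4, 9)
assert np.allclose(gram_charlier_pdf(x, 0.0, 0.0), stats.norm.pdf(x))
```

Because the Hermite terms integrate to zero against the normal density, the expansion always integrates to one, though for large gamma1 or gamma2 it can dip below zero in the tails, which is why positivity-constrained variants are used in practice.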
Abstract:
Strong convective events can produce extreme precipitation, hail, lightning or gusts, potentially inducing severe socio-economic impacts. These events have a relatively small spatial extension and, in most cases, a short lifetime. In this study, a model is developed for estimating convective extreme events based on large-scale conditions. It is shown that strong convective events can be characterized by a Weibull distribution of radar-based rainfall with a low shape and a high scale parameter value. A radius of 90 km around a station reporting a convective situation turned out to be suitable. A methodology is developed to estimate the Weibull parameters, and thus the occurrence probability of convective events, from large-scale atmospheric instability and enhanced near-surface humidity, which are usually found on a larger scale than the convective event itself. Here, the probability of occurrence of extreme convective events is estimated from the KO-index, indicating stability, and relative humidity at 1000 hPa. Both variables are computed from the ERA-Interim reanalysis. In a first version of the methodology, these two variables are applied to estimate the spatial rainfall distribution and the occurrence of a convective event. The developed method shows significant skill in estimating the occurrence of convective events as observed at synoptic stations, by lightning measurements, and in severe weather reports. In order to take frontal influences into account, a scheme for the detection of atmospheric fronts is implemented. While generally higher instability is found in the vicinity of fronts, the skill of this approach is largely unchanged. Additional improvements were achieved by a bias correction and the use of ERA-Interim precipitation. The resulting estimation method is applied to the ERA-Interim period (1979-2014) to establish a ranking of estimated convective extreme events.
Two strong estimated events that reveal a frontal influence are analysed in detail. As a second application, the method is applied to GCM-based decadal predictions in the period 1979-2014, which were initialized every year. It is shown that decadal predictive skill for convective event frequencies over Germany is found for the first 3-4 years after the initialization.
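The Weibull characterization described above (convective cases showing a low shape and a high scale parameter) can be sketched with simulated rainfall intensities; the generating parameter values below are illustrative, not values from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated radar-rainfall intensities: low shape (< 1), high scale.
rainfall = stats.weibull_min.rvs(0.7, scale=8.0, size=2000, random_state=rng)

# Fit shape and scale by maximum likelihood, with the location fixed at zero
# since rainfall intensities are nonnegative.
shape, loc, scale = stats.weibull_min.fit(rainfall, floc=0)
print(f"shape={shape:.2f}, scale={scale:.2f}")  # close to the generating values
```

A shape parameter below 1 gives a heavy-tailed, monotonically decreasing density, matching the description of convective rainfall dominated by a few intense values.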