82 resultados para Quantiles


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Revised: 2006-05.-- Published as an article in: Journal of Population Economics, 2007, vol. 18, issue 1, pp. 165-179.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited.We (i) highlight the limitations of approaches previously described in the literature which make them unsuitable for non-stationary streams, (ii) describe a novel principle for the utilization of the available storage space, and (iii) introduce two novel algorithms which exploit the proposed principle. Experiments on three large realworld data sets demonstrate that the proposed methods vastly outperform the existing alternatives.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Extremal quantile index is a concept that the quantile index will drift to zero (or one)

as the sample size increases. The three chapters of my dissertation consists of three

applications of this concept in three distinct econometric problems. In Chapter 2, I

use the concept of extremal quantile index to derive new asymptotic properties and

inference method for quantile treatment effect estimators when the quantile index

of interest is close to zero. In Chapter 3, I rely on the concept of extremal quantile

index to achieve identification at infinity of the sample selection models and propose

a new inference method. Last, in Chapter 4, I use the concept of extremal quantile

index to define an asymptotic trimming scheme which can be used to control the

convergence rate of the estimator of the intercept of binary response models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ranking variables according to their relevance to predict an outcome is an important task in biomedicine. For instance, such ranking can be used for selecting a smaller number of genes to then apply other sophisticated experiments only on genes identified as important. A nonparametric method called Quor is designed to provide a confidence value for the order of arbitrary quantiles of different populations using independent samples. This confidence may provide insights about possible differences among groups and yields a ranking of importance for the variables. Computations are efficient and use exact distributions with no need for asymptotic considerations. Experiments with simulated data and with multiple real -omics data sets are performed and they show advantages and disadvantages of the method. Quor has no assumptions but independence of samples, thus it might be a better option when assumptions of other methods cannot be asserted. The software is publicly available on CRAN.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This dissertation proposes statistical methods to formulate, estimate and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation. The first method solves an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed and their effectiveness is evaluated with respect to unordered models. Procedure to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and difficulties associated with the estimation of unordered models explained. Numerical approximation methods based on the Genz algortithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey. The second part of this work offers a comprehensive statistical analysis of free-flow speed distribution; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles in its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability. A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted. Eventually, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random effects models for free-flow speed estimation that takes into account the survey design. All methods are rigorously validated on both real and simulated data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In quantitative risk analysis, the problem of estimating small threshold exceedance probabilities and extreme quantiles arise ubiquitously in bio-surveillance, economics, natural disaster insurance actuary, quality control schemes, etc. A useful way to make an assessment of extreme events is to estimate the probabilities of exceeding large threshold values and extreme quantiles judged by interested authorities. Such information regarding extremes serves as essential guidance to interested authorities in decision making processes. However, in such a context, data are usually skewed in nature, and the rarity of exceedance of large threshold implies large fluctuations in the distribution's upper tail, precisely where the accuracy is desired mostly. Extreme Value Theory (EVT) is a branch of statistics that characterizes the behavior of upper or lower tails of probability distributions. However, existing methods in EVT for the estimation of small threshold exceedance probabilities and extreme quantiles often lead to poor predictive performance in cases where the underlying sample is not large enough or does not contain values in the distribution's tail. In this dissertation, we shall be concerned with an out of sample semiparametric (SP) method for the estimation of small threshold probabilities and extreme quantiles. The proposed SP method for interval estimation calls for the fusion or integration of a given data sample with external computer generated independent samples. Since more data are used, real as well as artificial, under certain conditions the method produces relatively short yet reliable confidence intervals for small exceedance probabilities and extreme quantiles.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hot spot identification (HSID) aims to identify potential sites—roadway segments, intersections, crosswalks, interchanges, ramps, etc.—with disproportionately high crash risk relative to similar sites. An inefficient HSID methodology might result in either identifying a safe site as high risk (false positive) or a high risk site as safe (false negative), and consequently lead to the misuse the available public funds, to poor investment decisions, and to inefficient risk management practice. Current HSID methods suffer from issues like underreporting of minor injury and property damage only (PDO) crashes, challenges of accounting for crash severity into the methodology, and selection of a proper safety performance function to model crash data that is often heavily skewed by a preponderance of zeros. Addressing these challenges, this paper proposes a combination of a PDO equivalency calculation and quantile regression technique to identify hot spots in a transportation network. In particular, issues related to underreporting and crash severity are tackled by incorporating equivalent PDO crashes, whilst the concerns related to the non-count nature of equivalent PDO crashes and the skewness of crash data are addressed by the non-parametric quantile regression technique. The proposed method identifies covariate effects on various quantiles of a population, rather than the population mean like most methods in practice, which more closely corresponds with how black spots are identified in practice. The proposed methodology is illustrated using rural road segment data from Korea and compared against the traditional EB method with negative binomial regression. Application of a quantile regression model on equivalent PDO crashes enables identification of a set of high-risk sites that reflect the true safety costs to the society, simultaneously reduces the influence of under-reported PDO and minor injury crashes, and overcomes the limitation of traditional NB model in dealing with preponderance of zeros problem or right skewed dataset.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to formalize such adaptivity are second-order regret bounds and quantile bounds. The underlying notions of 'easy data', which may be paraphrased as "the learning problem has small variance" and "multiple decisions are useful", are synergetic. But even though there are sophisticated algorithms that exploit one of the two, no existing algorithm is able to adapt to both. In this paper we outline a new method for obtaining such adaptive algorithms, based on a potential function that aggregates a range of learning rates (which are essential tuning parameters). By choosing the right prior we construct efficient algorithms and show that they reap both benefits by proving the first bounds that are both second-order and incorporate quantiles.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Power calculation and sample size determination are critical in designing environmental monitoring programs. The traditional approach based on comparing the mean values may become statistically inappropriate and even invalid when substantial proportions of the response values are below the detection limits or censored because strong distributional assumptions have to be made on the censored observations when implementing the traditional procedures. In this paper, we propose a quantile methodology that is robust to outliers and can also handle data with a substantial proportion of below-detection-limit observations without the need of imputing the censored values. As a demonstration, we applied the methods to a nutrient monitoring project, which is a part of the Perth Long-Term Ocean Outlet Monitoring Program. In this example, the sample size required by our quantile methodology is, in fact, smaller than that by the traditional t-test, illustrating the merit of our method.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Modeling and forecasting of implied volatility (IV) is important to both practitioners and academics, especially in trading, pricing, hedging, and risk management activities, all of which require an accurate volatility. However, it has become challenging since the 1987 stock market crash, as implied volatilities (IVs) recovered from stock index options present two patterns: volatility smirk(skew) and volatility term-structure, if the two are examined at the same time, presents a rich implied volatility surface (IVS). This implies that the assumptions behind the Black-Scholes (1973) model do not hold empirically, as asset prices are mostly influenced by many underlying risk factors. This thesis, consists of four essays, is modeling and forecasting implied volatility in the presence of options markets’ empirical regularities. The first essay is modeling the dynamics IVS, it extends the Dumas, Fleming and Whaley (DFW) (1998) framework; for instance, using moneyness in the implied forward price and OTM put-call options on the FTSE100 index, a nonlinear optimization is used to estimate different models and thereby produce rich, smooth IVSs. Here, the constant-volatility model fails to explain the variations in the rich IVS. Next, it is found that three factors can explain about 69-88% of the variance in the IVS. Of this, on average, 56% is explained by the level factor, 15% by the term-structure factor, and the additional 7% by the jump-fear factor. The second essay proposes a quantile regression model for modeling contemporaneous asymmetric return-volatility relationship, which is the generalization of Hibbert et al. (2008) model. The results show strong negative asymmetric return-volatility relationship at various quantiles of IV distributions, it is monotonically increasing when moving from the median quantile to the uppermost quantile (i.e., 95%); therefore, OLS underestimates this relationship at upper quantiles. Additionally, the asymmetric relationship is more pronounced with the smirk (skew) adjusted volatility index measure in comparison to the old volatility index measure. Nonetheless, the volatility indices are ranked in terms of asymmetric volatility as follows: VIX, VSTOXX, VDAX, and VXN. The third essay examines the information content of the new-VDAX volatility index to forecast daily Value-at-Risk (VaR) estimates and compares its VaR forecasts with the forecasts of the Filtered Historical Simulation and RiskMetrics. All daily VaR models are then backtested from 1992-2009 using unconditional, independence, conditional coverage, and quadratic-score tests. It is found that the VDAX subsumes almost all information required for the volatility of daily VaR forecasts for a portfolio of the DAX30 index; implied-VaR models outperform all other VaR models. The fourth essay models the risk factors driving the swaption IVs. It is found that three factors can explain 94-97% of the variation in each of the EUR, USD, and GBP swaption IVs. There are significant linkages across factors, and bi-directional causality is at work between the factors implied by EUR and USD swaption IVs. Furthermore, the factors implied by EUR and USD IVs respond to each others’ shocks; however, surprisingly, GBP does not affect them. Second, the string market model calibration results show it can efficiently reproduce (or forecast) the volatility surface for each of the swaptions markets.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Estimation of design quantiles of hydrometeorological variables at critical locations in river basins is necessary for hydrological applications. To arrive at reliable estimates for locations (sites) where no or limited records are available, various regional frequency analysis (RFA) procedures have been developed over the past five decades. The most widely used procedure is based on index-flood approach and L-moments. It assumes that values of scale and shape parameters of frequency distribution are identical across all the sites in a homogeneous region. In real-world scenario, this assumption may not be valid even if a region is statistically homogeneous. To address this issue, a novel mathematical approach is proposed. It involves (i) identification of an appropriate frequency distribution to fit the random variable being analyzed for homogeneous region, (ii) use of a proposed transformation mechanism to map observations of the variable from original space to a dimensionless space where the form of distribution does not change, and variation in values of its parameters is minimal across sites, (iii) construction of a growth curve in the dimensionless space, and (iv) mapping the curve to the original space for the target site by applying inverse transformation to arrive at required quantile(s) for the site. Effectiveness of the proposed approach (PA) in predicting quantiles for ungauged sites is demonstrated through Monte Carlo simulation experiments considering five frequency distributions that are widely used in RFA, and by case study on watersheds in conterminous United States. Results indicate that the PA outperforms methods based on index-flood approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Overland rain retrieval using spaceborne microwave radiometer offers a myriad of complications as land presents itself as a radiometrically warm and highly variable background. Hence, land rainfall algorithms of the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) have traditionally incorporated empirical relations of microwave brightness temperature (Tb) with rain rate, rather than relying on physically based radiative transfer modeling of rainfall (as implemented in the TMI ocean algorithm). In this paper, sensitivity analysis is conducted using the Spearman rank correlation coefficient as benchmark, to estimate the best combination of TMI low-frequency channels that are highly sensitive to the near surface rainfall rate from the TRMM Precipitation Radar (PR). Results indicate that the TMI channel combinations not only contain information about rainfall wherein liquid water drops are the dominant hydrometeors but also aid in surface noise reduction over a predominantly vegetative land surface background. Furthermore, the variations of rainfall signature in these channel combinations are not understood properly due to their inherent uncertainties and highly nonlinear relationship with rainfall. Copula theory is a powerful tool to characterize the dependence between complex hydrological variables as well as aid in uncertainty modeling by ensemble generation. Hence, this paper proposes a regional model using Archimedean copulas, to study the dependence of TMI channel combinations with respect to precipitation, over the land regions of Mahanadi basin, India, using version 7 orbital data from the passive and active sensors on board TRMM, namely, TMI and PR. Studies conducted for different rainfall regimes over the study area show the suitability of Clayton and Gumbel copulas for modeling convective and stratiform rainfall types for the majority of the intraseasonal months. Furthermore, large ensembles of TMI Tb (from the most sensitive TMI channel combination) were generated conditional on various quantiles (25th, 50th, 75th, and 95th) of the convective and the stratiform rainfall. Comparatively greater ambiguity was observed to model extreme values of the convective rain type. Finally, the efficiency of the proposed model was tested by comparing the results with traditionally employed linear and quadratic models. Results reveal the superior performance of the proposed copula-based technique.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Regionalization approaches are widely used in water resources engineering to identify hydrologically homogeneous groups of watersheds that are referred to as regions. Pooled information from sites (depicting watersheds) in a region forms the basis to estimate quantiles associated with hydrological extreme events at ungauged/sparsely gauged sites in the region. Conventional regionalization approaches can be effective when watersheds (data points) corresponding to different regions can be separated using straight lines or linear planes in the space of watershed related attributes. In this paper, a kernel-based Fuzzy c-means (KFCM) clustering approach is presented for use in situations where such linear separation of regions cannot be accomplished. The approach uses kernel-based functions to map the data points from the attribute space to a higher-dimensional space where they can be separated into regions by linear planes. A procedure to determine optimal number of regions with the KFCM approach is suggested. Further, formulations to estimate flood quantiles at ungauged sites with the approach are developed. Effectiveness of the approach is demonstrated through Monte-Carlo simulation experiments and a case study on watersheds in United States. Comparison of results with those based on conventional Fuzzy c-means clustering, Region-of-influence approach and a prior study indicate that KFCM approach outperforms the other approaches in forming regions that are closer to being statistically homogeneous and in estimating flood quantiles at ungauged sites. Key Points Kernel-based regionalization approach is presented for flood frequency analysis Kernel procedure to estimate flood quantiles at ungauged sites is developed A set of fuzzy regions is delineated in Ohio, USA