849 resultados para Quantile estimation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We provide an incremental quantile estimator for Non-stationary Streaming Data. We propose a method for simultaneous estimation of multiple quantiles corresponding to the given probability levels from streaming data. Due to the limitations of the memory, it is not feasible to compute the quantiles by storing the data. So estimating the quantiles as the data pass by is the only possibility. This can be effective in network measurement. To provide the minimum of the mean-squared error of the estimation, we use parabolic approximation and for comparison we simulate the results for different number of runs and using both linear and parabolic approximations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Regional flood frequency techniques are commonly used to estimate flood quantiles when flood data is unavailable or the record length at an individual gauging station is insufficient for reliable analyses. These methods compensate for limited or unavailable data by pooling data from nearby gauged sites. This requires the delineation of hydrologically homogeneous regions in which the flood regime is sufficiently similar to allow the spatial transfer of information. It is generally accepted that hydrologic similarity results from similar physiographic characteristics, and thus these characteristics can be used to delineate regions and classify ungauged sites. However, as currently practiced, the delineation is highly subjective and dependent on the similarity measures and classification techniques employed. A standardized procedure for delineation of hydrologically homogeneous regions is presented herein. Key aspects are a new statistical metric to identify physically discordant sites, and the identification of an appropriate set of physically based measures of extreme hydrological similarity. A combination of multivariate statistical techniques applied to multiple flood statistics and basin characteristics for gauging stations in the Southeastern U.S. revealed that basin slope, elevation, and soil drainage largely determine the extreme hydrological behavior of a watershed. Use of these characteristics as similarity measures in the standardized approach for region delineation yields regions which are more homogeneous and more efficient for quantile estimation at ungauged sites than those delineated using alternative physically-based procedures typically employed in practice. The proposed methods and key physical characteristics are also shown to be efficient for region delineation and quantile development in alternative areas composed of watersheds with statistically different physical composition. In addition, the use of aggregated values of key watershed characteristics was found to be sufficient for the regionalization of flood data; the added time and computational effort required to derive spatially distributed watershed variables does not increase the accuracy of quantile estimators for ungauged sites. This dissertation also presents a methodology by which flood quantile estimates in Haiti can be derived using relationships developed for data rich regions of the U.S. As currently practiced, regional flood frequency techniques can only be applied within the predefined area used for model development. However, results presented herein demonstrate that the regional flood distribution can successfully be extrapolated to areas of similar physical composition located beyond the extent of that used for model development provided differences in precipitation are accounted for and the site in question can be appropriately classified within a delineated region.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Prediction at ungauged sites is essential for water resources planning and management. Ungauged sites have no observations about the magnitude of floods, but some site and basin characteristics are known. Regression models relate physiographic and climatic basin characteristics to flood quantiles, which can be estimated from observed data at gauged sites. However, these models assume linear relationships between variables Prediction intervals are estimated by the variance of the residuals in the estimated model. Furthermore, the effect of the uncertainties in the explanatory variables on the dependent variable cannot be assessed. This paper presents a methodology to propagate the uncertainties that arise in the process of predicting flood quantiles at ungauged basins by a regression model. In addition, Bayesian networks were explored as a feasible tool for predicting flood quantiles at ungauged sites. Bayesian networks benefit from taking into account uncertainties thanks to their probabilistic nature. They are able to capture non-linear relationships between variables and they give a probability distribution of discharges as result. The methodology was applied to a case study in the Tagus basin in Spain.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A method to estimate an extreme quantile that requires no distributional assumptions is presented. The approach is based on transformed kernel estimation of the cumulative distribution function (cdf). The proposed method consists of a double transformation kernel estimation. We derive optimal bandwidth selection methods that have a direct expression for the smoothing parameter. The bandwidth can accommodate to the given quantile level. The procedure is useful for large data sets and improves quantile estimation compared to other methods in heavy tailed distributions. Implementation is straightforward and R programs are available.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

One of the main problems of flood hazard assessment in ungauged or poorly gauged basins is the lack of runoff data. In an attempt to overcome this problem we have combined archival records, dendrogeomorphic time series and instrumental data (daily rainfall and discharge) from four ungauged and poorly gauged mountain basins in Central Spain with the aim of reconstructing and compiling information on 41 flash flood events since the end of the 19th century. Estimation of historical discharge and the incorporation of uncertainty for the at-site and regional flood frequency analysis were performed with an empirical rainfall–runoff assessment as well as stochastic and Bayesian Markov Chain Monte Carlo (MCMC) approaches. Results for each of the ungauged basins include flood frequency, severity, seasonality and triggers (synoptic meteorological situations). The reconstructed data series clearly demonstrates how uncertainty can be reduced by including historical information, but also points to the considerable influence of different approaches on quantile estimation. This uncertainty should be taken into account when these data are used for flood risk management.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The L-moments based index-flood procedure had been successfully applied for Regional Flood Frequency Analysis (RFFA) for the Island of Newfoundland in 2002 using data up to 1998. This thesis, however, considered both Labrador and the Island of Newfoundland using the L-Moments index-flood method with flood data up to 2013. For Labrador, the homogeneity test showed that Labrador can be treated as a single homogeneous region and the generalized extreme value (GEV) was found to be more robust than any other frequency distributions. The drainage area (DA) is the only significant variable for estimating the index-flood at ungauged sites in Labrador. In previous studies, the Island of Newfoundland has been considered as four homogeneous regions (A,B,C and D) as well as two Water Survey of Canada's Y and Z sub-regions. Homogeneous regions based on Y and Z was found to provide more accurate quantile estimates than those based on four homogeneous regions. Goodness-of-fit test results showed that the generalized extreme value (GEV) distribution is most suitable for the sub-regions; however, the three-parameter lognormal (LN3) gave a better performance in terms of robustness. The best fitting regional frequency distribution from 2002 has now been updated with the latest flood data, but quantile estimates with the new data were not very different from the previous study. Overall, in terms of quantile estimation, in both Labrador and the Island of Newfoundland, the index-flood procedure based on L-moments is highly recommended as it provided consistent and more accurate result than other techniques such as the regression on quantile technique that is currently used by the government.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper presents calculations of semiparametric efficiency bounds for quantile treatment effects parameters when se1ection to treatment is based on observable characteristics. The paper also presents three estimation procedures forthese parameters, alI ofwhich have two steps: a nonparametric estimation and a computation ofthe difference between the solutions of two distinct minimization problems. Root-N consistency, asymptotic normality, and the achievement ofthe semiparametric efficiency bound is shown for one ofthe three estimators. In the final part ofthe paper, an empirical application to a job training program reveals the importance of heterogeneous treatment effects, showing that for this program the effects are concentrated in the upper quantiles ofthe earnings distribution.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Quantile regression (QR) was first introduced by Roger Koenker and Gilbert Bassett in 1978. It is robust to outliers which affect least squares estimator on a large scale in linear regression. Instead of modeling mean of the response, QR provides an alternative way to model the relationship between quantiles of the response and covariates. Therefore, QR can be widely used to solve problems in econometrics, environmental sciences and health sciences. Sample size is an important factor in the planning stage of experimental design and observational studies. In ordinary linear regression, sample size may be determined based on either precision analysis or power analysis with closed form formulas. There are also methods that calculate sample size based on precision analysis for QR like C.Jennen-Steinmetz and S.Wellek (2005). A method to estimate sample size for QR based on power analysis was proposed by Shao and Wang (2009). In this paper, a new method is proposed to calculate sample size based on power analysis under hypothesis test of covariate effects. Even though error distribution assumption is not necessary for QR analysis itself, researchers have to make assumptions of error distribution and covariate structure in the planning stage of a study to obtain a reasonable estimate of sample size. In this project, both parametric and nonparametric methods are provided to estimate error distribution. Since the method proposed can be implemented in R, user is able to choose either parametric distribution or nonparametric kernel density estimation for error distribution. User also needs to specify the covariate structure and effect size to carry out sample size and power calculation. The performance of the method proposed is further evaluated using numerical simulation. The results suggest that the sample sizes obtained from our method provide empirical powers that are closed to the nominal power level, for example, 80%.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A Work Project, presented as part of the requirements for the Award of a Masters Degree in Finance from the NOVA – School of Business and Economics

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper shows how recently developed regression-based methods for thedecomposition of health inequality can be extended to incorporateindividual heterogeneity in the responses of health to the explanatoryvariables. We illustrate our method with an application to the CanadianNPHS of 1994. Our strategy for the estimation of heterogeneous responsesis based on the quantile regression model. The results suggest that thereis an important degree of heterogeneity in the association of health toexplanatory variables which, in turn, accounts for a substantial percentageof inequality in observed health. A particularly interesting finding isthat the marginal response of health to income is zero for healthyindividuals but positive and significant for unhealthy individuals. Theheterogeneity in the income response reduces both overall health inequalityand income related health inequality.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Testing weather or not data belongs could been generated by a family of extreme value copulas is difficult. We generalize a test and we prove that it can be applied whatever the alternative hypothesis. We also study the effect of using different extreme value copulas in the context of risk estimation. To measure the risk we use a quantile. Our results have motivated by a bivariate sample of losses from a real database of auto insurance claims. Methods are implemented in R.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Department of Statistics, Cochin University of Science and Technology

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The estimation of labor supply elasticities has been an important issue m the economic literature. Yet all works have estimated conditional mean labor supply functions only. The objective of this paper is to obtain more information on labor supply, by estimating the conditional quantile labor supply function. vI/e use a sample of prime age urban males employees in Brazil. Two stage estimators are used as the net wage and virtual income are found to be endogenous to the model. Contrary to previous works using conditional mean estimators, it is found that labor supply elasticities vary significantly and asymmetrically across hours of work. vVhile the income and wage elasticities at the standard work week are zero, for those working longer hours the elasticities are negative.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of basepairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. The software is open source and implemented in the R package CRLMM available at Bioconductor (http:www.bioconductor.org).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-06