231 results for Quantile autoregression
Abstract:
For non-negative random variables with finite means we introduce an analogue of the equilibrium residual-lifetime distribution based on the quantile function. This allows us to construct new distributions with support (0, 1), and to obtain a new quantile-based version of the probabilistic generalization of Taylor's theorem. Similarly, for pairs of stochastically ordered random variables we arrive at a new quantile-based form of the probabilistic mean value theorem. The latter involves a distribution that generalizes the Lorenz curve. We investigate the special case of proportional quantile functions and apply the given results to various models based on classes of distributions and measures of risk theory. Motivated by some stochastic comparisons, we also introduce the “expected reversed proportional shortfall order”, and a new characterization of random lifetimes involving the reversed hazard rate function.
Abstract:
Quantile computation has many applications, including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query (φ, ε), the data item at rank ⌈φN⌉ may be approximately obtained within the rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times clusters are reprocessed and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
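The precision requirement in the abstract above can be made concrete with a small check. The following is a minimal sketch, not the paper's summary structure or clustering algorithm: it only tests whether a returned item answers a (φ, ε) query within rank error εN on a fully stored data set. The function name `is_eps_approximate` is hypothetical, and the target rank is assumed to be the ceiling of φN.

```python
import math
from bisect import bisect_left, bisect_right


def is_eps_approximate(data, answer, phi, eps):
    """Check whether `answer` is an eps-approximate phi-quantile of `data`.

    Assumption: the target rank is ceil(phi * N) with 1-based ranks, and
    an item that occurs several times may claim any of its ranks.
    """
    s = sorted(data)
    n = len(s)
    target = math.ceil(phi * n)
    lo = bisect_left(s, answer) + 1   # smallest 1-based rank of `answer`
    hi = bisect_right(s, answer)      # largest 1-based rank of `answer`
    if hi < lo:
        return False                  # `answer` does not occur in `data`
    # Distance from the target rank to the closest rank held by `answer`.
    dist = max(lo - target, target - hi, 0)
    return dist <= eps * n


if __name__ == "__main__":
    items = list(range(1, 101))
    print(is_eps_approximate(items, 53, phi=0.5, eps=0.05))
    print(is_eps_approximate(items, 56, phi=0.5, eps=0.05))
```

A streaming summary would have to give the same guarantee without storing all N items, which is the part this sketch leaves out.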
Abstract:
In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. Sliding windows in their natural form are defined as the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time-constrained and filter-based sliding windows. Our algorithm makes one pass over the data stream and maintains an ε-approximate summary. It uses O((1/ε²) log²(εN)) space, where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding window queries and prove that our technique is applicable to flexible window settings. Our performance study indicates that the space required in practice is much less than the given theoretical bound and that the algorithm supports high-speed data streams.
Abstract:
Direct quantile regression involves estimating a given quantile of a response variable as a function of input variables. We present a new framework for direct quantile regression where a Gaussian process model is learned, minimising the expected tilted loss function. The integration required in learning is not analytically tractable so, to speed up the learning, we employ the Expectation Propagation algorithm. We describe how this work relates to other quantile regression methods and apply the method to both synthetic and real data sets. The method is shown to be competitive with state-of-the-art methods whilst allowing for the leverage of the full Gaussian process probabilistic framework.
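The tilted loss mentioned above is commonly written as max(τu, (τ − 1)u) for a residual u = y − q at quantile level τ. The sketch below is an illustration of that loss only, assuming this standard form; it does not implement the Gaussian process model or Expectation Propagation, and the names `tilted_loss` and `best_constant` are hypothetical. It shows that, among the observed values, the one minimising the average tilted loss at τ = 0.5 is the sample median.

```python
def tilted_loss(y, q, tau):
    """Average tilted (pinball) loss of the constant prediction q at level tau."""
    total = 0.0
    for yi in y:
        u = yi - q
        # Under-prediction (u > 0) costs tau * u; over-prediction costs (1 - tau) * |u|.
        total += max(tau * u, (tau - 1.0) * u)
    return total / len(y)


def best_constant(y, tau):
    """Return the observed value with the smallest average tilted loss."""
    return min(y, key=lambda c: tilted_loss(y, c, tau))


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 100]
    print(best_constant(sample, 0.5))
```

Because the loss grows only linearly in the residual, the outlier 100 does not pull the minimiser away from the median, unlike a squared-error fit.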
Abstract:
Peer reviewed
Abstract:
Quantile regression (QR) was first introduced by Roger Koenker and Gilbert Bassett in 1978. It is robust to outliers, which can strongly affect the least squares estimator in linear regression. Instead of modeling the mean of the response, QR provides an alternative way to model the relationship between quantiles of the response and covariates. Therefore, QR can be widely used to solve problems in econometrics, environmental sciences and health sciences. Sample size is an important factor in the planning stage of experimental designs and observational studies. In ordinary linear regression, sample size may be determined based on either precision analysis or power analysis with closed-form formulas. There are also methods that calculate sample size based on precision analysis for QR, such as that of C. Jennen-Steinmetz and S. Wellek (2005). A method to estimate sample size for QR based on power analysis was proposed by Shao and Wang (2009). In this paper, a new method is proposed to calculate sample size based on power analysis under a hypothesis test of covariate effects. Even though an error distribution assumption is not necessary for QR analysis itself, researchers have to make assumptions about the error distribution and covariate structure in the planning stage of a study to obtain a reasonable estimate of sample size. In this project, both parametric and nonparametric methods are provided to estimate the error distribution. Since the proposed method can be implemented in R, the user is able to choose either a parametric distribution or nonparametric kernel density estimation for the error distribution. The user also needs to specify the covariate structure and effect size to carry out the sample size and power calculation. The performance of the proposed method is further evaluated using numerical simulation. The results suggest that the sample sizes obtained from our method provide empirical powers that are close to the nominal power level, for example, 80%.
Abstract:
The Bahadur representation and its applications have attracted a large number of publications and presentations on a wide variety of problems. Mixing dependence is weak enough to describe the dependence structure of random variables, including observations in time series and longitudinal studies. This note proves the Bahadur representation of sample quantiles for strongly mixing random variables (including ρ-mixing and φ-mixing) under very weak conditions on the mixing coefficients. As an application, asymptotic normality is derived. These results greatly improve those recently reported in the literature.
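For orientation, the general shape of a Bahadur representation can be written as follows. This is a sketch of the standard form, not the statement proved in the note: the symbols $\xi_p$ (the $p$-th quantile), $\hat{\xi}_{p,n}$ (the sample quantile), $F_n$ (the empirical distribution function), $f$ (the density of $F$) and $R_n$ (the remainder) are notation assumed here, and the rate of $R_n$ under mixing is not given in the abstract.

```latex
\hat{\xi}_{p,n} \;=\; \xi_p \;+\; \frac{p - F_n(\xi_p)}{f(\xi_p)} \;+\; R_n ,
\qquad f(\xi_p) > 0 ,
```

with $R_n$ of smaller order than $n^{-1/2}$. The sample quantile then behaves like a centred average of indicators $\mathbf{1}\{X_i \le \xi_p\}$, so a central limit theorem for those indicators gives asymptotic normality. In the independent and identically distributed case this reads

```latex
\sqrt{n}\,\bigl(\hat{\xi}_{p,n} - \xi_p\bigr)
\;\xrightarrow{d}\;
N\!\left(0,\; \frac{p(1-p)}{f(\xi_p)^2}\right),
```

whereas under dependence the numerator $p(1-p)$ is replaced by a long-run variance that also accounts for the covariances between the indicators.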
Abstract:
Tourist accommodation expenditure is a widely investigated topic as it represents a major contribution to total tourist expenditure. The identification of the determinant factors is commonly based on supply-driven applications, while little research has been done on important travel characteristics. This paper proposes a demand-driven analysis of tourist accommodation price by focusing on data generated from room bookings. The investigation focuses on modeling the relationship between key travel characteristics and the price paid to book the accommodation. To accommodate the distributional characteristics of the expenditure variable, the analysis is based on the estimation of a quantile regression model. The findings support the econometric approach used and enable the elaboration of relevant managerial implications.
Abstract:
The phenomenon that married men earn higher average wages than unmarried men, the so-called marriage premium, is well known. However, the robustness of the marriage premium across the wage distribution and the underlying causes of the marriage premium deserve closer scrutiny. Focusing on the entire wage distribution and employing recently developed semi-nonparametric tests for quantile treatment effects, our findings cast doubt on the robustness of the premium. We find that the premium is explained by selection above the median, whereas a positive premium is obtained only at very low wages. We argue that the causal effect at low wages is probably attributable to employer discrimination.
Abstract:
Inquiries into return predictability are traditionally limited to the conditional mean, while the literature on portfolio selection is replete with moment-based analysis, with up to the fourth moment being considered. This paper develops a distribution-based framework for both return prediction and portfolio selection. More specifically, a time-varying return distribution is modeled through quantile regressions and copulas, using quantile regressions to extract information in the marginal distributions and copulas to capture the dependence structure. A preference function which captures higher moments is proposed for portfolio selection. An empirical application highlights the additional information provided by the distributional approach, which cannot be captured by the traditional moment-based methods.
Abstract:
Exposure to air pollution during pregnancy is a potential cause of adverse birth outcomes such as preterm birth and stillbirth. The risk of exposure may be greater during vulnerable windows of the pregnancy, which might only be weeks long. We demonstrate a method to find these windows based on smoothing the risk of weekly exposure using conditional autoregression. We use incidence density sampling to match cases with adverse birth outcomes to controls whose gestation lasted at least as long as the case. This matching means that cases and controls have equal-length exposure periods, rather than comparing, for example, cases with short gestations to controls with longer gestations. We demonstrate the ability of the method to find vulnerable windows using two simulation studies. We illustrate the method by examining the association between particulate matter air pollution and stillbirth in Brisbane, Australia.
Abstract:
Indirect inference (II) is a methodology for estimating the parameters of an intractable (generative) model on the basis of an alternative parametric (auxiliary) model that is both analytically and computationally easier to deal with. Such an approach has been well explored in the classical literature but has received substantially less attention in the Bayesian paradigm. The purpose of this paper is to compare and contrast a collection of what we call parametric Bayesian indirect inference (pBII) methods. One class of pBII methods uses approximate Bayesian computation (referred to here as ABC II) where the summary statistic is formed on the basis of the auxiliary model, using ideas from II. Another approach proposed in the literature, referred to here as parametric Bayesian indirect likelihood (pBIL), we show to be a fundamentally different approach to ABC II. We devise new theoretical results for pBIL to give extra insights into its behaviour and also its differences with ABC II. Furthermore, we examine in more detail the assumptions required to use each pBII method. The results, insights and comparisons developed in this paper are illustrated on simple examples and two other substantive applications. The first of the substantive examples involves performing inference for complex quantile distributions based on simulated data while the second is for estimating the parameters of a trivariate stochastic process describing the evolution of macroparasites within a host based on real data. We create a novel framework called Bayesian indirect likelihood (BIL) which encompasses pBII as well as general ABC methods so that the connections between the methods can be established.
Abstract:
This thesis has contributed to the advancement of knowledge in disease modelling by addressing interesting and crucial issues relevant to modelling health data over space and time. The research has led to an increased understanding of spatial scales, temporal scales, and spatial smoothing for modelling diseases, in terms of their methodology and applications. This research is of particular significance to researchers seeking to employ statistical modelling techniques over space and time in various disciplines. A broad class of statistical models is employed to assess what impact spatial and temporal scales have on simulated and real data.
Abstract:
Ecological studies, which are based on characteristics of groups of individuals, are common in various disciplines including epidemiology. It is of great interest for epidemiologists to study the geographical variation of a disease by accounting for the positive spatial dependence between neighbouring areas. However, the choice of scale of the spatial correlation requires much attention. In view of a lack of studies in this area, this study aims to investigate the impact of differing definitions of geographical scales using a multilevel model. We propose a new approach, grid-based partitioning, and compare it with the popular census region approach. Unexplained geographical variation is accounted for via area-specific unstructured random effects and spatially structured random effects specified as an intrinsic conditional autoregressive process. Using grid-based modelling of random effects in contrast to the census region approach, we illustrate conditions where improvements are observed in the estimation of the linear predictor, random effects, parameters, and the identification of the distribution of residual risk and the aggregate risk in a study region. The study has found that grid-based modelling is a valuable approach for spatially sparse data, while the SLA-based and grid-based approaches perform equally well for spatially dense data.