47 resultados para quantile regression
Resumo:
Direct quantile regression involves estimating a given quantile of a response variable as a function of input variables. We present a new framework for direct quantile regression where a Gaussian process model is learned, minimising the expected tilted loss function. The integration required in learning is not analytically tractable so to speed up the learning we employ the Expectation Propagation algorithm. We describe how this work relates to other quantile regression methods and apply the method on both synthetic and real data sets. The method is shown to be competitive with state of the art methods whilst allowing for the leverage of the full Gaussian process probabilistic framework.
Resumo:
Using panel data for 52 developed and developing countries over the period 1998-2006, this article examines the links between information and communication technology diffusion and human development. We conducted a panel regression analysis of the investments per capita in healthcare, education and information and communication technology against human development index scores. Using a quantile regression approach, our findings suggest that changes in healthcare, education and information and communication technology provision have a stronger impact on human development index scores for less developed than for highly developed countries. Furthermore, at lower levels of development education fosters development directly and also indirectly through their enhanced effects on ICT. At higher levels of development education has only an indirect effect on development through the return to ICT.
Resumo:
This is the first study to provide comprehensive analyses of the relative performance of both socially responsible investment (SRI) and Islamic mutual funds. The analysis proceeds in two stages. In the first, the performance of the two categories of funds is measured using partial frontier methods. In the second stage, we use quantile regression techniques.By combining two variants of the Free Disposal Hull (FDH) methods (order-m and order-?) in the first stage of analysis and quantile regression in the second stage, we provide detailed analyses of the impact of different covariates across methods and across different quantiles. In spite of the differences in the screening criteria and portfolio management of both types of funds, variation in the performance is only found for some of the quantiles of the conditional distribution of mutual fund performance. We established that for the most inefficient funds the superior performance of SRI funds is significant. In contrast, for the best mutual funds this evidence vanished and even Islamic funds perform better than SRI.These results show the benefits of performing the analysis using quantile regression.
Resumo:
This is the first study to provide comprehensive analyses of the relative performance of both socially responsible investment (SRI) and Islamic mutual funds. The analysis proceeds in two stages. In the first, the performance of the two categories of funds is measured using partial frontier methods. In the second stage, we use quantile regression techniques. By combining two variants of the Free Disposal Hull (FDH) methods (order- m and order- α) in the first stage of analysis and quantile regression in the second stage, we provide detailed analyses of the impact of different covariates across methods and across different quantiles. In spite of the differences in the screening criteria and portfolio management of both types of funds, variation in the performance is only found for some of the quantiles of the conditional distribution of mutual fund performance. We established that for the most inefficient funds the superior performance of SRI funds is significant. In contrast, for the best mutual funds this evidence vanished and even Islamic funds perform better than SRI. These results show the benefits of performing the analysis using quantile regression. © 2013 Elsevier B.V.
The Long-Term impact of Business Support? - Exploring the Role of Evaluation Timing using Micro Data
Resumo:
The original contribution of this work is threefold. Firstly, this thesis develops a critical perspective on current evaluation practice of business support, with focus on the timing of evaluation. The general time frame applied for business support policy evaluation is limited to one to two, seldom three years post intervention. This is despite calls for long-term impact studies by various authors, concerned about time lags before effects are fully realised. This desire for long-term evaluation opposes the requirements by policy-makers and funders, seeking quick results. Also, current ‘best practice’ frameworks do not refer to timing or its implications, and data availability affects the ability to undertake long-term evaluation. Secondly, this thesis provides methodological value for follow-up and similar studies by using data linking of scheme-beneficiary data with official performance datasets. Thus data availability problems are avoided through the use of secondary data. Thirdly, this thesis builds the evidence, through the application of a longitudinal impact study of small business support in England, covering seven years of post intervention data. This illustrates the variability of results for different evaluation periods, and the value in using multiple years of data for a robust understanding of support impact. For survival, impact of assistance is found to be immediate, but limited. Concerning growth, significant impact centres on a two to three year period post intervention for the linear selection and quantile regression models – positive for employment and turnover, negative for productivity. Attribution of impact may present a problem for subsequent periods. The results clearly support the argument for the use of longitudinal data and analysis, and a greater appreciation by evaluators of the factor time. This analysis recommends a time frame of four to five years post intervention for soft business support evaluation.
The long-term impact of business support? - Exploring the role of evaluation timing using micro data
Resumo:
The original contribution of this work is threefold. Firstly, this thesis develops a critical perspective on current evaluation practice of business support, with focus on the timing of evaluation. The general time frame applied for business support policy evaluation is limited to one to two, seldom three years post intervention. This is despite calls for long-term impact studies by various authors, concerned about time lags before effects are fully realised. This desire for long-term evaluation opposes the requirements by policy-makers and funders, seeking quick results. Also, current ‘best practice’ frameworks do not refer to timing or its implications, and data availability affects the ability to undertake long-term evaluation. Secondly, this thesis provides methodological value for follow-up and similar studies by using data linking of scheme-beneficiary data with official performance datasets. Thus data availability problems are avoided through the use of secondary data. Thirdly, this thesis builds the evidence, through the application of a longitudinal impact study of small business support in England, covering seven years of post intervention data. This illustrates the variability of results for different evaluation periods, and the value in using multiple years of data for a robust understanding of support impact. For survival, impact of assistance is found to be immediate, but limited. Concerning growth, significant impact centres on a two to three year period post intervention for the linear selection and quantile regression models – positive for employment and turnover, negative for productivity. Attribution of impact may present a problem for subsequent periods. The results clearly support the argument for the use of longitudinal data and analysis, and a greater appreciation by evaluators of the factor time. This analysis recommends a time frame of four to five years post intervention for soft business support evaluation.
Resumo:
In the Bayesian framework, predictions for a regression problem are expressed in terms of a distribution of output values. The mode of this distribution corresponds to the most probable output, while the uncertainty associated with the predictions can conveniently be expressed in terms of error bars. In this paper we consider the evaluation of error bars in the context of the class of generalized linear regression models. We provide insights into the dependence of the error bars on the location of the data points and we derive an upper bound on the true error bars in terms of the contributions from individual data points which are themselves easily evaluated.
Resumo:
We propose a Bayesian framework for regression problems, which covers areas which are usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of over-fitting. In the infinite dimension limit it approaches the true Bayesian posterior. The issues of prior selection and over-fitting are also discussed, showing that some of the commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular publicly available data sets are used to demonstrate the method and highlight important issues concerning the choice of priors.
Resumo:
The Bayesian analysis of neural networks is difficult because the prior over functions has a complex form, leading to implementations that either make approximations or use Monte Carlo integration techniques. In this paper I investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis to be carried out exactly using matrix operations. The method has been tested on two challenging problems and has produced excellent results.
Resumo:
The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations. Two methods, using optimization and averaging (via Hybrid Monte Carlo) over hyperparameters have been tested on a number of challenging problems and have produced excellent results.
Resumo:
Solving many scientific problems requires effective regression and/or classification models for large high-dimensional datasets. Experts from these problem domains (e.g. biologists, chemists, financial analysts) have insights into the domain which can be helpful in developing powerful models but they need a modelling framework that helps them to use these insights. Data visualisation is an effective technique for presenting data and requiring feedback from the experts. A single global regression model can rarely capture the full behavioural variability of a huge multi-dimensional dataset. Instead, local regression models, each focused on a separate area of input space, often work better since the behaviour of different areas may vary. Classical local models such as Mixture of Experts segment the input space automatically, which is not always effective and it also lacks involvement of the domain experts to guide a meaningful segmentation of the input space. In this paper we addresses this issue by allowing domain experts to interactively segment the input space using data visualisation. The segmentation output obtained is then further used to develop effective local regression models.
Resumo:
Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.
Resumo:
In most treatments of the regression problem it is assumed that the distribution of target data can be described by a deterministic function of the inputs, together with additive Gaussian noise having constant variance. The use of maximum likelihood to train such models then corresponds to the minimization of a sum-of-squares error function. In many applications a more realistic model would allow the noise variance itself to depend on the input variables. However, the use of maximum likelihood to train such models would give highly biased results. In this paper we show how a Bayesian treatment can allow for an input-dependent variance while overcoming the bias of maximum likelihood.
Resumo:
The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.
Resumo:
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.