93 resultados para Limited dependent variable regression


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article is motivated by a lung cancer study where a regression model is involved and the response variable is too expensive to measure but the predictor variable can be measured easily with relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and environmental studies. In this article, by using the idea of ranked-set sampling (RSS), we develop sampling strategies that can reduce cost and increase efficiency of the regression analysis for the above-mentioned situation. The developed method is applied retrospectively to a lung cancer study. In the lung cancer study, the interest is to investigate the association between smoking status and three biomarkers: polyphenol DNA adducts, micronuclei, and sister chromatic exchanges. Optimal sampling schemes with different optimality criteria such as A-, D-, and integrated mean square error (IMSE)-optimality are considered in the application. With set size 10 in RSS, the improvement of the optimal schemes over simple random sampling (SRS) is great. For instance, by using the optimal scheme with IMSE-optimality, the IMSEs of the estimated regression functions for the three biomarkers are reduced to about half of those incurred by using SRS.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques, have sought to optimize predictive performance e.g. Elith et al., 2006. In general SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models Nix, 1986 or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al. 2002); or classification trees with complexity or model based pruning (Breiman et al., 1984, Zeileis, 2008). A third approach uses model averaging, to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees and Maximum Entropy as compared in Elith et al. 2006. Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al. 2010). However few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary 2008). Here we report an elicitation protocol that helps makes explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection, described above, Bayesian or otherwise. We demonstrate how this information can be obtained then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Data-driven approaches such as Gaussian Process (GP) regression have been used extensively in recent robotics literature to achieve estimation by learning from experience. To ensure satisfactory performance, in most cases, multiple learning inputs are required. Intuitively, adding new inputs can often contribute to better estimation accuracy, however, it may come at the cost of a new sensor, larger training dataset and/or more complex learning, some- times for limited benefits. Therefore, it is crucial to have a systematic procedure to determine the actual impact each input has on the estimation performance. To address this issue, in this paper we propose to analyse the impact of each input on the estimate using a variance-based sensitivity analysis method. We propose an approach built on Analysis of Variance (ANOVA) decomposition, which can characterise how the prediction changes as one or more of the input changes, and also quantify the prediction uncertainty as attributed from each of the inputs in the framework of dependent inputs. We apply the proposed approach to a terrain-traversability estimation method we proposed in prior work, which is based on multi-task GP regression, and we validate this implementation experimentally using a rover on a Mars-analogue terrain.