965 resultados para Variable selection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques, have sought to optimize predictive performance e.g. Elith et al., 2006. In general SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models Nix, 1986 or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al. 2002); or classification trees with complexity or model based pruning (Breiman et al., 1984, Zeileis, 2008). A third approach uses model averaging, to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees and Maximum Entropy as compared in Elith et al. 2006. Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al. 2010). However few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary 2008). Here we report an elicitation protocol that helps makes explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection, described above, Bayesian or otherwise. We demonstrate how this information can be obtained then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Variable selection for regression is a classical statistical problem, motivated by concerns that too large a number of covariates may bring about overfitting and unnecessarily high measurement costs. Novel difficulties arise in streaming contexts, where the correlation structure of the process may be drifting, in which case it must be constantly tracked so that selections may be revised accordingly. A particularly interesting phenomenon is that non-selected covariates become missing variables, inducing bias on subsequent decisions. This raises an intricate exploration-exploitation tradeoff, whose dependence on the covariance tracking algorithm and the choice of variable selection scheme is too complex to be dealt with analytically. We hence capitalise on the strength of simulations to explore this problem, taking the opportunity to tackle the difficult task of simulating dynamic correlation structures. © 2008 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In chemistry for chemical analysis of a multi-component sample or quantitative structure-activity/property relationship (QSAR/QSPR) studies, variable selection is a key step. In this study, comparisons between different methods were performed. These methods include three classical methods such as forward selection, backward elimination and stepwise regression; orthogonal descriptors; leaps-and-bounds regression and genetic algorithm. Thirty-five nitrobenzenes were taken as the data set. From these structures quantum chemical parameters, topological indices and indicator variable were extracted as the descriptors for the comparisons of variable selections. The interesting results have been obtained. (C) 2001 Elsevier Science B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Orthogonal descriptors is a viable method for variable selection, but this method strongly depend on the orthogonalisation ordering of the descriptors. In this paper, we compared the different methods used for order the descriptors. It showed that better results could be achieved with the use of backward elimination ordering. We predicted R-f value of phenol and aniline derivatives by this method, and compared it with classical algorithms such as forward selection, backward elimination, and stepwise procedure. Some interesting hints were obtained.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) The dimension of the covariate space is comparable, and often much larger, than the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information in to the model building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences. © 2010 American Statistical Association.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham's-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising aymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains. © Institute of Mathematical Statistics, 2010.