208 results for attribute selection
in Cambridge University Engineering Department Publications Database
Abstract:
Variable selection for regression is a classical statistical problem, motivated by concerns that too large a number of covariates may bring about overfitting and unnecessarily high measurement costs. Novel difficulties arise in streaming contexts, where the correlation structure of the process may be drifting, in which case it must be constantly tracked so that selections may be revised accordingly. A particularly interesting phenomenon is that non-selected covariates become missing variables, inducing bias on subsequent decisions. This raises an intricate exploration-exploitation tradeoff, whose dependence on the covariance tracking algorithm and the choice of variable selection scheme is too complex to be dealt with analytically. We hence capitalise on the strength of simulations to explore this problem, taking the opportunity to tackle the difficult task of simulating dynamic correlation structures. © 2008 IEEE.
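The covariance tracking step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an exponentially weighted (forgetting-factor) estimate of the mean and covariance, and a simple rule that keeps the covariates most correlated with the response. The names `EWCovarianceTracker`, `select_top_k`, and the `forgetting` parameter are hypothetical. It also assumes every covariate is observed at every step, so it deliberately ignores the missing-variable bias the abstract highlights.

```python
import math


class EWCovarianceTracker:
    """Exponentially weighted mean and covariance of a vector stream.

    Illustrative assumption: a forgetting factor below 1 discounts old
    observations so that a drifting correlation structure can be followed.
    """

    def __init__(self, dim, forgetting=0.95):
        self.dim = dim
        self.forgetting = forgetting
        self.weight = 0.0  # discounted count of observations seen so far
        self.mean = [0.0] * dim
        self.cov = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        """Fold one observation (a list of length dim) into the estimates."""
        self.weight = self.forgetting * self.weight + 1.0
        step = 1.0 / self.weight
        delta = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + step * di for mi, di in zip(self.mean, delta)]
        for i in range(self.dim):
            for j in range(self.dim):
                self.cov[i][j] = (1.0 - step) * (
                    self.cov[i][j] + step * delta[i] * delta[j]
                )


def select_top_k(tracker, response, k):
    """Return the k covariate indices most correlated with the response."""
    var_y = tracker.cov[response][response]
    scores = []
    for j in range(tracker.dim):
        if j == response:
            continue
        denom = math.sqrt(tracker.cov[j][j] * var_y)
        corr = tracker.cov[j][response] / denom if denom > 0 else 0.0
        scores.append((abs(corr), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])
```

Calling `update` on each incoming vector (covariates followed by the response) and then `select_top_k` gives a selection that can be revised as the stream drifts. A smaller `forgetting` value reacts faster to drift at the cost of noisier estimates.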
Abstract:
Sensor networks can be naturally represented as graphical models, where the edge set encodes the presence of sparsity in the correlation structure between sensors. Such graphical representations can be valuable for information mining purposes as well as for optimizing bandwidth and battery usage with minimal loss of estimation accuracy. We use a computationally efficient technique for estimating sparse graphical models which fits a sparse linear regression locally at each node of the graph via the Lasso estimator. Using a recently suggested online, temporally adaptive implementation of the Lasso, we propose an algorithm for streaming graphical model selection over sensor networks. With battery consumption minimization applications in mind, we use this algorithm as the basis of an adaptive querying scheme. We discuss implementation issues in the context of environmental monitoring using sensor networks, where the objective is short-term forecasting of local wind direction. The algorithm is tested against real UK weather data and conclusions are drawn about certain tradeoffs inherent in decentralized sensor networks data analysis. © 2010 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.
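The idea of fitting a sparse regression locally at each node can be sketched as follows. This is a batch illustration only, not the online, temporally adaptive Lasso the abstract refers to. It assumes a plain coordinate-descent Lasso and an "or" rule for combining neighbourhoods (an edge is kept if either endpoint selects the other). The function names and the penalty parameter `alpha` are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t (the Lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + alpha*||b||_1 by coordinate descent.

    X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X @ beta
    col_sq = [sum(X[r][j] ** 2 for r in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[r][j] * resid[r] for r in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, alpha) / col_sq[j]
            change = new - beta[j]
            if change != 0.0:
                for r in range(n):
                    resid[r] -= change * X[r][j]
                beta[j] = new
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return beta


def neighbourhood_graph(rows, alpha):
    """Estimate an undirected edge set by regressing each node on the rest."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    edges = set()
    for node in range(p):
        others = [j for j in range(p) if j != node]
        X = [[r[j] for j in others] for r in centred]
        y = [r[node] for r in centred]
        beta = lasso_cd(X, y, alpha)
        for j, b in zip(others, beta):
            if b != 0.0:
                edges.add((min(node, j), max(node, j)))
    return edges
```

Each row of `rows` holds one simultaneous reading from every sensor. A larger `alpha` yields a sparser graph, which in a querying scheme would mean fewer neighbouring sensors consulted per forecast.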
Abstract:
We present a stochastic simulation technique for subset selection in time series models, based on the use of indicator variables with the Gibbs sampler within a hierarchical Bayesian framework. As an example, the method is applied to the selection of subset linear AR models, in which only significant lags are included. Joint sampling of the indicators and parameters is found to speed convergence. We discuss the possibility of model mixing where the model is not well determined by the data, and the extension of the approach to include non-linear model terms.
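The indicator-variable approach described in this abstract can be sketched for a subset AR model. This is a simplified illustration, not the authors' sampler: it assumes a known noise variance, independent zero-mean Gaussian priors on the coefficients, and a fixed prior inclusion probability, whereas the abstract describes a hierarchical framework. Each lag's indicator and coefficient are drawn jointly, with the coefficient integrated out when the indicator is sampled. The function name and all parameter names are hypothetical.

```python
import math
import random


def gibbs_subset_ar(series, max_lag, n_sweeps=500, burn_in=100,
                    noise_var=1.0, prior_var=10.0, prior_incl=0.5, seed=0):
    """Return the posterior inclusion frequency of each lag 1..max_lag."""
    rng = random.Random(seed)
    n = len(series)
    y = series[max_lag:]
    m = len(y)
    # cols[i][t] is the lag-(i+1) value aligned with y[t].
    cols = [[series[t - (i + 1)] for t in range(max_lag, n)]
            for i in range(max_lag)]
    col_sq = [sum(v * v for v in c) for c in cols]
    gamma = [0] * max_lag    # indicator: is the lag in the model?
    theta = [0.0] * max_lag  # coefficient (zero when excluded)
    resid = list(y)          # y minus the contribution of included lags
    counts = [0] * max_lag
    prior_log_odds = math.log(prior_incl / (1.0 - prior_incl))
    for sweep in range(n_sweeps):
        for i in range(max_lag):
            # Remove lag i from the fit so resid excludes its contribution.
            if gamma[i]:
                for t in range(m):
                    resid[t] += theta[i] * cols[i][t]
            # Conditional posterior of theta[i] given that it is included.
            post_var = 1.0 / (col_sq[i] / noise_var + 1.0 / prior_var)
            xr = sum(c * r for c, r in zip(cols[i], resid))
            post_mean = post_var * xr / noise_var
            # Log Bayes factor for inclusion with theta[i] integrated out.
            log_bf = (0.5 * math.log(post_var / prior_var)
                      + 0.5 * post_mean ** 2 / post_var)
            log_odds = prior_log_odds + log_bf
            if log_odds >= 0:
                prob = 1.0 / (1.0 + math.exp(-log_odds))
            else:
                e = math.exp(log_odds)
                prob = e / (1.0 + e)
            if rng.random() < prob:
                gamma[i] = 1
                theta[i] = rng.gauss(post_mean, math.sqrt(post_var))
                for t in range(m):
                    resid[t] -= theta[i] * cols[i][t]
            else:
                gamma[i] = 0
                theta[i] = 0.0
        if sweep >= burn_in:
            for i in range(max_lag):
                counts[i] += gamma[i]
    return [c / (n_sweeps - burn_in) for c in counts]
```

The returned frequencies estimate how often each lag appears in the sampled models. Values close to 1 suggest a lag the data support strongly, while intermediate values correspond to the situation the abstract calls a model that is not well determined by the data.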
Modelling and simulation techniques for supporting healthcare decision making: a selection framework