971 resultados para Bayesian Variable Selection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper develops methods for Stochastic Search Variable Selection (currently popular with regression and Vector Autoregressive models) for Vector Error Correction models where there are many possible restrictions on the cointegration space. We show how this allows the researcher to begin with a single unrestricted model and either do model selection or model averaging in an automatic and computationally efficient manner. We apply our methods to a large UK macroeconomic model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper develops stochastic search variable selection (SSVS) for zero-inflated count models which are commonly used in health economics. This allows for either model averaging or model selection in situations with many potential regressors. The proposed techniques are applied to a data set from Germany considering the demand for health care. A package for the free statistical software environment R is provided.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Road accidents are a very relevant issue in many countries and macroeconomic models are very frequently applied by academia and administrations to reduce their frequency and consequences. The selection of explanatory variables and response transformation parameter within the Bayesian framework for the selection of the set of explanatory variables a TIM and 3IM (two input and three input models) procedures are proposed. The procedure also uses the DIC and pseudo -R2 goodness of fit criteria. The model to which the methodology is applied is a dynamic regression model with Box-Cox transformation (BCT) for the explanatory variables and autorgressive (AR) structure for the response. The initial set of 22 explanatory variables are identified. The effects of these factors on the fatal accident frequency in Spain, during 2000-2012, are estimated. The dependent variable is constructed considering the stochastic trend component.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using the Bayesian approach as the model selection criteria, the main purpose in this study is to establish a practical road accident model that can provide a better interpretation and prediction performance. For this purpose we are using a structural explanatory model with autoregressive error term. The model estimation is carried out through Bayesian inference and the best model is selected based on the goodness of fit measures. To cross validate the model estimation further prediction analysis were done. As the road safety measures the number of fatal accidents in Spain, during 2000-2011 were employed. The results of the variable selection process show that the factors explaining fatal road accidents are mainly exposure, economic factors, and surveillance and legislative measures. The model selection shows that the impact of economic factors on fatal accidents during the period under study has been higher compared to surveillance and legislative measures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Negative-ion mode electrospray ionization, ESI(-), with Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was coupled to a Partial Least Squares (PLS) regression and variable selection methods to estimate the total acid number (TAN) of Brazilian crude oil samples. Generally, ESI(-)-FT-ICR mass spectra present a power of resolution of ca. 500,000 and a mass accuracy less than 1 ppm, producing a data matrix containing over 5700 variables per sample. These variables correspond to heteroatom-containing species detected as deprotonated molecules, [M - H](-) ions, which are identified primarily as naphthenic acids, phenols and carbazole analog species. The TAN values for all samples ranged from 0.06 to 3.61 mg of KOH g(-1). To facilitate the spectral interpretation, three methods of variable selection were studied: variable importance in the projection (VIP), interval partial least squares (iPLS) and elimination of uninformative variables (UVE). The UVE method seems to be more appropriate for selecting important variables, reducing the dimension of the variables to 183 and producing a root mean square error of prediction of 0.32 mg of KOH g(-1). By reducing the size of the data, it was possible to relate the selected variables with their corresponding molecular formulas, thus identifying the main chemical species responsible for the TAN values.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Causal inference methods - mainly path analysis and structural equation modeling - offer plant physiologists information about cause-and-effect relationships among plant traits. Recently, an unusual approach to causal inference through stepwise variable selection has been proposed and used in various works on plant physiology. The approach should not be considered correct from a biological point of view. Here, it is explained why stepwise variable selection should not be used for causal inference, and shown what strange conclusions can be drawn based upon the former analysis when one aims to interpret cause-and-effect relationships among plant traits.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Copyright © 2013 Springer Netherlands.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this paper is to predict time series of SO2 concentrations emitted by coal-fired power stations in order to estimate in advance emission episodes and analyze the influence of some meteorological variables in the prediction. An emission episode is said to occur when the series of bi-hourly means of SO2 is greater than a specific level. For coal-fired power stations it is essential to predict emission epi- sodes sufficiently in advance so appropriate preventive measures can be taken. We proposed a meth- odology to predict SO2 emission episodes based on using an additive model and an algorithm for variable selection. The methodology was applied to the estimation of SO2 emissions registered in sampling lo- cations near a coal-fired power station located in Northern Spain. The results obtained indicate a good performance of the model considering only two terms of the time series and that the inclusion of the meteorological variables in the model is not significant.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper considers Bayesian variable selection in regressions with a large number of possibly highly correlated macroeconomic predictors. I show that by acknowledging the correlation structure in the predictors can improve forecasts over existing popular Bayesian variable selection algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An input variable selection procedure is introduced for the identification and construction of multi-input multi-output (MIMO) neurofuzzy operating point dependent models. The algorithm is an extension of a forward modified Gram-Schmidt orthogonal least squares procedure for a linear model structure which is modified to accommodate nonlinear system modeling by incorporating piecewise locally linear model fitting. The proposed input nodes selection procedure effectively tackles the problem of the curse of dimensionality associated with lattice-based modeling algorithms such as radial basis function neurofuzzy networks, enabling the resulting neurofuzzy operating point dependent model to be widely applied in control and estimation. Some numerical examples are given to demonstrate the effectiveness of the proposed construction algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is concerned with the use of a genetic algorithm to select financial ratios for corporate distress classification models. For this purpose, the fitness value associated to a set of ratios is made to reflect the requirements of maximizing the amount of information available for the model and minimizing the collinearity between the model inputs. A case study involving 60 failed and continuing British firms in the period 1997-2000 is used for illustration. The classification model based on ratios selected by the genetic algorithm compares favorably with a model employing ratios usually found in the financial distress literature.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Este artigo apresenta uma aplicação do método para determinação espectrofotométrica simultânea dos íons divalentes de cobre, manganês e zinco à análise de medicamento polivitamínico/polimineral. O método usa 4-(2-piridilazo) resorcinol (PAR), calibração multivariada e técnicas de seleção de variáveis e foi otimizado o empregando-se o algoritmo das projeções sucessivas (APS) e o algoritmo genético (AG), para escolha dos comprimentos de onda mais informativos para a análise. Com essas técnicas, foi possível construir modelos de calibração por regressão linear múltipla (RLM-APS e RLM-AG). Os resultados obtidos foram comparados com modelos de regressão em componentes principais (PCR) e nos mínimos quadrados parciais (PLS). Demonstra-se a partir do erro médio quadrático de previsão (RMSEP) que os modelos apresentam desempenhos semelhantes ao prever as concentrações dos três analitos no medicamento. Todavia os modelos RLM são mais simples pois requerem um número muito menor de comprimentos de onda e são mais fáceis de interpretar que os baseados em variáveis latentes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A combinatorial protocol (CP) is introduced here to interface it with the multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is primarily based on the restriction of entry of correlated variables to the model development stage. It has been used for the analysis of Selwood et al data set [16], and the obtained models are compared with those reported from GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q2 value in the range of 0.632-0.518. Also, these models are divergent and unique. Even though, the present study does not share any models with GFA [8], and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. Also a simulation is carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.