978 results for VARIABLE SELECTION


Relevance:

60.00% 60.00%

Abstract:

This paper is motivated by the recent interest in the use of Bayesian VARs for forecasting, even in cases where the number of dependent variables is large. In such cases, factor methods have been traditionally used but recent work using a particular prior suggests that Bayesian VAR methods can forecast better. In this paper, we consider a range of alternative priors which have been used with small VARs, discuss the issues which arise when they are used with medium and large VARs and examine their forecast performance using a US macroeconomic data set containing 168 variables. We find that Bayesian VARs do tend to forecast better than factor methods and provide an extensive comparison of the strengths and weaknesses of various approaches. Our empirical results show the importance of using forecast metrics which use the entire predictive density, instead of using only point forecasts.

Relevance:

60.00% 60.00%

Abstract:

This paper considers Bayesian variable selection in regressions with a large number of possibly highly correlated macroeconomic predictors. I show that acknowledging the correlation structure in the predictors can improve forecasts over existing popular Bayesian variable selection algorithms.

Relevance:

60.00% 60.00%

Abstract:

We develop methods for Bayesian model averaging (BMA) or selection (BMS) in Panel Vector Autoregressions (PVARs). Our approach allows us to select between or average over all possible combinations of restricted PVARs where the restrictions involve interdependencies between and heterogeneities across cross-sectional units. The resulting BMA framework can find a parsimonious PVAR specification, thus dealing with overparameterization concerns. We use these methods in an application involving the euro area sovereign debt crisis and show that our methods perform better than alternatives. Our findings contradict a simple view of the sovereign debt crisis which divides the euro zone into groups of core and peripheral countries and worries about financial contagion within the latter group.

Relevance:

60.00% 60.00%

Abstract:

Vector Autoregressive Moving Average (VARMA) models have many theoretical properties which should make them popular among empirical macroeconomists. However, they are rarely used in practice due to over-parameterization concerns, difficulties in ensuring identification and computational challenges. With the growing interest in multivariate time series models of high dimension, these problems with VARMAs become even more acute, accounting for the dominance of VARs in this field. In this paper, we develop a Bayesian approach for inference in VARMAs which surmounts these problems. It jointly ensures identification and parsimony in the context of an efficient Markov chain Monte Carlo (MCMC) algorithm. We use this approach in a macroeconomic application involving up to twelve dependent variables. We find our algorithm to work successfully and provide insights beyond those provided by VARs.

Relevance:

60.00% 60.00%

Abstract:

The aim of this work is to evaluate the capabilities and limitations of chemometric methods and other mathematical treatments applied to spectroscopic data, and more specifically to paint samples. The uniqueness of the spectroscopic data comes from the fact that they are multivariate (a few thousand variables) and highly correlated. Statistical methods are used to study and discriminate samples. A collection of 34 red paint samples was measured by Infrared and Raman spectroscopy. Data pretreatment and variable selection demonstrated that the use of Standard Normal Variate (SNV), together with removal of the noisy variables by a selection of the wavenumbers from 650 to 1830 cm−1 and 2730 to 3600 cm−1, provided the optimal results for infrared analysis. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were then used as exploratory techniques to provide evidence of structure in the data, cluster, or detect outliers. With the FTIR spectra, the Principal Components (PCs) correspond to binder types and the presence/absence of calcium carbonate. 83% of the total variance is explained by the first four PCs. As for the Raman spectra, we observe six different clusters corresponding to the different pigment compositions when plotting the first two PCs, which account for 37% and 20% respectively of the total variance. In conclusion, the use of chemometrics for the forensic analysis of paints provides a valuable tool for objective decision-making, a reduction of the possible classification errors, and better efficiency, giving robust results with time-saving data treatments.
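The pretreatment and exploratory steps named in this abstract (wavenumber selection, SNV, PCA) can be illustrated with a minimal NumPy sketch. It is not the authors' code: the spectra are random stand-ins for the 34 samples, the order of the two pretreatment steps is an assumption, and the function names `snv` and `pca` are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def pca(data, n_components):
    """PCA via SVD of the column-centred data; returns scores and explained-variance ratios."""
    centred = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Synthetic stand-in for 34 infrared spectra (assumption: real data would be loaded here).
rng = np.random.default_rng(0)
wavenumbers = np.linspace(650, 3600, 300)
spectra = rng.normal(size=(34, wavenumbers.size)) + 5.0

# Keep only the two wavenumber windows quoted in the abstract.
keep = ((wavenumbers >= 650) & (wavenumbers <= 1830)) | \
       ((wavenumbers >= 2730) & (wavenumbers <= 3600))
processed = snv(spectra[:, keep])

scores, explained = pca(processed, n_components=4)
print("variance explained by the first four PCs:", explained.sum())
```

On real spectra the `scores` columns would be plotted against each other to look for clusters; on the random data used here they carry no structure.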

Relevance:

60.00% 60.00%

Abstract:

OBJECTIVES: The aim of the study was to assess whether prospective follow-up data within the Swiss HIV Cohort Study can be used to predict patients who stop smoking; or among smokers who stop, those who start smoking again. METHODS: We built prediction models first using clinical reasoning ('clinical models') and then by selecting from numerous candidate predictors using advanced statistical methods ('statistical models'). Our clinical models were based on literature that suggests that motivation drives smoking cessation, while dependence drives relapse in those attempting to stop. Our statistical models were based on automatic variable selection using additive logistic regression with component-wise gradient boosting. RESULTS: Of 4833 smokers, 26% stopped smoking, at least temporarily; among those who stopped, 48% started smoking again. The predictive performance of our clinical and statistical models was modest. A basic clinical model for cessation, with patients classified into three motivational groups, was nearly as discriminatory as a constrained statistical model with just the most important predictors (the ratio of nonsmoking visits to total visits, alcohol or drug dependence, psychiatric comorbidities, recent hospitalization and age). A basic clinical model for relapse, based on the maximum number of cigarettes per day prior to stopping, was not as discriminatory as a constrained statistical model with just the ratio of nonsmoking visits to total visits. CONCLUSIONS: Predicting smoking cessation and relapse is difficult, so that simple models are nearly as discriminatory as complex ones. Patients with a history of attempting to stop and those known to have stopped recently are the best candidates for an intervention.
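To show how component-wise gradient boosting performs variable selection, here is a minimal sketch under simplifying assumptions: the base learners are single-predictor linear fits rather than the additive (smooth) terms the study may have used, the data are simulated, and the function name `componentwise_logit_boost` is hypothetical. At each step only the one predictor that best fits the current gradient is updated, so predictors that are never chosen keep a coefficient of exactly zero.

```python
import numpy as np

def componentwise_logit_boost(X, y, n_iter=200, step=0.1):
    """Component-wise boosting for logistic loss with linear base learners.

    Returns the intercept and coefficients on the column-centred predictors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each predictor
    col_ss = (Xc**2).sum(axis=0)            # sum of squares per predictor
    p_bar = y.mean()
    intercept = np.log(p_bar / (1.0 - p_bar))
    coef = np.zeros(X.shape[1])
    score = np.full(X.shape[0], intercept)  # current linear predictor
    for _ in range(n_iter):
        # Negative gradient of the log-loss with respect to the score.
        residual = y - 1.0 / (1.0 + np.exp(-score))
        # Least-squares slope of the residual on each predictor separately.
        slopes = Xc.T @ residual / col_ss
        # The best single predictor gives the largest drop in squared error.
        best = int(np.argmax(slopes**2 * col_ss))
        coef[best] += step * slopes[best]
        score += step * slopes[best] * Xc[:, best]
    return intercept, coef

# Simulated data (assumption): only the first of six predictors matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
prob = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))
y = (rng.random(500) < prob).astype(float)

intercept, coef = componentwise_logit_boost(X, y)
print("coefficients:", np.round(coef, 3))
```

Stopping early (a small `n_iter`) keeps the model sparse; in practice the number of iterations would be chosen by cross-validation.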

Relevance:

60.00% 60.00%

Abstract:

BACKGROUND: A central question for understanding the evolutionary responses of plant species to rapidly changing environments is the assessment of their potential for short-term (in one or a few generations) genetic change. In our study, we consider the case of Pinus pinaster Aiton (maritime pine), a widespread Mediterranean tree, and (i) test, under different experimental conditions (growth chamber and semi-natural), whether higher recruitment in the wild from the most successful mothers is due to better performance of their offspring; and (ii) evaluate genetic change in quantitative traits across generations at two different life stages (mature trees and seedlings) that are known to be under strong selection pressure in forest trees. RESULTS: Genetic control was high for most traits (h² = 0.137-0.876) under the milder conditions of the growth chamber, but only for ontogenetic change (0.276), total height (0.415) and survival (0.719) under the more stressful semi-natural conditions. Significant phenotypic selection gradients were found in mature trees for traits related to seed quality (germination rate and number of empty seeds). Moreover, female relative reproductive success was significantly correlated with offspring performance for specific leaf area (SLA) in the growth chamber experiment, and stem mass fraction (SMF) in the experiment under semi-natural conditions, two adaptive traits related to abiotic stress-response in pines. Selection gradients based on genetic covariance of seedling traits and responses to selection at this stage involved traits related to biomass allocation (SMF) and growth (as decomposed by a Gompertz model) or delayed ontogenetic change, depending also on the testing environment. CONCLUSIONS: Despite the evidence of microevolutionary change in adaptive traits in maritime pine, directional or disruptive changes are difficult to predict due to variable selection at different life stages and environments. At mature-tree stages, higher female effective reproductive success can be explained by differences in their production of offspring (due to seed quality) and, to a lesser extent, by seemingly better adapted seedlings. Selection gradients and responses to selection for seedlings also differed across experimental conditions. The distinct processes involved at the two life stages (mature trees or seedlings) together with environment-specific responses advise caution when predicting likely evolutionary responses to environmental change in Mediterranean forest trees.

Relevance:

60.00% 60.00%

Abstract:

Geographical information systems (GIS) are tools that have been recently tested for improving our understanding of the spatial distribution of disease. The objective of this paper was to further develop the GIS technology to model and control schistosomiasis using environmental, social, biological and remote-sensing variables. A final regression model (R² = 0.39) was established, after a variable selection phase, with a set of spatial variables including the presence or absence of Biomphalaria glabrata, winter enhanced vegetation index, summer minimum temperature and percentage of houses with water coming from a spring or well. A regional model was also developed by splitting the state of Minas Gerais (MG) into four regions and establishing a linear regression model for each of the four regions: 1 (R² = 0.97), 2 (R² = 0.60), 3 (R² = 0.63) and 4 (R² = 0.76). Based on these models, a schistosomiasis risk map was built for MG. In this paper, geostatistics was also used to make inferences about the presence of Biomphalaria spp. The result was a map of species and risk areas. The obtained risk map permits the association of uncertainties, which can be used to qualify the inferences and it can be thought of as an auxiliary tool for public health strategies.

Relevance:

60.00% 60.00%

Abstract:

Over the last 30 years, the proliferation of quantitative models for predicting corporate insolvency in the accounting and finance literature has attracted great interest among specialists and researchers in the field. What began as models built for a single purpose has become a constant source of research. This paper formulates an insolvency prediction model by combining different quantitative variables drawn from the financial statements of a sample of firms for the years 1994-1997. A stepwise procedure is used to select and interpret the variables that contribute the most information. Once this first type of model has been formulated, an alternative to the preceding variables is sought using the factor technique of principal component analysis. It is used to select variables, and univariate analysis is applied to them together with the preceding ratios. Finally, the resulting models are compared, and it is concluded that although the previous literature reports better classification rates, the models obtained through principal component analysis should not be rejected, given the clarity with which they explain the causes that lead a firm to insolvency.

Relevance:

60.00% 60.00%

Abstract:

Species distribution modelling is central to both fundamental and applied research in biogeography. Despite widespread use of models, there are still important conceptual ambiguities as well as biotic and algorithmic uncertainties that need to be investigated in order to increase confidence in model results. We identify and discuss five areas of enquiry that are of high importance for species distribution modelling: (1) clarification of the niche concept; (2) improved designs for sampling data for building models; (3) improved parameterization; (4) improved model selection and predictor contribution; and (5) improved model evaluation. The challenges discussed in this essay do not preclude the need for developments of other areas of research in this field. However, they are critical for allowing the science of species distribution modelling to move forward.

Relevance:

60.00% 60.00%

Abstract:

This paper presents general regression neural networks (GRNN) as a nonlinear regression method for the interpolation of monthly wind speeds in complex Alpine orography. GRNN is trained using data coming from Swiss meteorological networks to learn the statistical relationship between topographic features and wind speed. The terrain convexity, slope and exposure are considered by extracting features from the digital elevation model at different spatial scales using specialised convolution filters. A database of gridded monthly wind speeds is then constructed by applying GRNN in prediction mode during the period 1968-2008. This study demonstrates that using topographic features as inputs in GRNN significantly reduces cross-validation errors with respect to low-dimensional models integrating only geographical coordinates and terrain height for the interpolation of wind speed. The spatial predictability of wind speed is found to be lower in summer than in winter due to more complex and weaker wind-topography relationships. The relevance of these relationships is studied using an adaptive version of the GRNN algorithm, which selects the useful terrain features by eliminating the noisy ones. This research provides a framework for extending the low-dimensional interpolation models to high-dimensional spaces by integrating additional features accounting for the topographic conditions at multiple spatial scales. Copyright (c) 2012 Royal Meteorological Society.
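A GRNN prediction is a Gaussian-kernel weighted average of the training targets, which the following sketch implements in NumPy. Everything here is illustrative: the data are simulated rather than wind measurements, the function name `grnn_predict` is hypothetical, and the use of one bandwidth per feature to mimic the adaptive variant is an assumption about how feature elimination could work (a very large bandwidth makes a feature irrelevant to the weights).

```python
import numpy as np

def grnn_predict(train_x, train_y, query_x, sigma):
    """Kernel-weighted average of train_y; sigma is a scalar or one bandwidth per feature."""
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    # Pairwise differences, scaled feature by feature by the bandwidth(s).
    scaled = (query_x[:, None, :] - train_x[None, :, :]) / sigma
    weights = np.exp(-0.5 * (scaled**2).sum(axis=2))
    return weights @ np.asarray(train_y, dtype=float) / weights.sum(axis=1)

# Simulated data (assumption): the target depends on the first feature only.
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 1.0, size=(200, 2))
target = np.sin(2.0 * np.pi * features[:, 0])

# Two queries that differ only in the second, irrelevant feature.
query = np.array([[0.25, 0.1], [0.25, 0.9]])

# A huge bandwidth on the second feature effectively switches it off.
pred = grnn_predict(features, target, query, sigma=np.array([0.05, 1e6]))
print(pred)
```

In an adaptive setting the bandwidths would be tuned by cross-validation, and features whose tuned bandwidth grows very large would be treated as uninformative.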

Relevance:

60.00% 60.00%

Abstract:

Infrared spectroscopy (FTIR) is a technique of choice for analyzing spray paint specimens (i.e. traces) and reference samples (i.e. cans seized from suspects) due to its high discriminating power, sensitivity and sampling possibilities. The comparison of the spectra is currently carried out visually, but this procedure has limitations such as the subjectivity of the decision, which depends on the experience and training of the expert. This implies that small differences in the relative intensity of two peaks can be perceived differently by experts, even between analysts working in the same laboratory. When it comes to justifying these differences, some will explain them by the analytical technique, while others will estimate that the observed differences are mostly due to an intrinsic variability from the paint sample and/or its acquired characteristics (for example homogeneity, spraying, or degradation). This work proposes to statistically study the different sources of variability observed in infrared spectra, to identify them, understand them and try to minimize them. The second goal is to propose a procedure for spectra comparison that is more transparent and yields reproducible answers independent of the expert. The first part of the manuscript focuses on the optimization of infrared measurement and on the main analytical parameters. The necessary conditions to obtain reproducible spectra with a minimized variation within a sample (intra-variability) are presented. A spectral correction procedure is then proposed using pretreatments and variable selection methods, in order to minimize the remaining systematic and random errors and to maximize the relevant chemical information. The second part presents a market study of 74 spray paints representative of the Swiss market. The discrimination capabilities of FTIR at the brand and model level are evaluated by means of a visual procedure and compared with various statistical procedures.
The lower limits of discrimination are tested on paints coming from the same brand and model, but from different production batches. The results showed that the pigment composition was particularly discriminatory, because of the corrections and adjustments made to the paint color during its manufacturing process. The features associated with spray paint traces (graffiti, droplets) were also tested. Three elements were identified and their influence on the resulting infrared spectra was tested: 1) the minimum shaking time necessary to obtain a sufficient homogeneity of the paint and subsequently of the painted surface, 2) the degradation initiated by ultraviolet radiation in an exterior environment, and 3) the contamination from the support when paint is recovered. Finally, a population study was performed on 35 graffiti coming from the city of Lausanne and surrounding areas, and the results were compared to the previous market study of spray cans. The last part concentrated on the decision process during the pairwise comparison of spectra. First, current practice among laboratories was surveyed by means of a questionnaire. Then, a statistical method of comparison was proposed to improve the objectivity and transparency of the decision process. A method of comparison based on the correlation between spectra is proposed, followed by its integration into a Bayesian framework at both source and activity levels. Finally, some case examples are presented and the recommended methodology is discussed in order to define the role of the expert as well as the contribution of the tested statistical approach within a global analytical sequence for paint examinations.

Relevance:

60.00% 60.00%

Abstract:

A genetic algorithm was used for variable selection in the simultaneous determination of mixtures of glucose, maltose and fructose by mid-infrared spectroscopy. Different models, using partial least squares (PLS) and multiple linear regression (MLR) with and without data pre-processing, were used. Based on the results obtained, it was verified that a simpler model (multiple linear regression with variable selection by genetic algorithm) produces results comparable to more complex methods (partial least squares). The relative error obtained for the best model was around 3% for the sugar determination, which is acceptable for this kind of determination.
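The combination of a genetic algorithm with multiple linear regression can be sketched as follows. This is a generic illustration, not the procedure of the paper: the data are simulated with a single response, the fitness is a 5-fold cross-validated RMSE, and the population size, mutation rate, crossover scheme and the names `cv_rmse` and `ga_select` are all assumptions.

```python
import numpy as np

def cv_rmse(X, y, mask, n_folds=5):
    """Cross-validated RMSE of ordinary least squares on the selected columns."""
    if not mask.any():
        return np.inf                      # an empty model is never acceptable
    folds = np.arange(len(y)) % n_folds    # deterministic fold assignment
    sq_err = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        A = np.column_stack([np.ones(train.sum()), X[train][:, mask]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(test.sum()), X[test][:, mask]])
        sq_err += np.sum((y[test] - B @ beta) ** 2)
    return np.sqrt(sq_err / len(y))

def ga_select(X, y, pop_size=30, n_gen=40, p_mut=0.05, seed=0):
    """Evolve boolean column masks; returns the best mask and the best fitness per generation."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < 0.3    # random initial masks
    history = []
    for _ in range(n_gen):
        fitness = np.array([cv_rmse(X, y, ind) for ind in pop])
        pop = pop[np.argsort(fitness)]           # best individuals first
        history.append(fitness.min())
        parents = pop[: pop_size // 2]           # truncation selection
        children = [pop[0].copy()]               # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n_var) < 0.5, a, b)   # uniform crossover
            child ^= rng.random(n_var) < p_mut                # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    fitness = np.array([cv_rmse(X, y, ind) for ind in pop])
    best = int(np.argmin(fitness))
    history.append(fitness[best])
    return pop[best], history

# Simulated data (assumption): 25 variables, of which three carry the signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 25))
y = 2.0 * X[:, 3] - X[:, 10] + 0.5 * X[:, 17] + 0.1 * rng.normal(size=60)

best_mask, history = ga_select(X, y)
print("selected variables:", np.flatnonzero(best_mask))
print("cross-validated RMSE:", history[-1])
```

Because the best mask is carried over unchanged to the next generation, the recorded fitness can only stay the same or improve from one generation to the next.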