955 results for Variable-selection Problems
Abstract:
Understanding the genetic underpinnings of adaptive change is a fundamental but largely unresolved problem in evolutionary biology. Drosophila melanogaster, an ancestrally tropical insect that has spread to temperate regions and become cosmopolitan, offers a powerful opportunity for identifying the molecular polymorphisms underlying clinal adaptation. Here, we use genome-wide next-generation sequencing of DNA pools ('pool-seq') from three populations collected along the North American east coast to examine patterns of latitudinal differentiation. Comparing the genomes of these populations is particularly interesting since they exhibit clinal variation in a number of important life history traits. We find extensive latitudinal differentiation, with many of the most strongly differentiated genes involved in major functional pathways such as the insulin/TOR, ecdysone, torso, EGFR, TGFβ/BMP, JAK/STAT, immunity and circadian rhythm pathways. We observe particularly strong differentiation on chromosome 3R, especially within the cosmopolitan inversion In(3R)Payne, which contains a large number of clinally varying genes. While much of the differentiation might be driven by clinal differences in the frequency of In(3R)P, we also identify genes that are likely independent of this inversion. Our results provide genome-wide evidence consistent with pervasive spatially variable selection acting on numerous loci and pathways along the well-known North American cline, with many candidates implicated in life history regulation and exhibiting parallel differentiation along the previously investigated Australian cline.
Abstract:
This paper is motivated by the recent interest in the use of Bayesian VARs for forecasting, even in cases where the number of dependent variables is large. In such cases, factor methods have traditionally been used, but recent work using a particular prior suggests that Bayesian VAR methods can forecast better. In this paper, we consider a range of alternative priors which have been used with small VARs, discuss the issues which arise when they are used with medium and large VARs, and examine their forecast performance using a US macroeconomic data set containing 168 variables. We find that Bayesian VARs do tend to forecast better than factor methods and provide an extensive comparison of the strengths and weaknesses of various approaches. Our empirical results show the importance of using forecast metrics which use the entire predictive density, instead of using only point forecasts.
Abstract:
This paper considers Bayesian variable selection in regressions with a large number of possibly highly correlated macroeconomic predictors. I show that acknowledging the correlation structure in the predictors can improve forecasts over existing popular Bayesian variable selection algorithms.
Abstract:
We develop methods for Bayesian model averaging (BMA) or selection (BMS) in Panel Vector Autoregressions (PVARs). Our approach allows us to select between or average over all possible combinations of restricted PVARs where the restrictions involve interdependencies between and heterogeneities across cross-sectional units. The resulting BMA framework can find a parsimonious PVAR specification, thus dealing with overparameterization concerns. We use these methods in an application involving the euro area sovereign debt crisis and show that our methods perform better than alternatives. Our findings contradict a simple view of the sovereign debt crisis which divides the euro zone into groups of core and peripheral countries and worries about financial contagion within the latter group.
Abstract:
The aim of this work is to evaluate the capabilities and limitations of chemometric methods and other mathematical treatments applied to spectroscopic data, and more specifically to paint samples. The uniqueness of the spectroscopic data comes from the fact that they are multivariate (a few thousand variables) and highly correlated. Statistical methods are used to study and discriminate samples. A collection of 34 red paint samples was measured by Infrared and Raman spectroscopy. Data pretreatment and variable selection demonstrated that the use of Standard Normal Variate (SNV), together with removal of the noisy variables by a selection of the wavenumbers from 650 to 1830 cm−1 and from 2730 to 3600 cm−1, provided the optimal results for infrared analysis. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were then used as exploratory techniques to provide evidence of structure in the data, to cluster samples, or to detect outliers. With the FTIR spectra, the Principal Components (PCs) correspond to binder types and the presence/absence of calcium carbonate. 83% of the total variance is explained by the first four PCs. As for the Raman spectra, we observe six different clusters corresponding to the different pigment compositions when plotting the first two PCs, which account for 37% and 20% respectively of the total variance. In conclusion, the use of chemometrics for the forensic analysis of paints provides a valuable tool for objective decision-making, a reduction of the possible classification errors, and better efficiency, giving robust results with time-saving data treatments.
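The pretreatment named in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the synthetic spectra, and the choice to select wavenumber ranges before applying SNV are all assumptions, since the abstract does not state the order of the two steps.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def select_ranges(wavenumbers, spectra, ranges):
    """Keep only the columns whose wavenumber falls inside one of the (low, high) ranges."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in ranges:
        mask |= (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], np.asarray(spectra, dtype=float)[:, mask]

# Hypothetical data: 34 synthetic "spectra" on an assumed 400-4000 cm^-1 grid.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 4000, 1801)
spectra = rng.random((34, wavenumbers.size)) + 1.0

# Ranges taken from the abstract; selection-then-SNV order is an assumption.
kept_wn, kept = select_ranges(wavenumbers, spectra, [(650, 1830), (2730, 3600)])
corrected = snv(kept)
```

After this step every retained spectrum has zero mean and unit standard deviation, which is what makes the samples comparable before PCA or HCA.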
Abstract:
OBJECTIVES: The aim of the study was to assess whether prospective follow-up data within the Swiss HIV Cohort Study can be used to predict patients who stop smoking; or among smokers who stop, those who start smoking again. METHODS: We built prediction models first using clinical reasoning ('clinical models') and then by selecting from numerous candidate predictors using advanced statistical methods ('statistical models'). Our clinical models were based on literature that suggests that motivation drives smoking cessation, while dependence drives relapse in those attempting to stop. Our statistical models were based on automatic variable selection using additive logistic regression with component-wise gradient boosting. RESULTS: Of 4833 smokers, 26% stopped smoking, at least temporarily; among those who stopped, 48% started smoking again. The predictive performance of our clinical and statistical models was modest. A basic clinical model for cessation, with patients classified into three motivational groups, was nearly as discriminatory as a constrained statistical model with just the most important predictors (the ratio of nonsmoking visits to total visits, alcohol or drug dependence, psychiatric comorbidities, recent hospitalization and age). A basic clinical model for relapse, based on the maximum number of cigarettes per day prior to stopping, was not as discriminatory as a constrained statistical model with just the ratio of nonsmoking visits to total visits. CONCLUSIONS: Predicting smoking cessation and relapse is difficult, so that simple models are nearly as discriminatory as complex ones. Patients with a history of attempting to stop and those known to have stopped recently are the best candidates for an intervention.
Abstract:
BACKGROUND: A central question for understanding the evolutionary responses of plant species to rapidly changing environments is the assessment of their potential for short-term (in one or a few generations) genetic change. In our study, we consider the case of Pinus pinaster Aiton (maritime pine), a widespread Mediterranean tree, and (i) test, under different experimental conditions (growth chamber and semi-natural), whether higher recruitment in the wild from the most successful mothers is due to better performance of their offspring; and (ii) evaluate genetic change in quantitative traits across generations at two different life stages (mature trees and seedlings) that are known to be under strong selection pressure in forest trees. RESULTS: Genetic control was high for most traits (h2 = 0.137-0.876) under the milder conditions of the growth chamber, but only for ontogenetic change (0.276), total height (0.415) and survival (0.719) under the more stressful semi-natural conditions. Significant phenotypic selection gradients were found in mature trees for traits related to seed quality (germination rate and number of empty seeds). Moreover, female relative reproductive success was significantly correlated with offspring performance for specific leaf area (SLA) in the growth chamber experiment, and stem mass fraction (SMF) in the experiment under semi-natural conditions, two adaptive traits related to abiotic stress-response in pines. Selection gradients based on genetic covariance of seedling traits and responses to selection at this stage involved traits related to biomass allocation (SMF) and growth (as decomposed by a Gompertz model) or delayed ontogenetic change, depending also on the testing environment. CONCLUSIONS: Despite the evidence of microevolutionary change in adaptive traits in maritime pine, directional or disruptive changes are difficult to predict due to variable selection at different life stages and environments. 
At mature-tree stages, higher female effective reproductive success can be explained by differences in their production of offspring (due to seed quality) and, to a lesser extent, by seemingly better adapted seedlings. Selection gradients and responses to selection for seedlings also differed across experimental conditions. The distinct processes involved at the two life stages (mature trees or seedlings), together with environment-specific responses, advise caution when predicting likely evolutionary responses to environmental change in Mediterranean forest trees.
Abstract:
The purpose of this paper is to examine the determinants of using the internal or external labour market to fill a firm vacancy in SMEs, taking into account the differences between blue- and white-collar jobs. Following different theories, we can identify three main reasons for using internal candidates rather than external ones: firm-specific knowledge, adverse selection problems and motivation. However, there are other factors that might affect this choice which these theories do not take into account. In this paper we try to shed some light on what these other factors are. In particular, we analyse the relationship of new technologies, innovation activity and firm location with the staffing strategy. The results show different behaviour between manufacturing and service firms in the decision to fill a vacancy using internal or external labour markets, and this decision depends not only on internal firm characteristics, such as technological complexity or innovation activity, but also on firm location. The results also support the ports-of-entry hypothesis, especially in the manufacturing sector.
Abstract:
Geographical information systems (GIS) are tools that have been recently tested for improving our understanding of the spatial distribution of disease. The objective of this paper was to further develop the GIS technology to model and control schistosomiasis using environmental, social, biological and remote-sensing variables. A final regression model (R² = 0.39) was established, after a variable selection phase, with a set of spatial variables including the presence or absence of Biomphalaria glabrata, winter enhanced vegetation index, summer minimum temperature and percentage of houses with water coming from a spring or well. A regional model was also developed by splitting the state of Minas Gerais (MG) into four regions and establishing a linear regression model for each of the four regions: 1 (R² = 0.97), 2 (R² = 0.60), 3 (R² = 0.63) and 4 (R² = 0.76). Based on these models, a schistosomiasis risk map was built for MG. In this paper, geostatistics was also used to make inferences about the presence of Biomphalaria spp. The result was a map of species and risk areas. The obtained risk map permits the association of uncertainties, which can be used to qualify the inferences and it can be thought of as an auxiliary tool for public health strategies.
Abstract:
Over the last 30 years, the proliferation of quantitative models for predicting corporate insolvency in the accounting and finance literature has aroused great interest among specialists and researchers in the field. What began as models built with a single objective has become a constant source of research. This paper formulates an insolvency prediction model by combining different quantitative variables extracted from the financial statements of a sample of companies for the years 1994-1997. A stepwise procedure is used to select and interpret the variables that are most relevant in terms of the information they contribute. Once this first type of model has been formulated, an alternative to the previous variables is sought through the factorial technique of principal component analysis. This technique is used to select variables, and univariate analysis is applied to them together with the previous ratios. Finally, the resulting models are compared, and it is concluded that although the previous literature offers better classification percentages, the models obtained through principal component analysis should not be rejected, given the clarity with which they explain the causes that lead a company to insolvency.
Abstract:
Species distribution modelling is central to both fundamental and applied research in biogeography. Despite widespread use of models, there are still important conceptual ambiguities as well as biotic and algorithmic uncertainties that need to be investigated in order to increase confidence in model results. We identify and discuss five areas of enquiry that are of high importance for species distribution modelling: (1) clarification of the niche concept; (2) improved designs for sampling data for building models; (3) improved parameterization; (4) improved model selection and predictor contribution; and (5) improved model evaluation. The challenges discussed in this essay do not preclude the need for developments of other areas of research in this field. However, they are critical for allowing the science of species distribution modelling to move forward.
Abstract:
This letter presents advanced classification methods for very high resolution images. Efficient multisource information, both spectral and spatial, is exploited through the use of composite kernels in support vector machines. Weighted summations of kernels accounting for separate sources of spectral and spatial information are analyzed and compared to classical approaches such as pure spectral classification or stacked approaches using all the features in a single vector. Model selection problems are addressed, as well as the importance of the different kernels in the weighted summation.
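The weighted summation of kernels mentioned in this abstract can be illustrated with a short sketch. It assumes Gaussian (RBF) kernels for both sources and a single mixing weight `mu` in [0, 1]; the function names, the parameter values and the random feature vectors are hypothetical and are not taken from the letter.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off

def weighted_sum_kernel(spec_a, spat_a, spec_b, spat_b, mu, gamma_spec, gamma_spat):
    """Convex combination of a spectral kernel and a spatial kernel."""
    k_spec = rbf_kernel(spec_a, spec_b, gamma_spec)
    k_spat = rbf_kernel(spat_a, spat_b, gamma_spat)
    return mu * k_spec + (1.0 - mu) * k_spat

# Hypothetical pixels: 5 samples with 4 spectral and 3 spatial features each.
rng = np.random.default_rng(0)
spectral = rng.random((5, 4))
spatial = rng.random((5, 3))
K = weighted_sum_kernel(spectral, spatial, spectral, spatial,
                        mu=0.6, gamma_spec=0.5, gamma_spat=2.0)
```

Because a convex combination of positive semi-definite kernels is itself positive semi-definite, `K` can be handed to any SVM implementation that accepts a precomputed Gram matrix. The weight `mu` and the two `gamma` values are then the model-selection parameters to tune, for example by cross-validation.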
Abstract:
This paper presents general regression neural networks (GRNN) as a nonlinear regression method for the interpolation of monthly wind speeds in complex Alpine orography. GRNN is trained using data coming from Swiss meteorological networks to learn the statistical relationship between topographic features and wind speed. The terrain convexity, slope and exposure are considered by extracting features from the digital elevation model at different spatial scales using specialised convolution filters. A database of gridded monthly wind speeds is then constructed by applying GRNN in prediction mode during the period 1968-2008. This study demonstrates that using topographic features as inputs in GRNN significantly reduces cross-validation errors with respect to low-dimensional models integrating only geographical coordinates and terrain height for the interpolation of wind speed. The spatial predictability of wind speed is found to be lower in summer than in winter due to more complex and weaker wind-topography relationships. The relevance of these relationships is studied using an adaptive version of the GRNN algorithm, which allows the useful terrain features to be selected by eliminating the noisy ones. This research provides a framework for extending the low-dimensional interpolation models to high-dimensional spaces by integrating additional features accounting for the topographic conditions at multiple spatial scales. Copyright (c) 2012 Royal Meteorological Society.
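A GRNN prediction is a kernel-weighted average of the training targets, and a minimal sketch makes that concrete. The per-feature bandwidth vector used below to mimic the "adaptive" variant is an assumption about how feature selection could work (a very large bandwidth effectively switches a feature off). The function name and the toy data are hypothetical and do not come from the paper.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    """GRNN prediction: Gaussian-kernel weighted average of the training targets.

    sigma may be a scalar (isotropic kernel) or one bandwidth per feature.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Bandwidth-scaled differences between every query and every training point.
    scaled_diff = (x_query[:, None, :] - x_train[None, :, :]) / sigma
    weights = np.exp(-0.5 * (scaled_diff ** 2).sum(axis=2))
    # Note: weights can underflow to zero for queries far from all training points.
    return weights @ y_train / weights.sum(axis=1)

# Toy data: the first feature is informative, the second is treated as noise.
x_train = np.array([[0.0, 0.0], [1.0, 10.0], [2.0, -5.0]])
y_train = np.array([1.0, 3.0, 5.0])

# Isotropic kernel with a small bandwidth reproduces the training targets.
pred_iso = grnn_predict(x_train, y_train, x_train, sigma=0.05)

# A huge bandwidth on the second feature makes the prediction ignore it.
pred_masked = grnn_predict(x_train, y_train, [[1.0, 99.0]], sigma=[0.1, 1e6])
```

In practice the bandwidths would be tuned by minimising a cross-validation error, and features whose tuned bandwidth grows very large could be discarded as uninformative.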