912 results for model selection
Disentangling the effects of key innovations on the diversification of Bromelioideae (Bromeliaceae).
Abstract:
The evolution of key innovations, novel traits that promote diversification, is often seen as a major driver of the unequal distribution of species richness across the tree of life. In this study, we aim to determine the factors underlying the extraordinary radiation of the subfamily Bromelioideae, one of the most diverse clades within the neotropical plant family Bromeliaceae. Based on an extended molecular phylogenetic data set, we examine the effect of two putative key innovations, Crassulacean acid metabolism (CAM) and the water-impounding tank, on speciation and extinction rates. To this end, we develop a novel Bayesian implementation of the binary state speciation and extinction (BiSSE) phylogenetic comparative method, which enables hypothesis testing with Bayes factors and accommodates model selection uncertainty through Bayesian model averaging. Both CAM and the tank habit were found to correlate with increased net diversification, thus fulfilling the criteria for key innovations. Our analyses further revealed that CAM photosynthesis is correlated with a twofold increase in speciation rate, whereas the evolution of the tank primarily affected extinction rates, which were five times lower in tank-forming lineages than in tank-less clades. These differences are discussed in the light of biogeography, ecology, and past climate change.
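As an aside on the machinery this abstract describes: once (log) marginal likelihoods are in hand, Bayes factors and Bayesian model averaging reduce to simple arithmetic. A minimal Python sketch, with entirely hypothetical numbers standing in for BiSSE model fits (not the study's estimates):

```python
import numpy as np

# Hypothetical log marginal likelihoods for two BiSSE variants:
# M0: rates independent of the trait; M1: state-dependent rates.
log_ml = {"M0": -1204.6, "M1": -1198.3}

# Bayes factor of M1 over M0, computed on the log scale for stability.
log_bf = log_ml["M1"] - log_ml["M0"]
print(f"BF(M1 vs M0) = {np.exp(log_bf):.1f}")

# Posterior model probabilities under equal prior model weights.
log_p = np.array(list(log_ml.values()))
weights = np.exp(log_p - log_p.max())
weights /= weights.sum()

# Model-averaged net diversification: weight each model's posterior-mean
# estimate (hypothetical values) by its posterior model probability.
est = {"M0": 0.21, "M1": 0.35}
bma = sum(w * est[m] for w, m in zip(weights, log_ml))
print(f"model-averaged rate = {bma:.3f}")
```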
Abstract:
The role of land cover change as a significant component of global change has become increasingly recognized in recent decades. Large databases measuring land cover change, and the data that can potentially be used to explain the observed changes, are also becoming more commonly available. When developing statistical models to investigate observed changes, it is important to be aware that the chosen sampling strategy and modelling techniques can influence the results. We present a comparison of three sampling strategies and two forms of grouped logistic regression model (multinomial and ordinal) in the investigation of patterns of successional change after agricultural land abandonment in Switzerland. Results indicated that both ordinal and nominal transitional change occurs in the landscape and that different sampling regimes and modelling techniques, used as investigative tools, yield different results. Synthesis and applications. Our multimodel inference successfully identified a set of consistently selected indicators of land cover change, which can be used to predict further change, including annual average temperature, the number of already overgrown neighbouring areas of land, and distance to historically destructive avalanche sites. This allows for more reliable decision making and planning with respect to landscape management. Although both model approaches gave similar results, ordinal regression yielded more parsimonious models that identified the important predictors of land cover change more efficiently. This approach is therefore preferable where the land cover change pattern can be interpreted as an ordinal process; otherwise, multinomial logistic regression is a viable alternative.
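To make the model comparison concrete, here is a minimal sketch of fitting multinomial versus ordinal (proportional-odds) logistic regression with statsmodels (assuming statsmodels >= 0.13 for OrderedModel). The predictors, outcome categories and thresholds below are simulated stand-ins, not the Swiss land cover data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors: mean annual temperature, overgrown neighbours.
X = np.column_stack([rng.normal(8, 2, n), rng.poisson(3, n)])
# Hypothetical ordinal outcome: 0 = open land, 1 = shrub, 2 = forest.
latent = 0.4 * X[:, 0] + 0.5 * X[:, 1] + rng.logistic(size=n)
y = np.digitize(latent, [4.0, 6.5])

# Multinomial logit: one coefficient vector per non-reference category.
mnl = sm.MNLogit(y, sm.add_constant(X)).fit(disp=False)

# Ordinal logit: a single coefficient vector plus category thresholds,
# hence the more parsimonious model noted in the abstract.
ord_fit = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)

print("multinomial params:", mnl.params.size)  # 2 categories x 3 columns
print("ordinal params:", ord_fit.params.size)  # 2 slopes + 2 thresholds
```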
Abstract:
We develop methods for Bayesian inference in vector error correction models which are subject to a variety of switches in regime (e.g. Markov switches in regime or structural breaks). An important aspect of our approach is that we allow both the cointegrating vectors and the number of cointegrating relationships to change when the regime changes. We show how Bayesian model averaging or model selection methods can be used to deal with the high-dimensional model space that results. Our methods are used in an empirical study of the Fisher effect.
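A schematic of the model-averaging arithmetic over such a discrete model space; the ranks, regimes and log marginal likelihoods below are illustrative placeholders, not the paper's estimates:

```python
import numpy as np

# Hypothetical log marginal likelihoods over a small model space:
# cointegrating rank r in {0, 1, 2}, with and without a regime switch.
models = {("r=0", "no switch"): -842.1, ("r=1", "no switch"): -833.9,
          ("r=2", "no switch"): -835.5, ("r=0", "switch"): -838.7,
          ("r=1", "switch"): -830.2, ("r=2", "switch"): -832.8}

log_ml = np.array(list(models.values()))
post = np.exp(log_ml - log_ml.max())
post /= post.sum()  # posterior model probabilities, equal prior weights

for m, p in zip(models, post):
    print(m, f"{p:.3f}")

# Model selection keeps the argmax; model averaging instead weights any
# quantity of interest (forecasts, impulse responses) by `post`.
print("selected model:", max(models, key=models.get))
```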
Abstract:
Block factor methods offer an attractive approach to forecasting with many predictors. They extract the information in these predictors into factors reflecting different blocks of variables (e.g. a price block, a housing block, a financial block, etc.). However, a forecasting model that simply includes all blocks as predictors risks being over-parameterized. It is therefore desirable to use a methodology that allows different parsimonious forecasting models to hold at different points in time. In this paper, we use dynamic model averaging and dynamic model selection to achieve this goal. These methods automatically alter the weights attached to different forecasting models as evidence comes in about which have forecast well in the recent past. In an empirical study involving forecasting output and inflation using 139 UK monthly time series variables, we find that the set of predictors changes substantially over time. Furthermore, our results show that dynamic model averaging and model selection can greatly improve forecast performance relative to traditional forecasting methods.
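The core of dynamic model averaging is a recursive weight update with a forgetting factor (Raftery-style DMA). A minimal sketch, assuming the per-model one-step-ahead predictive log-likelihoods have already been computed:

```python
import numpy as np

def dma_weights(pred_loglik, alpha=0.99):
    """Recursive model weights for dynamic model averaging.

    pred_loglik: (T, K) array of one-step-ahead predictive log-likelihoods,
    one column per candidate model (assumed precomputed). alpha is the
    forgetting factor: alpha = 1 recovers standard Bayesian updating,
    smaller values let weights adapt faster to recent performance.
    """
    T, K = pred_loglik.shape
    w = np.full(K, 1.0 / K)            # equal initial model probabilities
    out = np.empty((T, K))
    for t in range(T):
        w = w ** alpha                 # forgetting: flatten old evidence
        w /= w.sum()
        w *= np.exp(pred_loglik[t] - pred_loglik[t].max())
        w /= w.sum()                   # posterior model probabilities at t
        out[t] = w
    return out

# Toy illustration: model 1 forecasts better in the second half of sample.
rng = np.random.default_rng(1)
ll = rng.normal(0, 0.1, size=(200, 2))
ll[100:, 1] += 0.3
w = dma_weights(ll)
print(w[0], w[-1])   # weights drift toward model 1 after t = 100
```

Dynamic model selection then simply forecasts each period with the model holding the largest current weight.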
Abstract:
This paper discusses the challenges faced by the empirical macroeconomist and methods for surmounting them. These challenges arise because macroeconometric models potentially include a large number of variables and allow for time variation in parameters. These considerations lead to models that have a large number of parameters to estimate relative to the number of observations. We survey a wide range of approaches that aim to overcome the resulting problems, stressing the related themes of prior shrinkage, model averaging and model selection. Subsequently, we consider a particular modelling approach in detail: the use of dynamic model selection methods with large TVP-VARs. A forecasting exercise involving a large US macroeconomic data set illustrates the practicality and empirical success of our approach.
Multimodel inference and multimodel averaging in empirical modeling of occupational exposure levels.
Abstract:
Empirical modeling of exposure levels has been popular for identifying exposure determinants in occupational hygiene. The traditional data-driven methods used to choose a model on which to base inferences have typically not accounted for the uncertainty linked to the process of selecting the final model. Several new approaches propose making statistical inferences from a set of plausible models rather than from a single model regarded as 'best'. This paper introduces the multimodel averaging approach described in the monograph by Burnham and Anderson. In their approach, a set of plausible models is defined a priori, taking into account the sample size and prior knowledge of variables that influence exposure levels. The Akaike information criterion is then calculated to evaluate the relative support of the data for each model, expressed as an Akaike weight, interpreted as the probability that the model is the best approximating model given the model set. The model weights can then be used to rank models, quantify the evidence favoring one over another, perform multimodel prediction, estimate the relative influence of the potential predictors, and estimate multimodel-averaged effects of determinants. The whole approach is illustrated with the analysis of a data set of 1500 volatile organic compound exposure levels collected by the Institute for Work and Health (Lausanne, Switzerland) over 20 years, each concentration having been divided by the relevant Swiss occupational exposure limit and log-transformed before analysis. Multimodel inference represents a promising procedure for modeling exposure levels: it incorporates the notion that several models can be supported by the data and permits evaluating, to some extent, the model selection uncertainty that is seldom mentioned in current practice.
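The Akaike-weight computation described above is short enough to show in full. A minimal sketch with hypothetical AIC values and per-model coefficient estimates (not the study's data):

```python
import numpy as np

def akaike_weights(aic):
    """Akaike weights: relative support for each model in the set."""
    delta = np.asarray(aic) - np.min(aic)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical AIC values for four a-priori exposure models.
aic = [312.4, 313.1, 316.8, 325.0]
w = akaike_weights(aic)
print(np.round(w, 3))
print(f"evidence ratio, model 1 vs 2 = {w[0] / w[1]:.2f}")

# Multimodel-averaged effect of a determinant: weight each model's
# coefficient (hypothetical values) by its Akaike weight.
beta = np.array([0.42, 0.45, 0.38, 0.51])
print(f"averaged beta = {np.dot(w, beta):.3f}")
```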
Abstract:
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways such models can be implemented, permit substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building 'underfit' models, with insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building 'overfit' models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on the study objective, the attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing underfitting against overfitting and, consequently, how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinion that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.
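A toy illustration of the underfitting/overfitting trade-off: cross-validation scores for occurrence-environment models of increasing flexibility, fitted with scikit-learn to a simulated unimodal response (the species and environmental gradient are hypothetical, and polynomial degree stands in for SDM complexity generally):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
n = 400
temp = rng.uniform(0, 25, n)             # hypothetical gradient
eta = 2 - (temp - 12) ** 2 / 18          # unimodal true response
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Vary the flexibility of the occurrence-environment relationship and
# let cross-validation compare underfit, adequate and overfit models.
for degree in (1, 2, 8):
    model = make_pipeline(PolynomialFeatures(degree), StandardScaler(),
                          LogisticRegression(max_iter=1000))
    score = cross_val_score(model, temp[:, None], y, cv=5).mean()
    print(f"degree {degree}: CV accuracy = {score:.3f}")
```

Degree 1 cannot represent the unimodal niche at all, which is the 'underfit' risk the abstract describes; degree 8 can chase sampling noise, the 'overfit' risk.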
Abstract:
We analysed the spatial variation in morphological diversity (MDiv) and species richness (SR) for 91 species of Neotropical Triatominae to determine the ecological relationships between SR and MDiv and to explore the roles that climate, productivity, environmental heterogeneity and the presence of biomes and rivers may play in the structuring of species assemblages. For each 110 km × 110 km cell on a grid map of America, we determined the number of species (SR) and estimated the mean Gower index (MDiv) based on 12 morphological attributes. We performed bootstrapping analyses of species assemblages to identify whether those assemblages were more similar or dissimilar in their morphology than expected by chance. We applied a multi-model selection procedure and spatially explicit analyses to assess diversity-environment relationships. MDiv and SR both showed a latitudinal gradient, although each peaked at different locations; they were thus not strictly spatially congruent. SR decreased with temperature variability and MDiv increased with mean temperature, suggesting a predominant role for ambient energy in determining Triatominae diversity. Species that were more similar than expected by chance co-occurred near the limits of the Triatominae distribution, in association with changes in environmental variables. Environmental filtering may underlie the structuring of species assemblages near their distributional limits.
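The assemblage-level metric here is the mean pairwise Gower dissimilarity. A minimal sketch for purely quantitative traits, using the usual range-standardized Gower treatment (the trait values below are random stand-ins for the 12 morphological attributes):

```python
import numpy as np

def mean_gower(traits):
    """Mean pairwise Gower dissimilarity of a species assemblage.

    traits: (n_species, n_attributes) array of measurements. Each
    attribute is range-standardized so every trait contributes on a
    common 0-1 scale.
    """
    spread = traits.max(axis=0) - traits.min(axis=0)
    spread[spread == 0] = 1.0          # constant traits contribute 0
    scaled = traits / spread
    n = len(traits)
    d = [np.mean(np.abs(scaled[i] - scaled[j]))
         for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(d))

# Hypothetical grid-cell assemblage: 5 species, 12 attributes.
rng = np.random.default_rng(3)
cell_traits = rng.normal(size=(5, 12))
print(f"MDiv = {mean_gower(cell_traits):.3f}")
```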
Abstract:
A novel technique for estimating the rank of the trajectory matrix in the local subspace affinity (LSA) motion segmentation framework is presented. The new rank estimation is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built with LSA. The result is an enhanced model selection technique for trajectory matrix rank estimation that makes it possible to automate LSA, without requiring any a priori knowledge, and to improve the final segmentation.
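To give a generic flavour of the rank estimation problem: the task is choosing how many singular values of the trajectory matrix to keep. The sketch below uses a plain cumulative-energy rule on a toy two-motion matrix; the paper's criterion additionally exploits the LSA affinity matrix, which is not reproduced here:

```python
import numpy as np

def estimate_rank(W, energy=0.995):
    """Estimate trajectory-matrix rank from its singular value spectrum.

    Keeps the smallest number of singular values whose cumulative
    squared energy exceeds `energy` (a generic model selection rule).
    """
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy) + 1)

# Toy trajectory matrix: 2F x P stack of P feature tracks over F frames,
# built from two independent rank-4 motions plus observation noise.
rng = np.random.default_rng(4)
F, P = 30, 40
W = np.hstack([rng.normal(size=(2 * F, 4)) @ rng.normal(size=(4, P // 2))
               for _ in range(2)])
W += 0.01 * rng.normal(size=W.shape)
print("estimated rank:", estimate_rank(W))  # close to 8 for this toy case
```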
Abstract:
Mathematical methods combined with measurements of single-cell dynamics provide a means to reconstruct intracellular processes that are only partly or indirectly accessible experimentally. To obtain reliable reconstructions, the pooling of measurements from several cells of a clonal population is mandatory. However, cell-to-cell variability originating from diverse sources poses computational challenges for such process reconstruction. We introduce a scalable Bayesian inference framework that properly accounts for population heterogeneity. The method allows inference of inaccessible molecular states and kinetic parameters; computation of Bayes factors for model selection; and dissection of intrinsic, extrinsic and technical noise. We show how additional single-cell readouts such as morphological features can be included in the analysis. We use the method to reconstruct the expression dynamics of a gene under an inducible promoter in yeast from time-lapse microscopy data.
Abstract:
Despite the advancement of phylogenetic methods to estimate speciation and extinction rates, their power can be limited under variable rates, in particular for clades with high extinction rates and a small number of extant species. Fossil data can provide a powerful alternative source of information to investigate diversification processes. Here, we present PyRate, a computer program to estimate speciation and extinction rates and their temporal dynamics from fossil occurrence data. The rates are inferred in a Bayesian framework and are comparable to those estimated from phylogenetic trees. We describe how PyRate can be used to explore different models of diversification. In addition to the diversification rates, it provides estimates of the parameters of the preservation process (fossilization and sampling) and of the times of speciation and extinction of each species in the data set. Moreover, we develop a new birth-death model to correlate the variation of speciation/extinction rates with changes in a continuous trait. Finally, we demonstrate the use of Bayes factors for model selection and show how the posterior estimates of a PyRate analysis can be used to generate calibration densities for Bayesian molecular clock analysis. PyRate is an open-source command-line Python program available at http://sourceforge.net/projects/pyrate/.
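To give a flavour of the preservation process mentioned above: under a constant Poisson preservation rate, the expected number of fossil occurrences for a species is the rate times its lifespan. A hand-computed sketch with hypothetical occurrence data; in PyRate itself the speciation and extinction times are estimated jointly rather than fixed as they are here:

```python
import numpy as np

# Hypothetical species lifespans (Ma) and fossil occurrence counts.
ts = np.array([20.0, 15.5, 12.0, 9.0])   # speciation times
te = np.array([10.0, 3.0, 0.0, 1.5])     # extinction times
n_occ = np.array([12, 18, 10, 6])        # occurrences per species

# Maximum-likelihood preservation rate for the constant-rate Poisson
# model: total occurrences divided by total lifespan.
q_hat = n_occ.sum() / (ts - te).sum()
print(f"q_hat = {q_hat:.2f} occurrences per lineage per Myr")

# Log-likelihood at q_hat (up to the occurrence-count factorials).
lam = q_hat * (ts - te)
print(f"log-likelihood = {np.sum(n_occ * np.log(lam) - lam):.2f}")
```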
Abstract:
We present a new general concentration-of-measure inequality and illustrate its power by applications in random combinatorics. The results find direct applications in some problems of learning theory.
Abstract:
Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model.
Location State of Vaud, western Switzerland.
Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp).
Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that perform better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on main environmental gradients changed the set of selected predictors, and potentially their response curve shape. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once.
Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.
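The absence-weighting step in point (2) amounts to choosing absence weights so that the weighted prevalence of presences equals 0.5. A minimal sketch with hypothetical sample sizes (the weights would then be passed to the GAM fitting routine as case weights):

```python
import numpy as np

def prevalence_weights(y, target=0.5):
    """Case weights rescaling a presence/absence sample to `target`
    prevalence. y: binary array, 1 = presence, 0 = absence. Presences
    keep weight 1; absences are re-weighted so that the weighted
    prevalence of presences equals `target`.
    """
    n_pres, n_abs = y.sum(), (y == 0).sum()
    w_abs = n_pres * (1 - target) / (target * n_abs)
    return np.where(y == 1, 1.0, w_abs)

# Hypothetical community sample: 120 presences among 1000 plots.
y = np.r_[np.ones(120), np.zeros(880)]
w = prevalence_weights(y)
print(f"absence weight = {w[-1]:.3f}")                # 120/880, about 0.136
print(f"weighted prevalence = {y @ w / w.sum():.2f}")  # 0.50
```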
Abstract:
The study of organism movement is essential for understanding how ecosystems function. In the case of exploited marine ecosystems, this leads to an interest in the spatial strategies of fishers. One of the most widely used approaches for modelling the movement of top predators is the Lévy random walk. A random walk is a mathematical model composed of random displacements. In the Lévy case, displacement lengths follow a Lévy stable law; as the lengths tend to infinity (in practice, when they are large, e.g. relative to the median or the third quartile), they follow a power law characteristic of the type of Lévy walk (Cauchy, Brownian or strictly Lévy). In practice, not only is this property used in the converse direction without theoretical foundation, but the distribution tails, themselves an imprecise notion, are modelled by power laws without any discussion of the sensitivity of the results to the definition of the tail, or of the appropriateness of the goodness-of-fit tests and model selection criteria. In this work, based on the observed movements of three Peruvian anchovy fishing vessels, several tail models (log-normal, exponential, truncated exponential, power law and truncated power law) were compared, along with two possible definitions of the distribution tail (from the median to infinity, or from the third quartile to infinity). According to the statistical criteria and tests used, the truncated laws (exponential and power) emerged as the best. They also incorporate the fact that, in practice, vessels do not exceed a certain displacement length. The choice of model proved sensitive to the choice of where the tail begins: for the same vessel, the choice of one truncated model or the other depends on the interval of values over which the model is fitted. Finally, we discuss the ecological implications of these results.
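A minimal sketch of this kind of tail-model comparison: maximum-likelihood fits of power-law and exponential tails above a chosen threshold, compared by AIC, under both tail definitions. The step lengths are simulated, and the truncated variants discussed in the abstract are omitted for brevity:

```python
import numpy as np

def fit_tail(x, xmin):
    """AIC for exponential vs power-law models of a distribution tail.

    Only values above `xmin` (the assumed start of the tail, e.g. the
    median or third quartile) enter the fit.
    """
    tail = x[x > xmin]
    n = len(tail)
    # Power law: MLE exponent (Clauset et al. continuous-case estimator).
    alpha = 1 + n / np.sum(np.log(tail / xmin))
    ll_pow = (n * np.log((alpha - 1) / xmin)
              - alpha * np.sum(np.log(tail / xmin)))
    # Shifted exponential above xmin: rate MLE is 1 / mean excess.
    lam = 1 / np.mean(tail - xmin)
    ll_exp = n * np.log(lam) - lam * np.sum(tail - xmin)
    return {"power": 2 - 2 * ll_pow, "exponential": 2 - 2 * ll_exp}

rng = np.random.default_rng(5)
steps = rng.pareto(1.5, 2000) + 1.0      # heavy-tailed toy step lengths
for q in (0.5, 0.75):                    # the two tail definitions
    aic = fit_tail(steps, np.quantile(steps, q))
    print(q, "best:", min(aic, key=aic.get),
          {k: round(v, 1) for k, v in aic.items()})
```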