988 results for model averaging
Abstract:
The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques have sought to optimize predictive performance (e.g. Elith et al., 2006). In general, SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models (Nix, 1986) or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al., 2002), or classification trees with complexity-based or model-based pruning (Breiman et al., 1984; Zeileis, 2008). A third approach uses model averaging to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees, and Maximum Entropy, as compared in Elith et al. (2006). Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al., 2010). However, few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary, 2008). Here we report an elicitation protocol that helps make explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection described above, Bayesian or otherwise.
We demonstrate how this information can be obtained then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.
Abstract:
Bagging is a method of obtaining more robust predictions when the model class under consideration is unstable with respect to the data, i.e., small changes in the data can cause the predicted values to change significantly. In this paper, we introduce a Bayesian version of bagging based on the Bayesian bootstrap. The Bayesian bootstrap resolves a theoretical problem with ordinary bagging and often results in more efficient estimators. We show how model averaging can be combined within the Bayesian bootstrap and illustrate the procedure with several examples.
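To make the idea concrete, here is a minimal sketch of bagging with Bayesian bootstrap weights. It is an illustration under stated assumptions, not the procedure from the paper: the function name `bayesian_bagging_predict`, the choice of weighted least squares as the base learner, and the default number of draws are all hypothetical. The only ingredient taken from the abstract is the replacement of ordinary resampling by the Bayesian bootstrap, which here means drawing continuous Dirichlet(1, ..., 1) weights over the observations.

```python
import numpy as np

def bayesian_bagging_predict(X, y, X_new, n_draws=200, seed=0):
    """Average weighted-least-squares predictions over Bayesian bootstrap draws.

    Hypothetical helper: the base learner (a linear fit) is chosen only for
    brevity; an unstable learner such as a tree would benefit more.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Add an intercept column to both design matrices.
    A = np.column_stack([np.ones(n), X])
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    preds = np.zeros((n_draws, len(X_new)))
    for b in range(n_draws):
        # Bayesian bootstrap: Dirichlet(1, ..., 1) weights on the observations
        # instead of the multinomial counts of the ordinary bootstrap.
        w = rng.dirichlet(np.ones(n))
        sw = np.sqrt(w)
        # Weighted least squares via row scaling by sqrt(weight).
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        preds[b] = A_new @ beta
    # The bagged prediction is the average over all weight draws.
    return preds.mean(axis=0)
```

Because every observation keeps a strictly positive weight in each draw, no data point is ever dropped entirely, which is one practical difference from resampling with replacement.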
Abstract:
This paper addresses the estimation of the parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected both by overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can also be implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as the optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate in currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.
Abstract:
The main focus of this research is the validation of a statistical method in pharmacoepidemiology. More specifically, we compare the results of a previous study, carried out with a case-control design nested in the cohort and used to account for average exposure to treatment, with the results obtained in a cohort design using the time-varying exposure variable, without adjustment for the time elapsed since exposure; with the results obtained using cumulative exposure weighted by recency; and with the results obtained with the Bayesian method. The covariates will be estimated with the classical approach as well as with the nonparametric Bayesian approach. For the latter, Bayesian model averaging will be used to model the uncertainty regarding the choice of models. The technique used in the Bayesian approach was proposed in 1997 but, to our knowledge, it has not been used with a time-dependent variable. To model the cumulative effect of time-varying exposure, in the classical approach the function assigning weights according to recency will be estimated using regression splines. To allow comparison with a previously conducted study, a cohort of people with a diagnosis of hypertension will be built using the RAMQ and Med-Echo databases. A Cox model including two time-varying variables will be used. The time-varying variables considered in this thesis are the dependent variable (first cerebrovascular event) and one of the independent variables, namely exposure
Abstract:
To identify the causes of population decline in migratory birds, researchers must determine the relative influence of environmental changes on population dynamics while the birds are on breeding grounds, wintering grounds, and en route between the two. This is problematic when the wintering areas of specific populations are unknown. Here, we first identified the putative wintering areas of Common House-Martin (Delichon urbicum) and Common Swift (Apus apus) populations breeding in northern Italy as those areas, within the wintering ranges of these species, where the winter Normalized Difference Vegetation Index (NDVI), which may affect winter survival, best predicted annual variation in population indices observed in the breeding grounds in 1992–2009. In these analyses, we controlled for the potentially confounding effects of rainfall in the breeding grounds during the previous year, which may affect reproductive success; the North Atlantic Oscillation Index (NAO), which may account for climatic conditions faced by birds during migration; and the linear and squared term of year, which account for nonlinear population trends. The areas thus identified ranged from Guinea to Nigeria for the Common House-Martin, and were located in southern Ghana for the Common Swift. We then regressed annual population indices on mean NDVI values in the putative wintering areas and on the other variables, and used Bayesian model averaging (BMA) and hierarchical partitioning (HP) of variance to assess their relative contribution to population dynamics. We re-ran all the analyses using NDVI values at different spatial scales, and consistently found that our population of Common House-Martin was primarily affected by spring rainfall (43%–47.7% explained variance) and NDVI (24%–26.9%), while the Common Swift population was primarily affected by the NDVI (22.7%–34.8%). 
Although these results must be further validated, currently they are the only hypotheses about the wintering grounds of the Italian populations of these species, as no Common House-Martin and Common Swift ringed in Italy have been recovered in their wintering ranges.
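As a rough illustration of how Bayesian model averaging can rank predictors, the sketch below enumerates every subset of predictors in a linear regression and converts BIC values into approximate posterior model probabilities. The BIC approximation, the uniform prior over models, and the function name `bma_inclusion_probabilities` are assumptions made for this example; the abstract does not say which BMA implementation or priors the authors used, and hierarchical partitioning of variance is not shown.

```python
import itertools
import numpy as np

def bma_inclusion_probabilities(X, y):
    """Approximate posterior inclusion probability of each column of X.

    Assumes a Gaussian linear model, a uniform prior over all predictor
    subsets, and the BIC approximation to the marginal likelihood.
    """
    n, p = X.shape
    models, bics = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept plus the predictors in this subset.
            A = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            # BIC for a Gaussian linear model, up to an additive constant.
            bics.append(n * np.log(rss / n) + A.shape[1] * np.log(n))
            models.append(subset)
    bics = np.array(bics)
    # Approximate posterior model probabilities from BIC differences.
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()
    # Inclusion probability: total weight of the models containing predictor j.
    return np.array([sum(wi for wi, m in zip(w, models) if j in m)
                     for j in range(p)])
```

With predictors such as NDVI, rainfall and NAO as columns of `X`, a value near 1 would indicate that almost all of the posterior weight sits on models containing that predictor. Full enumeration is only practical for a small number of candidate predictors.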
Abstract:
Common Loon (Gavia immer) is considered an emblematic and ecologically important example of aquatic-dependent wildlife in North America. The northern breeding range of Common Loon has contracted over the last century as a result of habitat degradation from human disturbance and lakeshore development. We focused on the state of New Hampshire, USA, where a long-term monitoring program conducted by the Loon Preservation Committee has been collecting biological data on Common Loon since 1976. The Common Loon population in New Hampshire is distributed throughout the state across a wide range of lake-specific habitats, water quality conditions, and levels of human disturbance. We used a multiscale approach to evaluate the association of Common Loon and breeding habitat within three natural physiographic ecoregions of New Hampshire. These multiple scales reflect Common Loon-specific extents such as territories, home ranges, and lake-landscape influences. We developed ecoregional multiscale models and compared them to single-scale models to evaluate model performance in distinguishing Common Loon breeding habitat. Based on information-theoretic criteria, there is empirical support for both multiscale and single-scale models across all three ecoregions, warranting a model-averaging approach. Our results suggest that the Common Loon responds to both ecological and anthropogenic factors at multiple scales when selecting breeding sites. These multiscale models can be used to identify and prioritize the conservation of preferred nesting habitat for Common Loon populations.
Abstract:
The political economy literature on agriculture emphasizes influence over political outcomes via lobbying conduits in general, political action committee contributions in particular and the pervasive view that political preferences with respect to agricultural issues are inherently geographic. In this context, ‘interdependence’ in Congressional vote behaviour manifests itself in two dimensions. One dimension is the intensity by which neighboring vote propensities influence one another and the second is the geographic extent of voter influence. We estimate these facets of dependence using data on a Congressional vote on the 2001 Farm Bill using routine Markov chain Monte Carlo procedures and Bayesian model averaging, in particular. In so doing, we develop a novel procedure to examine both the reliability and the consequences of different model representations for measuring both the ‘scale’ and the ‘scope’ of spatial (geographic) co-relations in voting behaviour.
Abstract:
The performance of rank dependent preference functionals under risk is comprehensively evaluated using Bayesian model averaging. Model comparisons are made at three levels of heterogeneity plus three ways of linking deterministic and stochastic models: the differences in utilities, the differences in certainty equivalents and contextual utility. Overall, the "best model", which is conditional on the form of heterogeneity, is a form of Rank Dependent Utility or Prospect Theory that captures the majority of behaviour at both the representative agent and individual level. However, the curvature of the probability weighting function for many individuals is S-shaped, or ostensibly concave or convex, rather than the inverse S-shape commonly employed. Also, contextual utility is broadly supported across all levels of heterogeneity. Finally, the Priority Heuristic model, previously examined within a deterministic setting, is estimated within a stochastic framework, and allowing for endogenous thresholds does improve model performance, although it does not compete well with the other specifications considered.
Abstract:
1. Analyses of species association have major implications for selecting indicators for freshwater biomonitoring and conservation, because they allow for the elimination of redundant information and focus on taxa that can be easily handled and identified. These analyses are particularly relevant in the debate about using speciose groups (such as the Chironomidae) as indicators in the tropics, because they require difficult and time-consuming analysis, and their responses to environmental gradients, including anthropogenic stressors, are poorly known. 2. Our objective was to show whether chironomid assemblages in Neotropical streams include clear associations of taxa and, if so, how well these associations could be explained by a set of models containing information from different spatial scales. For this, we formulated a priori models that allowed for the influence of local, landscape and spatial factors on chironomid taxon associations (CTA). These models represented biological hypotheses capable of explaining associations between chironomid taxa. For instance, CTA could be best explained by local variables (e.g. pH, conductivity and water temperature) or by processes acting at wider landscape scales (e.g. percentage of forest cover). 3. Biological data were taken from 61 streams in Southeastern Brazil, 47 of which were in well-preserved regions, and 14 of which drained areas severely affected by anthropogenic activities. We adopted a model selection procedure using Akaike's information criterion to determine the most parsimonious models for explaining CTA. 4. Applying Kendall's coefficient of concordance, seven genera (Tanytarsus/Caladomyia, Ablabesmyia, Parametriocnemus, Pentaneura, Nanocladius, Polypedilum and Rheotanytarsus) were identified as associated taxa. The best-supported model explained 42.6% of the total variance in the abundance of associated taxa.
This model combined local and landscape environmental filters and spatial variables (which were derived from eigenfunction analysis). However, the model with local filters and spatial variables also had a good chance of being selected as the best model. 5. Standardised partial regression coefficients of local and landscape filters, including spatial variables, derived from model averaging allowed an estimation of which variables were best correlated with the abundance of associated taxa. In general, the abundance of the associated genera tended to be lower in streams characterised by a high percentage of forest cover (landscape scale), lower proportion of muddy substrata and high values of pH and conductivity (local scale). 6. Overall, our main result adds to the increasing number of studies that have indicated the importance of local and landscape variables, as well as the spatial relationships among sampling sites, for explaining aquatic insect community patterns in streams. Furthermore, our findings open new possibilities for the elimination of redundant data in the assessment of anthropogenic impacts on tropical streams.
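The AIC-based model selection and model averaging described in this abstract can be sketched as follows. This is a generic illustration of Akaike weights, not the authors' code: the function names are hypothetical, and the convention of entering a zero coefficient for a variable that is absent from a model (often called full averaging) is an assumption, since the abstract does not state which averaging convention was used.

```python
import numpy as np

def akaike_weights(aic):
    """Convert AIC values of candidate models into Akaike weights."""
    aic = np.asarray(aic, dtype=float)
    # Differences from the best (lowest-AIC) model.
    delta = aic - aic.min()
    # Relative likelihood of each model, normalised to sum to one.
    rel = np.exp(-0.5 * delta)
    return rel / rel.sum()

def averaged_coefficients(weights, coefs):
    """Weight-average standardised coefficients across models.

    coefs has shape (n_models, n_variables); a variable missing from a
    model is entered as 0 (assumed "full averaging" convention).
    """
    return np.asarray(weights, dtype=float) @ np.asarray(coefs, dtype=float)
```

For example, `akaike_weights([100.0, 102.0, 110.0])` assigns most of the weight to the first model while leaving the second a substantial share, which mirrors the situation in which a runner-up model "also had a good chance of being selected as the best model". The AIC values in that call are made up for illustration.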
Abstract:
The first essay, "Determinants of Credit Expansion in Brazil", analyzes the determinants of credit using an extensive bank-level panel dataset. The Brazilian economy experienced a major boost in leverage in the first decade of the 2000s as a result of a set of factors ranging from macroeconomic stability to the abundant liquidity in international financial markets before 2008 and a set of deliberate decisions taken by President Lula to expand credit, boost consumption and gain political support from the lower social strata. Our main conclusions are that credit expansion relied on the reduction of the monetary policy rate, that international financial markets are an important source of funds, and that payroll-guaranteed credit and investment grade status positively affected credit supply. We were not able to confirm the importance of financial inclusion efforts. The importance of financial sector soundness indicators of credit conditions should not be underestimated. These results raise questions over the sustainability of this expansion process and financial stability in the future. The second essay, “Public Credit, Monetary Policy and Financial Stability”, discusses the role of public credit. The supply of public credit in Brazil successfully served to relaunch the economy after the demise of Lehman Brothers. It was later transformed into a driver for economic growth as well as a regulation device to force private banks to reduce interest rates. We argue that the use of public funds to finance economic growth has three important drawbacks: it generates inflation, induces higher loan rates and may induce financial instability. An additional effect is the prevention of market credit solutions. This study contributes to the understanding of the costs and benefits of credit as a fiscal policy tool. The third essay, “Bayesian Forecasting of Interest Rates: Do Priors Matter?”, discusses the choice of priors when forecasting short-term interest rates.
Central banks that commit to an inflation-targeting monetary regime are bound to respond to spikes in inflation expectations and to a widening output gap in a clear and transparent way by abiding by a Taylor rule. There are various reports of central banks being more responsive to inflationary than to deflationary shocks, which makes the monetary policy response non-linear. Moreover, there is no guarantee that coefficients remain stable over time. Central banks may switch to a dual target regime to consider deviations from inflation and the output gap. The estimation of a Taylor rule may therefore have to consider a non-linear model with time-varying parameters. This paper uses Bayesian forecasting methods to predict short-term interest rates. We take two different approaches: from a theoretical perspective, we focus on an augmented version of the Taylor rule and include the Real Exchange Rate, the Credit-to-GDP and the Net Public Debt-to-GDP ratios. We also take an “atheoretic” approach based on the Expectations Theory of the Term Structure to model short-term interest rates. The selection of priors is particularly relevant for predictive accuracy; yet, ideally, forecasting models should require as little a priori expert insight as possible. We present recent developments in prior selection; in particular, we propose the use of hierarchical hyper-g priors for better forecasting in a framework that can be easily extended to other key macroeconomic indicators.
Abstract:
The dissertation contains five parts: an introduction, three major chapters, and a short conclusion. The first chapter starts from a survey and discussion of the literature on corporate law and financial development. The methods commonly used in these cross-sectional analyses are biased, as legal origins are no longer valid instruments. Hence, model uncertainty becomes a salient problem. The Bayesian Model Averaging algorithm is applied to test the robustness of the empirical results in Djankov et al. (2008). The analysis finds that their constructed legal index is not robustly correlated with most of the various stock market outcome variables. The second chapter looks into the effects of minority shareholder protection in the corporate governance regime on entrepreneurs' ex ante incentives to undertake an IPO. Most of the current literature focuses on the beneficial effect of minority shareholder protection on valuation, while overlooking its private costs for the entrepreneur's control. As a result, the entrepreneur trades off the costs of monitoring against the benefits of cheap sources of finance when minority shareholder protection improves. The theoretical predictions are empirically tested using panel data and the system GMM estimator. The third chapter investigates corporate law and corporate governance reform in China. Corporate law in China regards shareholder control as the means to the end of pursuing the interests of stakeholders, which is inefficient. The chapter combines recent developments in theories of the firm, i.e., the team production theory and the property rights theory, to solve this problem. Enlightened shareholder value, which emphasizes the long-term valuation of the firm, should be adopted as the objective of listed firms. In addition, a move from the mandatory division of power between the shareholder meeting and the board meeting to a default regime is proposed.
Abstract:
Aims: Species diversity and genetic diversity may be affected in parallel by similar environmental drivers. However, genetic diversity may also be affected independently by habitat characteristics. We aim at disentangling relationships between genetic diversity, species diversity and habitat characteristics of woody species in subtropical forest. Methods: We studied 11 dominant tree and shrub species in 27 plots in Gutianshan, China, and assessed their genetic diversity (Ar) and population differentiation (F’ST) with microsatellite markers. We tested if Ar and population specific F’ST were correlated to local species diversity and plot characteristics. Multi-model inference and model averaging were used to determine the relative importance of each predictor. Additionally we tested for isolation-by-distance and isolation-by-elevation by regressing pairwise F’ST against pairwise spatial and elevational distances. Important findings: Genetic diversity was not related to species diversity for any of the study species. Thus, our results do not support joint effects of habitat characteristics on these two levels of biodiversity. Instead, genetic diversity in two understory shrubs, Rhododendron simsii and Vaccinium carlesii, was affected by plot age with decreasing genetic diversity in successionally older plots. Population differentiation increased with plot age in Rhododendron simsii and Lithocarpus glaber. This shows that succession can reduce genetic diversity within, and increase genetic diversity between populations. Furthermore, we found four cases of isolation-by-distance and two cases of isolation-by-elevation. The former indicates inefficient pollen and seed dispersal by animals whereas the latter might be due to phenological asynchronies. These patterns indicate that succession can affect genetic diversity without parallel effects on species diversity and that gene flow in a continuous subtropical forest can be restricted even at a local scale.
Abstract:
Treating patients with combined agents is a growing trend in cancer clinical trials. Evaluating the synergism of multiple drugs is often the primary motivation for such drug-combination studies. Focusing on drug-combination studies in early-phase clinical trials, our research is composed of three parts: (1) We conduct a comprehensive comparison of four dose-finding designs in the two-dimensional toxicity probability space and propose using the Bayesian model averaging method to overcome the arbitrariness of the model specification and enhance the robustness of the design; (2) Motivated by a recent drug-combination trial at MD Anderson Cancer Center with a continuous-dose standard of care agent and a discrete-dose investigational agent, we propose a two-stage Bayesian adaptive dose-finding design based on an extended continual reassessment method; (3) By combining phase I and phase II clinical trials, we propose an extension of a single agent dose-finding design. We model the time-to-event toxicity and efficacy to direct dose finding in two-dimensional drug-combination studies. We conduct extensive simulation studies to examine the operating characteristics of the aforementioned designs and demonstrate the designs' good performance in various practical scenarios.
Abstract:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics/genomics research, demonstrating superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully show their advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving the Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing of historical data.
We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performance in most of the considered situations. The proposed models are applied in the real data analysis of gene and environment interactions in lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models have the advantage of being able to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of parameter estimation and variable selection in most cases.
Our proposed models impose the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for studies investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in that they succeed in balancing statistical efficiency and bias in a unified model.
Through extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
Abstract:
Understanding how the environment influences patterns of diversity is vital for effective conservation management, especially in a changing global climate. While assemblage structure and species richness patterns are often correlated with current environmental factors, historical influences may also be considerable, especially for taxa with poor dispersal abilities. Mountain-top regions throughout tropical rainforests can act as important refugia for taxa characterised by low dispersal capacities such as flightless ground beetles (Carabidae), an ecologically significant predatory group. We surveyed flightless ground beetles along elevational gradients in five different subregions within the Australian Wet Tropics World Heritage Area to investigate (1) whether the diversity and composition of flightless ground beetles are elevationally stratified, and, if so, (2) what environmental factors (other than elevation per se) are associated with these patterns. Generalised linear models and model averaging techniques were used to relate patterns of diversity to environmental factors. Unlike most taxonomic groups, flightless ground beetles increased in species richness and abundance with elevation. Additionally, each subregion consisted of distinct assemblages containing a high level of regional endemic species. Species richness was most strongly positively associated with the historical climatic conditions and negatively associated with severity of recent disturbance (treefalls) and current climatic conditions. Assemblage composition was associated with latitude and current and historical climatic conditions. Our results suggest that distributional patterns of flightless ground beetles are not only likely to be associated with factors that change with elevation (current climatic conditions), but also factors that are independent of elevation (recent disturbance and historical climatic conditions). 
Variation in historical vegetation stability explained both species richness and assemblage composition patterns, probably reflecting the significance of upland refugia at a geographic time scale. These findings are important for conservation management as upland habitats are under threat from climate change.