31 results for Generalized Additive Models
at Université de Lausanne, Switzerland
Abstract:
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling.
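For orientation, the two model families named in this abstract can be written side by side. This is the standard textbook form, not notation taken from the workshop papers; the link function g, coefficients β_j and smooth functions f_j are the usual symbols and are assumptions as far as this listing is concerned.

```latex
\begin{align}
\text{GLM:}\quad g\bigl(\mathbb{E}[Y]\bigr) &= \beta_0 + \sum_{j=1}^{p} \beta_j x_j \\
\text{GAM:}\quad g\bigl(\mathbb{E}[Y]\bigr) &= \beta_0 + \sum_{j=1}^{p} f_j(x_j)
\end{align}
```

The only difference is that a GAM replaces each linear term with a smooth function estimated from the data, which is where the extra flexibility discussed in the abstracts below comes from.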
Abstract:
Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location State of Vaud, western Switzerland. Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that perform better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on main environmental gradients changed the set of selected predictors, and potentially their response curve shape. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once. 
Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.
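The absence-weighting step in point (2) of this abstract can be illustrated with a small sketch. It is not the grasp implementation; the function name `prevalence_weights` and the choice to keep presence weights at 1.0 are assumptions made for illustration only.

```python
def prevalence_weights(y, target=0.5):
    """Return one weight per observation so the weighted prevalence equals `target`.

    y is a sequence of 0/1 values (absence/presence). Presences keep weight 1.0;
    every absence receives the same down- or up-weight.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target prevalence must lie strictly between 0 and 1")
    n_pres = sum(1 for v in y if v == 1)
    n_abs = sum(1 for v in y if v == 0)
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need at least one presence and one absence")
    # Solve n_pres / (n_pres + w * n_abs) = target for the absence weight w.
    w_abs = n_pres * (1.0 - target) / (target * n_abs)
    return [1.0 if v == 1 else w_abs for v in y]


# Hypothetical sample with a prevalence of 0.2 (2 presences, 8 absences).
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
w = prevalence_weights(y)
weighted_prevalence = sum(wi for wi, yi in zip(w, y) if yi == 1) / sum(w)
print(round(weighted_prevalence, 3))
```

Weights of this kind would typically be passed to the fitting routine as case weights; how a given package consumes them is outside the scope of this sketch.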
Abstract:
Aim To assess the geographical transferability of niche-based species distribution models fitted with two modelling techniques. Location Two distinct geographical study areas in Switzerland and Austria, in the subalpine and alpine belts. Methods Generalized linear and generalized additive models (GLM and GAM) with a binomial probability distribution and a logit link were fitted for 54 plant species, based on topoclimatic predictor variables. These models were then evaluated quantitatively and used for spatially explicit predictions within (internal evaluation and prediction) and between (external evaluation and prediction) the two regions. Comparisons of evaluations and spatial predictions between regions and models were conducted in order to test if species and methods meet the criteria of full transferability. By full transferability, we mean that: (1) the internal evaluation of models fitted in region A and B must be similar; (2) a model fitted in region A must at least retain a comparable external evaluation when projected into region B, and vice-versa; and (3) internal and external spatial predictions have to match within both regions. Results The measures of model fit are, on average, 24% higher for GAMs than for GLMs in both regions. However, the differences between internal and external evaluations (AUC coefficient) are also higher for GAMs than for GLMs (a difference of 30% for models fitted in Switzerland and 54% for models fitted in Austria). Transferability, as measured with the AUC evaluation, fails for 68% of the species in Switzerland and 55% in Austria for GLMs (respectively for 67% and 53% of the species for GAMs). For both GAMs and GLMs, the agreement between internal and external predictions is rather weak on average (Kulczynski's coefficient in the range 0.3-0.4), but varies widely among individual species. 
The dominant pattern is an asymmetrical transferability between the two study regions (a mean decrease of 20% for the AUC coefficient when the models are transferred from Switzerland and 13% when they are transferred from Austria). Main conclusions The large inter-specific variability observed among the 54 study species underlines the need to consider more than a few species to test properly the transferability of species distribution models. The pronounced asymmetry in transferability between the two study regions may be due to peculiarities of these regions, such as differences in the ranges of environmental predictors or the varied impact of land-use history, or to species-specific reasons like differential phenotypic plasticity, existence of ecotypes or varied dependence on biotic interactions that are not properly incorporated into niche-based models. The lower variation between internal and external evaluation of GLMs compared to GAMs further suggests that overfitting may reduce transferability. Overall, a limited geographical transferability calls for caution when projecting niche-based models for assessing the fate of species in future environments.
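The comparison of internal and external AUC used in this abstract can be sketched as follows. The AUC is computed with the rank-based (Mann-Whitney) formulation; the helper `relative_drop` and the toy data are hypothetical, and the abstract does not state whether its reported decreases are relative or absolute, so the relative form here is an assumption.

```python
def auc(y_true, scores):
    """Area under the ROC curve: the share of presence/absence pairs that the
    model ranks correctly, counting ties as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_drop(internal, external):
    """Relative decrease in AUC when a model is projected to another region."""
    return (internal - external) / internal


# Hypothetical observations and predicted probabilities for one species.
observed = [1, 1, 0, 0]
internal_auc = auc(observed, [0.9, 0.8, 0.3, 0.1])  # evaluated in the fitting region
external_auc = auc(observed, [0.9, 0.2, 0.3, 0.1])  # evaluated in the other region
print(internal_auc, external_auc, relative_drop(internal_auc, external_auc))
```

The pairwise loop is quadratic in the number of sites, which is fine for illustration but would be replaced by a sort-based computation for large data sets.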
Abstract:
BACKGROUND: Spirometry reference values are important for the interpretation of spirometry results. Reference values should be updated regularly, be derived from a population as similar as possible to the population for which they are to be used, and span all ages. Such spirometry reference equations are currently lacking for central European populations. OBJECTIVE: To develop spirometry reference equations for central European populations between 8 and 90 years of age. MATERIALS: We used data collected between January 1993 and December 2010 from a central European population. The data were modelled using "Generalized Additive Models for Location, Scale and Shape" (GAMLSS). RESULTS: The spirometry reference equations were derived from 118'891 individuals consisting of 60'624 (51%) females and 58'267 (49%) males. Altogether, there were 18'211 (15.3%) children under the age of 18 years. CONCLUSION: We developed spirometry reference equations for a central European population between 8 and 90 years of age that can be implemented in a wide range of clinical settings.
Abstract:
Aim We examined whether species occurrences are primarily limited by physiological tolerance in the abiotically more stressful end of climatic gradients (the asymmetric abiotic stress limitation (AASL) hypothesis) and the geographical predictions of this hypothesis: abiotic stress mainly determines upper-latitudinal and upper-altitudinal species range limits, and the importance of abiotic stress for these range limits increases the further northwards and upwards a species occurs. Location Europe and the Swiss Alps. Methods The AASL hypothesis predicts that species have skewed responses to climatic gradients, with a steep decline towards the more stressful conditions. Based on presence-absence data we examined the shape of plant species responses (measured as probability of occurrence) along three climatic gradients across latitudes in Europe (1577 species) and altitudes in the Swiss Alps (284 species) using Huisman-Olff-Fresco, generalized linear and generalized additive models. Results We found that almost half of the species from Europe and one-third from the Swiss Alps showed responses consistent with the predictions of the AASL hypothesis. Cold temperatures and a short growing season seemed to determine the upper-latitudinal and upper-altitudinal range limits of up to one-third of the species, while drought provided an important constraint at lower-latitudinal range limits for up to one-fifth of the species. We found a biome-dependent influence of abiotic stress and no clear support for abiotic stress as a stronger upper range-limit determinant for species with higher latitudinal and altitudinal distributions. However, the overall influence of climate as a range-limit determinant increased with latitude. 
Main conclusions Our results support the AASL hypothesis for almost half of the studied species, and suggest that temperature-related stress controls the upper-latitudinal and upper-altitudinal range limits of a large proportion of these species, while other factors including drought stress may be important at the lower range limits.
Abstract:
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building 'under fit' models, having insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building 'over fit' models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing under fitting with over fitting and consequently how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinions that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.
Abstract:
1. Model-based approaches have been used increasingly in conservation biology over recent years. Species presence data used for predictive species distribution modelling are abundant in natural history collections, whereas reliable absence data are sparse, most notably for vagrant species such as butterflies and snakes. As predictive methods such as generalized linear models (GLM) require absence data, various strategies have been proposed to select pseudo-absence data. However, only a few studies exist that compare different approaches to generating these pseudo-absence data. 2. Natural history collection data are usually available for long periods of time (decades or even centuries), thus allowing historical considerations. However, this historical dimension has rarely been assessed in studies of species distribution, although there is great potential for understanding current patterns, i.e. the past is the key to the present. 3. We used GLM to model the distributions of three 'target' butterfly species, Melitaea didyma, Coenonympha tullia and Maculinea teleius, in Switzerland. We developed and compared four strategies for defining pools of pseudo-absence data and applied them to natural history collection data from the last 10, 30 and 100 years. Pools included: (i) sites without target species records; (ii) sites where butterfly species other than the target species were present; (iii) sites without butterfly species but with habitat characteristics similar to those required by the target species; and (iv) a combination of the second and third strategies. Models were evaluated and compared by the total deviance explained, the maximized Kappa and the area under the curve (AUC). 4. Among the four strategies, model performance was best for strategy 3. Contrary to expectations, strategy 2 resulted in even lower model performance compared with models with pseudo-absence data simulated totally at random (strategy 1). 5. 
Independent of the strategy, model performance was enhanced when sites with historical species presence data were not considered as pseudo-absence data. Therefore, the combination of strategy 3 with species records from the last 100 years achieved the highest model performance. 6. Synthesis and applications. The protection of suitable habitat for species survival or reintroduction in rapidly changing landscapes is a high priority among conservationists. Model-based approaches offer planning authorities the possibility of delimiting priority areas for species detection or habitat protection. The performance of these models can be enhanced by fitting them with pseudo-absence data relying on large archives of natural history collection species presence data rather than using randomly sampled pseudo-absence data.
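One of the evaluation measures named in this abstract, the maximized Kappa, can be sketched as a search over classification thresholds. This is a minimal illustration, not the authors' procedure: the function names, the threshold grid of 101 values and the toy data are all assumptions.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa for two binary (0/1) sequences of equal length."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present in both sequences
    return (observed - expected) / (1.0 - expected)


def max_kappa(y_true, probs, steps=100):
    """Scan thresholds 0, 1/steps, ..., 1 and return (best Kappa, threshold)."""
    best_kappa, best_threshold = -1.0, None
    for i in range(steps + 1):
        threshold = i / steps
        predicted = [1 if p >= threshold else 0 for p in probs]
        kappa = cohen_kappa(y_true, predicted)
        if kappa > best_kappa:
            best_kappa, best_threshold = kappa, threshold
    return best_kappa, best_threshold


# Hypothetical presences/pseudo-absences and model predictions.
kappa, threshold = max_kappa([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1])
print(kappa, threshold)
```

Because the threshold is tuned on the same data used for evaluation, a maximized Kappa is an optimistic figure, which is one reason the abstract reports it alongside deviance explained and AUC.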
Abstract:
Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection when using the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.
Abstract:
1. Digital elevation models (DEMs) are often used in landscape ecology to retrieve elevation or first derivative terrain attributes such as slope or aspect in the context of species distribution modelling. However, DEM-derived variables are scale-dependent and, given the increasing availability of very high-resolution (VHR) DEMs, their ecological relevance must be assessed for different spatial resolutions. 2. In a study area located in the Swiss Western Alps, we computed VHR DEM-derived variables related to morphometry, hydrology and solar radiation. Based on an original spatial resolution of 0.5 m, we generated DEM-derived variables at 1, 2 and 4 m spatial resolutions, applying a Gaussian Pyramid. Their associations with local climatic factors, measured by sensors (direct and ambient air temperature, air humidity and soil moisture) as well as ecological indicators derived from species composition, were assessed with multivariate generalized linear models (GLM) and mixed models (GLMM). 3. Specific VHR DEM-derived variables showed significant associations with climatic factors. In addition to slope, aspect and curvature, the underused wetness and ruggedness indices modelled measured ambient humidity and soil moisture, respectively. Remarkably, spatial resolution of VHR DEM-derived variables had a significant influence on models' strength, with coefficients of determination decreasing with coarser resolutions or showing a local optimum with a 2 m resolution, depending on the variable considered. 4. These results support the relevance of using multi-scale DEM variables to provide surrogates for important climatic variables such as humidity, moisture and temperature, offering suitable alternatives to direct measurements for evolutionary ecology studies at a local scale.
Abstract:
1. Landscape modification is often considered the principal cause of population decline in many bat species. Thus, schemes for bat conservation rely heavily on knowledge about species-landscape relationships. So far, however, few studies have quantified the possible influence of landscape structure on large-scale spatial patterns in bat communities. 2. This study presents quantitative models that use landscape structure to predict (i) spatial patterns in overall community composition and (ii) individual species' distributions through canonical correspondence analysis and generalized linear models, respectively. A geographical information system (GIS) was then used to draw up maps of (i) overall community patterns and (ii) distribution of potential species' habitats. These models relied on field data from the Swiss Jura mountains. 3. Eight descriptors of landscape structure accounted for 30% of the variation in bat community composition. For some species, more than 60% of the variance in distribution could be explained by landscape structure. Elevation, forest or woodland cover, lakes and suburbs were the most frequent predictors. 4. This study shows that community composition in bats is related to landscape structure through species-specific relationships to resources. Due to their nocturnal activities and the difficulties of remote identification, a comprehensive bat census is rarely possible, and we suggest that predictive modelling of the type described here provides an indispensable conservation tool.
Abstract:
Western European landscapes have drastically changed since the 1950s, with agricultural intensification and the spread of urban settlements considered the most important drivers of this land-use/land-cover change. Losses of habitat for fauna and flora have been a direct consequence of this development. In the present study, we relate butterfly occurrence to land-use/land-cover changes over five decades between 1951 and 2000. The study area covers the entire Swiss territory. The 10 explanatory variables originate from agricultural statistics and censuses. Both state and rate were used as explanatory variables. Species distribution data were obtained from natural history collections. We selected eight butterfly species: four species occur on wetlands and four occur on dry grasslands. We used cluster analysis to track land-use/land-cover changes and to group communes based on similar trajectories of change. Generalized linear models were applied to identify factors that were significantly correlated with the persistence or disappearance of butterfly species. Results showed that decreasing agricultural areas and densities of farms with more than 10 ha of cultivated land are significantly related to wetland species decline, and increasing densities of livestock seem to have favored the disappearance of dry grassland species. Moreover, we show that species declines depend not only on land-use/land-cover states but also on the rates of change; that is, the higher the transformation rate from small to large farms, the higher the loss of dry grassland species. We suggest that more attention should be paid to the rates of landscape change as feasible drivers of species change and derive some management suggestions.
Abstract:
Aim To explore the respective power of climate and topography to predict the distribution of reptiles in Switzerland, hence at a mesoscale level. A more detailed knowledge of these relationships, in combination with maps of the potential distribution derived from the models, is a valuable contribution to the design of conservation strategies. Location All of Switzerland. Methods Generalized linear models are used to derive predictive habitat distribution models from eco-geographical predictors in a geographical information system, using species data from a field survey conducted between 1980 and 1999. Results The maximum amount of deviance explained by climatic models is 65%, and 50% by topographical models. Low values were obtained with both sets of predictors for three species that are widely distributed in all parts of the country (Anguis fragilis, Coronella austriaca, and Natrix natrix), a result that suggests that including other important predictors, such as resources, should improve the models in further studies. With respect to topographical predictors, low values were also obtained for two species where we anticipated a strong response to aspect and slope, Podarcis muralis and Vipera aspis. Main conclusions Overall, both models and maps derived from climatic predictors more closely match the actual reptile distributions than those based on topography. These results suggest that the distributional limits of reptile species with a restricted range in Switzerland are largely set by climatic, predominantly temperature-related, factors.
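The "deviance explained" figures quoted in this abstract are commonly computed as the proportional reduction from the null deviance to the residual deviance. The sketch below assumes a binomial presence-absence response and already-fitted probabilities; the function names and the toy data are hypothetical, and the abstract does not say exactly how the authors computed their percentages.

```python
import math


def binomial_deviance(y, p):
    """Binomial deviance -2 * log-likelihood for 0/1 observations y and
    predicted probabilities p (clipped away from 0 and 1 for stability)."""
    eps = 1e-12
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * total


def deviance_explained(y, p):
    """Share of the null deviance removed by the model: 1 - D_residual / D_null."""
    mean = sum(y) / len(y)
    if mean in (0.0, 1.0):
        raise ValueError("need both presences and absences")
    # The null model predicts the overall prevalence at every site.
    null = binomial_deviance(y, [mean] * len(y))
    return 1.0 - binomial_deviance(y, p) / null


# Hypothetical observations and fitted probabilities.
y = [1, 1, 0, 0]
print(round(deviance_explained(y, [0.8, 0.7, 0.3, 0.2]), 3))
```

A value of 0 means the model does no better than predicting the overall prevalence everywhere, and values approach 1 as fitted probabilities approach the observations.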
Abstract:
PURPOSE: Not in Education, Employment, or Training (NEET) youth are youth disengaged from major social institutions and constitute a worrying concern. However, little is known about this subgroup of vulnerable youth. This study aimed to examine if NEET youth differ from other contemporaries in terms of personality, mental health, and substance use and to provide longitudinal examination of NEET status, testing its stability and prospective pathways with mental health and substance use. METHODS: As part of the Cohort Study on Substance Use Risk Factors, 4,758 young Swiss men in their early 20s answered questions concerning their current professional and educational status, personality, substance use, and symptomatology related to mental health. Descriptive statistics, generalized linear models for cross-sectional comparisons, and cross-lagged panel models for longitudinal associations were computed. RESULTS: NEET youth were 6.1% at baseline and 7.4% at follow-up with 1.4% being NEET at both time points. Comparisons between NEET and non-NEET youth showed significant differences in substance use and depressive symptoms only. Longitudinal associations showed that previous mental health, cannabis use, and daily smoking increased the likelihood of being NEET. Reverse causal paths were nonsignificant. CONCLUSIONS: NEET status seemed to be unlikely and transient among young Swiss men, associated with differences in mental health and substance use but not in personality. Causal paths presented NEET status as a consequence of mental health and substance use rather than a cause. Additionally, this study confirmed that cannabis use and daily smoking are public health problems. Prevention programs need to focus on these vulnerable youth to avoid them being disengaged.
Abstract:
Question Does a land-use variable improve spatial predictions of plant species presence-absence and abundance models at the regional scale in a mountain landscape? Location Western Swiss Alps. Methods Presence-absence generalized linear models (GLM) and abundance ordinal logistic regression models (LRM) were fitted to data on 78 mountain plant species, with topo-climatic and/or land-use variables available at a 25-m resolution. The additional contribution of land use when added to topo-climatic models was evaluated by: (1) assessing the changes in model fit and (2) predictive power, (3) partitioning the deviance respectively explained by the topo-climatic variables and the land-use variable through variation partitioning, and (4) comparing spatial projections. Results Land use significantly improved the fit of presence-absence models but not their predictive power. In contrast, land use significantly improved both the fit and predictive power of abundance models. Variation partitioning also showed that the individual contribution of land use to the deviance explained by presence-absence models was, on average, weak for both GLM and LRM (3.7% and 4.5%, respectively), but changes in spatial projections could nevertheless be important for some species. Conclusions In this mountain area and at our regional scale, land use is important for predicting abundance, but not presence-absence. The importance of adding land-use information depends on the species considered. Even without a marked effect on model fit and predictive performance, adding land use can affect spatial projections of both presence-absence and abundance models.
Abstract:
We present models predicting the potential distribution of a threatened ant species, Formica exsecta Nyl., in the Swiss National Park (SNP). Data to fit the models have been collected according to a random-stratified design with an equal number of replicates per stratum. The basic aim of such a sampling strategy is to allow the formal testing of biological hypotheses about those factors most likely to account for the distribution of the modeled species. The stratifying factors used in this study were: vegetation, slope angle and slope aspect, the latter two being used as surrogates of solar radiation, considered one of the basic requirements of F. exsecta. Results show that, although the basic stratifying predictors account for more than 50% of the deviance, the incorporation of additional non-spatially explicit predictors into the model, as measured in the field, allows for an increased model performance (up to nearly 75%). However, this was not corroborated by permutation tests. Implementation on a national scale was made for one model only, due to the difficulty of obtaining similar predictors on this scale. The resulting map on the national scale suggests that the species might once have had a broader distribution in Switzerland. Reasons for its particular abundance within the SNP might possibly be related to habitat fragmentation and vegetation transformation outside the SNP boundaries.