992 results for Additive models
Abstract:
The aim of this paper is to predict time series of SO2 concentrations emitted by coal-fired power stations in order to estimate emission episodes in advance and to analyze the influence of some meteorological variables on the prediction. An emission episode is said to occur when the series of bi-hourly means of SO2 exceeds a specific level. For coal-fired power stations it is essential to predict emission episodes sufficiently in advance so that appropriate preventive measures can be taken. We propose a methodology to predict SO2 emission episodes based on an additive model and an algorithm for variable selection. The methodology was applied to the estimation of SO2 emissions registered at sampling locations near a coal-fired power station located in northern Spain. The results indicate that the model performs well when only two terms of the time series are considered, and that the inclusion of the meteorological variables in the model is not significant.
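The episode definition in this abstract can be sketched in a few lines. This is a minimal illustration, assuming SO2 readings arrive at a fixed interval so that a bi-hourly mean is a trailing average over a fixed number of readings. The function names, the window length, and the threshold level are hypothetical and are not taken from the paper.

```python
def bihourly_means(readings, window):
    """Trailing moving average over `window` consecutive readings.

    `window` is the (assumed) number of readings that span two hours.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    running = 0.0
    for i, value in enumerate(readings):
        running += value
        if i >= window:
            # Drop the reading that has just left the window.
            running -= readings[i - window]
        if i >= window - 1:
            means.append(running / window)
    return means


def episode_flags(readings, window, level):
    """True wherever the trailing mean exceeds `level` (hypothetical threshold)."""
    return [mean > level for mean in bihourly_means(readings, window)]
```

For example, `episode_flags([10, 20, 30, 40], window=2, level=20)` flags the last two windows, whose means are 25 and 35.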
Abstract:
This project proposes to extend and generalize the estimation and inference procedures for multivariate generalized additive models for non-Gaussian random variables, which describe the behavior of biological and social phenomena and whose representations give rise to longitudinal series and clustered data. Its immediate intended application is the development of modeling methodology for understanding biological, environmental, and social processes in the areas of Health and the Social Sciences that are conditioned by the presence of specific phenomena, such as disease. The proposed plan thus seeks to strengthen the relationship between Applied Mathematics, from an approach under uncertainty, and the Biological and Social Sciences in general, generating new tools to analyze and explain many problems for which ever more experimental and/or observational information is available. Starting with discrete random variables (Yi, with variance function smaller than an even power of the expected value E(Y)), we propose to generate, sequentially, a unified class of generalized additive models (parametric and nonparametric) that contains as particular cases generalized linear models, generalized nonlinear models, generalized additive models, and generalized marginal mean models (the GEE1 approach of Liang and Zeger, 1986, and the GEE2 approach of Zhao and Prentice, 1990; Zeger and Qaqish, 1992; Yan and Fine, 2004), and to initiate a connection with generalized linear mixed models for latent variables (GLLAMM, Skrondal and Rabe-Hesketh, 2004), starting from correlated data structures.
This will make it possible to define conditional distributions of the responses, given the covariates and the latent variables (LVs), and to estimate structural equations for the LVs, including regressions of LVs on the covariates, regressions of LVs on other LVs, and specific models that account for already recognized hierarchies of variation. How to define models that consider spatial or temporal structures, in a way that allows for hierarchical factors, fixed or random, measured with error, as happens in the Social Sciences and in Epidemiology, is a statistical challenge. This sequential approach is planned for the construction of both estimation and inference methodology, starting with Poisson and Bernoulli random variables and the existing GLMs and extending to current hierarchical generalized models, connecting with GLLAMMs, starting from correlated data structures. This family of models will be generated for structures of variables/vectors, covariates, and hierarchical random components that describe phenomena in the Social Sciences and Epidemiology.
Abstract:
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling.
Abstract:
With the current concern over climate change, descriptions of how rainfall patterns are changing over time can be useful. Observations of daily rainfall data over the last few decades provide information on these trends. Generalized linear models are typically used to model patterns in the occurrence and intensity of rainfall. These models describe rainfall patterns for an average year but are more limited when describing long-term trends, particularly when these are potentially non-linear. Generalized additive models (GAMs) provide a framework for modelling non-linear relationships by fitting smooth functions to the data. This paper describes how GAMs can extend the flexibility of models to describe seasonal patterns and long-term trends in the occurrence and intensity of daily rainfall using data from Mauritius from 1962 to 2001. Smoothed estimates from the models provide useful graphical descriptions of changing rainfall patterns over the last 40 years at this location. GAMs are particularly helpful when exploring non-linear relationships in the data. Care is needed to ensure the choice of smooth functions is appropriate for the data and modelling objectives. (c) 2008 Elsevier B.V. All rights reserved.
Abstract:
Understanding spatial distributions and how environmental conditions influence catch-per-unit-effort (CPUE) is important for increased fishing efficiency and sustainable fisheries management. This study investigated the relationship between CPUE, spatial factors, temperature, and depth using generalized additive models. Combinations of factors, and not one single factor, were frequently included in the best model. Parameters which best described CPUE varied by geographic region. The amount of variance, or deviance, explained by the best models ranged from a low of 29% (halibut, Charlotte region) to a high of 94% (sablefish, Charlotte region). Depth, latitude, and longitude influenced most species in several regions. On the broad geographic scale, depth was associated with CPUE for every species, except dogfish. Latitude and longitude influenced most species, except halibut (Areas 4 A/D), sablefish, and cod. Temperature was important for describing distributions of halibut in Alaska, arrowtooth flounder in British Columbia, dogfish, Alaska skate, and Aleutian skate. The species-habitat relationships revealed in this study can be used to create improved fishing and management strategies.
Abstract:
To effectively assess and mitigate risk of permafrost disturbance, disturbance-prone areas can be predicted through the application of susceptibility models. In this study we developed regional susceptibility models for permafrost disturbances using a field disturbance inventory to test the transferability of the model to a broader region in the Canadian High Arctic. Resulting maps of susceptibility were then used to explore the effect of terrain variables on the occurrence of disturbances within this region. To account for a large range of landscape characteristics, the model was calibrated using two locations: Sabine Peninsula, Melville Island, NU, and Fosheim Peninsula, Ellesmere Island, NU. Spatial patterns of disturbance were predicted with a generalized linear model (GLM) and generalized additive model (GAM), each calibrated using disturbed and randomized undisturbed locations from both sites and GIS-derived terrain predictor variables including slope, potential incoming solar radiation, wetness index, topographic position index, elevation, and distance to water. Each model was validated for the Sabine and Fosheim Peninsulas using independent data sets, while the transferability of the model to an independent site was assessed at Cape Bounty, Melville Island, NU. The regional GLM and GAM validated well for both calibration sites (Sabine and Fosheim), with the area under the receiver operating characteristic curve (AUROC) > 0.79. Both models were applied directly to Cape Bounty without calibration and validated equally, with AUROCs of 0.76; however, each model predicted disturbed and undisturbed samples differently. Additionally, the sensitivity of the transferred model was assessed using data sets with different sample sizes. Results indicated that models based on larger sample sizes transferred more consistently and captured the variability within the terrain attributes in the respective study areas.
Terrain attributes associated with the initiation of disturbances were similar regardless of the location. Disturbances commonly occurred on slopes between 4 and 15°, below the Holocene marine limit, and in areas with low potential incoming solar radiation.
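The AUROC used for validation in this abstract can be illustrated with a short sketch. It uses the rank-based interpretation of the statistic: the probability that a randomly chosen disturbed location receives a higher predicted susceptibility than a randomly chosen undisturbed one, with ties counted as one half. The function name and the example scores are hypothetical; the study's own software and data are not described in the abstract.

```python
def auroc(scores_disturbed, scores_undisturbed):
    """Area under the ROC curve via pairwise comparison of predicted scores.

    Counts the fraction of (disturbed, undisturbed) pairs in which the
    disturbed site has the higher score; ties contribute one half.
    """
    if not scores_disturbed or not scores_undisturbed:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_disturbed:
        for n in scores_undisturbed:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_disturbed) * len(scores_undisturbed))
```

A value of 0.5 corresponds to a model that ranks sites no better than chance, and 1.0 to a model that ranks every disturbed site above every undisturbed one. The pairwise loop is quadratic in the number of sites, which is acceptable for a sketch but not for large rasters.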
Abstract:
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building 'under fit' models, having insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building 'over fit' models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing under fitting with over fitting and consequently how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinions that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.
Abstract:
Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that the potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression, especially those allowing for serial correlation, which makes them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range of Generalised Linear Models as well as Generalised Additive Models. In addition, critical points in using statistical software for GAMs were recently stressed, and reanalyses of time series data on air pollution and health were performed in order to update already published results. Applications are offered through an example on the relationship between asthma emergency admissions and photochemical air pollutants.
Abstract:
Aim To assess the geographical transferability of niche-based species distribution models fitted with two modelling techniques. Location Two distinct geographical study areas in Switzerland and Austria, in the subalpine and alpine belts. Methods Generalized linear and generalized additive models (GLM and GAM) with a binomial probability distribution and a logit link were fitted for 54 plant species, based on topoclimatic predictor variables. These models were then evaluated quantitatively and used for spatially explicit predictions within (internal evaluation and prediction) and between (external evaluation and prediction) the two regions. Comparisons of evaluations and spatial predictions between regions and models were conducted in order to test if species and methods meet the criteria of full transferability. By full transferability, we mean that: (1) the internal evaluation of models fitted in region A and B must be similar; (2) a model fitted in region A must at least retain a comparable external evaluation when projected into region B, and vice-versa; and (3) internal and external spatial predictions have to match within both regions. Results The measures of model fit are, on average, 24% higher for GAMs than for GLMs in both regions. However, the differences between internal and external evaluations (AUC coefficient) are also higher for GAMs than for GLMs (a difference of 30% for models fitted in Switzerland and 54% for models fitted in Austria). Transferability, as measured with the AUC evaluation, fails for 68% of the species in Switzerland and 55% in Austria for GLMs (respectively for 67% and 53% of the species for GAMs). For both GAMs and GLMs, the agreement between internal and external predictions is rather weak on average (Kulczynski's coefficient in the range 0.3-0.4), but varies widely among individual species. 
The dominant pattern is an asymmetrical transferability between the two study regions (a mean decrease of 20% for the AUC coefficient when the models are transferred from Switzerland and 13% when they are transferred from Austria). Main conclusions The large inter-specific variability observed among the 54 study species underlines the need to consider more than a few species to test properly the transferability of species distribution models. The pronounced asymmetry in transferability between the two study regions may be due to peculiarities of these regions, such as differences in the ranges of environmental predictors or the varied impact of land-use history, or to species-specific reasons like differential phenotypic plasticity, existence of ecotypes or varied dependence on biotic interactions that are not properly incorporated into niche-based models. The lower variation between internal and external evaluation of GLMs compared to GAMs further suggests that overfitting may reduce transferability. Overall, a limited geographical transferability calls for caution when projecting niche-based models for assessing the fate of species in future environments.
Abstract:
We had previously shown that regularization principles lead to approximation schemes, such as Radial Basis Functions, which are equivalent to networks with one layer of hidden units, called Regularization Networks. In this paper we show that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models, Breiman's hinge functions and some forms of Projection Pursuit Regression. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In the final part of the paper, we also show a relation between activation functions of the Gaussian and sigmoidal type.
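The link between regularization and a one-hidden-layer network can be made concrete with a small sketch. It assumes the standard textbook form of a regularization network with a Gaussian basis function: one hidden unit is centered on each data point, and the output weights c solve (G + lam*I) c = y, where G is the matrix of basis function values between data points. The function names, the one-dimensional inputs, and the `width` and `lam` parameters are illustrative choices, not details taken from the paper.

```python
import math


def gaussian_kernel(a, b, width):
    """Gaussian radial basis function of the distance between a and b."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_rbf_network(xs, ys, width, lam):
    """Output weights of a network with one hidden unit per data point.

    Solves (G + lam * I) c = y, where G[i][j] = gaussian_kernel(xs[i], xs[j]).
    """
    n = len(xs)
    gram = [[gaussian_kernel(xs[i], xs[j], width) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve_linear(gram, ys)


def predict_rbf_network(x, xs, coeffs, width):
    """Network output: weighted sum of the hidden units' responses to x."""
    return sum(c * gaussian_kernel(x, xi, width) for c, xi in zip(coeffs, xs))
```

With `lam = 0` the network interpolates the training data exactly, provided the points are distinct. Larger values of `lam` trade fidelity to the data for smoothness, which is the role the regularization parameter plays in this formulation.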
Abstract:
We assessed the vulnerability of blanket peat to climate change in Great Britain using an ensemble of 8 bioclimatic envelope models. We used 4 published models that ranged from simple threshold models, based on total annual precipitation, to Generalised Linear Models (GLMs, based on mean annual temperature). In addition, 4 new models were developed which included measures of water deficit as threshold, classification tree, GLM and generalised additive models (GAM). Models that included measures of both hydrological conditions and maximum temperature provided a better fit to the mapped peat area than models based on hydrological variables alone. Under UKCIP02 projections for high (A1F1) and low (B1) greenhouse gas emission scenarios, 7 out of the 8 models showed a decline in the bioclimatic space associated with blanket peat. Eastern regions (Northumbria, North York Moors, Orkney) were shown to be more vulnerable than higher-altitude, western areas (Highlands, Western Isles and Argyle, Bute and The Trossachs). These results suggest a long-term decline in the distribution of actively growing blanket peat, especially under the high emissions scenario, although it is emphasised that existing peatlands may well persist for decades under a changing climate. Observational data from long-term monitoring and manipulation experiments in combination with process-based models are required to explore the nature and magnitude of climate change impacts on these vulnerable areas more fully.
Abstract:
In many data sets from clinical studies there are patients insusceptible to the occurrence of the event of interest. Survival models which ignore this fact are generally inadequate. The main goal of this paper is to describe an application of the generalized additive models for location, scale, and shape (GAMLSS) framework to the fitting of long-term survival models. In this work the number of competing causes of the event of interest follows the negative binomial distribution. In this way, some well known models found in the literature are characterized as particular cases of our proposal. The model is conveniently parameterized in terms of the cured fraction, which is then linked to covariates. We explore the use of the gamlss package in R as a powerful tool for inference in long-term survival models. The procedure is illustrated with a numerical example. (C) 2009 Elsevier Ireland Ltd. All rights reserved.
Abstract:
Within the regression framework, we show how different levels of nonlinearity influence the instantaneous firing rate prediction of single neurons. Nonlinearity can be achieved in several ways. In particular, we can enrich the predictor set with basis expansions of the input variables (enlarging the number of inputs) or train a simple but different model for each area of the data domain. Spline-based models are popular within the first category. Kernel smoothing methods fall into the second category. Whereas the first choice is useful for globally characterizing complex functions, the second is very handy for temporal data and is able to include inner-state subject variations. Also, interactions among stimuli are considered. We compare state-of-the-art firing rate prediction methods with some more sophisticated spline-based nonlinear methods: multivariate adaptive regression splines and sparse additive models. We also study the impact of kernel smoothing. Finally, we explore the combination of various local models in an incremental learning procedure. Our goal is to demonstrate that appropriate nonlinearity treatment can greatly improve the results. We test our hypothesis on both synthetic data and real neuronal recordings in cat primary visual cortex, giving a plausible explanation of the results from a biological perspective.
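The second route to nonlinearity mentioned in this abstract, fitting a simple local model in each area of the data domain, can be illustrated with a kernel smoother. The sketch below uses the Nadaraya-Watson estimator with a Gaussian kernel, which amounts to a locally constant model. The choice of estimator, the function name, and the `bandwidth` parameter are assumptions made for illustration; the abstract does not state which kernel smoother the authors used.

```python
import math


def kernel_smooth(t_query, times, values, bandwidth):
    """Nadaraya-Watson estimate at t_query.

    Returns a weighted average of `values`, where observations whose
    `times` are close to t_query receive larger Gaussian weights.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((t_query - t) / bandwidth) ** 2) for t in times]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: the query is too far from every observation.
        raise ValueError("no data within reach of the kernel")
    return sum(w * v for w, v in zip(weights, values)) / total
```

A small bandwidth follows rapid changes in the signal but is noisy, while a large bandwidth averages over a wider stretch of time, so the bandwidth plays the role of the nonlinearity level being tuned.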