908 results for generalized additive models
Abstract:
Complex diseases, such as cancer, are caused by various genetic and environmental factors and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically challenging. Bayesian generalized linear models using Student-t prior distributions on the coefficients are a novel method for simultaneously analyzing genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, which itself improves on standard variable selection methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for the sample size required to achieve high power of variable selection with Bayesian generalized linear models for different numbers of candidate covariates.
Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible gene-nutrient interactions in 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by smoking status (never, former, and current). SNPs in MTRR were significantly associated with lung cancer risk in never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions (SHMT1 with vitamin B12, MTRR with total fat intake, and MTR with alcohol use) were also associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.
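To make the Student-t prior idea concrete, here is a minimal sketch assuming a logistic model with independent Cauchy-type priors (nu = 1, scale 2.5; both values are assumptions, not taken from the study). It only finds the posterior mode by plain gradient ascent and does not perform the variable selection described above. The toy data and the helper names `log_post_grad` and `fit_map` are hypothetical; a gene-nutrient interaction enters simply as a product column g*z.

```python
import math

def log_post_grad(beta, X, y, nu=1.0, scale=2.5):
    """Gradient of the log posterior for logistic regression with
    independent Student-t(nu, 0, scale) priors on every coefficient."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-eta))
        for j, xij in enumerate(xi):
            grad[j] += (yi - p) * xij                          # Bernoulli log-likelihood term
    for j, b in enumerate(beta):
        grad[j] -= (nu + 1.0) * b / (nu * scale ** 2 + b ** 2)  # t log-prior term
    return grad

def fit_map(X, y, nu=1.0, scale=2.5, lr=0.02, iters=5000):
    """Posterior mode by fixed-step gradient ascent (adequate for a toy problem)."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        g = log_post_grad(beta, X, y, nu, scale)
        beta = [b + lr * gj for b, gj in zip(beta, g)]
    return beta

# Hypothetical toy data: genotype g (0/1/2 copies), standardized nutrient z,
# and their interaction g*z as a separate column; columns are [1, g, z, g*z].
X, y = [], []
for g in (0, 1, 2):
    for z in (-1, 0, 1):
        outcomes = [0, 1] + ([0] if g == 0 else [1] if g == 2 else [])
        for yi in outcomes:
            X.append([1.0, g, z, g * z])
            y.append(yi)

beta = fit_map(X, y)
print([round(b, 3) for b in beta])
```

In these toy data the case proportions are 1/3, 1/2 and 2/3 for g = 0, 1, 2, so the unpenalized maximum-likelihood slope for g is ln 2; the t prior pulls the posterior mode somewhat toward zero, and the z and interaction coefficients stay at zero by symmetry.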
Abstract:
Standard factorial designs may sometimes be inadequate for experiments that aim to estimate a generalized linear model, for example, for describing a binary response in terms of several variables. A method is proposed for finding exact designs for such experiments that uses a criterion allowing for uncertainty in the link function, the linear predictor, or the model parameters, together with a design search. Designs are assessed and compared by simulation of the distribution of efficiencies relative to locally optimal designs over a space of possible models. Exact designs are investigated for two applications, and their advantages over factorial and central composite designs are demonstrated.
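The idea of assessing a design by the distribution of its efficiencies over uncertain parameters can be sketched for the simplest case, a one-variable logistic model. This is a minimal illustration, not the authors' criterion or search: the design region [-1, 1], the uniform parameter ranges, and the use of the best equal-weight two-point grid design as a stand-in for the locally optimal design are all assumptions. (In the standard unconstrained two-parameter logistic case the locally D-optimal design is an equal-weight two-point design; we assume the same form is adequate on [-1, 1].) Function names are hypothetical.

```python
import math
import random

def info_det(design, b0, b1):
    """Determinant of the per-run Fisher information for logit P(y=1) = b0 + b1*x."""
    m00 = m01 = m11 = 0.0
    for x in design:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)              # GLM weight under the logit link
        m00 += w
        m01 += w * x
        m11 += w * x * x
    n = len(design)
    return (m00 * m11 - m01 ** 2) / n ** 2

# Candidate support points on the assumed design region [-1, 1].
GRID = [i / 20.0 - 1.0 for i in range(41)]

def best_two_point(b0, b1):
    """Largest per-run determinant over equal-weight two-point grid designs
    (a proxy for the locally D-optimal design at (b0, b1))."""
    best = 0.0
    for i, a in enumerate(GRID):
        for c in GRID[i + 1:]:
            best = max(best, info_det([a, c], b0, b1))
    return best

def d_efficiency(design, b0, b1):
    # Two parameters, so D-efficiency is the square root of the determinant ratio.
    return math.sqrt(info_det(design, b0, b1) / best_two_point(b0, b1))

def efficiency_distribution(design, draws=100, seed=1):
    """Efficiencies over parameter values drawn from assumed uniform ranges."""
    rng = random.Random(seed)
    effs = []
    for _ in range(draws):
        b0 = rng.uniform(-1.0, 1.0)
        b1 = rng.uniform(1.0, 4.0)
        effs.append(d_efficiency(design, b0, b1))
    return effs

factorial = [-1.0, -1.0, 1.0, 1.0]
three_level = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
for name, d in [("factorial", factorial), ("three-level", three_level)]:
    effs = efficiency_distribution(d)
    print(name, "min", round(min(effs), 3), "mean", round(sum(effs) / len(effs), 3))
```

Because the two-level factorial design is itself one of the grid's two-point designs, its efficiency cannot exceed 1 against this proxy; designs with more support points could in principle score slightly above 1, which is a limitation of the proxy rather than a real gain.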
Abstract:
Магдалина Василева Тодорова (Magdalina Vasileva Todorova) - The article describes an approach to verifying procedural programs by building models of them defined with generalized nets. The approach integrates the "design by contract" concept with verification approaches of the theorem-proving and model-consistency-checking type. To this end, the functions that make up the program are verified separately against specifications according to their purpose. A generalized net model is built that specifies the relationships between the functions in the form of correct sequences of calls. A generalized net model is constructed for the program's main function, and it is checked whether this model conforms to the net model of the relationships between the program's functions. Each function of the program that uses other functions is also verified against the specification given by the net model of the relationships between the program's functions.
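Checking that a function's calls follow the specified "correct sequences of calls" can be illustrated with a much simpler formalism. The sketch below substitutes a plain finite-state machine for the generalized net model (generalized nets are an extension of Petri nets and are not implemented here); the specification `SPEC`, its call names, and the function `conforms` are hypothetical.

```python
# Hypothetical call-sequence specification: for each state, the calls that
# may come next and the state each call leads to.
SPEC = {
    "start":  {"open": "opened"},
    "opened": {"read": "opened", "write": "opened", "close": "closed"},
    "closed": {},
}
ACCEPTING = {"closed"}

def conforms(calls, spec=SPEC, start="start", accepting=ACCEPTING):
    """Return (True, "ok") if the call trace is a correct sequence, else (False, reason)."""
    state = start
    for i, call in enumerate(calls):
        nxt = spec[state].get(call)
        if nxt is None:
            return False, f"call #{i} '{call}' not allowed in state '{state}'"
        state = nxt
    if state not in accepting:
        return False, f"sequence ends in non-final state '{state}'"
    return True, "ok"

print(conforms(["open", "read", "write", "close"]))
print(conforms(["open", "read"]))
```

In the approach described above, the trace would come from the model of the main function (or of any function that calls others), and the specification from the net model of inter-function relationships.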
Abstract:
Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and the resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g) and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and Robust priors. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al., such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them with others in the literature via simulation and real examples.
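The role of 1/(1+g) is easiest to see in the Gaussian linear model, where the g-prior Bayes factor has a closed form. The sketch below uses the standard linear-model expression BF(g) = (1+g)^((n-1-p)/2) / (1+g(1-R^2))^((n-1)/2) and the hyper-g prior, which corresponds to u = 1/(1+g) ~ Beta(a/2 - 1, 1), one of the special cases named above. It does not reproduce the paper's GLM result (Laplace approximation with a tCCH prior); the function names and the choice a = 3 are assumptions for illustration.

```python
import math

def log_bf_g(g, n, p, r2):
    """log Bayes factor of a p-covariate linear model vs the intercept-only
    model under Zellner's g-prior with fixed g (linear-model closed form)."""
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))

def bf_hyper_g(n, p, r2, a=3.0, grid=20000):
    """Hyper-g mixture: u = 1/(1+g) ~ Beta(a/2 - 1, 1). The Bayes factor is
    averaged over u with the midpoint rule."""
    shape = a / 2.0 - 1.0
    total = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        g = (1.0 - u) / u
        density = shape * u ** (shape - 1.0)      # Beta(shape, 1) density at u
        total += math.exp(log_bf_g(g, n, p, r2)) * density / grid
    return total

print(bf_hyper_g(50, 3, 0.0), bf_hyper_g(50, 3, 0.5))
```

With R^2 = 0 the integrand reduces to 0.5u for a = 3 and p = 3, so the hyper-g Bayes factor is exactly 0.25, a handy check that the change of variables is right.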
Abstract:
Motivated by the analysis of the Australian Grain Insect Resistance Database (AGIRD), we develop a Bayesian hurdle modelling approach to assess trends in strong resistance of stored grain insects to phosphine over time. The binary response variable from AGIRD, indicating presence or absence of strong resistance, consists mostly of absence observations, and the hurdle model is a two-step approach that is useful for analyzing such a binary response dataset. The proposed hurdle model first uses Bayesian classification trees to identify covariates and covariate levels pertaining to possible presence or absence of strong resistance. Second, generalized additive models (GAMs) with spike and slab priors for variable selection are fitted to the subset of the dataset that the Bayesian classification tree identifies as possibly having strong resistance. From the GAM we assess trends, biosecurity issues, and site-specific variables influencing the presence of strong resistance using a variable selection approach. The proposed Bayesian hurdle model is compared to its frequentist counterpart and to a naive Bayesian approach that fits a GAM to the entire dataset. The Bayesian hurdle model has the benefit of providing a set of good trees for use in the first step, appears to represent the influence of variables on strong resistance more flexibly than the frequentist model, and also captures subtle changes in the trend that are missed by the frequentist and naive Bayesian models.
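How a spike and slab prior turns evidence into a posterior inclusion probability can be shown for a single coefficient under a normal approximation. This is a minimal sketch, not the paper's sampler: it assumes an estimate beta_hat with known standard error se, a spike that sets the coefficient exactly to 0, and a N(0, tau^2) slab; `tau`, `prior_incl`, and the function names are hypothetical.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def inclusion_prob(beta_hat, se, tau=1.0, prior_incl=0.5):
    """Posterior probability that the coefficient comes from the slab.

    Assumes beta_hat ~ N(beta, se^2). Under the spike beta = 0, so
    beta_hat ~ N(0, se^2); under the slab beta ~ N(0, tau^2), so
    beta_hat ~ N(0, se^2 + tau^2)."""
    slab = prior_incl * normal_pdf(beta_hat, se ** 2 + tau ** 2)
    spike = (1.0 - prior_incl) * normal_pdf(beta_hat, se ** 2)
    return slab / (slab + spike)

print(round(inclusion_prob(0.0, 0.2), 3), round(inclusion_prob(1.0, 0.2), 3))
```

An estimate near zero ends up with an inclusion probability below the prior value, while one several standard errors from zero is almost certainly kept, which is the mechanism that lets such priors drop uninformative covariates.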
Abstract:
Considerable discussion has taken place during the last decade regarding the role of economic growth in determining environmental quality. Using data from 30 OECD countries for the period 1960-2003 and the nonparametric method of generalized additive models, which enables us to use flexible functional forms, this paper examines the environmental Kuznets curve hypothesis for carbon dioxide (CO2). We find that reducing the share of coal in energy use has a significant effect on CO2 emissions. Our results imply that economic growth is not sufficient to decrease CO2 emissions.
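For orientation, a generic additive specification of the kind used in such environmental Kuznets curve analyses can be written as below. The country effects alpha_i, the covariates, and the log scale are assumptions for illustration; the abstract does not state the exact model. The EKC hypothesis corresponds to f_1 having an inverted-U shape; a parametric EKC instead fixes f_1 to a quadratic, whose turning point follows from setting its derivative to zero.

```latex
\log \mathrm{CO2}_{it} = \alpha_i + f_1\!\left(\log \mathrm{GDP}_{it}\right) + f_2\!\left(\mathrm{coal}_{it}\right) + \varepsilon_{it}

% Parametric special case: f_1(x) = \beta_1 x + \beta_2 x^2 with \beta_2 < 0
f_1'(x) = \beta_1 + 2\beta_2 x = 0 \quad\Longrightarrow\quad x^{*} = -\frac{\beta_1}{2\beta_2}
```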
Abstract:
This study decomposed the determinants of environmental quality into scale, technique, and composition effects. We applied a semiparametric method of generalized additive models, which enabled us to use flexible functional forms and include several independent variables in the model. The differences in the technique effect were found to play a crucial role in reducing pollution. We found that the technique effect was sufficient to reduce sulfur dioxide emissions. On the other hand, its effect was not enough to reduce carbon dioxide (CO2) emissions and energy use, except for the case of CO2 emissions in high-income countries.
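A model with several flexible terms, as described above, can be fitted with the classic backfitting algorithm for additive models. The sketch below uses backfitting with a Gaussian kernel (Nadaraya-Watson) smoother for an identity-link model; this is a swapped-in illustration, since GAM software commonly uses penalized regression splines instead, and the study's actual fitting method, covariates, and bandwidth are not given here. The data are synthetic and the function names are hypothetical.

```python
import math

def kernel_smooth(x, r, bandwidth):
    """Nadaraya-Watson (Gaussian kernel) smooth of values r against covariate x."""
    out = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        out.append(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))
    return out

def backfit(y, covariates, bandwidth=0.3, iters=30):
    """Fit y = alpha + sum_j f_j(x_j) + error (identity link) by backfitting."""
    n, k = len(y), len(covariates)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(k)]
    for _ in range(iters):
        for j in range(k):
            # Partial residuals: remove the intercept and every other smooth.
            partial = [y[i] - alpha - sum(f[m][i] for m in range(k) if m != j)
                       for i in range(n)]
            fj = kernel_smooth(covariates[j], partial, bandwidth)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]   # centre each smooth for identifiability
    return alpha, f

# Illustrative noiseless data, not from the study.
n = 60
x1 = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]
x2 = [-2.0 + 4.0 * ((23 * i) % n) / (n - 1) for i in range(n)]  # permuted copy of x1
y = [math.sin(a) + 0.5 * b * b for a, b in zip(x1, x2)]
alpha, (f1, f2) = backfit(y, [x1, x2])
fitted = [alpha + u + v for u, v in zip(f1, f2)]
```

Each pass re-smooths one covariate's partial residuals while holding the others fixed, so every term can take whatever shape the data support; kernel smoothers are biased near the ends of the covariate range, which is one reason spline-based fits are usually preferred in practice.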
Abstract:
Annual discard ogives were estimated using generalized additive models (GAMs) for four demersal fish species: whiting, haddock, megrim, and plaice. The analysis was based on data collected on board commercial vessels and at Irish fishing ports from 1995 to 2003. For all species the most important factors influencing annual discard ogives were fleet (combination of gear, fishing ground, and targeted species), mean length of the catch and year, and, for megrim, also minimum landing size. The length at which fish are discarded has increased since 2000 for haddock, whiting, and plaice. In contrast, discarded length has decreased for megrim, accompanying a reduction in minimum landing size in 2000.
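A discard ogive describes how the probability of discarding changes with fish length. The sketch below fits the simplest version, a binomial logistic curve logit P(discard) = a + b*length, by Newton-Raphson and summarizes it by L50 = -a/b, the length at which half the fish are discarded. This is an assumed simplification: the study used GAMs with fleet, year, and other terms, and L50 is a common summary rather than one stated in the abstract. The data and function names are hypothetical.

```python
import math

def fit_ogive(lengths, discarded, totals, iters=25):
    """Binomial logistic fit of logit P(discard) = a + b*length by Newton-Raphson."""
    c = sum(lengths) / len(lengths)       # centre lengths for stable Newton steps
    a, b = 0.0, 0.0                       # a is the intercept on the centred scale
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for L, d, n in zip(lengths, discarded, totals):
            x = L - c
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r = d - n * p                 # score contribution
            w = n * p * (1.0 - p)         # Fisher information weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 information system for the update.
        a += (h11 * g0 - h01 * g1) / det
        b += (-h01 * g0 + h00 * g1) / det
    return a - b * c, b                   # intercept back on the raw-length scale

def l50(a, b):
    """Length at which the fitted discard probability equals 0.5."""
    return -a / b

# Hypothetical noiseless counts generated from a = 10, b = -0.4 (L50 = 25 cm).
lengths = [float(L) for L in range(15, 36)]
totals = [1000.0] * len(lengths)
discarded = [n / (1.0 + math.exp(-(10.0 - 0.4 * L))) for L, n in zip(lengths, totals)]
a, b = fit_ogive(lengths, discarded, totals)
print(round(l50(a, b), 3))
```

Refitting such a curve each year and comparing L50 values is one simple way to express the finding above that discard lengths increased for some species and decreased for megrim.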
Abstract:
Motivated by the analysis of the Australian Grain Insect Resistance Database (AGIRD), we develop a Bayesian hurdle modelling approach to assess trends in strong resistance of stored grain insects to phosphine over time. The binary response variable from AGIRD, indicating presence or absence of strong resistance, consists mostly of absence observations, and the hurdle model is a two-step approach that is useful for analyzing such a binary response dataset. The proposed hurdle model first uses Bayesian classification trees to identify covariates and covariate levels pertaining to possible presence or absence of strong resistance. Second, generalized additive models (GAMs) with spike and slab priors for variable selection are fitted to the subset of the dataset that the Bayesian classification tree identifies as possibly having strong resistance. From the GAM we assess trends, biosecurity issues, and site-specific variables influencing the presence of strong resistance using a variable selection approach. The proposed Bayesian hurdle model is compared to its frequentist counterpart and to a naive Bayesian approach that fits a GAM to the entire dataset. The Bayesian hurdle model has the benefit of providing a set of good trees for use in the first step, appears to represent the influence of variables on strong resistance more flexibly than the frequentist model, and also captures subtle changes in the trend that are missed by the frequentist and naive Bayesian models. © 2014 Springer Science+Business Media New York.
Abstract:
Periglacial processes act in cold, non-glacial regions where landscape development is mainly controlled by frost activity. About 25 percent of Earth's surface can be considered periglacial. Geographic information systems (GIS) combined with advanced statistical modeling methods provide an efficient tool and a new theoretical perspective for the study of cold environments. The aims of this study were to: 1) model and predict the abundance of periglacial phenomena in a subarctic environment with statistical modeling, 2) investigate the most important factors affecting the occurrence of these phenomena with hierarchical partitioning, 3) compare two widely used statistical modeling methods, Generalized Linear Models and Generalized Additive Models, 4) study the effect of modeling resolution on prediction, and 5) study how spatially continuous predictions can be obtained from point data. The observational data consist of 369 points collected during the summers of 2009 and 2010 in the study area at Kilpisjärvi, northern Lapland. The periglacial phenomena of interest were cryoturbations, slope processes, weathering, deflation, nivation, and fluvial processes. The features were modeled using Generalized Linear Models (GLM) and Generalized Additive Models (GAM) with Poisson errors. The abundance of periglacial features was predicted from these models onto a spatial grid with a resolution of one hectare. The most important environmental factors were examined with hierarchical partitioning. The effect of modeling resolution was investigated within a small independent study area at a spatial resolution of 0.01 hectare. The models explained 45-70% of the occurrence of periglacial phenomena. When spatial variables were added to the models, the explained deviance was considerably higher, which signalled a geographical trend structure. The ability of the models to predict periglacial phenomena was assessed with independent evaluation data.
Spearman's correlation between the observed and predicted values ranged from 0.258 to 0.754. Based on explained deviance and the results of hierarchical partitioning, the most important environmental variables were mean altitude, vegetation, and mean slope angle. The effect of modeling resolution was clear: too coarse a resolution caused a loss of information, while a finer resolution brought out more localized variation. The models' ability to explain periglacial phenomena in the study area was mostly good, and their ability to predict them was moderate. Differences between modeling methods were small, although explained deviance was higher with GLMs than with GAMs. In turn, GAMs produced more realistic spatial predictions. The single most important environmental variable controlling the occurrence of periglacial phenomena was mean altitude, which was strongly correlated with many other explanatory variables. Ongoing global warming will have a great impact, especially on cold environments at high latitudes; for this reason, the response of periglacial environments to a warming climate will be an important research topic in the near future.
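The two summary statistics reported above, explained deviance for a Poisson model and Spearman's correlation between observed and predicted values, can be computed as follows. This is a minimal sketch with hypothetical function names and toy counts; it assumes explained deviance is defined against an intercept-only (constant mean) null model, and it does not reproduce the model fitting or hierarchical partitioning.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum(y*log(y/mu) - (y - mu)), using 0*log(0) = 0."""
    dev = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        dev += 2.0 * (term - (yi - mi))
    return dev

def explained_deviance(y, mu):
    """D^2 = 1 - residual deviance / null deviance (null model: constant mean)."""
    ybar = sum(y) / len(y)
    null = poisson_deviance(y, [ybar] * len(y))
    return 1.0 - poisson_deviance(y, mu) / null

def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (z - mb) for x, z in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((z - mb) ** 2 for z in rb)
    return cov / math.sqrt(va * vb)

# Toy counts and predictions, for illustration only.
observed = [0, 2, 5, 1, 3]
predicted = [0.5, 1.8, 4.0, 1.2, 2.9]
print(round(explained_deviance(observed, predicted), 3), spearman(observed, predicted))
```

Spearman's rho uses only the ordering of the values, so it rewards a model that ranks sites correctly even when its predicted abundances are biased up or down.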
Abstract:
The foraging ecology of bottlenose dolphins Tursiops truncatus in the Northwest Florida Panhandle and in estuaries in northern Georgia was determined using diet analysis and behavioral surveys. Stomach content analysis was completed on bottlenose dolphins (N = 25) that stranded in the Northwest Florida Panhandle from November 2006 to March 2009. The most abundant prey species were spot Leiostomus xanthurus (20.4%), squid (10.9%), pinfish Lagodon rhomboides (10.3%), and Atlantic croaker Micropogonias undulatus (8.5%). Dolphins that stranded during months with a red tide (Karenia brevis) bloom consumed more pinfish and spot, whereas dolphins that stranded in non-bloom months consumed more squid, Atlantic croaker, and silver perch Bairdiella chrysoura. Diet also differed between dolphins that stranded inside bays and sounds and those that stranded outside bays along the coast, and between male and female dolphins. In Georgia, surveys in which foraging behaviors were classified were conducted from south of the Savannah River to north of Ossabaw Sound. Multivariate Generalized Additive Models were used to test correlations of behaviors with dolphin group size, depth, salinity, temperature, creek width, and tide. Sightings with headstands (p = 0.009), hard stops (p = 0.019), chasing (p = 0.004), mudbank whacking (p < 0.001), herding/circling (p = 0.024), and strand feeding (p = 0.006) were correlated with shallow water or small creeks. Sightings with kerplunking (p = 0.031), mudbank whacking (p = 0.001), strand feeding (p = 0.003), and herding/circling (p = 0.026) were significantly correlated with low tide. The Savannah, Georgia study was the first to characterize foraging behaviors in this area and demonstrates how bottlenose dolphins use the salt marsh estuary for foraging.
Studies like these are important to determine how dolphins forage efficiently and to provide background information on diet and foraging behavior for use in monitoring future impacts to dolphins in the Northwest Florida Panhandle and near Savannah, Georgia.
Abstract:
The effects of high temperatures on human health are a major public health problem. Air temperature and air pollution are risk factors for chronic non-communicable diseases, in particular ischemic heart disease (IHD). This study aimed to analyze the association between air temperature and hospital admissions for IHD in the city of Rio de Janeiro from 2009 to 2013. Time-series models, fitted as generalized additive models with Poisson regression, were used to test the hypothesized association. Concentrations of air pollutants (ozone and particulate matter) and relative humidity were included to control for confounding, and single-lag and distributed-lag approaches were used to assess the impact of a 1 °C change on daily hospital admissions. In the single-lag model, statistically significant associations were found between IHD admissions and heat exposure on the same day, for both mean and maximum temperature. In the polynomial distributed-lag model, this association was observed at lags of 1 and 2 days and in the cumulative effect, again for both mean and maximum temperature. When stratified by age group, the associations between IHD admissions and heat exposure were not statistically significant in the single-lag model for mean or maximum temperature. In contrast, in the polynomial distributed-lag model, IHD admissions were associated with heat exposure in the 30-60 age group in the cumulative effect for mean temperature, and at lags of 1 and 2 days in those aged 60 or older, also for mean temperature. These results suggest a positive association between hospital admissions for ischemic heart disease and temperature in the city of Rio de Janeiro.
The results of this study provide information for planning investment in air-conditioned urban spaces and for preparing hospitals to handle emergencies related to heat, one of the most important consequences of climate change.
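In a polynomial distributed-lag model, the per-lag temperature coefficients are constrained to follow a low-degree polynomial in the lag, beta_l = sum_k gamma_k * l^k, so the model only estimates the gamma_k, using transformed exposure columns W_k(t) = sum_l l^k * T(t - l). The sketch below shows that transformation and the cumulative relative risk per 1 °C, exp(sum_l beta_l). It is a minimal illustration under assumptions: the lag range, polynomial degree, and gamma values are hypothetical, not the study's estimates, and the Poisson GAM itself (with smooth terms for time, humidity, and pollutants) is not fitted here.

```python
import math

def pdl_basis(max_lag, degree):
    """Rows are lags 0..max_lag; columns are l**k for k = 0..degree."""
    return [[float(l) ** k for k in range(degree + 1)] for l in range(max_lag + 1)]

def lag_coefficients(gamma, max_lag):
    """Per-lag coefficients beta_l implied by polynomial coefficients gamma."""
    basis = pdl_basis(max_lag, len(gamma) - 1)
    return [sum(g * b for g, b in zip(gamma, row)) for row in basis]

def lagged_design(temps, max_lag, degree):
    """Exposure columns W_k(t) = sum_l l**k * temps[t - l], entered in place of raw lags.
    The first max_lag days are dropped because their lag history is incomplete."""
    return [[sum(float(l) ** k * temps[t - l] for l in range(max_lag + 1))
             for k in range(degree + 1)]
            for t in range(max_lag, len(temps))]

def cumulative_rr(gamma, max_lag, delta=1.0):
    """Relative risk for a sustained rise of `delta` degrees C across all lags."""
    return math.exp(delta * sum(lag_coefficients(gamma, max_lag)))

gamma = [0.01, -0.002]          # hypothetical values, linear-in-lag constraint
print(lag_coefficients(gamma, 3), cumulative_rr(gamma, 3))
```

Because sum_k gamma_k * W_k(t) equals sum_l beta_l * T(t - l), fitting gamma on the W columns is equivalent to fitting the constrained lag coefficients directly, with far fewer free parameters than one coefficient per lag.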
Abstract:
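The relationship between vegetation, species distributions, and the environment has long been a central topic in ecology. In research on global change and terrestrial ecosystems, the main focus has long been on simulating and predicting large-scale vegetation distributions, and many climate-vegetation distribution models have been developed. By contrast, relatively few studies, in China or elsewhere, have simulated and predicted the potential distributions of species. In recent years, with advances in statistics and geographic information systems, statistical modeling techniques for predicting species distributions have developed rapidly and have been widely applied to biogeographic distributions, plant communities, biodiversity, and climate change impact assessment.
Using three statistical modeling techniques widely applied in current species distribution research (generalized linear models, generalized additive models, and classification and regression trees), this thesis simulates the geographic distributions of common tree species in China, compares the simulation accuracy of the different models, and applies the more accurate models to predict the potential future distributions of several major Chinese tree species under future climate scenarios.
Four models, a generalized linear model (GLM), a stepwise generalized linear model with quadratic terms (SGLM), a generalized additive model (GAM), and a classification and regression tree (CART), were built to simulate the geographic distributions of 20 common tree species in China. All four models achieved high simulation accuracy. GAM was the most accurate; adding quadratic terms and applying stepwise regression effectively improved the accuracy of the GLM; CART, a rule-based technique, performed slightly better than GLM and slightly worse than GAM.
Analysis by species showed that all four models performed relatively poorly for Pinus tabuliformis and Quercus liaotungensis, which are mainly distributed in the warm-temperate deciduous broad-leaved forest region; the GLM gave unsatisfactory results for Pinus koraiensis, Quercus mongolica, Juglans mandshurica, and Tilia mandshurica, which occur in temperate mixed coniferous and broad-leaved forest; and all four models were highly accurate both for species of China's subtropical evergreen broad-leaved forest region and for widespread species.
The simulated distributions of Cyclobalanopsis glauca and P. tabuliformis were mapped with a geographic information system, which showed the differences among the model results directly. All four models simulated the distribution of C. glauca well and gave similar results, whereas none simulated P. tabuliformis satisfactorily, with the GLM performing worst. These results agree with the model evaluation results.
Under future climate change scenarios, and taking into account the relative performance of the four models, the future trends of three major afforestation species in China (Pinus massoniana, P. tabuliformis, and P. koraiensis) and two common species (C. glauca and Q. mongolica) were analyzed. For P. massoniana, all four models predict that it will largely keep its current range while its potential future range expands, with a tendency to spread west and north. For P. tabuliformis, the GLM, SGLM, and GAM predict that its potential future range will shift north and also expand to the northeast and southwest. For P. koraiensis, the SGLM, GAM, and CART give fairly similar predictions, namely that its potential future range will shrink. For Q. mongolica, all four models predict a westward expansion. For C. glauca, all four models predict that it will largely keep its current range and expand west and north, and the CART prediction further indicates that its range in southern Guangdong and southern Guangxi will disappear.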
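The abstract does not say which accuracy measure was used to compare the four models. A common choice in species distribution modeling is the area under the ROC curve (AUC), so the sketch below assumes that measure: it computes AUC as the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site, with ties counted as one half. The function name and the toy inputs are hypothetical; in a comparison like the one above it would be applied to each model's predictions on the same evaluation sites.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of presences (label 1) and absences (label 0)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need both presences and absences")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy predicted suitabilities and observed presence/absence, for illustration only.
print(auc([0.9, 0.7, 0.4, 0.2, 0.6], [1, 1, 0, 0, 0]))
```

The pairwise form is quadratic in the number of sites; for large grids a rank-based (Mann-Whitney) computation gives the same value more efficiently.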