832 results for Poisson generalized linear mixed models
Abstract:
A class of multi-process models is developed for collections of time-indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series that record neuron-level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age-specific latent natural ability class and a performance-enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of a player deviates significantly from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
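The abstract does not say how its Gaussian approximation is constructed. One plausible reading, offered only as a sketch, is moment matching: replace a PG(b, c) draw with a normal draw that has the same mean and variance, using the closed-form moments given by Polson et al. (2013). The function names `pg_moments` and `pg_gaussian_draw` are hypothetical, and the truncation at a small positive value is an assumption added to keep the draw positive.

```python
import math
import random


def pg_moments(b, c):
    """Mean and variance of a Polya-Gamma PG(b, c) random variable."""
    if abs(c) < 1e-3:
        # Limiting values as c -> 0; avoids cancellation in sinh(c) - c.
        return b / 4.0, b / 24.0
    mean = b / (2.0 * c) * math.tanh(c / 2.0)
    var = b / (4.0 * c ** 3) * (math.sinh(c) - c) / math.cosh(c / 2.0) ** 2
    return mean, var


def pg_gaussian_draw(b, c, rng=random):
    """Moment-matched normal stand-in for a PG(b, c) draw (assumed scheme).

    Intended for large b, where the PG variable is a sum of many
    independent terms; the floor keeps the draw strictly positive.
    """
    mean, var = pg_moments(b, c)
    return max(rng.gauss(mean, math.sqrt(var)), 1e-12)
```

For small b the Pólya-Gamma density is noticeably skewed, so a substitution like this is only reasonable when the counts, and hence b, are large.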
Abstract:
Spectral unmixing (SU) is a technique for characterizing the mixed pixels of hyperspectral images measured by remote sensors. Most existing spectral unmixing algorithms are based on linear mixing models. Since the number of endmembers/materials present in each mixed pixel is normally small compared with the total number of endmembers (the dimension of the spectral library), the problem becomes sparse. This thesis introduces sparse hyperspectral unmixing methods for the linear mixing model in two different scenarios. In the first scenario, the library of spectral signatures is assumed to be known, and the main problem is to find the minimum number of endmembers subject to a reasonably small approximation error. Mathematically, the corresponding problem is called the $\ell_0$-norm problem, which is NP-hard. The main aim of the first part of the thesis is to find more accurate and reliable approximations of the $\ell_0$-norm term and to propose sparse unmixing methods based on such approximations. The resulting methods show considerable improvements in reconstructing the fractional abundances of endmembers compared with state-of-the-art methods, such as lower reconstruction errors. In the second part of the thesis, the first scenario (i.e., the dictionary-aided semiblind unmixing scheme) is generalized to the blind unmixing scenario, in which the library of spectral signatures is also estimated. We apply the nonnegative matrix factorization (NMF) method to develop new unmixing methods because of its notable advantages, such as accommodating the nonnegativity constraints on the two decomposed matrices. Furthermore, we introduce new cost functions based on statistical and physical features of the spectral signatures of materials (SSoM) and of hyperspectral pixels, such as the collaborative property of hyperspectral pixels and the mathematical representation of the concentrated energy of SSoM in the first few subbands.
Finally, we introduce sparse unmixing methods for the blind scenario and evaluate the efficiency of the proposed methods via simulations over synthetic and real hyperspectral data sets. The results show considerable improvements in estimating the spectral library of materials and their fractional abundances, such as smaller values of spectral angle distance (SAD) and abundance angle distance (AAD).
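The two scenarios described above can be written compactly. The notation below is assumed for illustration and is not taken from the thesis: $\mathbf{y}$ is an observed pixel spectrum, $\mathbf{A}$ the spectral library, $\mathbf{x}$ the vector of fractional abundances, $\delta$ the error tolerance, and $\mathbf{Y}$, $\mathbf{X}$ the matrices of all pixels and all abundances. The first line is the semiblind sparse problem, the second the plain NMF objective for the blind case, before any of the additional cost terms the abstract mentions.

```latex
\begin{aligned}
&\min_{\mathbf{x}} \ \|\mathbf{x}\|_0
  \quad \text{subject to} \quad
  \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 \le \delta, \quad \mathbf{x} \ge \mathbf{0}, \\
&\min_{\mathbf{A} \ge \mathbf{0},\ \mathbf{X} \ge \mathbf{0}} \
  \tfrac{1}{2}\, \|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 .
\end{aligned}
```

A common surrogate replaces $\|\mathbf{x}\|_0$ with $\|\mathbf{x}\|_1$; the thesis proposes its own approximations, which are not reproduced here.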
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of the recorded zeroes. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts, and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy.
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
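The path sampling identity behind a method of this kind can be stated briefly. The notation is assumed for illustration: the MRF is written in exponential-family form with canonical statistic $s(x)$, parameter $\theta$ and normalization constant $Z(\theta)$, and $\theta(t)$, $t \in [0,1]$, is any smooth path from $\theta_0$ to $\theta_1$. This is the general identity in the sense of Gelman & Meng (1998), not necessarily the exact estimator used in the thesis.

```latex
\begin{aligned}
p(x \mid \theta) &= \frac{\exp\{\theta^{\top} s(x)\}}{Z(\theta)},
\qquad
\nabla_{\theta} \log Z(\theta) = \mathbb{E}_{\theta}\,[\, s(X) \,], \\
\log \frac{Z(\theta_1)}{Z(\theta_0)}
&= \int_0^1 \mathbb{E}_{\theta(t)}\,[\, s(X) \,]^{\top}\, \dot{\theta}(t)\, \mathrm{d}t .
\end{aligned}
```

In practice the expectation at each grid point $t$ would be estimated from MCMC draws of the field, and the integral approximated numerically, which is why the approach is computationally intensive.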
Abstract:
A computationally efficient sequential Monte Carlo algorithm is proposed for the sequential design of experiments for the collection of block data described by mixed effects models. The difficulty in applying a sequential Monte Carlo algorithm in such settings is the need to evaluate the observed data likelihood, which is typically intractable for all but linear Gaussian models. To overcome this difficulty, we propose to unbiasedly estimate the likelihood, and to perform inference and make decisions based on an exact-approximate algorithm. Two estimates are proposed: one using quasi-Monte Carlo methods and one using the Laplace approximation with importance sampling. Both of these approaches can be computationally expensive, so we propose exploiting parallel computational architectures to ensure designs can be derived in a timely manner. We also extend our approach to allow for model uncertainty. This research is motivated by important pharmacological studies related to the treatment of critically ill patients.
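To make the idea of an unbiased likelihood estimate concrete, here is a minimal sketch for one block of counts under an assumed Poisson model with a single Gaussian random intercept. The model, the function names and the default of using the prior as the proposal are all illustrative assumptions. The abstract's estimators use quasi-Monte Carlo or a Laplace-based proposal, which this plain importance sampler does not implement.

```python
import math
import random


def poisson_logpmf(y, rate):
    """Log probability of count y under a Poisson distribution."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def estimate_block_likelihood(ys, beta0, sigma, n_draws=2000,
                              prop_mean=0.0, prop_sd=None, rng=random):
    """Importance-sampling estimate of the marginal likelihood of one block.

    Assumed model: y_j | u ~ Poisson(exp(beta0 + u)), u ~ N(0, sigma^2).
    The proposal is N(prop_mean, prop_sd^2); by default it is the prior.
    """
    prop_sd = sigma if prop_sd is None else prop_sd
    total = 0.0
    for _ in range(n_draws):
        u = rng.gauss(prop_mean, prop_sd)
        log_w = (sum(poisson_logpmf(y, math.exp(beta0 + u)) for y in ys)
                 + normal_logpdf(u, 0.0, sigma)
                 - normal_logpdf(u, prop_mean, prop_sd))
        total += math.exp(log_w)
    return total / n_draws
```

The average of the weights is unbiased for the integral over the random effect whenever the proposal covers the prior, which is the property an exact-approximate algorithm relies on. A proposal centred near the mode of the integrand, as a Laplace approximation would give, typically reduces the variance.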
Abstract:
The availability of population-specific normative data regarding disease severity measures is essential for patient assessment. The goals of the current study were to characterize the pattern of ankylosing spondylitis (AS) in Portuguese patients and to develop reference centile charts for BASDAI, BASFI, BASMI and mSASSS, the most widely used assessment tools in AS. AS cases were recruited from hospital outpatient clinics, with AS defined according to the modified New York criteria. Demographic and clinical data were recorded. All radiographs were evaluated by two independent experienced readers. Centile charts for BASDAI, BASFI, BASMI and mSASSS were constructed for both genders, using generalized linear models and regression models with duration of disease as the independent variable. A total of 369 patients (62.3% male, mean ± SD age 45.4 ± 13.2 years, mean ± SD disease duration 11.4 ± 10.5 years, 70.7% B27-positive) were included. Family history of AS in a first-degree relative was reported in 17.6% of the cases. Regarding clinical disease pattern, at the time of assessment 42.3% had axial disease, 2.4% peripheral disease, 40.9% mixed disease and 7.1% isolated enthesopathic disease. Anterior uveitis (33.6%) was the most common extra-articular manifestation. The centile charts suggest that females reported greater disease activity and more functional impairment than males but had lower BASMI and mSASSS scores. Data collected through this study provided a demographic and clinical profile of patients with AS in Portugal. The development of centile charts constitutes a useful tool to assess the change of disease pattern over time and in response to therapeutic interventions.
Abstract:
This paper presents a maximum likelihood method for estimating growth parameters for an aquatic species that incorporates growth covariates, and takes into consideration multiple tag-recapture data. Individual variability in asymptotic length, age-at-tagging, and measurement error are also considered in the model structure. Using distribution theory, the log-likelihood function is derived under a generalised framework for the von Bertalanffy and Gompertz growth models. Due to the generality of the derivation, covariate effects can be included for both models, with seasonality and tagging effects investigated. Method robustness is established via comparison with the Fabens, improved Fabens, James and non-linear mixed-effects growth models, with the maximum likelihood method performing the best. The method is illustrated further with an application to blacklip abalone (Haliotis rubra), for which a strong growth-retarding tagging effect that persisted for several months was detected.
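For orientation, the Fabens approach named above as a comparison method works with growth increments between tagging and recapture under the von Bertalanffy curve. The sketch below shows that increment and a least-squares criterion. The function names and the record layout `(length_at_tagging, time_at_liberty, length_at_recapture)` are hypothetical, and none of this is the paper's maximum likelihood method, which also models individual variability, covariates and measurement error.

```python
import math


def vb_expected_increment(length_at_tagging, time_at_liberty, l_inf, k):
    """Expected von Bertalanffy growth increment in Fabens form.

    l_inf is the asymptotic length and k the growth rate coefficient.
    """
    return (l_inf - length_at_tagging) * (1.0 - math.exp(-k * time_at_liberty))


def fabens_sse(records, l_inf, k):
    """Sum of squared differences between observed and expected increments.

    Each record is (length_at_tagging, time_at_liberty, length_at_recapture).
    """
    sse = 0.0
    for l1, dt, l2 in records:
        residual = (l2 - l1) - vb_expected_increment(l1, dt, l_inf, k)
        sse += residual * residual
    return sse
```

Minimising `fabens_sse` over `l_inf` and `k` gives the classical Fabens estimates, which treat all animals as sharing one asymptotic length.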
Abstract:
Objective: To discuss generalized estimating equations as an extension of generalized linear models by commenting on the paper of Ziegler and Vens, "Generalized Estimating Equations. Notes on the Choice of the Working Correlation Matrix". Methods: An international group of experts was invited to comment on this paper. Results: The discussants took several perspectives. Econometricians established parallels to the generalized method of moments (GMM). Statisticians discussed model assumptions and the aspect of missing data. Applied statisticians commented on practical aspects of data analysis. Conclusions: In general, careful modeling of the correlation is encouraged when considering estimation efficiency and other implications, and a comparison of choosing instruments in GMM and generalized estimating equations (GEE) would be worthwhile. Some theoretical drawbacks of GEE need to be further addressed and require careful analysis of data. This particularly applies to the situation when data are missing at random.
Abstract:
The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques have sought to optimize predictive performance (e.g. Elith et al., 2006). In general, SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models (Nix, 1986) or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al. 2002), or classification trees with complexity or model-based pruning (Breiman et al., 1984; Zeileis, 2008). A third approach uses model averaging to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees and Maximum Entropy, as compared in Elith et al. (2006). Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al. 2010). However, few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary 2008). Here we report an elicitation protocol that helps make explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection described above, Bayesian or otherwise.
We demonstrate how this information can be obtained and then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.
Abstract:
This paper presents a maximum likelihood method for estimating growth parameters for an aquatic species that incorporates growth covariates, and takes into consideration multiple tag-recapture data. Individual variability in asymptotic length, age-at-tagging, and measurement error are also considered in the model structure. Using distribution theory, the log-likelihood function is derived under a generalised framework for the von Bertalanffy and Gompertz growth models. Due to the generality of the derivation, covariate effects can be included for both models, with seasonality and tagging effects investigated. Method robustness is established via comparison with the Fabens, improved Fabens, James and non-linear mixed-effects growth models, with the maximum likelihood method performing the best. The method is illustrated further with an application to blacklip abalone (Haliotis rubra), for which a strong growth-retarding tagging effect that persisted for several months was detected. (C) 2013 Elsevier B.V. All rights reserved.
Abstract:
Periglacial processes act on cold, non-glacial regions where landscape development is mainly controlled by frost activity. Circa 25 percent of Earth's surface can be considered periglacial. Geographical Information Systems combined with advanced statistical modeling methods provide an efficient tool and a new theoretical perspective for the study of cold environments. The aim of this study was to: 1) model and predict the abundance of periglacial phenomena in a subarctic environment with statistical modeling, 2) investigate the most important factors affecting the occurrence of these phenomena with hierarchical partitioning, 3) compare two widely used statistical modeling methods, Generalized Linear Models and Generalized Additive Models, 4) study the effect of modeling resolution on prediction and 5) study how spatially continuous prediction can be obtained from point data. The observational data of this study consist of 369 points that were collected during the summers of 2009 and 2010 at the study area in Kilpisjärvi, northern Lapland. The periglacial phenomena of interest were cryoturbations, slope processes, weathering, deflation, nivation and fluvial processes. The features were modeled using Generalized Linear Models (GLM) and Generalized Additive Models (GAM) with Poisson errors. The abundance of periglacial features was predicted from these models onto a spatial grid with a resolution of one hectare. The most important environmental factors were examined with hierarchical partitioning. The effect of modeling resolution was investigated within a small independent study area with a spatial resolution of 0.01 hectare. The models explained 45-70% of the occurrence of periglacial phenomena. When spatial variables were added to the models, the amount of explained deviance was considerably higher, which signalled a geographical trend structure. The ability of the models to predict periglacial phenomena was assessed with independent evaluation data.
Spearman's correlation between the observed and predicted values ranged from 0.258 to 0.754. Based on explained deviance and the results of hierarchical partitioning, the most important environmental variables were mean altitude, vegetation and mean slope angle. The effect of modeling resolution was clear: too coarse a resolution caused a loss of information, while a finer resolution brought out more localized variation. The models' ability to explain and to predict periglacial phenomena in the study area was mostly good and moderate, respectively. Differences between modeling methods were small, although the explained deviance was higher with GLMs than with GAMs. In turn, GAMs produced more realistic spatial predictions. The single most important environmental variable controlling the occurrence of periglacial phenomena was mean altitude, which had strong correlations with many other explanatory variables. The ongoing global warming will have a great impact especially in cold environments at high latitudes, and for this reason an important research topic in the near future will be the response of periglacial environments to a warming climate.
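The "explained deviance" reported above can be illustrated with a minimal sketch: a Poisson GLM with a log link and a single covariate, fitted by Newton-Raphson (equivalent to iteratively reweighted least squares for this model), followed by the ratio of model to null deviance. The single-covariate setup, the function names and any data passed in are assumptions for illustration. The study itself used several environmental predictors and standard GLM/GAM software.

```python
import math


def fit_poisson_glm(xs, ys, n_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; returns (b0, b1).

    Assumes the mean of ys is positive and xs are not all equal.
    """
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start at intercept-only fit
    for _ in range(n_iter):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Fisher information matrix entries.
        h00 = sum(mus)
        h01 = sum(m * x for x, m in zip(xs, mus))
        h11 = sum(m * x * x for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def poisson_deviance(ys, mus):
    """Poisson deviance, with y * log(y / mu) taken as 0 when y == 0."""
    dev = 0.0
    for y, m in zip(ys, mus):
        term = y * math.log(y / m) if y > 0 else 0.0
        dev += 2.0 * (term - (y - m))
    return dev


def explained_deviance(xs, ys):
    """Proportion of null deviance explained by the fitted model."""
    b0, b1 = fit_poisson_glm(xs, ys)
    fitted = [math.exp(b0 + b1 * x) for x in xs]
    null = [sum(ys) / len(ys)] * len(ys)
    return 1.0 - poisson_deviance(ys, fitted) / poisson_deviance(ys, null)
```

A value such as 0.45 to 0.70 from `explained_deviance` would correspond to the 45-70% figure quoted in the abstract, although a comparison across GLMs and GAMs also has to account for the different effective numbers of parameters.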
Abstract:
Native species' response to the presence of invasive species is context specific. This response cannot be studied in isolation from the prevailing environmental stresses in invaded habitats, such as seasonal drought. We investigated the combined effects of an invasive shrub, Lantana camara L. (lantana), seasonal rainfall and species' microsite preferences on the growth and survival of 1,105 naturally established seedlings of native trees and shrubs in a seasonally dry tropical forest. Individuals were followed from April 2008 to February 2010, and growth and survival were measured in relation to lantana density, seasonality of rainfall and species characteristics in a 50-ha permanent forest plot located in Mudumalai, southern India. We used a mixed effects modelling approach to examine seedling growth and generalized linear models to examine seedling survival. The overall relative height growth rate of established seedlings was found to be very low irrespective of the presence or absence of dense lantana. The 22-month growth rate of dry forest species was lower under dense lantana, while moist forest species were not affected by the presence of lantana thickets. The 4-month growth rates of all species increased with increasing inter-census rainfall. Community results may be influenced by responses of the most abundant species, Catunaregam spinosa, whose growth rates were always lower under dense lantana. Overall seedling survival was high, increased with increasing rainfall and was higher for species with dry forest preference than for species with moist forest preference. The high survival rates of naturally established seedlings, combined with their basal sprouting ability in this forest, could enable the persistence of woody species in the face of invasive species.
Abstract:
This thesis comprises two articles that investigated the relationship between job stress and the prevalence of common mental disorders (CMD), and the relationship of both with levels of physical activity, among Brazilian Army personnel. In the first article, the dependent variable was CMD and the primary independent variable was job stress, assessed under the effort-reward imbalance (ERI) model. CMD were assessed using the General Health Questionnaire (GHQ-12). Prevalence ratios (PR) were estimated by Poisson regression in order to lend robustness to the 95% confidence intervals. The prevalence of CMD was 33.2% (95% CI: 29.1; 37.3). After adjustment for age, education, income, lifestyle, self-rated health, self-reported health conditions and occupational characteristics, the study showed that job stress was strongly and independently associated with CMD, with prevalence ratios (PR) that varied across stress levels, ranging from 1.60 to 2.01. The rank of lieutenant was associated with CMD even after adjustment for covariates (PR = 2.06; 95% CI 1.2; 4.1). The results indicated that overcommitment is an important component of job stress. These findings were consistent with the literature and contribute to knowledge about the mental health status of Armed Forces personnel in Brazil, highlighting that job stress and the performance of the occupational duties of the rank of lieutenant may entail a higher risk of CMD in this type of population. The second article aimed to investigate the association of job stress and CMD with habitual physical activity among Armed Forces personnel. Physical activity (the dependent variable) was estimated using the Baecke Questionnaire, one of the most widely used instruments in epidemiological studies of physical activity.
Job stress, CMD and rank were the independent variables, assessed as described above. The aim was to evaluate the association of these variables with physical activity among military personnel. To this end, multiple linear regression was used, via generalized linear models. After controlling for socioeconomic and demographic characteristics, job stress, characterized by "high effort and low reward", remained associated with more occupational physical activity (b = 0.224; 95% CI 0.098; 0.351) and with less leisure-time physical activity (b = -0.198; 95% CI -0.384; -0.011). CMD remained associated with lower levels of physical activity in sports/exercise during leisure time (b = -0.184; 95% CI -0.321; -0.046). Rank remained associated with higher levels of occupational physical activity (b = 0.324; 95% CI 0.167; 0.481). To the best of our knowledge, this was the first study to evaluate the relationship of psychosocial and occupational aspects involved in physical activity among military personnel in Brazil and abroad. The results suggest that the work environment and mental health are associated with the physical activity of military personnel, which in turn relates to physical fitness.
Abstract:
The study of ecological differences among coexisting microparasites has been largely neglected, but it addresses important and unusual issues because there is no clear distinction in such cases between conventional (resource) and apparent competition. Here patterns in the population dynamics are examined for four species of Bartonella (bacterial parasites) coexisting in two wild rodent hosts, bank voles (Clethrionomys glareolus) and wood mice (Apodemus sylvaticus). Using generalized linear modeling and mixed effects models, we examine, for these four species, seasonal patterns and dependencies on host density (both direct and delayed) and, having accounted for these, any differences in prevalence between the two hosts. Whereas previous studies had failed to uncover species differences, here all four were different. Two, B. doshiae and B. taylorii, were more prevalent in wood mice, and one, B. birtlesii, was more prevalent in bank voles. B. birtlesii, B. grahamii, and B. taylorii peaked in prevalence in the fall, whereas B. doshiae peaked in spring. For B. birtlesii in bank voles, density dependence was direct, but for B. taylorii in wood mice density dependence was delayed. B. birtlesii prevalence in wood mice was related to bank vole density. The implications of these differences for species coexistence are discussed.
Abstract:
Statistical techniques are fundamental in science, and linear regression analysis is perhaps one of the most widely used methodologies. It is well known from the literature that, under certain conditions, linear regression is an extremely powerful statistical tool. Unfortunately, in practice, some of these conditions are rarely satisfied and the regression models become ill-posed, thus precluding the application of traditional estimation methods. This work presents some contributions to maximum entropy theory in the estimation of ill-posed models, in particular the estimation of linear regression models with small samples, affected by collinearity and outliers. The research is developed along three lines, namely the estimation of technical efficiency with state-contingent production frontiers, the estimation of the ridge parameter in ridge regression and, finally, new developments in maximum entropy estimation. In the estimation of technical efficiency with state-contingent production frontiers, the work shows that maximum entropy estimators perform better than the maximum likelihood estimator. This good performance is notable in models with few observations per state and in models with a large number of states, which are commonly affected by collinearity. The use of maximum entropy estimators is expected to contribute to the much-desired increase in empirical work with these production frontiers. In ridge regression the greatest challenge is the estimation of the ridge parameter. Although numerous procedures are available in the literature, none of them outperforms all the others. This work proposes a new estimator of the ridge parameter that combines the analysis of the ridge trace with maximum entropy estimation.
The results of the simulation studies suggest that this new estimator is one of the best procedures available in the literature for estimating the ridge parameter. The Leuven maximum entropy estimator is based on the method of least squares, Shannon entropy and concepts from quantum electrodynamics. This estimator overcomes the main criticism levelled at the generalized maximum entropy estimator, since it dispenses with the supports for the parameters and errors of the regression model. This work presents new contributions to maximum entropy theory in the estimation of ill-posed models, based on the Leuven maximum entropy estimator, information theory and robust regression. The estimators developed perform well in linear regression models with small samples, affected by collinearity and outliers. Finally, some computer code for maximum entropy estimation is presented, thereby adding to the scarce computational resources currently available.
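For readers unfamiliar with the ridge parameter discussed above, the standard ridge estimator is shown below. The notation is assumed for illustration: $\mathbf{X}$ is the design matrix, $\mathbf{y}$ the response vector and $k \ge 0$ the ridge parameter. The thesis's own estimator of $k$ is not reproduced here.

```latex
\hat{\boldsymbol{\beta}}(k)
  = \left( \mathbf{X}^{\top}\mathbf{X} + k\,\mathbf{I} \right)^{-1} \mathbf{X}^{\top}\mathbf{y},
\qquad k \ge 0 .
```

Setting $k = 0$ recovers ordinary least squares, and the ridge trace is the plot of the components of $\hat{\boldsymbol{\beta}}(k)$ against $k$. Under collinearity $\mathbf{X}^{\top}\mathbf{X}$ is nearly singular, which is why a small positive $k$ stabilises the estimate.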