904 results for Generalized linear mixed models
Abstract:
Many variables that are of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between, for example, different employment states or occupations. In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed-form solution, and hence numerical integration must be used to obtain maximum likelihood estimates for the model parameters. Techniques for implementing the numerical integration are available but are computationally intensive, requiring a large amount of computer processing time that increases with the number of clusters (or individuals) in the data, and are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects models in terms of accuracy, efficiency and computing time. Computing time will have significant implications for the approach researchers prefer. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modelling the employment status of women using a GLMM, over three waves of the HILDA survey.
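The numerical integration described in this abstract can be illustrated with a minimal sketch. It is not the adaptive quadrature or the multinomial model evaluated in the paper: as a simplifying assumption it uses a binary random-intercept logit model and a fixed 3-point Gauss-Hermite rule, and the names `cluster_likelihood`, `beta0`, `beta1`, and `sigma` are hypothetical.

```python
import math

# 3-point Gauss-Hermite rule (physicists' form): approximates the
# integral of f(x) * exp(-x^2) over the real line.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)


def cluster_likelihood(y, x, beta0, beta1, sigma):
    """Approximate marginal likelihood of one cluster (individual).

    Assumed model (illustrative only): logit P(y_j = 1 | u) = beta0 + beta1 * x_j + u,
    with random intercept u ~ N(0, sigma^2) integrated out numerically.
    """
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        # Change of variables u = sqrt(2) * sigma * node maps the N(0, sigma^2)
        # density onto the exp(-x^2) kernel of the Gauss-Hermite rule.
        u = math.sqrt(2.0) * sigma * node
        conditional = 1.0
        for yj, xj in zip(y, x):
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xj + u)))
            conditional *= p if yj == 1 else 1.0 - p
        total += weight * conditional
    return total / math.sqrt(math.pi)
```

The full-sample likelihood is the product of such terms over clusters, which is why computing time grows with the number of individuals. A practical implementation would use more nodes, centre and scale them per cluster (the adaptive step), and handle one random effect per non-reference category in the multinomial case.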
Abstract:
For decades, global climate change has directly and indirectly affected the structure and function of ecosystems. Abrupt changes in biodiversity have been observed in response to linear or sudden modifications to the environment. These abrupt shifts can cause long-term reorganizations within ecosystems, with communities exhibiting new functional responses to environmental factors. Over the last 3 decades, the Gironde estuary in southwest France has experienced 2 abrupt shifts in both the physical and chemical environments and the pelagic community. Rather than describing these shifts and their origins, we focused on the 3 inter-shift periods, describing the structure of the fish community and its relationship with the environment during these periods. We described fish biodiversity using a limited set of descriptors, taking into account both species composition and relative species abundances. Inter-shift ecosystem states were defined based on the relationship between this description and the hydro-physico-chemical variables and climatic indices defining the main features of the environment. This relationship was described using generalized linear mixed models on the entire time series and for each inter-shift period. Our results indicate that (1) the fish community structure has been significantly modified, (2) environmental drivers influencing fish diversity have changed during these 3 periods, and (3) the fish-environment relationships have been modified over time. From this, we conclude a regime shift has occurred in the Gironde estuary. We also highlight that anthropogenic influences have increased, which re-emphasizes the importance of local management in maintaining fish diversity and associated goods and services within the context of climate change.
Abstract:
Endemic zoonotic diseases remain a serious but poorly recognised problem in affected communities in developing countries. Despite the overall burden of zoonoses on human and animal health, information about their impacts in endemic settings is lacking, and most of these diseases continue to be neglected. The non-specific clinical presentation of these diseases has been identified as a major challenge in their identification (even with good laboratory diagnosis) and control. The signs in animals and symptoms in humans are easily confused with those of non-zoonotic diseases, leading to widespread misdiagnosis in areas where diagnostic capacity is limited. The communities most affected by these diseases live in close proximity to the animals on which they depend for their livelihood, which further complicates the understanding of the epidemiology of zoonoses. This thesis reviewed the pattern of reporting of zoonotic pathogens that cause febrile illness in malaria-endemic countries, and evaluated the recognition of animal associations among other risk factors in the transmission and management of zoonoses. The findings of the review chapter were further investigated in the subsequent chapters through a laboratory study of risk factors for bovine leptospirosis and a study of exposure patterns of livestock coxiellosis. A review was undertaken of 840 articles that were part of a larger review of zoonotic pathogens that cause human fever. The review process involved three main steps: filtering and reference classification, identification of abstracts that describe risk factors, and data extraction and summary analysis. Abstracts of the 840 references were transferred into a Microsoft Excel spreadsheet, where several subsets of abstracts were generated using Excel filters and text searches to classify the content of each abstract. 
Data were then extracted and summarised to describe geographical patterns of the pathogens reported, and to determine how often animal-related risk factors were considered among studies that investigated risk factors for zoonotic pathogen transmission. Subsequently, a seroprevalence study of bovine leptospirosis in northern Tanzania was undertaken in the second chapter of this thesis. The study involved screening serum samples, obtained from an abattoir survey and a cross-sectional study (Bacterial Zoonoses Project), for antibodies against Leptospira serovar Hardjo. The data were analysed using generalised linear mixed models (GLMMs) to identify risk factors for cattle infection. The final chapter was the analysis of Q fever data, also obtained from the Bacterial Zoonoses Project, to determine exposure patterns across livestock species using GLMMs. Leptospira spp. (10.8%, 90/840) and Rickettsia spp. (10.7%, 86/840) were identified as the most frequently reported zoonotic pathogens that cause febrile illness, while Rabies virus (0.4%, 3/840) and Francisella spp. (0.1%, 1/840) were the least reported, across malaria-endemic countries. The majority of the pathogens were reported in Asia, and the frequency of reporting seems to be higher in areas where outbreaks are mostly reported. It was also observed that animal-related risk factors are not often considered among other risk factors for zoonotic pathogens that cause human fever in malaria-endemic countries. The seroprevalence study indicated that Leptospira serovar Hardjo is widespread in the cattle population in northern Tanzania, and that animal husbandry system and age are the two most important risk factors influencing seroprevalence. Cattle in pastoral systems and adult cattle were significantly more likely to be seropositive than cattle in non-pastoral systems and young animals, respectively, while there was no significant effect of cattle breed or sex. 
Exposure patterns of Coxiella burnetii appear different for each livestock species. Risk factors were identified for goats (animal husbandry system, age and sex) and sheep (animal husbandry system and sex), but none for cattle. In addition, there was no evidence of a significant influence of mixed livestock-keeping on animal coxiellosis. Zoonotic agents that cause human fever are common in developing countries. The role of animals in the transmission of zoonotic pathogens that cause febrile illness is not fully recognised and appreciated. Since Leptospira spp. and C. burnetii are among the most frequently reported pathogens that cause human fever across malaria-endemic countries, and are also prevalent in livestock populations, control and preventive measures that recognise animals as a source of infection would be very important, especially in livestock-keeping communities where people live in close proximity to their animals.
Abstract:
This paper introduces local distance-based generalized linear models. These models extend (weighted) distance-based linear models, first with the generalized linear model concept and then by localizing. Distances between individuals are the only predictor information needed to fit these models. They are therefore applicable to mixed (qualitative and quantitative) explanatory variables or when the regressor is of functional type. Models can be fitted and analysed with the R package dbstats, which implements several distance-based prediction methods.
Abstract:
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled "Advances in GLMs/GAMs modeling: from species distribution to environmental management", held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology and provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of the application of GLMs and GAMs to ecological modeling.
Abstract:
Estimating a data transformation is very useful for obtaining response variables that closely satisfy a normal linear model. Generalized linear models enable the fitting of models to a wide range of data types. These models are based on exponential dispersion models. We propose a new class of transformed generalized linear models to extend the Box and Cox models and the generalized linear models. We use the generalized linear model framework to fit these models and discuss maximum likelihood estimation and inference. We give a simple formula to estimate the parameter that indexes the transformation of the response variable for a subclass of models. We also give a simple formula to estimate the rth moment of the original dependent variable. We explore the possibility of applying these models to time series data to extend the generalized autoregressive moving average models discussed by Benjamin et al. [Generalized autoregressive moving average models. J. Amer. Statist. Assoc. 98, 214-223]. The usefulness of these models is illustrated in a simulation study and in applications to three real data sets. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
The problem of detecting both the major genes that control the phenotypic mean and those that control the phenotypic variance has been raised in quantitative trait loci analysis. To map both kinds of genes, we applied the idea of the classic Haley-Knott regression to double generalized linear models. We performed both kinds of quantitative trait loci detection for a Red Jungle Fowl x White Leghorn F2 intercross using double generalized linear models. It is shown that the double generalized linear model is a proper and efficient approach for localizing variance-controlling genes. We compared two models, with and without a fixed sex effect, and prefer including the sex effect in order to reduce the residual variances. We found that different genes might affect body weight at different times as the chicken grows.
Abstract:
Background: Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods: We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results: Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. 
Using Akaike’s information criterion, the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion: The algorithm and model selection criterion presented here can contribute to a better understanding of the genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires, each with 100 offspring.
Abstract:
A rigorous asymptotic theory for Wald residuals in generalized linear models is not yet available. The authors provide matrix formulae of order O(n^{-1}), where n is the sample size, for the first two moments of these residuals. The formulae can be applied to many regression models widely used in practice. The authors suggest adjusted Wald residuals for these models with approximately zero mean and unit variance. The expressions were used to analyze a real dataset. Some simulation results indicate that the adjusted Wald residuals are better approximated by the standard normal distribution than the Wald residuals.
Abstract:
Marginal generalized linear models can be used for clustered and longitudinal data by fitting a model as if the data were independent and using an empirical estimator of parameter standard errors. We extend this approach to data where the number of observations correlated with a given one grows with sample size and show that parameter estimates are consistent and asymptotically Normal with a slower convergence rate than for independent data, and that an information sandwich variance estimator is consistent. We present two problems that motivated this work, the modelling of patterns of HIV genetic variation and the behavior of clustered data estimators when clusters are large.
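The "fit as if independent, then correct the standard errors" idea in this abstract can be sketched for the simplest possible case. The example below is an assumption-laden illustration, not the estimator studied in the paper: it takes an intercept-only model with identity link (so the fitted parameter is just the overall mean) and computes the cluster-robust sandwich variance; the function name `sandwich_variance_of_mean` is hypothetical.

```python
def sandwich_variance_of_mean(clusters):
    """Return (mean, sandwich variance) for an intercept-only marginal model.

    `clusters` is a list of lists; observations in the same inner list may be
    correlated. The point estimate ignores the clustering entirely; only the
    variance estimate uses it.
    """
    values = [v for cluster in clusters for v in cluster]
    n = len(values)
    mean = sum(values) / n  # solves the independence estimating equation

    # "Bread": derivative of the estimating equation sum(y - mu) is -n.
    bread = float(n)
    # "Meat": squared sums of residuals within each cluster, which lets
    # within-cluster correlation inflate (or deflate) the variance.
    meat = sum(sum(v - mean for v in cluster) ** 2 for cluster in clusters)

    return mean, meat / bread ** 2


# Same data, two clustering assumptions.
mean_c, var_clustered = sandwich_variance_of_mean([[1.0, 2.0], [4.0, 5.0]])
mean_i, var_independent = sandwich_variance_of_mean([[1.0], [2.0], [4.0], [5.0]])
```

In this toy data set the point estimate is identical under both groupings, while the clustered variance is larger because observations within a cluster move together.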
Abstract:
This paper reports a comparison of three modeling strategies for the analysis of hospital mortality in a sample of general medicine inpatients in a Department of Veterans Affairs medical center. Logistic regression, a Markov chain model, and longitudinal logistic regression were evaluated on predictive performance, as measured by the c-index, and on the accuracy of expected numbers of deaths compared with observed numbers. The logistic regression used patient information collected at admission; the Markov model comprised two absorbing states for discharge and death and three transient states reflecting increasing severity of illness as measured by laboratory data collected during the hospital stay; the longitudinal regression employed Generalized Estimating Equations (GEE) to model the covariance structure for the repeated binary outcome. Results showed that the logistic regression predicted hospital mortality as well as the alternative methods but was limited in scope of application. The Markov chain provides insights into how day-to-day changes in illness severity lead to discharge or death. The longitudinal logistic regression showed that an increasing illness trajectory is associated with hospital mortality. The conclusion is that for standard applications in modeling hospital mortality, logistic regression is adequate, but for the new challenges facing health services research today, alternative methods are equally predictive, practical, and can provide new insights.
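For readers unfamiliar with the c-index used to compare these models, the following is a minimal sketch of how it can be computed for a binary outcome. It is an illustration only, with a hypothetical function name `c_index` and made-up numbers rather than data from the study: the statistic is the share of (death, survivor) pairs in which the death received the higher predicted risk, counting ties as one half.

```python
def c_index(predicted_risk, died):
    """Concordance index for a binary outcome (equivalent to the area under the ROC curve)."""
    event_scores = [p for p, d in zip(predicted_risk, died) if d == 1]
    nonevent_scores = [p for p, d in zip(predicted_risk, died) if d == 0]
    if not event_scores or not nonevent_scores:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for e in event_scores:
        for s in nonevent_scores:
            if e > s:
                concordant += 1.0   # death ranked above survivor
            elif e == s:
                concordant += 0.5   # tie counts as half
    return concordant / (len(event_scores) * len(nonevent_scores))


# Illustrative values only: three of the four death/survivor pairs are concordant.
example = c_index([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect discrimination; the pairwise loop is quadratic in sample size, so a rank-based formula is preferable for large cohorts.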
Abstract:
With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. 
The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of the following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction.
Abstract:
Complex diseases, such as cancer, are caused by various genetic and environmental factors and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically challenging. Bayesian generalized linear models using Student-t prior distributions on coefficients are a novel method to simultaneously analyze genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, an improved method for variable selection when compared to standard methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for the sample size required to achieve high power of variable selection using Bayesian generalized linear models, considering different scales of the number of candidate covariates. Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible genetic-nutritional interactions from 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by never, former, and current smoking status. SNPs in MTRR were significantly associated with lung cancer risk across never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions, including an interaction between SHMT1 and vitamin B12, an interaction between MTRR and total fat intake, and an interaction between MTR and alcohol use, were also identified as associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.
Abstract:
To effectively assess and mitigate risk of permafrost disturbance, disturbance-prone areas can be predicted through the application of susceptibility models. In this study we developed regional susceptibility models for permafrost disturbances using a field disturbance inventory to test the transferability of the model to a broader region in the Canadian High Arctic. Resulting maps of susceptibility were then used to explore the effect of terrain variables on the occurrence of disturbances within this region. To account for a large range of landscape characteristics, the model was calibrated using two locations: Sabine Peninsula, Melville Island, NU, and Fosheim Peninsula, Ellesmere Island, NU. Spatial patterns of disturbance were predicted with a generalized linear model (GLM) and generalized additive model (GAM), each calibrated using disturbed and randomized undisturbed locations from both locations and GIS-derived terrain predictor variables including slope, potential incoming solar radiation, wetness index, topographic position index, elevation, and distance to water. Each model was validated for the Sabine and Fosheim Peninsulas using independent data sets while the transferability of the model to an independent site was assessed at Cape Bounty, Melville Island, NU. The regional GLM and GAM validated well for both calibration sites (Sabine and Fosheim) with the area under the receiver operating curves (AUROC) > 0.79. Both models were applied directly to Cape Bounty without calibration and validated equally with AUROCs of 0.76; however, each model predicted disturbed and undisturbed samples differently. Additionally, the sensitivity of the transferred model was assessed using data sets with different sample sizes. Results indicated that models based on larger sample sizes transferred more consistently and captured the variability within the terrain attributes in the respective study areas. 
Terrain attributes associated with the initiation of disturbances were similar regardless of the location. Disturbances commonly occurred on slopes between 4 and 15°, below Holocene marine limit, and in areas with low potential incoming solar radiation.