15 resultados para Generalized linear mixed model
em Dalarna University College Electronic Archive
Resumo:
This paper presents a two-step pseudo likelihood estimation technique for generalized linear mixed models with the random effects being correlated between groups. The core idea is to deal with the intractable integrals in the likelihood function by multivariate Taylor's approximation. The accuracy of the estimation technique is assessed in a Monte-Carlo study. An application of it with a binary response variable is presented using a real data set on credit defaults from two Swedish banks. Thanks to the use of two-step estimation technique, the proposed algorithm outperforms conventional pseudo likelihood algorithms in terms of computational time.
Resumo:
This paper presents the techniques of likelihood prediction for the generalized linear mixed models. Methods of likelihood prediction is explained through a series of examples; from a classical one to more complicated ones. The examples show, in simple cases, that the likelihood prediction (LP) coincides with already known best frequentist practice such as the best linear unbiased predictor. The paper outlines a way to deal with the covariate uncertainty while producing predictive inference. Using a Poisson error-in-variable generalized linear model, it has been shown that in complicated cases LP produces better results than already know methods.
Resumo:
Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty resides in the parameter estimation because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles - marginal likelihood, extended likelihood, Bayesian analysis-via simulation studies. Real data on contact wrestling are used for illustration.
Resumo:
We present the hglm package for fitting hierarchical generalized linear models. It can be used for linear mixed models and generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects. Fixed effects can also be fitted in the dispersion part of the model.
Resumo:
Background: The sensitivity to microenvironmental changes varies among animals and may be under genetic control. It is essential to take this element into account when aiming at breeding robust farm animals. Here, linear mixed models with genetic effects in the residual variance part of the model can be used. Such models have previously been fitted using EM and MCMC algorithms. Results: We propose the use of double hierarchical generalized linear models (DHGLM), where the squared residuals are assumed to be gamma distributed and the residual variance is fitted using a generalized linear model. The algorithm iterates between two sets of mixed model equations, one on the level of observations and one on the level of variances. The method was validated using simulations and also by re-analyzing a data set on pig litter size that was previously analyzed using a Bayesian approach. The pig litter size data contained 10,060 records from 4,149 sows. The DHGLM was implemented using the ASReml software and the algorithm converged within three minutes on a Linux server. The estimates were similar to those previously obtained using Bayesian methodology, especially the variance components in the residual variance part of the model. Conclusions: We have shown that variance components in the residual variance part of a linear mixed model can be estimated using a DHGLM approach. The method enables analyses of animal models with large numbers of observations. An important future development of the DHGLM methodology is to include the genetic correlation between the random effects in the mean and residual variance parts of the model as a parameter of the DHGLM.
Resumo:
We analyze a real data set pertaining to reindeer fecal pellet-group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects for modeling of spatial correlation in a Poisson generalized linear mixed model (GLMM), quasi-Poisson hierarchical generalized linear model (HGLM), zero-inflated Poisson (ZIP), and hurdle models. The quasi-Poisson HGLM allows for both under- and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi-Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R codes for fitting these models using a unified algorithm for the HGLMs. Spatial count response with an extremely high proportion of zeros, and underdispersion can be successfully modeled using the quasi-Poisson HGLM with spatial random effects.
Resumo:
Maintenance of transport infrastructure assets is widely advocated as the key in minimizing current and future costs of the transportation network. While effective maintenance decisions are often a result of engineering skills and practical knowledge, efficient decisions must also account for the net result over an asset's life-cycle. One essential aspect in the long term perspective of transport infrastructure maintenance is to proactively estimate maintenance needs. In dealing with immediate maintenance actions, support tools that can prioritize potential maintenance candidates are important to obtain an efficient maintenance strategy. This dissertation consists of five individual research papers presenting a microdata analysis approach to transport infrastructure maintenance. Microdata analysis is a multidisciplinary field in which large quantities of data is collected, analyzed, and interpreted to improve decision-making. Increased access to transport infrastructure data enables a deeper understanding of causal effects and a possibility to make predictions of future outcomes. The microdata analysis approach covers the complete process from data collection to actual decisions and is therefore well suited for the task of improving efficiency in transport infrastructure maintenance. Statistical modeling was the selected analysis method in this dissertation and provided solutions to the different problems presented in each of the five papers. In Paper I, a time-to-event model was used to estimate remaining road pavement lifetimes in Sweden. In Paper II, an extension of the model in Paper I assessed the impact of latent variables on road lifetimes; displaying the sections in a road network that are weaker due to e.g. subsoil conditions or undetected heavy traffic. The study in Paper III incorporated a probabilistic parametric distribution as a representation of road lifetimes into an equation for the marginal cost of road wear. Differentiated road wear marginal costs for heavy and light vehicles are an important information basis for decisions regarding vehicle miles traveled (VMT) taxation policies. In Paper IV, a distribution based clustering method was used to distinguish between road segments that are deteriorating and road segments that have a stationary road condition. Within railway networks, temporary speed restrictions are often imposed because of maintenance and must be addressed in order to keep punctuality. The study in Paper V evaluated the empirical effect on running time of speed restrictions on a Norwegian railway line using a generalized linear mixed model.
Resumo:
Detecting both the majors genes that control the phenotypic mean and those controlling phenotypic variance has been raised in quantitative trait loci analysis. In order to mapping both kinds of genes, we applied the idea of the classic Haley-Knott regression to double generalized linear models. We performed both kinds of quantitative trait loci detection for a Red Jungle Fowl x White Leghorn F2 intercross using double generalized linear models. It is shown that double generalized linear model is a proper and efficient approach for localizing variance-controlling genes. We compared two models with or without fixed sex effect and prefer including the sex effect in order to reduce the residual variances. We found that different genes might take effect on the body weight at different time as the chicken grows.
Resumo:
Background: Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods: We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results: Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. Using Akaike’s information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion: The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring.
Resumo:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
Resumo:
OBJECTIVE: This study aimed to assess women´s acceptability of diagnosis and treatment of incomplete abortion with misoprostol by midwives, compared with physicians. METHODS: This was an analysis of secondary outcomes from a multi-centre randomized controlled equivalence trial at district level in Uganda. Women with first trimester incomplete abortion were randomly allocated to clinical assessment and treatment with misoprostol by a physician or a midwife. The randomisation (1:1) was done in blocks of 12 and stratified for health care facility. Acceptability was measured in expectations and satisfaction at a follow up visit 14-28 days following treatment. Analysis of women's overall acceptability was done using a generalized linear mixed-effects model with an equivalence range of -4% to 4%. The study was not masked. The trial is registered at ClinicalTrials.org, NCT 01844024. RESULTS: From April 2013 to June 2014, 1108 women were assessed for eligibility of which 1010 were randomized (506 to midwife and 504 to physician). 953 women were successfully followed up and included in the acceptability analysis. 95% (904) of the participants found the treatment satisfactory and overall acceptability was found to be equivalent between the two study groups. Treatment failure, not feeling calm and safe following treatment, experiencing severe abdominal pain or heavy bleeding following treatment, were significantly associated with non-satisfaction. No serious adverse events were recorded. CONCLUSIONS: Treatment of incomplete abortion with misoprostol by midwives and physician was highly, and equally, acceptable to women. TRIAL REGISTRATION: ClinicalTrials.gov NCT01844024.
Resumo:
1. Genomewide association studies (GWAS) enable detailed dissections of the genetic basis for organisms' ability to adapt to a changing environment. In long-term studies of natural populations, individuals are often marked at one point in their life and then repeatedly recaptured. It is therefore essential that a method for GWAS includes the process of repeated sampling. In a GWAS, the effects of thousands of single-nucleotide polymorphisms (SNPs) need to be fitted and any model development is constrained by the computational requirements. A method is therefore required that can fit a highly hierarchical model and at the same time is computationally fast enough to be useful. 2. Our method fits fixed SNP effects in a linear mixed model that can include both random polygenic effects and permanent environmental effects. In this way, the model can correct for population structure and model repeated measures. The covariance structure of the linear mixed model is first estimated and subsequently used in a generalized least squares setting to fit the SNP effects. The method was evaluated in a simulation study based on observed genotypes from a long-term study of collared flycatchers in Sweden. 3. The method we present here was successful in estimating permanent environmental effects from simulated repeated measures data. Additionally, we found that especially for variable phenotypes having large variation between years, the repeated measurements model has a substantial increase in power compared to a model using average phenotypes as a response. 4. The method is available in the R package RepeatABEL. It increases the power in GWAS having repeated measures, especially for long-term studies of natural populations, and the R implementation is expected to facilitate modelling of longitudinal data for studies of both animal and human populations.
Resumo:
BACKGROUND: Canalization is defined as the stability of a genotype against minor variations in both environment and genetics. Genetic variation in degree of canalization causes heterogeneity of within-family variance. The aims of this study are twofold: (1) quantify genetic heterogeneity of (within-family) residual variance in Atlantic salmon and (2) test whether the observed heterogeneity of (within-family) residual variance can be explained by simple scaling effects. RESULTS: Analysis of body weight in Atlantic salmon using a double hierarchical generalized linear model (DHGLM) revealed substantial heterogeneity of within-family variance. The 95% prediction interval for within-family variance ranged from ~0.4 to 1.2 kg2, implying that the within-family variance of the most extreme high families is expected to be approximately three times larger than the extreme low families. For cross-sectional data, DHGLM with an animal mean sub-model resulted in severe bias, while a corresponding sire-dam model was appropriate. Heterogeneity of variance was not sensitive to Box-Cox transformations of phenotypes, which implies that heterogeneity of variance exists beyond what would be expected from simple scaling effects. CONCLUSIONS: Substantial heterogeneity of within-family variance was found for body weight in Atlantic salmon. A tendency towards higher variance with higher means (scaling effects) was observed, but heterogeneity of within-family variance existed beyond what could be explained by simple scaling effects. For cross-sectional data, using the animal mean sub-model in the DHGLM resulted in biased estimates of variance components, which differed substantially both from a standard linear mean animal model and a sire-dam DHGLM model. Although genetic differences in canalization were observed, selection for increased canalization is difficult, because there is limited individual information for the variance sub-model, especially when based on cross-sectional data. Furthermore, potential macro-environmental changes (diet, climatic region, etc.) may make genetic heterogeneity of variance a less stable trait over time and space.
Resumo:
We present a new version of the hglm package for fittinghierarchical generalized linear models (HGLM) with spatially correlated random effects. A CAR family for conditional autoregressive random effects was implemented. Eigen decomposition of the matrix describing the spatial structure (e.g. the neighborhood matrix) was used to transform the CAR random effectsinto an independent, but heteroscedastic, gaussian random effect. A linear predictor is fitted for the random effect variance to estimate the parameters in the CAR model.This gives a computationally efficient algorithm for moderately sized problems (e.g. n<5000).
Resumo:
BACKGROUND: Misoprostol is established for the treatment of incomplete abortion but has not been systematically assessed when provided by midwives at district level in a low-resource setting. We investigated the effectiveness and safety of midwives diagnosing and treating incomplete abortion with misoprostol, compared with physicians. METHODS: We did a multicentre randomised controlled equivalence trial at district level at six facilities in Uganda. Eligibility criteria were women with signs of incomplete abortion. We randomly allocated women with first-trimester incomplete abortion to clinical assessment and treatment with misoprostol either by a physician or a midwife. The randomisation (1:1) was done in blocks of 12 and was stratified for study site. Primary outcome was complete abortion not needing surgical intervention within 14-28 days after initial treatment. The study was not masked. Analysis of the primary outcome was done on the per-protocol population with a generalised linear-mixed effects model. The predefined equivalence range was -4% to 4%. The trial was registered at ClinicalTrials.gov, number NCT01844024. FINDINGS: From April 30, 2013, to July 21, 2014, 1108 women were assessed for eligibility. 1010 women were randomly assigned to each group (506 to midwife group and 504 to physician group). 955 women (472 in the midwife group and 483 in the physician group) were included in the per-protocol analysis. 452 (95·8%) of women in the midwife group had complete abortion and 467 (96·7%) in the physician group. The model-based risk difference for midwife versus physician group was -0·8% (95% CI -2·9 to 1·4), falling within the predefined equivalence range (-4% to 4%). The overall proportion of women with incomplete abortion was 3·8% (36/955), similarly distributed between the two groups (4·2% [20/472] in the midwife group, 3·3% [16/483] in the physician group). No serious adverse events were recorded. INTERPRETATION: Diagnosis and treatment of incomplete abortion with misoprostol by midwives is equally safe and effective as when provided by physicians, in a low-resource setting. Scaling up midwives' involvement in treatment of incomplete abortion with misoprostol at district level would increase access to safe post-abortion care. FUNDING: The Swedish Research Council, Karolinska Institutet, and Dalarna University.