8 results for Hierarchical Linear Modeling

in Dalarna University College Electronic Archive


Relevance:

40.00% 40.00%

Publisher:

Abstract:

This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association studies (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights into modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to correctly adjust for the bias in genetic variance component estimation and to gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data from an entire genome. These whole-genome models were shown to perform well in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of the mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Background: Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods: We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results: Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. 
Using Akaike’s information criterion, the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion: The algorithm and model selection criterion presented here can contribute to a better understanding of the genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires, each with 100 offspring.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

We present the hglm package for fitting hierarchical generalized linear models. It can be used for linear mixed models and generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects. Fixed effects can also be fitted in the dispersion part of the model.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Background: The sensitivity to microenvironmental changes varies among animals and may be under genetic control. It is essential to take this element into account when aiming at breeding robust farm animals. Here, linear mixed models with genetic effects in the residual variance part of the model can be used. Such models have previously been fitted using EM and MCMC algorithms. Results: We propose the use of double hierarchical generalized linear models (DHGLM), where the squared residuals are assumed to be gamma distributed and the residual variance is fitted using a generalized linear model. The algorithm iterates between two sets of mixed model equations, one on the level of observations and one on the level of variances. The method was validated using simulations and also by re-analyzing a data set on pig litter size that was previously analyzed using a Bayesian approach. The pig litter size data contained 10,060 records from 4,149 sows. The DHGLM was implemented using the ASReml software and the algorithm converged within three minutes on a Linux server. The estimates were similar to those previously obtained using Bayesian methodology, especially the variance components in the residual variance part of the model. Conclusions: We have shown that variance components in the residual variance part of a linear mixed model can be estimated using a DHGLM approach. The method enables analyses of animal models with large numbers of observations. An important future development of the DHGLM methodology is to include the genetic correlation between the random effects in the mean and residual variance parts of the model as a parameter of the DHGLM.
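The alternating scheme described in this abstract can be illustrated with a deliberately simplified sketch. It is not the authors' ASReml implementation: it assumes fixed effects only (no random or genetic effects), so each "mixed model equations" step collapses to an ordinary weighted least-squares fit, and it omits the leverage adjustment of squared residuals that a DHGLM would normally apply. The function name, the design matrices, and the simulated data are all hypothetical.

```python
import numpy as np

def fit_mean_and_variance(X, Z, y, n_iter=50):
    """Alternate between a weighted mean fit and a gamma GLM on squared residuals.

    X: design matrix for the mean; Z: design matrix for the log residual variance.
    Assumption: fixed effects only, no leverage correction (simplified sketch).
    """
    beta = np.zeros(X.shape[1])
    gamma = np.zeros(Z.shape[1])  # coefficients of the log residual variance
    for _ in range(n_iter):
        # Level of observations: weighted least squares with weights 1/variance.
        w = np.exp(-(Z @ gamma))
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        # Level of variances: one Fisher-scoring step of a gamma GLM with log
        # link fitted to the squared residuals (working weights are constant).
        d = (y - X @ beta) ** 2
        eta = Z @ gamma
        mu = np.exp(eta)
        z_work = eta + (d - mu) / mu
        gamma = np.linalg.lstsq(Z, z_work, rcond=None)[0]
    return beta, gamma

# Simulated data (hypothetical): the residual variance depends on a covariate x.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1.0, 1.0, n)
design = np.column_stack([np.ones(n), x])
true_beta = np.array([1.0, 2.0])
true_gamma = np.array([0.5, 1.0])
y = design @ true_beta + rng.normal(size=n) * np.exp(0.5 * (design @ true_gamma))

beta_hat, gamma_hat = fit_mean_and_variance(design, design, y)
print("mean coefficients:", beta_hat)
print("log-variance coefficients:", gamma_hat)
```

In a full DHGLM both steps would solve mixed model equations that include random effects, which is where the genetic variance in the residual variance part enters.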

Relevance:

40.00% 40.00%

Publisher:

Abstract:

We analyze a real data set pertaining to reindeer fecal pellet-group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects to model spatial correlation in a Poisson generalized linear mixed model (GLMM), a quasi-Poisson hierarchical generalized linear model (HGLM), and zero-inflated Poisson (ZIP) and hurdle models. The quasi-Poisson HGLM allows for both under- and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi-Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R code for fitting these models using a unified algorithm for the HGLMs. Spatial count responses with an extremely high proportion of zeros and underdispersion can be successfully modeled using the quasi-Poisson HGLM with spatial random effects.
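The statement that a ZIP model allows only for overdispersion can be checked with a small sketch of the standard zero-inflated Poisson distribution. This is a plain, non-spatial ZIP with no random effects, not the paper's R code; the function names and parameter values are hypothetical. With zero-inflation probability pi_zero and Poisson rate lam, the mean is (1 - pi_zero) * lam and the variance is mean * (1 + pi_zero * lam), which is never smaller than the mean.

```python
import math

def zip_pmf(k, lam, pi_zero):
    """Probability of count k under a zero-inflated Poisson distribution."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # A zero is either a structural zero or an ordinary Poisson zero.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson

def zip_mean_var(lam, pi_zero):
    """Closed-form mean and variance of the zero-inflated Poisson."""
    mean = (1.0 - pi_zero) * lam
    var = mean * (1.0 + pi_zero * lam)
    return mean, var

# Hypothetical parameters giving a zero proportion above 70%.
lam, pi_zero = 2.0, 0.7
mean, var = zip_mean_var(lam, pi_zero)
print("P(0) =", zip_pmf(0, lam, pi_zero))
print("mean =", mean, "variance =", var)
```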

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this paper, we study the factors influencing the National Telecom Business Volume, using 2008 data published in the China Statistical Yearbook. We illustrate the procedure of modeling “National Telecom Business Volume” on the following eight variables: GDP, Consumption Levels, Retail Sales of Social Consumer Goods, Total Renovation Investment, the Local Telephone Exchange Capacity, Mobile Telephone Exchange Capacity, Mobile Phone End Users, and the Local Telephone End Users. Tests for heteroscedasticity and multicollinearity are included for model evaluation. We also use the AIC and BIC criteria to select independent variables, and we report the factors that enter the optimal regression model for telecommunications business volume, together with the relations between the independent variables and the dependent variable. Based on the final results, we propose several recommendations on how to improve telecommunication services and promote economic development.
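Variable selection by AIC and BIC, as mentioned in this abstract, can be sketched for an ordinary Gaussian linear regression. The abstract does not say how the authors searched over variable subsets, so the exhaustive search below is an assumption, as are the function names and the simulated data (which are not the yearbook data). Both criteria are computed up to an additive constant that is the same for every model.

```python
import itertools
import numpy as np

def gaussian_ic(X, y):
    """Return (AIC, BIC) of a least-squares fit with intercept, up to a constant."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus the error variance
    fit_term = n * np.log(rss / n)
    return fit_term + 2 * k, fit_term + k * np.log(n)

def best_subset(X, y, criterion="aic"):
    """Exhaustively search all non-empty column subsets (assumed search strategy)."""
    which = 0 if criterion == "aic" else 1
    scored = []
    for size in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), size):
            scored.append((gaussian_ic(X[:, list(cols)], y)[which], cols))
    return min(scored)[1]

# Simulated data (hypothetical): only columns 0 and 2 affect the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

print("selected by AIC:", best_subset(X, y, "aic"))
print("selected by BIC:", best_subset(X, y, "bic"))
```

BIC penalizes each extra parameter by log(n) instead of 2, so for n larger than about 7 it tends to choose models that are no larger than those chosen by AIC.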

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Gradual changes in world development have brought energy issues back into high profile. An ongoing challenge for countries around the world is to balance the gains from development against its effects on the environment. Energy management is the key factor of any sustainable development program. All aspects of development in agriculture, power generation, social welfare and industry in Iran are crucially related to energy and its revenue. Forecasting end-use natural gas consumption is an important factor for efficient system operation and a basis for planning decisions. In this thesis, particle swarm optimization (PSO) is used to forecast long-run natural gas consumption in Iran. Gas consumption data in Iran for the previous 34 years are used to predict the consumption for the coming years. Four linear and nonlinear models are proposed, and six factors, namely Gross Domestic Product (GDP), Population, National Income (NI), Temperature, Consumer Price Index (CPI) and yearly Natural Gas (NG) demand, are investigated.
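How PSO can estimate the coefficients of a forecasting model may be sketched as follows. This is a generic global-best PSO, not the thesis's implementation: the inertia and acceleration constants, the search bounds, the function name, and the synthetic 34-point linear series are all assumptions chosen for illustration, and no Iranian consumption data are used.

```python
import numpy as np

def pso_minimize(loss, dim, n_particles=30, n_iter=300, bounds=(-10.0, 10.0), seed=0):
    """Minimize loss(params) with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                      # best position seen by each particle
    best_val = np.array([loss(p) for p in pos])
    g = best_pos[np.argmin(best_val)].copy()   # best position seen by the swarm
    inertia, c_self, c_swarm = 0.7, 1.5, 1.5   # assumed, commonly used constants
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = inertia * vel + c_self * r1 * (best_pos - pos) + c_swarm * r2 * (g - pos)
        pos = pos + vel
        val = np.array([loss(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[np.argmin(best_val)].copy()
    return g, float(best_val.min())

# Synthetic series (hypothetical): 34 yearly points following 2 + 3 * t exactly.
t = np.linspace(0.0, 1.0, 34)
series = 2.0 + 3.0 * t

def squared_error(params):
    # Sum of squared errors of a linear model a + b * t.
    return float(np.sum((params[0] + params[1] * t - series) ** 2))

best_params, best_loss = pso_minimize(squared_error, dim=2)
print("estimated coefficients:", best_params, "loss:", best_loss)
```

For a linear model, least squares gives the same answer directly; a swarm search becomes useful for nonlinear model forms where no closed-form estimator exists.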

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty lies in parameter estimation, because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose, and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles (marginal likelihood, extended likelihood, and Bayesian analysis) via simulation studies. Real data on contact wrestling are used for illustration.
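Because the marginal likelihood has no closed form, it is usually approximated numerically. One common option, shown here as a sketch and not necessarily the method used in the study, is Gauss-Hermite quadrature. The example assumes a Poisson model with a single normally distributed random intercept per cluster; the function name and the example counts are hypothetical.

```python
import math
import numpy as np

def cluster_marginal_loglik(y, b0, sigma, n_nodes=30):
    """Approximate the log marginal likelihood of one cluster by Gauss-Hermite quadrature.

    Assumed model: y_j | u ~ Poisson(exp(b0 + u)), u ~ Normal(0, sigma^2).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = math.sqrt(2.0) * sigma * nodes         # change of variable for a normal density
    eta = b0 + u
    y = np.asarray(y)
    log_fact = sum(math.lgamma(float(v) + 1.0) for v in y)
    # Conditional Poisson log-likelihood of the whole cluster at each node.
    cond_loglik = y.sum() * eta - len(y) * np.exp(eta) - log_fact
    integral = np.sum(weights * np.exp(cond_loglik)) / math.sqrt(math.pi)
    return float(np.log(integral))

# Hypothetical counts for one cluster.
counts = [0, 2, 1]
print(cluster_marginal_loglik(counts, b0=0.3, sigma=1.0))
```

Summing this quantity over clusters gives the full log marginal likelihood, which can then be passed to a numerical optimizer over b0 and sigma.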