836 resultados para Mixed linear models
Resumo:
Background: Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods: We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results: Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. Using Akaike’s information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion: The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
O modelo misto consiste numa importante classe de modelos que tem sido tradicionalmente analisada por meio de procedimentos da análise de variância. Nos modelos mistos, três aspectos são fundamentais: estimação e testes de hipóteses dos efeitos fixos, predição dos efeitos aleatórios e estimação dos componentes de variância. Na análise de modelos lineares mistos desbalanceados, a estimação dos componentes de variância é de fundamental importância e depende da estrutura de covariâncias e dos métodos de estimação utilizados. Nesse contexto, este artigo pretende apresentar os principais métodos de estimação e de análise utilizados no estudo de modelos lineares mistos com estruturas gerais de covariâncias nos efeitos aleatórios, disponíveis no procedimento MIXED, do SAS (Statistical Analysis System).
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
A rigorous asymptotic theory for Wald residuals in generalized linear models is not yet available. The authors provide matrix formulae of order O(n(-1)), where n is the sample size, for the first two moments of these residuals. The formulae can be applied to many regression models widely used in practice. The authors suggest adjusted Wald residuals to these models with approximately zero mean and unit variance. The expressions were used to analyze a real dataset. Some simulation results indicate that the adjusted Wald residuals are better approximated by the standard normal distribution than the Wald residuals.
Resumo:
Spatial linear models have been applied in numerous fields such as agriculture, geoscience and environmental sciences, among many others. Spatial dependence structure modelling, using a geostatistical approach, is an indispensable tool to estimate the parameters that define this structure. However, this estimation may be greatly affected by the presence of atypical observations in the sampled data. The purpose of this paper is to use diagnostic techniques to assess the sensitivity of the maximum-likelihood estimators, covariance functions and linear predictor to small perturbations in the data and/or the spatial linear model assumptions. The methodology is illustrated with two real data sets. The results allowed us to conclude that the presence of atypical values in the sample data have a strong influence on thematic maps, changing the spatial dependence structure.
Resumo:
Marginal generalized linear models can be used for clustered and longitudinal data by fitting a model as if the data were independent and using an empirical estimator of parameter standard errors. We extend this approach to data where the number of observations correlated with a given one grows with sample size and show that parameter estimates are consistent and asymptotically Normal with a slower convergence rate than for independent data, and that an information sandwich variance estimator is consistent. We present two problems that motivated this work, the modelling of patterns of HIV genetic variation and the behavior of clustered data estimators when clusters are large.
Resumo:
This paper reports a comparison of three modeling strategies for the analysis of hospital mortality in a sample of general medicine inpatients in a Department of Veterans Affairs medical center. Logistic regression, a Markov chain model, and longitudinal logistic regression were evaluated on predictive performance as measured by the c-index and on accuracy of expected numbers of deaths compared to observed. The logistic regression used patient information collected at admission; the Markov model was comprised of two absorbing states for discharge and death and three transient states reflecting increasing severity of illness as measured by laboratory data collected during the hospital stay; longitudinal regression employed Generalized Estimating Equations (GEE) to model covariance structure for the repeated binary outcome. Results showed that the logistic regression predicted hospital mortality as well as the alternative methods but was limited in scope of application. The Markov chain provides insights into how day to day changes of illness severity lead to discharge or death. The longitudinal logistic regression showed that increasing illness trajectory is associated with hospital mortality. The conclusion is reached that for standard applications in modeling hospital mortality, logistic regression is adequate, but for new challenges facing health services research today, alternative methods are equally predictive, practical, and can provide new insights. ^
Resumo:
In applied work economists often seek to relate a given response variable y to some causal parameter mu* associated with it. This parameter usually represents a summarization based on some explanatory variables of the distribution of y, such as a regression function, and treating it as a conditional expectation is central to its identification and estimation. However, the interpretation of mu* as a conditional expectation breaks down if some or all of the explanatory variables are endogenous. This is not a problem when mu* is modelled as a parametric function of explanatory variables because it is well known how instrumental variables techniques can be used to identify and estimate mu*. In contrast, handling endogenous regressors in nonparametric models, where mu* is regarded as fully unknown, presents di±cult theoretical and practical challenges. In this paper we consider an endogenous nonparametric model based on a conditional moment restriction. We investigate identification related properties of this model when the unknown function mu* belongs to a linear space. We also investigate underidentification of mu* along with the identification of its linear functionals. Several examples are provided in order to develop intuition about identification and estimation for endogenous nonparametric regression and related models.
Resumo:
With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of its following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction. ^
Resumo:
Complex diseases, such as cancer, are caused by various genetic and environmental factors, and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically. Bayesian generalized linear models using student-t prior distributions on coefficients, is a novel method to simultaneously analyze genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, an improved method for variable selection when compared to standard methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for required sample size to achieve a high power of variable selection using Bayesian generalize linear models, considering different scales of number of candidate covariates. ^ Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible genetic-nutritional interactions from 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by never, former, and current smoking status. SNPs in MTRR were significantly associated with lung cancer risk across never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions, including an interaction between SHMT1 and vitamin B12, an interaction between MTRR and total fat intake, and an interaction between MTR and alcohol use, were also identified as associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.^
Resumo:
Scholars have found that socioeconomic status was one of the key factors that influenced early-stage lung cancer incidence rates in a variety of regions. This thesis examined the association between median household income and lung cancer incidence rates in Texas counties. A total of 254 individual counties in Texas with corresponding lung cancer incidence rates from 2004 to 2008 and median household incomes in 2006 were collected from the National Cancer Institute Surveillance System. A simple linear model and spatial linear models with two structures, Simultaneous Autoregressive Structure (SAR) and Conditional Autoregressive Structure (CAR), were used to link median household income and lung cancer incidence rates in Texas. The residuals of the spatial linear models were analyzed with Moran's I and Geary's C statistics, and the statistical results were used to detect similar lung cancer incidence rate clusters and disease patterns in Texas.^
Resumo:
The Department of Structural Analysis of the University of Santander has been for a longtime involved in the solution of the country´s practical engineering problems. Some of these have required the use of non-conventional methods of analysis, in order to achieve adequate engineering answers. As an example of the increasing application of non-linear computer codes in the nowadays engineering practice, some cases will be briefly presented. In each case, only the main features of the problem involved and the solution used to solve it will be shown
Resumo:
Standard factorial designs sometimes may be inadequate for experiments that aim to estimate a generalized linear model, for example, for describing a binary response in terms of several variables. A method is proposed for finding exact designs for such experiments that uses a criterion allowing for uncertainty in the link function, the linear predictor, or the model parameters, together with a design search. Designs are assessed and compared by simulation of the distribution of efficiencies relative to locally optimal designs over a space of possible models. Exact designs are investigated for two applications, and their advantages over factorial and central composite designs are demonstrated.