937 results for multiple linear regression models
Abstract:
Data on weather conditions and on the severity of anthracnose, caused by Colletotrichum gloeosporioides, in the tropical pasture legume Stylosanthes scabra were collected and analysed from seven field sites in Australia, Brazil and Colombia. Disease severity and weather data were analysed using artificial neural network (ANN) models, developed with data from some or all field sites in Australia and/or South America, to predict severity at other sites. Three series of models were developed using different weather summaries. Of these, ANN models using weather for the day of disease assessment and the previous 24 h had the highest prediction success: models trained on data from all sites within one continent correctly predicted disease severity in the other continent on more than 75% of days, with overall prediction errors of 21.9% for the Australian model and 22.1% for the South American model. Of the six cross-continent ANN models, each trained on pooled data from five sites on both continents to predict severity at the remaining sixth site, the model developed without data from Planaltina in Brazil was the most accurate, with >85% prediction success, and the model without Carimagua in Colombia was the least accurate, with only 54% success. As in multiple regression models, moisture-related variables such as rain and leaf surface wetness, and variables that influence moisture availability such as radiation and wind, on the day of disease severity assessment or the day before, were the most important weather variables in all ANN models. A set of weights from the ANN models was used to calculate the overall risk of anthracnose at the various sites. Sites with high and low anthracnose risk occur on both continents, and weather conditions at the centres of diversity in Brazil and Colombia do not appear to be more conducive to serious anthracnose development than conditions in Australia.
Abstract:
Random regression models have been widely used to estimate genetic parameters that influence milk production in Bos taurus breeds and, more recently, in B. indicus breeds. With the aim of finding an appropriate random regression model to analyze milk yield, different parametric functions were compared using 20,524 test-day milk yield records of 2816 first-lactation Guzerat (B. indicus) cows in Brazilian herds. The records were analyzed by random regression models whose random effects were additive genetic, permanent environmental and residual, and whose fixed effects were contemporary group, cow age at calving as a covariable (linear and quadratic effects), and the herd lactation curve. The additive genetic and permanent environmental effects were modeled by the Wilmink function, a modified Wilmink function (with the second term divided by 100), a function that combined third-order Legendre polynomials with the last term of the Wilmink function, and the Ali and Schaeffer function. The residual variances were modeled by means of 1, 4, 6, or 10 heterogeneous classes, with the exception of the last term of the Wilmink function, for which there were 1, from 0.20 to 0.33. Genetic correlations between adjacent records were high (0.83-0.99), but they declined as the interval between test-day records increased, and were negative between the first and last records. The model employing the Ali and Schaeffer function with six residual variance classes was the most suitable for fitting the data. © FUNPEC-RP.
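The Wilmink and Ali and Schaeffer functions enter a random regression model as covariates evaluated at days in milk t. A minimal sketch follows, assuming the commonly cited forms a + b*t + c*exp(-k*t) with k = 0.05 for Wilmink, and b0 + b1*g + b2*g^2 + b3*w + b4*w^2 with g = t/305 and w = ln(305/t) for Ali and Schaeffer. The constant k, the 305-day lactation length and all function names are illustrative assumptions, not values reported in the study.

```python
import math


def wilmink_covariates(dim, k=0.05, linear_divisor=1.0):
    """Covariates [1, t, exp(-k*t)] of the Wilmink function at day in milk t.

    linear_divisor=100 gives the modified form in which the second (linear)
    term is divided by 100. k=0.05 is an assumed decay constant.
    """
    return [1.0, dim / linear_divisor, math.exp(-k * dim)]


def ali_schaeffer_covariates(dim, length=305.0):
    """Covariates [1, g, g^2, w, w^2] with g = t/305 and w = ln(305/t)."""
    g = dim / length
    w = math.log(length / dim)  # undefined at t = 0, so t must be positive
    return [1.0, g, g * g, w, w * w]


def deviation(covariates, coefficients):
    """An animal's random deviation on day t: covariate row times its coefficients."""
    return sum(z * b for z, b in zip(covariates, coefficients))


if __name__ == "__main__":
    print(wilmink_covariates(30))
    print(ali_schaeffer_covariates(30))
```

Stacking one such row per test-day record gives the incidence matrix that links each animal's random regression coefficients to its records.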
Abstract:
This study aimed to predict structural parameters of macrobenthic assemblages (species composition, abundance, richness, diversity and evenness) in estuaries of southern Brazil, using models based on environmental data (sediment characteristics, salinity, air and water temperatures, and depth). Sampling was carried out seasonally in five estuaries between winter 1996 and summer 1998. In each estuary, samples were collected in unpolluted areas with similar characteristics regarding the presence or absence of vegetation, depth and distance from the estuary mouth. Two methods were used to obtain the prediction models: the first based on Multiple Discriminant Analysis (MDA) and the second on Multiple Linear Regression (MLR). The MDA-based models gave better results than those based on linear regression. The best MLR results were obtained for diversity and richness. It can therefore be concluded that models such as those derived here may be very useful tools in environmental monitoring studies of estuaries.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Given the importance of Guzera breeding programs for milk production in the tropics, the objective of this study was to compare alternative random regression models for the estimation of genetic parameters and the prediction of breeding values. Test-day milk yield records (TDR) were collected monthly, up to a maximum of 10 measurements. The database included 20,524 first-lactation records from 2816 Guzera cows. TDR data were analyzed by random regression models (RRM) with additive genetic, permanent environmental and residual effects as random, and with contemporary group (CG), calving age as a covariate (linear and quadratic effects) and the mean lactation curve as fixed. The additive genetic and permanent environmental effects were modeled by RRM using the Wilmink, Ali and Schaeffer and cubic B-spline functions as well as Legendre polynomials. Residual variances were considered as heterogeneous classes, grouped differently according to the model used. A multi-trait analysis using finite-dimensional models (FDM) for test-day milk records (TDR) and a conventional single-trait model for 305-day milk yield, both fitted by restricted maximum likelihood, were also carried out for comparison. According to the statistical criteria adopted, the best RRM was the one using the cubic B-spline function with five random regression coefficients for the additive genetic and permanent environmental effects. However, the models using the Ali and Schaeffer function or Legendre polynomials of second and fifth order for the additive genetic and permanent environmental effects, respectively, can be adopted, since their genetic parameter estimates varied little from those of the B-spline models. Therefore, because of the lower complexity of the (co)variance estimation, the model using Legendre polynomials was the best option for the genetic evaluation of Guzera lactation records.
Accuracy of the estimated breeding values increased by 3.6% when RRM were used. Animal rankings were very similar regardless of the RRM used to predict breeding values. For P305, results indicated only small to medium differences in animal rankings based on breeding values predicted by the conventional model or by RRM. Therefore, the sum of the RRM-predicted breeding values over the lactation period (RRM305) can be used as a selection criterion for 305-day milk production. (c) 2014 Elsevier B.V. All rights reserved.
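Legendre polynomials are usually applied after days in milk are rescaled to [-1, 1] and each polynomial is normalized. The sketch below builds that covariate matrix with NumPy's legvander. The 5 to 305 day range and the normalization constant sqrt((2j + 1)/2) are common conventions assumed here rather than details taken from the abstract, and the function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_covariates(dim, order, dim_min=5.0, dim_max=305.0):
    """Normalized Legendre covariates for days in milk (DIM).

    Returns an array of shape (len(dim), order + 1); column j holds
    sqrt((2j + 1) / 2) * P_j(x) with DIM standardized to x in [-1, 1].
    dim_min and dim_max are assumed lactation limits.
    """
    dim = np.asarray(dim, dtype=float)
    # Standardize DIM to [-1, 1], the domain of the Legendre polynomials.
    x = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0
    # Pseudo-Vandermonde matrix with columns P_0(x), ..., P_order(x).
    p = legendre.legvander(x, order)
    # Scale each column so the polynomial has unit norm on [-1, 1].
    norms = np.sqrt((2.0 * np.arange(order + 1) + 1.0) / 2.0)
    return p * norms


if __name__ == "__main__":
    # Second-order covariates at the start, middle and end of lactation.
    print(legendre_covariates([5, 155, 305], order=2))
```

A higher order adds columns (and random coefficients per animal), which is the complexity trade-off the abstract weighs against the B-spline models.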
Abstract:
We consider model selection uncertainty in linear regression. We study, theoretically and by simulation, the approach of Buckland and co-workers, who proposed estimating a parameter common to all models under study by taking a weighted average over the models, using weights obtained from information criteria or the bootstrap. This approach is compared with the usual approach, in which the 'best' model is used, and with Bayesian model averaging. The weighted predictor behaves similarly to model averaging and generally has more realistic mean squared errors than the usual model-selection-based estimator.
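One widely used choice of information-criterion weights is the Akaike weight, w_i = exp(-D_i/2) / sum_j exp(-D_j/2), where D_i is model i's AIC minus the smallest AIC. The sketch assumes Gaussian linear models fitted by ordinary least squares and averages their predictions at a new point. The bootstrap weights and the Bayesian model averaging mentioned in the abstract are not shown, and the helper names are hypothetical.

```python
import numpy as np


def akaike_weights(aic):
    """Akaike weights exp(-delta/2), normalized to sum to 1."""
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()  # subtract the minimum for numerical stability
    w = np.exp(-0.5 * delta)
    return w / w.sum()


def gaussian_aic(y, X):
    """AIC, up to an additive constant, of an OLS fit with Gaussian errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # p regression coefficients plus one error variance
    return n * np.log(rss / n) + 2 * (p + 1), beta


def model_average(y, designs, new_rows):
    """Weighted average of each model's prediction at its own new covariate row."""
    aics, preds = [], []
    for X, x_new in zip(designs, new_rows):
        aic, beta = gaussian_aic(y, X)
        aics.append(aic)
        preds.append(float(x_new @ beta))
    w = akaike_weights(aics)
    return float(np.dot(w, preds)), w, preds


if __name__ == "__main__":
    # Simulated data: an intercept-only model versus a straight line.
    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    y = 1.0 + 2.0 * x + rng.normal(size=50)
    designs = [np.ones((50, 1)), np.column_stack([np.ones(50), x])]
    new_rows = [np.array([1.0]), np.array([1.0, 1.0])]  # predict at x = 1
    print(model_average(y, designs, new_rows))
```

Unlike selecting only the lowest-AIC model, the averaged prediction keeps a contribution from every candidate in proportion to its weight.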
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
An extension of some standard likelihood-based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. This novel class of models provides a useful generalization of the heteroscedastic symmetrical nonlinear regression models (Cysneiros et al., 2010), since the error distributions cover symmetric as well as asymmetric and heavy-tailed distributions such as the skew-t, skew-slash and skew-contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters is presented, and the observed information matrix is derived analytically. To examine the performance of the proposed methods, simulation studies are presented showing the robustness of this flexible class against outlying and influential observations, and showing that the maximum likelihood estimates obtained with the EM-type algorithm have good asymptotic properties. Furthermore, local influence measures and one-step approximations of the estimates in the case-deletion model are obtained. Finally, the methodology is illustrated with a data set previously analyzed under the homoscedastic skew-t nonlinear regression model. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Local-scale estimates of evapotranspiration are important information for agricultural and hydrological practices. However, equations that estimate potential evapotranspiration from temperature data alone, although simple to use, are usually less reliable than the Food and Agriculture Organization (FAO) Penman-Monteith standard method. The present work describes two correction procedures that make temperature-based potential evapotranspiration estimates more reliable. First, the standard FAO Penman-Monteith method was evaluated with a complete climatological data set for the period 2002-2006. Then, temperature-based estimates from the Camargo and Jensen-Haise methods were adjusted by error autocorrelation evaluated over biweekly and monthly periods. In a second adjustment, simple linear regression was applied. The adjusted equations were validated with climatic data available for 2001. Both proposed procedures showed good agreement with the standard method, indicating that they can be used for local potential evapotranspiration estimates.
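The second adjustment, a simple linear regression of the standard-method values on the temperature-based estimates, can be sketched as below. It assumes paired values from a calibration period are available and that the fitted line is then applied to new temperature-based estimates. The function names and the numbers in the usage lines are hypothetical, RMSE is only one possible agreement measure, and the error-autocorrelation adjustment is not shown.

```python
import numpy as np


def fit_linear_correction(et_temp, et_pm):
    """Least-squares fit et_pm ~ a + b * et_temp; returns (a, b)."""
    b, a = np.polyfit(np.asarray(et_temp, float), np.asarray(et_pm, float), 1)
    return a, b


def apply_correction(et_temp, a, b):
    """Corrected temperature-based estimates a + b * et_temp."""
    return a + b * np.asarray(et_temp, float)


def rmse(estimated, reference):
    """Root mean square error between corrected and standard-method values."""
    diff = np.asarray(estimated, float) - np.asarray(reference, float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Hypothetical calibration pairs (mm/day): temperature-based vs Penman-Monteith.
    et_temp = [3.0, 3.5, 4.0, 4.5, 5.0]
    et_pm = [3.2, 3.65, 4.1, 4.55, 5.0]
    a, b = fit_linear_correction(et_temp, et_pm)
    print(a, b, apply_correction([4.2], a, b))
```

Following the design described in the abstract, the coefficients would be fitted on the 2002-2006 data and checked on 2001.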
Abstract:
The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To ease such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretest-posttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, such as a nonconstant coefficient of variation, may be handled in the three approaches, and we evaluate their performance with respect to the magnitude of the standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare marginal expected responses.
Abstract:
The objective of this paper is to model variation in test-day milk yields of first lactations of Holstein cows by random regression (RR) using B-spline functions and Bayesian inference, in order to fit adequate and parsimonious models for the estimation of genetic parameters. The authors used 152,145 test-day milk yield records from 7317 first lactations of Holstein cows. The model included additive genetic, permanent environmental and residual random effects. In addition, contemporary group and the linear and quadratic effects of cow age at calving were included as fixed effects. The average lactation curve of the population was modeled with a fourth-order orthogonal Legendre polynomial. The authors concluded that a cubic B-spline with seven random regression coefficients for both the additive genetic and permanent environmental effects was the best model according to residual mean square and residual variance estimates. They also suggested that a lower-order model (a quadratic B-spline with seven random regression coefficients for both random effects) could be adopted, because it yielded practically the same genetic parameter estimates with greater parsimony. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Because the likelihood has no closed form, GLMMs are often fitted by computational procedures such as penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fitted using algorithms such as iteratively weighted least squares (IWLS). High computational costs and memory constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases. This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsets of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because of their special collapsibility property, which allows data reduction through summaries. Convergence of the proposed iterative procedure is guaranteed for canonical link functions. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. A simulation study demonstrates the algorithm's reliability in analyzing a data set with 12 million records for a (non-collapsible) logistic regression model.
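The collapsibility of the Poisson model can be checked numerically. For records that share the same covariate pattern, the log-likelihood of the rate depends on the data only through the total count and the total exposure, so those records can be replaced by one summary row. The sketch below assumes a single constant rate and uses hypothetical function names. It illustrates the data-reduction idea only, not the paper's Gauss-Seidel procedure.

```python
import math


def poisson_loglik(log_rate, counts, exposures):
    """Log-likelihood of counts y_i ~ Poisson(exp(log_rate) * t_i),
    omitting the log(y_i!) terms, which do not involve the rate."""
    rate = math.exp(log_rate)
    return sum(y * (log_rate + math.log(t)) - rate * t
               for y, t in zip(counts, exposures))


def collapse(counts, exposures):
    """Replace records sharing one covariate pattern by a single summary row."""
    return [sum(counts)], [sum(exposures)]


if __name__ == "__main__":
    counts, exposures = [0, 2, 1, 3], [1.0, 2.5, 0.5, 4.0]
    c_counts, c_exp = collapse(counts, exposures)
    for log_rate in (-1.0, 0.0, 0.7):
        full = poisson_loglik(log_rate, counts, exposures)
        summary = poisson_loglik(log_rate, c_counts, c_exp)
        # The difference is the same for every rate, so both give the same MLE.
        print(log_rate, full - summary)
```

Because the gap between the two log-likelihoods does not involve the rate, the maximum likelihood estimate (total count divided by total exposure) is unchanged by collapsing.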
Abstract:
This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. Specifically, we propose a Bayesian Poisson regression model that is more flexible, more scalable to larger studies, and easier to fit than previous approaches in the literature. We further use hierarchical random effects to account for pairings of individuals and for repeated measures within those individuals, since comparing diseased with non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially nonparametric piecewise constant hazards and smooth them, and we allow for time-varying covariates and comparisons between segments of the night. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence between Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.
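The likelihood equivalence can be verified directly. With d_j events and total time at risk T_j in time piece j, the piecewise-constant-hazard log-likelihood sum_j [d_j log(lambda_j) - lambda_j T_j] differs from the Poisson log-likelihood with offset log(T_j) only by a term that does not involve lambda_j. Both are therefore maximized at lambda_j = d_j / T_j. The following is a minimal sketch with hypothetical function names that ignores covariates, smoothing and random effects.

```python
import math


def piecewise_exp_loglik(rates, events, exposure):
    """Survival log-likelihood with constant hazard rates[j] on time piece j."""
    return sum(d * math.log(lam) - lam * t
               for lam, d, t in zip(rates, events, exposure))


def poisson_offset_loglik(rates, events, exposure):
    """Poisson log-likelihood for counts d_j with log(mu_j) = log(rate_j) + log(T_j)."""
    total = 0.0
    for lam, d, t in zip(rates, events, exposure):
        log_mu = math.log(lam) + math.log(t)  # log(T_j) enters as an offset
        total += d * log_mu - math.exp(log_mu) - math.lgamma(d + 1)
    return total


if __name__ == "__main__":
    events, exposure = [3, 5, 2], [10.0, 20.0, 5.0]
    for rates in ([0.1, 0.2, 0.3], [0.5, 0.05, 0.4]):
        gap = (poisson_offset_loglik(rates, events, exposure)
               - piecewise_exp_loglik(rates, events, exposure))
        print(rates, gap)  # the gap does not depend on the rates
```

The constant gap equals sum_j [d_j log(T_j) - log(d_j!)], which is why software for Poisson regression with an offset can be used to fit piecewise constant hazard models.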
Abstract:
Ordinal logistic regression models, used to analyze dependent variables with multiple outcomes that can be ranked, have been underutilized. In this study, we describe four logistic regression models for analyzing an ordinal response variable.
In this methodological study, four regression models are proposed. The first is the multinomial logistic model, the second the adjacent-category logit model, the third the proportional odds model, and the fourth the continuation-ratio model. We illustrate and compare the fit of these models using data from a survey designed by the University of Texas School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients' confidence in completing colorectal cancer screening (CRCS).
The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with an ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) a sensitivity analysis by fitting and comparing different models.
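Three of the four models (adjacent-category, proportional odds and continuation-ratio) differ mainly in which probabilities they place on the logit scale. The sketch computes category probabilities for a given linear predictor eta = x'beta. The sign conventions used here, with thresholds minus eta for proportional odds and continuation-ratio and alphas plus eta for adjacent categories, vary between references and are assumptions, as are the function names. Model fitting, goodness-of-fit tests and the multinomial model are not shown.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(thresholds, eta):
    """P(Y = j) when logit P(Y <= j) = thresholds[j] - eta (thresholds increasing)."""
    cum = [logistic(th - eta) for th in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def continuation_ratio_probs(alphas, eta):
    """P(Y = j) when logit P(Y = j | Y >= j) = alphas[j] - eta."""
    probs, remaining = [], 1.0
    for a in alphas:
        h = logistic(a - eta)  # probability of stopping at category j
        probs.append(remaining * h)
        remaining *= 1.0 - h
    probs.append(remaining)  # the last category takes what is left
    return probs


def adjacent_category_probs(alphas, eta):
    """P(Y = j) when log(P(Y = j + 1) / P(Y = j)) = alphas[j] + eta."""
    log_unnorm = [0.0]
    for a in alphas:
        log_unnorm.append(log_unnorm[-1] + a + eta)
    m = max(log_unnorm)  # subtract the maximum before exponentiating
    w = [math.exp(v - m) for v in log_unnorm]
    s = sum(w)
    return [v / s for v in w]


if __name__ == "__main__":
    eta = 0.3  # hypothetical linear predictor for one respondent
    print(proportional_odds_probs([-1.0, 0.5, 2.0], eta))
    print(continuation_ratio_probs([0.0, 0.5, 1.0], eta))
    print(adjacent_category_probs([0.2, -0.1, 0.4], eta))
```

Each function returns one probability per response category, so comparing them for the same respondent shows how the choice of model changes the predicted distribution.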