883 results for random coefficient regression model
Abstract:
The Brazilian Association of Simmental and Simbrasil Cattle Farmers provided 29,510 records from 10,659 Simmental beef cattle; these were used to estimate (co)variance components and genetic parameters for weights along the growth trajectory, based on multi-trait models (MTM) and random regression models (RRM). The (co)variance components and genetic parameters were estimated by restricted maximum likelihood. In the MTM analysis, the likelihood ratio test was used to determine the significance of the random effects included in the model and to define the most appropriate model; all random effects were significant and were retained in the final model. In the RRM analysis, fits with different polynomial orders were compared under 5 criteria to choose the best-fitting model. An RRM of third order for the direct additive genetic, direct permanent environmental, maternal additive genetic, and maternal permanent environmental effects was sufficient to model the variance structure along the growth trajectory of the animals. The (co)variance components were generally similar between MTM and RRM. Direct heritabilities from MTM were slightly lower than those from RRM, ranging from 0.04 to 0.42 and from 0.16 to 0.45, respectively. Direct additive correlations were mostly positive and of high magnitude, being highest between the closest ages. Considering these results, and because pre-adjustment of weights to standard ages is not required, RRM is recommended for the genetic evaluation of Simmental beef cattle in Brazil. ©FUNPEC-RP.
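A generic third-order random regression model of the kind compared above is often written with Legendre polynomials of age as covariates; the formulation below is a sketch in assumed notation, not the authors' exact specification:

y_{ij} = F_{ij} + \sum_{m=0}^{2}\beta_m\,\phi_m(t_{ij}) + \sum_{m=0}^{2} a_{im}\,\phi_m(t_{ij}) + \sum_{m=0}^{2} p_{im}\,\phi_m(t_{ij}) + \sum_{m=0}^{2} g_{dm}\,\phi_m(t_{ij}) + \sum_{m=0}^{2} q_{dm}\,\phi_m(t_{ij}) + \varepsilon_{ij}

where y_{ij} is the j-th weight record of animal i, \phi_m is the m-th Legendre polynomial of standardized age t_{ij}, F_{ij} collects the fixed effects, a_{im} and p_{im} are the direct additive genetic and direct permanent environmental regression coefficients of animal i, g_{dm} and q_{dm} are the maternal additive genetic and maternal permanent environmental coefficients of dam d, and \varepsilon_{ij} is the residual.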
Abstract:
In this paper, we propose a random intercept Poisson model in which the random effect is assumed to follow a generalized log-gamma (GLG) distribution. This random effect accommodates the overdispersion in the counts and induces within-cluster correlation. We derive the first two moments of the marginal distribution as well as the intraclass correlation. Although numerical integration methods are, in general, required to derive the marginal models, we obtain the multivariate negative binomial model from a particular parameter setting of the hierarchical model. An iterative process is derived for obtaining the maximum likelihood estimates of the parameters in the multivariate negative binomial model. Residual analysis is proposed, and two applications with real data are given for illustration. (C) 2011 Elsevier B.V. All rights reserved.
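A minimal sketch of this kind of hierarchy, in notation assumed here rather than taken from the paper:

y_{ij} \mid b_i \sim \mathrm{Poisson}(\mu_{ij}), \qquad \log\mu_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i, \qquad b_i \sim \mathrm{GLG},

so that \exp(b_i) acts as a multiplicative frailty shared within cluster i. When the GLG parameters are set so that \exp(b_i) is gamma distributed, integrating out b_i yields the multivariate negative binomial marginal mentioned in the abstract.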
Abstract:
Assessing the fit of a model is an important final step in any statistical analysis, but this is not straightforward when complex discrete response models are used. Cross validation and posterior predictions have been suggested as methods to aid model criticism. In this paper, a comparison is made between four methods of predictive model assessment in the context of a three-level logistic regression model for clinical mastitis in dairy cattle: cross validation, prediction using the full posterior predictive distribution, and two "mixed" predictive methods that incorporate higher-level random effects simulated from the underlying model distribution. Cross validation is considered a gold standard method but is computationally intensive, so the posterior predictive assessments were compared against it. The analyses revealed that the mixed prediction methods produced results close to cross validation, whereas the full posterior predictive assessment gave predictions that were over-optimistic (closer to the observed disease rates) compared with cross validation. A mixed prediction method that simulated random effects from both higher levels was best at identifying the outlying level-two (farm-year) units of interest. It is concluded that this mixed prediction method, simulating random effects from both higher levels, is straightforward and may be of value in model criticism of multilevel logistic regression, a technique commonly used for animal health data with a hierarchical structure.
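A minimal sketch of the "mixed" prediction idea for a logistic model with one higher level, assuming posterior draws of the fixed effects and of the random-effect standard deviation are already available (all names below are hypothetical):

import numpy as np

rng = np.random.default_rng(0)

# beta_draws: (S, p) posterior draws of fixed effects; sigma_u_draws: (S,)
# posterior draws of the random-effect SD; X: (n, p) design matrix;
# cluster: (n,) integer index of the higher-level unit for each record.
def mixed_predict(beta_draws, sigma_u_draws, X, cluster):
    S = beta_draws.shape[0]
    n_clusters = int(cluster.max()) + 1
    probs = np.empty((S, X.shape[0]))
    for s in range(S):
        # Simulate a *new* random effect for every cluster from its fitted
        # distribution, instead of reusing the estimated cluster effects.
        u_new = rng.normal(0.0, sigma_u_draws[s], size=n_clusters)
        eta = X @ beta_draws[s] + u_new[cluster]
        probs[s] = 1.0 / (1.0 + np.exp(-eta))
    return probs.mean(axis=0)  # posterior-mean mixed predictive probability

Comparing these probabilities with the observed outcomes gives the mixed predictive assessment; for the three-level model in the paper, random effects would be simulated at both higher levels.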
Abstract:
In the study of traffic safety, expected crash frequencies across sites are generally estimated via the negative binomial model, assuming time-invariant safety. Since the time-invariant safety assumption may be invalid, Hauer (1997) proposed a modified empirical Bayes (EB) method. Despite the modification, no attempts have been made to examine the generalisable form of the marginal distribution resulting from the modified EB framework. Because the hyper-parameters needed to apply the modified EB method are not readily available, an assessment is lacking of how accurately the modified EB method estimates safety in the presence of time-variant safety and regression-to-the-mean (RTM) effects. This study derives the closed-form marginal distribution and reveals that the marginal distribution in the modified EB method is equivalent to the negative multinomial (NM) distribution, which is essentially the same as the likelihood function used in the random effects Poisson model. As a result, this study shows that the gamma posterior distribution from the multivariate Poisson-gamma mixture can be estimated using the NM model or the random effects Poisson model. This study also shows that the estimation errors from the modified EB method are systematically smaller than those from the comparison group method, by simultaneously accounting for the RTM and time-variant safety effects. Hence, the modified EB method via the NM model is a generalisable method for estimating safety in the presence of time-variant safety and RTM effects.
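Under one common parameterization (notation assumed here), the marginal distribution arises as follows: if the yearly counts of site i satisfy

y_{it} \mid \theta_i \sim \mathrm{Poisson}(\theta_i\,\mu_{it}) \ \text{independently over } t, \qquad \theta_i \sim \mathrm{Gamma}(\phi,\phi),

then integrating out \theta_i gives

P(y_{i1},\dots,y_{iT}) = \frac{\Gamma\!\left(\phi+\sum_t y_{it}\right)}{\Gamma(\phi)\prod_t y_{it}!}\left(\frac{\phi}{\phi+\sum_t\mu_{it}}\right)^{\phi}\prod_t\left(\frac{\mu_{it}}{\phi+\sum_s\mu_{is}}\right)^{y_{it}},

the negative multinomial likelihood, which is also the per-site contribution to the random effects Poisson likelihood.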
Abstract:
The Poisson distribution has often been used for count data such as accident counts. The negative binomial (NB) distribution has been adopted for such count data to deal with the over-dispersion problem. However, the Poisson and NB distributions are incapable of accounting for unobserved heterogeneity arising from spatial and temporal effects in accident data. To overcome this problem, random effect models have been developed. Another challenge with existing traffic accident prediction models is the excess of zero accident observations in some accident data. Although the zero-inflated Poisson (ZIP) model is capable of handling the dual-state system in accident data with excess zero observations, it does not accommodate the within-location and between-location correlation heterogeneities, which are the basic motivation for random effect models. This paper proposes an effective way of fitting a ZIP model with location-specific random effects; Bayesian analysis is recommended for model calibration and assessment.
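A generic ZIP specification with a location-specific random effect (notation assumed, not taken from the paper) is

P(y_{it}=0) = \pi_{it} + (1-\pi_{it})\,e^{-\lambda_{it}}, \qquad P(y_{it}=k) = (1-\pi_{it})\,\frac{e^{-\lambda_{it}}\lambda_{it}^{k}}{k!}, \quad k \ge 1,

with \log\lambda_{it} = \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + u_i and, if desired, \mathrm{logit}\,\pi_{it} = \mathbf{z}_{it}^{\top}\boldsymbol{\gamma} + v_i, where u_i and v_i are location-specific random effects (e.g. normal) whose variances receive priors in the Bayesian analysis.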
Abstract:
The along-track stereo images of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) sensor, with 15 m resolution, were used to generate a Digital Elevation Model (DEM) of an area with low, near Mean Sea Level (MSL) elevation in Johor, Malaysia. The absolute DEM was generated using the Rational Polynomial Coefficient (RPC) model, run in ENVI 4.8 software. To generate the absolute DEM, 60 Ground Control Points (GCPs), with vertical accuracy of approximately 10 meters or better, were extracted from the topographic map of the study area. The assessment was carried out on the uncorrected and corrected DEMs using dozens of Independent Check Points (ICPs). The uncorrected DEM showed an RMSEz of ±26.43 meters, which decreased to an RMSEz of ±16.49 meters for the corrected DEM after post-processing. Overall, the corrected DEM from the ASTER stereo images met expectations.
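The vertical accuracy figures quoted above are root-mean-square errors of the DEM elevations against the check points; a minimal sketch of that computation, with hypothetical array names:

import numpy as np

# dem_z and icp_z: elevations (in meters) of the DEM and of the Independent
# Check Points (ICPs) at the same locations (hypothetical arrays).
def rmse_z(dem_z, icp_z):
    diff = np.asarray(dem_z, dtype=float) - np.asarray(icp_z, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))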
Abstract:
Submarine groundwater discharge (SGD) is an integral part of the hydrological cycle and represents an important aspect of land-ocean interactions. We used a numerical model to simulate flow and salt transport in a nearshore groundwater aquifer under varying wave conditions, based on yearlong random wave data sets including storm surge events. The results showed significant flow asymmetry, with rapid response of influxes and retarded response of effluxes across the seabed to the irregular wave conditions. While a storm surge immediately intensified seawater influx to the aquifer, the subsequent return of intruded seawater to the sea, as part of an increased SGD, was gradual. Using functional data analysis, we revealed and quantified retarded, cumulative effects of past wave conditions on SGD, including the fresh groundwater and recirculating seawater discharge components. The retardation was characterized well by a gamma distribution function regardless of wave conditions. The relationships between discharge rates and wave parameters were quantifiable by a regression model in a functional form independent of the actual irregular wave conditions. This statistical model provides a useful method for analyzing and predicting SGD from nearshore unconfined aquifers affected by random waves.
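A functional regression of the kind described can be sketched (notation assumed here; the exact form used in the study is not given in the abstract) as

Q(t) = \beta_0 + \int_0^{\tau}\beta(s)\,W(t-s)\,ds + \varepsilon(t), \qquad \beta(s) \propto \frac{s^{k-1}e^{-s/\theta}}{\Gamma(k)\,\theta^{k}},

where Q(t) is a discharge component of SGD at time t, W is a wave-condition parameter (e.g. wave height), and the gamma-shaped weight function \beta(s) captures the retarded, cumulative influence of wave conditions s time units in the past.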
Abstract:
This article is motivated by a lung cancer study in which a regression model is involved and the response variable is too expensive to measure, whereas the predictor variable can be measured easily at relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and environmental studies. In this article, using the idea of ranked-set sampling (RSS), we develop sampling strategies that can reduce cost and increase the efficiency of regression analysis in the situation described above. The developed method is applied retrospectively to a lung cancer study in which the interest is to investigate the association between smoking status and three biomarkers: polyphenol DNA adducts, micronuclei, and sister chromatid exchanges. Optimal sampling schemes with different optimality criteria, such as A-, D-, and integrated mean square error (IMSE)-optimality, are considered in the application. With a set size of 10 in RSS, the improvement of the optimal schemes over simple random sampling (SRS) is substantial. For instance, using the optimal scheme with IMSE-optimality, the IMSEs of the estimated regression functions for the three biomarkers are reduced to about half of those incurred using SRS.
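A minimal sketch of balanced ranked-set sampling when ranking is done on the cheaply measured predictor (function and variable names are hypothetical, and the optimal allocation schemes discussed above are not implemented here):

import numpy as np

rng = np.random.default_rng(1)

def ranked_set_sample(x_pool, set_size=10, cycles=5):
    # In each cycle, draw `set_size` sets of `set_size` units, rank each set
    # by the cheap predictor x, and keep the unit of rank r from the r-th
    # set; only the kept units have the expensive response measured.
    selected = []
    for _ in range(cycles):
        for r in range(set_size):
            idx = rng.choice(len(x_pool), size=set_size, replace=False)
            ranked = idx[np.argsort(x_pool[idx])]
            selected.append(ranked[r])
    return np.array(selected)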
Abstract:
Semisupervised dimensionality reduction has been attracting much attention, as it not only utilizes both labeled and unlabeled data simultaneously but also works well in out-of-sample situations. This paper proposes an effective approach to semisupervised dimensionality reduction through label propagation and label regression. Different from previous efforts, the new approach propagates the label information from labeled to unlabeled data with a well-designed mechanism of random walks, in which outliers are effectively detected and the obtained virtual labels of unlabeled data can be well encoded in a weighted regression model. These virtual labels are then regressed with a linear model to calculate the projection matrix for dimensionality reduction. In this way, when the manifold or the clustering assumption of the data is satisfied, the labels of labeled data can be correctly propagated to the unlabeled data; thus, the proposed approach utilizes the labeled and unlabeled data more effectively than previous work. Experiments are carried out on several databases, and the advantage of the new approach is well demonstrated.
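A generic sketch of the two stages (random-walk label propagation followed by a linear regression that yields the projection matrix); the graph construction, outlier detection, and weighting scheme of the paper are not reproduced here:

import numpy as np

def propagate_and_regress(X, Y_labeled, labeled_idx, n_neighbors=10,
                          alpha=0.99, ridge=1e-3):
    # X: (n, d) data matrix; Y_labeled: (l, c) one-hot labels of labeled_idx.
    n = X.shape[0]
    # Gaussian affinity matrix restricted to k nearest neighbors.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    sigma2 = np.median(d2[d2 > 0])
    W = np.exp(-d2 / sigma2)
    np.fill_diagonal(W, 0.0)
    mask = np.zeros_like(W, dtype=bool)
    np.put_along_axis(mask, np.argsort(-W, axis=1)[:, :n_neighbors], True, axis=1)
    W = np.where(mask | mask.T, W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)            # random-walk transition matrix
    Y0 = np.zeros((n, Y_labeled.shape[1]))
    Y0[labeled_idx] = Y_labeled
    F = np.linalg.solve(np.eye(n) - alpha * P, Y0)  # propagated virtual labels
    # Ridge regression of the virtual labels on X gives the projection matrix.
    W_proj = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ F)
    return X @ W_proj                               # low-dimensional embedding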
Abstract:
In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a log-normal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples. Copyright 2012 by the author(s)/owner(s).
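In outline (notation assumed here), the hierarchy has the form

y_i \sim \mathrm{NB}(r, p_i), \qquad \mathrm{logit}(p_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2), \qquad r \sim \mathrm{Gamma}(a_0, b_0),

so the gamma prior on the dispersion r and the log-normal prior linked to \mathrm{logit}(p_i) supply the two kinds of random effects; the compound Poisson representation gives conjugate updates for r, and Polya-Gamma data augmentation gives conjugate updates for \boldsymbol{\beta} and \varepsilon_i.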
Abstract:
Discrete Conditional Phase-type (DC-Ph) models are a family of models that represent skewed survival data conditioned on specific inter-related discrete variables. The survival data are modeled using a Coxian phase-type distribution, which is associated with the inter-related variables using a range of possible data-mining approaches such as Bayesian networks (BNs), the Naïve Bayes classification method, and classification and regression trees. This paper utilizes the DC-Ph model to explore the modeling of patient waiting times in an Accident and Emergency Department of a UK hospital. The resulting DC-Ph model takes the form of a Coxian phase-type distribution conditioned on the outcome of a logistic regression model.
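For reference, a Coxian phase-type distribution with n transient phases (generic form, not the specific model fitted in the paper) has density f(t) = \boldsymbol{\alpha}\,e^{\mathbf{T}t}\,\mathbf{t}_0, with

\boldsymbol{\alpha} = (1,0,\dots,0), \qquad \mathbf{T} = \begin{pmatrix} -(\lambda_1+\mu_1) & \lambda_1 & & \\ & -(\lambda_2+\mu_2) & \lambda_2 & \\ & & \ddots & \ddots \\ & & & -\mu_n \end{pmatrix}, \qquad \mathbf{t}_0 = -\mathbf{T}\mathbf{1},

where \lambda_i is the rate of moving from phase i to phase i+1 and \mu_i the rate of absorption (e.g. the patient completing the wait) from phase i; in the DC-Ph model these phase-type parameters are conditioned on the groups produced by the logistic regression.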
Abstract:
This project focuses on the study of different explanatory models for the behavior of CDS securities, such as the fixed-effects model, the GLS random-effects model, pooled OLS, and the quantile regression model. After determining the best-fitting model, trading strategies with long and short positions in CDS were developed. Due to some specific features of CDS, I conclude that quantile regression is the most efficient model for estimating the data. The P&L and Sharpe ratio of the strategy are analyzed using a backtesting analogy, from which I conclude that, mainly for non-financial companies, the model allows traders to identify and profit from arbitrage opportunities.
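A minimal sketch of fitting a quantile regression to CDS data with statsmodels (the data frame, column names, and chosen quantile below are placeholders, not the variables used in the project):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
# Hypothetical cross-section of CDS spreads and candidate explanatory variables.
df = pd.DataFrame({
    "spread": rng.lognormal(4.5, 0.3, 500),
    "equity_vol": rng.uniform(0.1, 0.6, 500),
    "leverage": rng.uniform(0.1, 0.9, 500),
})
X = sm.add_constant(df[["equity_vol", "leverage"]])
fit = sm.QuantReg(df["spread"], X).fit(q=0.5)  # median regression
print(fit.summary())                           # repeat with other q values for other quantiles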
Abstract:
Several authors have recently discussed the limited dependent variable regression model with serial correlation between residuals. The pseudo-maximum likelihood estimators obtained by ignoring serial correlation altogether have been shown to be consistent. We present alternative pseudo-maximum likelihood estimators which are obtained by ignoring serial correlation only selectively. Monte Carlo experiments on a model with first-order serial correlation suggest that our alternative estimators have substantially lower mean-squared errors in medium-size and small samples, especially when the serial correlation coefficient is high. The same experiments also suggest that the true level of the confidence intervals established with our estimators by assuming asymptotic normality is somewhat lower than the intended level. Although the paper focuses on models with only first-order serial correlation, the generalization of the proposed approach to serial correlation of higher order is also discussed briefly.
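One concrete instance of this setup is a censored (Tobit-type) regression with AR(1) errors, written here in assumed notation:

y_t^{*} = \mathbf{x}_t^{\top}\boldsymbol{\beta} + u_t, \qquad u_t = \rho\,u_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2), \qquad y_t = \max(0,\,y_t^{*}),

where the pseudo-maximum likelihood estimator that ignores serial correlation altogether maximizes the product of the marginal (\rho = 0) censored-normal likelihoods, while the selective alternatives described above retain the correlation in only part of the likelihood.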
Abstract:
The influence of inbreeding on daily milk yield (DMY), age at first calving (AFC), and calving intervals (CI) was determined in a highly inbred zebu dairy subpopulation of the Guzerat breed. Variance components were estimated using animal models in single-trait analyses. Two approaches were employed to estimate inbreeding depression: using individual increases in inbreeding coefficients, or using inbreeding coefficients themselves as covariates in the statistical models. The pedigree file included 9,915 animals, of which 9,055 were inbred, with an average inbreeding coefficient of 15.2%. The maximum inbreeding coefficient observed was 49.45%, and the average inbreeding of the females still in the herd at the time of the analysis was 26.42%. Heritability estimates were 0.27 for DMY and 0.38 for AFC. The genetic variance ratio estimated with the random regression model for CI was around 0.10. Increased inbreeding caused poorer performance in DMY, AFC, and CI. However, some of the cows with the highest milk yield were among the most inbred animals in this subpopulation. The individual increase in inbreeding, used as a covariate in the statistical models, accounted for inbreeding depression while avoiding the overestimation that may result from fitting inbreeding coefficients.
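The individual increase in inbreeding used as a covariate is commonly computed as

\Delta F_i = 1 - (1 - F_i)^{1/(t_i - 1)},

where F_i is the inbreeding coefficient of animal i and t_i its equivalent number of complete generations; a single-trait animal model including it as a fixed covariate can be sketched (assumed notation) as \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + b\,\boldsymbol{\Delta F} + \mathbf{Z}\mathbf{a} + \mathbf{e}, with the regression coefficient b measuring inbreeding depression.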