946 results for Random regression models
A robust Bayesian approach to null intercept measurement error model with application to dental data
Abstract:
Measurement error models often arise in epidemiological and clinical research. In this setup it is usually assumed that the latent variable has a normal distribution; however, the normality assumption may not always be correct. The skew-normal/independent distributions are a class of asymmetric, thick-tailed distributions that includes the skew-normal distribution as a special case. In this paper, we explore the use of skew-normal/independent distributions as a robust alternative in the null intercept measurement error model under a Bayesian paradigm. We assume that the random errors and the unobserved value of the covariate (latent variable) jointly follow a skew-normal/independent distribution, providing an appealing robust alternative to the routine use of the symmetric normal distribution in this type of model. The specific distributions examined include univariate and multivariate versions of the skew-normal, skew-t, skew-slash, and skew contaminated normal distributions. The methods developed are illustrated using a real data set from a dental clinical trial. © 2008 Elsevier B.V. All rights reserved.
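The skew-normal/independent class can be simulated from its stochastic representation: a skew-normal variable divided by the square root of a positive mixing weight U. The sketch below assumes the common parametrization with shape λ and δ = λ/√(1+λ²); the function name, the `mixing` options, and the parameter `nu` are illustrative choices, not code from the paper, and the skew contaminated normal case is omitted.

```python
import numpy as np

def sample_sni(n, mu=0.0, sigma=1.0, lam=0.0, mixing="normal", nu=4.0, rng=None):
    """Draw n values Y = mu + U**(-1/2) * Z, with Z skew-normal(0, sigma^2, lam).

    mixing selects the (assumed) distribution of the positive weight U:
      "normal" -> U = 1                          (skew-normal)
      "t"      -> U ~ Gamma(shape nu/2, rate nu/2) (skew-t)
      "slash"  -> U ~ Beta(nu, 1)                (skew-slash)
    """
    rng = np.random.default_rng(rng)
    delta = lam / np.sqrt(1.0 + lam**2)
    # Skew-normal representation: delta*|T0| + sqrt(1 - delta^2)*T1, with T0, T1 iid N(0, 1)
    t0 = np.abs(rng.standard_normal(n))
    t1 = rng.standard_normal(n)
    z = sigma * (delta * t0 + np.sqrt(1.0 - delta**2) * t1)
    if mixing == "normal":
        u = np.ones(n)
    elif mixing == "t":
        u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)  # rate nu/2 == scale 2/nu
    elif mixing == "slash":
        u = rng.beta(nu, 1.0, size=n)
    else:
        raise ValueError(f"unknown mixing distribution: {mixing}")
    # Small values of U inflate the draw, which produces the thick tails
    return mu + z / np.sqrt(u)
```

With `lam=0` and `mixing="normal"` the draws are standard normal; the `"t"` and `"slash"` weights keep the same skewness mechanism while thickening the tails.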
Abstract:
This paper studies a smooth-transition (ST) type of cointegration. The proposed ST cointegration allows for a regime-switching structure in a cointegrated system. It nests the linear cointegration developed by Engle and Granger (1987) and the threshold cointegration studied by Balke and Fomby (1997). We develop F-type tests to examine linear cointegration against ST cointegration in ST-type cointegrating regression models with or without time trends. The null asymptotic distributions of the tests are derived with stationary transition variables in ST cointegrating regression models. It is shown that the tests have nonstandard limiting distributions, expressed in terms of standard Brownian motion, when the regressors are pure random walks, and standard asymptotic distributions when the regressors contain random walks with nonzero drift. The finite-sample distributions of the tests are studied by Monte Carlo simulation. The small-sample results show that the F-type tests have better power when the system contains ST cointegration than when the system is linearly cointegrated. An empirical example using purchasing power parity (PPP) data (monthly US dollar, Italian lira, and dollar-lira exchange rate series from 1973:01 to 1989:10) illustrates the testing procedures developed in this paper. It is found that there is no linear cointegration in the system, but there exists ST-type cointegration in the PPP data.
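The core of an F-type comparison between linear and ST cointegrating regressions can be sketched by fitting both by least squares and comparing residual sums of squares. This is a simplified sketch, assuming a logistic transition function G(s; γ, c) and treating γ and c as known; how the paper handles these parameters is not shown, and the statistic must be compared with the nonstandard limiting distributions described above, not with an F table. The function names are hypothetical.

```python
import numpy as np

def logistic_transition(s, gamma, c):
    """Logistic smooth-transition function G(s; gamma, c), bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def st_f_statistic(y, x, s, gamma, c):
    """F-type statistic for linear vs ST cointegrating regression (gamma, c fixed).

    Null:        y_t = a + b x_t + u_t
    Alternative: y_t = a + b x_t + (a2 + b2 x_t) G(s_t; gamma, c) + u_t
    """
    T = len(y)
    g = logistic_transition(s, gamma, c)
    X0 = np.column_stack([np.ones(T), x])          # linear cointegrating regression
    X1 = np.column_stack([X0, X0 * g[:, None]])    # add the regime-switching terms
    ssr0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)
    ssr1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
    q = X1.shape[1] - X0.shape[1]                  # number of restrictions under the null
    return ((ssr0 - ssr1) / q) / (ssr1 / (T - X1.shape[1]))
```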
Abstract:
This is a note about proxy variables and instruments for the identification of structural parameters in regression models. In our experience, econometrics textbooks treat these two issues separately, although in practice the two concepts are very often combined. Proxy variables are usually inserted in instrumental variable regressions on the grounds that they are exogenous, implicitly meaning exogenous in a reduced-form model rather than in a structural model. In fact, if these variables are exogenous they should be redundant in the structural model, e.g. IQ as a proxy for ability. Valid proxies reduce unexplained variation and increase the efficiency of the estimator of the structural parameter of interest. This is especially important when the instrument is weak. With a simple example we demonstrate what is required of a proxy and an instrument when they are combined. It turns out that when a researcher has a valid instrument, the requirements on the proxy variable are weaker than if no such instrument exists.
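A small simulation shows a proxy and an instrument combined in two-stage least squares. All variables (`wage`, `educ`, `ability`, `iq`, `z`) and coefficients are hypothetical, not the note's own example. In this setup, replacing ability by IQ leaves part of ability in the error, so the IQ coefficient converges to about 0.8 rather than the structural 1.0; the education coefficient nevertheless stays near its true value because the instrument is independent of IQ, and including the proxy lowers the residual variance.

```python
import numpy as np

def two_sls(y, X, Z):
    """Two-stage least squares: (X'PzX)^{-1} X'Pz y, with Pz the projection onto Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage: project X on Z
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]     # second stage: regress y on X_hat

# Simulated data (all hypothetical): the true effect of educ on wage is 1.0
rng = np.random.default_rng(0)
n = 20_000
ability = rng.standard_normal(n)                       # unobserved
iq = ability + 0.5 * rng.standard_normal(n)            # proxy: ability measured with noise
z = rng.standard_normal(n)                             # instrument, independent of ability
educ = 0.5 * z + ability + rng.standard_normal(n)      # endogenous regressor
wage = 1.0 * educ + 1.0 * ability + rng.standard_normal(n)
ones = np.ones(n)

# IV without the proxy, and IV with the proxy acting as its own instrument
b_iv = two_sls(wage, np.column_stack([ones, educ]), np.column_stack([ones, z]))
b_iv_proxy = two_sls(wage, np.column_stack([ones, educ, iq]),
                     np.column_stack([ones, z, iq]))
```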
Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association studies (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights into modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to correctly adjust for the bias in genetic variance component estimation and to gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit whole-genome data. These whole-genome models were shown to perform well in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance rather than the mean, which validated the idea of variance-controlling genes. The work in the thesis is accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
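The link between ridge regression and random effects models can be sketched in a few lines. This is not the hglm or bigRR interface (both are R packages); it only illustrates the standard result that, for y = Xu + e with u ~ N(0, σu²I) and no fixed effects, the BLUP of u is a ridge estimate with λ = σe²/σu². The dual form solves an n × n system, which is cheaper when markers far outnumber individuals. A single shared λ is assumed here; a generalized ridge would allow one λ per effect.

```python
import numpy as np

def ridge_blup(X, y, lam):
    """Primal ridge / BLUP: u_hat = (X'X + lam*I_p)^{-1} X'y, with lam = sigma_e^2 / sigma_u^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_dual(X, y, lam):
    """Same estimate via an n x n system: u_hat = X' (XX' + lam*I_n)^{-1} y (useful when p >> n)."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
```

Both functions return the same vector because (X'X + λI)⁻¹X' = X'(XX' + λI)⁻¹; larger λ (a smaller genetic variance relative to the residual) shrinks the effects more strongly toward zero.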
Abstract:
This paper provides a systematic and unified treatment of the developments in the area of kernel estimation in econometrics and statistics. Both the estimation and hypothesis testing issues are discussed for the nonparametric and semiparametric regression models. A discussion on the choice of window width is also presented.
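As a concrete instance of kernel regression and window-width choice, the sketch below implements the Nadaraya-Watson estimator with a Gaussian kernel and scores candidate widths h by leave-one-out cross-validation. The kernel and the selection criterion are assumptions chosen for illustration; the surveyed literature covers many alternatives.

```python
import numpy as np

def nadaraya_watson(x_eval, x, y, h):
    """Kernel regression m(x0) = sum_i K((x0 - x_i)/h) y_i / sum_i K((x0 - x_i)/h)."""
    u = (np.asarray(x_eval)[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)            # Gaussian kernel; the normalizing constant cancels
    return (w @ y) / w.sum(axis=1)

def loo_cv_score(x, y, h):
    """Leave-one-out mean squared prediction error for window width h."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    np.fill_diagonal(w, 0.0)           # exclude observation i from its own fit
    fitted = (w @ y) / w.sum(axis=1)   # can be nan if h is so small that all other weights vanish
    return np.mean((y - fitted) ** 2)
```

A very small h reproduces the noise, while a very large h returns the overall mean of y; the cross-validation score trades these two failures off.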
Abstract:
The goal of this paper is to introduce a class of tree-structured models that combines aspects of regression trees and smooth transition regression models. The model is called the Smooth Transition Regression Tree (STR-Tree). The main idea is to specify a multiple-regime parametric model through a tree-growing procedure with smooth transitions among different regimes. Decisions about splits are based entirely on a sequence of Lagrange Multiplier (LM) tests of hypotheses.
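How a tree with smooth transitions turns splits into regime weights can be sketched for a hypothetical three-leaf tree: the root splits on one variable, and its right child splits on another. Each split uses an assumed logistic transition, so every observation belongs partly to every leaf and the prediction is a weighted mix of the leaf-wise linear models. The tree-growing procedure and the LM tests from the paper are not shown.

```python
import numpy as np

def logistic(s, gamma, c):
    """Smooth transition G(s; gamma, c) in [0, 1]; large gamma approaches a hard split at c."""
    z = np.clip(-gamma * (s - c), -500.0, 500.0)   # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(z))

def str_tree_depth2(X, betas, split1, split2):
    """Predictions from a hypothetical three-leaf smooth transition tree.

    split1 = (j, gamma, c) is the root split on column j; split2 = (k, gamma, c)
    splits the root's right child on column k. betas holds one coefficient
    vector per leaf, in the order (left, right-left, right-right).
    """
    j, g1, c1 = split1
    k, g2, c2 = split2
    G1 = logistic(X[:, j], g1, c1)
    G2 = logistic(X[:, k], g2, c2)
    # Leaf membership weights; each row sums to 1
    W = np.column_stack([1.0 - G1, G1 * (1.0 - G2), G1 * G2])
    leaf_fits = np.column_stack([X @ b for b in betas])   # linear model in each leaf
    return np.sum(W * leaf_fits, axis=1), W
```

With large γ values the weights become 0/1 and the model reduces to an ordinary regression tree with linear leaves; small γ blends neighbouring regimes.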
Abstract:
The objective was to evaluate the best modeling of the additive genetic, permanent environmental, and residual variances of test-day milk yield (TDMY) in goats. Random regression models on orthogonal Legendre polynomials, with different orders of fit and heterogeneous residual variance, were used. Contemporary group, age of the doe at kidding (as a covariate), and a fixed regression of TDMY on Legendre polynomials, modeling the mean trajectory of the population, were considered fixed effects; the additive genetic and permanent environmental effects were considered random effects. The model with four classes of residual variance provided the best fit. The log-likelihood, AIC, and BIC values pointed to the selection of higher-order models (five for the genetic effect and seven for the permanent environmental effect). However, the eigenvalues of the covariance matrices of the regression coefficients indicated that the dimensionality could be reduced. The high orders of fit yielded estimates of genetic variances, and of genetic and permanent environmental correlations, that are inconsistent with the biological phenomenon studied. The model of fifth order for the additive genetic variance and seventh order for the permanent environment was indicated. However, a more parsimonious model, of fourth order for the additive genetic effect and sixth order for the permanent environmental effect, was sufficient to fit the variances in the data.
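In a random regression model the covariance between records at times tᵢ and tⱼ is φ(tᵢ)ᵀ K φ(tⱼ), where φ holds Legendre covariates and K is the covariance matrix of the random regression coefficients. The sketch below assumes the normalized polynomials √((2n+1)/2)·Pₙ(x) often used in animal breeding, takes "order" to mean the number of coefficients, and rescales time linearly to [−1, 1]; the K matrix and time range in the usage note are hypothetical. The eigenvalues of K, mentioned in the abstract, show how many dimensions actually carry variance.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(t, t_min, t_max, order):
    """Normalized Legendre covariates phi_n(x) = sqrt((2n+1)/2) P_n(x), n = 0..order-1.

    Times t are rescaled to x in [-1, 1]; 'order' is taken as the number of coefficients.
    """
    x = -1.0 + 2.0 * (np.asarray(t, dtype=float) - t_min) / (t_max - t_min)
    P = legendre.legvander(x, order - 1)               # columns P_0(x) .. P_{order-1}(x)
    scale = np.sqrt((2.0 * np.arange(order) + 1.0) / 2.0)
    return P * scale

def covariance_function(t, K, t_min, t_max):
    """Covariance among records at times t implied by coefficient covariance matrix K."""
    Phi = legendre_covariates(t, t_min, t_max, K.shape[0])
    return Phi @ K @ Phi.T
```

For example, with a hypothetical `K = np.diag([4.0, 1.0, 0.2, 0.01])`, `np.linalg.eigvalsh(K)` shows that the last coefficient contributes almost nothing, which is the kind of evidence used to argue for a lower order.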
Abstract:
The objective of this study was to evaluate the growth of the bean cultivar Carioca Early, grown during the rainy season, under different doses of phosphorus applied to the soil. A randomized block design was adopted, evaluating six phosphorus doses (0, 30, 60, 90, 120, and 150 kg P₂O₅ ha⁻¹), applied as triple superphosphate, with five replicates. The biometric indices evaluated were leaf area (cm² plant⁻¹) and dry matter (g seedling⁻¹). The physiological indices crop growth rate (CGR, g m⁻² day⁻¹), relative growth rate (RGR, g g⁻¹ day⁻¹), net assimilation rate (NAR, g m⁻² day⁻¹), leaf area ratio (LAR, cm² g⁻¹), and specific leaf area (SLA, cm² g⁻¹) were obtained through functional growth analysis. The biometric data were submitted to analysis of variance, followed by the fitting of regression models. The dry matter and leaf area of Carioca Early beans increased linearly with increasing doses of phosphorus applied to the soil. Dry matter increased steadily throughout the cycle, whereas leaf area reached its maximum at 52 days after emergence (DAE). Phosphorus restriction reduced the relative growth rate at the beginning of development and lengthened the physiological cycle of the cultivar Carioca Early.
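In functional growth analysis, smooth curves are fitted to dry matter W(t) and leaf area A(t), and the physiological indices are derived from them: RGR = d ln W/dt, CGR = (dW/dt)/ground area, NAR = (dW/dt)/A, and LAR = A/W, so that RGR = NAR × LAR at every instant. The sketch assumes quadratic polynomials on the log scale and a hypothetical ground area per plant; the curves fitted in the study are not stated. SLA would additionally need leaf dry mass (leaf area divided by leaf dry mass) and is omitted.

```python
import numpy as np
from numpy.polynomial import Polynomial

def functional_growth(t, w, a, ground_area, deg=2):
    """Fit ln W(t) and ln A(t) as polynomials; return a function giving instantaneous indices.

    Units of a and ground_area must be chosen consistently by the caller.
    """
    lnw = Polynomial.fit(t, np.log(w), deg)
    lna = Polynomial.fit(t, np.log(a), deg)
    dlnw = lnw.deriv()                     # derivative with respect to the original time scale

    def indices(tt):
        W = np.exp(lnw(tt))
        A = np.exp(lna(tt))
        rgr = dlnw(tt)                     # d ln W / dt
        dwdt = W * rgr                     # dW/dt
        return {"RGR": rgr, "CGR": dwdt / ground_area, "NAR": dwdt / A, "LAR": A / W}

    return indices
```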
Abstract:
Ties among event times are often recorded in survival studies. For example, in a two-week laboratory study where event times are measured in days, ties are very likely to occur. The proportional hazards model might be used in this setting with an approximate partial likelihood function. This approximation works well when the number of ties is small. On the other hand, discrete regression models are suggested when the data are heavily tied. However, in many situations it is not clear which approach should be used in practice. In this work, empirical guidelines based on Monte Carlo simulations are provided. These recommendations are based on a measure of the amount of tied data present and on the mean square error. An example illustrates the proposed criterion.
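Two common approximations to the partial likelihood under ties are Breslow's and Efron's. The sketch below computes both for a one-covariate Cox model; the abstract does not say which approximation was studied, so naming these two is an assumption. With no ties they coincide; with ties, Efron's version subtracts part of the tied subjects' risk from successive denominators, so its log partial likelihood is never smaller than Breslow's. The discrete regression alternative is not shown.

```python
import numpy as np

def cox_loglik(beta, time, event, x, ties="efron"):
    """Log partial likelihood of a one-covariate Cox model under tied event times."""
    time, event, x = map(np.asarray, (time, event, x))
    eta = beta * x
    r = np.exp(eta)
    ll = 0.0
    for t in np.unique(time[event == 1]):
        dead = (time == t) & (event == 1)      # subjects failing (tied) at time t
        at_risk = time >= t                    # risk set at time t
        d = dead.sum()
        s_risk = r[at_risk].sum()
        s_dead = r[dead].sum()
        ll += eta[dead].sum()
        if ties == "breslow":
            ll -= d * np.log(s_risk)           # same denominator for every tied failure
        elif ties == "efron":
            # remove a growing fraction of the tied subjects' risk from each denominator
            ll -= sum(np.log(s_risk - (l / d) * s_dead) for l in range(d))
        else:
            raise ValueError(f"unknown ties method: {ties}")
    return ll
```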
Abstract:
It is often necessary to run response surface designs in blocks. In this paper the analysis of data from such experiments, using polynomial regression models, is discussed. The definition and estimation of pure error in blocked designs are considered. It is recommended that pure error be estimated by assuming additive block and treatment effects, as this is more consistent with designs without blocking. The recovery of inter-block information using REML analysis is discussed, although it is shown to have very little impact if the design is nearly orthogonally blocked. Finally, prediction from blocked designs is considered, and it is shown that prediction of many quantities of interest is much simpler than prediction of the response itself.
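Pure error under additive block and treatment effects can be computed as the residual from a least-squares fit of the response on block and design-point factors. The sketch below is an illustration of that recommendation, not the paper's code; the function name and the small design in the test are hypothetical. One design-point indicator is dropped because the block indicators already span the intercept.

```python
import numpy as np

def pure_error_additive(y, block, point):
    """Pure-error sum of squares and df assuming additive block and design-point effects."""
    y = np.asarray(y, dtype=float)
    blocks, b_idx = np.unique(block, return_inverse=True)
    points, p_idx = np.unique(point, return_inverse=True)
    n = len(y)
    B = np.zeros((n, len(blocks)))
    B[np.arange(n), b_idx] = 1.0                  # block indicator columns
    P = np.zeros((n, len(points)))
    P[np.arange(n), p_idx] = 1.0                  # design-point indicator columns
    X = np.column_stack([B, P[:, 1:]])            # drop one point column (aliased with blocks)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    df = n - np.linalg.matrix_rank(X)
    return float(resid @ resid), int(df)
```

In a design whose only replicates are centre points within each block, this reproduces the within-block replicate sum of squares.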
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
A Bayesian binary regression model is developed to predict in-hospital death in patients with acute myocardial infarction. Markov chain Monte Carlo (MCMC) methods are used for inference and validation. A model-building strategy based on the Bayes factor is proposed, and validation aspects are discussed extensively in this article, including the posterior distribution of the concordance index and residual analysis. Determining risk factors from variables available when the patient arrives at the hospital is very important for decisions about the course of treatment. The identified model proves highly reliable and accurate, with a correct classification rate of 88% and a concordance index of 83%.
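A posterior distribution for a concordance index can be obtained by evaluating the index once per MCMC draw of the fitted probabilities. The sketch assumes the usual c-statistic definition (the share of death/survivor pairs in which the patient who died received the higher predicted probability, ties counting one half), which the abstract does not spell out; the function names and the draw layout (rows = draws) are hypothetical.

```python
import numpy as np

def concordance_index(p, y):
    """C-statistic: share of (death, survivor) pairs where the death has higher predicted risk."""
    p, y = np.asarray(p, dtype=float), np.asarray(y)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]             # all death-minus-survivor pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def posterior_concordance(prob_draws, y):
    """Apply the index to each MCMC draw of fitted probabilities (one draw per row)."""
    return np.array([concordance_index(p, y) for p in prob_draws])
```

Summaries of the returned array (posterior mean, credible interval) then describe the model's discrimination together with its uncertainty.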
Abstract:
Data were collected and analysed from seven field sites in Australia, Brazil and Colombia on weather conditions and the severity of anthracnose disease of the tropical pasture legume Stylosanthes scabra caused by Colletotrichum gloeosporioides. Disease severity and weather data were analysed using artificial neural network (ANN) models developed using data from some or all field sites in Australia and/or South America to predict severity at other sites. Three series of models were developed using different weather summaries. Of these, ANN models with weather for the day of disease assessment and the previous 24 h period had the highest prediction success, and models trained on data from all sites within one continent correctly predicted disease severity in the other continent on more than 75% of days; the overall prediction error was 21.9% for the Australian and 22.1% for the South American model. Of the six cross-continent ANN models trained on pooled data for five sites from two continents to predict severity for the remaining sixth site, the model developed without data from Planaltina in Brazil was the most accurate, with >85% prediction success, and the model without Carimagua in Colombia was the least accurate, with only 54% success. In common with multiple regression models, moisture-related variables such as rain and leaf surface wetness, and variables that influence moisture availability such as radiation and wind, on the day of disease severity assessment or the day before assessment were the most important weather variables in all ANN models. A set of weights from the ANN models was used to calculate the overall risk of anthracnose for the various sites. Sites with high and low anthracnose risk are present in both continents, and weather conditions at centres of diversity in Brazil and Colombia do not appear to be more conducive than conditions in Australia to serious anthracnose development.
Abstract:
A quantitative structure-activity relationship (QSAR) study of 19 quinone compounds with trypanocidal activity was performed with Partial Least Squares (PLS) and Principal Component Regression (PCR), using a leave-one-out cross-validation procedure to build the regression models. The trypanocidal activity of the compounds is related to their first cathodic potential (Epc1). The PLS and PCR regression models built in this study were also used to predict the Epc1 of six new quinone compounds. The PLS model was built with three principal components that described 96.50% of the total variance, with Q² = 0.83 and R² = 0.90. The results obtained with the PCR model were similar to those obtained with the PLS model: the PCR model was also built with three principal components, which described 96.67% of the total variance, with Q² = 0.83 and R² = 0.90. The most important descriptors for the PLS and PCR models were HOMO-1 (energy of the molecular orbital below the HOMO), Q4 (atomic charge at position 4), MAXDN (maximal electrotopological negative difference), and HYF (hydrophilicity index).
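The leave-one-out Q² reported above is 1 − PRESS/SS_total, where PRESS sums the squared errors of predicting each compound from a model fitted without it. The sketch computes Q² for principal component regression; PLS is not shown, descriptors are only centred (QSAR work usually also autoscales them), and the data in the test are random numbers sized like the 19-compound set, not the study's descriptors.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, k):
    """Principal component regression with k components, centring on the training set."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                    # loadings of the first k components
    T = Xc @ V                                      # training scores
    b = np.linalg.lstsq(T, y_train - y_mean, rcond=None)[0]
    return y_mean + (X_test - x_mean) @ V @ b

def q2_loo(X, y, k):
    """Leave-one-out Q^2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                    # refit without compound i
        pred = pcr_fit_predict(X[keep], y[keep], X[i:i + 1], k)[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

Because each prediction comes from a model that never saw that compound, Q² is usually below the fitted R², which is why both are reported.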
Abstract:
A total of 20,065 weights recorded on 3016 Nelore animals were used to estimate covariance functions for growth from birth to 630 days of age, assuming a parametric correlation structure to model within-animal correlations. The model of analysis included fixed effects of contemporary groups and age of dam as a quadratic covariate. Mean trends were taken into account by a cubic regression on orthogonal polynomials of animal age. Genetic effects of the animal and its dam and maternal permanent environmental effects were modelled by random regressions on Legendre polynomials of age at recording. Changes in direct permanent environmental effect variances were modelled by a polynomial variance function, together with a parametric correlation function to account for correlations between ages. Stationary and nonstationary models were used to model within-animal correlations between different ages. Residual variances were considered homogeneous or heterogeneous, with changes modelled by a step or polynomial function of age at recording. Based on the Bayesian information criterion, a model with a cubic variance function combined with a nonstationary correlation function for permanent environmental effects, with 49 parameters to be estimated, fitted best. Modelling within-animal correlations through a parametric correlation structure can describe the variation pattern adequately. Moreover, the number of parameters to be estimated can be decreased substantially compared with a model fitting random regressions on Legendre polynomials of age. © 2004 Elsevier B.V. All rights reserved.
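A parametric covariance structure combines a variance function with a correlation function: Cov(tᵢ, tⱼ) = √(v(tᵢ) v(tⱼ)) · ρ(tᵢ, tⱼ). The sketch uses a cubic variance function in standardized age and, as a simpler stand-in, a stationary exponential correlation ρ^|xᵢ − xⱼ|; the nonstationary correlation function that fitted best in the study is not reproduced. Coefficients and ages in the test are hypothetical. This structure needs five parameters (four variance coefficients plus ρ), whereas a random regression with k coefficients needs k(k+1)/2 covariance parameters.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def pe_covariance(ages, var_coef, rho, age_min, age_max):
    """Permanent-environment covariance sqrt(v_i v_j) * rho**|x_i - x_j|.

    v(x) is a polynomial (cubic if var_coef has 4 entries, lowest order first)
    in standardized age x in [-1, 1]; the correlation is stationary.
    """
    x = -1.0 + 2.0 * (np.asarray(ages, dtype=float) - age_min) / (age_max - age_min)
    v = npoly.polyval(x, var_coef)                      # variance function at each age
    if np.any(v <= 0):
        raise ValueError("variance function must be positive at all ages")
    sd = np.sqrt(v)
    corr = rho ** np.abs(x[:, None] - x[None, :])       # decays with the age gap
    return sd[:, None] * corr * sd[None, :]
```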