952 results for Linear models (Statistics)
Abstract:
This paper proposes a method for describing the distribution of observed temperatures on any day of the year such that the distribution and summary statistics of interest derived from the distribution vary smoothly through the year. The method removes the noise inherent in calculating summary statistics directly from the data, thus easing comparisons of distributions and summary statistics between different periods. The method is demonstrated using daily effective temperatures (DET) derived from observations of temperature and wind speed at De Bilt, Holland. Distributions and summary statistics are obtained from 1985 to 2009 and compared to the period 1904–1984. A two-stage process first obtains parameters of a theoretical probability distribution, in this case the generalized extreme value (GEV) distribution, which describes the distribution of DET on any day of the year. Second, linear models describe seasonal variation in the parameters. Model predictions provide parameters of the GEV distribution, and therefore summary statistics, that vary smoothly through the year. There is evidence of an increasing mean temperature, a decrease in the variability of temperatures, mainly in the winter, and more positive skew (more warm days) in the summer. In the winter, the 2% point, the value below which 2% of observations are expected to fall, has risen by 1.2 °C; in the summer, the 98% point has risen by 0.8 °C. Medians have risen by 1.1 and 0.9 °C in winter and summer, respectively. The method can be used to describe distributions of future climate projections and other climate variables. Further extensions to the methodology are suggested.
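The second stage described above, a linear model that smooths a noisy daily parameter through the year, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the single annual harmonic, the function names and the synthetic "daily estimates" are all assumptions made here, and the first stage (fitting the GEV distribution) is not shown.

```python
import math

def harmonic_design(day, period=365.25):
    """Design row: intercept plus one annual harmonic (an assumed model form)."""
    angle = 2.0 * math.pi * day / period
    return [1.0, math.cos(angle), math.sin(angle)]

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_harmonic(days, values):
    """Ordinary least squares through the normal equations X'X b = X'y."""
    rows = [harmonic_design(d) for d in days]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coef, day):
    return sum(c * x for c, x in zip(coef, harmonic_design(day)))

# Synthetic, deterministic stand-in for noisy day-by-day parameter estimates
# (hypothetical values, not De Bilt data).
days = list(range(1, 366))
true_coef = [9.0, -7.0, -2.0]
noisy = [predict(true_coef, d) + 0.5 * math.sin(37.0 * d) for d in days]

coef = fit_harmonic(days, noisy)
smooth = [predict(coef, d) for d in days]  # varies smoothly through the year
```

Summary statistics computed from the smoothed parameters inherit the same smoothness, which is what makes comparisons between periods easier than with statistics computed directly from each day's data.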
Abstract:
This thesis is concerned with development of improved management practices in indigenous chicken production systems in a research process that includes participatory approaches with smallholder farmers and other stakeholders in Kenya. The research process involved a wide range of activities that included on-station experiments, field surveys, stakeholder consultations in workshops, seminars and visits, and on-farm farmer participatory research to evaluate the effect of some improved management interventions on production performance of indigenous chickens. The participatory research was greatly informed by the collective experiences and lessons of the previous activities. The on-station studies focused on hatching, growth and nutritional characteristics of the indigenous chickens. Four research publications from these studies are included in this thesis. Quantitative statistical analyses were applied and they involved use of growth models estimated with non-linear regressions for the growth characteristics, chi-square determinations to investigate differences among different reciprocal crosses of indigenous chickens and general linear models and covariance determination for the nutrition study. The on-station studies brought greater understanding of performance and production characteristics of indigenous chickens and the influence of management practices on these characteristics. The field surveys and stakeholder consultations helped in understanding the overarching issues affecting the productivity of the indigenous chicken systems and their place in the livelihoods of smallholder farmers. These activities created strong networking opportunities with stakeholders from a wide spectrum. The on-farm farmer participatory research involved selection of 200 farmers in five regions followed by training and introduction of interventions on improved management practices which included housing, vaccination, deworming and feed supplementation.
Implementation and monitoring were mainly done by individual farmers continuously for close to one and a half years. Six quarterly visits to the farms were made by the research team to monitor and provide support for on-going project activities. The data collected has been analysed for 5 consecutive 3-monthly periods. Descriptive and inferential statistics were applied to analyse the data collected involving treatment applications, production characteristics and flock demography characteristics. Out of the 200 farmers initially selected, 173 had records on treatment applications and flock demography characteristics while 127 farmers had records on production characteristics. The demographic analysis with a dissimilarity index of flock size produced 7 distinct farm groups from among the 173 farms. Two of these farm groups were represented in similar numbers in each of the five regions. The research process also involved a number of dissemination and communication strategies that have made the process and project outcomes accessible to a wider readership locally and globally. These include workshops, seminars, field visits and consultations, local and international conferences, electronic conferencing, publications and personal communication via emailing and conventional posting. A number of research and development proposals were also developed based on the knowledge and experiences gained from the research process. The thesis captures the research process activities and outcomes in 8 chapters which are, in order: introduction, theoretical concepts underpinning FPR, research methodology and process, on-station research output, FPR descriptive statistical analysis, FPR inferential statistical analysis on production characteristics, FPR demographic analysis and conclusions.
Various research approaches, both quantitative and qualitative, have been applied in the research process, indicating the possibilities and importance of combining both for a greater understanding of the issues being studied. In our case, participatory studies of the improved management of indigenous chickens indicate their potential importance as livelihood assets for poor people.
Abstract:
We consider the forecasting performance of two SETAR exchange rate models proposed by Kräger and Kugler [J. Int. Money Fin. 12 (1993) 195]. Assuming that the models are good approximations to the data generating process, we show that whether the non-linearities inherent in the data can be exploited to forecast better than a random walk depends on both how forecast accuracy is assessed and on the ‘state of nature’. Evaluation based on traditional measures, such as (root) mean squared forecast errors, may mask the superiority of the non-linear models. Generalized impulse response functions are also calculated as a means of portraying the asymmetric response to shocks implied by such models.
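To make the comparison with a random walk concrete, the sketch below simulates a two-regime SETAR(1) process for exchange rate changes and compares one-step forecasts from the true model with the random-walk forecast of zero change. The threshold, regime coefficients, seed and sample size are hypothetical values chosen for illustration; they are not the Kräger and Kugler estimates.

```python
import random

def setar_step(y_prev, threshold=0.0, low=(0.0, 0.6), high=(0.0, -0.3)):
    """Conditional mean of a two-regime SETAR(1) process.

    The regime is selected by comparing the previous value with the threshold.
    All parameter values are illustrative assumptions.
    """
    intercept, slope = low if y_prev <= threshold else high
    return intercept + slope * y_prev

def simulate(n, seed=0, sigma=1.0):
    """Simulate n exchange rate changes from the SETAR process above."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(setar_step(y[-1]) + rng.gauss(0.0, sigma))
    return y

def rmse(errors):
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

series = simulate(2000)
setar_err, rw_err = [], []
by_regime = {"low": ([], []), "high": ([], [])}
for prev, cur in zip(series, series[1:]):
    e_setar = cur - setar_step(prev)  # forecast from the (known) SETAR model
    e_rw = cur                        # random walk in levels predicts zero change
    setar_err.append(e_setar)
    rw_err.append(e_rw)
    regime = "low" if prev <= 0.0 else "high"
    by_regime[regime][0].append(e_setar)
    by_regime[regime][1].append(e_rw)

rmse_setar = rmse(setar_err)
rmse_rw = rmse(rw_err)
# Forecast gain conditional on the regime at the forecast origin.
regime_gain = {k: rmse(v[1]) - rmse(v[0]) for k, v in by_regime.items()}
```

Inspecting `regime_gain` alongside the pooled `rmse_setar` and `rmse_rw` shows how an overall accuracy measure averages over regimes in which the non-linear model may help to very different degrees.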
Abstract:
Predictors of random effects are usually based on the popular mixed effects (ME) model developed under the assumption that the sample is obtained from a conceptual infinite population; such predictors are employed even when the actual population is finite. Two alternatives that incorporate the finite nature of the population are obtained from the superpopulation model proposed by Scott and Smith (1969. Estimation in multi-stage surveys. J. Amer. Statist. Assoc. 64, 830-840) or from the finite population mixed model recently proposed by Stanek and Singer (2004. Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 1119-1130). Predictors derived under the latter model with the additional assumptions that all variance components are known and that within-cluster variances are equal have smaller mean squared error (MSE) than the competitors based on either the ME or Scott and Smith's models. As population variances are rarely known, we propose method-of-moments estimators to obtain empirical predictors and conduct a simulation study to evaluate their performance. The results suggest that the finite population mixed model empirical predictor is more stable than its competitors since, in terms of MSE, it is either the best or the second best and when second best, its performance lies within acceptable limits. When both cluster and unit intra-class correlation coefficients are very high (e.g., 0.95 or more), the performance of the empirical predictors derived under the three models is similar. (c) 2007 Elsevier B.V. All rights reserved.
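As a point of reference for the empirical predictors discussed above, the sketch below computes the familiar shrinkage predictor under the ordinary ME model for balanced clusters, with variance components replaced by one-way ANOVA method-of-moments estimates. It illustrates only the baseline ME predictor; the finite population mixed model and the Scott and Smith predictors are not reproduced, and the function name and data are hypothetical.

```python
def empirical_predictors(clusters):
    """Empirical shrinkage predictors of cluster means (balanced design).

    clusters: list of equally sized lists of observations.
    Returns (predictors, shrinkage factor, grand mean, cluster means).
    """
    k = len(clusters)
    m = len(clusters[0])
    means = [sum(c) / m for c in clusters]
    grand = sum(means) / k

    # One-way ANOVA mean squares between and within clusters.
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((y - mu) ** 2 for c, mu in zip(clusters, means) for y in c) / (k * (m - 1))

    # Method-of-moments variance components (between truncated at zero).
    var_within = msw
    var_between = max(0.0, (msb - msw) / m)

    denom = var_between + var_within / m
    shrink = var_between / denom if denom > 0 else 0.0
    preds = [grand + shrink * (mu - grand) for mu in means]
    return preds, shrink, grand, means

# Hypothetical data: three clusters with three sampled units each.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
preds, shrink, grand, means = empirical_predictors(data)
```

Each predictor lies between the cluster's sample mean and the grand mean, and moves closer to the grand mean as the estimated between-cluster variance shrinks.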
Abstract:
Local influence diagnostics based on estimating equations, which play the role of a gradient vector derived from any fit function, are developed for repeated measures regression analysis. Our proposal generalizes tools used in other studies (Cook, 1986; Cadigan and Farrell, 2002), considering local influence diagnostics for a statistical model where estimation involves an estimating equation in which the observations are not necessarily independent of each other. Moreover, the measures of local influence are illustrated with some simulated data sets to assess influential observations. Applications using real data are presented. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
We consider the problem of dichotomizing a continuous covariate when performing a regression analysis based on a generalized estimation approach. The problem involves estimation of the cutpoint for the covariate and testing the hypothesis that the binary covariate constructed from the continuous covariate has a significant impact on the outcome. Due to the multiple testing used to find the optimal cutpoint, we need to make an adjustment to the usual significance test to preserve the type-I error rates. We illustrate the techniques on one data set of patients given unrelated hematopoietic stem cell transplantation. Here the question is whether the CD34 cell dose given to the patient affects the outcome of the transplant and what the smallest cell dose needed for good outcomes is. (C) 2010 Elsevier B.V. All rights reserved.
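The need for an adjustment can be seen in a small sketch: scanning many candidate cutpoints and reporting the largest test statistic inflates the type-I error unless the p-value accounts for the search. The example below uses a simple two-sample z statistic and a Bonferroni correction as a conservative stand-in; this is not the adjustment derived in the paper, and the data, function names and candidate grid are hypothetical.

```python
import math

def two_sample_z(group_a, group_b):
    """Large-sample z statistic for a difference in means (unequal variances)."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((v - ma) ** 2 for v in group_a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in group_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def best_cutpoint(x, y, candidates):
    """Return (cutpoint, z, raw p-value, Bonferroni-adjusted p-value)."""
    best = None
    n_tested = 0
    for c in candidates:
        low = [yi for xi, yi in zip(x, y) if xi <= c]
        high = [yi for xi, yi in zip(x, y) if xi > c]
        if len(low) < 2 or len(high) < 2:
            continue  # need at least two points per group for a variance
        n_tested += 1
        z = two_sample_z(high, low)
        if best is None or abs(z) > abs(best[1]):
            best = (c, z)
    cut, z = best
    raw_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    adj_p = min(1.0, raw_p * n_tested)          # Bonferroni: conservative stand-in
    return cut, z, raw_p, adj_p

# Hypothetical data: the outcome jumps once the covariate (a "dose") exceeds 10.
dose = list(range(1, 21))
outcome = [(2.0 if d > 10 else 0.0) + 0.2 * math.sin(1.7 * d) for d in dose]
cut, z, raw_p, adj_p = best_cutpoint(dose, outcome, range(2, 19))
```

Bonferroni ignores the strong correlation between statistics at neighbouring cutpoints, so it is typically more conservative than an adjustment based on the distribution of the maximal statistic.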
Abstract:
In this article, we deal with the issue of performing accurate small-sample inference in the Birnbaum-Saunders regression model, which can be useful for modeling lifetime or reliability data. We derive a Bartlett-type correction for the score test and numerically compare the corrected test with the usual score test and some other competitors.
Abstract:
We review some issues related to the implications of different missing data mechanisms on statistical inference for contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random and missing completely at random models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, we may not always conclude that a MNAR model is misspecified because the estimate is on the boundary of the parameter space.
Abstract:
In general, the normal distribution is assumed for the surrogate of the true covariates in the classical error model. This paper considers a class of distributions, which includes the normal one, for the variables subject to error. An estimation approach yielding consistent estimators is developed and simulation studies reported.
Abstract:
Calculations of local influence curvatures and leverage have been well developed when the parameters are unrestricted. In this article, we discuss the assessment of local influence and leverage under linear equality parameter constraints with extensions to inequality constraints. Using a penalized quadratic function we express the normal curvature of local influence for arbitrary perturbation schemes and the generalized leverage matrix in interpretable forms, which depend on restricted and unrestricted components. The results are quite general and can be applied in various statistical models. In particular, we derive the normal curvature under three useful perturbation schemes for generalized linear models. Four illustrative examples are analyzed by the methodology developed in the article.
Abstract:
The aim of this article is to discuss the estimation of the systematic risk in capital asset pricing models with heavy-tailed error distributions to explain the asset returns. Diagnostic methods for assessing departures from the model assumptions as well as the influence of observations on the parameter estimates are also presented. It may be shown that outlying observations are downweighted in the maximum likelihood equations of linear models with heavy-tailed error distributions, such as Student-t, power exponential, logistic II, and so on. This robustness aspect may also be extended to influential observations. An application in which the systematic risk estimate of Microsoft is compared under normal and heavy-tailed errors is presented for illustration.
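The downweighting of outlying observations can be illustrated for Student-t errors, where the likelihood equations for the regression coefficients are weighted least squares equations with weights (ν + 1) / (ν + r²/σ²). The sketch below iterates those equations for a single-factor model; the degrees of freedom, the scale (treated as known here for simplicity) and the returns are hypothetical, and the full procedure in the article, which also covers other distributions and diagnostics, is not reproduced.

```python
def weighted_line(x, y, w):
    """Weighted least squares fit of y = alpha + beta * x."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    beta = sxy / sxx
    return yb - beta * xb, beta

def student_t_weights(residuals, scale, nu):
    """Weights implied by Student-t errors: large residuals get small weights."""
    return [(nu + 1.0) / (nu + (r / scale) ** 2) for r in residuals]

def fit_t_line(x, y, nu=4.0, scale=1.0, iterations=50):
    """Iteratively reweighted least squares with nu and scale held fixed (assumed)."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        alpha, beta = weighted_line(x, y, w)
        residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
        w = student_t_weights(residuals, scale, nu)
    return alpha, beta, w

# Hypothetical excess returns: the asset tracks the market with beta = 1,
# except for one outlying observation in the last period.
market = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
asset = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 13.0]

alpha_ols, beta_ols = weighted_line(market, asset, [1.0] * len(market))
alpha_t, beta_t, weights = fit_t_line(market, asset)
```

Under normal errors every observation has weight one, so the single outlier pulls the estimated beta well away from 1; under the t weights the same observation receives the smallest weight and the estimate stays closer to the bulk of the data.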
Abstract:
Influence diagnostics methods are extended in this article to the Grubbs model when the unknown quantity x (latent variable) follows a skew-normal distribution. Diagnostic measures are derived from the case-deletion approach and the local influence approach under several perturbation schemes. The observed information matrix for the postulated model and the Delta matrices for the corresponding perturbed models are derived. Results obtained for one real data set are reported, illustrating the usefulness of the proposed methodology.
Abstract:
We present a new version of the hglm package for fitting hierarchical generalized linear models (HGLM) with spatially correlated random effects. A CAR family for conditional autoregressive random effects was implemented. Eigen decomposition of the matrix describing the spatial structure (e.g., the neighborhood matrix) was used to transform the CAR random effects into an independent, but heteroscedastic, Gaussian random effect. A linear predictor is fitted for the random effect variance to estimate the parameters in the CAR model. This gives a computationally efficient algorithm for moderately sized problems (e.g., n < 5000).
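The transformation described above can be written out in a few lines. The derivation below is a sketch that assumes a common CAR parameterization with covariance τ(I − ρW)⁻¹ and a symmetric neighborhood matrix W; the symbols ρ, τ, θ₀ and θ₁ are introduced here for illustration, and the package's own parameterization may differ.

```latex
\begin{aligned}
v &\sim N\!\left(0,\ \tau\,(I - \rho W)^{-1}\right),
\qquad W = V \Lambda V^{\top},\quad \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_n),\\[4pt]
v^{*} &= V^{\top} v \ \sim\ N\!\left(0,\ \tau\,(I - \rho \Lambda)^{-1}\right),\\[4pt]
\operatorname{Var}(v^{*}_i) &= \frac{\tau}{1 - \rho\,\lambda_i}
\quad\Longrightarrow\quad
\frac{1}{\operatorname{Var}(v^{*}_i)} = \theta_0 + \theta_1 \lambda_i,
\qquad \theta_0 = \frac{1}{\tau},\quad \theta_1 = -\frac{\rho}{\tau}.
\end{aligned}
```

Because V is orthogonal, the rotated effects are independent with variances that depend only on the eigenvalues, and the inverse variance is linear in λᵢ. Under these assumptions, estimating θ₀ and θ₁ through a linear predictor recovers τ = 1/θ₀ and ρ = −θ₁/θ₀, with the random-effect design matrix Z replaced by ZV.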
Abstract:
We present a new version (> 2.0) of the hglm package for fitting hierarchical generalized linear models (HGLMs) with spatially correlated random effects. CAR() and SAR() families for conditional and simultaneous autoregressive random effects were implemented. Eigen decomposition of the matrix describing the spatial structure (e.g., the neighborhood matrix) was used to transform the CAR/SAR random effects into an independent, but heteroscedastic, Gaussian random effect. A linear predictor is fitted for the random effect variance to estimate the parameters in the CAR and SAR models. This gives a computationally efficient algorithm for moderately sized problems.
Abstract:
Flood forecasting systems can be used effectively when the lead time is sufficient relative to the time needed for preventive or corrective actions. The reliability and accuracy of the forecasts are also fundamentally important. Flood level forecasts are always approximations, and confidence intervals are not always applicable, especially when uncertainty is high, which produces very wide confidence intervals. These intervals are problematic in the presence of very high or very low river levels. In this study, flood level forecasts are produced both in the traditional numerical form and in the form of categories, for which an expert system based on fuzzy rules and inference is used. Methodologies and computational procedures for learning, simulation and consultation are designed and then developed as an application (SELF, Sistema Especialista com uso de Lógica "Fuzzy", an expert system using fuzzy logic) for research and operational purposes. Comparisons of fuzzy expert systems and linear empirical models, based on how they are used for forecasting, reveal a strong analogy despite their fundamental theoretical differences. The methodologies are applied to forecasting in the Camaquã River basin (15,543 km²), for lead times between 10 and 48 hours. Practical difficulties in application are identified, leading to solutions that constitute advances in knowledge and technique. Forecasts in both numerical and categorized form are carried out successfully using the new resources. The forecasts are evaluated and compared using a new group of statistics derived from the joint frequencies of occurrence of observed and predicted values in the same category during simulation.
The effects of varying the network density are analysed, showing that real-time rainfall and river-level forecasting systems are feasible, even with a small number of rainfall data acquisition stations, for forecasts in the form of fuzzy categories.
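The idea of forecasting in categories and scoring by agreement between observed and predicted categories can be sketched as follows. This is a simplified illustration only: the triangular membership functions, the category bounds in metres and the river levels are hypothetical, and the rule base and the full set of statistics used in the study are not reproduced.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy categories for river level, as (left, peak, right) in metres.
CATEGORIES = {
    "low": (-1.0, 0.0, 2.0),
    "medium": (1.0, 3.0, 5.0),
    "high": (4.0, 6.0, 9.0),
}

def memberships(level):
    """Degree of membership of a level in each fuzzy category."""
    return {name: triangular(level, *bounds) for name, bounds in CATEGORIES.items()}

def category(level):
    """Category with the highest membership degree."""
    degrees = memberships(level)
    return max(degrees, key=degrees.get)

def same_category_frequency(observed, predicted):
    """Share of time steps where observed and predicted levels share a category."""
    hits = sum(category(o) == category(p) for o, p in zip(observed, predicted))
    return hits / len(observed)

# Hypothetical observed and forecast levels (metres).
observed = [0.5, 2.8, 4.8, 6.5]
predicted = [0.9, 3.4, 3.9, 7.0]
score = same_category_frequency(observed, predicted)
```

A categorical score of this kind stays meaningful when numerical confidence intervals become too wide to be useful, which is the situation the abstract describes for very high or very low river levels.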