923 results for Generalised Linear Models
Abstract:
Forecasting wind power is an important part of a successful integration of wind power into the power grid. Forecasts with lead times longer than 6 h are generally made by using statistical methods to post-process forecasts from numerical weather prediction systems. Two major problems that complicate this approach are the non-linear relationship between wind speed and power production and the limited range of power production between zero and nominal power of the turbine. In practice, these problems are often tackled by using non-linear non-parametric regression models. However, such an approach ignores valuable and readily available information: the power curve of the turbine's manufacturer. Much of the non-linearity can be directly accounted for by transforming the observed power production into wind speed via the inverse power curve so that simpler linear regression models can be used. Furthermore, the fact that the transformed power production has a limited range can be taken care of by employing censored regression models. In this study, we evaluate quantile forecasts from a range of methods: (i) using parametric and non-parametric models, (ii) with and without the proposed inverse power curve transformation and (iii) with and without censoring. The results show that with our inverse (power-to-wind) transformation, simpler linear regression models with censoring perform equally or better than non-linear models with or without the frequently used wind-to-power transformation.
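A minimal sketch of the inverse (power-to-wind) transformation described in this abstract, assuming a piecewise-linear power curve. The curve values and the helper name `power_to_wind` are hypothetical and for illustration only; they are not taken from the study. Observations at zero or at nominal power are flagged as censored, since the inverse curve only bounds the corresponding wind speed there.

```python
import numpy as np

# Hypothetical manufacturer power curve (illustrative values, not from the study).
curve_speed = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])  # wind speed in m/s
curve_power = np.array([0.0, 0.1, 0.3, 0.6, 0.9, 1.0])    # fraction of nominal power


def power_to_wind(power):
    """Map observed power (fraction of nominal) to wind speed via the inverse curve.

    Returns the transformed values plus boolean masks marking left-censored
    (power at zero) and right-censored (power at nominal) observations.
    """
    power = np.clip(np.asarray(power, dtype=float), 0.0, 1.0)
    # The curve is strictly increasing between cut-in and rated speed, so it can
    # be inverted by interpolating with the axes swapped.
    speed = np.interp(power, curve_power, curve_speed)
    left_censored = power <= 0.0    # true wind speed is at or below cut-in
    right_censored = power >= 1.0   # true wind speed is at or above rated speed
    return speed, left_censored, right_censored


speed, left, right = power_to_wind([0.0, 0.2, 1.0])
```

The transformed values and censoring masks could then be passed to a censored linear regression; that fitting step is left out here because the abstract does not specify the implementation.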
Abstract:
Deforestation in Brazilian Amazonia accounts for a disproportionate global-scale fraction of both carbon emissions from biomass burning and biodiversity erosion through habitat loss. Here we use field and remote-sensing data to examine the effects of private landholding size on the amount and type of forest cover retained within economically active rural properties in an aging southern Amazonian deforestation frontier. Data on both upland and riparian forest cover from a survey of 300 rural properties indicated that 49.4% (SD = 29.0%) of the total forest cover was maintained as of 2007, and that property size is a key regional-scale determinant of patterns of deforestation and land-use change. Small properties (<= 150 ha) retained a lower proportion of forest (20.7%, SD = 17.6) than did large properties (>150 ha; 55.6%, SD = 27.2). Generalized linear models showed that property size had a positive effect on remaining areas of both upland and total forest cover. Using a Landsat time-series, the age of first clear-cutting that could be mapped within the boundaries of each property had a negative effect on the proportion of upland, riparian, and total forest cover retained. Based on these data, we show contrasts in land-use strategies between smallholders and largeholders, as well as differences in compliance with legal requirements in relation to minimum forest cover set-asides within private landholdings. This suggests that property size structure must be explicitly considered in landscape-scale conservation planning initiatives guiding agro-pastoral frontier expansion into remaining areas of tropical forest. (C) 2010 Elsevier Ltd. All rights reserved.
Abstract:
The humpback whale (Megaptera novaeangliae) population that uses Abrolhos Bank, off the east coast of Brazil, as a breeding ground is increasing. To describe temporal changes in the relative abundance of humpback whales around Abrolhos, seven years (1998-2004) of whale count data were collected from July through November. During one-hour scans, observers determined group size within 9.3 km (5 n.m.) of a land-based observing station. A total of 930 scans, comprising 7996 sightings of adults and 2044 of calves, were analysed using generalized linear models that included variables for time of day, day of the season, years and two-way interactions as possible predictors. The pattern observed was a gradual build-up and decline in whale counts within seasons. Patterns and peaks of adult and calf counts varied among years. Although fluctuation was observed, there was generally an increasing trend in adult counts among years. Calf counts increased only in 2004. These fluctuations may have been caused by environmental conditions in the humpback whales' summering grounds and also by changes in spatial-temporal concentrations in Abrolhos Bank. The general pattern observed within the study area mirrored what was observed in the whole Abrolhos Bank. Knowledge of the consistency with which humpback whales use this important nursing area should prove beneficial for designing future monitoring programmes, especially those related to whale-watching activities around the Abrolhos Archipelago.
Abstract:
A mixed integer continuous nonlinear model and a solution method for the problem of orthogonally packing identical rectangles within an arbitrary convex region are introduced in the present work. The convex region is assumed to be made of an isotropic material in such a way that arbitrary rotations of the items, preserving the orthogonality constraint, are allowed. The solution method is based on a combination of branch and bound and active-set strategies for bound-constrained minimization of smooth functions. Numerical results show the reliability of the presented approach. (C) 2010 Elsevier Ltd. All rights reserved.
Abstract:
Predictors of random effects are usually based on the popular mixed effects (ME) model developed under the assumption that the sample is obtained from a conceptual infinite population; such predictors are employed even when the actual population is finite. Two alternatives that incorporate the finite nature of the population are obtained from the superpopulation model proposed by Scott and Smith (1969. Estimation in multi-stage surveys. J. Amer. Statist. Assoc. 64, 830-840) or from the finite population mixed model recently proposed by Stanek and Singer (2004. Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 1119-1130). Predictors derived under the latter model with the additional assumptions that all variance components are known and that within-cluster variances are equal have smaller mean squared error (MSE) than the competitors based on either the ME or Scott and Smith's models. As population variances are rarely known, we propose method of moments estimators to obtain empirical predictors and conduct a simulation study to evaluate their performance. The results suggest that the finite population mixed model empirical predictor is more stable than its competitors since, in terms of MSE, it is either the best or the second best and, when second best, its performance lies within acceptable limits. When both cluster and unit intra-class correlation coefficients are very high (e.g., 0.95 or more), the performance of the empirical predictors derived under the three models is similar. (c) 2007 Elsevier B.V. All rights reserved.
Abstract:
Local influence diagnostics based on estimating equations, which play the role of a gradient vector derived from any fit function, are developed for repeated measures regression analysis. Our proposal generalizes tools used in other studies (Cook, 1986; Cadigan and Farrell, 2002), considering herein local influence diagnostics for a statistical model where estimation involves an estimating equation in which the observations are not necessarily independent of each other. Moreover, the measures of local influence are illustrated with some simulated data sets to assess influential observations. Applications using real data are presented. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
We consider the problem of dichotomizing a continuous covariate when performing a regression analysis based on a generalized estimation approach. The problem involves estimation of the cutpoint for the covariate and testing the hypothesis that the binary covariate constructed from the continuous covariate has a significant impact on the outcome. Due to the multiple testing used to find the optimal cutpoint, we need to make an adjustment to the usual significance test to preserve the type-I error rates. We illustrate the techniques on one data set of patients given unrelated hematopoietic stem cell transplantation. Here the question is whether the CD34 cell dose given to the patient affects the outcome of the transplant and what is the smallest cell dose needed for good outcomes. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
In this article, we deal with the issue of performing accurate small-sample inference in the Birnbaum-Saunders regression model, which can be useful for modeling lifetime or reliability data. We derive a Bartlett-type correction for the score test and numerically compare the corrected test with the usual score test and some other competitors.
Abstract:
We review some issues related to the implications of different missing data mechanisms on statistical inference for contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random and missing completely at random models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, we may not always conclude that a MNAR model is misspecified because the estimate is on the boundary of the parameter space.
Abstract:
Calculations of local influence curvatures and leverage have been well developed when the parameters are unrestricted. In this article, we discuss the assessment of local influence and leverage under linear equality parameter constraints with extensions to inequality constraints. Using a penalized quadratic function we express the normal curvature of local influence for arbitrary perturbation schemes and the generalized leverage matrix in interpretable forms, which depend on restricted and unrestricted components. The results are quite general and can be applied in various statistical models. In particular, we derive the normal curvature under three useful perturbation schemes for generalized linear models. Four illustrative examples are analyzed by the methodology developed in the article.
Abstract:
Influence diagnostics methods are extended in this article to the Grubbs model when the unknown quantity x (latent variable) follows a skew-normal distribution. Diagnostic measures are derived from the case-deletion approach and the local influence approach under several perturbation schemes. The observed information matrix to the postulated model and Delta matrices to the corresponding perturbed models are derived. Results obtained for one real data set are reported, illustrating the usefulness of the proposed methodology.
Abstract:
Animal traits differ not only in mean, but also in variation around the mean. For instance, one sire’s daughter group may be very homogeneous, while another sire’s daughters are much more heterogeneous in performance. The difference in residual variance can partially be explained by genetic differences. Models for such genetic heterogeneity of environmental variance include genetic effects for the mean and residual variance, and a correlation between the genetic effects for the mean and residual variance to measure how the residual variance might vary with the mean. The aim of this thesis was to develop a method based on double hierarchical generalized linear models for estimating genetic heteroscedasticity, and to apply it to four traits in two domestic animal species: teat count and litter size in pigs, and milk production and somatic cell count in dairy cows. The method developed is fast and has been implemented in software that is widely used in animal breeding, which makes it convenient to use. It is based on an approximation of double hierarchical generalized linear models by normal distributions. With repeated observations on individuals or genetic groups, the estimates were found to be unbiased. For the traits studied, the estimated heritability values for the mean and the residual variance, and the genetic coefficients of variation, were found to be within the ranges usually reported. The genetic correlation between mean and residual variance was estimated for the pig traits only, and was found to be favorable for litter size, but unfavorable for teat count.
Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
Abstract:
BACKGROUND: Misoprostol is established for the treatment of incomplete abortion but has not been systematically assessed when provided by midwives at district level in a low-resource setting. We investigated the effectiveness and safety of midwives diagnosing and treating incomplete abortion with misoprostol, compared with physicians. METHODS: We did a multicentre randomised controlled equivalence trial at district level at six facilities in Uganda. Eligibility criteria were women with signs of incomplete abortion. We randomly allocated women with first-trimester incomplete abortion to clinical assessment and treatment with misoprostol either by a physician or a midwife. The randomisation (1:1) was done in blocks of 12 and was stratified for study site. Primary outcome was complete abortion not needing surgical intervention within 14-28 days after initial treatment. The study was not masked. Analysis of the primary outcome was done on the per-protocol population with a generalised linear mixed-effects model. The predefined equivalence range was -4% to 4%. The trial was registered at ClinicalTrials.gov, number NCT01844024. FINDINGS: From April 30, 2013, to July 21, 2014, 1108 women were assessed for eligibility. 1010 women were randomly assigned (506 to the midwife group and 504 to the physician group). 955 women (472 in the midwife group and 483 in the physician group) were included in the per-protocol analysis. 452 (95·8%) of women in the midwife group had complete abortion and 467 (96·7%) in the physician group. The model-based risk difference for midwife versus physician group was -0·8% (95% CI -2·9 to 1·4), falling within the predefined equivalence range (-4% to 4%). The overall proportion of women with incomplete abortion was 3·8% (36/955), similarly distributed between the two groups (4·2% [20/472] in the midwife group, 3·3% [16/483] in the physician group). No serious adverse events were recorded.
INTERPRETATION: Diagnosis and treatment of incomplete abortion with misoprostol by midwives is as safe and effective as when provided by physicians, in a low-resource setting. Scaling up midwives' involvement in treatment of incomplete abortion with misoprostol at district level would increase access to safe post-abortion care. FUNDING: The Swedish Research Council, Karolinska Institutet, and Dalarna University.
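The equivalence decision reported in this abstract can be retraced with simple arithmetic. The sketch below recomputes the crude (unadjusted) proportions from the reported counts and checks that the reported confidence interval lies inside the predefined range. The helper name `within_equivalence` is hypothetical, and the crude difference is not the trial's model-based estimate, which came from a generalised linear mixed-effects model.

```python
def within_equivalence(ci_low, ci_high, margin):
    """Return True when the whole confidence interval lies inside (-margin, margin)."""
    return -margin < ci_low and ci_high < margin


# Per-protocol counts reported in the abstract.
midwife_complete, midwife_total = 452, 472
physician_complete, physician_total = 467, 483

midwife_rate = midwife_complete / midwife_total
physician_rate = physician_complete / physician_total

# Crude risk difference in percentage points (unadjusted, for illustration only).
crude_rd = (midwife_rate - physician_rate) * 100

# Reported model-based 95% CI and predefined equivalence margin, in percentage points.
equivalent = within_equivalence(-2.9, 1.4, 4.0)

print(f"midwife {midwife_rate:.1%}, physician {physician_rate:.1%}")
print(f"crude risk difference {crude_rd:.1f} percentage points, equivalent: {equivalent}")
```

The crude difference is close to, but not identical with, the reported model-based -0·8%, since the trial's model also accounts for study site.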
Abstract:
Flood forecasting systems can be used effectively when the lead time is sufficient relative to the time needed for preventive or corrective action. The reliability and accuracy of the forecasts are also fundamentally important. Forecasts of flood levels are always approximations, and confidence intervals are not always applicable, especially under high uncertainty, which produces very wide confidence intervals. Such intervals are problematic when river levels are very high or very low. In this study, flood-level forecasts are produced both in the traditional numerical form and in the form of categories, for which an expert system based on fuzzy rules and inference is used. Methodologies and computational procedures for learning, simulation and querying are designed and then developed as an application (SELF, Sistema Especialista com uso de Lógica “Fuzzy”), intended for research and operational use. Comparisons of fuzzy expert systems and linear empirical models, based on aspects of their use in forecasting, reveal a strong analogy despite the fundamental theoretical differences between them. The methodologies are applied to forecasting in the Camaquã river basin (15543 km²), for lead times between 10 and 48 hours. Practical difficulties in application are identified, leading to solutions that constitute advances in knowledge and technique. Forecasts in both numerical and categorical form are carried out successfully using the new resources. The forecasts are evaluated and compared using a new group of statistics, derived from the joint frequencies with which observed and predicted values fall in the same category during simulation.
The effects of varying the density of the gauging network are analysed, showing that real-time rainfall and river-stage forecasting systems are feasible even with a small number of rainfall data collection stations, for forecasts in the form of fuzzy categories.