932 resultados para generalized linear-models
Resumo:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author, with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writters like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long lasting debate around its authorship sprouting from its first edition, where its introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last one fourth of the book is by Galba (?-1490), after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis can not be used to help classify chapters by author. By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimates that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and into the use of vowels in each chapter of the book. Given that the features selected are categorical, that leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of the estimation of a suden change-point in those sequences, in the following sections we propose various ways to estimate change-points in multinomial sequences; the method in section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma models onto the sequence of Chi-square distances between each row profiles and the average profile, the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis as well as onto sequences of other summary measures like the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models
Resumo:
Even though antenatal care is universally regarded as important, determinants of demand for antenatal care have not been widely studied. Evidence concerning which and how socioeconomic conditions influence whether a pregnant woman attends or not at least one antenatal consultation or how these factors affect the absences to antenatal consultations is very limited. In order to generate this evidence, a two-stage analysis was performed with data from the Demographic and Health Survey carried out by Profamilia in Colombia during 2005. The first stage was run as a logit model showing the marginal effects on the probability of attending the first visit and an ordinary least squares model was performed for the second stage. It was found that mothers living in the pacific region as well as young mothers seem to have a lower probability of attending the first visit but these factors are not related to the number of absences to antenatal consultation once the first visit has been achieved. The effect of health insurance was surprising because of the differing effects that the health insurers showed. Some familiar and personal conditions such as willingness to have the last children and number of previous children, demonstrated to be important in the determination of demand. The effect of mother’s educational attainment was proved as important whereas the father’s educational achievement was not. This paper provides some elements for policy making in order to increase the demand inducement of antenatal care, as well as stimulating research on demand for specific issues on health.
Resumo:
We study the role of natural resource windfalls in explaining the efficiency of public expenditures. Using a rich dataset of expenditures and public good provision for 1,836 municipalities in Peru for period 2001-2010, we estimate a non-monotonic relationship between the efficiency of public good provision and the level of natural resource transfers. Local governments that were extremely favored by the boom of mineral prices were more efficient in using fiscal windfalls whereas those benefited with modest transfers were more inefficient. These results can be explained by the increase in political competition associated with the boom. However, the fact that increases in efficiency were related to reductions in public good provision casts doubts about the beneficial effects of political competition in promoting efficiency.
Resumo:
A physically motivated statistical model is used to diagnose variability and trends in wintertime ( October - March) Global Precipitation Climatology Project (GPCP) pentad (5-day mean) precipitation. Quasi-geostrophic theory suggests that extratropical precipitation amounts should depend multiplicatively on the pressure gradient, saturation specific humidity, and the meridional temperature gradient. This physical insight has been used to guide the development of a suitable statistical model for precipitation using a mixture of generalized linear models: a logistic model for the binary occurrence of precipitation and a Gamma distribution model for the wet day precipitation amount. The statistical model allows for the investigation of the role of each factor in determining variations and long-term trends. Saturation specific humidity q(s) has a generally negative effect on global precipitation occurrence and with the tropical wet pentad precipitation amount, but has a positive relationship with the pentad precipitation amount at mid- and high latitudes. The North Atlantic Oscillation, a proxy for the meridional temperature gradient, is also found to have a statistically significant positive effect on precipitation over much of the Atlantic region. Residual time trends in wet pentad precipitation are extremely sensitive to the choice of the wet pentad threshold because of increasing trends in low-amplitude precipitation pentads; too low a choice of threshold can lead to a spurious decreasing trend in wet pentad precipitation amounts. However, for not too small thresholds, it is found that the meridional temperature gradient is an important factor for explaining part of the long-term trend in Atlantic precipitation.
Resumo:
We introduce a procedure for association based analysis of nuclear families that allows for dichotomous and more general measurements of phenotype and inclusion of covariate information. Standard generalized linear models are used to relate phenotype and its predictors. Our test procedure, based on the likelihood ratio, unifies the estimation of all parameters through the likelihood itself and yields maximum likelihood estimates of the genetic relative risk and interaction parameters. Our method has advantages in modelling the covariate and gene-covariate interaction terms over recently proposed conditional score tests that include covariate information via a two-stage modelling approach. We apply our method in a study of human systemic lupus erythematosus and the C-reactive protein that includes sex as a covariate.
Resumo:
A physically motivated statistical model is used to diagnose variability and trends in wintertime ( October - March) Global Precipitation Climatology Project (GPCP) pentad (5-day mean) precipitation. Quasi-geostrophic theory suggests that extratropical precipitation amounts should depend multiplicatively on the pressure gradient, saturation specific humidity, and the meridional temperature gradient. This physical insight has been used to guide the development of a suitable statistical model for precipitation using a mixture of generalized linear models: a logistic model for the binary occurrence of precipitation and a Gamma distribution model for the wet day precipitation amount. The statistical model allows for the investigation of the role of each factor in determining variations and long-term trends. Saturation specific humidity q(s) has a generally negative effect on global precipitation occurrence and with the tropical wet pentad precipitation amount, but has a positive relationship with the pentad precipitation amount at mid- and high latitudes. The North Atlantic Oscillation, a proxy for the meridional temperature gradient, is also found to have a statistically significant positive effect on precipitation over much of the Atlantic region. Residual time trends in wet pentad precipitation are extremely sensitive to the choice of the wet pentad threshold because of increasing trends in low-amplitude precipitation pentads; too low a choice of threshold can lead to a spurious decreasing trend in wet pentad precipitation amounts. However, for not too small thresholds, it is found that the meridional temperature gradient is an important factor for explaining part of the long-term trend in Atlantic precipitation.
Resumo:
Dynamic global vegetation models (DGVMs) typically rely on plant functional types (PFTs), which are assigned distinct environmental tolerances and replace one another progressively along environmental gradients. Fixed values of traits are assigned to each PFT; modelled trait variation along gradients is thus driven by PFT replacement. But empirical studies have revealed "universal" scaling relationships (quantitative trait variations with climate that are similar within and between species, PFTs and communities); and continuous, adaptive trait variation has been proposed to replace PFTs as the basis for next-generation DGVMs. Here we analyse quantitative leaf-trait variation on long temperature and moisture gradients in China with a view to understanding the relative importance of PFT replacement vs. continuous adaptive variation within PFTs. Leaf area (LA), specific leaf area (SLA), leaf dry matter content (LDMC) and nitrogen content of dry matter were measured on all species at 80 sites ranging from temperate to tropical climates and from dense forests to deserts. Chlorophyll fluorescence traits and carbon, phosphorus and potassium contents were measured at 47 sites. Generalized linear models were used to relate log-transformed trait values to growing-season temperature and moisture indices, with or without PFT identity as a predictor, and to test for differences in trait responses among PFTs. Continuous trait variation was found to be ubiquitous. Responses to moisture availability were generally similar within and between PFTs, but biophysical traits (LA, SLA and LDMC) of forbs and grasses responded differently from woody plants. SLA and LDMC responses to temperature were dominated by the prevalence of evergreen PFTs with thick, dense leaves at the warm end of the gradient. Nutrient (N, P and K) responses to climate gradients were generally similar within all PFTs. Area-based nutrients generally declined with moisture; Narea and Karea declined with temperature, but Parea increased with temperature. Although the adaptive nature of many of these trait-climate relationships is understood qualitatively, a key challenge for modelling is to predict them quantitatively. Models must take into account that community-level responses to climatic gradients can be influenced by shifts in PFT composition, such as the replacement of deciduous by evergreen trees, which may run either parallel or counter to trait variation within PFTs. The importance of PFT shifts varies among traits, being important for biophysical traits but less so for physiological and chemical traits. Finally, models should take account of the diversity of trait values that is found in all sites and PFTs, representing the "pool" of variation that is locally available for the natural adaptation of ecosystem function to environmental change.
Resumo:
Deforestation in Brazilian Amazonia accounts for a disproportionate global scale fraction of both carbon emissions from biomass burning and biodiversity erosion through habitat loss. Here we use field- and remote-sensing data to examine the effects of private landholding size on the amount and type of forest cover retained within economically active rural properties in an aging southern Amazonian deforestation frontier. Data on both upland and riparian forest cover from a survey of 300 rural properties indicated that 49.4% (SD = 29.0%) of the total forest cover was maintained as of 2007. and that property size is a key regional-scale determinant of patterns of deforestation and land-use change. Small properties (<= 150 ha) retained a lower proportion of forest (20.7%, SD = 17.6) than did large properties (>150 ha; 55.6%, SD = 27.2). Generalized linear models showed that property size had a positive effect on remaining areas of both upland and total forest cover. Using a Landsat time-series, the age of first clear-cutting that could be mapped within the boundaries of each property had a negative effect on the proportion of upland, riparian, and total forest cover retained. Based on these data, we show contrasts in land-use strategies between smallholders and largeholders, as well as differences in compliance with legal requirements in relation to minimum forest cover set-asides within private landholdings. This suggests that property size structure must be explicitly considered in landscape-scale conservation planning initiatives guiding agro-pastoral frontier expansion into remaining areas of tropical forest. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
The humpback whale (Megaptera novaeangliae) population that uses Abrolhos Bank, off the east coast of Brazil as a breeding ground is increasing. To describe temporal changes in the relative abundance of humpback whales around Abrolhos, seven years (1998-2004) of whale count data were collected during July through to November. During one-hour-scans, observers determined group size within 9.3 km (5 n.m.) of a land-based observing station. A total Of 930 scans, comprising 7996 sightings of adults and 2044 calves were analysed using generalized linear models that included variables for time of day, day of the season, years and two-way interactions as possible predictors. The pattern observed was the gradual build-up and decline in whale counts within seasons. Patterns and peaks of adult and calf counts varied among years. Although fluctuation was observed, there was generally an increasing trend in adult counts among years. Calf counts increased only in 2004. These fluctuations may have been caused by some environmental conditions in humpback whales` summering grounds and also by changes in spatial-temporal concentrations in Abrolhos Bank. The general pattern observed within the study area mirrored what was observed in the whole Abrolhos Bank. Knowledge of the consistency with which humpback whales use this important nursing area should prove beneficial for designing future monitoring programmes especially related to whale watching activities around Abrolhos Archipelago.
Resumo:
Local influence diagnostics based on estimating equations as the role of a gradient vector derived from any fit function are developed for repeated measures regression analysis. Our proposal generalizes tools used in other studies (Cook, 1986: Cadigan and Farrell, 2002), considering herein local influence diagnostics for a statistical model where estimation involves an estimating equation in which all observations are not necessarily independent of each other. Moreover, the measures of local influence are illustrated with some simulated data sets to assess influential observations. Applications using real data are presented. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this article, we deal with the issue of performing accurate small-sample inference in the Birnbaum-Saunders regression model, which can be useful for modeling lifetime or reliability data. We derive a Bartlett-type correction for the score test and numerically compare the corrected test with the usual score test and some other competitors.
Resumo:
Calculations of local influence curvatures and leverage have been well developed when the parameters are unrestricted. In this article, we discuss the assessment of local influence and leverage under linear equality parameter constraints with extensions to inequality constraints. Using a penalized quadratic function we express the normal curvature of local influence for arbitrary perturbation schemes and the generalized leverage matrix in interpretable forms, which depend on restricted and unrestricted components. The results are quite general and can be applied in various statistical models. In particular, we derive the normal curvature under three useful perturbation schemes for generalized linear models. Four illustrative examples are analyzed by the methodology developed in the article.
Resumo:
Animal traits differ not only in mean, but also in variation around the mean. For instance, one sire’s daughter group may be very homogeneous, while another sire’s daughters are much more heterogeneous in performance. The difference in residual variance can partially be explained by genetic differences. Models for such genetic heterogeneity of environmental variance include genetic effects for the mean and residual variance, and a correlation between the genetic effects for the mean and residual variance to measure how the residual variance might vary with the mean. The aim of this thesis was to develop a method based on double hierarchical generalized linear models for estimating genetic heteroscedasticity, and to apply it on four traits in two domestic animal species; teat count and litter size in pigs, and milk production and somatic cell count in dairy cows. The method developed is fast and has been implemented in software that is widely used in animal breeding, which makes it convenient to use. It is based on an approximation of double hierarchical generalized linear models by normal distributions. When having repeated observations on individuals or genetic groups, the estimates were found to be unbiased. For the traits studied, the estimated heritability values for the mean and the residual variance, and the genetic coefficients of variation, were found in the usual ranges reported. The genetic correlation between mean and residual variance was estimated for the pig traits only, and was found to be favorable for litter size, but unfavorable for teat count.
Resumo:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)