36 resultados para Overdispersion
Resumo:
Methods to explicitly represent uncertainties in weather and climate models have been developed and refined over the past decade, and have reduced biases and improved forecast skill when implemented in the atmospheric component of models. These methods have not yet been applied to the land surface component of models. Since the land surface is strongly coupled to the atmospheric state at certain times and in certain places (such as the European summer of 2003), improvements in the representation of land surface uncertainty may potentially lead to improvements in atmospheric forecasts for such events. Here we analyse seasonal retrospective forecasts for 1981–2012 performed with the European Centre for Medium-Range Weather Forecasts’ (ECMWF) coupled ensemble forecast model. We consider two methods of incorporating uncertainty into the land surface model (H-TESSEL): stochastic perturbation of tendencies, and static perturbation of key soil parameters. We find that the perturbed parameter approach considerably improves the forecast of extreme air temperature for summer 2003, through better representation of negative soil moisture anomalies and upward sensible heat flux. Averaged across all the reforecasts the perturbed parameter experiment shows relatively little impact on the mean bias, suggesting perturbations of at least this magnitude can be applied to the land surface without any degradation of model climate. There is also little impact on skill averaged across all reforecasts and some evidence of overdispersion for soil moisture. The stochastic tendency experiments show a large overdispersion for the soil temperature fields, indicating that the perturbation here is too strong. There is also some indication that the forecast of the 2003 warm event is improved for the stochastic experiments, however the improvement is not as large as observed for the perturbed parameter experiment.
Resumo:
We propose a geoadditive negative binomial model (Geo-NB-GAM) for regional count data that allows us to address simultaneously some important methodological issues, such as spatial clustering, nonlinearities, and overdispersion. This model is applied to the study of location determinants of inward greenfield investments that occurred during 2003–2007 in 249 European regions. After presenting the data set and showing the presence of overdispersion and spatial clustering, we review the theoretical framework that motivates the choice of the location determinants included in the empirical model, and we highlight some reasons why the relationship between some of the covariates and the dependent variable might be nonlinear. The subsequent section first describes the solutions proposed by previous literature to tackle spatial clustering, nonlinearities, and overdispersion, and then presents the Geo-NB-GAM. The empirical analysis shows the good performance of Geo-NB-GAM. Notably, the inclusion of a geoadditive component (a smooth spatial trend surface) permits us to control for spatial unobserved heterogeneity that induces spatial clustering. Allowing for nonlinearities reveals, in keeping with theoretical predictions, that the positive effect of agglomeration economies fades as the density of economic activities reaches some threshold value. However, no matter how dense the economic activity becomes, our results suggest that congestion costs never overcome positive agglomeration externalities.
Resumo:
We analyze data obtained from a study designed to evaluate training effects on the performance of certain motor activities of Parkinson`s disease patients. Maximum likelihood methods were used to fit beta-binomial/Poisson regression models tailored to evaluate the effects of training on the numbers of attempted and successful specified manual movements in 1 min periods, controlling for disease stage and use of the preferred hand. We extend models previously considered by other authors in univariate settings to account for the repeated measures nature of the data. The results suggest that the expected number of attempts and successes increase with training, except for patients with advanced stages of the disease using the non-preferred hand. Copyright (c) 2008 John Wiley & Sons, Ltd.
Resumo:
Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty resides in the parameter estimation because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles - marginal likelihood, extended likelihood, Bayesian analysis-via simulation studies. Real data on contact wrestling are used for illustration.
Resumo:
In this paper, we propose a class of ACD-type models that accommodates overdispersion, intermittent dynamics, multiple regimes, and sign and size asymmetries in financial durations. In particular, our functional coefficient autoregressive conditional duration (FC-ACD) model relies on a smooth-transition autoregressive specification. The motivation lies on the fact that the latter yields a universal approximation if one lets the number of regimes grows without bound. After establishing that the sufficient conditions for strict stationarity do not exclude explosive regimes, we address model identifiability as well as the existence, consistency, and asymptotic normality of the quasi-maximum likelihood (QML) estimator for the FC-ACD model with a fixed number of regimes. In addition, we also discuss how to consistently estimate using a sieve approach a semiparametric variant of the FC-ACD model that takes the number of regimes to infinity. An empirical illustration indicates that our functional coefficient model is flexible enough to model IBM price durations.
Resumo:
In this study, we deal with the problem of overdispersion beyond extra zeros for a collection of counts that can be correlated. Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial distributions have been considered. First, we propose a multivariate count model in which all counts follow the same distribution and are correlated. Then we extend this model in a sense that correlated counts may follow different distributions. To accommodate correlation among counts, we have considered correlated random effects for each individual in the mean structure, thus inducing dependency among common observations to an individual. The method is applied to real data to investigate variation in food resources use in a species of marsupial in a locality of the Brazilian Cerrado biome. © 2013 Copyright Taylor and Francis Group, LLC.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Background: Cancer is the second leading cause of death in Argentina, and there is little knowledge about its incidence. The first study based on population-based cancer registry described spatial incidence and indicated that there existed at least county-level aggregation. The aim of the present work is to model the incidence patterns for the most incidence cancer in Córdoba Province, Argentina, using information from the Córdoba Cancer Registry by performing multilevel mixed model approach to deal with dependence and unobserved heterogeneity coming from the geo-reference cancer occurrence. Methods: Standardized incidence rates (world standard population) (SIR) by sex based on 5-year age groups were calculated for 109 districts nested on 26 counties for the most incidence cancers in Cordoba using 2004 database. A Poisson twolevel random effect model representing unobserved heterogeneity between first level-districts and second level-counties was fitted to assess the spatial distribution of the overall and site specific cancer incidence rates. Results: SIR cancer at Córdoba province shown an average of 263.53±138.34 and 200.45±98.30 for men and women, respectively. Considering the ratio site specific mean SIR to the total mean, breast cancer ratio was 0.25±0.19, prostate cancer ratio was 0.12±0.10 and lower values for lung and colon cancer for both sexes. The Poisson two-level random intercepts model fitted for SIR data distributed with overdispersion shown significant hierarchical structure for the cancer incidence distribution. Conclusions: a strong spatial-nested effect for the cancer incidence in Córdoba was observed and will help to begin the study of the factors associated with it.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
We consider a fully model-based approach for the analysis of distance sampling data. Distance sampling has been widely used to estimate abundance (or density) of animals or plants in a spatially explicit study area. There is, however, no readily available method of making statistical inference on the relationships between abundance and environmental covariates. Spatial Poisson process likelihoods can be used to simultaneously estimate detection and intensity parameters by modeling distance sampling data as a thinned spatial point process. A model-based spatial approach to distance sampling data has three main benefits: it allows complex and opportunistic transect designs to be employed, it allows estimation of abundance in small subregions, and it provides a framework to assess the effects of habitat or experimental manipulation on density. We demonstrate the model-based methodology with a small simulation study and analysis of the Dubbo weed data set. In addition, a simple ad hoc method for handling overdispersion is also proposed. The simulation study showed that the model-based approach compared favorably to conventional distance sampling methods for abundance estimation. In addition, the overdispersion correction performed adequately when the number of transects was high. Analysis of the Dubbo data set indicated a transect effect on abundance via Akaike’s information criterion model selection. Further goodness-of-fit analysis, however, indicated some potential confounding of intensity with the detection function.
Resumo:
The purpose of this paper is to develop a Bayesian analysis for the right-censored survival data when immune or cured individuals may be present in the population from which the data is taken. In our approach the number of competing causes of the event of interest follows the Conway-Maxwell-Poisson distribution which generalizes the Poisson distribution. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the proposed model. Also, some discussions on the model selection and an illustration with a real data set are considered.
Resumo:
In this paper, we propose a random intercept Poisson model in which the random effect is assumed to follow a generalized log-gamma (GLG) distribution. This random effect accommodates (or captures) the overdispersion in the counts and induces within-cluster correlation. We derive the first two moments for the marginal distribution as well as the intraclass correlation. Even though numerical integration methods are, in general, required for deriving the marginal models, we obtain the multivariate negative binomial model from a particular parameter setting of the hierarchical model. An iterative process is derived for obtaining the maximum likelihood estimates for the parameters in the multivariate negative binomial model. Residual analysis is proposed and two applications with real data are given for illustration. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
The importance of competition between similar species in driving community assembly is much debated. Recently, phylogenetic patterns in species composition have been investigated to help resolve this question: phylogenetic clustering is taken to imply environmental filtering, and phylogenetic overdispersion to indicate limiting similarity between species. We used experimental plant communities with random species compositions and initially even abundance distributions to examine the development of phylogenetic pattern in species abundance distributions. Where composition was held constant by weeding, abundance distributions became overdispersed through time, but only in communities that contained distantly related clades, some with several species (i.e., a mix of closely and distantly related species). Phylogenetic pattern in composition therefore constrained the development of overdispersed abundance distributions, and this might indicate limiting similarity between close relatives and facilitation/complementarity between distant relatives. Comparing the phylogenetic patterns in these communities with those expected from the monoculture abundances of the constituent species revealed that interspecific competition caused the phylogenetic patterns. Opening experimental communities to colonization by all species in the species pool led to convergence in phylogenetic diversity. At convergence, communities were composed of several distantly related but species-rich clades and had overdispersed abundance distributions. This suggests that limiting similarity processes determine which species dominate a community but not which species occur in a community. Crucially, as our study was carried out in experimental communities, we could rule out local evolutionary or dispersal explanations for the patterns and identify ecological processes as the driving force, underlining the advantages of studying these processes in experimental communities. Our results show that phylogenetic relations between species provide a good guide to understanding community structure and add a new perspective to the evidence that niche complementarity is critical in driving community assembly.
Resumo:
My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status. Section 1: The cost of DNA sequencing has reduced dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However it remains elusive how the sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves along with the increasing sequencing depth. To model the overdispersion, we use the beta-binomial distribution with a new parameter indicating the dependency between overdispersion and sequencing depth. Our modified beta-binomial model performs better than the binomial or the pure beta-binomial model with a lower false discovery rate. Section 2: Although a number of methods have been proposed in order to accurately analyze differential RNA expression on the gene level, modeling on the base pair level is required. Here, we find that the overdispersion rate decreases as the sequencing depth increases on the base pair level. Also, we propose four models and compare them with each other. As expected, our beta binomial model with a dynamic overdispersion rate is shown to be superior. Section 3: We investigate biases in RNA-seq by exploring the measurement of the external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe an undiscovered bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. Also, we find that this influence is related to the local sequence of the random hexamer that is used in priming. We suggest a model of the inequality between samples and to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported that a clear threshold effect exists in gene silencing that is mediated by DNA methylation. It is reasonable to assume the thresholds are specific for each gene. It is also intriguing to investigate genes that are largely controlled by DNA methylation. These genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of BRCA. In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. And we reveal a new bias in RNA-seq and provide a detailed understanding of the relationship between this new bias and the local sequence. Also we develop a powerful method to dichotomize methylation status and consequently we identify a new CIMP of breast cancer with a distinct classification of molecular characteristics and clinical features.