Abstract:
We discuss the connection between information and copula theories by showing that a copula can be employed to decompose the information content of a multivariate distribution into marginal and dependence components, with the latter quantified by the mutual information. We define the information excess as a measure of deviation from a maximum-entropy distribution. The idea of marginal invariant dependence measures is also discussed and used to show that empirical linear correlation underestimates the amplitude of the actual correlation in the case of non-Gaussian marginals. The mutual information is shown to provide an upper bound for the asymptotic empirical log-likelihood of a copula. An analytical expression for the information excess of T-copulas is provided, allowing for simple model identification within this family. We illustrate the framework in a financial data set. Copyright (C) EPLA, 2009
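The claim that empirical linear correlation understates dependence under non-Gaussian marginals can be made concrete with a small sketch. It assumes a bivariate Gaussian copula; the correlation value 0.8 and the lognormal marginals are illustrative choices, not taken from the paper:

```python
import numpy as np

def gaussian_copula_mi(rho):
    """Mutual information (in nats) implied by a bivariate Gaussian
    copula with correlation rho: -0.5 * log(1 - rho^2)."""
    return -0.5 * np.log(1.0 - rho ** 2)

rng = np.random.default_rng(0)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

# A monotone marginal transform leaves the copula (and hence the mutual
# information) unchanged, but gives lognormal marginals.
u, v = np.exp(x), np.exp(y)

pearson_normal = np.corrcoef(x, y)[0, 1]     # close to rho
pearson_lognormal = np.corrcoef(u, v)[0, 1]  # noticeably below rho
```

The dependence is identical in both pairs, yet the linear correlation of the lognormal pair falls visibly below 0.8, which is the underestimation effect discussed above.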
Abstract:
Approximate Lie symmetries of the Navier-Stokes equations are used to study scaling phenomena arising in turbulence. In particular, we show that the Lie symmetries of the Euler equations are inherited by the Navier-Stokes equations in the form of approximate symmetries, which makes it possible to incorporate the Reynolds-number dependence into scaling laws. Moreover, the optimal systems of all finite-dimensional Lie subalgebras of the approximate symmetry transformations of the Navier-Stokes equations are constructed. We show how the scaling groups obtained can be used to introduce the Reynolds-number dependence into scaling laws explicitly for stationary parallel turbulent shear flows. This is demonstrated in the framework of a new approach to deriving scaling laws based on symmetry analysis [11]-[13].
Abstract:
We propose a likelihood ratio test (LRT) with Bartlett correction in order to identify Granger causality between sets of time-series gene expression data. The performance of the proposed test is compared to a previously published bootstrap-based approach. The LRT is shown to be significantly faster and statistically powerful even under non-normal distributions. An R package named gGranger containing an implementation of both Granger causality identification tests is also provided.
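A plain Gaussian likelihood-ratio version of the Granger test can be sketched as follows (without the paper's Bartlett correction, and not using the gGranger package; the simulated example and all names are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def granger_lrt(x, y, p=1):
    """Test whether x Granger-causes y.

    Restricted model: y_t on p lags of y.  Full model: y_t on p lags of
    y and p lags of x.  With Gaussian errors the LR statistic is
    n * log(RSS_restricted / RSS_full), asymptotically chi-square(p).
    """
    n = len(y) - p
    Y = y[p:]
    lags_y = np.column_stack([y[p - k:len(y) - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    ones = np.ones((n, 1))
    Xr = np.hstack([ones, lags_y])
    Xf = np.hstack([ones, lags_y, lags_x])
    rss_r = np.sum((Y - Xr @ np.linalg.lstsq(Xr, Y, rcond=None)[0]) ** 2)
    rss_f = np.sum((Y - Xf @ np.linalg.lstsq(Xf, Y, rcond=None)[0]) ** 2)
    stat = n * np.log(rss_r / rss_f)
    return stat, chi2.sf(stat, df=p)

# Simulated example where x genuinely drives y.
rng = np.random.default_rng(0)
e = rng.normal(size=(2, 501))
x = e[0]
y = np.empty(501)
y[0] = 0.0
for t in range(1, 501):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + e[1, t]

stat, pval = granger_lrt(x, y, p=1)  # small p-value: x Granger-causes y
```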
Abstract:
Quadratic assignment problems (QAPs) are commonly solved by heuristic methods, where the optimum is sought iteratively. Heuristics are known to provide good solutions, but the quality of a solution, i.e., a confidence interval for it, is unknown. This paper uses statistical optimum estimation techniques (SOETs) to assess the quality of genetic-algorithm solutions for QAPs. We examine the behavior of different SOETs with respect to bias, coverage rate, and interval length, and then compare the SOET lower bound with deterministic ones. The commonly used deterministic bounds are confined to only a few algorithms. We show that the Jackknife estimators perform better than Weibull estimators, and that when the number of heuristic solutions is as large as 100, higher-order Jackknife estimators perform better than lower-order ones. Compared with the deterministic bounds, the SOET lower bound performs significantly better than most deterministic lower bounds and is comparable with the best deterministic ones.
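The jackknife idea can be sketched for a point estimate of the unknown minimum from a sample of heuristic objective values. The formula below is one standard form of the Kth-order jackknife minimum estimator in this literature; the paper's exact variants may differ:

```python
from math import comb

def jackknife_min(values, order=1):
    """Kth-order jackknife point estimate of the unknown minimum.

    Uses the K+1 smallest heuristic solution values x(1) <= ... <= x(K+1):
        est = sum_{i=1}^{K+1} (-1)^(i+1) * C(K+1, i) * x(i)
    Order 1 gives 2*x(1) - x(2); order 2 gives 3*x(1) - 3*x(2) + x(3).
    """
    xs = sorted(values)
    k = order
    return sum((-1) ** (i + 1) * comb(k + 1, i) * xs[i - 1]
               for i in range(1, k + 2))
```

The estimate extrapolates below the best value found, which is why it can serve as the center of a statistical lower bound on the true optimum.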
Abstract:
Solutions to combinatorial optimization problems, such as facility location problems, frequently rely on heuristics to minimize the objective function. The optimum is sought iteratively, and a criterion is needed to decide when the procedure (almost) attains it. Pre-setting the number of iterations dominates in OR applications, which implies that the quality of the solution cannot be ascertained. A small, almost dormant, branch of the literature suggests using statistical principles to estimate the minimum and its bounds as a tool for deciding when to stop and for evaluating the quality of the solution. In this paper we examine the performance of statistical bounds obtained from four different estimators, using simulated annealing on p-median test problems taken from Beasley's OR-library. We find the Weibull estimator and the 2nd-order Jackknife estimator preferable, and a sample size of about 10 to be sufficient, which is much less than the current recommendation. However, reliable statistical bounds are found to depend critically on a sample of high-quality heuristic solutions, and we give a simple statistic useful for checking that quality. We end the paper with an illustration of using statistical bounds in a problem of locating some 70 distribution centers of the Swedish Post in one Swedish region.
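For the Weibull route, a quick closed-form location (minimum) estimate built from three order statistics is sometimes used in this statistical-bounds literature; the formula and the degenerate-spacing fallback below are an illustrative sketch, not the paper's exact estimator:

```python
def weibull_location(values):
    """Closed-form Weibull location (minimum) estimate from the smallest,
    second-smallest, and largest of a sample of heuristic solution values:
        (x(1)*x(n) - x(2)^2) / (x(1) + x(n) - 2*x(2)),
    valid when the denominator is positive."""
    xs = sorted(values)
    x1, x2, xn = xs[0], xs[1], xs[-1]
    denom = x1 + xn - 2.0 * x2
    if denom <= 0:
        return x1  # degenerate spacing: fall back to the best value found
    return (x1 * xn - x2 ** 2) / denom
```

As with the jackknife form, the estimate lies at or below the best heuristic value, anchoring a statistical lower bound.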
Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights into modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to correctly adjust the bias in genetic variance component estimation and to improve the precision of QTL mapping. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole-genome models were shown to perform well in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The work in this thesis is accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
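The efficiency concern behind a high-dimensional ridge tool like bigRR can be illustrated with the standard dual (kernel) identity, which reduces a p x p solve to an n x n one when p >> n. This is a generic numpy sketch of that identity, not the package's actual algorithm:

```python
import numpy as np

def ridge_dual(X, y, lam):
    """Ridge coefficients via the dual identity
        (X'X + lam*I)^-1 X'y  =  X' (XX' + lam*I)^-1 y,
    which costs O(n^3) instead of O(p^3) when p >> n
    (e.g., many genomic markers, few individuals)."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha
```

For n = 20 individuals and p = 500 markers the dual form solves a 20 x 20 system instead of a 500 x 500 one, while returning exactly the same coefficients.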
Abstract:
Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty lies in parameter estimation, because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose, and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles (marginal likelihood, extended likelihood, and Bayesian analysis) via simulation studies. Real data on contact wrestling are used for illustration.
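The marginal-likelihood difficulty can be made concrete for a Poisson model with a normal random intercept: the per-cluster likelihood is an intractable integral over the random effect, typically approximated numerically, e.g. by Gauss-Hermite quadrature. This is a generic sketch, not any specific package's implementation:

```python
import numpy as np
from scipy.stats import poisson

def cluster_marginal_lik(beta, sigma, y, n_nodes=40):
    """Marginal likelihood of one cluster's counts y under a Poisson
    GLMM with intercept beta and a N(0, sigma^2) random intercept u:
        L = integral N(u; 0, sigma^2) * prod_j Pois(y_j; exp(beta+u)) du,
    approximated by Gauss-Hermite quadrature after u = sqrt(2)*sigma*t."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    rates = np.exp(beta + np.sqrt(2.0) * sigma * t)  # Poisson mean per node
    lik = np.array([poisson.pmf(yi, rates) for yi in y]).prod(axis=0)
    return (w * lik).sum() / np.sqrt(np.pi)
```

Maximizing the sum of per-cluster log marginal likelihoods over the fixed effects and the variance component is then the "marginal likelihood" principle compared in the study.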
Abstract:
A number of recent works have introduced statistical methods for detecting genetic loci that affect phenotypic variability, which we refer to as variability-controlling quantitative trait loci (vQTL). These are genetic variants whose allelic state predicts how much phenotype values will vary about their expected means. Such loci are of great potential interest in both human and non-human genetic studies, one reason being that a detected vQTL could represent a previously undetected interaction with other genes or environmental factors. The simultaneous publication of these new methods in different journals has in many cases precluded opportunity for comparison. We survey some of these methods, the respective trade-offs they imply, and the connections between them. The methods fall into three main groups: classical non-parametric, fully parametric, and semi-parametric two-stage approximations. Choosing between alternatives involves balancing the need for robustness, flexibility, and speed. For each method, we identify important assumptions and limitations, including those of practical importance, such as their scope for including covariates and random effects. We show in simulations that both parametric methods and their semi-parametric approximations can give elevated false positive rates when they ignore mean-variance relationships intrinsic to the data generation process. We conclude that choice of method depends on the trait distribution, the need to include non-genetic covariates, and the population size and structure, coupled with a critical evaluation of how these fit with the assumptions of the statistical model.
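One classical non-parametric choice for detecting genotype-dependent variance is the Brown-Forsythe (median-centred Levene) test applied across genotype groups. The genotype labels, sample sizes, and effect sizes below are made up for the demonstration:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)

# Hypothetical biallelic locus: three genotype groups with equal means
# but genotype-dependent residual variance (a simulated vQTL).
aa = rng.normal(0.0, 1.0, 300)
ab = rng.normal(0.0, 1.5, 300)
bb = rng.normal(0.0, 2.0, 300)

# center='median' gives the robust Brown-Forsythe variant.
stat, pval = levene(aa, ab, bb, center='median')
```

Because the group means are identical, an ordinary association test would find nothing here; the variance test flags the locus instead.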
Abstract:
Data available on continuous-time diffusions are always sampled discretely in time. In most cases, the likelihood function of the observations is not directly computable. This survey covers a sample of the statistical methods that have been developed to solve this problem. We concentrate on some recent contributions to the literature based on three different approaches to the problem: an improvement of the Euler-Maruyama discretization scheme, the use of Martingale Estimating Functions, and the application of Generalized Method of Moments (GMM).
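The first of these approaches starts from the Euler-Maruyama scheme, which replaces the unknown transition density over each sampling interval by a Gaussian. A minimal simulation sketch (the Ornstein-Uhlenbeck example and step sizes are illustrative):

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, dt, n_steps, rng):
    """Simulate dX = mu(X) dt + sigma(X) dW on a uniform grid.
    The same scheme yields the approximate Gaussian transition density
    N(x + mu(x)*dt, sigma(x)^2 * dt) used for approximate likelihoods."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    for i in range(n_steps):
        x[i + 1] = x[i] + mu(x[i]) * dt + sigma(x[i]) * dw[i]
    return x

rng = np.random.default_rng(0)
# Ornstein-Uhlenbeck: dX = -X dt + dW, stationary variance 1/2.
path = euler_maruyama(lambda x: -x, lambda x: 1.0, 0.0, 0.01, 200_000, rng)
```

The discretization bias of this Gaussian approximation at coarse sampling intervals is precisely what the improved schemes surveyed here aim to reduce.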
Abstract:
Atypical points in the data may result in meaningless efficient frontiers. This follows since portfolios constructed using classical estimates may reflect neither the usual-day nor the unusual-day patterns. On the other hand, portfolios constructed using robust approaches capture only the dynamics of the usual days, which constitute the majority of business days. In this paper we propose a statistical model and a robust estimation procedure to obtain an efficient frontier that takes into account the behavior of both the usual days and most of the atypical days. We show, using real data and simulations, that portfolios constructed in this way require less frequent rebalancing and may yield higher expected returns for any risk level.
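A crude numpy-only sketch of the contrast between classical and robust inputs to a minimum-variance portfolio (Mahalanobis trimming stands in for the paper's actual robust procedure, and the contamination setup is invented for illustration):

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance portfolio: w proportional to cov^-1 * 1."""
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return w / w.sum()

def trimmed_cov(returns, trim=0.10):
    """Drop the most outlying days by Mahalanobis distance, then
    re-estimate the covariance (a stand-in for a proper robust fit)."""
    centred = returns - returns.mean(axis=0)
    d = np.einsum('ij,jk,ik->i', centred,
                  np.linalg.inv(np.cov(returns, rowvar=False)), centred)
    keep = d <= np.quantile(d, 1.0 - trim)
    return np.cov(returns[keep], rowvar=False)

rng = np.random.default_rng(0)
usual = rng.normal(0.0, 1.0, size=(480, 2))   # typical trading days
crisis = rng.normal(0.0, 10.0, size=(20, 2))  # a few atypical days
returns = np.vstack([usual, crisis])

w_robust = min_variance_weights(trimmed_cov(returns))
```

The handful of atypical days dominates the classical covariance; trimming them recovers the usual-day risk structure, which is the trade-off the paper's model is designed to balance rather than discard.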
Abstract:
This paper uses an output-oriented Data Envelopment Analysis (DEA) measure of technical efficiency to assess the technical efficiencies of the Brazilian banking system. Four approaches to estimation are compared in order to assess the significance of factors affecting inefficiency. These are nonparametric Analysis of Covariance, maximum likelihood using a family of exponential distributions, maximum likelihood using a family of truncated normal distributions, and the normal Tobit model. The sole focus of the paper is on a combined measure of output, and the data analyzed refer to the year 2001. The factors of interest in the analysis, and likely to affect efficiency, are bank nature (multiple or commercial), bank type (credit, business, bursary, or retail), bank size (large, medium, small, or micro), bank control (private or public), bank origin (domestic or foreign), and non-performing loans. The latter is a measure of bank risk. All quantitative variables, including non-performing loans, are measured on a per-employee basis. The best fits to the data are provided by the exponential family and the nonparametric Analysis of Covariance. The significance of a factor, however, varies according to the model fitted, although it can be said that there is some agreement among the best models. A highly significant association in all models fitted is observed only for non-performing loans. The nonparametric Analysis of Covariance is more consistent with the inefficiency median responses observed for the qualitative factors. The findings of the analysis reinforce the significant association of the level of bank inefficiency, measured by DEA residuals, with the risk of bank failure.
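An output-oriented constant-returns (CCR) envelopment program of the general kind used here can be sketched as a linear program with scipy. The two-bank toy data are invented; phi >= 1, with technical efficiency given by 1/phi:

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_efficiency(X, Y, o):
    """Output-oriented CCR efficiency score phi for DMU o.

    X: (n, m) inputs, Y: (n, s) outputs, one row per DMU.
    Maximize phi subject to
        sum_j lam_j * x_j <= x_o   (inputs of the reference point)
        sum_j lam_j * y_j >= phi * y_o,   lam >= 0.
    Decision vector: [phi, lam_1, ..., lam_n].
    """
    n, m = X.shape
    s = Y.shape[1]
    c = np.zeros(n + 1)
    c[0] = -1.0                                   # linprog minimizes: max phi
    A_in = np.hstack([np.zeros((m, 1)), X.T])     # input constraints
    A_out = np.hstack([Y[o].reshape(-1, 1), -Y.T])  # phi*y_o - sum lam*y <= 0
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([X[o], np.zeros(s)]),
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]

# Toy data: two banks, equal input, DMU 0 produces twice the output.
X = np.array([[1.0], [1.0]])
Y = np.array([[2.0], [1.0]])
eff0 = dea_output_efficiency(X, Y, 0)  # on the frontier: phi = 1
eff1 = dea_output_efficiency(X, Y, 1)  # could double output: phi = 2
```

The paper's second stage then relates 1/phi (or the DEA residuals) to the bank-level factors listed above.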