971 resultados para Likelihood Functions
Resumo:
This dissertation proposes statistical methods to formulate, estimate and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation. The first method solves an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed and their effectiveness is evaluated with respect to unordered models. Procedure to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and difficulties associated with the estimation of unordered models explained. Numerical approximation methods based on the Genz algortithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey. The second part of this work offers a comprehensive statistical analysis of free-flow speed distribution; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles in its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability. A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted. Eventually, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random effects models for free-flow speed estimation that takes into account the survey design. All methods are rigorously validated on both real and simulated data.
Resumo:
Background: Calluna vulgaris is one of the most important landscaping plants produced in Germany. Its enormous economic success is due to the prolonged flower attractiveness of mutants in flower morphology, the so-called bud-bloomers. In this study, we present the first genetic linkage map of C. vulgaris in which we mapped a locus of the economically highly desired trait " flower type" .Results: The map was constructed in JoinMap 4.1. using 535 AFLP markers from a single mapping population. A large fraction (40%) of markers showed distorted segregation. To test the effect of segregation distortion on linkage estimation, these markers were sorted regarding their segregation ratio and added in groups to the data set. The plausibility of group formation was evaluated by comparison of the " two-way pseudo-testcross" and the " integrated" mapping approach. Furthermore, regression mapping was compared to the multipoint-likelihood algorithm. The majority of maps constructed by different combinations of these methods consisted of eight linkage groups corresponding to the chromosome number of C. vulgaris.Conclusions: All maps confirmed the independent inheritance of the most important horticultural traits " flower type" , " flower colour" , and " leaf colour". An AFLP marker for the most important breeding target " flower type" was identified. The presented genetic map of C. vulgaris can now serve as a basis for further molecular marker selection and map-based cloning of the candidate gene encoding the unique flower architecture of C. vulgaris bud-bloomers. © 2013 Behrend et al.; licensee BioMed Central Ltd.
Resumo:
Matrix function approximation is a current focus of worldwide interest and finds application in a variety of areas of applied mathematics and statistics. In this thesis we focus on the approximation of A^(-α/2)b, where A ∈ ℝ^(n×n) is a large, sparse symmetric positive definite matrix and b ∈ ℝ^n is a vector. In particular, we will focus on matrix function techniques for sampling from Gaussian Markov random fields in applied statistics and the solution of fractional-in-space partial differential equations. Gaussian Markov random fields (GMRFs) are multivariate normal random variables characterised by a sparse precision (inverse covariance) matrix. GMRFs are popular models in computational spatial statistics as the sparse structure can be exploited, typically through the use of the sparse Cholesky decomposition, to construct fast sampling methods. It is well known, however, that for sufficiently large problems, iterative methods for solving linear systems outperform direct methods. Fractional-in-space partial differential equations arise in models of processes undergoing anomalous diffusion. Unfortunately, as the fractional Laplacian is a non-local operator, numerical methods based on the direct discretisation of these equations typically requires the solution of dense linear systems, which is impractical for fine discretisations. In this thesis, novel applications of Krylov subspace approximations to matrix functions for both of these problems are investigated. Matrix functions arise when sampling from a GMRF by noting that the Cholesky decomposition A = LL^T is, essentially, a `square root' of the precision matrix A. Therefore, we can replace the usual sampling method, which forms x = L^(-T)z, with x = A^(-1/2)z, where z is a vector of independent and identically distributed standard normal random variables. Similarly, the matrix transfer technique can be used to build solutions to the fractional Poisson equation of the form ϕn = A^(-α/2)b, where A is the finite difference approximation to the Laplacian. Hence both applications require the approximation of f(A)b, where f(t) = t^(-α/2) and A is sparse. In this thesis we will compare the Lanczos approximation, the shift-and-invert Lanczos approximation, the extended Krylov subspace method, rational approximations and the restarted Lanczos approximation for approximating matrix functions of this form. A number of new and novel results are presented in this thesis. Firstly, we prove the convergence of the matrix transfer technique for the solution of the fractional Poisson equation and we give conditions by which the finite difference discretisation can be replaced by other methods for discretising the Laplacian. We then investigate a number of methods for approximating matrix functions of the form A^(-α/2)b and investigate stopping criteria for these methods. In particular, we derive a new method for restarting the Lanczos approximation to f(A)b. We then apply these techniques to the problem of sampling from a GMRF and construct a full suite of methods for sampling conditioned on linear constraints and approximating the likelihood. Finally, we consider the problem of sampling from a generalised Matern random field, which combines our techniques for solving fractional-in-space partial differential equations with our method for sampling from GMRFs.
Resumo:
A quasi-maximum likelihood procedure for estimating the parameters of multi-dimensional diffusions is developed in which the transitional density is a multivariate Gaussian density with first and second moments approximating the true moments of the unknown density. For affine drift and diffusion functions, the moments are exactly those of the true transitional density and for nonlinear drift and diffusion functions the approximation is extremely good and is as effective as alternative methods based on likelihood approximations. The estimation procedure generalises to models with latent factors. A conditioning procedure is developed that allows parameter estimation in the absence of proxies.
Resumo:
The method of generalized estimating equations (GEE) is a popular tool for analysing longitudinal (panel) data. Often, the covariates collected are time-dependent in nature, for example, age, relapse status, monthly income. When using GEE to analyse longitudinal data with time-dependent covariates, crucial assumptions about the covariates are necessary for valid inferences to be drawn. When those assumptions do not hold or cannot be verified, Pepe and Anderson (1994, Communications in Statistics, Simulations and Computation 23, 939–951) advocated using an independence working correlation assumption in the GEE model as a robust approach. However, using GEE with the independence correlation assumption may lead to significant efficiency loss (Fitzmaurice, 1995, Biometrics 51, 309–317). In this article, we propose a method that extracts additional information from the estimating equations that are excluded by the independence assumption. The method always includes the estimating equations under the independence assumption and the contribution from the remaining estimating equations is weighted according to the likelihood of each equation being a consistent estimating equation and the information it carries. We apply the method to a longitudinal study of the health of a group of Filipino children.
Resumo:
Robust methods are useful in making reliable statistical inferences when there are small deviations from the model assumptions. The widely used method of the generalized estimating equations can be "robustified" by replacing the standardized residuals with the M-residuals. If the Pearson residuals are assumed to be unbiased from zero, parameter estimators from the robust approach are asymptotically biased when error distributions are not symmetric. We propose a distribution-free method for correcting this bias. Our extensive numerical studies show that the proposed method can reduce the bias substantially. Examples are given for illustration.
Resumo:
This article presents frequentist inference of accelerated life test data of series systems with independent log-normal component lifetimes. The means of the component log-lifetimes are assumed to depend on the stress variables through a linear stress translation function that can accommodate the standard stress translation functions in the literature. An expectation-maximization algorithm is developed to obtain the maximum likelihood estimates of model parameters. The maximum likelihood estimates are then further refined by bootstrap, which is also used to infer about the component and system reliability metrics at usage stresses. The developed methodology is illustrated by analyzing a real as well as a simulated dataset. A simulation study is also carried out to judge the effectiveness of the bootstrap. It is found that in this model, application of bootstrap results in significant improvement over the simple maximum likelihood estimates.
Resumo:
We present a growth analysis model that combines large amounts of environmental data with limited amounts of biological data and apply it to Corbicula japonica. The model uses the maximum-likelihood method with the Akaike information criterion, which provides an objective criterion for model selection. An adequate distribution for describing a single cohort is selected from available probability density functions, which are expressed by location and scale parameters. Daily relative increase rates of the location parameter are expressed by a multivariate logistic function with environmental factors for each day and categorical variables indicating animal ages as independent variables. Daily relative increase rates of the scale parameter are expressed by an equation describing the relationship with the daily relative increase rate of the location parameter. Corbicula japonica grows to a modal shell length of 0.7 mm during the first year in Lake Abashiri. Compared with the attain-able maximum size of about 30 mm, the growth of juveniles is extremely slow because their growth is less susceptible to environmental factors until the second winter. The extremely slow growth in Lake Abashiri could be a geographical genetic variation within C. japonica.
Resumo:
In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that the evaluation is usually computationally intensive and badly scaled with respect to the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We prove how the proposed identities, that follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state of the art approximations that rely on sparse kernel matrices.
Resumo:
Affiliation: Claudia Kleinman, Nicolas Rodrigue & Hervé Philippe : Département de biochimie, Faculté de médecine, Université de Montréal
Resumo:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notations of statistics, such as Information or Likelihood, can be identified in the algebraical structure of A2(P) and their corresponding notions in compositional data analysis, such as Aitchison distance or centered log ratio transform. In this way very elaborated aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. E.g. combination of statistical information such as Bayesian updating, combination of likelihood and robust M-estimation functions are simple additions/ perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood based statistics for general exponential families turns out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite dimensional linear subspaces of A2(P) and they correspond to finite dimensional subspaces formed by their posterior in the dual information space A2(Pprior). The Aitchison norm can identified with mean Fisher information. The closing constant itself is identified with a generalization of the cummulant function and shown to be Kullback Leiblers directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback Leibler information and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P) valued random variables, such as estimation functions or likelihoods, give a further interpretation of Fisher information as the expected squared norm of evidence and a scale free understanding of unbiased reasoning
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
The vestibular system contributes to the control of posture and eye movements and is also involved in various cognitive functions including spatial navigation and memory. These functions are subtended by projections to a vestibular cortex, whose exact location in the human brain is still a matter of debate (Lopez and Blanke, 2011). The vestibular cortex can be defined as the network of all cortical areas receiving inputs from the vestibular system, including areas where vestibular signals influence the processing of other sensory (e.g. somatosensory and visual) and motor signals. Previous neuroimaging studies used caloric vestibular stimulation (CVS), galvanic vestibular stimulation (GVS), and auditory stimulation (clicks and short-tone bursts) to activate the vestibular receptors and localize the vestibular cortex. However, these three methods differ regarding the receptors stimulated (otoliths, semicircular canals) and the concurrent activation of the tactile, thermal, nociceptive and auditory systems. To evaluate the convergence between these methods and provide a statistical analysis of the localization of the human vestibular cortex, we performed an activation likelihood estimation (ALE) meta-analysis of neuroimaging studies using CVS, GVS, and auditory stimuli. We analyzed a total of 352 activation foci reported in 16 studies carried out in a total of 192 healthy participants. The results reveal that the main regions activated by CVS, GVS, or auditory stimuli were located in the Sylvian fissure, insula, retroinsular cortex, fronto-parietal operculum, superior temporal gyrus, and cingulate cortex. Conjunction analysis indicated that regions showing convergence between two stimulation methods were located in the median (short gyrus III) and posterior (long gyrus IV) insula, parietal operculum and retroinsular cortex (Ri). The only area of convergence between all three methods of stimulation was located in Ri. The data indicate that Ri, parietal operculum and posterior insula are vestibular regions where afferents converge from otoliths and semicircular canals, and may thus be involved in the processing of signals informing about body rotations, translations and tilts. Results from the meta-analysis are in agreement with electrophysiological recordings in monkeys showing main vestibular projections in the transitional zone between Ri, the insular granular field (Ig), and SII.
Resumo:
Frequencies of meiotic configurations in cytogenetic stocks are dependent on chiasma frequencies in segments defined by centromeres, breakpoints, and telomeres. The expectation maximization algorithm is proposed as a general method to perform maximum likelihood estimations of the chiasma frequencies in the intervals between such locations. The estimates can be translated via mapping functions into genetic maps of cytogenetic landmarks. One set of observational data was analyzed to exemplify application of these methods, results of which were largely concordant with other comparable data. The method was also tested by Monte Carlo simulation of frequencies of meiotic configurations from a monotelodisomic translocation heterozygote, assuming six different sample sizes. The estimate averages were always close to the values given initially to the parameters. The maximum likelihood estimation procedures can be extended readily to other kinds of cytogenetic stocks and allow the pooling of diverse cytogenetic data to collectively estimate lengths of segments, arms, and chromosomes.
Resumo:
2000 Mathematics Subject Classification: 62J12, 62F35