968 resultados para Information Matrix
Resumo:
Sequential Monte Carlo (SMC) methods are popular computational tools for Bayesian inference in non-linear non-Gaussian state-space models. For this class of models, we propose SMC algorithms to compute the score vector and observed information matrix recursively in time. We propose two different SMC implementations, one with computational complexity $\mathcal{O}(N)$ and the other with complexity $\mathcal{O}(N^{2})$ where $N$ is the number of importance sampling draws. Although cheaper, the performance of the $\mathcal{O}(N)$ method degrades quickly in time as it inherently relies on the SMC approximation of a sequence of probability distributions whose dimension is increasing linearly with time. In particular, even under strong \textit{mixing} assumptions, the variance of the estimates computed with the $\mathcal{O}(N)$ method increases at least quadratically in time. The $\mathcal{O}(N^{2})$ is a non-standard SMC implementation that does not suffer from this rapid degrade. We then show how both methods can be used to perform batch and recursive parameter estimation.
Resumo:
2000 Mathematics Subject Classification: 33C90, 62E99.
Resumo:
Natural gradient learning is an efficient and principled method for improving on-line learning. In practical applications there will be an increased cost required in estimating and inverting the Fisher information matrix. We propose to use the matrix momentum algorithm in order to carry out efficient inversion and study the efficacy of a single step estimation of the Fisher information matrix. We analyse the proposed algorithm in a two-layer network, using a statistical mechanics framework which allows us to describe analytically the learning dynamics, and compare performance with true natural gradient learning and standard gradient descent.
Resumo:
In this paper, we consider the inference for the component and system lifetime distribution of a k-unit parallel system with independent components based on system data. The components are assumed to have identical Weibull distribution. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose -expectation tolerance interval and -content -level tolerance interval for the life distribution of the system. Performance of the estimators and tolerance intervals is investigated via simulation study. A simulated dataset is analyzed for illustration.
Resumo:
In this article, we extend the earlier work of Freeland and McCabe [Journal of time Series Analysis (2004) Vol. 25, pp. 701–722] and develop a general framework for maximum likelihood (ML) analysis of higher-order integer-valued autoregressive processes. Our exposition includes the case where the innovation sequence has a Poisson distribution and the thinning is binomial. A recursive representation of the transition probability of the model is proposed. Based on this transition probability, we derive expressions for the score function and the Fisher information matrix, which form the basis for ML estimation and inference. Similar to the results in Freeland and McCabe (2004), we show that the score function and the Fisher information matrix can be neatly represented as conditional expectations. Using the INAR(2) speci?cation with binomial thinning and Poisson innovations, we examine both the asymptotic e?ciency and ?nite sample properties of the ML estimator in relation to the widely used conditional least
squares (CLS) and Yule–Walker (YW) estimators. We conclude that, if the Poisson assumption can be justi?ed, there are substantial gains to be had from using ML especially when the thinning parameters are large.
Resumo:
In this study, we introduce an original distance definition for graphs, called the Markov-inverse-F measure (MiF). This measure enables the integration of classical graph theory indices with new knowledge pertaining to structural feature extraction from semantic networks. MiF improves the conventional Jaccard and/or Simpson indices, and reconciles both the geodesic information (random walk) and co-occurrence adjustment (degree balance and distribution). We measure the effectiveness of graph-based coefficients through the application of linguistic graph information for a neural activity recorded during conceptual processing in the human brain. Specifically, the MiF distance is computed between each of the nouns used in a previous neural experiment and each of the in-between words in a subgraph derived from the Edinburgh Word Association Thesaurus of English. From the MiF-based information matrix, a machine learning model can accurately obtain a scalar parameter that specifies the degree to which each voxel in (the MRI image of) the brain is activated by each word or each principal component of the intermediate semantic features. Furthermore, correlating the voxel information with the MiF-based principal components, a new computational neurolinguistics model with a network connectivity paradigm is created. This allows two dimensions of context space to be incorporated with both semantic and neural distributional representations.
Resumo:
L’apprentissage supervisé de réseaux hiérarchiques à grande échelle connaît présentement un succès fulgurant. Malgré cette effervescence, l’apprentissage non-supervisé représente toujours, selon plusieurs chercheurs, un élément clé de l’Intelligence Artificielle, où les agents doivent apprendre à partir d’un nombre potentiellement limité de données. Cette thèse s’inscrit dans cette pensée et aborde divers sujets de recherche liés au problème d’estimation de densité par l’entremise des machines de Boltzmann (BM), modèles graphiques probabilistes au coeur de l’apprentissage profond. Nos contributions touchent les domaines de l’échantillonnage, l’estimation de fonctions de partition, l’optimisation ainsi que l’apprentissage de représentations invariantes. Cette thèse débute par l’exposition d’un nouvel algorithme d'échantillonnage adaptatif, qui ajuste (de fa ̧con automatique) la température des chaînes de Markov sous simulation, afin de maintenir une vitesse de convergence élevée tout au long de l’apprentissage. Lorsqu’utilisé dans le contexte de l’apprentissage par maximum de vraisemblance stochastique (SML), notre algorithme engendre une robustesse accrue face à la sélection du taux d’apprentissage, ainsi qu’une meilleure vitesse de convergence. Nos résultats sont présent ́es dans le domaine des BMs, mais la méthode est générale et applicable à l’apprentissage de tout modèle probabiliste exploitant l’échantillonnage par chaînes de Markov. Tandis que le gradient du maximum de vraisemblance peut-être approximé par échantillonnage, l’évaluation de la log-vraisemblance nécessite un estimé de la fonction de partition. Contrairement aux approches traditionnelles qui considèrent un modèle donné comme une boîte noire, nous proposons plutôt d’exploiter la dynamique de l’apprentissage en estimant les changements successifs de log-partition encourus à chaque mise à jour des paramètres. Le problème d’estimation est reformulé comme un problème d’inférence similaire au filtre de Kalman, mais sur un graphe bi-dimensionnel, où les dimensions correspondent aux axes du temps et au paramètre de température. Sur le thème de l’optimisation, nous présentons également un algorithme permettant d’appliquer, de manière efficace, le gradient naturel à des machines de Boltzmann comportant des milliers d’unités. Jusqu’à présent, son adoption était limitée par son haut coût computationel ainsi que sa demande en mémoire. Notre algorithme, Metric-Free Natural Gradient (MFNG), permet d’éviter le calcul explicite de la matrice d’information de Fisher (et son inverse) en exploitant un solveur linéaire combiné à un produit matrice-vecteur efficace. L’algorithme est prometteur: en terme du nombre d’évaluations de fonctions, MFNG converge plus rapidement que SML. Son implémentation demeure malheureusement inefficace en temps de calcul. Ces travaux explorent également les mécanismes sous-jacents à l’apprentissage de représentations invariantes. À cette fin, nous utilisons la famille de machines de Boltzmann restreintes “spike & slab” (ssRBM), que nous modifions afin de pouvoir modéliser des distributions binaires et parcimonieuses. Les variables latentes binaires de la ssRBM peuvent être rendues invariantes à un sous-espace vectoriel, en associant à chacune d’elles, un vecteur de variables latentes continues (dénommées “slabs”). Ceci se traduit par une invariance accrue au niveau de la représentation et un meilleur taux de classification lorsque peu de données étiquetées sont disponibles. Nous terminons cette thèse sur un sujet ambitieux: l’apprentissage de représentations pouvant séparer les facteurs de variations présents dans le signal d’entrée. Nous proposons une solution à base de ssRBM bilinéaire (avec deux groupes de facteurs latents) et formulons le problème comme l’un de “pooling” dans des sous-espaces vectoriels complémentaires.
Resumo:
Considering the Wald, score, and likelihood ratio asymptotic test statistics, we analyze a multivariate null intercept errors-in-variables regression model, where the explanatory and the response variables are subject to measurement errors, and a possible structure of dependency between the measurements taken within the same individual are incorporated, representing a longitudinal structure. This model was proposed by Aoki et al. (2003b) and analyzed under the bayesian approach. In this article, considering the classical approach, we analyze asymptotic test statistics and present a simulation study to compare the behavior of the three test statistics for different sample sizes, parameter values and nominal levels of the test. Also, closed form expressions for the score function and the Fisher information matrix are presented. We consider two real numerical illustrations, the odontological data set from Hadgu and Koch (1999), and a quality control data set.
Resumo:
In this article, we discuss inferential aspects of the measurement error regression models with null intercepts when the unknown quantity x (latent variable) follows a skew normal distribution. We examine first the maximum-likelihood approach to estimation via the EM algorithm by exploring statistical properties of the model considered. Then, the marginal likelihood, the score function and the observed information matrix of the observed quantities are presented allowing direct inference implementation. In order to discuss some diagnostics techniques in this type of models, we derive the appropriate matrices to assessing the local influence on the parameter estimates under different perturbation schemes. The results and methods developed in this paper are illustrated considering part of a real data set used by Hadgu and Koch [1999, Application of generalized estimating equations to a dental randomized clinical trial. Journal of Biopharmaceutical Statistics, 9, 161-178].
Resumo:
This paper considers an extension to the skew-normal model through the inclusion of an additional parameter which can lead to both uni- and bi-modal distributions. The paper presents various basic properties of this family of distributions and provides a stochastic representation which is useful for obtaining theoretical properties and to simulate from the distribution. Moreover, the singularity of the Fisher information matrix is investigated and maximum likelihood estimation for a random sample with no covariates is considered. The main motivation is thus to avoid using mixtures in fitting bimodal data as these are well known to be complicated to deal with, particularly because of identifiability problems. Data-based illustrations show that such model can be useful. Copyright (C) 2009 John Wiley & Sons, Ltd.
Resumo:
In this paper we introduce a new extension for the Birnbaum-Saunder distribution based on the family of the epsilon-skew-symmetric distributions studied in Arellano-Valle et al. (J Stat Plan Inference 128(2):427-443, 2005). The extension allows generating Birnbaun-Saunders type distributions able to deal with extreme or outlying observations (Dupuis and Mills, IEEE Trans Reliab 47:88-95, 1998). Basic properties such as moments and Fisher information matrix are also studied. Results of a real data application are reported illustrating good fitting properties of the proposed model.
Resumo:
In this article, we study further properties of a skew normal distribution, called the skew-normal-Cauchy (SNC) distribution by Nadarajah and Kotz (2003). A stochastic representation is obtained which allows alternative derivations for moments, moments generating function, and skewness and kurtosis coefficients. Issues related to singularity of the Fisher information matrix are investigated.
Resumo:
In this article, we study some results related to a specific class of distributions, called skew-curved-symmetric family of distributions that depends on a parameter controlling the skewness and kurtosis at the same time. Special elements of this family which are studied include symmetric and well-known asymmetric distributions. General results are given for the score function and the observed information matrix. It is shown that the observed information matrix is always singular for some special cases. We illustrate the flexibility of this class of distributions with an application to a real dataset on characteristics of Australian athletes.
Resumo:
In this article, we present the EM-algorithm for performing maximum likelihood estimation of an asymmetric linear calibration model with the assumption of skew-normally distributed error. A simulation study is conducted for evaluating the performance of the calibration estimator with interpolation and extrapolation situations. As one application in a real data set, we fitted the model studied in a dimensional measurement method used for calculating the testicular volume through a caliper and its calibration by using ultrasonography as the standard method. By applying this methodology, we do not need to transform the variables to have symmetrical errors. Another interesting aspect of the approach is that the developed transformation to make the information matrix nonsingular, when the skewness parameter is near zero, leaves the parameter of interest unchanged. Model fitting is implemented and the best choice between the usual calibration model and the model proposed in this article was evaluated by developing the Akaike information criterion, Schwarz`s Bayesian information criterion and Hannan-Quinn criterion.