965 results for Latent variable models
Abstract:
The problem of social diffusion has animated sociological thinking on topics ranging from the spread of an idea, an innovation or a disease, to the foundations of collective behavior and political polarization. While network diffusion has been a productive metaphor, the reality of diffusion processes is often muddier. Ideas and innovations diffuse differently from diseases, but, with a few exceptions, the diffusion of ideas and innovations has been modeled under the same assumptions as the diffusion of disease. In this dissertation, I develop two new diffusion models for "socially meaningful" contagions that address two of the most significant problems with current diffusion models: (1) that contagions can only spread along observed ties, and (2) that contagions do not change as they spread between people. I augment insights from these statistical and simulation models with an analysis of an empirical case of diffusion: the use of enterprise collaboration software in a large technology company. The empirical study focuses on when people abandon innovations, a crucial and understudied aspect of the diffusion of innovations. Using timestamped posts, I analyze in fine-grained detail when people abandon the software.
To address the first problem, I suggest a latent space diffusion model. Rather than treating ties as stable conduits for information, the latent space diffusion model treats ties as random draws from an underlying social space and simulates diffusion over that space. To address the second problem, I suggest a diffusion model with schemas. Rather than treating information as though it spreads unchanged, the schema diffusion model allows people to modify the information they receive to fit an underlying mental model of the information before they pass it to others. Theoretically, the social space model integrates actor ties and attributes simultaneously in a single social plane, while incorporating schemas into diffusion processes gives an explicit form to the reciprocal influences that cognition and the social environment have on each other. Practically, the latent space diffusion model produces statistically consistent diffusion estimates where using the network alone does not, and the diffusion-with-schemas model shows that introducing some cognitive processing into diffusion processes changes the rate and ultimate distribution of the spreading information. Combining the latent space models with a schema notion for actors improves our models of social diffusion both theoretically and practically.
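The core mechanism of such a latent space diffusion model can be illustrated with a minimal sketch (hypothetical parameters throughout; this is not the dissertation's actual model): actors are embedded in a low-dimensional social space, and a contagion spreads with a probability that decays in latent distance rather than along observed ties.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: n actors embedded in a 2D latent social space.
n = 100
positions = rng.normal(size=(n, 2))
dists = np.linalg.norm(positions[:, None] - positions[None, :], axis=2)

def adoption_prob(dist, scale=1.0):
    """Chance that the contagion passes between two actors at a given distance."""
    return np.exp(-dist / scale)

# Simple-contagion simulation over the latent space rather than observed ties.
infected = np.zeros(n, dtype=bool)
infected[0] = True  # seed adopter
for _ in range(10):  # fixed number of rounds
    exposure = adoption_prob(dists) * infected[None, :]
    # Probability of at least one successful exposure this round.
    p_adopt = 1 - np.prod(1 - exposure, axis=1)
    infected |= rng.random(n) < p_adopt

print(int(infected.sum()), "of", n, "actors adopted")
```

A schema variant would additionally transform the transmitted content at each hop toward the receiver's mental model before retransmission.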
The empirical case study focuses on how the changing value of an innovation, introduced by the innovation's network externalities, influences when people abandon it. I find that people are least likely to abandon an innovation when other people in their neighborhood currently use the software as well. The effect is particularly pronounced for supervisors' current use and for the number of supervisory team members who currently use the software. This case study not only points to an important process in the diffusion of innovations, but also suggests a new approach, computerized collaboration systems, to collecting and analyzing data on organizational processes.
Abstract:
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus, where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore, the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection, we must understand the antigenic variability within a virus serotype and across distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods use spike and slab priors to identify sites in the viral protein that are important for the neutralisation of the virus. We demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes, and show how the SABRE method outperforms established methods (mixed-effects models based on forward variable selection or l1 regularisation) on both synthetic and viral datasets. In addition, we test a number of different versions of the SABRE method, comparing conjugate and semi-conjugate prior specifications as well as an alternative to the spike and slab prior, the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over the established component-wise Gibbs sampler.
The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses about other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. We then provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further account of the structure of the FMDV and Influenza datasets through the latent variable model and improves the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies, and propose a new information criterion, the block integrated Widely Applicable Information Criterion (biWAIC), for selecting the random effects factors that should be included in the eSABRE method. We demonstrate that biWAIC performs comparably to two other methods for selecting the random effects factors, and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, the eSABRE method offers a computational improvement that allows it to be used on these datasets. The results show that the eSABRE method can be used in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as to predict a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.
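As a rough illustration of the spike-and-slab machinery underlying the SABRE methods, the sketch below computes the posterior inclusion probability for a single coefficient under a point-mass spike and a Gaussian slab. All values (the noise variance, slab variance, and prior inclusion probability) are illustrative assumptions, not SABRE's actual specification.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

# Toy spike-and-slab inclusion probability for a single predictor.
n, sigma2, tau2, prior_pi = 50, 1.0, 4.0, 0.5
x = rng.normal(size=n)
beta_true = 1.5
y = beta_true * x + rng.normal(scale=np.sqrt(sigma2), size=n)

# Log marginal likelihood under the spike (beta = 0) and the slab (beta ~ N(0, tau2)).
m_spike = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=sigma2 * np.eye(n))
m_slab = multivariate_normal.logpdf(
    y, mean=np.zeros(n), cov=sigma2 * np.eye(n) + tau2 * np.outer(x, x)
)

# Posterior probability that the predictor is relevant (included in the model).
log_odds = np.log(prior_pi) - np.log(1 - prior_pi) + m_slab - m_spike
p_include = 1 / (1 + np.exp(-log_odds))
print(f"posterior inclusion probability: {p_include:.3f}")
```

In the full hierarchical models, this per-site inclusion indicator is sampled jointly with the other parameters by MCMC rather than computed in closed form.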
Abstract:
Model misspecification affects the classical test statistics used to assess the fit of Item Response Theory (IRT) models. Robust tests, such as the Generalized Lagrange Multiplier and Generalized Hausman tests, have been derived under model misspecification, but their use has not been widely explored in the IRT framework. In the first part of the thesis, we introduce the Generalized Lagrange Multiplier test to detect differential item functioning in IRT models for binary data under model misspecification. By means of a simulation study and a real data analysis, we compare its performance with the classical Lagrange Multiplier test, computed using the Hessian and the cross-product matrix, and with the Generalized Jackknife Score test. The power of these tests is computed both empirically and asymptotically. The misspecifications considered are local dependence among items and a non-normal distribution of the latent variable. The results highlight that all tests perform well under mild model misspecification, while under strong model misspecification their performance deteriorates; none of the tests shows overall superior performance to the others. In the second part of the thesis, we extend the Generalized Hausman test to detect non-normality of the latent variable distribution. To build the test, we consider a semi-nonparametric IRT model, which assumes a more flexible latent variable distribution. By means of a simulation study and two real applications, we compare the performance of the Generalized Hausman test with the M2 limited-information goodness-of-fit test and the Likelihood-Ratio test; information criteria are also computed. The Generalized Hausman test outperforms the Likelihood-Ratio test in terms of Type I error rates and the M2 test in terms of power. The performance of the Generalized Hausman test and of the information criteria deteriorates when the sample size is small and there are few items.
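The logic of a Hausman-type test (comparing an estimator that is efficient under the null model with one that remains consistent under misspecification) can be sketched as follows, with entirely hypothetical estimates and covariance matrices standing in for fitted IRT parameters:

```python
import numpy as np
from scipy.stats import chi2

# Hausman-type statistic: a large discrepancy between the two estimators
# signals misspecification of the null model. All numbers are illustrative.
theta_eff = np.array([0.95, 1.48])   # efficient estimator (null model)
theta_rob = np.array([1.10, 1.52])   # robust estimator
v_eff = np.array([[0.010, 0.002], [0.002, 0.012]])
v_rob = np.array([[0.018, 0.003], [0.003, 0.020]])

d = theta_rob - theta_eff
# Under the null, d'(V_rob - V_eff)^{-1} d is asymptotically chi-squared.
h = d @ np.linalg.inv(v_rob - v_eff) @ d
p_value = chi2.sf(h, df=len(d))
print(f"H = {h:.2f}, p = {p_value:.3f}")
```

The generalized versions studied in the thesis replace the covariance matrices with sandwich-type estimates that stay valid under misspecification.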
Abstract:
Maturity models are adopted to reduce our perceived complexity of a truly complex phenomenon. In this sense, maturity models are tools that enable the assessment of the most relevant variables that impact the outputs of a specific system. Ideally, a maturity model should provide information concerning the qualitative and quantitative relationships between variables and how they affect the latent variable, that is, the maturity level. Management systems (MSs) are implemented worldwide and by an increasing number of companies. Integrated management systems (IMSs) consider the implementation of one or several MSs, usually coexisting with the quality management subsystem (QMS). This chapter reports a model, based on two components, that enables the assessment of IMS maturity, considering the key process agents (KPAs) identified through a systematic literature review and the results collected from two surveys.
Abstract:
The deficit in our country regarding the availability of quantitative indicators with which to carry out a short-term analysis of regional industrial activity has opened a debate centred on which methodology is most suitable for constructing indicators of this kind. Within this framework, this paper presents the main conclusions obtained in previous studies (Clar et al., 1997a, 1997b and 1998) on the suitability of extending the methodologies currently applied to the Spanish regions to construct indicators of industrial activity by indirect methods. These conclusions lead us to propose a strategy different from those currently in use. Specifically, following Israilevich and Kuttner (1993), we propose a latent variable model to estimate an indicator of regional industrial production. This type of model can be specified in state-space form and estimated with the Kalman filter. To validate the proposed methodology, indicators are estimated accordingly for three of the four Spanish regions that have an Industrial Production Index (IPI) compiled by the direct method (Andalusia, Asturias and the Basque Country) and compared with the published (official) IPIs. The results show the good performance of the proposed strategy, opening a line of work with which to remedy the aforementioned deficit.
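The state-space formulation can be sketched in miniature: a latent production index follows a random walk and is observed through noisy sector indicators, which the Kalman filter then recovers. The dynamics and noise variances below are illustrative assumptions, not the estimated Spanish regional models.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy state-space sketch: a latent production index z_t follows a random walk
# and is observed through two noisy sector indicators.
T = 120
q, r = 0.1, 0.5          # state and observation noise variances (assumed)
z = np.cumsum(rng.normal(scale=np.sqrt(q), size=T))          # latent index
y = z[:, None] + rng.normal(scale=np.sqrt(r), size=(T, 2))   # observed series

# Kalman filter for the scalar state, with sequential scalar updates
# for the two observations available each period.
m, p = 0.0, 1.0  # prior mean and variance of z_0
filtered = np.empty(T)
for t in range(T):
    p += q                       # prediction step
    for obs in y[t]:
        k = p / (p + r)          # Kalman gain
        m += k * (obs - m)       # measurement update of the mean
        p *= (1 - k)             # measurement update of the variance
    filtered[t] = m

rmse = np.sqrt(np.mean((filtered - z) ** 2))
print(f"RMSE of filtered index vs latent truth: {rmse:.3f}")
```

In the paper's setting, the filter output plays the role of the indirect IPI estimate that is then compared against the directly compiled official indices.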
Abstract:
The partial least squares technique (PLS) has been touted as a viable alternative to latent variable structural equation modeling (SEM) for evaluating theoretical models in the differential psychology domain. We bring some balance to the discussion by reviewing the broader methodological literature to highlight: (1) the misleading characterization of PLS as an SEM method; (2) limitations of PLS for global model testing; (3) problems in testing the significance of path coefficients; (4) extremely high false positive rates when using empirical confidence intervals in conjunction with a new "sign change correction" for path coefficients; (5) misconceptions surrounding the supposedly superior ability of PLS to handle small sample sizes and non-normality; and (6) conceptual and statistical problems with formative measurement and the application of PLS to such models. We also reanalyze the dataset provided by Willaby et al. (2015; doi:10.1016/j.paid.2014.09.008) to highlight the limitations of PLS. Our broader review and analysis of the available evidence make it clear that PLS is not useful for statistical estimation and testing.
Abstract:
In computational neuroscience, it has been hypothesised that the visual system, from the retina to at least the primary visual cortex, continually fits a probabilistic model with latent variables to its stream of perceptions. Neither the exact model nor the exact fitting method is known, but existing algorithms for fitting such models require conditional estimates of the latent variables. This can help us understand why the visual system might fit such a model: if the model is appropriate, these conditional estimates can also form an excellent representation for analysing the semantic content of perceived images. The work presented here uses image-classification performance (discrimination between common object types) as a basis for comparing models of the visual system and algorithms for fitting those models (viewed as probability densities) to images. This thesis (a) shows that models based on the complex cells of visual area V1 generalise better from labelled training examples than conventional neural networks, whose hidden units are more similar to V1 simple cells; (b) presents a new interpretation of complex-cell-based models of the visual system as probability distributions, along with new algorithms for fitting them to data; and (c) shows that these models form representations that are better for image classification after being trained as probability models.
Two additional technical innovations that made this work possible are also described: a random search algorithm for selecting hyper-parameters, and a compiler for matrix mathematical expressions that can optimise those expressions for both the central processing unit (CPU) and the graphics processing unit (GPU).
Abstract:
This longitudinal study examined whether personality traits (according to the five-factor "Big Five" model) in early adolescence (ages 12-13) predict internalising symptoms two years later (ages 14-15), controlling for the initial level of internalising symptoms and for the influence of several known risk factors. The data come from a prospective longitudinal study. The sample comprises 1036 adolescents from eight Quebec secondary schools, who completed a self-report questionnaire. Structural equation models first demonstrated the relevance of conceptualising internalising symptoms as a latent variable. Further models showed that certain personality traits do predict subsequent internalising symptoms. However, contrary to studies conducted with adults, the role of Emotional Stability and Extraversion is not significant once the influence of known risk factors and of sex is controlled. Rather, it is Control and Agreeableness that are significantly related to subsequent internalising symptoms in the present study. The results also underline the important role of risk factors related to peer relationships. Finally, multi-group structural equation models revealed significant sex differences in the predictive relationships. This study confirms that adolescents' personality traits can play a role in the development of internalising symptoms, which gives them theoretical and clinical relevance.
Abstract:
This longitudinal study aimed to assess whether adolescents' personality traits predict their subsequent antisocial behaviour, after controlling for the effect of the initial level of antisocial behaviour and of several known risk factors for such behaviour. The sample comprises 1036 adolescents from eight Quebec secondary schools. The adolescents were assessed twice, in Secondary 1 (ages 12-13) and Secondary 3 (ages 14-15), using a self-report questionnaire. Structural equation models first confirmed that the covariation among adolescents' various antisocial behaviours can be explained by a latent variable. The results confirmed that adolescents' personality traits at ages 12-13 predict their antisocial behaviour at ages 14-15. Consistent with previous studies, Extraversion, Control and Emotional Stability predict future antisocial behaviour. However, the effect of Agreeableness disappears once the initial level is controlled. Finally, multi-group structural equation models showed that some predictive relationships differ by sex. The findings underline the importance of personality traits for theories of antisocial behaviour and for clinical practice.
Abstract:
The multivariate skew-t distribution (J Multivar Anal 79:93-113, 2001; J R Stat Soc, Ser B 65:367-389, 2003; Statistics 37:359-363, 2003) includes the Student t, skew-Cauchy and Cauchy distributions as special cases, and the normal and skew-normal distributions as limiting cases. In this paper, we explore the use of Markov chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of repeated-measures, pretest/post-test data under a multivariate null-intercept measurement error model (J Biopharm Stat 13(4):763-771, 2003), where the random errors and the unobserved value of the covariate (latent variable) follow a Student t and a skew-t distribution, respectively. The results and methods are numerically illustrated with an example from the field of dentistry.
Abstract:
The skew-normal distribution is a class of distributions that includes the normal distribution as a special case. In this paper, we explore the use of Markov chain Monte Carlo (MCMC) methods to develop a Bayesian analysis in a multivariate, null-intercept, measurement error model [R. Aoki, H. Bolfarine, J.A. Achcar, and D. Leao Pinto Jr, Bayesian analysis of a multivariate null intercept errors-in-variables regression model, J. Biopharm. Stat. 13(4) (2003b), pp. 763-771], where the unobserved value of the covariate (latent variable) follows a skew-normal distribution. The results and methods are applied to a real dental clinical trial presented in [A. Hadgu and G. Koch, Application of generalized estimating equations to a dental randomized clinical trial, J. Biopharm. Stat. 9 (1999), pp. 161-178].
Abstract:
In this article, we discuss inferential aspects of measurement error regression models with null intercepts when the unknown quantity x (latent variable) follows a skew-normal distribution. We first examine the maximum-likelihood approach to estimation via the EM algorithm, exploring statistical properties of the model considered. Then, the marginal likelihood, the score function and the observed information matrix of the observed quantities are presented, allowing direct inference implementation. In order to discuss some diagnostic techniques for this type of model, we derive the appropriate matrices for assessing the local influence on the parameter estimates under different perturbation schemes. The results and methods developed in this paper are illustrated using part of a real data set from Hadgu and Koch [1999, Application of generalized estimating equations to a dental randomized clinical trial. Journal of Biopharmaceutical Statistics, 9, 161-178].
Abstract:
Influence diagnostics methods are extended in this article to the Grubbs model when the unknown quantity x (latent variable) follows a skew-normal distribution. Diagnostic measures are derived from the case-deletion approach and the local influence approach under several perturbation schemes. The observed information matrix for the postulated model and the Delta matrices for the corresponding perturbed models are derived. Results obtained for one real data set are reported, illustrating the usefulness of the proposed methodology.
A robust Bayesian approach to null intercept measurement error model with application to dental data
Abstract:
Measurement error models often arise in epidemiological and clinical research. Usually, in this setup it is assumed that the latent variable has a normal distribution. However, the normality assumption may not always be correct. The skew-normal/independent distributions are a class of asymmetric thick-tailed distributions that includes the skew-normal distribution as a special case. In this paper, we explore the use of skew-normal/independent distributions as a robust alternative within the null intercept measurement error model under a Bayesian paradigm. We assume that the random errors and the unobserved value of the covariate (latent variable) jointly follow a skew-normal/independent distribution, providing an appealing robust alternative to the routine use of the symmetric normal distribution in this type of model. Specific distributions examined include univariate and multivariate versions of the skew-normal, skew-t, skew-slash and skew-contaminated normal distributions. The methods developed are illustrated using a real data set from a dental clinical trial.
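A convenient fact behind Bayesian computation for these skew families is that the skew-normal admits a latent-variable stochastic representation involving a half-normal variable, which is what makes Gibbs-type MCMC tractable. A minimal sketch for the standard skew-normal SN(alpha), with a hypothetical shape parameter:

```python
import numpy as np

rng = np.random.default_rng(3)

def skew_normal(alpha, size, rng):
    """Draw from a standard skew-normal SN(alpha) via the representation
    Z = delta*|U| + sqrt(1 - delta**2)*V, with U, V iid N(0, 1).
    The latent half-normal |U| is the variable that Gibbs samplers augment."""
    delta = alpha / np.sqrt(1 + alpha**2)
    u = np.abs(rng.normal(size=size))   # latent half-normal variable
    v = rng.normal(size=size)
    return delta * u + np.sqrt(1 - delta**2) * v

samples = skew_normal(alpha=5.0, size=100_000, rng=rng)
# Theoretical mean of SN(alpha) is sqrt(2/pi) * delta.
delta = 5.0 / np.sqrt(26.0)
print(f"sample mean {samples.mean():.3f} vs theory {np.sqrt(2 / np.pi) * delta:.3f}")
```

The skew-normal/independent families extend this by further mixing over a positive scale variable, which yields the skew-t, skew-slash and skew-contaminated normal cases.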