935 results for Latent variables
Abstract:
The reliability of measurement refers to unsystematic error in observed responses. Investigations of the prevalence of random error in stated estimates of willingness to pay (WTP) are important to an understanding of why tests of validity in contingent valuation (CV) can fail. However, published reliability studies have tended to adopt empirical methods that have practical and conceptual limitations when applied to WTP responses. This contention is supported in a review of contingent valuation reliability studies that demonstrate important limitations of existing approaches to WTP reliability. It is argued that empirical assessments of the reliability of contingent values may be better dealt with by using multiple indicators to measure the latent WTP distribution. This latent variable approach is demonstrated with data obtained from a WTP study for stormwater pollution abatement. Attitude variables were employed as a way of assessing the reliability of open-ended WTP (with benchmarked payment cards) for stormwater pollution abatement. The results indicated that participants' decisions to pay were reliably measured, but not the magnitude of the WTP bids. This finding highlights the need to better discern what is actually being measured in WTP studies. (C) 2003 Elsevier B.V. All rights reserved.
Abstract:
There is currently considerable interest in developing general non-linear density models based on latent, or hidden, variables. Such models have the ability to discover the presence of a relatively small number of underlying 'causes' which, acting in combination, give rise to the apparent complexity of the observed data set. Unfortunately, training such models generally requires large computational effort. In this paper we introduce a novel latent variable algorithm which retains the general non-linear capabilities of previous models but which uses a training procedure based on the EM algorithm. We demonstrate the performance of the model on a toy problem and on data from flow diagnostics for a multi-phase oil pipeline.
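As a generic illustration of EM training for a latent variable model, the sketch below fits a two-component one-dimensional Gaussian mixture, where the hidden variable is the component that generated each point. It is not the algorithm introduced in the paper; the function name `em_two_gaussians`, the initialisation, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    # Crude initialisation (an assumption): start the means at the data extremes.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component per point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor keeps the variance positive
            weights[k] = nk / len(data)
    return weights, means, variances


# Synthetic data (made up for the example): two well-separated clusters.
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```

Each iteration alternates between inferring the hidden assignments (E-step) and updating the parameters given those assignments (M-step), which is the structure that EM-based training procedures share.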
Abstract:
Researchers often develop and test conceptual models containing formative variables. In many cases, these formative variables are specified as being endogenous. This article provides a clarification of formative variable theory, distinguishing between the formative latent variable and the formative composite variable. When an endogenous latent variable relies on formative indicators for measurement, empirical studies can say nothing about the relationship between exogenous variables and the endogenous formative latent variable: conclusions can only be drawn regarding the exogenous variables' relationships with a composite variable. The authors also show the dangers associated with developing theory about antecedents to endogenous formative variables at the (aggregate) formative latent variable level. Modeling relationships with endogenous formative variables at the (disaggregate) indicator level informs richer theory development, and encourages more precise empirical testing. When antecedents' relationships with endogenous formative variables are modeled at the formative latent variable level rather than the formative indicator level, theory construction can verge on the superficial, and empirical findings can be ambiguous in substantive meaning.
Abstract:
Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for the analysis of biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Efficient statistical approaches are therefore crucial in this big data era.
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent multivariate vector with a Gaussian distribution. With this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are assumed to be represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economics risk data, and estimation of population structure from genotype data.
Abstract:
Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
Abstract:
In Senegal, diarrheal diseases remain a substantial burden that still weighs heavily on children's health. These diseases are influenced by a wide range of factors belonging to different levels and spheres of analysis. This article analyzes these risk factors and their relative role in childhood diarrheal diseases in Dakar. In doing so, it illustrates a new approach to synthesizing the network of these determinants. A latent class analysis (LCA) is first conducted, and the latent variables thus constructed are then used as explanatory variables in a three-level logistic regression. The results confirm that the determinants of childhood diarrhea belong to all three levels of analysis and that behavioral factors and neighborhood sanitation play a predominant role. The results also illustrate the usefulness of LCA for synthesizing several indicators in order to build an integrated causal picture while using parsimonious statistical models.
Abstract:
Information on climate variations is essential for research on many subjects, such as the performance of buildings and agricultural production. However, recorded meteorological data are often incomplete. There may be a limited number of locations recorded, while the number of recorded climatic variables and the time intervals can also be inadequate. Therefore, the hourly data of key weather parameters required by many building simulation programmes are typically not readily available. To overcome this gap in measured information, several empirical methods and weather data generators have been developed. They generally employ statistical analysis techniques to model the variations of individual climatic variables, while the possible interactions between different weather parameters are largely ignored. Based on a statistical analysis of 10 years of historical hourly climatic data for all capital cities in Australia, this paper reports strong correlations between several specific weather variables. There are strong linear correlations between the hourly variations of global solar irradiation (GSI) and dry bulb temperature (DBT), and between the hourly variations of DBT and relative humidity (RH). With an increase in GSI, DBT generally increases, while RH tends to decrease. However, no such clear correlation can be found between DBT and atmospheric pressure (P), or between DBT and wind speed. These findings will be useful for research and practice in building performance simulation.
Abstract:
An estimation of costs for maintenance and rehabilitation is subject to variation due to the uncertainties of input parameters. This paper presents the results of an analysis to identify input parameters that affect the prediction of variation in road deterioration. Road data obtained from 1688 km of a national highway located in the tropical northeast of Queensland in Australia were used in the analysis. Data were analysed using a probability-based method, the Monte Carlo simulation technique and HDM-4’s roughness prediction model. The results of the analysis indicated that among the input parameters the variability of pavement strength, rut depth, annual equivalent axle load and initial roughness affected the variability of the predicted roughness. The second part of the paper presents an analysis to assess the variation in cost estimates due to the variability of the overall identified critical input parameters.
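The idea of using Monte Carlo simulation to see which uncertain inputs drive the variability of a prediction can be sketched as follows. This is a minimal illustration under loud assumptions: `toy_roughness` is a made-up stand-in and not HDM-4's roughness prediction model, and the input means and standard deviations in `INPUTS` are invented for the example rather than taken from the Queensland data.

```python
import random
import statistics


def toy_roughness(initial_roughness, strength, axle_load):
    """Hypothetical stand-in for a roughness model (NOT HDM-4's equations)."""
    return initial_roughness + 0.5 * axle_load / strength


# Assumed (mean, standard deviation) for each input; values are illustrative only.
INPUTS = {
    "initial_roughness": (2.0, 0.3),
    "strength": (4.0, 0.5),
    "axle_load": (1.0, 0.2),
}


def output_sd(vary, n=5000, seed=0):
    """Standard deviation of the prediction when only the inputs in `vary` are random.

    Inputs not listed in `vary` are held fixed at their mean.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        sample = {
            name: (rng.gauss(mean, sd) if name in vary else mean)
            for name, (mean, sd) in INPUTS.items()
        }
        outputs.append(toy_roughness(**sample))
    return statistics.stdev(outputs)


# One-at-a-time comparison: how much output spread each input produces on its own.
for name in INPUTS:
    print(name, output_sd({name}))
print("all inputs", output_sd(set(INPUTS)))
```

Comparing the spread produced by each input alone with the spread produced by all inputs together gives a rough ranking of which inputs matter most; a full analysis would also need to account for interactions and correlations between inputs.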
Abstract:
Aim – To develop and assess the predictive capabilities of a statistical model that relates routinely collected Trauma Injury Severity Score (TRISS) variables to length of hospital stay (LOS) in survivors of traumatic injury. Method – Retrospective cohort study of adults who sustained a serious traumatic injury, and who survived until discharge from Auckland City, Middlemore, Waikato, or North Shore Hospitals between 2002 and 2006. Cubic-root transformed LOS was analysed using two-level mixed-effects regression models. Results – 1498 eligible patients were identified, 1446 (97%) injured from a blunt mechanism and 52 (3%) from a penetrating mechanism. For blunt mechanism trauma, 1096 (76%) were male, average age was 37 years (range: 15-94 years), and LOS and TRISS score information was available for 1362 patients. Spearman’s correlation and the median absolute prediction error between LOS and the original TRISS model were ρ=0.31 and 10.8 days, respectively, and between LOS and the final multivariable two-level mixed-effects regression model were ρ=0.38 and 6.0 days, respectively. Insufficient data were available for the analysis of penetrating mechanism models. Conclusions – Neither the original TRISS model nor the refined model has sufficient ability to accurately or reliably predict LOS. Additional predictor variables for LOS and other indicators for morbidity need to be considered.
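The two evaluation measures named in the abstract, Spearman's correlation and the median absolute prediction error, can be sketched as below for predictions made on the cube-root scale. The helper names and the five observed and predicted values are made up purely for illustration; they are not data from the study, and the study's mixed-effects model is not reproduced here.

```python
import statistics


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def median_abs_error(observed, predicted):
    """Median of the absolute differences between observed and predicted values."""
    return statistics.median(abs(o - p) for o, p in zip(observed, predicted))


# Illustrative values only (not from the study).
observed_los_days = [3, 10, 7, 21, 5]
predicted_cube_root = [1.5, 2.0, 2.1, 2.5, 1.6]

# Back-transform predictions from the cube-root scale to days before comparing.
predicted_los_days = [p ** 3 for p in predicted_cube_root]

print(spearman(observed_los_days, predicted_los_days))
print(median_abs_error(observed_los_days, predicted_los_days))
```

Because Spearman's rho depends only on ranks, it is unchanged by the monotonic cube-root transformation, whereas the prediction error is only interpretable in days after back-transforming.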