976 results for latent binary variables
Abstract:
We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables and their values. The latent variables preserve the neighbourhood structure of the data in the sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dissimilar latent values. We formulate the supervised infinite latent variable problem based on an intuitive principle of pulling objects together if they are of the same type, and pushing them apart if they are not. We then combine this principle with a flexible Indian Buffet Process prior on the latent variables. We show that the inferred supervised latent variables can be used directly to perform nearest neighbour search for retrieval. We introduce a new application, dynamically extending hash codes, and show how to effectively couple the structure of the hash codes with the continuously growing structure of the neighbourhood-preserving infinite latent feature space.
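To illustrate only the prior mentioned above, the following Python sketch draws a binary latent feature matrix from the standard Indian Buffet Process generative process. It is not the authors' full supervised model, which also includes the pull/push likelihood and the hashing step. The names `sample_ibp` and `poisson` and the way `alpha` is passed are illustrative assumptions.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_ibp(num_objects, alpha, rng):
    """Draw a binary feature matrix Z from the Indian Buffet Process prior.

    Rows are objects ("customers"), columns are binary latent features
    ("dishes"); the number of columns is not fixed in advance.
    """
    counts = []  # counts[k] = number of objects that have feature k so far
    rows = []
    for i in range(1, num_objects + 1):
        row = []
        # Existing feature k is switched on with probability counts[k] / i.
        for k in range(len(counts)):
            on = rng.random() < counts[k] / i
            row.append(1 if on else 0)
            counts[k] += on
        # Object i also introduces Poisson(alpha / i) brand-new features.
        new = poisson(alpha / i, rng)
        row.extend([1] * new)
        counts.extend([1] * new)
        rows.append(row)
    # Pad earlier rows with zeros for features introduced later.
    width = len(counts)
    return [r + [0] * (width - len(r)) for r in rows]
```

For example, `sample_ibp(10, 2.0, random.Random(0))` returns a 10-row 0/1 matrix whose width varies from draw to draw. Under the IBP, the expected number of features is `alpha` times the harmonic number of the number of objects. This unbounded, data-driven width is what allows the number of binary latent variables to be inferred rather than fixed.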
Abstract:
Ecosystems consist of complex dynamic interactions among species and the environment, and understanding these interactions has implications for predicting the environmental response to changes in climate and biodiversity. However, with the recent adoption of more exploratory tools such as Bayesian networks in predictive ecology, few assumptions need to be made about the data, and complex, spatially varying interactions can be recovered from field data. In this study, we compare Bayesian network modelling approaches that account for latent effects in order to reveal species dynamics in seven geographically and temporally varied areas within the North Sea. We also apply structure learning techniques to identify functional relationships, such as prey–predator relationships between trophic groups of species, that vary across space and time. We examine whether a general hidden variable can reflect overall changes in the trophic dynamics of each spatial system, and whether including a specific hidden variable can model an unmeasured group of species. The general hidden variable appears to capture changes in the variance of the biomass of different species groups. Models that include both general and specific hidden variables identified similarities with the underlying food web dynamics and modelled the unmeasured spatial effect. We predict the biomass of the trophic groups and find that predictive accuracy varies with the models' features and across the different spatial areas; we therefore propose a model that allows for spatial autocorrelation and includes two hidden variables. Our proposed model produced novel insights into this ecosystem's dynamics and ecological interactions, mainly because it accounts for the heterogeneous nature of the driving factors within each area and their changes over time.
Our findings demonstrate that accounting for additional sources of variation, by combining structure learning from data with experts' knowledge in the model architecture, has the potential to yield deeper insights into the structure and stability of ecosystems. Finally, we discovered meaningful functional networks that were spatially and temporally differentiated, with the particular mechanisms ranging from trophic associations to interactions with climate and commercial fisheries.
Abstract:
In this paper we compare two ways of conceptualising single-factor technical and allocative efficiency: as indicators of a single latent variable, or as separate observed variables. In the former case, the impacts on both efficiency types are analysed by means of structural equation modeling (SEM); in the latter, by seemingly unrelated regression (SUR). We compare the estimation results of the two approaches using a dataset on single-factor irrigation water use efficiency obtained from a survey of 360 farmers in the Guanzhong Plain, China. The main methodological finding is that SEM allows identification of the most important dimension of irrigation water efficiency (technical efficiency) by comparing the factor scores and reliabilities of the two efficiency types. Moreover, SEM reduces multicollinearity and attenuation bias. It is thus preferable to SUR. The SEM estimates show that the perception of water scarcity is the most important positive determinant of both types of efficiency, followed by irrigation infrastructure, income and water price. Furthermore, there is a strong negative reverse effect of efficiency on perception.
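Reliability comparisons in SEM are often based on standardized factor loadings. The Python sketch below uses two widely used measures: indicator reliability, defined as the squared standardized loading, and composite reliability for a one-factor congeneric model. This is an assumption about which reliability measure is meant. The abstract does not state the formula used, and any loadings passed in are hypothetical.

```python
def indicator_reliability(loading):
    """Reliability of one standardized indicator: its squared loading."""
    return loading * loading


def composite_reliability(loadings):
    """Composite reliability of a one-factor (congeneric) measurement model.

    Assumes standardized indicators with uncorrelated errors, so each
    error variance equals 1 - loading**2.
    """
    total = sum(loadings)
    error = sum(1.0 - l * l for l in loadings)
    return total * total / (total * total + error)
```

With two indicators, such as technical and allocative efficiency loading on one latent efficiency factor, the indicator with the larger squared loading is the one the latent variable measures more reliably. This is the kind of comparison the abstract uses to single out technical efficiency.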
Abstract:
This paper assesses the empirical performance of an intertemporal option pricing model with latent variables that generalizes the Hull-White stochastic volatility formula. Using this generalized formula in an ad hoc fashion to extract two implicit parameters and forecast next-day S&P 500 option prices, we obtain pricing errors similar to those obtained with implied volatility alone, as in the Hull-White case. When we specialize this model to an equilibrium recursive utility model, we show through simulations that option prices are more informative than stock prices about the structural parameters of the model. We also show that a simple method of moments applied to a panel of option prices provides good estimates of the model parameters. This lays the groundwork for an empirical assessment of this equilibrium model with S&P 500 option prices in terms of pricing errors.
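For reference, the baseline Hull-White (1987) formula that the abstract generalizes prices a European call as the Black-Scholes price averaged over the distribution of mean variance over the option's life. This holds when volatility is independent of the stock. The Python sketch below estimates that expectation by Monte Carlo under the assumption that variance follows a driftless-noise geometric Brownian motion, `dV = mu*V dt + xi*V dW`. It does not implement the paper's latent-variable generalization. The function names and parameters are illustrative.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S, K, r, T, variance):
    """Black-Scholes call price; `variance` is the annualized sigma**2."""
    sd = math.sqrt(variance * T)
    d1 = (math.log(S / K) + r * T + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def hull_white_call(S, K, r, T, v0, mu, xi, n_paths=2000, n_steps=50, seed=0):
    """Monte Carlo Hull-White price: E[BS(average variance over [0, T])].

    Assumes variance V follows dV = mu*V dt + xi*V dW, independent of the
    stock; the average variance is approximated by a left Riemann sum.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        v, acc = v0, 0.0
        for _ in range(n_steps):
            acc += v * dt
            # Exact lognormal step for geometric Brownian motion.
            v *= math.exp((mu - 0.5 * xi * xi) * dt
                          + xi * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        total += black_scholes_call(S, K, r, T, acc / T)
    return total / n_paths
```

Setting `xi = 0` and `mu = 0` makes the variance constant, and the function collapses to the Black-Scholes price. This is a convenient sanity check before experimenting with stochastic variance.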
Abstract:
The simulations were implemented in Java.
Abstract:
We review several asymmetric links for binary regression models and present a unified approach for two skew-probit links proposed in the literature. Moreover, under the skew-probit link, we establish conditions for the existence of the ML estimators and of the posterior distribution under improper priors. The framework proposed here considers two sets of latent variables, which are helpful for implementing the Bayesian MCMC approach. A simulation study of model comparison criteria is conducted, and two applications are presented. Using different Bayesian criteria, we show that for these data sets the skew-probit links perform better than alternative links proposed in the literature.
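One common form of skew-probit link sets the success probability to the standard skew-normal CDF evaluated at the linear predictor. The skew-normal density is `2 * phi(t) * Phi(lam * t)`. The Python sketch below evaluates that CDF by Simpson's rule. The choice of this particular parameterization, the integration bounds, and the function names are assumptions, and sign conventions for `lam` differ between the two links the abstract unifies. With `lam = 0` the link reduces to the ordinary probit.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_normal_cdf(eta, lam, lower=-10.0, n=4000):
    """CDF of the standard skew-normal SN(0, 1, lam) at eta (Simpson's rule, n even)."""
    if eta <= lower:
        return 0.0

    def density(t):
        return 2.0 * norm_pdf(t) * norm_cdf(lam * t)

    h = (eta - lower) / n
    s = density(lower) + density(eta)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * density(lower + i * h)
    return s * h / 3.0


def skew_probit(x, beta, lam):
    """P(y = 1 | x) under the assumed skew-probit link F_SN(x . beta; lam)."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return skew_normal_cdf(eta, lam)
```

A useful check is the closed form at zero: `skew_normal_cdf(0.0, lam)` equals `0.5 - atan(lam) / pi`. A positive `lam` therefore lowers the success probability at `eta = 0` below the symmetric probit value of one half.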
Abstract:
Model diagnostics is an integral part of model determination, and an important part of model diagnostics is residual analysis. We adapt and implement residuals considered in the literature for the probit, logistic and skew-probit links in binary regression. New latent residuals for the skew-probit link are proposed here. Using the proposed residuals, we detected outliers for different models in a simulated dataset and in a real medical dataset.
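For the probit link, a latent residual is often defined through the latent variable `z = eta + eps`, with `eps ~ N(0, 1)` and `y = 1` exactly when `z > 0`. The residual is `eps = z - eta`, drawn from its truncated distribution given `y`. The Python sketch below draws one such residual for a fixed linear predictor `eta`, for example one posterior draw of the regression coefficients. It covers the probit case only, not the new skew-probit residuals the abstract proposes. The function name and the clamping constant are illustrative assumptions, and the clamp is only safe for `|eta|` below roughly 8.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)
_TINY = 1e-15               # keeps inv_cdf arguments strictly inside (0, 1)


def latent_residual(y, eta, rng):
    """Draw one latent residual eps = z - eta for a probit observation.

    The latent z ~ N(eta, 1) is truncated to z > 0 when y == 1 and to
    z <= 0 when y == 0; eps is drawn by inverse-CDF sampling on that range.
    """
    p0 = _STD_NORMAL.cdf(-eta)  # P(z <= 0) = P(eps <= -eta)
    if y == 1:
        u = p0 + (1.0 - p0) * rng.random()
    else:
        u = p0 * rng.random()
    u = min(max(u, _TINY), 1.0 - _TINY)
    return _STD_NORMAL.inv_cdf(u)
```

Repeating the draw over MCMC iterations gives a distribution of residuals for each observation. An observation whose residuals sit persistently far in the normal tails is a candidate outlier.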
Abstract:
The present work proposes a method based on CLV (Clustering around Latent Variables) for identifying groups of consumers in L-shape data. This kind of data structure is very common in consumer studies, where a panel of consumers is asked to assess the global liking of a number of products, and the preference scores are arranged in a two-way table Y. External information on both products (physico-chemical description or sensory attributes) and consumers (socio-demographic background, purchase behaviours or consumption habits) may be available in a row descriptor matrix X and in a column descriptor matrix Z, respectively. The aim of this method is to automatically provide a consumer segmentation in which all three matrices play an active role in the classification, yielding groups that are homogeneous from all points of view: preferences, products and consumer characteristics. The proposed clustering method is illustrated on data from preference studies on food products: juices based on berry fruits and traditional cheeses from Trentino. The hedonic ratings given by the consumer panel on the products under study were explained with respect to the products' chemical compounds and sensory evaluation, and the consumers' socio-demographic information, purchase behaviour and consumption habits.
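The core CLV idea is to group variables, here consumers' preference columns of Y, around K latent components. The partitioning criterion maximizes, summed over groups, the squared covariance between each variable and its group's latent component. Each component is taken as the first principal component of the group's centered variables. The Python sketch below is a minimal version of that basic partitioning loop. It does not include the L-shape extension with the external matrices X and Z. The initial partition, iteration limits, and function names are illustrative assumptions.

```python
import random


def _center(col):
    m = sum(col) / len(col)
    return [v - m for v in col]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _first_component(cols, rng, iters=200):
    """Unit-norm leading eigenvector of X X^T (X = columns in `cols`), by power iteration."""
    n = len(cols[0])
    c = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [_dot(col, c) for col in cols]  # X^T c
        c = [sum(wj * col[i] for wj, col in zip(w, cols)) for i in range(n)]  # X (X^T c)
        norm = _dot(c, c) ** 0.5
        if norm == 0.0:
            break
        c = [v / norm for v in c]
    return c


def clv_partition(columns, k, max_iter=20, seed=0):
    """Basic CLV partitioning of variables (columns) into k groups.

    Alternates between computing each group's latent component (first PC)
    and reassigning every variable to the component with the largest
    squared covariance, until the partition stops changing.
    """
    rng = random.Random(seed)
    cols = [_center(c) for c in columns]
    labels = [j % k for j in range(len(cols))]  # simple initial partition
    comps = [None] * k
    for _ in range(max_iter):
        for g in range(k):
            members = [cols[j] for j in range(len(cols)) if labels[j] == g]
            if members:
                comps[g] = _first_component(members, rng)
        new = [max((g for g in range(k) if comps[g] is not None),
                   key=lambda g: _dot(col, comps[g]) ** 2)
               for col in cols]
        if new == labels:
            break
        labels = new
    return labels
```

Because the criterion uses squared covariance, consumers with opposite preference patterns can fall into the same group. This corresponds to the "local" or directional variant of CLV, and the sign of each covariance can be inspected afterwards.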
Abstract:
Comparing latent constructs (measured by reflective and congeneric indicators) across cultures means studying how these unobserved variables vary, and covary with each other, after controlling for potentially confounding cultural forces. This leads to the so-called ‘measurement invariance’ issue, which refers to the extent to which data collected with the same multi-item measurement instrument (i.e., a self-report questionnaire whose items underlie common latent constructs) are comparable across different cultural environments. Indeed, it would be unthinkable to explore latent-variable heterogeneity across different populations without controlling for cultural bias in the underlying measures. Examples of such heterogeneity include latent means; latent variances, i.e., levels of deviation from the means; latent covariances, i.e., levels of shared variation around the respective means; and the magnitudes of structural path coefficients describing causal relations among latent variables. Furthermore, it would be unrealistic to make this correction without a framework able to account for all of these potential cultural biases across populations simultaneously, since the real world also ‘acts’ simultaneously. Consequently, as a researcher, I may want to control for cultural forces by hypothesizing that they all act at the same time across the comparison groups. I can then examine whether they inflate or suppress my new estimates by imposing hierarchically nested constraints on the originally estimated parameters. Multi-Sample Structural Equation Modeling-based Confirmatory Factor Analysis (MS-SEM-based CFA) remains a dominant and flexible statistical framework for addressing this potential cultural bias simultaneously.
In this dissertation I attempt to introduce new perspectives on measurement invariance, handled within the covariance-based SEM framework, through a consumer behavior modeling application on functional food choices.
Abstract:
Many seemingly disparate approaches to marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to those of the copula-based models proposed herein. These general copula models of underlying latent threshold random variables yield likelihood-based models for marginal fixed-effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and a set of model relationships that substantially clarify the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples is given to illustrate the concepts.
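The latent-threshold idea can be made concrete with a Gaussian copula, which is one instance of the general copula family described above. Set `Y_j = 1` exactly when a standard normal latent `Z_j` falls below `Phi^{-1}(p_j)`. Then `P(Y_j = 1) = p_j` for every latent correlation `rho`, so the marginal parameters keep their interpretation while `rho` governs the dependence. The Python sketch below computes the four cell probabilities for two outcomes. The choice of the Gaussian copula, the numerical integration settings, and the function names are assumptions, and the sketch requires `|rho| < 1`.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def bivariate_normal_cdf(a, b, rho, lower=-10.0, n=2000):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho.

    Uses Phi2(a, b; rho) = integral_{-inf}^{a} phi(x) Phi((b - rho*x) / sqrt(1 - rho^2)) dx,
    evaluated with Simpson's rule (n even); requires |rho| < 1.
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return STD.pdf(x) * STD.cdf((b - rho * x) / s)

    h = (a - lower) / n
    total = integrand(lower) + integrand(a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def joint_binary_probs(p1, p2, rho):
    """Cell probabilities for two binary outcomes linked by a Gaussian copula."""
    t1, t2 = STD.inv_cdf(p1), STD.inv_cdf(p2)  # latent thresholds
    p11 = bivariate_normal_cdf(t1, t2, rho)
    return {(1, 1): p11, (1, 0): p1 - p11, (0, 1): p2 - p11,
            (0, 0): 1.0 - p1 - p2 + p11}
```

For `p1 = p2 = 0.5`, the joint success probability has the closed form `1/4 + asin(rho) / (2*pi)`, which gives a quick accuracy check. Setting `rho = 0` recovers independence, `p11 = p1 * p2`.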
Abstract:
Integrated choice and latent variable (ICLV) models represent a promising new class of models that merge classic choice models with the structural equation approach (SEM) for latent variables. Despite their conceptual appeal, applications of ICLV models in marketing remain rare. We extend previous ICLV applications by, first, estimating a multinomial choice model and, second, estimating hierarchical relations between latent variables. An empirical study on travel mode choice clearly demonstrates the value of ICLV models in enhancing the understanding of choice processes. In addition to the directly observable variables usually studied, such as travel time, we show how abstract motivations such as power and hedonism, as well as attitudes such as a desire for flexibility, affect travel mode choice. Furthermore, we show that it is possible to estimate such a complex ICLV model with the widely available structural equation modeling package Mplus. This finding is likely to encourage more widespread application of this appealing model class in the marketing field.
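An ICLV model lets an unobserved attitude, explained by observed characteristics through a structural equation, enter the utilities of a choice model. Choice probabilities are then obtained by integrating over the latent variable. The Python sketch below shows this for a toy multinomial logit with one latent variable, integrated out by simulation. It omits the measurement equations that link the latent variable to questionnaire indicators; those are needed for estimation but not for this prediction step. All names, the normal error for the latent variable, and the linear specification are illustrative assumptions, not the specification estimated in Mplus.

```python
import math
import random


def mnl_probs(utilities):
    """Multinomial logit probabilities (softmax shifted by the max for stability)."""
    top = max(utilities)
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def iclv_choice_probs(x, beta, z, gamma, lam, sigma_eta=1.0, draws=2000, seed=0):
    """Simulated choice probabilities for a toy ICLV-style model.

    x[j]   : observed attributes of alternative j (e.g. travel time)
    z      : observed characteristics explaining the latent variable
    Structural equation: eta = gamma . z + omega,  omega ~ N(0, sigma_eta**2)
    Utility:             V_j = beta . x[j] + lam[j] * eta
    The latent eta is integrated out by averaging MNL probabilities over draws.
    """
    rng = random.Random(seed)
    eta_mean = sum(g * v for g, v in zip(gamma, z))
    avg = [0.0] * len(x)
    for _ in range(draws):
        eta = eta_mean + rng.gauss(0.0, sigma_eta)
        utils = [sum(b * a for b, a in zip(beta, attrs)) + l * eta
                 for attrs, l in zip(x, lam)]
        for j, p in enumerate(mnl_probs(utils)):
            avg[j] += p / draws
    return avg
```

When every entry of `lam` is zero, the latent variable drops out and the result equals a plain multinomial logit. A positive `lam[j]`, for instance a hypothetical "desire for flexibility" loading on the car alternative, raises that alternative's share as the latent attitude increases.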
Abstract:
The size of online image datasets is constantly increasing. For an image dataset with millions of images, retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encode high-dimensional descriptors into compact binary strings, have become very popular because of their efficiency in both search and storage. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying, semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries, and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm using the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases to which new images are frequently added. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in the binary codes through an imbalance penalty to obtain higher-quality binary codes. We learn hash functions with an efficient algorithm in which the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent, and the SVMs are trained in a parallelized, incremental manner. For modifications such as adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions.
Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to the same retrieval performance as hashing from scratch.
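The efficiency argument for hashing rests on the search step. Once every image is represented by a short binary code, the distance between two images is a bitwise XOR followed by a popcount. The Python sketch below shows only that Hamming-ranking step, assuming the codes have already been produced by some learned hash functions; it does not implement the hash-learning methods described above. The function names are illustrative.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into a Python int (first bit = most significant)."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes (XOR + popcount)."""
    return bin(a ^ b).count("1")


def hamming_rank(query, database, top_k=None):
    """Return database indices sorted by Hamming distance to the query (ties by index)."""
    order = sorted(range(len(database)),
                   key=lambda i: (hamming(query, database[i]), i))
    return order if top_k is None else order[:top_k]
```

For example, with the database codes `0000`, `1111`, `0001` and `1000` and the query `0000`, the ranking is `[0, 2, 3, 1]`. A 64-bit code per image fits in one machine word, which is where the storage savings come from.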
Abstract:
Understanding the complexities involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are also a number of other complicating factors. Genetic variants associated with age of disease onset may differ from those associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein-coding genetic paradigm. Latent variable models are well suited to the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included in a model for computational or analytic convenience. This thesis presents a mixture of methodological developments using latent variables, together with results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the aetiology of disease in the appendix, an organ often thought of as a useless vestige, remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. Using data from a large Australian twin study questionnaire, analysed as both a cohort and a case-control study, evidence is found for an association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age at onset and to account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression.
This method finds evidence that the genes responsible for variation in age at onset form a different subset from those associated with the overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation of the Bayesian multiple changepoint model on aligned DNA sequences to more than two species. This sensitive technique is applied to evaluate the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests that they are functionally significant and most likely consist of various functional classes. Furthermore, this thesis develops a method for incorporating alignment characteristics representative of function, such as GC content and type of mutation, into the segmentation model. Evidence of fine-structured segmental variation is presented.
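The changepoint idea behind the segmentation can be shown in its simplest form: a single changepoint on a binary sequence. Here each alignment site is coded, hypothetically, as 1 for a substitution and 0 otherwise. Each segment has its own rate with a Beta prior integrated out analytically. The Python sketch below computes the exact posterior over the changepoint position under a uniform prior. This is a deliberate simplification and assumption. The thesis generalizes a multiple changepoint model to more than two species, which this sketch does not attempt.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def segment_log_marginal(ones, n, a=1.0, b=1.0):
    """Log marginal likelihood of n Bernoulli sites with `ones` successes,
    integrating the segment rate over a Beta(a, b) prior."""
    return log_beta(a + ones, b + n - ones) - log_beta(a, b)


def changepoint_posterior(seq, a=1.0, b=1.0):
    """Posterior over a single changepoint tau in 1..len(seq)-1 (uniform prior);
    the two segments are seq[:tau] and seq[tau:]."""
    n = len(seq)
    prefix = [0]
    for s in seq:
        prefix.append(prefix[-1] + s)  # running count of 1s
    logs = []
    for tau in range(1, n):
        left = prefix[tau]
        right = prefix[n] - left
        logs.append(segment_log_marginal(left, tau, a, b)
                    + segment_log_marginal(right, n - tau, a, b))
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # normalize in a stable way
    total = sum(weights)
    return {tau: w / total for tau, w in zip(range(1, n), weights)}
```

Multiple-changepoint versions replace the single `tau` with a variable number of boundaries, typically explored by MCMC. Each segment then receives a rate class, which is the kind of structure summarized above as distinct evolutionary rate classes.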