990 results for TRAIT MODELS
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
An interactive hierarchical Generative Topographic Mapping (HGTM) [HGTM] has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in three directions: (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in [Kabanpami]. (2) We give the user a choice of initializing the child plots of the current plot in either interactive or automatic mode. In the interactive mode the user interactively selects "regions of interest" as in [HGTM], whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.
Abstract:
Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects "regions of interest," whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets. © 2005 IEEE.
Abstract:
Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables, the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem, we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
A single-generation dataset consisting of 1,730 records from a selection program for high growth rate in giant freshwater prawn (GFP, Macrobrachium rosenbergii) was used to derive prediction equations for meat weight and meat yield. Models were based on body traits [body weight, total length and abdominal width (AW)] and carcass measurements (tail weight and exoskeleton-off weight). Lengths and width were adjusted for the systematic effects of selection line, male morphotypes and female reproductive status, and for the covariables of age at slaughter within sex and body weight. Body and meat weights adjusted for the same effects (except body weight) were used to calculate meat yield (expressed as percentage of tail weight/body weight and exoskeleton-off weight/body weight). The edible meat weight and yield in this GFP population ranged from 12 to 15 g and 37 to 45 %, respectively. The simple (Pearson) correlation coefficients between body traits (body weight, total length and AW) and meat weight were moderate to very high and positive (0.75–0.94), but the correlations between body traits and meat yield were negative (−0.47 to −0.74). There were strong linear positive relationships between measurements of body traits and meat weight, whereas relationships of body traits with meat yield were moderate and negative. Step-wise multiple regression analysis showed that the best model to predict meat weight included all body traits, with a coefficient of determination (R²) of 0.99 and a correlation between observed and predicted values of meat weight of 0.99. The corresponding figures for meat yield were 0.91 and 0.95, respectively. Body weight or length was the best predictor of meat weight, explaining 91–94 % of observed variance when it was fitted alone in the model. By contrast, tail width explained a lower proportion (69–82 %) of total variance in the single trait models.
It is concluded that in practical breeding programs, improvement of meat weight can be easily made through indirect selection for body trait combinations. The improvement of meat yield, albeit being more difficult, is possible by genetic means, with 91 % of the variation in the trait explained by the body and carcass traits examined in this study.
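As a rough illustration of the quantities named in the abstract above, the following sketch computes meat yield as a percentage of body weight and fits a single-predictor least-squares line with its coefficient of determination. The function names and the sample numbers are hypothetical and are not taken from the study, which used step-wise multiple regression on adjusted traits.

```python
def meat_yield_percent(meat_weight_g, body_weight_g):
    """Meat yield expressed as a percentage of body weight."""
    return 100.0 * meat_weight_g / body_weight_g


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r2), where r2 is the coefficient of determination.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return intercept, slope, r2


# Hypothetical body weights (g) and meat weights (g), for illustration only.
body = [30.0, 32.0, 35.0, 38.0, 40.0]
meat = [12.1, 12.9, 13.8, 14.6, 15.2]

intercept, slope, r2 = fit_line(body, meat)
print(f"meat = {intercept:.2f} + {slope:.3f} * body, R^2 = {r2:.3f}")
print(f"yield of first animal: {meat_yield_percent(meat[0], body[0]):.1f} %")
```

A single predictor is used here only to keep the sketch short; extending it to several body traits would need a multiple-regression routine.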
Abstract:
In this paper, we present different "frailty" models to analyze longitudinal data in the presence of covariates. These models incorporate the extra-Poisson variability and the possible correlation among the repeated count data for each individual. Using a data set of CD4 counts in HIV-infected patients, we develop a hierarchical Bayesian analysis considering the different proposed models and using Markov Chain Monte Carlo methods. We also discuss some Bayesian discrimination aspects for the choice of the best model.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Graduate Program in Animal Science - FCAV
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Graduate Program in Genetics and Animal Breeding - FCAV
Abstract:
This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation was the lack of viable approaches to analysing High Throughput Screening datasets, which may include thousands of high-dimensional data points. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads which can be optimised and further developed into candidate drugs. Since the development of new robotic technologies, the ability to test the activities of compounds has considerably increased in recent years. Traditional methods, which rely on tables and graphical plots to analyse relationships between measured activities and the structure of compounds, are not feasible for a large HTS dataset. Instead, data visualisation provides a method for analysing such large datasets, especially those with high dimensions. So far, a few visualisation techniques for drug design have been developed, but most of them can cope with only a few properties of compounds at a time. We believe that a latent variable model (LTM) with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model can deal with either continuous data or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, we can picture the distribution of the data from magnification factor and curvature plots. Rather than obtaining the useful information from a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with an LTM at the top level, which is broken down into clusters at deeper levels of the hierarchy. In this manner, the refined visualisation plots can be displayed at deeper levels and sub-clusters may be found.
The hierarchy of LTMs is trained using the expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively in a recursive, top-down fashion. The user subjectively identifies interesting regions on the visualisation plot that they would like to model in greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-steps. Another problem that can occur when visualising a large data set is that there may be significant overlaps of data clusters, which makes it very difficult for the user to judge where the centres of regions of interest should be put. We address this problem by employing the minimum message length technique, which can help the user to decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models in the field of document data mining.
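The alternation between the E- and M-steps mentioned above can be sketched on a much simpler model than the thesis uses. The example below runs EM on a one-dimensional Gaussian mixture with fixed unit variances; it is a stand-in chosen for brevity, not the hierarchical LTM training procedure, and the function name, data and starting values are all hypothetical.

```python
import math


def em_gaussian_mixture_1d(data, init_means, n_iter=50):
    """EM for a 1-D Gaussian mixture with unit variances (illustrative only).

    Returns the fitted mixing weights and component means.
    """
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * math.exp(-0.5 * (x - m) ** 2)
                    for w, m in zip(weights, means)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights and means from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
    return weights, means


# Hypothetical data: two well-separated groups of points.
points = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
weights, means = em_gaussian_mixture_1d(points, init_means=[-1.0, 1.0])
print("weights:", [round(w, 3) for w in weights])
print("means:", [round(m, 3) for m in means])
```

Each pass first assigns soft memberships with the parameters held fixed, then updates the parameters with the memberships held fixed, which is the same two-step pattern the abstract describes for each stage of the hierarchy.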
Abstract:
Detailed knowledge of genetic diversity among germplasm is important for hybrid maize (Zea mays L.) breeding. The objective of the study was to determine genetic diversity in widely grown hybrids in Southern Africa and to compare the effectiveness of phenotypic analysis models for determining genetic distances between hybrids. Fifty hybrids were evaluated at one site with two replicates. The experiment was a randomized complete block design. Phenotypic and genotypic data were analyzed using SAS and Power Marker, respectively. There was significant (p < 0.01) variation and diversity among hybrid brands but little within brand clusters. Polymorphic Information Content (PIC) ranged from 0.07 to 0.38 with an average of 0.34, and genetic distance ranged from 0.08 to 0.50 with an average of 0.43. SAH23 and SAH21 (0.48) and SAH33 and SAH3 (0.47) were the most distantly related hybrids. Both single nucleotide polymorphism (SNP) markers and phenotypic data models were effective for discriminating genotypes according to genetic distance. SNP markers revealed nine clusters of hybrids. The 12-trait phenotypic analysis model revealed eight clusters at 85%, while the five-trait model revealed six clusters. Path analysis revealed significant direct and indirect effects of secondary traits on yield. Plant height and ear height were negatively correlated with grain yield, meaning that shorter hybrids gave higher yields. Ear weight, days to anthesis, and number of ears had the highest positive direct effects on yield. These traits can provide a good selection index for high-yielding maize hybrids. The results confirmed that the diversity of hybrids is small within brands and that phenotypic trait models are effective for discriminating hybrids.
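The abstract does not state how PIC was computed. As a sketch, assuming the commonly used definition of Botstein et al. (PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², with p_i the allele frequencies at a marker), the calculation looks like this; the function name and the example frequencies are hypothetical.

```python
def polymorphic_information_content(allele_freqs):
    """PIC for one marker from its allele frequencies (assumed to sum to 1)."""
    n = len(allele_freqs)
    # Expected heterozygosity: 1 - sum of squared allele frequencies.
    heterozygosity = 1.0 - sum(p * p for p in allele_freqs)
    # Correction for pairs of alleles: sum over i < j of 2 * p_i^2 * p_j^2.
    correction = sum(
        2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return heterozygosity - correction


# Hypothetical biallelic SNP frequencies, for illustration only.
for freqs in ([0.5, 0.5], [0.9, 0.1]):
    print(freqs, round(polymorphic_information_content(freqs), 3))
```

Under this definition a biallelic marker cannot exceed 0.375, reached when both alleles have frequency 0.5, which is consistent with the upper end of the range reported above.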