918 results for Latent class model


Abstract:

Switzerland has a complex human immunodeficiency virus (HIV) epidemic involving several populations. We examined transmission of HIV type 1 (HIV-1) in a national cohort study. Latent class analysis was used to identify socioeconomic and behavioral groups among 6,027 patients enrolled in the Swiss HIV Cohort Study between 2000 and 2011. Phylogenetic analysis of sequence data, available for 4,013 patients, was used to identify transmission clusters. Concordance between sociobehavioral groups and transmission clusters was assessed in correlation and multiple correspondence analyses. A total of 2,696 patients were infected with subtype B, 203 with subtype C, 196 with subtype A, and 733 with recombinant subtypes (mainly CRF02_AG and CRF01_AE). Latent class analysis identified 8 patient groups. Most transmission clusters of subtype B were shared between groups of gay men (groups 1-3) or between the heterosexual groups "heterosexual people of lower socioeconomic position" (group 4) and "injection drug users" (group 8). Clusters linking homosexual and heterosexual groups were associated with "older heterosexual and gay people on welfare" (group 5). "Migrant women in heterosexual partnerships" (group 6) and "heterosexual migrants on welfare" (group 7) shared non-B clusters with groups 4 and 5. Combining approaches from social and molecular epidemiology can provide insights into HIV-1 transmission and inform the design of prevention strategies.

Abstract:

Demand for characterizing user access patterns with web mining techniques has grown, since the knowledge extracted from web server log files can benefit both web site structure improvement and the understanding of user navigational behavior. In this paper, we present a web usage mining method that uses web usage and page linkage information to capture user access patterns based on the Probabilistic Latent Semantic Analysis (PLSA) model. The expectation-maximization (EM) algorithm is applied to the integrated usage data to infer the latent semantic factors and to generate user session clusters that reveal user access patterns. Experiments were conducted on a real-world data set to validate the effectiveness of the proposed approach. The results show that the method can characterize the latent semantic factors and generate user profiles in the form of weighted page vectors, which may reflect the common access interests of users within the same session cluster.

Abstract:

Visualization has proven to be a powerful and widely applicable tool for the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines and to data in 36 dimensions derived from satellite images.

Abstract:

In machine learning, the Gaussian process latent variable model (GP-LVM) has been extensively applied to unsupervised dimensionality reduction. When some supervised information, e.g., pairwise constraints or labels of the data, is available, the traditional GP-LVM cannot directly use it to improve the performance of dimensionality reduction. In this case, it is necessary to modify the traditional GP-LVM so that it can handle supervised or semi-supervised learning tasks. For this purpose, we propose a new semi-supervised GP-LVM framework under pairwise constraints. By transferring the pairwise constraints from the observed space to the latent space, constrained prior information on the latent variables can be obtained. Under this constrained prior, the latent variables are optimized by the maximum a posteriori (MAP) algorithm. The effectiveness of the proposed algorithm is demonstrated with experiments on a variety of data sets. © 2010 Elsevier B.V.

Abstract:

Matrix factorization (MF) has emerged as one of the better practices for handling sparse data in the field of recommender systems. Funk singular value decomposition (Funk-SVD) is a variant of MF regarded as a state-of-the-art method that helped win the Netflix Prize competition. The method is widely used, with modifications, in present-day recommender systems research. As the number of data points can grow at very high velocity, it is prudent to devise newer methods that can handle such data accurately and more efficiently than Funk-SVD. In view of the growing data points, I propose a latent factor model that addresses both accuracy and efficiency by reducing the number of latent features of either users or items, making it less complex than Funk-SVD, in which users and items have the same, often larger, number of latent features. A comprehensive empirical evaluation on two publicly available datasets, Amazon and ml-100k, shows that the proposed methods achieve accuracy comparable to Funk-SVD with lower complexity.
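
For orientation, the Funk-SVD baseline mentioned in this abstract can be sketched as stochastic gradient descent on the observed ratings only. This is a minimal illustration, not the proposed reduced-feature model: the function names, the hyperparameters (`n_factors`, `lr`, `reg`, `epochs`), and the toy ratings are all assumptions chosen for the example.

```python
import random

def funk_svd(ratings, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
             epochs=500, seed=0):
    """Fit user factors P and item factors Q by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    # Small random initial factors; users and items share the same number of factors.
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5

# Hypothetical toy data: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 5.0), (2, 2, 2.0), (3, 0, 1.0), (3, 2, 5.0)]
P0, Q0 = funk_svd(ratings, n_users=4, n_items=3, epochs=0)  # untrained factors
P, Q = funk_svd(ratings, n_users=4, n_items=3)
rmse_before = rmse(ratings, P0, Q0)
rmse_after = rmse(ratings, P, Q)
```

In this sketch both factor matrices have `n_factors` columns, which is the symmetry the abstract says the proposed model relaxes.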

Abstract:

This paper evaluates the production activities of Japanese airports by using a finite mixture model that allows controlling for unobserved heterogeneity. In doing so, a stochastic frontier latent class model, which allows the existence of different technologies, is adopted to estimate production frontiers. This procedure not only enables the identification of different groups of Japanese airports but also permits the analysis of their production efficiency. The main result is that there are two groups of Japanese airports, both following completely different "technologies" to obtain passengers and cargo, suggesting that business strategies need to be adapted to the characteristics of the airports. Some managerial implications are developed.

Abstract:

The scale of environmental problems in China is clearly evident. This paper analyses foreign direct investment (FDI) in China with a finite mixture model, also known as a latent class model, to understand the relationship between FDI and several pollutants. The model regresses FDI on covariates that include pollutants. The results reveal that FDI is affected by pollutants. In some cases, reducing pollution deters foreign investment in China.

Abstract:

This paper addresses the gap between the economic theory underlying the multidimensional concept of food security and observed data by deriving a composite food security index using the latent class model. The link between poverty and food security is then examined using the new food security index, and the robustness of the link is compared with two unidimensional measures often used in the literature. Using Vietnam as a case study, it was found that a weak link exists for the rural but not for the urban composite food security index. The unidimensional measures, on the other hand, show a strong link in both the rural and urban regions. The results on the link are also different and mixed when two poverty types, persistent and transient poverty, are considered. These findings have important policy implications for a targeted approach to addressing food security.

Abstract:

Unlike previous studies' findings for Western and developed economies, income is a significant determinant of multidimensional deprivation in Vietnam. This first study on a developing country also incorporates food security in a latent class framework to compute a new multidimensional deprivation index. It was found that chronic poverty, and not transient poverty, has a detrimental effect on multidimensional deprivation; current poverty alleviation programs should therefore potentially be tailored to these poverty types to combat multidimensional deprivation effectively. The finding that 20% of the non-poor are most deprived, with 85% of this group living in urban Vietnam, also points to the need for a new form of targeted policy.

Abstract:

This paper uses discrete choice models, supported by GIS data, to analyse the National Land Use Database, a register of more than 21,000 English brownfields - previously used sites with or without contamination that are currently unused or underused. Using spatial discrete choice models, including the first application of a spatial probit latent class model with class-specific neighbourhood effects, we find evidence of large local differences in the determinants of brownfields redevelopment in England and that the reuse decisions of adjacent sites affect the reuse of a site. We also find that sites with a history of industrial activities, large sites, and sites that are located in the poorest and bleakest areas of cities and regions of England are more difficult to redevelop. In particular, we find that the probability of reusing a brownfield increases by up to 8.5% for a site privately owned compared to a site publicly owned and between 15% - 30% if a site is located in London compared to the North West of England. We suggest that local tailored policies are more suitable than regional or national policies to boost the reuse of brownfield sites.

Abstract:

Eight hundred thirty-one dairy herds in 5 US states were enrolled in a prospective cohort study. A generalized estimating equations model was used to study the association between clinical signs and the detection of Salmonella in the feces of animals suspected of clinical salmonellosis. The sensitivity and specificity of bacteriological culture were estimated using a latent class model. Eighteen percent of the 874 samples from calves and 29% of the 1,479 samples from adult cows were positive for Salmonella spp. No clear association could be established between the various clinical signs observed and the detection of Salmonella. The 2 most frequently isolated serotypes were Typhimurium and Newport. The probability of detecting Salmonella was higher in calves in which another enteropathogen was also detected. The proportion of positive samples was higher among cows that had received antibiotics in the days preceding sampling. The sensitivity of culture was estimated at 0.48 (95% credible interval [95% CrI]: 0.22-0.95) for calves and 0.78 (95% CrI: 0.55-0.99) for cows. The specificity of culture was 0.94 (95% CrI: 0.87-1.00) for calves and 0.96 (95% CrI: 0.90-1.00) for cows. Despite its imperfect sensitivity, bacteriological culture remains useful for obtaining a better estimate of the post-test probability of clinical salmonellosis in a dairy bovine than the probability estimated from clinical examination alone.
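
The role of sensitivity and specificity in the post-test probability can be illustrated with Bayes' rule. The sketch below uses the point estimates reported for adult cows (sensitivity 0.78, specificity 0.96); the 30% pre-test probability and the function name are hypothetical, chosen only for illustration, and the credible intervals are ignored.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Probability of disease after a positive or negative test result (Bayes' rule)."""
    if positive:
        true_pos = sensitivity * pre_test
        false_pos = (1.0 - specificity) * (1.0 - pre_test)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * pre_test
    true_neg = specificity * (1.0 - pre_test)
    return false_neg / (false_neg + true_neg)

# Point estimates for adult cows from the abstract; pre-test value is an assumption.
p_pos = post_test_probability(0.30, 0.78, 0.96, positive=True)
p_neg = post_test_probability(0.30, 0.78, 0.96, positive=False)
```

Under these assumed inputs, a positive culture raises the probability well above the pre-test value and a negative culture lowers it, which is the sense in which an imperfect test can still refine a clinical estimate.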

Abstract:

Subclinical mastitis is a frequent and costly health problem. Intramammary infections (IMI) are often detected using somatic cell count (SCC) measurements. Bacteriological culture of milk is, however, required to identify the causative pathogen. Because of this difficulty, virtually all research on subclinical mastitis has focused on IMI prevalence, and risk factors for IMI incidence or elimination are poorly known. The main objective of this thesis was to identify modifiable risk factors associated with the incidence, elimination, and prevalence of important IMI in Canadian dairy herds. First, a systematic review of the literature on associations between on-farm practices and SCC was carried out. Management practices consistently associated with SCC were identified and distinguished from those reported only anecdotally. Next, a bilingual questionnaire was developed, validated, and used to measure the management practices of a sample of 90 Canadian dairy herds. To validate the tool, the repeatability and validity of the questionnaire items were analyzed, and the equivalence of the English and French versions was evaluated. These analyses identified problematic items that had to be recategorized, where possible, or excluded from subsequent analyses to ensure adequate data quality. Most of the herds studied already used post-milking teat disinfection and blanket dry cow therapy, but many of the recommended practices were seldom used. Then, modifiable risk factors associated with the incidence, elimination, and prevalence of Staphylococcus aureus IMI were investigated longitudinally in the 90 selected herds.
IMI incidence appeared to be a more important determinant of herd IMI prevalence than IMI elimination. Wearing gloves during milking, pre-milking teat disinfection, and adequate teat-end condition showed desirable associations with the various IMI measures. These results underline the importance of milking procedures for achieving a long-term reduction in IMI prevalence. Finally, modifiable risk factors associated with the incidence, elimination, and prevalence of coagulase-negative staphylococci (CNS) IMI were studied in a similar manner. However, to account for the limitations of bacteriological milk culture in identifying IMI caused by this group of pathogens, a semi-Bayesian approach using latent class models was used. The unadjusted estimates of incidence, elimination, prevalence, and associations with exposures all appeared considerably biased by the imperfections of the diagnostic procedure. This bias was generally toward the null. Once again, IMI incidence was the main determinant of herd IMI prevalence. Sand and wood-product bedding, as well as access to pasture, were associated with lower CNS incidence and prevalence.

Abstract:

OBJECTIVE: The frequent occurrence of inconclusive serology in blood banks and the absence of a gold standard test for Chagas disease led us to examine the efficacy of the blood culture test and five commercial tests (ELISA, IIF, HAI, c-ELISA, rec-ELISA) used in screening blood donors for Chagas disease, as well as to investigate the prevalence of Trypanosoma cruzi infection among donors with inconclusive serology screening with respect to some epidemiological variables. METHODS: To obtain the estimates of interest, we considered a Bayesian latent class model with covariates included through a logit link. RESULTS: Better performance was observed for some categories of the epidemiological variables. In addition, all pairs of tests (excluding the blood culture test) proved to be good alternatives both for screening (sensitivity > 99.96% in parallel testing) and for confirmation (specificity > 99.93% in serial testing) of Chagas disease. The prevalence of 13.30% observed in the stratum of donors with inconclusive serology means that most of these donors are probably non-reactive. In addition, depending on the level of specific epidemiological variables, the absence of infection can be predicted with a probability of 100% in this group from the pairs of tests using parallel testing. CONCLUSION: The epidemiological variables can lead to improved test results and thus assist in the clarification of inconclusive serology screening results. Moreover, all combinations of pairs using the five commercial tests are good alternatives to confirm results.
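
The distinction between parallel and serial testing can be made concrete with the standard combination formulas for two tests. The sketch below assumes the two tests are conditionally independent given infection status, which the abstract does not state, and the per-test sensitivities and specificities are hypothetical values rather than the study's estimates.

```python
def parallel_testing(se1, sp1, se2, sp2):
    """Positive if either test is positive: sensitivity rises, specificity falls."""
    sensitivity = 1.0 - (1.0 - se1) * (1.0 - se2)
    specificity = sp1 * sp2
    return sensitivity, specificity

def serial_testing(se1, sp1, se2, sp2):
    """Positive only if both tests are positive: specificity rises, sensitivity falls."""
    sensitivity = se1 * se2
    specificity = 1.0 - (1.0 - sp1) * (1.0 - sp2)
    return sensitivity, specificity

# Hypothetical single-test characteristics (assumed, not taken from the study).
se_par, sp_par = parallel_testing(0.98, 0.97, 0.99, 0.98)
se_ser, sp_ser = serial_testing(0.98, 0.97, 0.99, 0.98)
```

This is why parallel testing suits screening, where missed infections are costly, and serial testing suits confirmation, where false positives are costly.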

Abstract:

There are different ways to do cluster analysis of categorical data in the literature, and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economic constraints. The main approaches to clustering are usually divided into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, based on it, allocate units to the closest group. In clustering, one may be interested in the classification of similar objects into groups, and one may be interested in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? To answer these questions, two approaches, namely a latent class model (mixture of multinomial distributions) and partitioning around medoids, are evaluated and compared by the Adjusted Rand Index, Average Silhouette Width, and Pearson-Gamma indexes in a fairly wide simulation study. Simulation outcomes are plotted in two-dimensional graphs via Multidimensional Scaling; the size of points is proportional to the number of points that overlap, and different colours are used according to cluster membership.
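
The model-based side of this comparison, a latent class model viewed as a mixture of independent categorical items, can be sketched with a short EM loop. This is a minimal illustration under stated assumptions, not the study's implementation: the function name, the random initialization, the tiny smoothing constant, and the toy binary data are all hypothetical.

```python
import math
import random

def fit_latent_class(data, n_classes, n_categories, n_iter=100, seed=0):
    """EM for a latent class model; data is a list of tuples of category indices."""
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # theta[k][j][c]: probability that item j takes category c within class k.
    theta = []
    for _ in range(n_classes):
        rows = []
        for _ in range(n_items):
            raw = [rng.random() + 0.5 for _ in range(n_categories)]
            total = sum(raw)
            rows.append([v / total for v in raw])
        theta.append(rows)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities for each observation.
        resp = []
        for x in data:
            joint = [weights[k] * math.prod(theta[k][j][x[j]] for j in range(n_items))
                     for k in range(n_classes)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step: update class weights and item-response probabilities.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            for j in range(n_items):
                for c in range(n_categories):
                    num = sum(r[k] for r, x in zip(resp, data) if x[j] == c)
                    # Small smoothing constant (an assumption) avoids zero probabilities.
                    theta[k][j][c] = (num + 1e-6) / (nk + 1e-6 * n_categories)
    return weights, theta, resp

# Hypothetical toy data: three binary items, two loosely separated response patterns.
data = [(0, 0, 0)] * 8 + [(0, 0, 1)] * 2 + [(1, 1, 1)] * 8 + [(1, 0, 1)] * 2
weights, theta, resp = fit_latent_class(data, n_classes=2, n_categories=2)
# Hard cluster labels: the class with the highest posterior probability.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

The hard labels obtained this way could then be compared with a distance-based partition using an index such as the Adjusted Rand Index, as the abstract describes.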