27 results for Latent Dirichlet Allocation

in Chinese Academy of Sciences Institutional Repositories Grid Portal


Relevance: 100.00%

Abstract:

Latent Dirichlet Allocation (LDA) is a document-level language model. LDA generally employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by LDA's own inference procedure. In this way, the arbitrariness of choosing priors for the latent variables of a multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora.
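The abstract names Laplacian and Jelinek-Mercer smoothing without giving formulas. As a rough illustration only, the sketch below applies the textbook forms of both to a topic-word count matrix. It does not reproduce the paper's data-driven procedure, which routes smoothing mass through LDA inference; the function names, the `alpha` and `lam` parameters, and the `background` distribution estimated from separate smoothing data are all assumptions for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def laplace_smooth(topic_word_counts: Sequence[Sequence[float]], alpha: float = 1.0) -> Matrix:
    """Additive (Laplacian) smoothing: add `alpha` pseudo-counts to every word
    of every topic, then renormalise each topic row into a distribution."""
    smoothed = []
    for row in topic_word_counts:
        total = sum(row) + alpha * len(row)
        smoothed.append([(count + alpha) / total for count in row])
    return smoothed


def background_distribution(word_counts: Sequence[float]) -> List[float]:
    """Normalise word counts from (hypothetical) smoothing data into p_bg(w)."""
    total = sum(word_counts)
    if total <= 0:
        raise ValueError("smoothing data must contain at least one word")
    return [count / total for count in word_counts]


def jelinek_mercer_smooth(topic_word_counts: Sequence[Sequence[float]],
                          background: Sequence[float], lam: float = 0.1) -> Matrix:
    """Jelinek-Mercer smoothing: interpolate each topic's maximum-likelihood
    word distribution with a background distribution,
    p(w|z) = (1 - lam) * p_ml(w|z) + lam * p_bg(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    smoothed = []
    for row in topic_word_counts:
        if len(row) != len(background):
            raise ValueError("topic rows and background must share one vocabulary")
        total = sum(row)
        if total <= 0:
            raise ValueError("every topic needs at least one word count")
        smoothed.append([(1.0 - lam) * count / total + lam * p_bg
                         for count, p_bg in zip(row, background)])
    return smoothed


counts = [[3, 0, 1], [0, 2, 2]]          # toy data: 2 topics x 3-word vocabulary
bg = background_distribution([2, 1, 1])  # toy smoothing-data counts
print(laplace_smooth(counts))
print(jelinek_mercer_smooth(counts, bg, lam=0.2))
```

Both methods give nonzero probability to words a topic never generated in training; Laplacian smoothing spreads that mass uniformly, while Jelinek-Mercer shifts it toward words that are frequent in the background distribution.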

Relevance: 100.00%

Abstract:

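LDA (Latent Dirichlet Allocation) is an unsupervised learning model, proposed in recent years, that can extract the latent topics of texts. By incorporating document category information into the traditional LDA model, this paper proposes an LDA model with attached category labels (Labeled-LDA). Based on this model, the allocation of latent topics can be computed jointly across all categories, which overcomes a defect of the traditional LDA model when used for classification: the forced allocation of latent topics. Experimental comparisons with the traditional LDA model show that the new Labeled-LDA-based text classification algorithm effectively improves text classification performance, raising micro-F1 by about 5.7% on the Fudan University Chinese corpus and by about 3% on the comp subset of the English 20newsgroup corpus.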
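The results above are reported as micro-F1, which pools decisions across all categories before computing precision and recall. A minimal sketch, assuming each document's gold and predicted categories are given as sets of label strings; the function name and the toy labels are hypothetical, and this is not the evaluation code used in the paper.

```python
from typing import Hashable, Sequence, Set


def micro_f1(gold: Sequence[Set[Hashable]], predicted: Sequence[Set[Hashable]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and false
    negatives over all documents and categories, then compute a single F1."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)   # categories assigned correctly
        fp += len(pred_labels - gold_labels)   # categories assigned wrongly
        fn += len(gold_labels - pred_labels)   # correct categories that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [{"graphics"}, {"os"}, {"graphics"}, {"hardware"}]        # toy labels
pred = [{"graphics"}, {"graphics"}, {"graphics"}, {"hardware"}]
print(micro_f1(gold, pred))  # 0.75
```

When every document has exactly one gold and one predicted category, as in single-label text classification, micro-F1 reduces to plain accuracy.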

Relevance: 100.00%

Abstract:

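With the continuous development of the Internet, online text resources are growing rapidly, and using computers to manage and query massive amounts of text automatically and effectively has become an urgent need. Text classification provides an effective means of automatically organizing natural-language documents. This thesis studies the LDA model (Latent Dirichlet Allocation) from the perspectives of model smoothing, incorporation of category information, and high-performance solution algorithms, covering several aspects of text classification, including classification of unbalanced corpora, text representation, and acceleration of complex classifiers. The main work and contributions of the thesis are summarized as follows:

First, to overcome the arbitrariness of traditional LDA smoothing algorithms, which directly modify the distributions of latent variables in a multi-level graphical model, we propose a data-driven Laplacian smoothing method and a data-driven Jelinek-Mercer smoothing method. Data-driven Laplacian smoothing alleviates the overfitting of the traditional LDA model, while data-driven Jelinek-Mercer smoothing reduces the time complexity of the prediction stage while largely preserving overall performance. The data-driven smoothing strategy significantly improves the classification performance of the LDA model on both balanced and unbalanced corpora.

Second, when the traditional LDA model computes the probability that a target document is generated by each category, latent topics are forcibly allocated to the categories the document does not belong to. To address this problem, we propose the Labeled-LDA model, which fuses latent topics with category information and, when classifying a target document, jointly computes the latent-topic allocations of all categories to improve classification performance.

Third, latent topic models such as LDA are an important recent research direction in text mining, but their algorithms have high computational complexity. Focusing on dynamic load-balancing algorithms, we study and implement parallel computation of the LDA model on multi-core computers and distributed computation of the CTM model (Correlated Topic Model) in a heterogeneous cluster environment.

Fourth, based on the proposed Labeled-LDA model, we design and implement an experimental system with flexible text classification. The system uses the Labeled-LDA model to perform probabilistic inference over the latent topics in a target document, and thereby obtains the document's allocation over each category. Compared with the probability estimates of discriminative models such as probabilistic support vector machines, this allocation has a clearer practical meaning, and it avoids the extreme probability values output by generative classifiers such as Bayesian models.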
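The fourth contribution turns inferred latent topics into a document's allocation over categories. The abstract does not spell out the mapping, so the sketch below assumes, hypothetically, that each latent topic is owned by exactly one category and that a category's share is the summed proportion of its topics. The function name, the toy topic proportions and the topic-to-category map are illustrative, not taken from the thesis.

```python
from collections import defaultdict
from typing import Dict, Sequence


def category_allocation(topic_proportions: Sequence[float],
                        topic_category: Sequence[str]) -> Dict[str, float]:
    """Turn a document's inferred topic proportions into per-category shares.

    Assumes (hypothetically) that every latent topic is owned by exactly one
    category, so topic_category[k] names the category of topic k. Shares are
    renormalised so that they sum to 1.
    """
    if len(topic_proportions) != len(topic_category):
        raise ValueError("need one category label per topic")
    if any(weight < 0 for weight in topic_proportions):
        raise ValueError("topic proportions must be non-negative")
    total = sum(topic_proportions)
    if total <= 0:
        raise ValueError("topic proportions must have positive mass")
    allocation: Dict[str, float] = defaultdict(float)
    for weight, category in zip(topic_proportions, topic_category):
        allocation[category] += weight / total
    return dict(allocation)


theta = [0.5, 0.2, 0.2, 0.1]                      # toy topic proportions for one document
owners = ["sports", "sports", "finance", "arts"]  # hypothetical topic-to-category map
shares = category_allocation(theta, owners)
print(shares)                       # roughly: sports 0.7, finance 0.2, arts 0.1
print(max(shares, key=shares.get))  # sports
```

The result is a graded distribution over categories rather than a single hard label, which is the kind of output the flexible-classification system is described as producing.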

Relevance: 20.00%

Abstract:

A mesoscale material model for polycrystalline metals is proposed, in which the tiny slip systems randomly distributed between crystal slices within micro-grains or on grain boundaries are replaced by macroscopic equivalent slip systems determined by the work-conjugate principle. The elastoplastic constitutive equation of this model is formulated to account for active hardening, latent hardening and the Bauschinger effect, in order to predict the macroscopic elastoplastic stress-strain response of polycrystalline metals under complex loading conditions. The influence of the material property parameters on the size and shape of the subsequent yield surfaces is investigated numerically to demonstrate the fundamental features of the proposed model. The derived constitutive equation is shown to be accurate and efficient in numerical analysis. Compared with self-consistent theories that take crystal grains as their basic components, the present theory is much simpler mathematically.
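The abstract names active and latent hardening without giving the equations. For orientation only, the sketch uses a common crystal-plasticity form of the slip-system interaction matrix, h_ab = h(q + (1 - q) delta_ab), with the hardening modulus h and latent-hardening ratio q taken as assumed constants. It is not the paper's equivalent-slip-system formulation, and it does not model the Bauschinger effect.

```python
from typing import List, Sequence


def hardening_matrix(n_systems: int, h: float, q: float) -> List[List[float]]:
    """Interaction matrix h_ab = h * (q + (1 - q) * delta_ab).

    Diagonal entries (a == b) equal h: active (self) hardening.
    Off-diagonal entries equal q * h: latent hardening of system a caused by
    slip on system b. With q > 1, latent hardening exceeds self hardening.
    """
    return [[h * (q + (1.0 - q) * (1.0 if a == b else 0.0)) for b in range(n_systems)]
            for a in range(n_systems)]


def critical_stress_rates(h_ab: Sequence[Sequence[float]],
                          slip_rates: Sequence[float]) -> List[float]:
    """Rate of each system's critical resolved shear stress:
    g_dot_a = sum_b h_ab * |gamma_dot_b|."""
    return [sum(h * abs(rate) for h, rate in zip(row, slip_rates)) for row in h_ab]


H = hardening_matrix(n_systems=3, h=100.0, q=1.4)  # toy values, MPa per unit slip
# Only system 0 slips, yet systems 1 and 2 also harden (latently) because q * h > 0.
print(critical_stress_rates(H, [1e-3, 0.0, 0.0]))
```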

Relevance: 20.00%

Abstract:

The configuration of semisubmersibles consisting of pontoons and columns, and their heave motion response in incident progressive waves, are examined. The purpose of the present study is to provide a theoretical approach for estimating the effects of volumetric allocation on the natural period and the response amplitude operator (RAO) in heave. We conclude that the heave response amplitude can be considerably suppressed by adjusting the volumetric allocation so that the natural heave period stays away from the range of periods containing significant wave energy. The theoretical formulae are found to be in good agreement with the corresponding computational results from WAMIT.
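To make the link between natural period and response concrete, the sketch below uses the textbook single-degree-of-freedom heave relations, T_n = 2π√((M + A33)/C33) with hydrostatic stiffness C33 = ρgA_wp, and the amplitude per unit exciting force. These are not the authors' formulae and involve no WAMIT computation; the seawater density, the column geometry, the added mass A33 and the damping B33 are assumed toy values.

```python
import math

RHO_SEAWATER = 1025.0  # kg/m^3, assumed seawater density
G = 9.81               # m/s^2


def heave_stiffness(waterplane_area: float, rho: float = RHO_SEAWATER, g: float = G) -> float:
    """Hydrostatic heave restoring coefficient C33 = rho * g * A_wp (N/m)."""
    return rho * g * waterplane_area


def heave_natural_period(mass: float, added_mass: float, waterplane_area: float) -> float:
    """Undamped natural heave period T_n = 2*pi*sqrt((M + A33) / C33), in seconds."""
    return 2.0 * math.pi * math.sqrt((mass + added_mass) / heave_stiffness(waterplane_area))


def heave_amplitude_per_unit_force(omega: float, mass: float, added_mass: float,
                                   damping: float, waterplane_area: float) -> float:
    """|Z/F| for the linear heave equation
    (M + A33) z'' + B33 z' + C33 z = F exp(i omega t), in m/N."""
    c33 = heave_stiffness(waterplane_area)
    return 1.0 / math.hypot(c33 - (mass + added_mass) * omega ** 2, damping * omega)


# Toy semisubmersible: four 10 m diameter columns, 30 000 m^3 displacement,
# added mass guessed for illustration only.
a_wp = 4 * math.pi * 5.0 ** 2
mass = RHO_SEAWATER * 30_000.0
added = 1.5e7
t_n = heave_natural_period(mass, added, a_wp)
print(f"T_n = {t_n:.1f} s")
```

With these toy numbers T_n comes out near 24 s. Moving displacement from the columns into the pontoons reduces A_wp and hence C33, which tends to lengthen T_n; the wave excitation force, which also depends on that allocation, is not modeled here.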