17 results for document clustering

in Chinese Academy of Sciences Institutional Repositories Grid Portal


Relevance:

70.00%

Publisher:

Relevance:

60.00%

Publisher:

Abstract:

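Text clustering is well suited to information filtering and web page classification, but it faces the difficulties of large data volumes and high feature dimensionality. Because the K-means algorithm is easy to implement and has little dependence on the data, it has been applied to text clustering. However, traditional K-means and its variants produce clustering results that fluctuate considerably. The K-means algorithm is therefore improved: by optimizing the selection of the initial cluster centers, an improved algorithm suited to clustering text data is obtained. Extensive experiments show that the algorithm produces results of relatively high quality with little fluctuation in clustering quality.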
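
The abstract above does not say how the initial cluster centers are chosen, so the following is only a minimal sketch of the general idea, not the authors' method. It swaps in farthest-first (maximin) seeding as an assumed stand-in: because the seeding is deterministic, repeated runs give the same clusters, which illustrates how initialization affects the fluctuation of K-means results. The function names (`tf_vectors`, `farthest_first_centers`, `kmeans`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Turn each document into an L2-normalised term-frequency vector."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        v = [0.0] * len(vocab)
        for w, c in counts.items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(vectors, k):
    """Assumed seeding rule (not from the abstract): start at vector 0,
    then repeatedly add the vector farthest from all chosen centers."""
    centers = [vectors[0]]
    while len(centers) < k:
        best = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(best)
    return centers


def kmeans(vectors, k, iterations=20):
    """Standard Lloyd iterations, started from the deterministic seeds."""
    centers = farthest_first_centers(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j]))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


docs = [
    "cat dog pet",
    "dog cat animal pet",
    "stock market trade",
    "market trade price stock",
]
labels = kmeans(tf_vectors(docs), k=2)
print(labels)
```

With random seeding, the cluster assignment can change from run to run; with a fixed seeding rule like the one assumed here, it cannot, although the result still depends on which vector is taken as the starting point.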

Relevance:

20.00%

Publisher:

Abstract:

We propose a highly efficient content-lossless compression scheme for Chinese document images. The scheme combines morphological analysis with pattern matching to cluster patterns. To obtain error maps with as few errors as possible, morphological analysis is applied to decompose and recompose the Chinese character patterns. In the pattern matching, the criteria are adapted to the characteristics of Chinese characters. Because small components can sometimes be inserted into the blank spaces of large components, the pattern library images can be kept small. Arithmetic coding is applied for the final compression. Our method achieves much better compression performance than most alternative methods and ensures content-lossless reconstruction. (c) 2006 Society of Photo-Optical Instrumentation Engineers.

Relevance:

20.00%

Publisher:

Abstract:

It is common for documents to be represented by document icons in graphical user interfaces. A document icon helps users retrieve documents, but it is difficult to distinguish one document within a collection of documents that the user has accessed. Our paper presents a document icon to which users can add subjective values and marks. We then describe a system, ex-explorer, in which users can browse and search the extended document icons. We found that it is easy to re-find documents to which users have added their own annotations or marks.

Relevance:

20.00%

Publisher:

Abstract:

Photoluminescence (PL) spectra of GaInNAs/GaAs multiple quantum wells and GaInNAs epilayers grown on GaAs substrates show an apparent "S-shaped" temperature dependence of the dominant luminescence peak. At low temperature and under weak excitation, a PL peak can be well resolved in the spectra. It displays a remarkable red shift of up to 60 meV and is thermally quenched below 100 K as the temperature increases; it is attributed to nitrogen-cluster-induced bound states. Indium incorporation has a significant effect on cluster formation. Rapid thermal annealing at 750 °C can essentially remove the peak induced by the bound states.

Relevance:

20.00%

Publisher:

Abstract:

Tianjin University of Technology

Relevance:

20.00%

Publisher:

Abstract:

ACM SIGIR; ACM SIGWEB