A maximal frequent itemset approach for web document clustering


Author(s): Zhuang, Ling; Dai, Honghua
Contributor(s)

Wei, Daming

Wang, Hui

Peng, Zhiyong

Kara, Atsushi

He, Yanxiang

Date(s)

01/01/2004

Abstract

Clustering Web documents efficiently and accurately is of great interest to Web users and is a key component of the search accuracy of a Web search engine. To this end, this paper introduces a new approach to clustering Web documents, called the maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely. These center points are then used as initial points for the K-means algorithm. Experimental results on three Web document sets show that the MFI approach outperforms the other methods we compared in most cases, particularly when the document sets contain a large number of categories.
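
The idea summarized in the abstract, seeding K-means with center points derived from maximal frequent itemsets, can be illustrated with a minimal sketch. This is not the authors' implementation: the brute-force itemset search, the binary term vectors, the `min_support` threshold, the choice of one seed per maximal itemset, and all function names are assumptions made for illustration only.

```python
from itertools import combinations


def maximal_frequent_itemsets(docs, min_support):
    """Brute-force search over term combinations (toy vocabularies only)."""
    vocab = sorted(set().union(*docs))
    frequent = []
    for size in range(1, len(vocab) + 1):
        level = [frozenset(c) for c in combinations(vocab, size)
                 if sum(1 for d in docs if set(c) <= d) >= min_support]
        if not level:  # no frequent itemset of this size, so none larger
            break
        frequent.extend(level)
    # An itemset is maximal if no frequent proper superset exists.
    return [s for s in frequent if not any(s < t for t in frequent)]


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, centers, iterations=20):
    """Plain K-means starting from the given initial centers."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda k: sq_dist(v, centers[k]))
                  for v in vectors]
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(mean(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def mfi_seeded_kmeans(docs, min_support):
    """Assumed seeding rule: one seed per maximal frequent itemset, taken as
    the mean vector of the documents that contain every term in the itemset."""
    vocab = sorted(set().union(*docs))
    vectors = [[1.0 if t in d else 0.0 for t in vocab] for d in docs]
    seeds = [mean([v for v, d in zip(vectors, docs) if itemset <= d])
             for itemset in maximal_frequent_itemsets(docs, min_support)]
    return kmeans(vectors, seeds)


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"python", "code", "library"},
    {"python", "code", "bug"},
    {"python", "code", "test"},
    {"football", "goal", "league"},
    {"football", "goal", "match"},
    {"football", "goal", "cup"},
]
labels, centers = mfi_seeded_kmeans(docs, min_support=3)
print(labels)
```

On this toy corpus the two maximal frequent itemsets are {code, python} and {football, goal}, so K-means starts from two seeds that already sit inside the two dense groups. In this sketch the number of clusters equals the number of maximal itemsets found; the paper may select or merge center points differently.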

Identifier

http://hdl.handle.net/10536/DRO/DU:30005533

Language(s)

eng

Publisher

IEEE Computer Society

Relation

http://dro.deakin.edu.au/eserv/DU:30005533/dai-amaximalfrequentitemset-2004.pdf

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1357322

Rights

2004, IEEE

Type

Conference Paper