Using patterns co-occurrence matrix for cleaning closed sequential patterns for text mining


Author(s): Albathan, Mubarak; Li, Yuefeng; Algarni, Abdulmohsen
Contributor(s)

Zhong, Ning

Gong, Zhiguo

Date(s)

04/12/2012

Abstract

With the overwhelming increase in the amount of text on the web, it is almost impossible for people to keep abreast of up-to-date information. Text mining is a process by which interesting information is derived from text through the discovery of patterns and trends. Text mining algorithms are used to guarantee the quality of extracted knowledge. However, the patterns extracted by text or data mining algorithms are often noisy and inconsistent. Thus, different challenges arise, such as how to understand these patterns, whether the model that has been used is suitable, and whether all the extracted patterns are relevant. Furthermore, the research raises the question of how to give a correct weight to the extracted knowledge. To address these issues, this paper presents a text post-processing method, which uses a pattern co-occurrence matrix to find the relations between extracted patterns in order to reduce noisy patterns. The main objective of this paper is not only to reduce the number of closed sequential patterns, but also to improve the performance of pattern mining. The experimental results on the Reuters Corpus Volume 1 data collection and TREC filtering topics show that the proposed method is promising.
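
The abstract does not define how the pattern co-occurrence matrix is built or how it is used to discard patterns, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that two sequential patterns co-occur when both appear as ordered subsequences of the same document, and it uses a hypothetical pruning rule (`prune_isolated`) that drops patterns which never co-occur with any other pattern. All function names, the toy documents, and the threshold are illustrative assumptions.

```python
from itertools import combinations


def is_subsequence(pattern, tokens):
    """Return True if the pattern's terms occur in tokens in order (gaps allowed)."""
    it = iter(tokens)
    return all(term in it for term in pattern)


def cooccurrence_matrix(patterns, documents):
    """Count, for each pair of patterns, the documents that contain both.

    Assumption: co-occurrence is measured at document level.
    The diagonal holds each pattern's own document support.
    """
    n = len(patterns)
    matrix = [[0] * n for _ in range(n)]
    for tokens in documents:
        present = [i for i, p in enumerate(patterns) if is_subsequence(p, tokens)]
        for i in present:
            matrix[i][i] += 1
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


def prune_isolated(patterns, matrix, min_cooccurrence=1):
    """Hypothetical rule: keep a pattern only if it co-occurs with some other pattern."""
    kept = []
    for i, p in enumerate(patterns):
        others = (matrix[i][j] for j in range(len(patterns)) if j != i)
        if any(count >= min_cooccurrence for count in others):
            kept.append(p)
    return kept


# Toy data, invented purely for illustration.
documents = [
    ["text", "mining", "finds", "patterns"],
    ["pattern", "mining", "in", "text"],
    ["stock", "market", "news"],
]
patterns = [("text", "mining"), ("mining", "patterns"), ("stock", "news")]

matrix = cooccurrence_matrix(patterns, documents)
kept = prune_isolated(patterns, matrix)
```

In this toy run the first two patterns co-occur in one document and are kept, while the third never co-occurs with another pattern and is treated as noise under the assumed rule. A real system would likely weight co-occurrence by relevance rather than use a fixed count threshold.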

Format

application/pdf

Identifier

http://eprints.qut.edu.au/58289/

Publisher

IEEE

Relation

http://eprints.qut.edu.au/58289/1/58289.pdf

http://www.fst.umac.mo/wic2012/WI/

Albathan, Mubarak, Li, Yuefeng, & Algarni, Abdulmohsen (2012) Using patterns co-occurrence matrix for cleaning closed sequential patterns for text mining. In Zhong, Ning & Gong, Zhiguo (Eds.) 2012 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE, Macau, China, pp. 201-205.

Rights

Copyright 2012 IEEE

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords

#090000 ENGINEERING

Type

Conference Paper