Performance analysis of algorithms for frequent pattern generation


Author(s): Islam, Md. Rafiqul; Chowdhury, Morshed; Khan, Safwan Mahmood
Contributor(s)

Stonier, Russel

Han, Qinglong

Li, Wei

Date(s)

01/01/2004

Abstract

Data mining refers to extracting, or "mining", knowledge from large amounts of data. It has also been described as a method of "knowledge presentation", in which visualization and knowledge representation techniques are used to present the mined knowledge to the user. Efficient algorithms for mining frequent patterns are crucial to many data mining tasks. Since the Apriori algorithm was proposed in 1994, several methods have been proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. In addition, many methods do not generate all frequent patterns, which makes them inadequate for deriving association rules. The Pattern Decomposition (PD) algorithm significantly reduces the size of the dataset on each pass, which makes it more efficient at mining all frequent patterns in a large dataset. It avoids the costly process of candidate set generation, and evaluating support against the reduced datasets saves a large amount of counting time. In this paper, some existing frequent pattern generation algorithms are explored and compared. The results show that the PD algorithm outperforms Direct Count of candidates & Prune transactions (DCP), an improved version of Apriori, by one order of magnitude and is faster than Predictive Item Pruning (PIP), an improved FP-tree method. Further, PD is more scalable than both DCP and PIP.
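For readers unfamiliar with the candidate set generation-and-test approach mentioned in the abstract, the following is a minimal Python sketch of a textbook Apriori-style loop. It is not taken from the paper and does not implement PD, DCP, or PIP; the function name `apriori`, the parameter `min_support` (an absolute transaction count), and the sample transactions are assumptions made for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    Illustrative sketch only: min_support is an absolute count (an assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Generate: join frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: keep a candidate only if every (k-1)-subset is frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Test: scan the full dataset to count support for each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical sample data, not from the paper.
sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(sample, min_support=3)
```

Each pass in this sketch rescans every transaction in full, which is the counting cost that, according to the abstract, PD reduces by shrinking the dataset on each pass.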

Identifier

http://hdl.handle.net/10536/DRO/DU:30005388

Language(s)

eng

Publisher

Central Queensland University

Relation

http://dro.deakin.edu.au/eserv/DU:30005388/chowdhury-performanceanalysis-2004.pdf

http://www.complexsystems.net.au/content/about_us

Keywords #data mining #association rules #frequent pattern #DCP algorithm #PIP algorithm #PD algorithm
Type

Conference Paper