Enhanced n-gram extraction using relevance feature discovery
Contributor(s) |
Cranefield, Stephen; Nayak, Abhaya |
---|---|
Date(s) |
2013 |
Abstract |
Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular term-based feature selection methods suffer from noisy feature extraction: many of the extracted features are irrelevant to the user's needs. One popular approach is to extract phrases or n-grams to describe the relevant knowledge. However, extracted n-grams and phrases usually contain a lot of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. It then uses an extended random set to accurately weight n-grams based on their distribution in the documents and the distribution of their terms in n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves performance. Experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms state-of-the-art methods underpinned by Okapi BM25, tf*idf and Rocchio. |
Identifier | |
Publisher |
Springer |
Relation |
http://link.springer.com/chapter/10.1007%2F978-3-319-03680-9_46 DOI:10.1007/978-3-319-03680-9_46 Albathan, Mubarak, Li, Yuefeng, & Algarni, Abdulmohsen (2013) Enhanced n-gram extraction using relevance feature discovery. In Cranefield, Stephen & Nayak, Abhaya (Eds.) Proceedings of the 26th Australasian Joint Conference : AI2013 Advances in Artificial Intelligence, Springer, Dunedin, New Zealand, pp. 453-465. |
Rights |
Copyright 2013 Springer International Publishing Switzerland |
Source |
School of Electrical Engineering & Computer Science; Science & Engineering Faculty |
Keywords | #Feature selection #N-gram #Terms weight #Relevance feedback |
Type |
Conference Paper |
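
The two-stage idea in the abstract (first keep specific terms, then weight n-grams by how they are distributed across documents and by how many of their terms are specific) can be illustrated with a minimal sketch. This is not the paper's extended random set weighting, which the record does not specify; the function names `extract_ngrams` and `weight_ngrams`, the `specific_terms` input, and the product-of-two-ratios score are all assumptions chosen only to make the pipeline concrete.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weight_ngrams(docs, specific_terms, n=2):
    """Weight n-grams using a simplified stand-in score (an assumption).

    docs: list of token lists (the relevant documents).
    specific_terms: set of terms kept by an earlier term-selection step.
    Score = (fraction of documents containing the n-gram)
            * (fraction of the n-gram's terms that are specific).
    N-grams with no specific term are treated as noise and dropped.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each n-gram at most once per document.
        doc_freq.update(set(extract_ngrams(tokens, n)))

    weights = {}
    for gram, df in doc_freq.items():
        kept = sum(1 for term in gram if term in specific_terms)
        if kept == 0:
            continue  # no specific term: discard as noisy
        weights[gram] = (df / len(docs)) * (kept / n)
    return weights


# Toy data, invented for illustration only.
docs = [
    ["oil", "price", "rise"],
    ["oil", "price", "fall"],
    ["the", "market", "fall"],
]
weights = weight_ngrams(docs, specific_terms={"oil", "price"}, n=2)
```

In this toy run the bigram `("oil", "price")` receives the highest weight because it occurs in two of the three documents and both of its terms are specific, while bigrams built only from non-specific terms are removed, which mirrors the claimed reduction in the number of extracted n-grams.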