A Boundary-Oriented Chinese Segmentation Method Using NGram Mutual Information


Author(s): Tang, Ling-Xiang; Geva, Shlomo; Trotman, Andrew; Xu, Yue
Contributor(s)

Sun, L

Chen, K.J.

Date(s)

2010

Abstract

This paper describes our participation in the Chinese word segmentation task of CIPS-SIGHAN 2010. We implemented an n-gram mutual information (NGMI) based segmentation algorithm that combines features from unsupervised, supervised, and dictionary-based segmentation methods. The algorithm is also paired with a simple strategy for out-of-vocabulary (OOV) word recognition. The evaluation results for both open and closed training are encouraging. However, the OOV word recognition results in the closed training evaluation were unsatisfactory.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/80001/

Publisher

Chinese Information Processing Society of China

Relation

http://eprints.qut.edu.au/80001/1/80001_tang_2011006236.pdf

Tang, Ling-Xiang, Geva, Shlomo, Trotman, Andrew, & Xu, Yue (2010) A Boundary-Oriented Chinese Segmentation Method Using NGram Mutual Information. In Sun, L & Chen, K.J. (Eds.) Proceedings of the CIPS-SIGHAN Joint Conference on Chinese Language Processing, Chinese Information Processing Society of China, China.

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords

#080100 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING #Translation; Chinese Segmentation #Boundary-Oriented Segmentation #Chinese language #N-Gram Mutual Information

Type

Conference Paper