An English-translated parallel corpus for the CJK Wikipedia collections
Date(s) |
01/12/2012
|
---|---|
Abstract |
In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could support both the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, it could be used for experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to expand the current coverage of the English Wikipedia. |
Format |
application/pdf |
Identifier | |
Publisher |
ACM |
Relation |
http://eprints.qut.edu.au/57835/1/CJK2E-Wikipedia-XML-Corpus-V7.1.pdf DOI:10.1145/2407085.2407099 Tang, Ling-Xiang, Geva, Shlomo, & Trotman, Andrew (2012) An English-translated parallel corpus for the CJK Wikipedia collections. In 17th Australasian Document Computing Symposium, 5-6 December 2012, Dunedin, New Zealand. |
Rights |
Copyright 2012 ACM Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. |
Source |
Faculty of Science and Technology |
Keywords | #080704 Information Retrieval and Web Search #Wikipedia #Corpus #English #Chinese #Japanese #Korean #machine learning #cross-lingual information retrieval #cross-lingual link discovery |
Type |
Conference Paper |