2 resultados para Cantonese
em Cambridge University Engineering Department Publications Database
Resumo:
We present a system for keyword search on Cantonese conversational telephony audio, collected for the IARPA Babel program, that achieves good performance by combining postings lists produced by diverse speech recognition systems from three different research groups. We describe the keyword search task, the data on which the work was done, four different speech recognition systems, and our approach to system combination for keyword search. We show that the combination of four systems outperforms the best single system by 7%, achieving an actual term-weighted value of 0.517. © 2013 IEEE.
Resumo:
Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval for Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show score normalization methodology that improves in average by 20% keyword search performance. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system. © 2013 IEEE.