Vocabulary and language model adaptation using information retrieval


Author(s): Bigi, Brigitte; Huang, Y; De Mori, Renato
Contributor(s)

International Computer Science Institute [Berkeley] (ICSI) ; International Computer Science Institute

Laboratoire Informatique d'Avignon (LIA) ; Université d'Avignon et des Pays de Vaucluse (UAPV) - Centre d'Enseignement et de Recherche en Informatique - CERI

Coverage

Jeju Island, South Korea

Date(s)

2004

Abstract

International audience

The goal of vocabulary optimization is to construct a vocabulary containing exactly those words that are most likely to appear in the test data. We present a new approach that reduces the out-of-vocabulary (OOV) rate by adapting the vocabulary during the ASR process. The method can also be used for statistical language model (SLM) adaptation. An information retrieval system is run after the first pass of the ASR system to obtain a set of relevant documents. These documents are then used to generate the new vocabulary and/or corpus. In this paper, we propose a new retrieval method well suited to this purpose. Experiments on French yielded a 28% reduction in OOV rate. Experiments on English for SLM adaptation yielded a 7.9% perplexity reduction and a minor WER improvement.
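
The pipeline described in the abstract (first-pass hypothesis, document retrieval, vocabulary regeneration) can be illustrated with a minimal sketch. It is not the retrieval method proposed in the paper, which this record does not describe; plain TF-IDF cosine ranking is used as a stand-in. The function names, the whitespace tokenizer, the toy documents, and the parameters `top_docs` and `max_new_words` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def oov_rate(vocabulary, test_tokens):
    """Fraction of test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for tok in test_tokens if tok not in vocabulary)
    return missing / len(test_tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def adapt_vocabulary(base_vocab, hypothesis, documents,
                     top_docs=1, max_new_words=5):
    """Add frequent unseen words from the best-matching documents.

    Stand-in retrieval: TF-IDF cosine ranking against the first-pass
    hypothesis (hypothetical, not the paper's retrieval method).
    """
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(doc_tokens)
    doc_freq = Counter()
    for tokens in doc_tokens:
        doc_freq.update(set(tokens))

    def vectorize(tokens):
        # Smoothed IDF so that unseen query terms still get a finite weight.
        counts = Counter(tokens)
        return {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
                for t, c in counts.items()}

    query = vectorize(tokenize(hypothesis))
    ranked = sorted(range(n_docs),
                    key=lambda i: cosine(query, vectorize(doc_tokens[i])),
                    reverse=True)

    # Count words from the retrieved documents that the vocabulary lacks.
    new_counts = Counter()
    for i in ranked[:top_docs]:
        new_counts.update(t for t in doc_tokens[i] if t not in base_vocab)
    new_words = [w for w, _ in new_counts.most_common(max_new_words)]
    return set(base_vocab) | set(new_words)


# Toy data (invented for illustration only).
base_vocab = {"the", "market", "rose", "today", "bank"}
documents = [
    "the central bank raised interest rates today",
    "the football team won the match",
    "interest rates and inflation worry the market",
]
hypothesis = "the bank rose today market"  # first-pass output, in-vocabulary words only
reference = tokenize("the bank raised interest rates today")

adapted = adapt_vocabulary(base_vocab, hypothesis, documents)
print(oov_rate(base_vocab, reference), oov_rate(adapted, reference))
```

In this toy setting the first-pass hypothesis retrieves the document about the central bank, and the words it contributes cover the reference tokens that the base vocabulary lacked. A real system would also need pronunciations for the added words and a second decoding pass, neither of which is sketched here.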

Identifier

hal-01392515

https://hal.archives-ouvertes.fr/hal-01392515

Language(s)

en

Publisher

HAL CCSD

Source

International Conference on Spoken Language Processing

International Conference on Spoken Language Processing, 2004, Jeju Island, South Korea. II, pp.1361-1364

Keywords

#Language modelling #[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] #[SHS.INFO] Humanities and Social Sciences/Library and information sciences

Type

info:eu-repo/semantics/conferenceObject

Conference papers