Acoustic adaptation in cross database audio visual SHMM training for phonetic spoken term detection


Author(s): Kalantari, Shahram; Dean, David B.; Sridharan, Sridha; Ghaemmaghami, Houman; Fookes, Clinton B.
Date(s)

30/10/2015

Abstract

Visual information, in the form of the speaker's lip movements, has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross database training of synchronous hidden Markov models (SHMMs) to make use of large, publicly available external audio databases in addition to the relatively small given audio visual database. In this work, the cross database training approach is improved by an additional acoustic adaptation step, which enables the audio visual SHMMs to benefit from the audio observations of the external audio models before the visual modality is added to them. The proposed approach outperforms the baseline cross database training approach in both clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.

Identifier

http://eprints.qut.edu.au/86033/

Publisher

Association for Computing Machinery

Relation

DOI:10.1145/2802558.2814648

Kalantari, Shahram, Dean, David B., Sridharan, Sridha, Ghaemmaghami, Houman, & Fookes, Clinton B. (2015) Acoustic adaptation in cross database audio visual SHMM training for phonetic spoken term detection. In Proceedings of the Third Edition Workshop on Speech, Language and Audio in Multimedia, Association for Computing Machinery, Brisbane, Qld, pp. 11–14.

Rights

Copyright 2015 ACM

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty; Smart Services CRC

Keywords

#audio visual spoken term detection #cross database training

Type

Conference Paper