Incorporating visual information for spoken term detection
Date(s) |
2015
|
---|---|
Abstract |
Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. To provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines, which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate for the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we make use of visual information, in the form of the speaker's lip movements, in the indexing stage and investigate its effect on STD performance. In particular, we investigate whether gains in phone recognition accuracy carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio-only approach. We also investigate the effect of using visual information on STD performance in different noise environments. |
Format |
application/pdf |
Identifier | |
Publisher |
International Speech Communication Association |
Relation |
http://eprints.qut.edu.au/86034/1/Audio%20visual%20STD.pdf http://www.isca-speech.org/archive/interspeech_2015/i15_0558.html Kalantari, Shahram, Dean, David, & Sridharan, Sridha (2015) Incorporating visual information for spoken term detection. In Proceedings of the 16th Annual Conference of the International Speech Communication Association, Interspeech 2015, International Speech Communication Association, Maritim International Congress Center, Dresden, Germany, pp. 558-562. |
Rights |
Copyright 2015 [Please consult the author] |
Source |
School of Electrical Engineering & Computer Science; Science & Engineering Faculty |
Keywords | #Spoken term detection #keyword spotting #audio visual phone recognition #DMLS system |
Type |
Conference Paper |