Complete-linkage clustering for voice activity detection in audio and visual speech
Date(s) | 06/09/2015
---|---
Abstract | We propose a novel technique for robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques, demonstrating an absolute improvement of 15% in half-total error rate (HTER) over the best-performing baseline system across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm using visual, audio-visual, or a proposed fusion of these features. (A minimal illustrative sketch of this pipeline appears after the record.)
Format | application/pdf
Identifier |
Publisher | Interspeech 2015
Relation | http://eprints.qut.edu.au/85160/1/Houman_Interspeech2015.pdf ; http://www.isca-speech.org/archive/interspeech_2015/ ; Ghaemmaghami, Houman, Dean, David, Kalantari, Shahram, Sridharan, Sridha, & Fookes, Clinton (2015) Complete-linkage clustering for voice activity detection in audio and visual speech. In Interspeech 2015: 16th Annual Conference of the International Speech Communication Association, 6-10 September 2015, Maritim International Congress Center, Dresden, Germany. ; http://purl.org/au-research/grants/ARC/LP130100110
Rights | Copyright 2015 [please consult the authors]
Source | School of Electrical Engineering & Computer Science; Institute for Future Environments; Science & Engineering Faculty
Keywords | #Voice activity detection #High noise #Gaussian mixture modeling #Complete-linkage clustering
Type | Conference Paper
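
The abstract above outlines a concrete pipeline: score short segments of a recording against pre-trained speech and non-speech GMMs, derive a pairwise dissimilarity between segments from their two likelihood scores, and apply complete-linkage clustering to separate speech from non-speech. The sketch below illustrates those mechanics under stated assumptions: it uses synthetic stand-in features, off-the-shelf scikit-learn and SciPy routines rather than the authors' implementation, and a plain Euclidean dissimilarity in the two-dimensional likelihood space (the paper defines its own measure). Every name and parameter here is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.mixture import GaussianMixture

# Hypothetical stand-ins: in the paper these would be acoustic (or visual)
# feature matrices, one per short segment of the unseen recording.
rng = np.random.default_rng(0)
segments = [rng.normal(size=(200, 13)) for _ in range(40)]

# Two generic GMMs, trained offline on labelled speech and non-speech data.
# The component count is an arbitrary choice for this sketch.
speech_gmm = GaussianMixture(n_components=8, random_state=0)
nonspeech_gmm = GaussianMixture(n_components=8, random_state=0)
speech_gmm.fit(rng.normal(loc=1.0, size=(5000, 13)))
nonspeech_gmm.fit(rng.normal(loc=-1.0, size=(5000, 13)))

# Score every segment against both models; .score() returns the average
# per-frame log-likelihood, giving each segment a 2-D score vector.
scores = np.array(
    [[speech_gmm.score(seg), nonspeech_gmm.score(seg)] for seg in segments]
)

# Pairwise dissimilarity between segments in the likelihood space
# (Euclidean here, as an assumption standing in for the paper's measure).
dists = pdist(scores, metric="euclidean")

# Complete-linkage clustering, cut into two clusters: speech vs. non-speech.
labels = fcluster(linkage(dists, method="complete"), t=2, criterion="maxclust")

# Label the cluster containing the most speech-like segment (highest margin
# of the speech score over the non-speech score) as the speech cluster.
speech_cluster = labels[np.argmax(scores[:, 0] - scores[:, 1])]
is_speech = labels == speech_cluster
print(is_speech)
```

Complete linkage merges clusters by their *farthest* pair of members, which keeps the two clusters compact and well separated; that property is what makes it a natural fit for a binary speech/non-speech split in this likelihood space.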