Complete-linkage clustering for voice activity detection in audio and visual speech


Author(s): Ghaemmaghami, Houman; Dean, David; Kalantari, Shahram; Sridharan, Sridha; Fookes, Clinton
Date(s)

06/09/2015

Abstract

We propose a novel technique for robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two likelihood scores for each segment, one per model. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques and demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best-performing baseline system across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm using visual, audio-visual or a proposed fusion of these features.
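The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the dissimilarity measure, so Euclidean distance between each segment's pair of (speech, non-speech) log-likelihood scores is assumed here, the rule for deciding which cluster is speech is also an assumption, and the scores are invented toy values rather than GMM outputs. The `hter` helper uses the usual definition of HTER as the mean of the miss rate and the false-alarm rate.

```python
import math

def dissimilarity(a, b):
    # Assumption: Euclidean distance between two (speech, non-speech) score pairs.
    return math.dist(a, b)

def complete_linkage(points, n_clusters=2):
    # Start with one cluster per segment and repeatedly merge the two clusters
    # whose FARTHEST pair of members is closest (the complete-linkage criterion).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dissimilarity(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def label_clusters(points, clusters):
    # Assumption: the speech cluster is the one with the higher mean
    # (speech score - non-speech score) difference.
    def mean_diff(cluster):
        return sum(points[k][0] - points[k][1] for k in cluster) / len(cluster)
    speech = max(clusters, key=mean_diff)
    labels = [False] * len(points)
    for k in speech:
        labels[k] = True
    return labels

def hter(predicted, reference):
    # Half-total error rate: mean of miss rate and false-alarm rate.
    # Both classes must be present in the reference.
    misses = sum(1 for p, r in zip(predicted, reference) if r and not p)
    false_alarms = sum(1 for p, r in zip(predicted, reference) if p and not r)
    n_speech = sum(1 for r in reference if r)
    n_nonspeech = len(reference) - n_speech
    return 0.5 * (misses / n_speech + false_alarms / n_nonspeech)

# Toy per-segment scores: (log-likelihood under speech GMM, under non-speech GMM).
scores = [(-10, -20), (-11, -19), (-12, -21), (-22, -9), (-20, -10), (-21, -11)]
reference = [True, True, True, False, False, False]

clusters = complete_linkage(scores)
labels = label_clusters(scores, clusters)
print(labels, hter(labels, reference))
```

The exhaustive pairwise search is cubic or worse in the number of segments, which is acceptable for a sketch; a library routine such as SciPy's hierarchical clustering would be the practical choice for real recordings.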

Format

application/pdf

Identifier

http://eprints.qut.edu.au/85160/

Publisher

Interspeech 2015

Relation

http://eprints.qut.edu.au/85160/1/Houman_Interspeech2015.pdf

http://www.isca-speech.org/archive/interspeech_2015/

Ghaemmaghami, Houman, Dean, David, Kalantari, Shahram, Sridharan, Sridha, & Fookes, Clinton (2015) Complete-linkage clustering for voice activity detection in audio and visual speech. In Interspeech 2015: 16th Annual Conference of the International Speech Communication Association, 6-10 September 2015, Maritim International Congress Center, Dresden, Germany.

http://purl.org/au-research/grants/ARC/LP130100110

Rights

Copyright 2015 [please consult the authors]

Source

School of Electrical Engineering & Computer Science; Institute for Future Environments; Science & Engineering Faculty

Keywords

#Voice activity detection #High noise #Gaussian mixture modeling #complete-linkage clustering

Type

Conference Paper