Cross likelihood ratio based speaker clustering using eigenvoice models


Author(s): Wang, David; Vogt, Robert J.; Sridharan, Sridha; Dean, David
Date(s)

01/08/2011

Abstract

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to their ability to adequately represent a speaker from sparse training data and to better capture differences in speaker characteristics. This paper therefore proposes capitalizing on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
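
The abstract does not state the CLR formula, so the following is only an illustrative sketch under stated assumptions. It uses a commonly cited form of the CLR, in which each cluster's frames are scored against the other cluster's model, normalized by a background model (UBM), and averaged per frame. For brevity, a single diagonal-covariance Gaussian stands in for each speaker model; this is not the eigenvoice-adapted GMM the paper describes, and the names `fit_gaussian`, `avg_loglik`, and `cross_likelihood_ratio` are hypothetical, as is the synthetic data.

```python
import numpy as np

def fit_gaussian(frames):
    """Fit a diagonal Gaussian (mean, variance) to an (N, D) array of frames.

    Assumption: a single Gaussian is a toy stand-in for a speaker model.
    """
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6  # small variance floor

def avg_loglik(frames, model):
    """Average per-frame log-likelihood of frames under a diagonal Gaussian."""
    mean, var = model
    frames = np.asarray(frames, dtype=float)
    per_frame = -0.5 * (np.log(2.0 * np.pi * var).sum()
                        + ((frames - mean) ** 2 / var).sum(axis=1))
    return float(per_frame.mean())

def cross_likelihood_ratio(x_i, x_j, model_i, model_j, ubm):
    """Assumed CLR form: each cluster scored on the other's model vs. the UBM."""
    return ((avg_loglik(x_i, model_j) - avg_loglik(x_i, ubm))
            + (avg_loglik(x_j, model_i) - avg_loglik(x_j, ubm)))

# Synthetic 2-D "features": segments a and b share a speaker, c does not.
rng = np.random.default_rng(0)
seg_a = rng.normal(2.0, 1.0, size=(200, 2))
seg_b = rng.normal(2.0, 1.0, size=(200, 2))
seg_c = rng.normal(-2.0, 1.0, size=(200, 2))
ubm = (np.zeros(2), np.full(2, 5.0))  # hypothetical broad background model

clr_same = cross_likelihood_ratio(seg_a, seg_b, fit_gaussian(seg_a), fit_gaussian(seg_b), ubm)
clr_diff = cross_likelihood_ratio(seg_a, seg_c, fit_gaussian(seg_a), fit_gaussian(seg_c), ubm)
```

In this toy setup the same-speaker pair scores higher than the different-speaker pair, which is the property a clustering stage would rely on when deciding which pair of clusters to merge next.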

Format

application/pdf

Identifier

http://eprints.qut.edu.au/46177/

Relation

http://eprints.qut.edu.au/46177/1/46177a.pdf

http://www.interspeech2011.org/

Wang, David, Vogt, Robert J., Sridharan, Sridha, & Dean, David (2011) Cross likelihood ratio based speaker clustering using eigenvoice models. In Interspeech 2011: 12th Annual Conference of the International Speech Communication Association, 28-31 August 2011, Florence, Italy.

http://purl.org/au-research/grants/ARC/LP0991238

Rights

Copyright 2011 please consult authors

Source

Faculty of Built Environment and Engineering; Information Security Institute; School of Engineering Systems

Keywords

#080109 Pattern Recognition and Data Mining #090609 Signal Processing #eigenvoice modeling #joint factor analysis #cross likelihood ratio #speaker clustering #speaker diarization

Type

Conference Paper