Speaker attribution of Australian broadcast news data


Author(s): Ghaemmaghami, Houman; Dean, David; Sridharan, Sridha
Date(s)

15/08/2013

Abstract

Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for attributing large sets of two-speaker telephone data. In this paper, we build on that approach to obtain a robust system applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this by using a robust cross-likelihood ratio (CLR) threshold stopping criterion for clustering, as opposed to the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module across a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy without prior knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system across the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.
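
The threshold stopping criterion described in the abstract can be illustrated with a minimal sketch of complete-linkage agglomerative clustering. This is not the authors' implementation: the function name `complete_linkage`, the similarity matrix `sim`, and the numeric values are hypothetical, and the pairwise scores merely stand in for CLR scores (higher meaning more likely the same speaker). Merging continues until no pair of clusters scores at or above the threshold, so the number of speakers does not need to be known in advance.

```python
from itertools import combinations


def complete_linkage(sim, threshold):
    """Cluster items given a symmetric pairwise similarity matrix.

    sim[i][j] is a similarity score (a stand-in for a CLR score).
    Merging stops once no pair of clusters scores at or above threshold.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best_score, best_pair = None, None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: a cluster pair is only as similar as
            # its least similar pair of members.
            score = min(sim[i][j] for i in clusters[a] for j in clusters[b])
            if best_score is None or score > best_score:
                best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break  # threshold stopping criterion
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical scores for four segments: 0 and 1 resemble each other,
# as do 2 and 3; cross-pair scores are negative.
sim = [
    [0, 5, -3, -4],
    [5, 0, -2, -5],
    [-3, -2, 0, 4],
    [-4, -5, 4, 0],
]
print(complete_linkage(sim, threshold=0))
```

With a threshold of 0 the sketch stops at two clusters, because the only remaining merge would have a complete-linkage score below the threshold. Lowering the threshold far enough forces every segment into a single cluster, which shows how sensitive the estimated speaker count is to the chosen value.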

Format

application/pdf

Identifier

http://eprints.qut.edu.au/63498/

Publisher

Sun SITE Central Europe

Relation

http://eprints.qut.edu.au/63498/1/SLAM13_eprints.pdf

http://ceur-ws.org/Vol-1012/

Ghaemmaghami, Houman, Dean, David, & Sridharan, Sridha (2013) Speaker attribution of Australian broadcast news data. In Proceedings of the First Workshop on Speech, Language and Audio in Multimedia (SLAM): CEUR Workshop Proceedings, Volume 1012, Sun SITE Central Europe, Marseille, France, pp. 72-77.

http://purl.org/au-research/grants/ARC/LP0991238

Rights

Copyright 2013 [please consult the author]

Source

School of Electrical Engineering & Computer Science; Faculty of Built Environment and Engineering; Information Security Institute

Keywords

#090609 Signal Processing

Type

Conference Paper