Automatic speaker recognition under adverse conditions


Author(s): Vogt, Robert Jeffery

Date(s)

2006

Abstract

Speaker verification is the process of verifying a person’s identity by analysing their speech. Automatic speaker verification (ASV) technology has several important applications: in the surveillance domain, suspect identification, tracking terrorists and detecting a person’s presence at a remote location; in the private sector, person authentication for phone banking and credit card transactions. Telephones and telephony networks provide a natural medium for these applications.

The aim of this work is to improve the usefulness of ASV technology for practical applications in the presence of adverse conditions. In a telephony environment, background noise, handset mismatch, channel distortions, room acoustics and restrictions on the available testing and training data are common sources of errors for ASV systems. Two research themes were pursued to overcome these adverse conditions: modelling mismatch and modelling uncertainty.

To address the performance degradation incurred through mismatched conditions, it was proposed to model this mismatch directly. Feature mapping was evaluated for combating handset mismatch and was extended through the use of a blind clustering algorithm to remove the need for accurate handset labels for the training data. Mismatch modelling was then generalised by explicitly modelling the session conditions as a constrained offset of the speaker model means. This session variability modelling approach enabled the modelling of arbitrary sources of mismatch, including handset type, and halved the error rates in many cases.

Methods to model the uncertainty in speaker model estimates and verification scores were developed to address the difficulties of limited training and testing data. The Bayes factor was introduced to account for the uncertainty of the speaker model estimates in testing by applying Bayesian theory to the verification criterion, with improved performance in matched conditions. Modelling the uncertainty in the verification score itself met with significant success. Estimating a confidence interval for the "true" verification score enabled an order of magnitude reduction in the average quantity of speech required to make a confident verification decision based on a threshold. The confidence measures developed in this work may also have significant applications for forensic speaker verification tasks.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/36195/

Publisher

Queensland University of Technology

Relation

http://eprints.qut.edu.au/36195/1/Robert_Vogt_Thesis.pdf

Vogt, Robert Jeffery (2006) Automatic speaker recognition under adverse conditions. PhD thesis, Queensland University of Technology.

Source

Faculty of Built Environment and Engineering; School of Engineering Systems

Keywords

#automatic speech recognition #speaker recognition #speaker verification #Bayes factor #mismatch #session variability #feature mapping #confidence measures

Type

Thesis