Selection of TDOA Parameters for MDM Speaker Diarization


Author(s): Martínez González, Beatriz; Pardo Muñoz, José Manuel; Echeverry Correa, Julian David; Vallejo Pinto, José Ángel; Barra Chicote, Roberto
Date(s)

01/09/2012

Abstract

This paper evaluates several methods for improving multiple distant microphone (MDM) speaker diarization based on Time Delay of Arrival (TDOA) features. All of them avoid using a single reference channel to calculate the TDOA values; instead, each applies a different criterion to select, from all possible microphone pairs, the set of pairs used to estimate the TDOAs. The evaluated methods are named "Dynamic Margin" (DM), "Extreme Regions" (ER), "Most Common" (MC), "Cross Correlation" (XCorr) and "Principal Component Analysis" (PCA). All methods improve on the baseline results for the development set, and four of them also improve the results for the evaluation set. On the test set, DM and ER obtain relative diarization error rate (DER) improvements of 3.49% and 10.77% respectively, while XCorr and PCA achieve relative DER improvements of 36.72% and 30.82%. Moreover, the computational cost of the XCorr method is 20% lower than that of the baseline.
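
The idea of estimating TDOAs over all microphone pairs, rather than against a single reference channel, can be illustrated with a minimal sketch. This is not the paper's implementation of any of the five methods: the abstract does not specify their selection criteria, so the ranking used here (keep the k pairs with the strongest GCC-PHAT peak) is an assumption chosen only for illustration. The function names `gcc_phat` and `select_pairs` and the parameters `max_lag` and `k` are hypothetical.

```python
import itertools

import numpy as np


def gcc_phat(x, y, max_lag):
    """Estimate the delay of x relative to y (in samples) with GCC-PHAT.

    Returns (lag, peak): the lag in [-max_lag, max_lag] that maximises the
    phase-transform-weighted cross-correlation, and the value at that lag.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index max_lag corresponds to zero lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    idx = int(np.argmax(cc))
    return idx - max_lag, float(cc[idx])


def select_pairs(channels, max_lag, k):
    """Compute a TDOA for every microphone pair and keep the k strongest.

    ASSUMPTION: ranking pairs by correlation peak is an illustrative
    criterion, not the selection rule of any method named in the abstract.
    Returns a list of ((i, j), lag, peak) tuples sorted by descending peak.
    """
    results = []
    for i, j in itertools.combinations(range(len(channels)), 2):
        lag, peak = gcc_phat(channels[i], channels[j], max_lag)
        results.append(((i, j), lag, peak))
    results.sort(key=lambda r: r[2], reverse=True)
    return results[:k]
```

In practice the signals would be processed in short analysis windows, giving one TDOA vector per window; the sketch handles a single window to keep the pair-enumeration step visible.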

Format

application/pdf

Identifier

http://oa.upm.es/20413/

Language(s)

eng

Publisher

E.T.S.I. Telecomunicación (UPM)

Relation

http://oa.upm.es/20413/1/INVE_MEM_2012_134468.pdf


Rights

http://creativecommons.org/licenses/by-nc-nd/3.0/es/

info:eu-repo/semantics/openAccess

Source

InterSpeech 2012, 13th Annual Conference of the International Speech Communication Association | 09/09/2012 - 13/09/2012 | Portland, Oregon

Keywords #Telecommunications
Type

info:eu-repo/semantics/conferenceObject

Conference or Workshop Paper

PeerReviewed