Robust bimodal person identification using face and speech with limited training data and corruption of both modalities
Date
01/01/2011
Abstract
This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases with variable noise corruption of speech and occlusion in the face images. The new method has demonstrated improved recognition accuracy.
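The abstract describes fusing per-modality features and comparing them with a modified cosine similarity. As a minimal illustrative sketch (not the paper's actual formulation), one can compute a standard cosine similarity per modality and combine the scores with a weight `w`; the function names, the weighting scheme, and the toy feature vectors below are all assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def bimodal_score(speech_test, face_test, speech_ref, face_ref, w=0.5):
    """Fuse speech and face similarities into one identification score.

    The speech and face vectors may have different dimensionalities,
    since each modality is compared only against its own reference.
    The linear weighting by w is a simplifying assumption, not the
    paper's modified similarity measure.
    """
    s_speech = cosine_similarity(speech_test, speech_ref)
    s_face = cosine_similarity(face_test, face_ref)
    return w * s_speech + (1.0 - w) * s_face

# Toy example with made-up feature vectors of differing sizes:
score = bimodal_score(
    speech_test=[0.9, 0.1, 0.2],
    face_test=[0.5, 0.5],
    speech_ref=[1.0, 0.0, 0.3],
    face_ref=[0.6, 0.4],
)
```

In a score-fusion design like this, `w` could be tuned on held-out data, or lowered for a modality suspected of corruption (e.g., noisy speech or an occluded face).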
Format
application/pdf |
Identifier
http://pure.qub.ac.uk/ws/files/16286083/McLaughlin_Interspeech_2011.pdf |
Language(s)
eng |
Rights
info:eu-repo/semantics/openAccess |
Source
McLaughlin, N., Ji, M. & Crookes, D. 2011, 'Robust bimodal person identification using face and speech with limited training data and corruption of both modalities', in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 585-588.
Keywords
Language and Linguistics (ASJC 1203); Human-Computer Interaction (ASJC 1709); Signal Processing (ASJC 1711); Software (ASJC 1712); Modelling and Simulation (ASJC 2611)
Type
contributionToPeriodical |