Deep Head Pose: Gaze-Direction Estimation in Multimodal Video


Author(s): Mukherjee, Sankha S.; Robertson, Neil Martin
Date(s)

01/11/2015

Abstract

In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gazing direction. We further fine-tune a regressor based on the learned deep classifier. Next, we combine the two models (classification and regression) to estimate approximate regression confidence. We present state-of-the-art results on datasets that span the range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. We build upon our robust head-pose estimation and further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as human-human/scene interaction detection, can be achieved. Our solution runs in real time on commercial hardware.

Identifier

http://pure.qub.ac.uk/portal/en/publications/deep-head-pose-gazedirection-estimation-in-multimodal-video(4405d052-1f67-48d5-a00f-4bf4bbe38d40).html

http://dx.doi.org/10.1109/TMM.2015.2482819

Language(s)

eng

Rights

info:eu-repo/semantics/openAccess

Source

Mukherjee, S. S. & Robertson, N. M. 2015, 'Deep Head Pose: Gaze-Direction Estimation in Multimodal Video', IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 2094-2107. DOI: 10.1109/TMM.2015.2482819

Keywords

Convolutional neural networks (CNNs), deep learning, gaze direction, head-pose, RGB-D

Type

article