Abstract :
Humans successfully use facial expressions to convey their emotional states. However, replicating such success in the human-computer interaction domain is an active research problem. In this paper, we propose a deep convolutional neural network (DCNN) for joint learning of robust facial expression features from fused RGB and depth map latent representations. We posit that learning jointly from both modalities results in a more robust classifier for facial expression recognition (FER) than learning from either modality independently. Specifically, we construct a learning pipeline that learns several hierarchical levels of feature representations and then fuses the RGB and depth map latent representations for joint learning of facial expressions. Our experimental results on the BU-3DFE dataset validate the proposed fusion approach: a model learned from the joint modalities outperforms models learned from either modality alone.
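As an illustration, below is a minimal PyTorch sketch of one way such a two-branch fusion pipeline could be structured: each modality passes through its own convolutional branch, and the resulting latent vectors are fused before a joint classifier. The abstract does not specify the fusion operator, layer sizes, or network depth, so the concatenation-based fusion and all dimensions here are assumptions for illustration only, not the paper's architecture.

```python
import torch
import torch.nn as nn


class ModalityBranch(nn.Module):
    """One CNN branch producing a latent representation for a single
    modality (RGB or depth). Layer sizes are illustrative assumptions,
    not taken from the paper."""

    def __init__(self, in_channels: int, latent_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # global pooling to a fixed size
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)  # (N, 64)
        return self.fc(h)                # (N, latent_dim)


class FusionFERNet(nn.Module):
    """Joint model: fuses the RGB and depth latent representations
    (here by concatenation, one common choice) and classifies the
    facial expression."""

    def __init__(self, num_classes: int = 6, latent_dim: int = 256):
        super().__init__()
        self.rgb_branch = ModalityBranch(in_channels=3, latent_dim=latent_dim)
        self.depth_branch = ModalityBranch(in_channels=1, latent_dim=latent_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * latent_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
        )

    def forward(self, rgb, depth):
        # Fuse the two latent representations and learn jointly.
        z = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.classifier(z)


# BU-3DFE labels the six prototypical expressions, hence num_classes=6.
model = FusionFERNet(num_classes=6)
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 6])
```

Training this joint model end to end would optimize both branches together, which is the intuition behind the claim that joint learning yields a more robust classifier than either single-modality model.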
Funders :
This work was funded by the National Research Fund (FNR), Luxembourg, under the project reference R-AGR-0424-05-D/Bjorn Ottersten