In this paper, we propose a novel view-invariant action recognition method using a single monocular RGB camera. View-invariance remains a very challenging problem in 2D action recognition due to the lack of 3D information in RGB images. Most successful approaches rely on knowledge transfer, projecting synthetic 3D data onto multiple viewpoints.
Instead of relying on knowledge transfer, we propose to augment the RGB data with a third dimension by estimating 3D skeletons from 2D images with a CNN-based pose estimator. To ensure view-invariance, an alignment pre-processing step is applied, followed by data expansion for denoising. Finally, a Long Short-Term Memory (LSTM) architecture is used to model the temporal dependencies between skeletons. The proposed network is trained to recognize actions directly from aligned 3D skeletons. Experiments on the challenging Northwestern-UCLA dataset demonstrate the superiority of our approach over state-of-the-art methods.
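The abstract does not detail the alignment step. A minimal sketch, assuming one common convention: translate the sequence so that the hip centre of the first frame is at the origin, then rotate about the vertical (y) axis so that the first frame's left-to-right hip vector lies along +x. The joint indices `LEFT_HIP` and `RIGHT_HIP` and the function names are hypothetical, and the authors' actual alignment may use different joints or axes.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float, float]
Skeleton = List[Joint]

# Hypothetical joint indices; the real layout depends on the 3D pose estimator.
LEFT_HIP = 0
RIGHT_HIP = 1


def alignment_params(frame: Skeleton) -> Tuple[Joint, float]:
    """Return the hip-centre origin and the yaw angle (about the y axis)
    of the left-to-right hip vector, measured in the horizontal x-z plane."""
    lx, ly, lz = frame[LEFT_HIP]
    rx, ry, rz = frame[RIGHT_HIP]
    origin = ((lx + rx) / 2, (ly + ry) / 2, (lz + rz) / 2)
    yaw = math.atan2(rz - lz, rx - lx)
    return origin, yaw


def align_sequence(seq: List[Skeleton]) -> List[Skeleton]:
    """Translate and rotate every frame with the first frame's parameters,
    so the first frame's hip vector ends up on the +x axis."""
    if not seq:
        return []
    (ox, oy, oz), yaw = alignment_params(seq[0])
    c, s = math.cos(yaw), math.sin(yaw)
    aligned = []
    for frame in seq:
        new_frame = []
        for x, y, z in frame:
            # Move the hip centre to the origin.
            x, y, z = x - ox, y - oy, z - oz
            # Rotate by -yaw in the x-z plane; the vertical coordinate is unchanged.
            new_frame.append((c * x + s * z, y, -s * x + c * z))
        aligned.append(new_frame)
    return aligned
```

Reusing the first frame's rotation for the whole sequence keeps in-sequence turning motions intact, whereas re-aligning each frame independently would remove them; which option suits a given dataset is a design choice this sketch does not settle.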
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SIGCOM
Disciplines :
Computer science
Author, co-author :
Baptista, Renato ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Ghorbel, Enjie ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Papadopoulos, Konstantinos ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Demisse, Girum
Aouada, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Ottersten, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
External co-authors :
no
Language :
English
Title :
View-Invariant Action Recognition from RGB Data via 3D Pose Estimation
Publication date :
May 2019
Event name :
International Conference on Acoustics, Speech and Signal Processing
Event organizer :
IEEE
Event place :
Brighton, United Kingdom
Event date :
12-17 May 2019
Main work title :
IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 12–17 May 2019
Peer reviewed :
Peer reviewed
Focus Area :
Security, Reliability and Trust
FnR Project :
FNR10415355 - 3D Action Recognition Using Refinement and Invariance Strategies for Reliable Surveillance, 2015 (01/06/2016-31/05/2019) - Björn Ottersten