VIEW-INVARIANT ACTION RECOGNITION FROM RGB DATA VIA 3D POSE ESTIMATION

BAPTISTA, Renato; GHORBEL, Enjie; PAPADOPOULOS, Konstantinos; Demisse, Girum; AOUADA, Djamila; OTTERSTEN, Björn

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2019 • In IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 12–17 May 2019

Peer reviewed

Permalink
https://hdl.handle.net/10993/39033

Files

Full Text

ICASSP_Baptista_toappear.pdf

Author postprint (466.37 kB)

Details

Keywords :

Pose Estimation; View-Invariance; LSTM

Abstract :

In this paper, we propose a novel view-invariant action recognition method using a single monocular RGB camera. View-invariance remains a very challenging topic in 2D action recognition due to the lack of 3D information in RGB images. Most successful approaches rely on knowledge transfer, projecting 3D synthetic data to multiple viewpoints. Instead, we propose to augment the RGB data with a third dimension by estimating 3D skeletons from 2D images with a CNN-based pose estimator. To ensure view-invariance, the skeletons are aligned in a pre-processing step, followed by data expansion as a means of denoising. Finally, a Long Short-Term Memory (LSTM) architecture is used to model the temporal dependency between skeletons. The proposed network is trained to recognize actions directly from aligned 3D skeletons. Experiments on the challenging Northwestern-UCLA dataset show that our approach outperforms state-of-the-art methods.
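The alignment step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the alignment procedure, so everything below is an assumption rather than the authors' method: the first frame's root joint is moved to the origin, and the whole sequence is rotated about the vertical axis so that the first frame's hip line points along +x. The joint indices `ROOT`, `LEFT_HIP`, `RIGHT_HIP`, the choice of y as the vertical axis, and the function name `align_sequence` are hypothetical.

```python
import math

# Hypothetical joint indices; real pose estimators define their own layout.
ROOT, LEFT_HIP, RIGHT_HIP = 0, 1, 2


def align_sequence(seq):
    """Align a skeleton sequence (frames x joints x [x, y, z]) to a canonical view.

    Assumption: y is the vertical axis. The first frame defines one rigid
    transform, which is applied to every frame so that motion is preserved.
    """
    ox, oy, oz = seq[0][ROOT]
    # Hip direction in the first frame, projected onto the horizontal xz-plane.
    hx = seq[0][RIGHT_HIP][0] - seq[0][LEFT_HIP][0]
    hz = seq[0][RIGHT_HIP][2] - seq[0][LEFT_HIP][2]
    r = math.hypot(hx, hz)
    # Cosine and sine of the rotation about y that maps the hip line onto +x.
    # If the hip line has no horizontal extent, skip the rotation.
    c, s = (hx / r, hz / r) if r > 1e-12 else (1.0, 0.0)

    aligned = []
    for frame in seq:
        new_frame = []
        for x, y, z in frame:
            # Translate so the first-frame root sits at the origin.
            x, y, z = x - ox, y - oy, z - oz
            # Rotate about the vertical (y) axis.
            new_frame.append([c * x + s * z, y, -s * x + c * z])
        aligned.append(new_frame)
    return aligned
```

Under these assumptions, two recordings of the same motion that differ only by a rotation about the vertical axis and a translation map to the same aligned sequence, which is the property a downstream sequence model would rely on.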

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SIGCOM

Disciplines :

Computer science

Author, co-author :

BAPTISTA, Renato ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

GHORBEL, Enjie ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

PAPADOPOULOS, Konstantinos ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

Demisse, Girum

AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

OTTERSTEN, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

Language :

English

Title :

VIEW-INVARIANT ACTION RECOGNITION FROM RGB DATA VIA 3D POSE ESTIMATION

Publication date :

May 2019

Event name :

International Conference on Acoustics, Speech and Signal Processing

Event organizer :

IEEE

Event place :

Brighton, United Kingdom

Event date :

12–17 May 2019

Main work title :

IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 12–17 May 2019

Peer reviewed :

Peer reviewed

Focus Area :

Security, Reliability and Trust

FnR Project :

FNR10415355 - 3D Action Recognition Using Refinement And Invariance Strategies For Reliable Surveillance, 2015 (01/06/2016-31/05/2019) - Björn Ottersten

Funders :

FNR - Fonds National de la Recherche

Available on ORBilu :

since 14 March 2019

