Temporal 3D Human Pose Estimation for Action Recognition from Arbitrary Viewpoints

Adel Musallam, Mohamed; BAPTISTA, Renato; AL ISMAEIL, Kassem; AOUADA, Djamila

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2019 • In 6th Annual Conf. on Computational Science & Computational Intelligence, Las Vegas 5-7 December 2019

Peer reviewed

Permalink
https://hdl.handle.net/10993/41079

Details

Keywords :

View-Invariant; Human Action Recognition; Human Pose Estimation

Abstract :

This work presents a new view-invariant action recognition system that classifies human actions using a single RGB camera, including from challenging camera viewpoints. Understanding actions from different viewpoints remains an extremely challenging problem because of depth ambiguities, occlusion, and the large variety of appearances and scenes. Moreover, relying only on 2D information yields different interpretations of the same action when it is seen from different viewpoints. Our system operates in two consecutive stages. The first stage estimates the 2D human pose using a convolutional neural network. The second stage lifts the 2D human poses to 3D human poses using a temporal convolutional neural network that enforces temporal coherence over the estimated 3D poses. The estimated 3D poses from different viewpoints are then aligned to the same camera reference frame. Finally, we propose a temporal convolutional network-based classifier for cross-view action recognition. Our results show that we can achieve state-of-the-art view-invariant action recognition accuracy, even for challenging viewpoints, using only RGB videos and without pre-training on synthetic or motion capture data.
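
The alignment step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the alignment is computed, so the code below assumes a standard rigid (Kabsch) fit between two 3D pose sequences of the same action; the function names `estimate_rigid_transform` and `align_sequence`, the 17-joint skeleton, and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Return (R, t) such that dst is approximately src @ R.T + t.

    src, dst: arrays of shape (N, 3) holding corresponding 3D points.
    Uses the Kabsch method (an assumption, not the paper's stated method).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)

    # Cross-covariance of the centred point sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - src_mean @ R.T
    return R, t


def align_sequence(poses, reference):
    """Align a (T, J, 3) pose sequence to a reference sequence of equal shape.

    A single rigid transform is fitted over all frames and joints, then
    applied to every frame.
    """
    poses = np.asarray(poses, dtype=float)
    reference = np.asarray(reference, dtype=float)
    R, t = estimate_rigid_transform(poses.reshape(-1, 3), reference.reshape(-1, 3))
    return poses @ R.T + t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(5, 17, 3))  # hypothetical 5 frames, 17 joints

    # Simulate the same motion seen from another viewpoint: rotate about
    # the vertical axis and shift the camera origin.
    theta = 0.7
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    other_view = reference @ rot_y.T + np.array([0.5, -1.0, 2.0])

    aligned = align_sequence(other_view, reference)
    print("max alignment error:", np.abs(aligned - reference).max())
```

Fitting one transform over the whole sequence, rather than one per frame, keeps the motion itself untouched and only removes the viewpoint difference; with noisy estimated poses the fit would be approximate rather than exact.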

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SIGCOM

Disciplines :

Computer science

Author, co-author :

Adel Musallam, Mohamed

BAPTISTA, Renato ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

AL ISMAEIL, Kassem ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

Language :

English

Title :

Temporal 3D Human Pose Estimation for Action Recognition from Arbitrary Viewpoints

Publication date :

December 2019

Event name :

6th Annual Conf. on Computational Science & Computational Intelligence

Event organizer :

https://americancse.org/events/csci2019

Event date :

5-7 December 2019

Audience :

International

Main work title :

6th Annual Conf. on Computational Science & Computational Intelligence, Las Vegas 5-7 December 2019

Publisher :

Conference Publishing Services

Peer reviewed :

Peer reviewed

Focus Area :

Computational Sciences

European Projects :

H2020 - 689947 - STARR - Decision SupporT and self-mAnagement system for stRoke survivoRs

FnR Project :

FNR10415355 - 3d Action Recognition Using Refinement And Invariance Strategies For Reliable Surveillance, 2015 (01/06/2016-31/05/2019) - Bjorn Ottersten

Funders :

CE - Commission Européenne [BE]

Available on ORBilu :

since 30 November 2019
