Keywords :
View-invariant; human action recognition; monocular camera; pose estimation
Abstract :
[en] View-invariant action recognition using a single RGB camera is a very challenging problem due to the lack of 3D information in RGB images. However, recent advances in deep learning have made it possible to estimate a 3D skeleton from a single RGB image.
Taking advantage of this progress, we propose a simple framework for fast and view-invariant action recognition using a single RGB camera. The proposed pipeline is the combination of two key steps. The first step estimates a 3D skeleton from a single RGB image using a CNN-based pose estimator such as VNect. The second computes view-invariant skeleton-based features from the estimated 3D skeletons. Experiments are conducted on two well-known benchmarks, namely the IXMAS and Northwestern-UCLA datasets. The obtained results confirm the validity of our concept and suggest a new way of addressing the challenge of RGB-based view-invariant action recognition.
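The two-step pipeline described in the abstract can be illustrated with a minimal Python sketch: a 3D skeleton is estimated per frame by a monocular pose estimator, then mapped into a body-centred frame so that the resulting features no longer depend on the camera viewpoint. The function estimate_3d_skeleton, the joint indices, and the specific normalization below are illustrative assumptions only, not the authors' exact feature construction nor the VNect interface.

```python
import numpy as np


def estimate_3d_skeleton(frame):
    """Placeholder for a CNN-based monocular 3D pose estimator such as VNect.

    A real estimator returns one (J, 3) array of 3D joint positions per RGB
    frame; it is deliberately left unimplemented here.
    """
    raise NotImplementedError("plug in a monocular 3D pose estimator here")


def view_normalize(skeleton, hip=0, neck=1, l_shoulder=5, r_shoulder=2):
    """Express a (J, 3) skeleton in a body-centred frame.

    Translating the hip to the origin, rescaling by the spine length, and
    rotating onto body axes removes the dependence on camera position,
    orientation and subject size. The joint indices depend on the
    estimator's skeleton layout and are illustrative only.
    """
    s = skeleton - skeleton[hip]                 # hip at the origin
    s = s / np.linalg.norm(s[neck])              # spine length set to 1
    x = s[r_shoulder] - s[l_shoulder]            # left-right body axis
    x = x / np.linalg.norm(x)
    z = np.cross(x, s[neck])                     # forward axis
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                           # upward axis
    R = np.stack([x, y, z])                      # rows: canonical body axes
    return s @ R.T                               # coordinates in the body frame


def sequence_descriptor(frames):
    """Stack per-frame view-normalized skeletons into one action descriptor."""
    skeletons = [estimate_3d_skeleton(f) for f in frames]
    return np.concatenate([view_normalize(sk).ravel() for sk in skeletons])
```

In practice, the per-frame features would additionally be temporally aligned or resampled to a fixed length before being fed to a classifier (e.g., a linear SVM); this sketch only shows where the view invariance comes from.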
Disciplines :
Computer science
Author, co-author :
GHORBEL, Enjie ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
PAPADOPOULOS, Konstantinos ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
BAPTISTA, Renato ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Pathak, Himadri
Demisse, Girum
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
OTTERSTEN, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors :
no
Language :
English
Title :
A View-invariant Framework for Fast Skeleton-based Action Recognition Using a Single RGB Camera
Publication date :
February 2019
Event name :
14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
Event date :
from 25-02-2019 to 27-02-2019
Audience :
International
Main work title :
14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Prague, 25-27 February 2019
Peer reviewed :
Peer reviewed
Focus Area :
Computational Sciences
FnR Project :
FNR10415355 - 3D Action Recognition Using Refinement And Invariance Strategies For Reliable Surveillance, 2015 (01/06/2016-31/05/2019) - Björn Ottersten
References :
Aggarwal, J. K. and Xia, L. (2014). Human activity recognition from 3d data: A review. Pattern Recognition Letters, 48:70–80.
Andriluka, M., Pishchulin, L., Gehler, P., and Schiele, B. (2014). 2d human pose estimation: New benchmark and state of the art analysis. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE.
Baptista, R., Antunes, M., Aouada, D., and Ottersten, B. (2018). Anticipating suspicious actions using a small dataset of action templates. In 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP).
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., and Black, M. J. (2016). Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In Computer Vision – ECCV 2016, Lecture Notes in Computer Science. Springer International Publishing.
Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR05). IEEE.
Dalal, N., Triggs, B., and Schmid, C. (2006). Human detection using oriented histograms of flow and appearance. In Proceedings of the 9th European Conference on Computer Vision - Volume Part II, ECCV’06, pages 428–441, Berlin, Heidelberg. Springer-Verlag.
Evangelidis, G., Singh, G., and Horaud, R. (2014). Skeletal quads: Human action recognition using joint quadruples. In Pattern Recognition (ICPR), 2014 22nd International Conference on, pages 4513–4518. IEEE.
Fernando, B., Gavves, E., Oramas, J. M., Ghodrati, A., and Tuytelaars, T. (2015). Modeling video evolution for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5378–5387.
Ghorbel, E., Boutteau, R., Bonnaert, J., Savatier, X., and Lecoeuche, S. (2016). A fast and accurate motion descriptor for human action recognition applications. In Pattern Recognition (ICPR), 2016 23rd International Conference on, pages 919–924. IEEE.
Ghorbel, E., Boutteau, R., Boonaert, J., Savatier, X., and Lecoeuche, S. (2018). Kinematic spline curves: A temporal invariant descriptor for fast action recognition. Image and Vision Computing, 77:60–71.
Gupta, A., Martinez, J., Little, J. J., and Woodham, R. J. (2014). 3d pose from motion for cross-view action recognition via non-linear circulant temporal encoding. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE.
Haque, A., Peng, B., Luo, Z., Alahi, A., Yeung, S., and Fei-Fei, L. (2016). Towards viewpoint invariant 3d human pose estimation. In European Conference on Computer Vision, pages 160–177. Springer.
Hsu, Y.-P., Liu, C., Chen, T.-Y., and Fu, L.-C. (2016). Online view-invariant human action recognition using rgb-d spatio-temporal matrix. Pattern Recognition, 60:215–226.
Ionescu, C., Papava, D., Olaru, V., and Sminchisescu, C. (2014). Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1325–1339.
Lea, C., Hager, G. D., and Vidal, R. (2015). An improved model for segmentation and recognition of fine-grained activities with application to surgical training tasks. In Applications of computer vision (WACV), 2015 IEEE winter conference on, pages 1123–1129. IEEE.
Li, B., Camps, O. I., and Sznaier, M. (2012). Cross-view activity recognition using hankelets. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1362–1369. IEEE.
Li, R. and Zickler, T. (2012). Discriminative virtual views for cross-view action recognition. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2855–2862. IEEE.
Lv, F. and Nevatia, R. (2007). Single view human action recognition using key pose matching and viterbi path searching. In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, pages 1–8. IEEE.
Mehta, D., Rhodin, H., Casas, D., Sotnychenko, O., Xu, W., and Theobalt, C. (2016). Monocular 3d human pose estimation using transfer learning and improved CNN supervision. CoRR, abs/1611.09813.
Mehta, D., Sridhar, S., Sotnychenko, O., Rhodin, H., Shafiei, M., Seidel, H.-P., Xu, W., Casas, D., and Theobalt, C. (2017). Vnect: Real-time 3d human pose estimation with a single rgb camera. ACM Transactions on Graphics, 36(4).
Papadopoulos, K., Antunes, M., Aouada, D., and Ottersten, B. (2017). Enhanced trajectory-based action recognition using human pose. In Image Processing (ICIP), 2017 IEEE International Conference on, pages 1807–1811. IEEE.
Parameswaran, V. and Chellappa, R. (2006). View invariance for human action recognition. International Journal of Computer Vision, 66(1):83–101.
Pavlakos, G., Zhou, X., Derpanis, K. G., and Daniilidis, K. (2017). Coarse-to-fine volumetric prediction for single-image 3d human pose. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 1263–1272. IEEE.
Poppe, R. (2010). A survey on vision-based human action recognition. Image and Vision Computing, 28(6):976–990.
Lo Presti, L. and La Cascia, M. (2016). 3d skeleton-based human action classification: A survey. Pattern Recognition, 53:130–147.
Rahmani, H., Mahmood, A., Huynh, D., and Mian, A. (2016). Histogram of oriented principal components for cross-view action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(12):2430–2443.
Rahmani, H., Mahmood, A., Huynh, D. Q., and Mian, A. (2014). Hopc: Histogram of oriented principal components of 3d pointclouds for action recognition. In European conference on computer vision, pages 742–757. Springer.
Rahmani, H. and Mian, A. (2015). Learning a non-linear knowledge transfer model for cross-view action recognition. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
Rao, C., Yilmaz, A., and Shah, M. (2002). View-invariant representation and recognition of actions. International Journal of Computer Vision, 50(2):203–226.
Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M., and Moore, R. (2013). Real-time human pose recognition in parts from single depth images. Communications of the ACM, 56(1):116–124.
Song, Y., Demirdjian, D., and Davis, R. (2012). Continuous body and hand gesture recognition for natural human-computer interaction. ACM Transactions on Interactive Intelligent Systems (TiiS), 2(1):5.
Tekin, B., Rozantsev, A., Lepetit, V., and Fua, P. (2016). Direct prediction of 3d body poses from motion compensated sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 991–1000.
Trong, N. P., Minh, A. T., Nguyen, H., Kazunori, K., and Hoai, B. L. (2017). A survey about view-invariant human action recognition. In 2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE). IEEE.
Vemulapalli, R., Arrate, F., and Chellappa, R. (2014). Human action recognition by representing 3d skeletons as points in a lie group. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE.
Wang, H., Kläser, A., Schmid, C., and Liu, C.-L. (2011). Action recognition by dense trajectories. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3169–3176. IEEE.
Wang, H. and Schmid, C. (2013). Action recognition with improved trajectories. In Proceedings of the IEEE international conference on computer vision, pages 3551–3558.
Wang, J., Nie, X., Xia, Y., Wu, Y., and Zhu, S.-C. (2014). Cross-view action modeling, learning and recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2649–2656.
Weinland, D., Boyer, E., and Ronfard, R. (2007). Action recognition from arbitrary views using 3d exemplars. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–7. IEEE.
Weinland, D., Ronfard, R., and Boyer, E. (2006). Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 104(2-3):249–257.
Xia, L., Chen, C.-C., and Aggarwal, J. K. (2012). View invariant human action recognition using histograms of 3d joints. In Computer vision and pattern recognition workshops (CVPRW), 2012 IEEE computer society conference on, pages 20–27. IEEE.
Yang, W., Ouyang, W., Wang, X., Ren, J., Li, H., and Wang, X. (2018). 3d human pose estimation in the wild by adversarial learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1.
Yang, X. and Tian, Y. L. (2012). Eigenjoints-based action recognition using naive-bayes-nearest-neighbor. In Computer vision and pattern recognition workshops (CVPRW), 2012 IEEE computer society conference on, pages 14–19. IEEE.
Zhang, Z., Wang, C., Xiao, B., Zhou, W., Liu, S., and Shi, C. (2013). Cross-view action recognition via a continuous virtual path. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2690–2697.