Action recognition using dense trajectories is a popular concept. However, many spatio-temporal characteristics of the trajectories are lost in the final video representation when using a single Bag-of-Words model. Also, a significant share of the extracted trajectory features is irrelevant to the activity being analyzed, which can considerably degrade recognition performance. In this paper, we propose a human-tailored trajectory extraction scheme, in which trajectories are clustered using information from the human pose. Two configurations are considered: first, when exact skeleton joint positions are provided, and second, when only an estimate thereof is available. In both cases, the proposed method is further strengthened by using the concept of local Bag-of-Words, where a specific codebook is generated for each skeleton joint group. This has the advantage of adding spatial human pose awareness to the video representation, effectively increasing its discriminative power. We experimentally compare the proposed method with the standard dense trajectories approach on two challenging datasets.
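The local Bag-of-Words idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how trajectories are assigned to joint groups or how the codebooks are learned, so the nearest-joint assignment rule, the plain k-means codebook, and all function names below (`assign_to_joint_groups`, `kmeans`, `local_bow`) are assumptions made purely for illustration, not the authors' implementation. The data is synthetic.

```python
import numpy as np

def assign_to_joint_groups(traj_positions, joint_positions, joint_to_group):
    """Assumed rule: a trajectory belongs to the group of its nearest joint.

    traj_positions:  (T, 2) mean image position of each trajectory
    joint_positions: (J, 2) skeleton joints (exact or estimated)
    joint_to_group:  (J,)   group index of each joint
    """
    diffs = traj_positions[:, None, :] - joint_positions[None, :, :]
    nearest_joint = np.argmin(np.linalg.norm(diffs, axis=2), axis=1)
    return joint_to_group[nearest_joint]

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in codebook learner."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return centers

def local_bow(descriptors, groups, codebooks):
    """Concatenate one L1-normalised histogram per joint group."""
    parts = []
    for g, codebook in enumerate(codebooks):
        hist = np.zeros(len(codebook))
        d = descriptors[groups == g]
        if len(d):
            dists = np.linalg.norm(d[:, None, :] - codebook[None, :, :], axis=2)
            words = np.argmin(dists, axis=1)
            hist = np.bincount(words, minlength=len(codebook)).astype(float)
            hist /= hist.sum()
        parts.append(hist)
    return np.concatenate(parts)

# Synthetic example: 4 joints in 2 groups, 60 trajectories, 8-D descriptors.
rng = np.random.default_rng(0)
joints = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
joint_to_group = np.array([0, 0, 1, 1])
traj_pos = np.vstack([
    rng.normal([0.0, 0.5], 0.3, (30, 2)),
    rng.normal([10.0, 0.5], 0.3, (30, 2)),
])
desc = rng.normal(size=(60, 8))

groups = assign_to_joint_groups(traj_pos, joints, joint_to_group)
codebooks = [kmeans(desc[groups == g], k=4) for g in range(2)]
video_repr = local_bow(desc, groups, codebooks)
print(video_repr.shape)
```

Compared with a single global codebook, the concatenated per-group histograms record which body region each visual word came from, which is the spatial pose awareness the abstract refers to.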
Research centre:
Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Disciplines:
Computer science
Author, co-author:
PAPADOPOULOS, Konstantinos ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
OTTERSTEN, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors:
no
Document language:
English
Title:
Enhanced Trajectory-based Action Recognition using Human Pose
Publication date:
2017
Event name:
2017 IEEE International Conference on Image Processing
Event location:
Beijing, China
Event date:
September 17-20, 2017
Event scope:
International
Title of the main work:
IEEE International Conference on Image Processing, Beijing 17-20 September 2017
Peer reviewed :
Peer reviewed
FNR project:
FNR10415355 - 3d Action Recognition Using Refinement And Invariance Strategies For Reliable Surveillance, 2015 (01/06/2016-31/05/2019) - Bjorn Ottersten