Action recognition using dense trajectories is a popular concept. However, many spatio-temporal characteristics of the trajectories are lost in the final video representation when a single Bag-of-Words model is used. Moreover, a significant number of the extracted trajectory features are irrelevant to the activity being analyzed, which can considerably degrade recognition performance. In this paper, we propose a human-tailored trajectory extraction scheme, in which
trajectories are clustered using information from the human pose. Two configurations are considered: first, when exact skeleton joint positions are provided, and second, when only an estimate of them is available. In both cases, the proposed method is further strengthened by the concept of local Bag-of-Words, where a specific codebook is generated for each skeleton joint group. This adds spatial human-pose awareness to the video representation, effectively increasing its discriminative power. We experimentally compare the proposed method with the standard dense trajectories approach on two challenging datasets.
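To make the local Bag-of-Words idea concrete, the sketch below illustrates one possible encoding pipeline: a separate k-means codebook is learned for each skeleton joint group, and a video is represented by concatenating the per-group histograms. The joint group names, codebook size, and descriptor dimensionality are illustrative assumptions, not the paper's actual parameters or implementation.

```python
# Minimal sketch of a local Bag-of-Words encoding with one codebook per
# skeleton joint group. All names and sizes below are illustrative
# assumptions, not the method's actual configuration.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical grouping of trajectories by the skeleton joint group they
# were assigned to (e.g. via nearest-joint assignment of trajectory points).
JOINT_GROUPS = ["head", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]
CODEBOOK_SIZE = 64    # assumed number of visual words per group
DESCRIPTOR_DIM = 96   # assumed trajectory descriptor length (e.g. HOG/HOF-style)

rng = np.random.default_rng(0)

def train_local_codebooks(train_features):
    """Fit one k-means codebook per joint group.

    train_features: dict mapping group name -> (n_trajectories, DESCRIPTOR_DIM) array
    """
    return {
        group: KMeans(n_clusters=CODEBOOK_SIZE, n_init=4, random_state=0).fit(feats)
        for group, feats in train_features.items()
    }

def encode_video(video_features, codebooks):
    """Encode a video as the concatenation of per-group BoW histograms."""
    histograms = []
    for group in JOINT_GROUPS:
        feats = video_features.get(group)
        hist = np.zeros(CODEBOOK_SIZE)
        if feats is not None and len(feats) > 0:
            words = codebooks[group].predict(feats)
            hist = np.bincount(words, minlength=CODEBOOK_SIZE).astype(float)
            hist /= hist.sum()  # L1-normalise each local histogram
        histograms.append(hist)
    return np.concatenate(histograms)  # length = len(JOINT_GROUPS) * CODEBOOK_SIZE

# Toy usage with random stand-in trajectory descriptors.
train = {g: rng.normal(size=(500, DESCRIPTOR_DIM)) for g in JOINT_GROUPS}
codebooks = train_local_codebooks(train)
video = {g: rng.normal(size=(40, DESCRIPTOR_DIM)) for g in JOINT_GROUPS}
descriptor = encode_video(video, codebooks)
print(descriptor.shape)  # (384,) with the assumed sizes above
```

The per-group histograms keep trajectory statistics spatially tied to body parts, which is what gives the final representation its added discriminative power compared to a single global codebook.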