S. Abriola, P. Barceló, D. Figueira, and S. Figueira. Bisimulations on data graphs. Journal of Artificial Intelligence Research, 61:171-213, 2018.
S. Bagheri, J. Y. Zheng, and S. Sinha. Temporal mapping of surveillance video for indexing and summarization. Computer Vision and Image Understanding, 144:237-257, 2016.
K. Bougiatiotis and T. Giannakopoulos. Enhanced movie content similarity based on textual, auditory and visual information. Expert Systems with Applications, 96:86-102, 2018.
H. Bredin and G. Gelly. Improving speaker diarization of tv series using talking-face detection and clustering. In Proceedings of the 2016 ACM on Multimedia Conference, pages 157-161. ACM, 2016.
Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. Vggface2: A dataset for recognising faces across pose and age. In International Conference on Automatic Face and Gesture Recognition, 2018.
D. Cazzato, M. Leo, and C. Distante. A complete framework for fully-automatic people indexing in generic videos. In 2014 International Conference on Computer Vision Theory and Applications (VISAPP), volume 2, pages 248-255. IEEE, 2014.
J. Y. Choi, K. N. Plataniotis, and Y. M. Ro. Face annotation for online personal videos using color feature fusion based face recognition. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 1190-1195. IEEE, 2010.
F. Chowdhury, Q. Wang, I. L. Moreno, and L. Wan. Attention-based models for text-dependent speaker verification. arXiv preprint arXiv:1710.10470, 2017.
E. Corvee, F. Bremond, M. Thonnat, et al. Person reidentification using haar-based and dcd-based signature. In 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 1-8. IEEE, 2010.
Z. Dong, C. Jing, M. Pei, and Y. Jia. Deep cnn based binary hash video representations for face retrieval. Pattern Recognition, 81:357-369, 2018.
M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani. Person re-identification by symmetry-driven accumulation of local features. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2360-2367. IEEE, 2010.
G. Friedland, H. Hung, and C. Yeo. Multi-modal speaker diarization of real-world meetings using compressed-domain video features. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4069-4072. IEEE, 2009.
V. Gandhi and R. Ronfard. Detecting and naming actors in movies using generative appearance models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3706-3713, 2013.
I. D. Gebru, S. Ba, X. Li, and R. Horaud. Audio-visual speaker diarization based on spatiotemporal Bayesian fusion. arXiv preprint arXiv:1603.09725, 2016.
I. D. Gebru, S. Ba, X. Li, and R. Horaud. Audio-visual speaker diarization based on spatiotemporal Bayesian fusion. IEEE transactions on pattern analysis and machine intelligence, 40(5):1086-1099, 2018.
N. Gheissari, T. B. Sebastian, and R. Hartley. Person reidentification using spatiotemporal appearance. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, pages 1528-1535. IEEE, 2006.
I. U. Haq, K. Muhammad, A. Ullah, and S. W. Baik. Deepstar: Detecting starring characters in movies. IEEE Access, 7:9265-9272, 2019.
J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. CoRR, abs/1709.01507, 2017.
T. Huang and S. Russell. Object identification in a Bayesian context. In IJCAI, volume 97, pages 1276-1282, 1997.
K. Kim, Z. Yang, I. Masi, R. Nevatia, and G. Medioni. Face and body association for video-based face recognition. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 39-48. IEEE, 2018.
P. Kulkarni, B. Patil, and B. Joglekar. An effective content based video analysis and retrieval using pattern indexing techniques. In Industrial Instrumentation and Control (ICIC), 2015 International Conference on, pages 87-92. IEEE, 2015.
M. Leo, P. Carcagnì, C. Distante, P. Spagnolo, P. Mazzeo, A. Rosato, S. Petrocchi, C. Pellegrino, A. Levante, F. De Lumè, et al. Computational assessment of facial expression production in asd children. Sensors, 18(11):3993, 2018.
P. Li, J. Xie, Z. Li, T. Liu, and W. Yan. Facial peculiarity retrieval via deep neural networks fusion. International Journal of Computational Intelligence Systems, 11(1):58-65, 2018.
X. Liu, J. Geng, H. Ling, and Y.-m. Cheung. Attention guided deep audio-face fusion for efficient speaker naming. Pattern Recognition, 88:557-568, 2019.
N. E. Maliki, H. Silkan, and M. E. Maghri. Efficient indexing and similarity search using the geometric near-neighbor access tree (gnat) for face-images data. Procedia Computer Science, 148:600-609, 2019. The Second International Conference on Intelligent Computing in Data Sciences, ICDS2018.
T. D. Ngo, H. T. Vu, D.-D. Le, and S. Satoh. Face retrieval in large-scale news video datasets. IEICE Transactions on Information and Systems, 96(8):1811-1825, 2013.
S. Pini, M. Cornia, F. Bolelli, L. Baraldi, and R. Cucchiara. M-vad names: a dataset for video captioning with naming. Multimedia Tools and Applications, Dec 2018.
J. Prinosil. Blind face indexing in video. In Telecommunications and Signal Processing (TSP), 2011 34th International Conference on, pages 575-578. IEEE, 2011.
M. Ravinder and T. Venugopal. Content-based video indexing and retrieval using key frames texture, edge and motion features. 2016.
E. Sánchez-Nielsen, F. Chávez-Gutiérrez, J. Lorenzo-Navarro, and M. Castrillón-Santana. A multimedia system to produce and deliver video fragments on demand on parliamentary websites. Multimedia Tools and Applications, 76(5):6281-6307, Mar 2017.
N. Sarafianos, T. Giannakopoulos, and S. Petridis. Audio-visual speaker diarization using fisher linear semidiscriminant analysis. Multimedia Tools and Applications, 75(1):115-130, 2016.
D. Simonnet, M. Lewandowski, S. A. Velastin, J. Orwell, and E. Turkbeyler. Re-identification of pedestrians in crowds using dynamic time warping. In European Conference on Computer Vision, pages 423-432. Springer, 2012.
J. Tang, Z. Li, and X. Zhu. Supervised deep hashing for scalable face image retrieval. Pattern Recognition, 75:25-32, 2018.
T. Wang, S. Gong, X. Zhu, and S. Wang. Person reidentification by video ranking. In European Conference on Computer Vision, pages 688-703. Springer, 2014.
J. You, A. Wu, X. Li, and W.-S. Zheng. Top-push video-based person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1345-1353, 2016.
K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499-1503, 2016.
L. Zheng, Z. Bie, Y. Sun, J. Wang, C. Su, S. Wang, and Q. Tian. Mars: A video benchmark for large-scale person re-identification. In European Conference on Computer Vision, pages 868-884. Springer, 2016.
L. Zheng, Y. Yang, and A. G. Hauptmann. Person reidentification: Past, present and future. arXiv preprint arXiv:1610.02984, 2016.