Time series classification; multiscale visibility graph; time series mining
Résumé :
[en] This paper presents a multiscale visibility graph representation for time series as well as feature extraction methods for time series classification (TSC). Unlike traditional TSC approaches that seek to find global similarities in time series databases (eg., Nearest Neighbor with Dynamic Time Warping distance) or methods specializing in locating local patterns/subsequences (eg., shapelets), we extract solely statistical features from graphs that are generated from time series. Specifically, we augment time series by means of their multiscale approximations, which are further transformed into a set of visibility graphs. After extracting probability distributions of small motifs, density, assortativity, etc., these features are used for building highly accurate classification models using generic classifiers (eg., Support Vector Machine and eXtreme Gradient Boosting). Thanks to the way how we transform time series into graphs and extract features from them, we are able to capture both global and local features from time series. Based on extensive experiments on a large number of open datasets and comparison with five state-of-the-art TSC algorithms, our approach is shown to be both accurate and efficient: it is more accurate than Learning Shapelets and at the same time faster than Fast Shapelets.
Disciplines :
Sciences informatiques
Auteur, co-auteur :
LI, Daoyuan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Computer Science and Communications Research Unit (CSC)
LE TRAON, Yves ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Co-auteurs externes :
yes
Langue du document :
Anglais
Titre :
Extracting Statistical Graph Features for Accurate and Efficient Time Series Classification
Date de publication/diffusion :
mars 2018
Nom de la manifestation :
21st International Conference on Extending Database Technology
Lieu de la manifestation :
Vienna, Autriche
Date de la manifestation :
from 26-03-2018 to 29-03-2018
Manifestation à portée :
International
Titre de l'ouvrage principal :
21st International Conference on Extending Database Technology
Peyman Afshani, Mark de Berg, Henri Casanova, Benjamin Karsin, Colin Lam-brechts, Nodari Sitchinava, and Constantinos Tsirogiannis. 2017. An Efficient Algorithm for the 1D Total Visibility-Index Problem. In 2017 Proceedings of the Ninteenth Workshop on Algorithm Engineering and Experiments (ALENEX). SIAM, 218-231.
Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, and Nick Duffield. 2015. Efficient Graphlet Counting for Large Networks. In ICDM. 1-10.
Anthony Bagnall, Jason Lines, Aaron Bostrom, James Large, and Eamonn Keogh. 2017. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery 31, 3 (2017), 606-660.
Anthony Bagnall, Jason Lines, Jon Hills, and Aaron Bostrom. 2015. Time-series classification with COTE: the collective of transformation-based ensembles. IEEE Transactions on Knowledge and Data Engineering 27, 9 (2015), 2522-2535.
Vladimir Batagelj and Matjaz Zaversnik. 2003. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003).
Gustavo EAPA Batista, Xiaoyue Wang, and Eamonn Keogh. 2011. A Complexity-Invariant Distance Measure for Time Series.. In SDM, Vol. 11. 699-710.
Donald J Berndt and James Clifford. 1994. Using Dynamic Time Warping to Find Patterns in Time Series.. In KDD workshop, Vol. 10. 359-370.
Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 785-794.
Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, and Gustavo Batista. 2015. The UCR Time Series Classification Archive. (July 2015). www.cs.ucr.edu/~eamonn/time_series_data/.
Albert Cohen, Ingrid Daubechies, and Pierre Vial. 1993. Wavelets on the interval and fast wavelet transforms. Applied and computational harmonic analysis 1, 1 (1993), 54-81.
Luciano da F Costa, Francisco A Rodrigues, Gonzalo Travieso, and Paulino Ribeiro Villas Boas. 2007. Characterization of complex networks: A survey of measurements. Advances in physics 56, 1 (2007), 167-242.
Olive Jean Dunn. 1964. Multiple comparisons using rank sums. Technometrics 6, 3 (1964), 241-252.
Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do we need hundreds of classifiers to solve real world classification problems. J. Mach. Learn. Res 15, 1 (2014), 3133-3181.
Thomas Gärtner, Peter Flach, and Stefan Wrobel. 2003. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines (2003), 129-143.
Josif Grabocka, Nicolas Schilling, Martin Wistuba, and Lars Schmidt-Thieme. 2014. Learning time-series shapelets. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 392-401.
Jon Hills, Jason Lines, Edgaras Baranauskas, James Mapp, and Anthony Bagnall. 2014. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery 28, 4 (2014), 851-881.
Bing Hu, Yanping Chen, and Eamonn Keogh. 2013. Time series classification under more realistic assumptions. In Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM, 578-586.
Jacopo Iacovacci and Lucas Lacasa. 2016. Sequential motif profile of natural visibility graphs. Physical Review E 94, 5 (2016), 052309.
Eamonn Keogh. 1997. Fast similarity search in the presence of longitudinal scaling in time series databases. In Tools with Artificial Intelligence, 1997. Proceedings., Ninth IEEE International Conference on. IEEE, 578-584.
Eamonn Keogh and Michael Pazzani. 2000. Scaling up dynamic time warping for datamining applications. In Proceedings of the 6th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 285-289.
Risi Kondor and Horace Pan. 2016. The multiscale Laplacian graph kernel. In Advances in Neural Information Processing Systems. 2990-2998.
Max Kuhn and Kjell Johnson. 2013. Applied predictive modeling. Vol. 810. Springer.
Lucas Lacasa, Bartolo Luque, Fernando Ballesteros, Jordi Luque, and Juan Carlos Nuno. 2008. From time series to complex networks: The visibility graph. Proceedings of the National Academy of Sciences 105, 13 (2008), 4972-4975.
Lucas Lacasa, Bartolo Luque, Jordi Luque, and Juan Carlos Nuno. 2009. The visibility graph: A new method for estimating the Hurst exponent of fractional Brownian motion. EPL (Europhysics Letters) 86, 3 (2009), 30001.
Daoyuan Li, Tegawendé F. Bissyandé, Jacques Klein, and Yves Le Traon. 2016. DSCo-NG: A Practical Language Modeling Approach for Time Series Classification. In The 15th International Symposium on Intelligent Data Analysis.
Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. 2007. Experiencing SAX: a novel symbolic representation of time series. Data Mining and knowledge discovery 15, 2 (2007), 107-144.
Jessica Lin, Rohan Khade, and Yuan Li. 2012. Rotation-invariant similarity in time series using bag-of-patterns representation. Journal of Intelligent Information Systems 39, 2 (2012), 287-315.
Bartolo Luque, Lucas Lacasa, Fernando Ballesteros, and Jordi Luque. 2009. Horizontal visibility graphs: Exact results for random time series. Physical Review E 80, 4 (2009), 046103.
Mark EJ Newman and Michelle Girvan. 2003. Mixing patterns and community structure in networks. Statistical mechanics of complex networks (2003), 66-87.
Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. 2012. Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 262-270.
Thanawin Rakthanmanon and Eamonn Keogh. 2013. Fast shapelets: A scalable algorithm for discovering time series shapelets. In Proceedings of the thirteenth SIAM conference on data mining.
Karl Johan Åström. 1969. On the choice of sampling rates in parametric identification of time series. Information Sciences 1, 3 (1969), 273-278.
Chotirat Ann Ratanamahatana and Eamonn Keogh. 2005. Three myths about dynamic time warping data mining. In Proceedings of SIAM International Conference on Data Mining. 506-510.
Pedro Ribeiro, Fernando Silva, and Luis Lopes. 2010. Efficient parallel subgraph counting using g-tries. In Cluster Computing (CLUSTER), 2010 IEEE International Conference on. IEEE, 217-226.
Patrick Schäfer. 2015. The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery 29, 6 (2015), 1505-1530.
Pavel Senin and Sergey Malinchik. 2013. SAX-VSM: Interpretable time series classification using SAX and vector space model. In IEEE 13th International Conference on Data Mining. IEEE, 1175-1180.
Joseph Sill, Gábor Takács, Lester Mackey, and David Lin. 2009. Feature-weighted linear stacking. arXiv preprint arXiv:0911.0460 (2009).
Supriya Supriya, Siuly Siuly, Hua Wang, Jinli Cao, and Yanchun Zhang. 2016. Weighted visibility graph with complex network features in the detection of epilepsy. IEEE Access 4 (2016), 6554-6566.
Xing Wang, Jessica Lin, Pavel Senin, Tim Oates, Sunil Gandhi, Arnold P Boedi-hardjo, Crystal Chen, and Susan Frankenstein. 2016. RPM: Representative Pattern Mining for Efficient Time Series Classification. In Proceedings of the 19th International Conference on Extending Database Technology.
Michael Wojnowicz, Glenn Chisholm, Brian Wallace, Matt Wolff, Xuan Zhao, and Jay Luan. 2017. SUSPEND: Determining software suspiciousness by non-stationary time series modeling of entropy signals. Expert Systems with Applications 71 (2017), 301-318.
David H Wolpert. 1992. Stacked generalization. Neural networks 5, 2 (1992), 241-259.
Xiaoke Xu, Jie Zhang, and Michael Small. 2008. Superfamily phenomena and motifs of networks induced from time series. Proceedings of the National Academy of Sciences 105, 50 (2008), 19601-19605.
Jianbo Yang, Minh Nhut Nguyen, Phyo Phyo San, Xiaoli Li, and Shonali Krishnaswamy. 2015. Deep Convolutional Neural Networks on Multichannel Time Series for Human Activity Recognition.. In IJCAI. 3995-4001.
Lexiang Ye and Eamonn Keogh. 2009. Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 947-956.