DSCo-NG: A Practical Language Modeling Approach for Time Series Classification

LI, Daoyuan; BISSYANDE, Tegawendé François D Assise; KLEIN, Jacques; LE TRAON, Yves

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2016 • In The 15th International Symposium on Intelligent Data Analysis

Peer reviewed

Permalink
https://hdl.handle.net/10993/27942

Files

Full Text

li2016dsco-ng.pdf

Author preprint (6.3 MB)

All documents in ORBilu are protected by a user license.

Details

Abstract :

[en] Time series data are abundant across many domains, and their high dimensionality makes it difficult to extract useful information from them. Compression-based techniques have been proposed to tackle the resulting storage and processing challenges. Our previous work, Domain Series Corpus (DSCo), compresses time series into symbolic strings and uses language modeling techniques to extract knowledge about the different classes from the training set. However, DSCo proved impractical because of its excessive memory usage and its need for a priori knowledge about the dataset. In this paper we propose DSCo-NG, which reduces DSCo’s complexity and offers an approach to time series classification that is efficient (linear time complexity and a low memory footprint), accurate (performance comparable to approaches working on uncompressed data), and generic (applicable to various domains). These claims are backed by an extensive experimental evaluation on publicly accessible datasets, which also offers insights into when DSCo-NG can be a better choice than other approaches.

Disciplines :

Computer science

Author, co-author :

LI, Daoyuan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

BISSYANDE, Tegawendé François D Assise ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Computer Science and Communications Research Unit (CSC)

LE TRAON, Yves ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)

External co-authors :

Language :

English

Title :

DSCo-NG: A Practical Language Modeling Approach for Time Series Classification

Publication date :

October 2016

Event name :

The 15th International Symposium on Intelligent Data Analysis

Event date :

from 13-10-2016 to 15-10-2016

Audience :

International

Main work title :

The 15th International Symposium on Intelligent Data Analysis

Peer reviewed :

Peer reviewed

Focus Area :

Computational Sciences

Available on ORBilu :

since 07 July 2016

Statistics

Number of views

303 (25 by Unilu)

Number of downloads

416 (7 by Unilu)

