[en] Non-technical losses (NTL) such as electricity theft cause significant harm to our economies; in some countries they may amount to as much as 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research has largely ignored that the two classes of regular and non-regular customers are highly imbalanced and that NTL proportions may change, and it has mostly considered small data sets, which often prevents the results from being deployed in production. In this paper, we present a comprehensive approach to assessing three NTL detection models (Boolean rules, fuzzy logic and Support Vector Machine) for different NTL proportions in large real-world data sets of hundreds of thousands of customers. This work has produced appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research to report its effectiveness on imbalanced and large real-world data sets.
Disciplines:
Computer science
Author, co-author:
Glauner, Patrick; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Boechat, Andre; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Dolberg, Lautaro; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
State, Radu; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)