Bias; Big data; Covariate shift; Machine learning; Non-technical losses
Abstract :
Non-technical losses (NTL) occur during the distribution of electricity in power grids and include, but are not limited to, electricity theft and faulty meters. In emerging countries, they may account for up to 40% of the total electricity distributed. To detect NTL, machine learning methods are used that learn irregular consumption patterns from customer data and inspection results. The Big Data paradigm followed in modern machine learning reflects the desire to derive better conclusions simply by analyzing more data, without the need to consider theory and models. However, the sample of inspected customers may be biased, i.e. it may not represent the population of all customers. As a consequence, machine learning models trained on these inspection results are biased as well and therefore yield unreliable predictions of whether customers cause NTL. In machine learning, this issue is called covariate shift, and it has not yet been addressed in the literature on NTL detection. In this work, we present a novel framework for quantifying and visualizing covariate shift. We apply it to a commercial data set from Brazil that consists of 3.6M customers and 820K inspection results. We show that some features have a stronger covariate shift than others, making predictions less reliable. In particular, previous inspections focused on certain neighborhoods or customer classes and were not sufficiently spread among the population of customers. This framework is about to be deployed in a commercial product for NTL detection.
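To make the idea of quantifying covariate shift concrete, the sketch below shows one generic approach; it is not necessarily the framework presented in this work. A simple classifier is trained to tell inspected from non-inspected customers using a single feature, and its held-out Matthews correlation coefficient (MCC) is reported. An MCC near 0 suggests that the inspected sample resembles the population on that feature, while a larger value suggests biased inspections. The synthetic data, the threshold classifier and all function names are illustrative assumptions.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def predict(values, threshold, direction):
    """Predict 'inspected' (1) on one side of the threshold."""
    return [1 if direction * (v - threshold) > 0 else 0 for v in values]


def fit_stump(values, labels, n_candidates=19):
    """Pick the quantile threshold and direction with the best training MCC."""
    ordered = sorted(values)
    candidates = [ordered[(i + 1) * len(ordered) // (n_candidates + 1)]
                  for i in range(n_candidates)]
    best_score, best_thr, best_dir = -1.0, candidates[0], 1
    for thr in candidates:
        for direction in (1, -1):
            score = mcc(labels, predict(values, thr, direction))
            if score > best_score:
                best_score, best_thr, best_dir = score, thr, direction
    return best_thr, best_dir


def shift_score(values, inspected):
    """Fit on the first half, report MCC on the held-out second half."""
    half = len(values) // 2
    thr, direction = fit_stump(values[:half], inspected[:half])
    return mcc(inspected[half:], predict(values[half:], thr, direction))


def simulate(n, biased, rng):
    """Synthetic customers (assumption): one feature and an 'inspected' flag."""
    values, inspected = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Biased inspections favor customers with large feature values.
        p = 1.0 / (1.0 + math.exp(-3.0 * x)) if biased else 0.5
        values.append(x)
        inspected.append(1 if rng.random() < p else 0)
    return values, inspected


rng = random.Random(0)
biased_score = shift_score(*simulate(4000, True, rng))
unbiased_score = shift_score(*simulate(4000, False, rng))
print(f"MCC with biased inspections:   {biased_score:.2f}")
print(f"MCC with unbiased inspections: {unbiased_score:.2f}")
```

Computing such a score per feature would allow features to be ranked by how strongly the inspected sample deviates from the population. A real application would likely use a stronger classifier and real customer features in place of the stand-ins used here.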
Disciplines :
Computer science
Author, co-author :
GLAUNER, Patrick ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
MIGLIOSI, Angelo ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
MEIRA, Jorge Augusto ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
VALTCHEV, Petko ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
STATE, Radu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)