References of "Duarte, Diogo"
     in
Bookmark and Share    
Full Text
Peer Reviewed
See detailOn the Reduction of Biases in Big Data Sets for the Detection of Irregular Power Usage
Glauner, Patrick UL; State, Radu UL; Valtchev, Petko et al

in Proceedings 13th International FLINS Conference on Data Science and Knowledge Engineering for Sensing Decision Support (FLINS 2018) (2018)

In machine learning, a bias occurs whenever training sets are not representative for the test data, which results in unreliable models. The most common biases in data are arguably class imbalance and ... [more ▼]

In machine learning, a bias occurs whenever training sets are not representative for the test data, which results in unreliable models. The most common biases in data are arguably class imbalance and covariate shift. In this work, we aim to shed light on this topic in order to increase the overall attention to this issue in the field of machine learning. We propose a scalable novel framework for reducing multiple biases in high-dimensional data sets in order to train more reliable predictors. We apply our methodology to the detection of irregular power usage from real, noisy industrial data. In emerging markets, irregular power usage, and electricity theft in particular, may range up to 40% of the total electricity distributed. Biased data sets are of particular issue in this domain. We show that reducing these biases increases the accuracy of the trained predictors. Our models have the potential to generate significant economic value in a real world application, as they are being deployed in a commercial software for the detection of irregular power usage. [less ▲]

Detailed reference viewed: 103 (5 UL)
Full Text
Peer Reviewed
See detailIdentifying Irregular Power Usage by Turning Predictions into Holographic Spatial Visualizations
Glauner, Patrick UL; Dahringer, Niklas; Puhachov, Oleksandr et al

in Proceedings of the 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017) (2017)

Power grids are critical infrastructure assets that face non-technical losses (NTL) such as electricity theft or faulty meters. NTL may range up to 40% of the total electricity distributed in emerging ... [more ▼]

Power grids are critical infrastructure assets that face non-technical losses (NTL) such as electricity theft or faulty meters. NTL may range up to 40% of the total electricity distributed in emerging countries. Industrial NTL detection systems are still largely based on expert knowledge when deciding whether to carry out costly on-site inspections of customers. Electricity providers are reluctant to move to large-scale deployments of automated systems that learn NTL profiles from data due to the latter's propensity to suggest a large number of unnecessary inspections. In this paper, we propose a novel system that combines automated statistical decision making with expert knowledge. First, we propose a machine learning framework that classifies customers into NTL or non-NTL using a variety of features derived from the customers' consumption data. The methodology used is specifically tailored to the level of noise in the data. Second, in order to allow human experts to feed their knowledge in the decision loop, we propose a method for visualizing prediction results at various granularity levels in a spatial hologram. Our approach allows domain experts to put the classification results into the context of the data and to incorporate their knowledge for making the final decisions of which customers to inspect. This work has resulted in appreciable results on a real-world data set of 3.6M customers. Our system is being deployed in a commercial NTL detection software. [less ▲]

Detailed reference viewed: 154 (25 UL)
Full Text
Peer Reviewed
See detailDistilling Provider-Independent Data for General Detection of Non-Technical Losses
Meira, Jorge Augusto UL; Glauner, Patrick UL; State, Radu UL et al

in Power and Energy Conference, Illinois 23-24 February 2017 (2017)

Non-technical losses (NTL) in electricity distribution are caused by different reasons, such as poor equipment maintenance, broken meters or electricity theft. NTL occurs especially but not exclusively in ... [more ▼]

Non-technical losses (NTL) in electricity distribution are caused by different reasons, such as poor equipment maintenance, broken meters or electricity theft. NTL occurs especially but not exclusively in emerging countries. Developed countries, even though usually in smaller amounts, have to deal with NTL issues as well. In these countries the estimated annual losses are up to six billion USD. These facts have directed the focus of our work to the NTL detection. Our approach is composed of two steps: 1) We compute several features and combine them in sets characterized by four criteria: temporal, locality, similarity and infrastructure. 2) We then use the sets of features to train three machine learning classifiers: random forest, logistic regression and support vector vachine. Our hypothesis is that features derived only from provider-independent data are adequate for an accurate detection of non-technical losses. [less ▲]

Detailed reference viewed: 199 (37 UL)
Full Text
Peer Reviewed
See detailLarge-Scale Detection of Non-Technical Losses in Imbalanced Data Sets
Glauner, Patrick UL; Boechat, Andre; Dolberg, Lautaro et al

in Proceedings of the Seventh IEEE Conference on Innovative Smart Grid Technologies (ISGT 2016) (2016)

Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires ... [more ▼]

Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignore that the two classes of regular and non-regular customers are highly imbalanced, that NTL proportions may change and mostly consider small data sets, often not allowing to deploy the results in production. In this paper, we present a comprehensive approach to assess three NTL detection models for different NTL proportions in large real world data sets of 100Ks of customers: Boolean rules, fuzzy logic and Support Vector Machine. This work has resulted in appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research in order to report their effectiveness on imbalanced and large real world data sets. [less ▲]

Detailed reference viewed: 135 (9 UL)
Full Text
Peer Reviewed
See detailNeighborhood Features Help Detecting Non-Technical Losses in Big Data Sets
Glauner, Patrick UL; Meira, Jorge Augusto UL; Dolberg, Lautaro et al

in Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing Applications and Technologies (BDCAT 2016) (2016)

Electricity theft occurs around the world in both developed and developing countries and may range up to 40% of the total electricity distributed. More generally, electricity theft belongs to non ... [more ▼]

Electricity theft occurs around the world in both developed and developing countries and may range up to 40% of the total electricity distributed. More generally, electricity theft belongs to non-technical losses (NTL), which occur during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split the area in which the customers are located into grids of different sizes. For each grid cell we then compute the proportion of inspected customers and the proportion of NTL found among the inspected customers. We then analyze the distributions of features generated and show why they are useful to predict NTL. In addition, we compute features from the consumption time series of customers. We also use master data features of customers, such as their customer class and voltage of their connection. We compute these features for a Big Data base of 31M meter readings, 700K customers and 400K inspection results. We then use these features to train four machine learning algorithms that are particularly suitable for Big Data sets because of their parallelizable structure: logistic regression, k-nearest neighbors, linear support vector machine and random forest. Using the neighborhood features instead of only analyzing the time series has resulted in appreciable results for Big Data sets for varying NTL proportions of 1%-90%. This work can therefore be deployed to a wide range of different regions. [less ▲]

Detailed reference viewed: 142 (11 UL)