References of "State, Radu 50003137"

Peer Reviewed
Identifying Irregular Power Usage by Turning Predictions into Holographic Spatial Visualizations
Glauner, Patrick UL; Dahringer, Niklas; Puhachov, Oleksandr et al

in Proceedings of the 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017) (2017)

Power grids are critical infrastructure assets that face non-technical losses (NTL) such as electricity theft or faulty meters. NTL may range up to 40% of the total electricity distributed in emerging countries. Industrial NTL detection systems are still largely based on expert knowledge when deciding whether to carry out costly on-site inspections of customers. Electricity providers are reluctant to move to large-scale deployments of automated systems that learn NTL profiles from data due to the latter's propensity to suggest a large number of unnecessary inspections. In this paper, we propose a novel system that combines automated statistical decision making with expert knowledge. First, we propose a machine learning framework that classifies customers into NTL or non-NTL using a variety of features derived from the customers' consumption data. The methodology used is specifically tailored to the level of noise in the data. Second, in order to allow human experts to feed their knowledge into the decision loop, we propose a method for visualizing prediction results at various granularity levels in a spatial hologram. Our approach allows domain experts to put the classification results into the context of the data and to incorporate their knowledge when making the final decision on which customers to inspect. This work has resulted in appreciable results on a real-world data set of 3.6M customers. Our system is being deployed in commercial NTL detection software.

Peer Reviewed
Distilling Provider-Independent Data for General Detection of Non-Technical Losses
Meira, Jorge Augusto UL; Glauner, Patrick UL; State, Radu UL et al

in Power and Energy Conference, Illinois 23-24 February 2017 (2017)

Non-technical losses (NTL) in electricity distribution have various causes, such as poor equipment maintenance, broken meters or electricity theft. NTL occurs especially but not exclusively in emerging countries. Developed countries, even though usually in smaller amounts, have to deal with NTL issues as well. In these countries the estimated annual losses are up to six billion USD. These facts have directed the focus of our work to NTL detection. Our approach is composed of two steps: 1) We compute several features and combine them in sets characterized by four criteria: temporal, locality, similarity and infrastructure. 2) We then use the sets of features to train three machine learning classifiers: random forest, logistic regression and support vector machine. Our hypothesis is that features derived only from provider-independent data are adequate for an accurate detection of non-technical losses.

Peer Reviewed
ChainGuard - A Firewall for Blockchain Applications using SDN with OpenFlow
Steichen, Mathis UL; Hommes, Stefan UL; State, Radu UL

in ChainGuard - A Firewall for Blockchain Applications using SDN with OpenFlow (2017)

Recently, blockchains have been gathering a lot of interest. Many applications can benefit from the advantages of blockchains. Nevertheless, applications with more restricted privacy or participation requirements cannot rely on public blockchains. First, the whole blockchain can be downloaded at any time, thus making the data available to the public. Second, anyone can deploy a node, join the blockchain network and take part in the consensus building process. Private and consortium blockchains promise to combine the advantages of blockchains with stricter requirements on the participating entities. This is also the reason for the comparably small number of nodes that store and extend those blockchains. However, by targeting specific nodes, an attacker can influence how consensuses are reached and possibly even halt the blockchain operation. To provide additional security to the blockchain nodes, ChainGuard utilizes SDN functionalities to filter network traffic, thus implementing a firewall for blockchain applications. ChainGuard communicates with the blockchain nodes it guards to determine which origin of the traffic is legitimate. Packets from illegitimate sources are intercepted and thus cannot have an effect on the blockchain. As is shown with experiments, ChainGuard provides access control functionality and can effectively mitigate flooding attacks from several sources at once.

Peer Reviewed
Confirmation Delay Prediction of Transactions in the Bitcoin Network
Fiz Pontiveros, Beltran UL; Hommes, Stefan UL; State, Radu UL

in Advances in Computer Science and Ubiquitous Computing (2017)

Bitcoin is currently the most popular digital currency. It operates on a decentralised peer-to-peer network using an open source cryptographic protocol. In this work, we create a model of the selection process performed by mining pools on the set of unconfirmed transactions and then attempt to predict whether an unconfirmed transaction will be part of the next block by treating it as a supervised classification problem. We identified a vector of features obtained through service monitoring of the Bitcoin transaction network and performed our experiments on a publicly available dataset of Bitcoin transactions.

Peer Reviewed
The Top 10 Topics in Machine Learning Revisited: A Quantitative Meta-Study
Glauner, Patrick UL; Du, Manxing UL; Paraschiv, Victor et al

in Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017) (2017)

Which topics of machine learning are most commonly addressed in research? This question was initially answered in 2007 by doing a qualitative survey among distinguished researchers. In our study, we revisit this question from a quantitative perspective. Concretely, we collect 54K abstracts of papers published between 2007 and 2016 in leading machine learning journals and conferences. We then use machine learning in order to determine the top 10 topics in machine learning. We not only include models, but provide a holistic view across optimization, data, features, etc. This quantitative approach allows reducing the bias of surveys. It reveals new and up-to-date insights into what the 10 most prolific topics in machine learning research are. This allows researchers to identify popular topics as well as new and rising topics for their research.

Peer Reviewed
BotGM: Unsupervised Graph Mining to Detect Botnets in Traffic Flows
Lagraa, Sofiane UL; François, Jérôme; Lahmadi, Abdelkader et al

in CSNet 2017 Conference Proceedings (2017)

Botnets are one of the most dangerous and serious cybersecurity threats since they are a major vector of large-scale attack campaigns such as phishing, distributed denial-of-service (DDoS) attacks, trojans, spam, etc. A large body of research has been carried out on botnet detection, but recent security incidents show that several challenges remain to be addressed, such as the ability to develop detectors which can cope with new types of botnets. In this paper, we propose BotGM, a new approach to detect botnet activities based on behavioral analysis of network traffic flows. BotGM identifies network traffic behavior using graph-based mining techniques to detect botnet behaviors, and models the dependencies among flows to then trace back the root causes. We applied BotGM to a publicly available large dataset of botnet network flows, where it detects various botnet behaviors with high accuracy without any prior knowledge of them.

Peer Reviewed
Deep Learning on Big Data Sets in the Cloud with Apache Spark and Google TensorFlow
Glauner, Patrick UL; State, Radu UL

Scientific Conference (2016, December 09)

Machine learning is the branch of artificial intelligence giving computers the ability to learn patterns from data without being explicitly programmed. Deep Learning is a set of cutting-edge machine learning algorithms that are inspired by how the human brain works. It makes it possible to self-learn feature hierarchies from the data rather than modeling hand-crafted features. It has proven to significantly improve performance in challenging data analytics problems. In this tutorial, we will first provide an introduction to the theoretical foundations of neural networks and Deep Learning. Second, we will demonstrate how to use Deep Learning in a cloud using a distributed environment for Big Data analytics. This combines Apache Spark and TensorFlow, Google’s in-house Deep Learning platform made for Big Data machine learning applications. Practical demonstrations will include character recognition and time series forecasting in Big Data sets. Attendees will be provided with code snippets that they can easily amend in order to analyze their own data. A related, but shorter tutorial focusing on Deep Learning on a single computer was given at the Data Science Luxembourg Meetup in April 2016. It was attended by 70 people, making it the most attended event of this Meetup series in Luxembourg since its beginning.

Peer Reviewed
Interpreting Finite Automata for Sequential Data
Hammerschmidt, Christian UL; Verwer, S.; Lin, Q. et al

in Interpretable Machine Learning for Complex Systems: NIPS 2016 workshop proceedings (2016)

Peer Reviewed
Behavior Profiling for Mobile Advertising
Du, Manxing UL; State, Radu UL; Brorsson, Mats et al

in Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (2016, December)

Peer Reviewed
Efficient Learning of Communication Profiles from IP Flow Records
Hammerschmidt, Christian UL; Marchal, Samuel; Pellegrino, Gaetano et al

Poster (2016, November)

The task of network traffic monitoring has evolved drastically with the ever-increasing amount of data flowing in large-scale networks. The automated analysis of this tremendous source of information often comes with using simpler models on aggregated data (e.g. IP flow records) due to time and space constraints. Stream learning techniques are a step towards utilizing IP flow records more effectively. We propose a method to collect a limited yet relevant amount of data in order to learn a class of complex models, finite state machines, in real time. These machines are used as communication profiles to fingerprint, identify or classify hosts and services. They offer high detection rates while requiring less training data, and are thus faster to compute than simple models.

Peer Reviewed
Load Forecasting with Artificial Intelligence on Big Data
Glauner, Patrick UL; State, Radu UL

Scientific Conference (2016, October 09)

In the domain of electrical power grids, there is a particular interest in time series analysis using artificial intelligence. Machine learning is the branch of artificial intelligence giving computers the ability to learn patterns from data without being explicitly programmed. Deep Learning is a set of cutting-edge machine learning algorithms that are inspired by how the human brain works. It makes it possible to self-learn feature hierarchies from the data rather than modeling hand-crafted features. It has proven to significantly improve performance in challenging signal processing problems. In this tutorial, we will first provide an introduction to the theoretical foundations of neural networks and Deep Learning. Second, we will demonstrate how to use Deep Learning for load forecasting with TensorFlow, Google’s in-house Deep Learning platform made for Big Data machine learning applications. The advantage of Deep Learning is that the results can easily be applied to other problems, such as detection of non-technical losses. Attendees will be provided with code snippets that they can easily amend in order to perform analyses on their own time series.

Peer Reviewed
Behavioral Clustering of Non-Stationary IP Flow Record Data
Hammerschmidt, Christian UL; Marchal, Samuel; State, Radu UL et al

Poster (2016, October)

Peer Reviewed
Flexible State-Merging for learning (P)DFAs in Python
Hammerschmidt, Christian UL; Loos, Benjamin Laurent UL; Verwer, Sicco et al

Scientific Conference (2016, October)

We present a Python package for learning (non-)probabilistic deterministic finite state automata and provide heuristics in the red-blue framework. As our package is built along the API of the popular scikit-learn package, it is easy to use and new learning methods are easy to add. It provides PDFA learning as an additional tool for sequence prediction or classification to data scientists, without the need to understand the algorithm itself but rather the limitations of PDFA as a model. With applications of automata learning in diverse fields such as network traffic analysis, software engineering and biology, a stratified package opens opportunities for practitioners.

Peer Reviewed
Compiling packet forwarding rules for switch pipelined architecture
Hamadi, Salaheddine; Blaiech, Khalil; Valtchev, Petko UL et al

in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications (2016, July 26)

Peer Reviewed
NDN.p4: Programming Information-Centric data-planes
Signorello, Salvatore UL; State, Radu UL; François, Jérôme et al

in Proceedings of the IEEE International Workshop on Open-Source Software Networking at NetSoft2016 (2016)

Deep Learning Concepts from Theory to Practice
Glauner, Patrick UL; State, Radu UL

Scientific Conference (2016, January 19)

Peer Reviewed
Neighborhood Features Help Detecting Non-Technical Losses in Big Data Sets
Glauner, Patrick UL; Meira, Jorge Augusto UL; Dolberg, Lautaro et al

in Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing Applications and Technologies (BDCAT 2016) (2016)

Electricity theft occurs around the world in both developed and developing countries and may range up to 40% of the total electricity distributed. More generally, electricity theft belongs to non-technical losses (NTL), which occur during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split the area in which the customers are located into grids of different sizes. For each grid cell we then compute the proportion of inspected customers and the proportion of NTL found among the inspected customers. We then analyze the distributions of features generated and show why they are useful to predict NTL. In addition, we compute features from the consumption time series of customers. We also use master data features of customers, such as their customer class and voltage of their connection. We compute these features for a Big Data base of 31M meter readings, 700K customers and 400K inspection results. We then use these features to train four machine learning algorithms that are particularly suitable for Big Data sets because of their parallelizable structure: logistic regression, k-nearest neighbors, linear support vector machine and random forest. Using the neighborhood features instead of only analyzing the time series has resulted in appreciable results for Big Data sets for varying NTL proportions of 1%-90%. This work can therefore be deployed to a wide range of different regions.
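
The grid-cell features described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `neighborhood_features`, the record keys (`x`, `y`, `inspected`, `ntl`) and the square-cell layout are assumptions made for illustration only.

```python
from collections import defaultdict


def neighborhood_features(customers, cell_size):
    """Compute per-cell inspection and NTL proportions.

    `customers` is a list of dicts with hypothetical keys:
    x, y (coordinates), inspected (bool), ntl (bool).
    """
    # Per cell: [total customers, inspected customers, NTL found]
    counts = defaultdict(lambda: [0, 0, 0])
    for c in customers:
        # Map each customer to the square grid cell containing it.
        cell = (int(c["x"] // cell_size), int(c["y"] // cell_size))
        stats = counts[cell]
        stats[0] += 1
        if c["inspected"]:
            stats[1] += 1
            if c["ntl"]:
                stats[2] += 1

    features = {}
    for cell, (total, inspected, ntl) in counts.items():
        features[cell] = {
            # Proportion of customers in the cell that were inspected.
            "inspected_ratio": inspected / total,
            # Proportion of NTL among inspected customers (0.0 if none inspected).
            "ntl_ratio": ntl / inspected if inspected else 0.0,
        }
    return features


customers = [
    {"x": 0.5, "y": 0.5, "inspected": True, "ntl": True},
    {"x": 0.2, "y": 0.8, "inspected": True, "ntl": False},
    {"x": 0.9, "y": 0.1, "inspected": False, "ntl": False},
    {"x": 1.5, "y": 0.5, "inspected": False, "ntl": False},
]
features = neighborhood_features(customers, cell_size=1.0)
```

Calling the function with several values of `cell_size` would give one feature set per grid resolution, which is one way to read "grids of different sizes" in the abstract.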

Peer Reviewed
Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets
Glauner, Patrick UL; Boechat, Andre; Dolberg, Lautaro et al

in Proceedings of the Seventh IEEE Conference on Innovative Smart Grid Technologies (ISGT 2016) (2016)

Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignores that the two classes of regular and non-regular customers are highly imbalanced and that NTL proportions may change, and it mostly considers small data sets, often making it impossible to deploy the results in production. In this paper, we present a comprehensive approach to assess three NTL detection models for different NTL proportions in large real-world data sets of 100Ks of customers: Boolean rules, fuzzy logic and Support Vector Machine. This work has resulted in appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research in order to report its effectiveness on imbalanced and large real-world data sets.

Peer Reviewed
Exploring IoT Protocols Through the Information-Centric Networking's Lens
Signorello, Salvatore UL; State, Radu UL; Festor, Olivier

in Intelligent Mechanisms for Network Configuration and Security (2015, June)

Peer Reviewed
Empirical assessment of machine learning-based malware detectors for Android: Measuring the Gap between In-the-Lab and In-the-Wild Validation Scenarios
Allix, Kevin UL; Bissyande, Tegawendé François D Assise UL; Jerome, Quentin UL et al

in Empirical Software Engineering (2014)

To address the issue of malware detection through large sets of applications, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. So far, several promising results were recorded in the literature, many approaches being assessed with what we call in the lab validation scenarios. This paper revisits the purpose of malware detection to discuss whether such in the lab validation scenarios provide reliable indications on the performance of malware detectors in real-world settings, aka in the wild. To this end, we have devised several Machine Learning classifiers that rely on a set of features built from applications’ CFGs. We use a sizeable dataset of over 50 000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that, in the lab, our approach outperforms existing machine learning-based approaches. However, this high performance does not translate into high performance in the wild. The performance gap we observed (F-measures dropping from over 0.9 in the lab to below 0.1 in the wild) raises one important question: How do state-of-the-art approaches perform in the wild?
