Carvalho Ota, Fernando Kaway, in Carvalho Ota, Fernando Kaway; Meira, Jorge Augusto; Frank, Raphaël (Eds.) et al., 2020 Mediterranean Communication and Computer Networking Conference, Arona, 17-19 June 2020 (2020, September 10)

The number of smartphone users recently surpassed the number of desktop users on the Internet, opening up countless development challenges and business opportunities. Beyond the fact that the majority of users connect using their smartphones, the growth of Internet users in general has popularized the massive use of data-driven applications. In this context, the concept of super apps appears to be the next game-changer for the mobile apps industry, and the related security and privacy challenges are key to keeping user data safe. Thus, by combining different components for provisioning, authentication, membership and others, we propose a novel framework that enables the creation of a super app using privacy-by-design principles.

Carvalho Ota, Fernando Kaway, in 2019 IEEE International Symposium on Software Reliability Engineering Workshops (2020, February 13)

The current challenge for several applications is to guarantee the user's privacy when using personal data. The broader problem is to transfer and process the data without exposing the sensitive content to anyone, including the service provider(s).
In this paper, we address this challenge by proposing a protocol that combines secure frameworks in order to exchange and process sensitive data, i.e., respecting the user's privacy. Our contribution is a protocol to perform a secure exchange of data between a mobile application and a trusted execution environment. In our experiments we show independent implementations of our protocol using three different encryption modes (CBC, ECB and GCM). Our results support the feasibility and importance of an end-to-end secure channel protocol.

et al., in Database Processing-in-Memory: A Vision (2019, August 03)

Falk, Eric et al., in Journal of Information and Data Management (2018)

Glauner, Patrick, Scientific Conference (2018)

The field of Machine Learning grew out of the quest for artificial intelligence. It gives computers the ability to learn statistical patterns from data without being explicitly programmed. These patterns can then be applied to new data in order to make predictions. Machine Learning also makes it possible to adapt automatically to changes in the data without amending the underlying model. We deal with Machine Learning applications dozens of times every day, such as when doing a Google search, using spam filters, face detection, speaking to voice recognition software or sitting in a self-driving car. In recent years, machine learning methods have evolved in the smart grid community.
This change towards analyzing data rather than modeling specific problems has led to adaptable, more generic methods that require less expert knowledge and are easier to deploy in a number of use cases. This is an introductory-level course that discusses what machine learning is and how to apply it to data-driven smart grid applications. Practical case studies on real data sets, such as load forecasting, detection of irregular power usage and visualization of customer data, will be included. Therefore, attendees will not only understand, but rather experience, how to apply machine learning methods to smart grid data.

Glauner, Patrick, Scientific Conference (2018)

Electricity losses are a frequently appearing problem in power grids. Non-technical losses (NTL) appear during distribution and include, but are not limited to, the following causes: meter tampering in order to record lower consumption, bypassing meters by rigging lines from the power source, arranged false meter readings by bribing meter readers, faulty or broken meters, un-metered supply, and technical and human errors in meter readings, data processing and billing. NTLs are also reported to range up to 40% of the total electricity distributed in countries such as India, Pakistan, Malaysia, Brazil or Lebanon. This is an introductory-level course that discusses how to predict whether a customer causes NTL. In recent years, data analytics methods such as machine learning and data mining have evolved as the primary direction to solve this problem. This course will present and compare different approaches reported in the literature. Practical case studies on real data sets will be included.
As an additional outcome, attendees will understand the open challenges of NTL detection and learn how these challenges could be solved in the coming years.

Meira, Jorge Augusto et al., in International Conference on Data Engineering (ICDE) (2018)

During the parallel execution of queries in Non-Uniform Memory Access (NUMA) systems, the Operating System (OS) maps the threads (or processes) of modern database systems to the available cores among the NUMA nodes using the standard node-local policy. However, such non-smart mapping may result in inefficient memory activity, because shared data may be accessed by scattered threads, requiring large data movements, or non-shared data may be allocated to threads sharing the same cache memory, increasing its conflicts. In this paper we present a data-distribution-aware and elastic multi-core allocation mechanism to improve the OS mapping of database threads in NUMA systems. Our hypothesis is that we mitigate data movement if we only hand out to the OS the local optimum number of cores in specific nodes. We propose a mechanism based on a rule-condition-action pipeline that uses hardware counters to promptly find the local optimum number of cores. Our mechanism uses a priority queue to track the history of the memory address space used by database threads in order to decide about the allocation/release of cores and their distribution among the NUMA nodes to decrease remote memory access. We implemented and tested a prototype of our mechanism executing two popular Volcano-style databases, improving their NUMA-affinity.
For MonetDB, we show a maximum speedup of 1.53×, due to a consistent reduction in the local/remote per-query data traffic ratio of up to 3.87× when running 256 concurrent clients on the 1 GB TPC-H database, also showing system energy savings of 26.05%. For the NUMA-aware SQL Server, we observed a speedup of up to 1.27× and a reduction in the data traffic ratio of 3.70×.

Glauner, Patrick, Scientific Conference (2017, September)

Electricity losses are a frequently appearing problem in power grids. Non-technical losses (NTL) appear during distribution and include, but are not limited to, the following causes: meter tampering in order to record lower consumption, bypassing meters by rigging lines from the power source, arranged false meter readings by bribing meter readers, faulty or broken meters, un-metered supply, and technical and human errors in meter readings, data processing and billing. NTLs are also reported to range up to 40% of the total electricity distributed in countries such as Brazil, India, Malaysia or Lebanon. This is an introductory-level course that discusses how to predict whether a customer causes NTL. In recent years, data analytics methods such as data mining and machine learning have evolved as the primary direction to solve this problem. This course will compare and contrast different approaches reported in the literature. Practical case studies on real data sets will be included. Therefore, attendees will not only understand, but rather experience, the challenges of NTL detection and learn how these challenges could be solved in the coming years.
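The course abstracts above describe training classifiers to flag NTL from customers' consumption data. As a minimal, self-contained sketch of the idea (the synthetic data, the single drop-ratio feature, and all names are invented here for illustration, not taken from the papers), a plain logistic regression can separate customers whose consumption drops abruptly, a pattern the NTL literature associates with meter tampering, from customers with stable consumption:

```python
import math
import random

random.seed(0)

def consumption_drop_ratio(series):
    """Feature: mean of the last 3 months divided by the mean of the
    first 3 months. A sharp drop (ratio well below 1) may indicate tampering."""
    return (sum(series[-3:]) / 3) / (sum(series[:3]) / 3)

def make_customer(ntl):
    """Synthetic 12-month series: 'normal' customers fluctuate around a
    base load, 'NTL' customers drop to ~30% of it halfway through the year."""
    base = random.uniform(100, 300)
    series = [base * random.uniform(0.9, 1.1) for _ in range(6)]
    factor = 0.3 if ntl else 1.0
    series += [base * factor * random.uniform(0.9, 1.1) for _ in range(6)]
    return series

X = [[consumption_drop_ratio(make_customer(ntl))] for ntl in [0] * 50 + [1] * 50]
y = [0] * 50 + [1] * 50

# Logistic regression trained with batch gradient descent (no libraries).
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    gw = gb = 0.0
    for (x,), label in zip(X, y):
        p = 1 / (1 + math.exp(-(w * x + b)))
        gw += (p - label) * x
        gb += (p - label)
    w -= lr * gw / len(X)
    b -= lr * gb / len(X)

def predict(series):
    """True means: flag this customer for an on-site inspection."""
    x = consumption_drop_ratio(series)
    return 1 / (1 + math.exp(-(w * x + b))) > 0.5

# Classify a fresh synthetic customer.
print(predict(make_customer(ntl=True)))
```

Real systems would, as the abstracts note, combine many more features and classifiers (random forest, SVM) and must cope with noisy, biased inspection labels; this sketch only illustrates the feature-then-classifier pipeline.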
Glauner, Patrick, in Proceedings of the 19th International Conference on Intelligent System Applications to Power Systems (ISAP 2017) (2017, September)

Non-technical losses (NTL) occur during the distribution of electricity in power grids and include, but are not limited to, electricity theft and faulty meters. In emerging countries, they may range up to 40% of the total electricity distributed. In order to detect NTLs, machine learning methods are used that learn irregular consumption patterns from customer data and inspection results. The Big Data paradigm followed in modern machine learning reflects the desire to derive better conclusions from simply analyzing more data, without the necessity of looking at theory and models. However, the sample of inspected customers may be biased, i.e., it does not represent the population of all customers. As a consequence, machine learning models trained on these inspection results are biased as well and therefore lead to unreliable predictions of whether customers cause NTL or not. In machine learning, this issue is called covariate shift and has not yet been addressed in the literature on NTL detection. In this work, we present a novel framework for quantifying and visualizing covariate shift. We apply it to a commercial data set from Brazil that consists of 3.6M customers and 820K inspection results. We show that some features have a stronger covariate shift than others, making predictions less reliable. In particular, previous inspections focused on certain neighborhoods or customer classes and were not sufficiently spread among the population of customers.
This framework is about to be deployed in a commercial product for NTL detection.

Meira, Jorge Augusto et al., Poster (2017, May 15)

Meira, Jorge Augusto, in Power and Energy Conference, Illinois, 23-24 February 2017 (2017)

Non-technical losses (NTL) in electricity distribution are caused by different reasons, such as poor equipment maintenance, broken meters or electricity theft. NTL occurs especially, but not exclusively, in emerging countries. Developed countries, although usually in smaller amounts, have to deal with NTL issues as well. In these countries the estimated annual losses are up to six billion USD. These facts have directed the focus of our work to NTL detection. Our approach is composed of two steps: 1) we compute several features and combine them in sets characterized by four criteria: temporal, locality, similarity and infrastructure; 2) we then use the sets of features to train three machine learning classifiers: random forest, logistic regression and support vector machine. Our hypothesis is that features derived only from provider-independent data are adequate for an accurate detection of non-technical losses.

Glauner, Patrick, in Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017) (2017)

Which topics of machine learning are most commonly addressed in research?
This question was initially answered in 2007 by conducting a qualitative survey among distinguished researchers. In our study, we revisit this question from a quantitative perspective. Concretely, we collect 54K abstracts of papers published between 2007 and 2016 in leading machine learning journals and conferences. We then use machine learning to determine the top 10 topics in machine learning. We not only include models, but provide a holistic view across optimization, data, features, etc. This quantitative approach reduces the bias of surveys. It reveals new and up-to-date insights into what the 10 most prolific topics in machine learning research are, allowing researchers to identify popular topics as well as new and rising topics for their research.

Glauner, Patrick, in International Journal of Computational Intelligence Systems (2017), 10(1), 760-775

Detection of non-technical losses (NTL), which include electricity theft, faulty meters or billing errors, has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may range up to 40% of the total electricity distributed. The predominant research direction is employing artificial intelligence to predict whether a customer causes NTL. This paper first provides an overview of how NTLs are defined and their impact on economies, which includes loss of revenue and profit for electricity providers and a decrease of the stability and reliability of electrical power grids. It then surveys the state-of-the-art research efforts in an up-to-date and comprehensive review of algorithms, features and data sets used.
It finally identifies the key scientific and engineering challenges in NTL detection and suggests how they could be addressed in the future.

Glauner, Patrick, in Proceedings of the 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017) (2017)

Power grids are critical infrastructure assets that face non-technical losses (NTL) such as electricity theft or faulty meters. NTL may range up to 40% of the total electricity distributed in emerging countries. Industrial NTL detection systems are still largely based on expert knowledge when deciding whether to carry out costly on-site inspections of customers. Electricity providers are reluctant to move to large-scale deployments of automated systems that learn NTL profiles from data, due to the latter's propensity to suggest a large number of unnecessary inspections. In this paper, we propose a novel system that combines automated statistical decision making with expert knowledge. First, we propose a machine learning framework that classifies customers into NTL or non-NTL using a variety of features derived from the customers' consumption data. The methodology used is specifically tailored to the level of noise in the data. Second, in order to allow human experts to feed their knowledge into the decision loop, we propose a method for visualizing prediction results at various granularity levels in a spatial hologram. Our approach allows domain experts to put the classification results into the context of the data and to incorporate their knowledge when making the final decisions of which customers to inspect. This work has resulted in appreciable results on a real-world data set of 3.6M customers.
Our system is being deployed in a commercial NTL detection software product.

Glauner, Patrick, in Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing Applications and Technologies (BDCAT 2016) (2016)

Electricity theft occurs around the world in both developed and developing countries and may range up to 40% of the total electricity distributed. More generally, electricity theft belongs to non-technical losses (NTL), which occur during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split the area in which the customers are located into grids of different sizes. For each grid cell we then compute the proportion of inspected customers and the proportion of NTL found among the inspected customers. We then analyze the distributions of the generated features and show why they are useful for predicting NTL. In addition, we compute features from the consumption time series of customers. We also use master data features of customers, such as their customer class and the voltage of their connection. We compute these features for a Big Data base of 31M meter readings, 700K customers and 400K inspection results. We then use these features to train four machine learning algorithms that are particularly suitable for Big Data sets because of their parallelizable structure: logistic regression, k-nearest neighbors, linear support vector machine and random forest. Using the neighborhood features instead of only analyzing the time series has resulted in appreciable results for Big Data sets with varying NTL proportions of 1%-90%. This work can therefore be deployed to a wide range of different regions.
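The neighborhood features described in the BDCAT 2016 abstract (per grid cell: the proportion of inspected customers and the proportion of NTL found among them, recomputed at several grid sizes) can be sketched as follows. The record layout, field names and cell sizes are illustrative assumptions, not taken from the paper:

```python
from collections import defaultdict

# Hypothetical customer records: (x, y) location, whether the customer was
# inspected, and whether that inspection found NTL.
customers = [
    {"x": 0.12, "y": 0.40, "inspected": True,  "ntl": True},
    {"x": 0.15, "y": 0.45, "inspected": True,  "ntl": False},
    {"x": 0.90, "y": 0.10, "inspected": False, "ntl": False},
    {"x": 0.95, "y": 0.05, "inspected": True,  "ntl": True},
]

def grid_features(customers, cell_size):
    """For each grid cell, return (proportion of customers inspected,
    proportion of NTL among the inspected customers in that cell)."""
    cells = defaultdict(lambda: {"total": 0, "inspected": 0, "ntl": 0})
    for c in customers:
        key = (int(c["x"] / cell_size), int(c["y"] / cell_size))
        cells[key]["total"] += 1
        if c["inspected"]:
            cells[key]["inspected"] += 1
            if c["ntl"]:
                cells[key]["ntl"] += 1
    return {
        key: (v["inspected"] / v["total"],
              v["ntl"] / v["inspected"] if v["inspected"] else 0.0)
        for key, v in cells.items()
    }

# Recomputing at several grid sizes yields one feature pair per scale,
# mirroring the paper's "grids of different sizes".
features = {size: grid_features(customers, size) for size in (0.5, 0.25)}
```

Each customer then inherits the feature pair of the cell it falls into, at every scale, alongside its own consumption and master-data features.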
Meira, Jorge Augusto, in International Conference on Database and Expert Systems Applications, Porto, 5-8 September 2016 (2016)

El Kateb, Donia, Scientific Conference (2014, March)

Meira, Jorge Augusto, Poster (2014)

Over the last decade, large amounts of concurrent transactions have been generated from different sources, such as Internet-based systems, mobile applications, smart homes and cars. High-throughput transaction processing is becoming commonplace; however, there is no testing technique for validating non-functional aspects of DBMS under transaction-flooding workloads. In this paper we propose a database state machine to represent the states of a DBMS when processing concurrent transactions. The state transitions are forced by increasing the concurrency of the testing workload. Preliminary results show the effectiveness of our approach in driving the system among different performance states and finding related defects.

Meira, Jorge Augusto, Doctoral thesis (2014)

Database Management Systems (DBMS) have been successful at processing transaction workloads over decades. But contemporary systems, including Cloud computing, Internet-based systems, and sensors (i.e., the Internet of Things (IoT)), are challenging the architecture of the DBMS with burgeoning transaction workloads.
The direct consequence is that the development agenda of the DBMS is now heavily concerned with meeting non-functional requirements, such as performance, robustness and scalability. Otherwise, any stressing workload will make the DBMS lose control of simple functional requirements, such as responding to a transaction request. While traditional DBMS, including DB2, Oracle, and PostgreSQL, require embedding new features to meet non-functional requirements, the contemporary DBMS known as NewSQL present a completely new architecture. What is still lacking in the development agenda is a proper testing approach, coupled with burgeoning transaction workloads, for validating the DBMS with non-functional requirements in mind. The typical non-functional validation is carried out by performance benchmarks. However, they focus on metrics comparison instead of finding defects. In this thesis, we address this lack by presenting different contributions to the domain of DBMS stress testing. These contributions fit different testing objectives to challenge each specific architecture of traditional and contemporary DBMS. For instance, testing the earlier DBMS (e.g., DB2, Oracle) requires incremental performance tuning (i.e., from a simple setup to a complex one), while testing the latter DBMS (e.g., VoltDB, NuoDB) requires driving it into different performance states due to its self-tuning capabilities. Overall, this thesis makes the following contributions: 1) Stress TEsting Methodology (STEM): a methodology to capture performance degradation and expose system defects in the internal code due to the combination of a stress workload and mistuning; 2) Model-based Database Stress Testing (MoDaST): an approach to test NewSQL database systems.
Supported by a Database State Machine (DSM), MoDaST infers internal states of the database based on performance observations under different workload levels; 3) Under Pressure Benchmark (UPB): a benchmark to assess the impact of availability mechanisms in NewSQL database systems. We validate our contributions with several popular DBMS. Among the outcomes, we highlight that our methodologies succeed in driving the DBMS up to stress-state conditions and expose several related defects, including a new major defect in a popular NewSQL system.

Meira, Jorge Augusto, in Journal of Information and Data Management (2013)

Transactional database management systems (DBMS) have been successful at supporting traditional transaction processing workloads. However, web-based applications that tend to generate huge numbers of concurrent business operations are pushing DBMS performance over its limits, thus threatening overall system availability. A crucial question, then, is how to test DBMS performance under heavy workload conditions. Answering this question requires a testing methodology to set up non-biased conditions for pushing a particular DBMS over its normal performance limits (i.e., to stress it). In this article, we present a stress testing methodology for DBMS to search for defects in supporting very heavy workloads. Our methodology leverages distributed testing techniques and takes into account the various biases that may affect the test results. It progressively increases the workload, along with several tuning steps, up to a stress condition. We validate our methodology with empirical studies on two popular DBMS (one proprietary, one open-source) and detail the defects that have been found.
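The stress-testing entries above share one core loop: progressively increase the transaction workload and infer a performance state from what the DBMS actually completes versus what was submitted. The sketch below illustrates that loop only in spirit; the state names, thresholds and the toy performance model are invented here and are not the DSM of MoDaST or the STEM methodology:

```python
# Illustrative state inference: as submitted load rises, observed throughput
# first follows it (steady), then flattens (under pressure), then collapses
# (stress), which is where stress-related defects tend to surface.

def infer_state(load, throughput):
    """Classify one observation. load = transactions submitted per second,
    throughput = transactions completed per second."""
    ratio = throughput / load
    if ratio >= 0.95:
        return "steady"          # DBMS keeps up with the workload
    if ratio >= 0.60:
        return "under_pressure"  # completions lag behind submissions
    return "stress"              # severe degradation

def drive(workload_levels, measure):
    """Increase the workload step by step, recording the state at each level."""
    return [(load, infer_state(load, measure(load))) for load in workload_levels]

# Toy performance model standing in for a real DBMS under test:
# throughput saturates at 500 tps and collapses beyond 1000 tps submitted.
def fake_dbms(load):
    return min(load, 500) if load <= 1000 else max(500 - (load - 1000), 100)

trace = drive([100, 400, 800, 1600], fake_dbms)
```

In the actual methodologies the workload is generated by distributed clients against a real DBMS, tuning steps are interleaved with the load increases, and the observed state transitions are what drive the search for defects.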