Tawakuli, Amal
In: The Fifth International Workshop on Data: Acquisition To Analysis (2022)

Data preprocessing is an integral part of Artificial Intelligence (AI) pipelines. It transforms raw data into input data that fulfill algorithmic criteria and improve prediction accuracy. As the adoption of the Internet of Things (IoT) gains momentum, the volume of data generated at the edge is increasing exponentially and far exceeds any expansion of infrastructure. Social responsibilities and regulations (e.g., GDPR) must also be adhered to when handling IoT data. In addition, we are currently witnessing a shift towards distributing AI to the edge. These reasons render the distribution of data preprocessing to the edge an urgent requirement. In this paper, we introduce a modern data preprocessing framework that consists of two main parts. The first part is a design tool that reduces the complexity and cost of the data preprocessing phase for AI via generalization and normalization. The design tool is a standard template that maps specific techniques onto abstract categories and highlights the dependencies between them. In addition, it presents a holistic notion of data preprocessing that is not limited to data cleaning. The second part is an IoT tool that adopts the edge-cloud collaboration model to progressively improve the quality of the data. It includes a synchronization mechanism that ensures adaptation to changes in data characteristics, and a coordination mechanism that ensures correct and complete execution of preprocessing plans between the cloud and the edge. The paper includes an empirical analysis of the framework using a prototype and an automotive use case. Our results demonstrate reductions in resource consumption (e.g., energy, bandwidth) while maintaining the value and integrity of the data.
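The abstract above describes the two mechanisms only at a high level. As an illustration, here is a minimal Python sketch of how a versioned preprocessing plan, an edge stage, and the synchronization and coordination mechanisms could fit together. All names (PreprocessingPlan, EdgeStage), the category labels, and the version-based update rule are assumptions made for this sketch, not the authors' implementation.

```python
# A minimal sketch, assuming invented names: the classes, category labels,
# and version-based synchronization below are illustrative, not the
# authors' implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, float]
Technique = Callable[[Record], Record]

@dataclass
class PreprocessingPlan:
    version: int                        # bumped by the cloud when data characteristics change
    steps: List[Tuple[str, Technique]]  # ordered (abstract category, concrete technique) pairs

def drop_out_of_range(r: Record) -> Record:
    # Toy "cleaning" technique: discard implausible sensor values.
    return {k: v for k, v in r.items() if -50.0 <= v <= 150.0}

def min_max_scale(r: Record) -> Record:
    # Toy "normalization" technique, assuming a known sensor range of [-50, 150].
    return {k: (v + 50.0) / 200.0 for k, v in r.items()}

class EdgeStage:
    """Runs the edge's share of a plan; the cloud completes the remaining steps."""

    def __init__(self, plan: PreprocessingPlan, edge_categories: Set[str]):
        self.plan = plan
        self.edge_categories = edge_categories

    def sync(self, cloud_plan: PreprocessingPlan) -> None:
        # Synchronization mechanism: adopt a newer plan pushed by the cloud.
        if cloud_plan.version > self.plan.version:
            self.plan = cloud_plan

    def process(self, record: Record):
        executed = []
        for category, technique in self.plan.steps:
            if category in self.edge_categories:
                record = technique(record)
                executed.append(category)
        # Coordination mechanism: report which steps ran so the cloud can
        # execute the rest of the plan exactly once.
        return record, {"plan_version": self.plan.version, "executed": executed}

plan = PreprocessingPlan(version=1, steps=[("cleaning", drop_out_of_range),
                                           ("normalization", min_max_scale)])
edge = EdgeStage(plan, edge_categories={"cleaning"})
print(edge.process({"engine_temp_c": 93.4, "spurious": 999.0}))
```

The split shown here, cheap local cleaning on the edge and the remaining steps completed in the cloud, mirrors the progressive-quality idea of the edge-cloud collaboration model.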
Tawakuli, Amal
In: 2020 IEEE International Conference on Big Data (2021, March 19)

Sensor data, whether collected for machine learning, deep learning, or other applications, must be preprocessed to fit input requirements or to improve performance and accuracy. Data preparation is an expensive, resource-consuming, and complex phase, often performed centrally on raw data for a specific application. The dataflow between the edge and the cloud can be enhanced in terms of efficiency, reliability, and lineage by preprocessing datasets closer to their data sources. We propose a dedicated data preprocessing framework that distributes preprocessing tasks between a cloud stage and two edge stages to create a dataflow of progressively improving quality. The framework handles heterogeneous data and dynamic preprocessing plans, simultaneously targeting diverse applications and use cases from different domains. Each stage autonomously executes sensor-specific preprocessing plans in parallel while synchronizing the progressive execution and dynamic updates of those plans with the other stages. Our approach minimizes the workload on central infrastructure and reduces the resources used for transferring raw data from the edge. We also demonstrate that preprocessing can be sensor-specific rather than application-specific, and can therefore be performed before a specific application is known. (A sketch of this staged, sensor-specific execution appears after the entries below.)

de La Cadena Ramos, Augusto Wladimir
In: 19th IEEE International Symposium on Network Computing and Applications (IEEE NCA 2020) (2020, November 25)

…; Kaiser, Daniel
In: Proceedings of ICPS ICCNS 2020 (2020, November)

Distributed Hash Table (DHT) protocols, such as Kademlia, provide a decentralized key-value lookup that is nowadays integrated into a wide variety of applications, such as Ethereum, the InterPlanetary File System (IPFS), and BitTorrent. However, many security issues in DHT protocols have not yet been solved. DHT networks are typically evaluated using mathematical models or simulations, which often abstract away artefacts that can be relevant for security and/or performance, while experiments that do capture these artefacts are typically run with too few nodes. In this paper, we provide Locust, a novel, highly concurrent DHT experimentation framework written in Elixir and designed for security evaluations. The framework allows running experiments with a full DHT implementation and around 4,000 nodes on a single machine, including an adjustable churn rate, thus yielding a favourable trade-off between the number of analysed nodes and realism. We evaluate our framework in terms of memory consumption, processing power, and network traffic. (A sketch of the many-nodes-with-churn setup appears after the entries below.)

Tawakuli, Amal
Poster (2019, October 08)

The automotive industry generates large datasets of various formats, uncertainties, and frequencies. To exploit Automotive Big Data, the data needs to be connected, fused, and preprocessed into quality datasets before being used in production and business processes. Data preprocessing tasks are typically expensive, tightly coupled with their intended AI algorithms, and performed manually by domain experts. Hence, there is a need to automate data preprocessing so that cleaner data is generated seamlessly. We intend to introduce a generic data preprocessing framework that handles vehicle-to-everything (V2X) data streams and dynamic updates. We intend to decentralize and automate data preprocessing by leveraging edge computing, with the objective of progressively improving the quality of the dataflow within edge components (vehicles) and onto the cloud.

de La Cadena Ramos, Augusto Wladimir
In: Data and Applications Security and Privacy XXXIII, 2019 (2019, July 15)
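As referenced in the IEEE Big Data entry above, here is a compact sketch of staged, sensor-specific preprocessing. The stage boundaries, step functions, and per-sensor-type plan registry are invented for illustration; the paper's actual plans and stage assignments are not specified in the abstract.

```python
# A compact sketch, not the paper's code: stage boundaries, step functions,
# and the per-sensor-type plan registry below are illustrative assumptions.
from typing import Callable, Dict, List, Optional

Sample = Dict[str, Optional[float]]
Step = Callable[[List[Sample]], List[Sample]]

def dedupe(batch: List[Sample]) -> List[Sample]:
    # Edge stage 1: cheap, sensor-local cleaning close to the data source.
    seen, out = set(), []
    for s in batch:
        key = tuple(sorted(s.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            out.append(s)
    return out

def fill_missing(batch: List[Sample]) -> List[Sample]:
    # Edge stage 2: gateway-level repair, carrying the last known value forward.
    last: Optional[float] = None
    for s in batch:
        if s.get("value") is None and last is not None:
            s["value"] = last
        last = s.get("value", last)
    return batch

def standardize(batch: List[Sample]) -> List[Sample]:
    # Cloud stage: needs global statistics, so it runs last and centrally.
    vals = [s["value"] for s in batch]
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5 or 1.0
    return [{**s, "value": (s["value"] - mean) / sd} for s in batch]

# Plans are keyed by sensor type, not by application, so the early stages
# can run before any downstream application is known.
PLANS: Dict[str, List[Step]] = {"temperature": [dedupe, fill_missing, standardize]}

def run_stage(sensor: str, batch: List[Sample], start: int, end: int) -> List[Sample]:
    # Each stage executes its slice of the plan and hands the batch onward.
    for step in PLANS[sensor][start:end]:
        batch = step(batch)
    return batch

batch: List[Sample] = [{"value": 21.0}, {"value": 21.0}, {"value": None}]
batch = run_stage("temperature", batch, 0, 1)   # edge stage 1 (on-device)
batch = run_stage("temperature", batch, 1, 2)   # edge stage 2 (gateway)
print(run_stage("temperature", batch, 2, 3))    # cloud stage
```

Keying plans by sensor type rather than by application is what allows the early stages to improve data quality before any consuming application is fixed, which is the claim the abstract highlights.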
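As referenced in the Locust entry above: Locust itself is written in Elixir, so the following Python asyncio sketch only illustrates the harness idea of running many lightweight node tasks on one machine with an adjustable churn rate. The node behaviour is a stub rather than a Kademlia implementation, and CHURN_RATE and NUM_NODES are assumed knobs, not Locust parameters.

```python
# Illustration of the experiment-harness idea only (many concurrent nodes
# plus adjustable churn); not Locust's Elixir implementation and not a DHT.
import asyncio
import random

CHURN_RATE = 0.05   # fraction of nodes replaced per tick (assumed, adjustable knob)
NUM_NODES = 400     # scaled down from the ~4,000 nodes the paper reports per machine

async def dht_node(node_id: int) -> None:
    # Stand-in for a full DHT node: loops forever doing placeholder "maintenance".
    while True:
        await asyncio.sleep(random.uniform(0.5, 1.5))

async def churn(nodes: list, ticks: int) -> None:
    # Each tick, a CHURN_RATE fraction of nodes leaves and is replaced by fresh joins.
    for _ in range(ticks):
        await asyncio.sleep(1.0)
        for i in random.sample(range(len(nodes)), int(CHURN_RATE * len(nodes))):
            nodes[i].cancel()                             # node leaves the network
            nodes[i] = asyncio.create_task(dht_node(i))   # a replacement node joins

async def main() -> None:
    nodes = [asyncio.create_task(dht_node(i)) for i in range(NUM_NODES)]
    await churn(nodes, ticks=3)
    for task in nodes:
        task.cancel()
    await asyncio.gather(*nodes, return_exceptions=True)
    print(f"ran {NUM_NODES} nodes for 3 churn ticks at churn rate {CHURN_RATE}")

asyncio.run(main())
```

Running every node as a cheap cooperative task in one OS process is what makes thousands of nodes per machine feasible; a real harness would additionally wire the nodes together so lookups traverse an actual overlay.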