References of "Tawakuli, Amal 50027125"
     in
Bookmark and Share    
Full Text
Peer Reviewed
See detailSynchronized Preprocessing of Sensor Data
Tawakuli, Amal UL; Kaiser, Daniel UL; Engel, Thomas UL

in 2020 IEEE International Conference on Big Data (2021, March 19)

Sensor data whether collected for machine learning, deep learning or other applications must be preprocessed to fit input requirements or improve performance and accuracy. Data preparation is an expensive ... [more ▼]

Sensor data whether collected for machine learning, deep learning or other applications must be preprocessed to fit input requirements or improve performance and accuracy. Data preparation is an expensive, resource consuming and complex phase often performed centrally on raw data for a specific application. The dataflow between the edge and the cloud can be enhanced in terms of efficiency, reliability and lineage by preprocessing the datasets closer to their data sources. We propose a dedicated data preprocessing framework that distributes preprocessing tasks between a cloud stage and two edge stages to create a dataflow with progressively improving quality. The framework handles heterogenous data and dynamic preprocessing plans simultaneously targeting diverse applications and use cases from different domains. Each stage autonomously executes sensor specific preprocessing plans in parallel while synchronizing the progressive execution and dynamic updates of the preprocessing plans with the other stages. Our approach minimizes the workload on central infrastructures and reduces the resources used for transferring raw data from the edge. We also demonstrate that preprocessing data can be sensor specific rather than application specific and thus can be performed prior to knowing a specific application. [less ▲]

Detailed reference viewed: 47 (3 UL)
Full Text
Peer Reviewed
See detailAIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing
Ellampallil Venugopal, Vinu UL; Theobald, Martin UL; Chaychi, Samira UL et al

in AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing (2020, September 01)

Distributed Stream Processing Engines (DSPEs) are currently among the most emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow ... [more ▼]

Distributed Stream Processing Engines (DSPEs) are currently among the most emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. In this paper, we describe the architecture of our AIR engine, which is designed from scratch in C++ using the Message Passing Interface (MPI), pthreads for multithreading, and is directly deployed on top of a common HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding protocol (referred to as “Asynchronous Iterative Routing”), which facilitates a direct and asynchronous communication among all worker nodes and thereby completely avoids any additional communication overhead with a dedicated master node. With its unique design, AIR fills the gap between the prevalent scale-out (but Java-based) architectures like Apache Spark and Flink, on one hand, and recent scale-up (and C++ based) prototypes such as StreamBox and PiCo, on the other hand. Our experiments over various benchmark settings confirm that AIR performs as good as the best scale-up SPEs on a single-node setup, while it outperforms existing scale-out DSPEs in terms of processing latency and sustainable throughput by a factor of up to 15 in a distributed setting. [less ▲]

Detailed reference viewed: 53 (5 UL)
Full Text
Peer Reviewed
See detailBig Automotive Data Preprocessing: A Three Stages Approach
Tawakuli, Amal UL; Kaiser, Daniel UL; Engel, Thomas UL

Poster (2019, October 08)

The automotive industry generates large datasets of various formats, uncertainties and frequencies. To exploit Automotive Big Data, the data needs to be connected, fused and preprocessed to quality ... [more ▼]

The automotive industry generates large datasets of various formats, uncertainties and frequencies. To exploit Automotive Big Data, the data needs to be connected, fused and preprocessed to quality datasets before being used for production and business processes. Data preprocessing tasks are typically expensive, tightly coupled with their intended AI algorithms and are done manually by domain experts. Hence there is a need to automate data preprocessing to seamlessly generate cleaner data. We intend to introduce a generic data preprocessing framework that handles vehicle-to-everything (V2X) data streams and dynamic updates. We intend to decentralize and automate data preprocessing by leveraging edge computing with the objective of progressively improving the quality of the dataflow within edge components (vehicles) and onto the cloud. [less ▲]

Detailed reference viewed: 137 (8 UL)