Tawakuli, Amal
in 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall) (2022, September)
Vehicles have transformed into sophisticated computing machines that not only serve the objective of transportation from point A to point B but also serve other objectives, including an improved experience and a safer, automated, more efficient and sustainable journey. With such sophistication come complex applications and enormous volumes of data generated from diverse types of vehicle sensors and components. Automotive data is not sedentary but moves from the edge (the vehicle) to the cloud (e.g., the infrastructure of vehicle manufacturers, national highway agencies, insurance companies, etc.). The exponential increase in the volume and variety of data generated in modern vehicles far exceeds the rate of infrastructure scaling and expansion. To mitigate this challenge, the computational and storage capacities of vehicle components can be leveraged to perform in-vehicle operations on the data, either to prepare and transform (preprocess) the data or to extract information from (process) it. This paper focuses on distributing data preprocessing to the vehicle and highlights the benefits and impact of this distribution, including on the consumption of resources (e.g., energy).

Tawakuli, Amal
in The Fifth International Workshop on Data: Acquisition To Analysis (2022)
Data preprocessing is an integral part of Artificial Intelligence (AI) pipelines.
It transforms raw data into input data that fulfills algorithmic criteria and improves prediction accuracy. As the adoption of the Internet of Things (IoT) gains momentum, the data volume generated at the edge is increasing exponentially, far exceeding any expansion of infrastructure. Social responsibilities and regulations (e.g., the GDPR) must also be adhered to when handling IoT data. In addition, we are currently witnessing a shift towards distributing AI to the edge. These reasons render the distribution of data preprocessing to the edge an urgent requirement. In this paper, we introduce a modern data preprocessing framework that consists of two main parts. The first part is a design tool that reduces the complexity and costs of the data preprocessing phase for AI via generalization and normalization. The design tool is a standard template that maps specific techniques to abstract categories and highlights dependencies between them. In addition, it presents a holistic notion of data preprocessing that is not limited to data cleaning. The second part is an IoT tool that adopts the edge-cloud collaboration model to progressively improve the quality of the data. It includes a synchronization mechanism that ensures adaptation to changes in data characteristics and a coordination mechanism that ensures correct and complete execution of preprocessing plans between the cloud and the edge. The paper includes an empirical analysis of the framework using a developed prototype and an automotive use case. Our results demonstrate reductions in resource consumption (e.g., energy, bandwidth) while maintaining the value and integrity of the data.

Tawakuli, Amal
in 2020 IEEE International Conference on Big Data (2021, March 19)
Sensor data, whether collected for machine learning, deep learning or other applications, must be preprocessed to fit input requirements or improve performance and accuracy.
Data preparation is an expensive, resource-consuming and complex phase, often performed centrally on raw data for a specific application. The dataflow between the edge and the cloud can be enhanced in terms of efficiency, reliability and lineage by preprocessing the datasets closer to their data sources. We propose a dedicated data preprocessing framework that distributes preprocessing tasks between a cloud stage and two edge stages to create a dataflow with progressively improving quality. The framework handles heterogeneous data and dynamic preprocessing plans, simultaneously targeting diverse applications and use cases from different domains. Each stage autonomously executes sensor-specific preprocessing plans in parallel while synchronizing the progressive execution and dynamic updates of the preprocessing plans with the other stages. Our approach minimizes the workload on central infrastructures and reduces the resources used for transferring raw data from the edge. We also demonstrate that preprocessing data can be sensor specific rather than application specific and thus can be performed prior to knowing a specific application.

Ellampallil Venugopal, Vinu
in AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing (2020, September 01)
Distributed Stream Processing Engines (DSPEs) are currently among the most rapidly emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. In this paper, we describe the architecture of our AIR engine, which is designed from scratch in C++ using the Message Passing Interface (MPI) and pthreads for multithreading, and is deployed directly on top of a common HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding protocol (referred to as "Asynchronous Iterative Routing") which facilitates direct, asynchronous communication among all worker nodes and thereby completely avoids any additional communication overhead with a dedicated master node. With its unique design, AIR fills the gap between the prevalent scale-out (but Java-based) architectures like Apache Spark and Flink, on the one hand, and recent scale-up (and C++-based) prototypes such as StreamBox and PiCo, on the other. Our experiments over various benchmark settings confirm that AIR performs as well as the best scale-up SPEs in a single-node setup, while it outperforms existing scale-out DSPEs in terms of processing latency and sustainable throughput by a factor of up to 15 in a distributed setting.

Tawakuli, Amal
Poster (2019, October 08)
The automotive industry generates large datasets of various formats, uncertainties and frequencies. To exploit Automotive Big Data, the data needs to be connected, fused and preprocessed into quality datasets before being used for production and business processes.
Data preprocessing tasks are typically expensive, tightly coupled with their intended AI algorithms, and performed manually by domain experts. Hence, there is a need to automate data preprocessing to seamlessly generate cleaner data. We intend to introduce a generic data preprocessing framework that handles vehicle-to-everything (V2X) data streams and dynamic updates. We intend to decentralize and automate data preprocessing by leveraging edge computing, with the objective of progressively improving the quality of the dataflow within edge components (vehicles) and onward to the cloud.
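A recurring idea across the abstracts above is splitting a preprocessing plan between an edge stage and a cloud stage so that data quality improves progressively along the dataflow, with a versioned plan keeping the stages synchronized. The minimal sketch below only illustrates that idea; it is not code from any of the listed papers, and every name in it (`PreprocessingPlan`, `run_stage`, the example steps) is invented for illustration:

```python
"""Illustrative sketch only: not code from the papers listed above."""
from dataclasses import dataclass, field
from typing import Callable

# A preprocessing step transforms a batch of sensor readings.
Step = Callable[[list[float]], list[float]]

@dataclass
class PreprocessingPlan:
    # A version number is one simple way the abstracts' synchronization
    # mechanism could detect that a plan changed (hypothetical design).
    version: int
    edge_steps: list[Step] = field(default_factory=list)
    cloud_steps: list[Step] = field(default_factory=list)

def drop_out_of_range(lo: float, hi: float) -> Step:
    # Edge-side cleaning: discard readings outside the sensor's valid range,
    # so invalid raw data never consumes bandwidth to the cloud.
    return lambda xs: [x for x in xs if lo <= x <= hi]

def min_max_normalize(lo: float, hi: float) -> Step:
    # Cloud-side transformation: scale the surviving readings into [0, 1].
    return lambda xs: [(x - lo) / (hi - lo) for x in xs]

def run_stage(steps: list[Step], data: list[float]) -> list[float]:
    # Each stage applies its share of the plan in order.
    for step in steps:
        data = step(data)
    return data

plan = PreprocessingPlan(
    version=1,
    edge_steps=[drop_out_of_range(0.0, 100.0)],
    cloud_steps=[min_max_normalize(0.0, 100.0)],
)

raw = [12.0, -5.0, 50.0, 120.0, 100.0]       # raw readings at the edge
edge_out = run_stage(plan.edge_steps, raw)    # only valid readings leave the edge
cloud_out = run_stage(plan.cloud_steps, edge_out)
print(edge_out)   # [12.0, 50.0, 100.0]
print(cloud_out)  # [0.12, 0.5, 1.0]
```

The sketch shows why the split pays off: the cheap cleaning step runs where the data originates, so less (and cleaner) data is transferred, while the cloud completes the plan on an already-improved stream.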