Doctoral thesis (Dissertations and theses)
Multi-objective Robust Machine Learning For Critical Systems With Scarce Data
GHAMIZI, Salah
2022
 

Files


Full Text
PhD_Thesis_Salah_GHAMIZI-1.pdf
Publisher postprint (12.48 MB)
Download

All documents in ORBilu are protected by a user license.

Send to



Details



Keywords :
Multi-objective; Robustness; Machine Learning
Abstract :
[en] With the heavy reliance on Information Technologies in every aspect of our daily lives, Machine Learning (ML) models have become a cornerstone of these technologies’ rapid growth and pervasiveness. In particular, the most critical and fundamental technologies that handle our economic systems, transportation, health, and even privacy. However, while these systems are becoming more effective, their complexity inherently decreases our ability to understand, test, and assess the dependability and trustworthiness of these systems. This problem becomes even more challenging under a multi-objective framework: When the ML model is required to learn multiple tasks together, behave under constrained inputs or fulfill contradicting concomitant objectives. Our dissertation focuses on the context of robust ML under limited training data, i.e., use cases where it is costly to collect additional training data and/or label it. We will study this topic under the prism of three real use cases: Fraud detection, pandemic forecasting, and chest x-ray diagnosis. Each use-case covers one of the challenges of robust ML with limited data, (1) robustness to imperceptible perturbations, or (2) robustness to confounding variables. We provide a study of the challenges for each case and propose novel techniques to achieve robust learning. As the first contribution of this dissertation, we collaborate with BGL BNP Paribas. We demonstrate that their overdraft and fraud detection systems are prima facie robust to adversarial attacks because of the complexity of their feature engineering and domain constraints. However, we show that gray-box attacks that take into account domain knowledge can easily break their defense. We propose, CoEva2 adversarial fine-tuning, a new defense mechanism based on multi-objective evolutionary algorithms to augment the training data and mitigate the system’s vulnerabilities. Next, we investigate how domain knowledge can protect against adversarial attacks through multi-task learning. We show that adding domain constraints in the form of additional tasks can significantly improve the robustness of models to adversarial attacks, particularly for the robot navigation use case. We propose a new set of adaptive attacks and demonstrate that adversarial training combined with such attacks can improve robustness. While the raw data available in the BGL or Robot Navigation is vast, it is heavily cleaned, feature-engineered, and annotated by domain experts (which are expensive), and the end training data is scarce. In contrast, raw data is scarce when dealing with an outbreak, and designing robust ML systems to predict, forecast, and recommend mitigation policies is challenging. In particular, for small countries like Luxembourg. Contrary to common techniques that forecast new cases based on previous data in time series, we propose a novel surrogate-based optimization as an integrated loop. It combines a neural network prediction of the infection rate based on mobility attributes and a model-based simulation that predicts the cases and deaths. Our approach has been used by the Luxembourg government’s task force and has been recognized with a best paper award at KDD2020. Our following work focuses on the challenges that pose cofounding factors to the robustness and generalization of Chest X-ray (CXR) classification. We first investigate the robustness and generalization of multi-task models, then demonstrate that multi-task learning, leveraging the cofounding variables, can significantly improve the generalization and robustness of CXR classification models. Our results suggest that task augmentation with additional knowledge (like extraneous variables) outperforms state-of-art data augmentation techniques in improving test and robust performances. Overall, this dissertation provides insights into the importance of domain knowledge in the robustness and generalization of models. It shows that instead of building data-hungry ML models, particularly for critical systems, a better understanding of the system as a whole and its domain constraints yields improved robustness and generalization performances. This dissertation also proposes theorems, algorithms, and frameworks to effectively assess and improve the robustness of ML systems for real-world cases and applications.
Disciplines :
Computer science
Author, co-author :
GHAMIZI, Salah ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Language :
English
Title :
Multi-objective Robust Machine Learning For Critical Systems With Scarce Data
Defense date :
07 September 2022
Number of pages :
236
Institution :
Unilu - University of Luxembourg, Luxembourg
Degree :
Docteur de l’Université du Luxembourg en Informatique
Promotor :
President :
Jury member :
CORDY, Maxime  
Zhang, Jingfeng
Jean-Philippe, Thiran
Focus Area :
Security, Reliability and Trust
Available on ORBilu :
since 27 September 2022

Statistics


Number of views
175 (17 by Unilu)
Number of downloads
259 (33 by Unilu)

Bibliography


Similar publications



Contact ORBilu