References of "Boytsov, Andrey"
Full Text
Peer Reviewed
Towards Refined Classifications Driven by SHAP Explanations
Arslan, Yusuf UL; Lebichot, Bertrand UL; Allix, Kevin UL et al

in Holzinger, Andreas; Kieseberg, Peter; Tjoa, A. Min (Eds.) et al Machine Learning and Knowledge Extraction (2022, August 11)

Machine Learning (ML) models are inherently approximate; as a result, the predictions of an ML model can be wrong. In applications where errors can jeopardize a company's reputation, human experts often have to manually check the alarms raised by ML models, as wrong or delayed decisions can have a significant business impact. These experts often use interpretable ML tools to verify predictions. However, post-prediction verification is also costly. In this paper, we hypothesize that the outputs of interpretable ML tools, such as SHAP explanations, can be exploited by machine learning techniques to improve classifier performance. By doing so, the cost of the post-prediction analysis can be reduced. To confirm our intuition, we conduct several experiments where we use SHAP explanations directly as new features. In particular, by considering nine datasets, we first compare the performance of these "SHAP features" against traditional "base features" on binary classification tasks. Then, we add a second-step classifier relying on SHAP features, with the goal of reducing the false-positive and false-negative results of typical classifiers. We show that SHAP explanations used as SHAP features can help to improve classification performance, especially for false-negative reduction.

Full Text
Peer Reviewed
LuxemBERT: Simple and Practical Data Augmentation in Language Model Pre-Training for Luxembourgish
Lothritz, Cedric UL; Lebichot, Bertrand UL; Allix, Kevin UL et al

in Proceedings of the Language Resources and Evaluation Conference, 2022 (2022, June)

Pre-trained Language Models such as BERT have become ubiquitous in NLP, where they have achieved state-of-the-art performance in most NLP tasks. While these models are readily available for English and other widely spoken languages, they remain scarce for low-resource languages such as Luxembourgish. In this paper, we present LuxemBERT, a BERT model for the Luxembourgish language that we create using the following approach: we augment the pre-training dataset by considering text data from a closely related language that we partially translate using a simple and straightforward method. We are then able to produce the LuxemBERT model, which we show to be effective for various NLP tasks: it outperforms a simple baseline built with the available Luxembourgish text data as well as the multilingual mBERT model, which is currently the only option for transformer-based language models in Luxembourgish. Furthermore, we present datasets for various downstream NLP tasks that we created for this study and will make available to researchers on request.

Full Text
Peer Reviewed
Exploiting Prototypical Explanations for Undersampling Imbalanced Datasets
Arslan, Yusuf UL; Allix, Kevin UL; Lefebvre, Clément et al

in 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA) (2022)

Among the reported solutions to the class imbalance issue, undersampling approaches, which remove insignificant samples from the majority class, are quite prevalent. However, undersampling approaches may discard significant patterns in the datasets. A prototype, which is always an actual sample from the data, represents a group of samples in the dataset. Our hypothesis is that prototypes can restore the significant patterns that are discarded by undersampling methods and help to improve model performance. To confirm our intuition, we combine prototypes with undersampling methods in the machine learning pipeline. We show that there is a statistically significant difference between the AUPR and AUROC results of undersampling methods and those of our approach.
