Reference : Machine Learning Techniques for Suspicious Transaction Detection and Analysis
Dissertations and theses : Doctoral thesis
Engineering, computing & technology : Computer science
Computational Sciences
Camino, Ramiro Daniel [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC)]
University of Luxembourg, Luxembourg, Luxembourg
Docteur de l'Université du Luxembourg en Informatique (Doctor of the University of Luxembourg in Computer Science)
State, Radu
Frank, Raphaël
Aouada, Djamila
Fernández Slezak, Diego
Hammerschmidt, Christian
[en] machine learning ; fraud detection ; deep generative models ; anti-money laundering ; ripple ; ethereum
[en] Financial services must monitor their transactions to avoid being used for money laundering and to combat the financing of terrorism.
Initially, organizations in charge of fraud regulation were only concerned about financial institutions such as banks.
However, nowadays, the Fintech industry, online businesses, or platforms involving virtual assets can also be affected by similar criminal schemes.
Regardless of the differences between the entities mentioned above, malicious activities affecting them share many common patterns.
This dissertation's first goal is to compile and compare existing studies involving machine learning to detect and analyze suspicious transactions.
The second goal is to synthesize the methodologies identified under the first goal so that different use cases can be tackled in an organized manner.
Finally, the third goal is to assess the applicability of deep generative models for enhancing existing solutions.

In the first part of the thesis, we propose an unsupervised methodology for detecting suspicious transactions applied to two case studies.
One is related to transactions from a money remittance network, and the other is related to a novel payment network based on distributed ledger technologies.
Anomaly detection algorithms are applied to rank user accounts based on recency, frequency, and monetary features.
The results are manually validated by domain experts, confirming known scenarios and finding unexpected new cases.
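
The ranking idea described above can be illustrated with a minimal sketch. It assumes transactions arrive as (account, timestamp, amount) tuples and uses a simple median/MAD deviation score as a stand-in anomaly detector; the abstract does not name the algorithms actually used, and the function names here are hypothetical.

```python
from collections import defaultdict
from statistics import median


def rfm_features(transactions, now):
    """Build (recency, frequency, monetary) per account.

    transactions: iterable of (account_id, timestamp, amount) tuples
    (an assumed input layout). now: reference timestamp in the same unit.
    """
    last_seen = {}
    count = defaultdict(int)
    total = defaultdict(float)
    for account, timestamp, amount in transactions:
        last_seen[account] = max(last_seen.get(account, timestamp), timestamp)
        count[account] += 1
        total[account] += amount
    return {a: (now - last_seen[a], count[a], total[a]) for a in count}


def robust_scores(features):
    """Score each account by its summed absolute deviation from the median,
    scaled per feature by the median absolute deviation (MAD).
    This is a stand-in detector, not the method used in the thesis."""
    accounts = list(features)
    scores = dict.fromkeys(accounts, 0.0)
    for j in range(3):
        column = [features[a][j] for a in accounts]
        center = median(column)
        # Fall back to 1.0 when the MAD is zero to avoid division by zero.
        mad = median(abs(v - center) for v in column) or 1.0
        for a in accounts:
            scores[a] += abs(features[a][j] - center) / mad
    return scores


def rank_accounts(transactions, now):
    """Return account ids ordered from most to least anomalous."""
    scores = robust_scores(rfm_features(transactions, now))
    return sorted(scores, key=scores.get, reverse=True)
```

Accounts at the top of the returned list would be the ones handed to domain experts for manual review.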

In the second part, we carry out an analogous analysis employing supervised methods, along with a case study where we classify Ethereum smart contracts into honeypots and non-honeypots.
We take features from the source code, the transaction data, and the funds' flow characterization.
The proposed classification models proved to generalize well to unseen honeypot instances and techniques and allowed us to characterize previously unknown techniques.

In the third part, we analyze the challenges that tabular data, the type of data used to represent financial transactions in the previous two parts, brings to the domain of deep generative models.
We propose a new model architecture by adapting state-of-the-art methods to output multiple variables from mixed-type distributions.
Additionally, we extend the evaluation metrics used in the literature to the multi-output setting, and we show empirically that our approach outperforms the existing methods.
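
As a rough illustration of what a multi-output layer for mixed-type tabular data can look like, the sketch below splits a flat output vector into one block per variable and normalizes only the categorical blocks. The encoding convention (size 1 for a numerical variable, size k for a categorical variable with k categories) and the function names are assumptions made for this example, not details taken from the thesis.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_output(raw, variable_sizes):
    """Split a flat output vector into one block per variable.

    variable_sizes: list of ints; 1 marks a numerical variable, k > 1 marks a
    categorical variable with k categories (an assumed encoding convention).
    """
    if sum(variable_sizes) != len(raw):
        raise ValueError("variable sizes do not match output length")
    outputs, start = [], 0
    for size in variable_sizes:
        block = raw[start:start + size]
        # Numerical values pass through; categorical logits become probabilities.
        outputs.append(block[0] if size == 1 else softmax(block))
        start += size
    return outputs
```

For example, `multi_output(raw, [1, 2, 2])` treats the first entry of `raw` as a numerical value and the remaining four as two binary categorical variables.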

Finally, in the last part, we extend the work from the third part by applying the presented models to enhance the classification tasks from the second part, which commonly exhibit severe class imbalance.
We introduce a multi-input architecture that extends models alongside our previously proposed multi-output architecture.
We compare three techniques for sampling from deep generative models, and we define a transparent and fair large-scale experimental protocol together with visual analysis tools.
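
The general idea of using a generative model to counter class imbalance can be sketched as follows. This example fits an independent Gaussian per feature to the minority class and samples synthetic rows until the two classes are balanced. The Gaussian is only a simple stand-in for the deep generative models studied in the thesis, the code assumes a binary problem with purely numerical features, and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def fit_gaussian(rows):
    """Fit an independent Gaussian per column; returns [(mean, std), ...]."""
    columns = list(zip(*rows))
    return [(mean(c), pstdev(c)) for c in columns]


def sample_rows(params, n, rng):
    """Draw n synthetic rows, one Gaussian draw per column."""
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


def oversample_minority(rows, labels, minority_label, seed=0):
    """Append synthetic minority rows until both classes have equal size.

    Assumes exactly two classes, so every non-minority row is majority.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    deficit = (len(rows) - len(minority)) - len(minority)
    if not minority or deficit <= 0:
        return list(rows), list(labels)
    synthetic = sample_rows(fit_gaussian(minority), deficit, rng)
    return list(rows) + synthetic, list(labels) + [minority_label] * deficit
```

A classifier would then be trained on the augmented rows while being evaluated only on untouched real data, so that synthetic samples never leak into the test set.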

We showed that general machine learning detection and visualization techniques could help address the fraud detection domain's many challenges.
In particular, deep generative models can add value to the classification task given the imbalanced nature of the fraudulent class, at the cost of additional implementation and time complexity.
Promising future applications for deep generative models include missing data imputation and the sharing of synthetic data or data generators that preserve privacy constraints.
Researchers ; Professionals ; Students
FnR ; FNR11614300 > Ramiro Daniel Camino > > Advanced Market Abuse Detection with Big Data > 01/03/2017 > 14/09/2020 > 2017
