Dissertations and theses : Doctoral thesis
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/22367
Example-Dependent Cost-Sensitive Classification with Applications in Financial Risk Modeling and Marketing Analytics
English
Correa Bahnsen, Alejandro [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)]
15-Sep-2015
University of Luxembourg, Luxembourg
Docteur en Informatique
142
Ottersten, Björn
Aouada, Djamila
Le Traon, Yves
De Moor, Bart
Bontempi, Gianluca
[en] Cost-Sensitive Classification ; Machine Learning ; Fraud Detection
[en] Several real-world binary classification problems are example-dependent cost-sensitive in nature: the costs of misclassification vary between examples, not only between classes. However, standard binary classification methods do not take these costs into account and assume a constant cost for misclassification errors. This assumption is unrealistic in many real-world applications. For example, in credit card fraud detection, failing to detect a fraudulent transaction may have an economic impact ranging from a few Euros to thousands of Euros, depending on the particular transaction and card holder. In churn modeling, a model is used to predict which customers are more likely to abandon a service provider. In this context, failing to identify a profitable churner has a significantly different economic result from failing to identify an unprofitable one. Similarly, in direct marketing, wrongly predicting that a customer will not accept an offer when in fact they will may have a different financial impact from one customer to another, as not all customers generate the same profit. Lastly, in credit scoring, accepting loan applications from bad customers does not always produce the same economic loss, since customers have different credit lines and therefore different profit. Accordingly, the goal of this thesis is to provide an in-depth analysis of example-dependent cost-sensitive classification. We analyze four real-world classification problems, namely credit card fraud detection, credit scoring, churn modeling and direct marketing. For each problem, we propose an example-dependent cost-sensitive evaluation measure. We propose four example-dependent cost-sensitive methods. The first is a cost-sensitive Bayes minimum risk classifier, which consists of quantifying tradeoffs between various decisions using probabilities and the costs that accompany such decisions. Second, we propose a cost-sensitive logistic regression technique. 
This algorithm is based on a new logistic regression cost function, one that takes into account the real costs due to misclassification and correct classification. Third, we propose a cost-sensitive decision tree algorithm that incorporates the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criterion. Lastly, we define an example-dependent cost-sensitive framework for ensembles of decision trees. It is based on training example-dependent cost-sensitive decision trees using four different random inducer methods and then blending them using three different combination approaches. Moreover, we present the library CostCla, developed as part of the thesis. This library is an open-source implementation of all the algorithms covered in this manuscript. Finally, the experimental results show the importance of using the real example-dependent financial costs associated with real-world applications. We found significant differences between the results obtained when evaluating a model with a traditional cost-insensitive measure, such as accuracy or F1 score, and those obtained when using the financial savings. Moreover, the results show that the proposed algorithms achieve better results on all databases, in the sense of higher savings.
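The Bayes minimum risk decision and the savings measure named in the abstract can be illustrated with a small sketch. This is not the thesis's code and does not use the CostCla API; the function names, the toy fraud data, the fixed administrative cost, and the savings formula (cost reduction relative to the cheaper of "predict all negative" and "predict all positive") are assumptions made for illustration only.

```python
import numpy as np


def bayes_minimum_risk(p_pos, cost_fp, cost_fn, cost_tp=0.0, cost_tn=0.0):
    """Predict 1 where the expected cost of predicting positive is lower.

    Costs may be scalars or per-example arrays (example-dependent costs).
    """
    p_pos = np.asarray(p_pos, dtype=float)
    risk_if_pos = p_pos * cost_tp + (1.0 - p_pos) * cost_fp
    risk_if_neg = p_pos * cost_fn + (1.0 - p_pos) * cost_tn
    return (risk_if_pos < risk_if_neg).astype(int)


def total_cost(y_true, y_pred, cost_fp, cost_fn, cost_tp=0.0, cost_tn=0.0):
    """Sum the cost incurred on each example given its true and predicted label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_example = (
        y_true * (y_pred * cost_tp + (1 - y_pred) * cost_fn)
        + (1 - y_true) * (y_pred * cost_fp + (1 - y_pred) * cost_tn)
    )
    return float(np.sum(per_example))


def savings(y_true, y_pred, cost_fp, cost_fn, cost_tp=0.0, cost_tn=0.0):
    """Relative cost reduction versus the cheaper trivial classifier (assumed definition)."""
    y_true = np.asarray(y_true)
    args = (cost_fp, cost_fn, cost_tp, cost_tn)
    base = min(
        total_cost(y_true, np.zeros_like(y_true), *args),
        total_cost(y_true, np.ones_like(y_true), *args),
    )
    if base == 0:
        raise ValueError("baseline cost is zero; savings is undefined")
    return (base - total_cost(y_true, y_pred, *args)) / base


# Hypothetical fraud example: a missed fraud costs the transaction amount,
# any alert (true or false) costs a fixed administrative fee.
amounts = np.array([10.0, 2000.0, 50.0, 500.0])
admin_cost = 5.0
p_fraud = np.array([0.30, 0.30, 0.05, 0.02])  # assumed model probabilities
y_true = np.array([0, 1, 0, 0])
costs = dict(cost_fp=admin_cost, cost_fn=amounts, cost_tp=admin_cost, cost_tn=0.0)

y_bmr = bayes_minimum_risk(p_fraud, **costs)   # cost-aware decision
y_half = (p_fraud >= 0.5).astype(int)          # cost-insensitive 0.5 threshold

print("BMR predictions:", y_bmr, "savings:", savings(y_true, y_bmr, **costs))
print("0.5 threshold  :", y_half, "savings:", savings(y_true, y_half, **costs))
```

In this toy data the two decision rules disagree because the same fraud probability implies very different expected losses for a 10 Euro and a 2000 Euro transaction, which is the situation the abstract describes.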
Fonds National de la Recherche - FnR
Source code is available at https://github.com/albahnsen/CostSensitiveClassification
FnR ; FNR5942749 > Alejandro Correa Bahnsen > PROTECT > Prevention of Fraud by Pattern Detection in Credit Card Transactions > 01/04/2013 > 31/05/2015 > 2013

File(s) associated to this reference

Fulltext file(s):

File: Thesis_ExampleDependentCostSensitiveClassification.pdf
Version: Publisher postprint
Size: 2.48 MB
Access: Open access

All documents in ORBilu are protected by a user license.