References of "Correa Bahnsen, Alejandro 50001386"
Peer Reviewed
Example-Dependent Cost-Sensitive Decision Trees
Correa Bahnsen, Alejandro UL; Aouada, Djamila UL; Ottersten, Björn UL

in Expert Systems with Applications (2015), 42(19), 6609-6619

Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples. However, standard classification methods do not take these costs into account and assume a constant cost of misclassification errors. State-of-the-art example-dependent cost-sensitive techniques only introduce the costs either before or after training, leaving room to investigate algorithms that take the real financial example-dependent costs into account during training. In this paper, we propose an example-dependent cost-sensitive decision tree algorithm that incorporates the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criterion. We then evaluate the proposed method on three databases from three real-world applications: credit card fraud detection, credit scoring and direct marketing. The results show that the proposed algorithm is the best-performing method on all databases. Furthermore, compared with a standard decision tree, our method builds significantly smaller trees in only a fifth of the time while achieving superior performance measured by cost savings. The result is a method that not only produces more business-oriented results but also creates simpler models that are easier to analyze.
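
A cost-based impurity can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that the impurity of a node is the cost of the cheaper constant prediction for that node (label every example negative, or every example positive), and that a split is scored by the relative reduction of that impurity. The tuple order (C_TP, C_FP, C_FN, C_TN), the function names and the toy data are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical convention: each example carries (C_TP, C_FP, C_FN, C_TN).
CostRow = Tuple[float, float, float, float]


def total_cost(y: Sequence[int], pred: Sequence[int], costs: Sequence[CostRow]) -> float:
    """Sum of the example-dependent cost of every 0/1 prediction."""
    total = 0.0
    for yi, ci, (c_tp, c_fp, c_fn, c_tn) in zip(y, pred, costs):
        if yi == 1:
            total += c_tp if ci == 1 else c_fn
        else:
            total += c_fp if ci == 1 else c_tn
    return total


def cost_impurity(y: Sequence[int], costs: Sequence[CostRow]) -> float:
    """Assumed impurity: cost of the cheaper constant prediction for the node."""
    n = len(y)
    return min(total_cost(y, [0] * n, costs), total_cost(y, [1] * n, costs))


def split_gain(y: Sequence[int], costs: Sequence[CostRow],
               feature: Sequence[float], threshold: float) -> float:
    """Relative impurity reduction of the split feature <= threshold."""
    left = [i for i, v in enumerate(feature) if v <= threshold]
    right = [i for i, v in enumerate(feature) if v > threshold]
    parent = cost_impurity(y, costs)
    if parent == 0 or not left or not right:
        return 0.0  # nothing to gain, or a degenerate split
    children = sum(
        cost_impurity([y[i] for i in idx], [costs[i] for i in idx])
        for idx in (left, right)
    )
    return (parent - children) / parent


# Hypothetical toy data: flagging costs 2, a missed fraud costs its amount.
amounts = [100.0, 5.0, 80.0, 10.0]
labels = [1, 0, 1, 0]
costs = [(2.0, 2.0, a, 0.0) for a in amounts]
print(cost_impurity(labels, costs))              # 8.0
print(split_gain(labels, costs, amounts, 50.0))  # 0.5
```

Unlike Gini or entropy, this impurity is measured in money, so a split is preferred when it separates the examples whose misclassification is expensive, not merely when it separates the classes.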

Example-Dependent Cost-Sensitive Classification with Applications in Financial Risk Modeling and Marketing Analytics
Correa Bahnsen, Alejandro UL

Doctoral thesis (2015)

Several real-world binary classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples and not only between classes. However, standard binary classification methods do not take these costs into account and assume a constant cost of misclassification errors. This approach is not realistic in many real-world applications. For example, in credit card fraud detection, failing to detect a fraudulent transaction may have an economic impact ranging from a few to thousands of Euros, depending on the particular transaction and card holder. In churn modeling, a model is used to predict which customers are most likely to abandon a service provider. In this context, failing to identify a profitable churner has a significantly different economic result than failing to identify an unprofitable one. Similarly, in direct marketing, wrongly predicting that a customer will not accept an offer when in fact they will may have different financial impacts, as not all customers generate the same profit. Lastly, in credit scoring, granting loans to bad customers does not always cause the same economic loss, since customers have different credit lines and therefore different profits. Accordingly, the goal of this thesis is to provide an in-depth analysis of example-dependent cost-sensitive classification. We analyze four real-world classification problems, namely credit card fraud detection, credit scoring, churn modeling and direct marketing. For each problem, we propose an example-dependent cost-sensitive evaluation measure. We propose four example-dependent cost-sensitive methods. The first is a cost-sensitive Bayes minimum risk classifier, which quantifies tradeoffs between various decisions using probabilities and the costs that accompany such decisions. Second, we propose a cost-sensitive logistic regression technique.
This algorithm is based on a new logistic regression cost function, one that takes into account the real costs due to misclassification and correct classification. Subsequently, we propose a cost-sensitive decision tree algorithm based on incorporating the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criterion. Lastly, we define an example-dependent cost-sensitive framework for ensembles of decision trees. It is based on training example-dependent cost-sensitive decision trees using four different random inducer methods and then blending them using three different combination approaches. Moreover, we present CostCla, a library developed as part of the thesis. This library is an open-source implementation of all the algorithms covered in this manuscript. Finally, the experimental results show the importance of using the real example-dependent financial costs associated with real-world applications. We found significant differences between evaluating a model with a traditional cost-insensitive measure, such as accuracy or F1 score, and evaluating it by financial savings. Moreover, the results show that the proposed algorithms achieve higher savings on all databases.
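
The gap between accuracy and financial savings can be illustrated with a small sketch. It assumes an example-dependent cost matrix with one (C_TP, C_FP, C_FN, C_TN) tuple per example, and defines savings as the cost reduction relative to the cheaper of two trivial classifiers ("predict all 0" and "predict all 1"). This is a plain-Python illustration, not the CostCla API; the function names and data are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical convention: each example carries (C_TP, C_FP, C_FN, C_TN).
CostRow = Tuple[float, float, float, float]


def cost(y: Sequence[int], pred: Sequence[int], costs: Sequence[CostRow]) -> float:
    """Total example-dependent cost of a set of 0/1 predictions."""
    total = 0.0
    for yi, ci, (c_tp, c_fp, c_fn, c_tn) in zip(y, pred, costs):
        total += yi * (ci * c_tp + (1 - ci) * c_fn) + (1 - yi) * (ci * c_fp + (1 - ci) * c_tn)
    return total


def savings(y: Sequence[int], pred: Sequence[int], costs: Sequence[CostRow]) -> float:
    """Cost reduction relative to the cheaper of 'predict all 0' and 'predict all 1'."""
    n = len(y)
    baseline = min(cost(y, [0] * n, costs), cost(y, [1] * n, costs))
    if baseline == 0:
        raise ValueError("savings is undefined when the baseline cost is zero")
    return (baseline - cost(y, pred, costs)) / baseline


def accuracy(y: Sequence[int], pred: Sequence[int]) -> float:
    return sum(int(a == b) for a, b in zip(y, pred)) / len(y)


# Hypothetical toy data: flagging costs 2, a missed fraud costs its amount.
labels = [1, 0, 1, 0]
costs = [(2.0, 2.0, a, 0.0) for a in [100.0, 5.0, 80.0, 10.0]]
model_a = [1, 0, 0, 0]  # misses the 80-Euro fraud
model_b = [1, 1, 1, 1]  # flags everything
print(accuracy(labels, model_a), savings(labels, model_a, costs))  # 0.75 -9.25
print(accuracy(labels, model_b), savings(labels, model_b, costs))  # 0.5 0.0
```

Model A is more accurate but loses money relative to the trivial baseline, because its single error is an expensive missed fraud.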

Peer Reviewed
A novel cost-sensitive framework for customer churn predictive modeling
Correa Bahnsen, Alejandro UL; Aouada, Djamila UL; Ottersten, Björn UL

in Decision Analytics (2015), 2(5)

Customer churn predictive modeling deals with predicting the probability of a customer defecting using historical, behavioral and socio-economic information. This tool is of great benefit to subscription-based companies, allowing them to maximize the results of retention campaigns. The problem of churn predictive modeling has been widely studied by the data mining and machine learning communities. It is usually tackled by using classification algorithms to learn the different patterns of both churners and non-churners. Nevertheless, current state-of-the-art classification algorithms are not well aligned with commercial goals, in the sense that the models fail to include the real financial costs and benefits during the training and evaluation phases. In the case of churn, evaluating a model based on a traditional measure such as accuracy or predictive power does not yield the best results when measured by the actual financial cost, i.e., the investment per subscriber in a loyalty campaign and the financial impact of failing to detect a real churner versus wrongly predicting a non-churner as a churner. In this paper, we present a new cost-sensitive framework for customer churn predictive modeling. First, we propose a new financial measure for evaluating the effectiveness of a churn campaign that takes into account the available portfolio of offers, their individual financial cost and the probability of offer acceptance depending on the customer profile. Then, using a real-world churn dataset, we compare different cost-insensitive and cost-sensitive classification algorithms and measure their effectiveness based on both their predictive power and their cost optimization. The results show that using a cost-sensitive approach yields an increase in cost savings of up to 26.4%.

Peer Reviewed
Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring
Correa Bahnsen, Alejandro UL; Aouada, Djamila UL; Ottersten, Björn UL

in 2014 13th International Conference on Machine Learning and Applications (2014, December 03)

Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples. Credit scoring is a typical example of cost-sensitive classification. However, it is usually tackled with methods that do not take into account the real financial costs associated with the lending business. In this paper, we propose a new example-dependent cost matrix for credit scoring. Furthermore, we propose an algorithm that introduces the example-dependent costs into a logistic regression. Using two publicly available datasets, we compare our proposed method against state-of-the-art example-dependent cost-sensitive algorithms. The results highlight the importance of using real financial costs. Moreover, the proposed cost-sensitive logistic regression achieves significant improvements in terms of savings.
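
One way to introduce example-dependent costs into a logistic regression is to replace the log-loss with the expected cost obtained when the predicted probability h stands in for the hard decision. The sketch below assumes that form of cost function and minimizes it with plain batch gradient descent; the optimizer, learning rate, epoch count, function names and data are all illustrative choices, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical convention: each example carries (C_TP, C_FP, C_FN, C_TN).
CostRow = Tuple[float, float, float, float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(theta: Sequence[float], x: Sequence[float]) -> float:
    """theta[0] is the intercept; theta[1:] are the feature weights."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def expected_cost(theta, X, y, costs: Sequence[CostRow]) -> float:
    """Average cost when the probability h replaces the hard 0/1 decision."""
    total = 0.0
    for xi, yi, (c_tp, c_fp, c_fn, c_tn) in zip(X, y, costs):
        h = predict_proba(theta, xi)
        total += yi * (h * c_tp + (1 - h) * c_fn) + (1 - yi) * (h * c_fp + (1 - h) * c_tn)
    return total / len(y)


def fit(X, y, costs: Sequence[CostRow], lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Batch gradient descent on expected_cost (optimizer chosen for illustration)."""
    theta = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi, (c_tp, c_fp, c_fn, c_tn) in zip(X, y, costs):
            h = predict_proba(theta, xi)
            # d(cost_i)/dh, then the chain rule through the sigmoid: dh/dz = h(1 - h)
            dcost_dh = yi * (c_tp - c_fn) + (1 - yi) * (c_fp - c_tn)
            common = dcost_dh * h * (1 - h) / n
            grad[0] += common
            for j, v in enumerate(xi):
                grad[j + 1] += common * v
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical data: a false negative costs 10, any positive prediction costs 1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
costs = [(1.0, 1.0, 10.0, 0.0)] * 4
theta = fit(X, y, costs)
print(expected_cost([0.0, 0.0], X, y, costs), expected_cost(theta, X, y, costs))
```

Because each example contributes its own cost difference to the gradient, expensive examples pull the decision boundary harder than cheap ones, which a class-weighted log-loss cannot express.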

Peer Reviewed
Improving Credit Card Fraud Detection with Calibrated Probabilities
Correa Bahnsen, Alejandro UL; Stojanovic, Aleksandar UL; Aouada, Djamila UL et al.

in Proceedings of the fourteenth SIAM International Conference on Data Mining, Philadelphia, Pennsylvania, USA, April 24-26, 2014. (2014)

Previous analysis has shown that applying Bayes minimum risk to detect credit card fraud leads to better results measured by monetary savings than traditional methodologies. Nevertheless, this approach requires good probability estimates that not only separate well between positive and negative examples but also assess the real probability of the event. Unfortunately, not all classification algorithms satisfy this requirement. In this paper, two different methods for calibrating probabilities are evaluated and analyzed in the context of credit card fraud detection, with the objective of finding the model that minimizes the real losses due to fraud. Even though under-sampling is often used in classification with unbalanced datasets, it is shown that when probabilistic models are used to make decisions based on minimizing risk, using the full dataset provides significantly better results. To test the algorithms, a real dataset provided by a large European card processing company is used. It is shown that calibrating the probabilities and then applying Bayes minimum risk reduces the losses due to fraud. Furthermore, because of the good overall results, the aforementioned card processing company is currently incorporating the methodology proposed in this paper into its fraud detection system. Finally, the methodology has been tested on a different application, namely direct marketing.
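
The abstract does not name the two calibration methods, so the sketch below shows one widely used technique, isotonic regression fitted with the pool-adjacent-violators (PAV) algorithm, without claiming it is one of the methods evaluated in the paper. It maps raw classifier scores to a non-decreasing step function of observed event frequencies; the function names and data are hypothetical.

```python
import bisect
from typing import Dict, List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators: non-decreasing step function from score to event frequency."""
    # Pool tied scores first so equal scores always map to the same probability.
    grouped: Dict[float, List[float]] = {}
    for s, yv in zip(scores, labels):
        acc = grouped.setdefault(s, [0.0, 0.0])
        acc[0] += yv
        acc[1] += 1
    blocks: List[List[float]] = []  # each block: [sum of labels, count, lowest score]
    for s in sorted(grouped):
        total, count = grouped[s]
        blocks.append([total, count, s])
        # Merge backwards while block means decrease (a monotonicity violation).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, _ = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score: float, starts: Sequence[float], values: Sequence[float]) -> float:
    """Probability of the last block whose lowest score is <= score."""
    idx = bisect.bisect_right(starts, score) - 1
    return values[max(idx, 0)]


# Hypothetical uncalibrated scores and observed fraud labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 0, 1, 1]
starts, values = fit_isotonic(scores, labels)
print(values)  # [0.0, 0.3333333333333333, 1.0, 1.0]
print(calibrate(0.35, starts, values))
```

The calibrated values are empirical event frequencies, which is what a risk-based decision rule such as Bayes minimum risk needs; a score that merely ranks examples well is not enough.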

Peer Reviewed
Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk
Correa Bahnsen, Alejandro UL; Stojanovic, Aleksandar UL; Aouada, Djamila UL et al.

in Proceedings of the 12th International Conference on Machine Learning and Applications, ICMLA 2013 (2013), 1

Credit card fraud is a growing problem that affects card holders around the world. Fraud detection has been an active research topic in machine learning. Nevertheless, current state-of-the-art credit card fraud detection algorithms fail to include the real costs of credit card fraud as a measure to evaluate algorithms. In this paper, a new comparison measure that realistically represents the monetary gains and losses due to fraud detection is proposed. Moreover, using the proposed cost measure, a cost-sensitive method based on Bayes minimum risk is presented. This method is compared with state-of-the-art algorithms and shows improvements of up to 23% measured by cost. The results of this paper are based on real-life transactional data provided by a large European card processing company.
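
The Bayes minimum risk rule flags an example when the expected cost of flagging, p·C_TP + (1 − p)·C_FP, is no larger than the expected cost of letting it pass, p·C_FN + (1 − p)·C_TN, where p is the estimated fraud probability. The sketch below assumes, for illustration only, a fraud cost matrix in which every alert costs a fixed administrative amount and a missed fraud costs the transaction amount; the function names and the default cost of 2.0 are hypothetical.

```python
def bayes_minimum_risk(p: float, c_tp: float, c_fp: float, c_fn: float, c_tn: float) -> int:
    """Return 1 (flag) when the expected cost of flagging is no larger than not flagging."""
    risk_flag = p * c_tp + (1 - p) * c_fp  # R(c=1 | x)
    risk_pass = p * c_fn + (1 - p) * c_tn  # R(c=0 | x)
    return 1 if risk_flag <= risk_pass else 0


def decide_transaction(p_fraud: float, amount: float, admin_cost: float = 2.0) -> int:
    """Hypothetical cost matrix: each alert costs admin_cost; a missed fraud costs the amount."""
    return bayes_minimum_risk(p_fraud, c_tp=admin_cost, c_fp=admin_cost, c_fn=amount, c_tn=0.0)


print(decide_transaction(0.05, 1000.0))  # 1: alert risk about 2.0 vs. pass risk about 50.0
print(decide_transaction(0.05, 10.0))    # 0: alert risk about 2.0 vs. pass risk 0.5
```

Under this particular cost matrix the rule reduces to flagging whenever p ≥ admin_cost / amount, so the same probability can lead to different decisions for different transactions. This is also why the rule depends on well-calibrated probabilities.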
