banking; categorization of operational risk events; machine learning; operational risk; risk management; Business and International Management; Finance; Economics and Econometrics
Abstract :
[en] This paper provides an overview of how machine learning can help in categorizing textual descriptions of operational loss events into Basel II event types. We apply Python implementations of support vector machine and multinomial naive Bayes algorithms to precategorized Öffentliche Schadenfälle OpRisk (ÖffSchOR) data to demonstrate that operational loss events can be automatically assigned to one of the seven Basel II event types at very low cost and with satisfactory accuracy. Our comprehensive case study on ÖffSchOR data, which includes the provision of parsimonious Python code, is also useful for practitioners, who can use this knowledge to improve the cost efficiency and/or reliability of their processes for categorizing operational risk events.
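As a rough illustration of the kind of classifier the abstract refers to, the following is a minimal sketch of multinomial naive Bayes with add-one smoothing, written in plain Python. It is not the authors' code: the function names (`tokenize`, `train_multinomial_nb`, `predict`) and the six toy loss descriptions are hypothetical and are not taken from the ÖffSchOR data. Only three of the seven Basel II event types appear, to keep the example short.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also
    # strip punctuation and possibly remove stop words.
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(docs),
    }


def predict(model, text):
    """Return the class with the highest log posterior for the text."""
    # Words never seen in training are ignored.
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_class in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        # Log prior plus smoothed log likelihood of each token.
        score = math.log(n_class / model["n_docs"])
        for t in tokens:
            score += math.log((counts[t] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy descriptions, invented for illustration only.
TRAIN_DOCS = [
    "employee embezzled client funds from accounts",
    "manager falsified records to hide unauthorized trading",
    "hackers stole card data in phishing attack",
    "robbery at branch by armed thieves",
    "server outage disrupted online banking systems",
    "software failure halted payment systems",
]
TRAIN_LABELS = [
    "Internal Fraud",
    "Internal Fraud",
    "External Fraud",
    "External Fraud",
    "Business Disruption and System Failures",
    "Business Disruption and System Failures",
]

model = train_multinomial_nb(TRAIN_DOCS, TRAIN_LABELS)
print(predict(model, "phishing attack stole customer card data"))
print(predict(model, "payment systems outage"))
```

On real loss descriptions one would typically rely on an established library implementation and evaluate accuracy on held-out data; the sketch only shows the counting and scoring logic.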
Disciplines :
Management information systems
Author, co-author :
Pakhchanyan, Suren; Operational/Non-Financial Risk Management, SMBC Bank EU AG, Frankfurt, Germany
FIEBERG, Christian ; University of Luxembourg ; Empirical Capital Market Research and Derivatives, University of Bremen, Bremen, Germany
Metko, Daniel; Empirical Capital Market Research and Derivatives, University of Bremen, Bremen, Germany
KASPEREIT, Thomas ; University of Luxembourg > Faculty of Law, Economics and Finance (FDEF) > Department of Economics and Management (DEM)
External co-authors :
yes
Language :
English
Title :
Machine learning for categorization of operational risk events using textual description
Publication date :
December 2022
Journal title :
Journal of Operational Risk
ISSN :
1744-6740
eISSN :
1755-2710
Publisher :
Incisive Media Ltd.
Volume :
17
Issue :
4
Pages :
37 - 65
Peer reviewed :
Peer reviewed
Focus Area :
Computational Sciences
Funding text :
We are grateful to VÖB-Service GmbH, Germany, and in particular to Petra Ludwig, for providing access to the ÖffSchOR database. We also thank Thomas Moosbrucker, a partner at Deloitte Germany and coleader of the Quant Team in Risk Advisory, for many helpful comments. This paper evolved as part of the research project “Machine learning in accounting and finance”, which was supported by the Diginomics Research Group at the University of Bremen.