Keywords :
Apache Airflow; CI/CD/CT; Freight rail operations; Machine learning; ML deployment; MLOps; Delay time; Rail operations; Real-time; Workflows; Control and Systems Engineering; Artificial Intelligence; Electrical and Electronic Engineering; CI; CD; CT
Abstract :
[en] Railways are essential for freight transport due to their operational reliability advantages, but maintaining this advantage requires optimised railway infrastructure. Previous research has developed models to predict freight rail disruptions/disturbances and their associated delay times, in order to better understand the impact of multiple factors on them. However, because these models are built on static datasets, extracting real value from a model in a production environment remains difficult. This paper presents a methodology that demonstrates the potential of MLOps to automate the entire workflow, from data extraction to model deployment, for real-time delay predictions in freight rail operations. It includes good practices for Continuous Integration (CI), Continuous Delivery (CD), and Continuous Training (CT), as well as a list of tools for each process. Our research advances the field of railway operations by developing an entire MLOps workflow using data from the freight rail operations of the Luxembourgish National Freight Railway Company over a seventeen-month period. Furthermore, we employed a LightGBM model that had previously performed well in another study. The workflow can be triggered automatically to run each process and thus maintain an ML model capable of predicting delay times for CFL Multimodal operations in real time. Our findings demonstrate that MLOps has the potential to automate the entire process, opening up new avenues for future research in this field. Although the methodology presented is intended to optimise freight rail operations for a specific company, it can be easily transferred to other railway companies or to other transportation industries, such as aviation, shipping, and trucking.
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Juan Pineda-Jaramillo ; Department of Engineering, University of Luxembourg, Luxembourg
VITI, Francesco ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Engineering (DoE)
External co-authors :
no
Language :
English
Title :
MLOps in freight rail operations
Publication date :
August 2023
Journal title :
Engineering Applications of Artificial Intelligence
This study was possible thanks to the collaboration agreement signed between the University of Luxembourg and CFL Multimodal, and funding obtained by the Luxembourg National Research Fund FNR, through the project “ANticipatory Train Optimisation with Intelligent maNagEment (ANTOINE)”, under grant BRIDGES2020/MS/14767177/ANTOINE. Special thanks to Nathalie Stef and Michael Maraldi from CFL for sharing the data used in this study.