[en] Python has evolved into the most popular language for data science, with state-of-the-art libraries for analytics and machine learning such as scikit-learn. However, Python lacks the computational performance that an industrial system requires for high-frequency, real-time predictions.
Building upon a year-long research project heavily based on scikit-learn (sklearn), we faced performance issues when deploying to production. Replacing sklearn with a better-performing framework would have required re-evaluating and tuning hyperparameters from scratch. Instead, we developed a Python embedding in a C++-based server application that increased performance by up to 20x, achieving linear scalability up to a point of convergence. Our implementation targets mainstream, cost-effective hardware, and we observed similar performance gains on small and large systems alike, from a laptop to an Amazon EC2 instance to a high-end server.
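For illustration, here is a minimal sketch of such an embedding using the CPython C API: the C++ host starts an embedded interpreter, loads a persisted sklearn model, and calls its predict method in-process. The model path "model.joblib", the joblib serialization, and the three-feature sample are hypothetical placeholders, not the authors' actual server code, and error handling is kept minimal.

```cpp
// Sketch: serving a scikit-learn model from C++ via the embedded
// CPython interpreter. Paths and sample values are placeholders.
#include <Python.h>
#include <cstdio>

int main() {
    Py_Initialize();  // start the embedded interpreter (holds the GIL)

    // import joblib; model = joblib.load("model.joblib")
    PyObject* joblib = PyImport_ImportModule("joblib");
    if (!joblib) { PyErr_Print(); return 1; }
    PyObject* model = PyObject_CallMethod(joblib, "load", "s", "model.joblib");
    if (!model) { PyErr_Print(); return 1; }

    // Build one sample as a nested list: [[0.1, 0.2, 0.3]]
    PyObject* sample = Py_BuildValue("[[ddd]]", 0.1, 0.2, 0.3);

    // pred = model.predict(sample); returns an array-like of predictions
    PyObject* pred  = PyObject_CallMethod(model, "predict", "O", sample);
    PyObject* first = PySequence_GetItem(pred, 0);
    std::printf("prediction: %f\n", PyFloat_AsDouble(first));

    // Drop references and shut the interpreter down.
    Py_DECREF(first); Py_DECREF(pred); Py_DECREF(sample);
    Py_DECREF(model); Py_DECREF(joblib);
    Py_FinalizeEx();
    return 0;
}
```

Built against libpython (e.g. with `python3-config --cflags --ldflags --embed` on Python 3.8+), a wrapper of this shape lets the C++ server reuse the already-tuned sklearn model unchanged while serving predictions natively.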
Disciplines:
Computer science
Author, co-author:
VARISTEAS, Georgios ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
AVANESOV, Tigran ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
STATE, Radu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors:
No
Document language:
English
Title:
Distributed C++-Python embedding for fast predictions and fast prototyping
Publication date:
2018
Event name:
Second Workshop on Distributed Infrastructures for Deep Learning (DIDL) 2018
Event date:
10 December 2018
Main work title:
Proceedings of the Second Workshop on Distributed Infrastructures for Deep Learning
ISBN/EAN:
978-1-4503-6119-4
Peer reviewed:
Peer reviewed
FNR project:
FNR11822390 - Optimal Scalability And Performance In Programmatic Advertising Platforms, 2017 (01/09/2017-31/08/2019) - Georgios Varisteas