Abstract :
[en] Python has evolved to become the most popular language for data science. It sports state-of-the-art libraries for analytics and machine learning, like Sci-Kit Learn. However, Python lacks the computational performance that a industrial system requires for high frequency real time predictions.
Building upon a year long research project heavily based on SciKit Learn (sklearn), we faced performance issues in deploying to production. Replacing sklearn with a better performing framework would require re-evaluating and tuning hyperparameters from scratch. Instead we developed a python embedding in a C++ based server application that increased performance by up to 20x, achieving linear scalability up to a point of convergence. Our implementation was done for mainstream cost effective hardware, which means we observed similar performance gains on small as well as large systems, from a laptop to an Amazon EC2 instance to a high-end server.
Scopus citations®
without self-citations
2