This version of the article has been accepted for publication after peer review, but it is not the Version of Record and does not reflect post-acceptance improvements or corrections.
The Version of Record is available online at: https://doi.org/10.1007/s10270-022-01052-7
Keywords: Differentiable programming; Computational graph model; Edge AI
Abstract:
[en] Models based on differentiable programming, such as deep neural networks, are well established in research and can outperform manually coded counterparts in many applications. Today, there is rising interest in using this flexible modeling approach to solve real-world problems. A major challenge when moving from research to application is the strict constraints on computational resources (memory and time). It is difficult to determine and to contain the resource requirements of differentiable models, especially during the early training and hyperparameter exploration stages. In this article, we address this challenge by introducing CalcGraph, a model abstraction of differentiable programming layers. CalcGraph makes it possible to model the computational resources that should be used, and CalcGraph's model interpreter can then automatically schedule execution in accordance with those specifications. We propose a novel way to efficiently switch models from storage to preallocated memory zones and vice versa in order to maximize the number of model executions given the available resources. We demonstrate the efficiency of our approach by showing that it consumes fewer resources than state-of-the-art frameworks such as TensorFlow and PyTorch for both single-model and multi-model execution.
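The core mechanism the abstract outlines can be illustrated with a minimal sketch in C: a memory zone whose size is fixed up front (the declared resource budget), into which model weights are swapped from storage one model at a time, so that executing many models never allocates beyond the budget. This is an illustrative assumption of how such a zone could work, not CalcGraph's actual API; all names (load_model, model_a.bin, and so on) are hypothetical.

#include <stdio.h>

#define ZONE_BYTES (64u * 1024 * 1024)   /* declared budget: 64 MiB */

static unsigned char zone[ZONE_BYTES];   /* preallocated once, reused for every model */

/* Hypothetical helper: copy one model's weights from storage into the
 * preallocated zone. Returns the number of bytes loaded, or -1 if the
 * model cannot be read or does not fit the declared budget. */
static long load_model(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(zone, 1, ZONE_BYTES, f);
    int fits = (fgetc(f) == EOF);        /* any byte left over exceeds the budget */
    fclose(f);
    return fits ? (long)n : -1;
}

int main(void) {
    /* Hypothetical model files; the scheduler's job is to cycle them
     * through the same zone instead of holding all of them in memory. */
    const char *models[] = { "model_a.bin", "model_b.bin", "model_c.bin" };
    for (int i = 0; i < 3; i++) {
        long n = load_model(models[i]);
        if (n < 0) {
            fprintf(stderr, "%s exceeds budget or is unreadable\n", models[i]);
            continue;
        }
        printf("executing %s (%ld bytes in zone)\n", models[i], n);
        /* ...run inference against the weights now resident in zone... */
    }
    return 0;
}

Static preallocation of this kind trades flexibility for predictability: the memory budget is enforced by construction rather than monitored at run time, which is what makes scheduling executions against declared resources feasible.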
Disciplines:
Computer science
Author, co-author:
LORENTZ, Joe; DataThings S.A.; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)