Learning models that are not tailored to a single use case has been shown to be a challenging task in Deep Learning (DL). Hyperparameter tuning requires long training sessions that must be restarted whenever the network or the dataset changes, a cost that most stakeholders in industry and research cannot afford. Many attempts have been made to explain and understand the source of the use-case specificity that distinguishes DL problems. To date, second-order optimization methods have shown partial effectiveness in some cases, but they have not been sufficiently investigated in the context of learning and optimization.
In this work, we present a chain rule for the efficient approximation of the Hessian matrix (i.e., the matrix of second-order derivatives) of the weights across the layers of a Deep Neural Network (DNN). We apply our approach to weight optimization during DNN training, a step that we believe particularly suffers from the enormous variety of optimizers provided by state-of-the-art libraries such as Keras and PyTorch. We demonstrate, both theoretically and empirically, the improved accuracy of our approximation technique, and we show that the Hessian is a useful diagnostic tool that helps to optimize training more rigorously. Our preliminary experiments demonstrate the efficiency and the improved convergence of our approach, both of which are crucial for DNN training.
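For background, the classical second-order chain rule that any layer-wise Hessian propagation builds on can be sketched as follows; the notation here is illustrative and not necessarily the approximation presented in this work.

```latex
% For a composition y = g(h(x)) with h : R^n -> R^m and g : R^m -> R,
% the Hessian of y with respect to x splits into a Gauss-Newton-like
% term and a curvature term contributed by the inner map h:
\nabla^2_x\, g\big(h(x)\big)
  = J_h(x)^{\top}\, \nabla^2 g\big(h(x)\big)\, J_h(x)
  \;+\; \sum_{k=1}^{m} \left.\frac{\partial g}{\partial u_k}\right|_{u = h(x)} \nabla^2_x\, h_k(x)
```

Applied recursively through the layers of a DNN, this identity lets second-order information be propagated layer by layer; structured approximations (e.g., dropping the second, curvature term, as in Gauss-Newton methods) keep the cost tractable.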