Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.
Ashish Agarwal. Static automatic batching in TensorFlow. In ICML, pages 92-101, 2019.
Devansh Arpit, Víctor Campos, and Yoshua Bengio. How to Initialize your Network? Robust Initialization for WeightNorm & ResNets. In NeurIPS, pages 10900-10909, 2019.
Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, and Julian J. McAuley. ReZero is All You Need: Fast Convergence at Large Depth. CoRR, abs/2003.04887, 2020.
David Balduzzi, Marcus Frean, Lennox Leary, J. P. Lewis, Kurt Wan-Duo Ma, and Brian McWilliams. The Shattered Gradients Problem: If resnets are the answer, then what is the question? In ICML, pages 342-350, 2017.
Aleksandar Botev, Hippolyt Ritter, and David Barber. Practical Gauss-Newton optimisation for deep learning. In ICML, pages 557-565. PMLR, 2017.
Yann N. Dauphin and Samuel S. Schoenholz. MetaInit: Initializing learning by learning to initialize. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, NeurIPS, pages 12624-12636, 2019.
Alan Edelman. Eigenvalues and condition numbers of random matrices. SIAM Journal on Matrix Analysis and Applications, 9(4):543-560, 1988.
Behrooz Ghorbani, Shankar Krishnan, and Ying Xiao. An investigation into neural net optimization via Hessian eigenvalue density. In ICML, pages 2232-2241. PMLR, 2019.
Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, pages 249-256, 2010.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
Boris Hanin and David Rolnick. How to Start Training: The Effect of Initialization and Architecture. In NeurIPS, pages 569-579, 2018.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV, pages 1026-1034, 2015. doi: 10.1109/ICCV.2015.123.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770-778, 2016. doi: 10.1109/CVPR.2016.90.
Dan Hendrycks and Kevin Gimpel. Adjusting for dropout variance in batch normalization and weight initialization. CoRR, abs/1607.02488, 2016.
Arthur Jacot, Franck Gabriel, and Clément Hongler. The asymptotic spectrum of the Hessian of DNN throughout training. In ICLR. OpenReview.net, 2020.
Vladislav Kargin. Products of random matrices: Dimension and growth in norm. The Annals of Applied Probability, 20(3):890-906, 2010.
Tom Kocmi and Ondřej Bojar. An Exploration of Word Embedding Initialization in Deep-Learning Tasks. In ICON, pages 56-64, 2017.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, pages 1106-1114, 2012.
James Martens and Roger Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In ICML, pages 2408-2417. PMLR, 2015.
Levent Sagun, Utku Evci, V. Ugur Güney, Yann N. Dauphin, and Léon Bottou. Empirical analysis of the Hessian of over-parametrized neural networks. CoRR, abs/1706.04454, 2017.
John Salvatier, Thomas V. Wiecki, and Christopher Fonnesbeck. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci., 2:e55, 2016. doi: 10.7717/peerj-cs.55.
Jack W. Silverstein. The spectral radii and norms of large dimensional non-central random matrices. Stochastic Models, 10(3):525-532, 1994.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML, pages 6105-6114, 2019.
Aladin Virmaux and Kevin Scaman. Lipschitz regularity of deep neural networks: analysis and efficient estimation. In NeurIPS, pages 3839-3848, 2018.
Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. CoRR, abs/1708.07747, 2017.
Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, and Jeffrey Pennington. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks. In ICML, pages 5389-5398, 2018.
Bing Xu, Ruitong Huang, and Mu Li. Revise Saturated Activation Functions. CoRR, abs/1602.05980, 2016.