R. K. Srivastava, K. Greff, and J. Schmidhuber, "Training very deep networks," in Advances in Neural Information Processing Systems, 2015, pp. 2377-2385.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
D. Balduzzi, M. Frean, L. Leary, J. Lewis, K. W.-D. Ma, and B. McWilliams, "The shattered gradients problem: If resnets are the answer, then what is the question?" in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017, pp. 342-350.
O. K. Oyedotun, A. E. R. Shabayek, D. Aouada, and B. Ottersten, "Training very deep networks via residual learning with stochastic input shortcut connections," in International Conference on Neural Information Processing. Springer, 2017, pp. 23-33.
O. K. Oyedotun, A. E. R. Shabayek, D. Aouada, and B. Ottersten, "Highway network block with gates constraints for training very deep networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 1658-1667.
A. Veit, M. J. Wilber, and S. Belongie, "Residual networks behave like ensembles of relatively shallow networks," in Advances in Neural Information Processing Systems, 2016, pp. 550-558.
O. K. Oyedotun, G. Demisse, A. E. R. Shabayek, D. Aouada, and B. Ottersten, "Facial expression recognition via joint deep learning of rgb-depth map latent representations," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 3161-3168.
G. Larsson, M. Maire, and G. Shakhnarovich, "Fractalnet: Ultra-deep neural networks without residuals," in International Conference on Learning Representations, 2016.
C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, inception-resnet and the impact of residual connections on learning," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. AAAI Press, 2017, pp. 4278-4284.
S. Zagoruyko and N. Komodakis, "Diracnets: Training very deep neural networks without skip-connections," arXiv preprint arXiv:1706.00388, 2017.
L. Xiao, Y. Bahri, J. Sohl-Dickstein, S. Schoenholz, and J. Pennington, "Dynamical isometry and a mean field theory of cnns: How to train 10,000-layer vanilla convolutional neural networks," in International Conference on Machine Learning, 2018, pp. 5389-5398.
D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (elus)," in International Conference on Learning Representations, 2015.
X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249-256.
K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026-1034.
K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations, 2015.
A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in ICML Workshop on Deep Learning for Audio, Speech and Language Processing. Citeseer, 2013.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026-1034.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
A. Krizhevsky, "Learning multiple layers of features from tiny images," Technical Report, 2009.
Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211-252, 2015.
S. Zagoruyko and N. Komodakis, "Wide residual networks," in British Machine Vision Conference, vol. 8, 2016, pp. 35-67.
G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, "Deep networks with stochastic depth," in European Conference on Computer Vision. Springer, 2016, pp. 646-661.
K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in European Conference on Computer Vision. Springer, 2016, pp. 630-645.
G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in CVPR, vol. 1, no. 2, 2017, p. 3.
I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proceedings of the 30th International Conference on Machine Learning-Volume 28. JMLR.org, 2013, pp. III-1319.
C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, "Deeply-supervised nets," in Artificial Intelligence and Statistics, 2015, pp. 562-570.
J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for simplicity: The all convolutional net," in International Conference on Learning Representations Workshop, 2014.
M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
M. Simon, E. Rodner, and J. Denzler, "Imagenet pre-trained models with batch normalization," arXiv preprint arXiv:1612.01452, 2016.
F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251-1258.