deep learning; neural network; image classification
Abstract :
[en] In this paper, we propose to reformulate the learning of the highway network block to realize both early optimization and improved generalization of very deep networks while preserving the network depth. Gate constraints are employed to improve optimization, latent representations and parameterization usage in order to efficiently learn hierarchical feature transformations, which are crucial for the success of any deep network. One of the earliest very deep models with over 30 layers that was successfully trained relied on highway network blocks. Although highway blocks suffice to alleviate the optimization problem via improved information flow, we show for the first time that, later in training, such highway blocks may end up learning mostly untransformed features, which reduces the effective depth of the model; this could negatively impact model generalization performance. Using the proposed approach, 15-layer and 20-layer models are successfully trained with one gate, and a 32-layer model with three gates. This leads to a drastic reduction of model parameters compared to the original highway network. Extensive experiments on the CIFAR-10, CIFAR-100, Fashion-MNIST and USPS datasets are performed to validate the effectiveness of the proposed approach. In particular, we outperform the original highway network and many state-of-the-art results. To the best of our knowledge, the results achieved on the Fashion-MNIST and USPS datasets are the best reported in the literature.
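The remark about "mostly untransformed features" can be illustrated with the standard highway layer of Srivastava et al. (2015), y = T(x) * H(x) + (1 - T(x)) * x. The sketch below is only that baseline layer, not the gate-constrained block proposed in this paper, whose exact form is not given in the abstract. All names (`highway_layer`, `affine`, the weights and biases) are hypothetical and chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(x, W, b):
    """Return W x + b for a matrix W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """Baseline highway layer: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = [math.tanh(v) for v in affine(x, W_h, b_h)]  # candidate transform H(x)
    t = [sigmoid(v) for v in affine(x, W_t, b_t)]    # transform gate T(x) in (0, 1)
    y = [t_i * h_i + (1.0 - t_i) * x_i for t_i, h_i, x_i in zip(t, h, x)]
    return y, t


# Illustrative values only (assumed, not taken from the paper).
x = [0.5, -1.0, 2.0]
W_h = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5], [-0.6, 0.1, 0.9]]
b_h = [0.0, 0.0, 0.0]
W_t = [[0.0] * 3 for _ in range(3)]  # gate driven by its bias alone here

# Strongly negative gate bias: T(x) is near 0, so the layer copies its input.
y_closed, t_closed = highway_layer(x, W_h, b_h, W_t, [-10.0] * 3)

# Strongly positive gate bias: T(x) is near 1, so the layer outputs H(x).
y_open, t_open = highway_layer(x, W_h, b_h, W_t, [10.0] * 3)
```

When the gate stays near zero, the layer passes `x` through almost unchanged, so it contributes little to the effective depth of the model; this is the failure mode the abstract refers to.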
Disciplines :
Computer science
Author, co-author :
OYEDOTUN, Oyebade ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
SHABAYEK, Abd El Rahman ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
OTTERSTEN, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors :
no
Language :
English
Title :
Highway Network Block with Gates Constraints for Training Very Deep Networks
Publication date :
19 June 2018
Event name :
2018 IEEE International Conference on Computer Vision and Pattern Recognition Workshop
Event date :
June 18-22, 2018
Audience :
International
Main work title :
2018 IEEE International Conference on Computer Vision and Pattern Recognition Workshop, June 18-22, 2018