Keywords:
Image classification; deep networks; residual learning; neural networks
Abstract:
Deep neural networks inherently have large representational power for approximating complex target functions. However, models based on rectified linear units (ReLUs) can suffer a reduction in representational capacity due to dead units. Moreover, the test-time approximation of very deep networks trained with dropout can be inexact owing to their many layers of non-linearities. To address these problems, we propose to learn the activation functions of hidden units in very deep networks via maxout. However, maxout units increase the number of model parameters, so the model may suffer from overfitting; we alleviate this problem by employing elastic net regularization. In this paper, we propose very deep networks with maxout units and elastic net regularization and show that the learned features are quite linearly separable. We perform extensive experiments and reach state-of-the-art results on the USPS and MNIST datasets. In particular, we reach an error rate of 2.19% on the USPS dataset, surpassing the human-performance error rate of 2.5% and all previously reported results, including those that employed training data augmentation. On the MNIST dataset, we reach an error rate of 0.36%, which is competitive with state-of-the-art results.
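To make the two main ingredients concrete, the sketch below illustrates a maxout unit (each hidden unit outputs the maximum over k learned affine pieces, following the maxout formulation of Goodfellow et al.) and an elastic net penalty (a weighted sum of L1 and L2 terms added to the training loss, after Zou and Hastie). This is a minimal NumPy illustration only; the layer sizes and regularization coefficients are placeholder assumptions, not values from the paper.

import numpy as np

def maxout(x, W, b):
    # Maxout activation: z[o, j] = x . W[:, o, j] + b[o, j] for each of
    # k affine pieces j; the unit outputs the element-wise maximum over j.
    # W: (d_in, d_out, k), b: (d_out, k). Shapes here are illustrative.
    z = np.einsum('i,iok->ok', x, W) + b
    return z.max(axis=-1)

def elastic_net_penalty(weights, l1=1e-5, l2=1e-4):
    # Elastic net regularizer: l1 * ||W||_1 + l2 * ||W||_2^2, summed over
    # all weight tensors and added to the task loss. The coefficients are
    # placeholders, not the values tuned in the paper.
    return sum(l1 * np.abs(W).sum() + l2 * (W ** 2).sum() for W in weights)

# Toy usage: one maxout hidden layer with k = 2 pieces.
rng = np.random.default_rng(0)
x = rng.normal(size=64)                  # flattened input features
W = 0.1 * rng.normal(size=(64, 32, 2))   # (d_in, d_out, k)
b = np.zeros((32, 2))
h = maxout(x, W, b)                      # (32,) hidden activations
reg = elastic_net_penalty([W])           # add to the classification loss

Because each maxout unit is the maximum of learned affine functions, its activation is learned rather than fixed and cannot "die" the way a ReLU can; the L1 term of the penalty counteracts the parameter growth that the k pieces introduce.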
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SIGCOM
Disciplines:
Computer science
Author, co-author:
OYEDOTUN, Oyebade ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
SHABAYEK, Abd El Rahman ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
OTTERSTEN, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
External co-authors:
yes
Document language:
English
Title:
Improving the Capacity of Very Deep Networks with Maxout Units
Publication date:
21 February 2018
Event name:
2018 IEEE International Conference on Acoustics, Speech and Signal Processing
Event organizer:
IEEE
Event location:
Calgary, Alberta, Canada
Event date:
15–20 April 2018
Event scope:
International
Title of the main work:
2018 IEEE International Conference on Acoustics, Speech and Signal Processing
Peer reviewed:
Peer reviewed
Funding body:
This work was funded by the National Research Fund (FNR), Luxembourg, under the project reference R-AGR-0424-05-D/Bjorn Ottersten