Keywords:
Image classification; deep networks; residual learning; neural networks
Abstract:
Deep neural networks inherently have large representational power for approximating complex target functions. However, models based on rectified linear units (ReLUs) can suffer a reduction in representational capacity due to dead units. Moreover, the test-time approximation of very deep networks trained with dropout can be inexact owing to their many layers of non-linearities. To address these problems, we propose to learn the activation functions of hidden units in very deep networks via maxout. However, maxout units increase the number of model parameters, so the model may suffer from overfitting; we alleviate this problem by employing elastic net regularization. In this paper, we propose very deep networks with maxout units and elastic net regularization and show that the learned features are quite linearly separable. We perform extensive experiments and reach state-of-the-art results on the USPS and MNIST datasets. In particular, we reach an error rate of 2.19% on the USPS dataset, surpassing the human-performance error rate of 2.5% and all previously reported results, including those that employed training data augmentation. On the MNIST dataset, we reach an error rate of 0.36%, which is competitive with state-of-the-art results.
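To make the two main ingredients concrete, the sketch below illustrates a maxout unit (each hidden unit outputs the maximum over k learned affine pieces, following the maxout formulation of Goodfellow et al.) and an elastic net penalty (a weighted sum of L1 and L2 terms added to the training loss, after Zou and Hastie). This is a minimal NumPy illustration only; the layer sizes and regularization coefficients are placeholder assumptions, not values from the paper.

import numpy as np

def maxout(x, W, b):
    # Maxout activation: z[o, j] = x . W[:, o, j] + b[o, j] for each of
    # k affine pieces j; the unit outputs the element-wise maximum over j.
    # W: (d_in, d_out, k), b: (d_out, k). Shapes here are illustrative.
    z = np.einsum('i,iok->ok', x, W) + b
    return z.max(axis=-1)

def elastic_net_penalty(weights, l1=1e-5, l2=1e-4):
    # Elastic net regularizer: l1 * ||W||_1 + l2 * ||W||_2^2, summed over
    # all weight tensors and added to the task loss. The coefficients are
    # placeholders, not the values tuned in the paper.
    return sum(l1 * np.abs(W).sum() + l2 * (W ** 2).sum() for W in weights)

# Toy usage: one maxout hidden layer with k = 2 pieces.
rng = np.random.default_rng(0)
x = rng.normal(size=64)                  # flattened input features
W = 0.1 * rng.normal(size=(64, 32, 2))   # (d_in, d_out, k)
b = np.zeros((32, 2))
h = maxout(x, W, b)                      # (32,) hidden activations
reg = elastic_net_penalty([W])           # add to the classification loss

Because each maxout unit is the maximum of learned affine functions, its activation is learned rather than fixed and cannot "die" the way a ReLU can; the L1 term of the penalty counteracts the parameter growth that the k pieces introduce.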
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SIGCOM
Disciplines:
Computer science
Author, co-author:
OYEDOTUN, Oyebade ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
SHABAYEK, Abd El Rahman ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
OTTERSTEN, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
External co-authors:
yes
Document language:
English
Title:
Improving the Capacity of Very Deep Networks with Maxout Units
Publication date:
21 February 2018
Event name:
2018 IEEE International Conference on Acoustics, Speech and Signal Processing
Event organizer:
IEEE
Event location:
Calgary, Alberta, Canada
Event date:
15–20 April 2018
Event scope:
International
Title of the main work:
2018 IEEE International Conference on Acoustics, Speech and Signal Processing
Peer reviewed:
Peer reviewed
Funding body:
This work was funded by the National Research Fund (FNR), Luxembourg, under the project reference R-AGR-0424-05-D/Bjorn Ottersten