Keywords :
Machine Learning; Deep Learning; Sparse Neural Networks; Dynamic Sparse Training
Abstract :
[en] Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) artificial neural networks, too, should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős-Rényi random graph) of two consecutive layers of neurons into a scale-free topology during learning. Our method replaces the fully-connected layers of artificial neural networks with sparse ones before training, quadratically reducing the number of parameters with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.
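To make the quadratic parameter reduction and the evolutionary rewiring concrete: a fully-connected layer between n and m neurons stores n × m weights, whereas a sparse Erdős-Rényi layer keeps a number of connections proportional to n + m. The Python sketch below is a minimal illustration of such a sparse evolutionary training loop (sparse random initialization, then per-epoch pruning of the smallest-magnitude weights and random regrowth of the same number of connections). The sparsity parameter epsilon, the rewiring fraction zeta, and the helper names erdos_renyi_mask and evolve_mask are illustrative assumptions made for this sketch, not the authors' exact implementation.

```python
# Minimal, illustrative sketch of a sparse-evolutionary-training loop for one
# weight matrix. Parameter names and values (epsilon, zeta) are assumptions.
import numpy as np

def erdos_renyi_mask(n_in, n_out, epsilon=11, rng=None):
    """Sparse binary mask with roughly epsilon*(n_in + n_out) connections,
    instead of the n_in*n_out connections of a fully-connected layer."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = epsilon * (n_in + n_out) / (n_in * n_out)  # connection probability
    return (rng.random((n_in, n_out)) < p).astype(np.float32)

def evolve_mask(weights, mask, zeta=0.3, rng=None):
    """Remove the fraction zeta of existing connections with the smallest
    magnitude, then regrow the same number at random empty positions."""
    if rng is None:
        rng = np.random.default_rng(0)
    active = np.flatnonzero(mask)
    n_rewire = int(zeta * active.size)
    # prune the smallest-magnitude active weights
    smallest = active[np.argsort(np.abs(weights.flat[active]))[:n_rewire]]
    mask.flat[smallest] = 0.0
    weights.flat[smallest] = 0.0
    # regrow the same number of connections at random inactive positions
    inactive = np.flatnonzero(mask == 0)
    regrow = rng.choice(inactive, size=n_rewire, replace=False)
    mask.flat[regrow] = 1.0
    weights.flat[regrow] = rng.normal(0.0, 0.01, size=n_rewire)
    return weights, mask

# Usage: the masked weights are trained as usual; after each epoch the
# topology is evolved, keeping the parameter count near epsilon*(n_in + n_out).
n_in, n_out = 784, 1000
mask = erdos_renyi_mask(n_in, n_out)
weights = np.random.normal(0.0, 0.01, (n_in, n_out)).astype(np.float32) * mask
for epoch in range(10):
    # ... one epoch of gradient-based training on the masked weights ...
    weights, mask = evolve_mask(weights, mask)
```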
Disciplines :
Computer science
Author, co-author :
MOCANU, Decebal Constantin ; University of Luxembourg ; Department of Mathematics and Computer Science, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands ; Department of Electrical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands. d.c.mocanu@tue.nl
Mocanu, Elena ; Department of Electrical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands ; Department of Mechanical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands
Stone, Peter; Department of Computer Science, The University of Texas at Austin, 2317 Speedway, Stop D9500, Austin, TX, 78712-1757, USA
Nguyen, Phuong H; Department of Electrical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands
Gibescu, Madeleine; Department of Electrical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands
Liotta, Antonio ; Data Science Centre, University of Derby, Lonsdale House, Quaker Way, Derby, DE1 3HD, UK
External co-authors :
yes
Language :
English
Title :
Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science
Publication date :
19 June 2018
Journal title :
Nature Communications
eISSN :
2041-1723
Publisher :
Nature Publishing Group, Basingstoke, Hampshire, England