Keywords :
Machine Learning; Deep Learning; Sparse Neural Networks; Dynamic Sparse Training
Abstract :
[en] Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) artificial neural networks, too, should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős-Rényi random graph) of two consecutive layers of neurons into a scale-free topology during learning. Our method replaces the fully-connected layers of artificial neural networks with sparse ones before training, quadratically reducing the number of parameters with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.
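To make the quadratic parameter reduction and the evolutionary rewiring concrete: a fully-connected layer between n and m neurons stores n × m weights, whereas a sparse Erdős-Rényi layer keeps a number of connections proportional to n + m. The Python sketch below is a minimal illustration of such a sparse evolutionary training loop (sparse random initialization, then per-epoch pruning of the smallest-magnitude weights and random regrowth of the same number of connections). The sparsity parameter epsilon, the rewiring fraction zeta, and the helper names erdos_renyi_mask and evolve_mask are illustrative assumptions made for this sketch, not the authors' exact implementation.

```python
# Minimal, illustrative sketch of a sparse-evolutionary-training loop for one
# weight matrix. Parameter names and values (epsilon, zeta) are assumptions.
import numpy as np

def erdos_renyi_mask(n_in, n_out, epsilon=11, rng=None):
    """Sparse binary mask with roughly epsilon*(n_in + n_out) connections,
    instead of the n_in*n_out connections of a fully-connected layer."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = epsilon * (n_in + n_out) / (n_in * n_out)  # connection probability
    return (rng.random((n_in, n_out)) < p).astype(np.float32)

def evolve_mask(weights, mask, zeta=0.3, rng=None):
    """Remove the fraction zeta of existing connections with the smallest
    magnitude, then regrow the same number at random empty positions."""
    if rng is None:
        rng = np.random.default_rng(0)
    active = np.flatnonzero(mask)
    n_rewire = int(zeta * active.size)
    # prune the smallest-magnitude active weights
    smallest = active[np.argsort(np.abs(weights.flat[active]))[:n_rewire]]
    mask.flat[smallest] = 0.0
    weights.flat[smallest] = 0.0
    # regrow the same number of connections at random inactive positions
    inactive = np.flatnonzero(mask == 0)
    regrow = rng.choice(inactive, size=n_rewire, replace=False)
    mask.flat[regrow] = 1.0
    weights.flat[regrow] = rng.normal(0.0, 0.01, size=n_rewire)
    return weights, mask

# Usage: the masked weights are trained as usual; after each epoch the
# topology is evolved, keeping the parameter count near epsilon*(n_in + n_out).
n_in, n_out = 784, 1000
mask = erdos_renyi_mask(n_in, n_out)
weights = np.random.normal(0.0, 0.01, (n_in, n_out)).astype(np.float32) * mask
for epoch in range(10):
    # ... one epoch of gradient-based training on the masked weights ...
    weights, mask = evolve_mask(weights, mask)
```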
Disciplines :
Computer science
Author, co-author :
MOCANU, Decebal Constantin ; University of Luxembourg ; Department of Mathematics and Computer Science, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands ; Department of Electrical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands. d.c.mocanu@tue.nl
Mocanu, Elena ; Department of Electrical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands ; Department of Mechanical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands
Stone, Peter; Department of Computer Science, The University of Texas at Austin, 2317 Speedway, Stop D9500, Austin, TX, 78712-1757, USA
Nguyen, Phuong H; Department of Electrical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands
Gibescu, Madeleine; Department of Electrical Engineering, Eindhoven University of Technology, De Rondom 70, 5612 AP, Eindhoven, The Netherlands
Liotta, Antonio ; Data Science Centre, University of Derby, Lonsdale House, Quaker Way, Derby, DE1 3HD, UK
External co-authors :
yes
Language :
English
Title :
Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science
Publication date :
19 June 2018
Journal title :
Nature Communications
eISSN :
2041-1723
Publisher :
Nature Publishing Group, Basingstoke, Hampshire, England