2024 • In Endriss, Ulle (Ed.) ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024, Proceedings
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection. Training SNNs with dynamic sparse training (DST) algorithms has shown promising feature selection capabilities while drastically reducing computational overheads. Despite these advances, several critical aspects of feature selection remain insufficiently explored: which DST algorithm to use for network training, which metric to use for ranking features/neurons, and how these methods compare to dense networks across diverse datasets. This paper addresses these gaps with a comprehensive, systematic analysis of feature selection with sparse neural networks. Moreover, we introduce a novel metric, tailored to the characteristics of sparse neural networks, that quantifies feature importance in SNNs. Our findings show that feature selection with SNNs trained with DST algorithms achieves, on average, more than 50% memory and 55% FLOPs reduction compared to dense networks, while outperforming them in the quality of the selected features.
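For readers unfamiliar with how feature selection is typically read off a sparse network, the sketch below illustrates one common connectivity-based heuristic: after dynamic sparse training, each input neuron is scored by the summed magnitude of its surviving connections, and the top-scoring inputs are selected as features. The function name, array shapes, and the specific score are illustrative assumptions; the metric proposed in the paper may differ.

```python
import numpy as np

def rank_features_by_sparse_connectivity(weights: np.ndarray,
                                         mask: np.ndarray,
                                         k: int) -> np.ndarray:
    """Return indices of the top-k input features of a sparse layer.

    weights : (n_features, n_hidden) trained weight matrix
    mask    : binary matrix of the same shape, 1 where a connection survived DST
    k       : number of features to select
    """
    # Importance of an input neuron = sum of absolute weights of its
    # non-pruned outgoing connections (a common heuristic, not necessarily
    # the paper's proposed metric).
    importance = np.abs(weights * mask).sum(axis=1)
    return np.argsort(importance)[::-1][:k]

# Toy usage: a random sparse input layer with 100 features, 20 hidden units, ~90% sparsity.
rng = np.random.default_rng(0)
mask = (rng.random((100, 20)) < 0.1).astype(float)
w = rng.normal(size=(100, 20))
print(rank_features_by_sparse_connectivity(w, mask, k=10))
```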
Disciplines :
Computer science
Author, co-author :
Atashgahi, Zahra; Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Netherlands
Liu, Tennison; Department of Applied Mathematics and Theoretical Physics, University of Cambridge, United Kingdom
Pechenizkiy, Mykola; Department of Mathematics and Computer Science, Eindhoven University of Technology, Netherlands
Veldhuis, Raymond; Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Netherlands
Mocanu, Decebal Constantin; Department of Computer Science (DCS), Faculty of Science, Technology and Medicine (FSTM), University of Luxembourg, Luxembourg
van der Schaar, Mihaela; Department of Applied Mathematics and Theoretical Physics, University of Cambridge, United Kingdom
External co-authors :
yes
Language :
English
Title :
Unveiling the Power of Sparse Neural Networks for Feature Selection
Publication date :
16 October 2024
Event name :
ECAI 2024: 27th European Conference on Artificial Intelligence
Event place :
Santiago de Compostela, Spain
Event date :
19-10-2024 to 24-10-2024
Audience :
International
Main work title :
ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024, Proceedings