[en] Feature selection algorithms aim to select a subset of informative features from a dataset to reduce its dimensionality, thereby lowering resource consumption and improving the model's performance and interpretability. In recent years, feature selection based on neural networks has become a new trend, demonstrating superiority over traditional feature selection methods. However, most existing methods use dense neural networks to detect informative features, which incurs significant computational and memory overhead. In this paper, taking inspiration from the successful application of local sensitivity analysis to neural networks, we propose a novel resource-efficient supervised feature selection algorithm based on sparse multi-layer perceptrons, called "GradEnFS". By utilizing the gradient information of various sparse models from different training iterations, our method successfully detects the informative feature subset. We performed extensive experiments on nine classification datasets spanning various domains to evaluate the effectiveness of our method. The results demonstrate that our proposed approach outperforms state-of-the-art methods in selecting informative features while substantially reducing resource consumption. Moreover, we show that using a sparse neural network for feature selection not only alleviates resource consumption but also offers a significant advantage over other methods when performing feature selection on noisy datasets.
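As a rough illustration of the idea described in the abstract, the sketch below trains a small sparse MLP and accumulates the magnitude of the input gradients (a local sensitivity measure) across training iterations, then keeps the top-k scoring features. The fixed random sparsity mask, the network sizes, and the simple top-k rule are simplifying assumptions made for this sketch; they are not the exact GradEnFS procedure described in the paper.

# Illustrative sketch only (not the authors' exact GradEnFS): accumulate
# input-gradient magnitudes from a sparse MLP across training iterations,
# then select the k highest-scoring features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_sparse_mlp(n_features, n_hidden, n_classes, density=0.2):
    """Two-layer MLP whose input layer is sparsified by a fixed random mask (assumption)."""
    model = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU(),
                          nn.Linear(n_hidden, n_classes))
    mask = (torch.rand_like(model[0].weight) < density).float()
    model[0].weight.data.mul_(mask)            # zero out pruned connections
    return model, mask

def gradient_feature_scores(model, x, y):
    """Local sensitivity of the loss to each input feature, averaged over the batch."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.abs().mean(dim=0)              # one score per feature

def select_features(x, y, k, epochs=10, lr=1e-2):
    n_features = x.shape[1]
    model, mask = make_sparse_mlp(n_features, 64, int(y.max()) + 1)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    scores = torch.zeros(n_features)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
        model[0].weight.data.mul_(mask)        # keep the input layer sparse after the update
        scores += gradient_feature_scores(model, x, y)  # ensemble gradients over iterations
    return torch.topk(scores, k).indices       # indices of the k most informative features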
Disciplines :
Computer science
Author, co-author :
Liu, Kaiting; Eindhoven University of Technology
Atashgahi, Zahra; University of Twente
Sokar, Ghada; Eindhoven University of Technology
Pechenizkiy, Mykola; Eindhoven University of Technology
Mocanu, Decebal Constantin; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS); Eindhoven University of Technology
External co-authors :
yes
Language :
English
Title :
Supervised Feature Selection via Ensemble Gradient Information from Sparse Neural Networks
Publication date :
02 May 2024
Event name :
AISTATS 2024: International Conference on Artificial Intelligence and Statistics
Event place :
Valencia, Spain
Event date :
from 2 May to 4 May 2024
Audience :
International
Main work title :
AISTATS 2024: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics