Learning efficiency; Automatic speech recognition; Computational demands; Data performance; Data pruning; Dynamic data; Model training; Signal processing; Language and linguistics; Human-computer interaction; Machine learning
Abstract :
[en] The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed heavy computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application to ASR has been barely explored, and existing works often entail significant overhead to achieve meaningful results. To fill this gap, this paper presents the first investigation of dynamic data pruning for ASR, finding that we can reach full-data performance by dynamically selecting 70% of the data. Furthermore, we introduce Dynamic Data Pruning for ASR (DDP-ASR), which offers several fine-grained pruning granularities specifically tailored to speech datasets, going beyond the conventional pruning of entire time sequences. Our intensive experiments show that DDP-ASR can save up to 1.6× training time with negligible performance loss.
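For readers unfamiliar with the idea, the sketch below illustrates generic loss-based dynamic data pruning: each epoch, only the fraction of utterances with the highest recent loss is used for gradient updates. This is a minimal illustration under assumed names (model, dataset, optimizer, collate_fn, num_epochs), not the paper's DDP-ASR algorithm, which additionally prunes at finer granularities within speech sequences.

    # Illustrative sketch only; the names below (dataset, model, optimizer,
    # collate_fn, num_epochs) are hypothetical placeholders, and this is NOT
    # the paper's DDP-ASR method, just generic loss-based dynamic pruning.
    import torch
    from torch.utils.data import DataLoader, Subset

    KEEP_RATIO = 0.7  # fraction of utterances kept each epoch (70%, as in the abstract)
    # One score per training example; inf means "not yet seen", so it is always kept.
    scores = torch.full((len(dataset),), float("inf"))

    for epoch in range(num_epochs):
        # Dynamically re-select the currently highest-loss examples.
        kept = torch.topk(scores, k=int(KEEP_RATIO * len(dataset))).indices.tolist()
        loader = DataLoader(Subset(dataset, kept), batch_size=8, shuffle=True,
                            collate_fn=collate_fn)

        # The dataset is assumed to yield (index, audio, target) so scores can be updated.
        for indices, audio, targets in loader:
            optimizer.zero_grad()
            losses = model(audio, targets)           # per-utterance training loss
            losses.mean().backward()
            optimizer.step()
            scores[indices] = losses.detach().cpu()  # refresh scores for the next epoch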
Disciplines :
Computer science
Author, co-author :
Xiao, Qiao; Eindhoven University of Technology, Netherlands
Ma, Pingchuan; Meta AI, United Kingdom ; Imperial College London, United Kingdom
Fernandez-Lopez, Adriana; Meta AI, United Kingdom
WU, Boqian ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) ; University of Twente, Netherlands
Yin, Lu; University of Surrey, United Kingdom
Petridis, Stavros; Meta AI, United Kingdom ; Imperial College London, United Kingdom
Pechenizkiy, Mykola; Eindhoven University of Technology, Netherlands
Pantic, Maja; Meta AI, United Kingdom ; Imperial College London, United Kingdom
MOCANU, Decebal Constantin ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Liu, Shiwei; University of Oxford, United Kingdom
External co-authors :
yes
Language :
English
Title :
Dynamic Data Pruning for Automatic Speech Recognition
Publication date :
01 September 2024
Event name :
Interspeech 2024
Event place :
Kos Island, Greece
Event date :
01-09-2024 to 05-09-2024
Audience :
International
Journal title :
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH