Sparse linear regression methods generally have a free hyperparameter which controls the amount of sparsity and is subject to a bias-variance tradeoff. This article considers the use of Aggregated hold-out to aggregate over values of this hyperparameter, in the context of linear regression with the Huber loss function. Aggregated hold-out (Agghoo) is a procedure which averages estimators selected by hold-out (cross-validation with a single split). In the theoretical part of the article, it is proved that Agghoo satisfies a non-asymptotic oracle inequality when it is applied to sparse estimators which are parametrized by their zero-norm. In particular, this includes a variant of the Lasso introduced by Zou, Hastie and Tibshirani (2007). Simulations are used to compare Agghoo with cross-validation; they show that Agghoo performs better than CV when the intrinsic dimension is high and when there are confounders correlated with the predictive covariates.
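To make the procedure concrete, here is a minimal Python sketch of Agghoo as described in the abstract: over several independent train/validation splits, select a hyperparameter by hold-out and average the selected predictors. It assumes a Lasso path indexed by its penalty parameter (rather than the zero-norm parametrization analyzed in the article) and uses the Huber loss only for hold-out selection; the names agghoo_predict, huber_loss and the threshold c=1.345 are illustrative choices, not the paper's implementation.

# A rough sketch of Agghoo, under the assumptions stated above.
import numpy as np
from sklearn.linear_model import Lasso

def huber_loss(residuals, c=1.345):
    """Mean Huber loss with threshold c (a common default, not the paper's)."""
    a = np.abs(residuals)
    return np.mean(np.where(a <= c, 0.5 * a ** 2, c * (a - 0.5 * c)))

def agghoo_predict(X, y, X_new, alphas, n_splits=10, train_frac=0.8, seed=0):
    """Average the predictors selected by hold-out over independent splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    preds = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, val = perm[:n_train], perm[n_train:]
        # Hold-out: fit one estimator per hyperparameter value on the training
        # part, then pick the one minimizing the validation Huber risk.
        models = [Lasso(alpha=a).fit(X[train], y[train]) for a in alphas]
        risks = [huber_loss(y[val] - m.predict(X[val])) for m in models]
        preds.append(models[int(np.argmin(risks))].predict(X_new))
    # Agghoo averages the selected predictors, not their hyperparameters.
    return np.mean(preds, axis=0)

Averaging the hold-out-selected predictors, rather than committing to a single cross-validated hyperparameter, is what distinguishes Agghoo from CV.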
Disciplines:
Mathematics
Author, co-author:
MAILLARD, Guillaume ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Mathematics (DMATH)
External co-authors:
no
Document language:
English
Title:
Aggregated hold-out for sparse linear regression with a robust loss function
Publication date:
2022
Journal title:
Electronic Journal of Statistics
eISSN:
1935-7524
Publisher:
Institute of Mathematical Statistics, Beachwood, Ohio, United States
Volume:
16
Issue:
1
Pages:
935-997
Peer reviewed:
Peer reviewed (verified by ORBi)
European project:
H2020 - 811017 - SanDAL - ERA Chair in Mathematical Statistics and Data Science for the University of Luxembourg
Funding body:
European Union Horizon 2020 - European Commission
Arlot, S. and Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys 4 40–79. MR2602303
Audibert, J.-Y. and Catoni, O. (2011). Robust linear least squares regression. Ann. Statist. 39 2766–2794. MR2906886
Bach, F. (2008). Bolasso: Model Consistent Lasso Estimation through the Bootstrap. Proceedings of the 25th international conference on Machine learning 33-40.
Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4 329–375. MR1653272
Chatterjee, S. and Jafarov, J. (2015). Prediction error of cross-validated Lasso. arXiv e-prints arXiv:1502.06291.
Chen, X., Wang, Z. J. and McKeown, M. J. (2010). Asymptotic Analysis of Robust LASSOs in the Presence of Noise With Large Variance. IEEE Transactions on Information Theory 56 5131-5149. MR2808669
Chetverikov, D., Liao, Z. and Chernozhukov, V. (2021). On cross-validated Lasso in high dimensions. The Annals of Statistics 49 1300–1317. MR4298865
Chinot, G., Lecué, G. and Lerasle, M. (2020). Robust statistical learning with Lipschitz and convex loss functions. Probability Theory and Related Fields 176 897-940. MR4087486
Descloux, P. and Sardy, S. (2018). Model selection with lasso-zero: adding straw to the haystack to better find needles. arXiv e-prints arXiv:1805.05133. MR4313458
DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation. Grundlehren der mathematischen Wissenschaften. Springer Berlin Heidelberg. MR1261635
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499. MR2060166
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software 33 1-22. PMID: 20808728.
Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988. MR2108039
Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer New York.
Hassanieh, H. (2018). The Sparse Fourier Transform: Theory and Practice. Association for Computing Machinery and Morgan & Claypool.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. Springer New York. MR2722294
Homrighausen, D. and McDonald, D. (2013). Risk consistency of cross-validation with Lasso-type procedures. Statistica Sinica 27. MR3699692
Hoyos-Idrobo, A., Schwartz, Y., Varoquaux, G. and Thirion, B. (2015). Improving Sparse Recovery on Structured Images with Bagged Clustering. In 2015 International Workshop on Pattern Recognition in NeuroImaging. IEEE.
Huber, P. J. (1964). Robust Estimation of a Location Parameter. Ann. Math. Statist. 35 73–101. MR0161415
Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. John Wiley & Sons, Inc. MR2488795
Koltchinskii, V., Tsybakov, A. and Lounici, K. (2010). Nuclear norm penalization and optimal rates for noisy low rank matrix completion. Annals of Statistics 39. MR2906869
Lambert-Lacroix, S. and Zwald, L. (2011). Robust regression through the Huber’s criterion and adaptive lasso penalty. Electron. J. Statist. 5 1015–1053. MR2836768
Lecué, G. and Mitchell, C. (2012). Oracle inequalities for cross-validation type procedures. Electron. J. Statist. 6 1803–1837. MR2988465
Maillard, G. (2020). Hold-out and Aggregated hold-out. PhD thesis in applied mathematics, Université Paris-Saclay (supervised by Sylvain Arlot and Matthieu Lerasle).
Maillard, G., Arlot, S. and Lerasle, M. (2021). Aggregated Hold-Out. Journal of Machine Learning Research 22 1-55. MR4253713
Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Mathematics 1896. Springer, Berlin. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003, With a foreword by Jean Picard. MR2319879
Meinshausen, N. and Bühlmann, P. (2010). Stability Selection. Journal of the Royal Statistical Society Series B 72 417-473. MR2758523
Mendelson, S. (2014). Learning without Concentration. J. ACM 62 21:1-21:25. MR3367000
Mendelson, S. (2018). Learning without concentration for general loss functions. Probability Theory and Related Fields 171 459-502. MR3800838
Miolane, L. and Montanari, A. (2018). The distribution of the Lasso: Uniform control over sparse balls and adaptive parameter tuning. arXiv e-prints arXiv:1811.01212. MR4319252
Mourtada, J. (2019). Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices. arXiv preprint arXiv:1912.10754.
Navarro, F. and Saumard, A. (2017). Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases. ESAIM: Probability and Statistics 21. MR3743921
Rigollet, P. and Tsybakov, A. (2011). Exponential Screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771. MR2816337
Rosset, S. and Zhu, J. (2007). Piecewise Linear Regularized Solution Paths. The Annals of Statistics 35 1012–1030. MR2341696
Stone, C. J. (1982). Optimal Global Rates of Convergence for Nonparametric Regression. Ann. Statist. 10 1040–1053. MR0673642
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58 267–288. MR1379242
Tibshirani, R. J. and Taylor, J. (2012). Degrees of freedom in lasso problems. Ann. Statist. 40 1198–1232. MR2985948
van de Geer, S. and Lederer, J. (2011). The Lasso, correlated design, and improved oracle inequalities. arXiv e-prints arXiv:1107.0189. MR3202642
van der Laan, M. J., Dudoit, S. and van der Vaart, A. W. (2006). The cross-validated adaptive epsilon-net estimator. Statist. Decisions 24 373–395. MR2305113
Vapnik, V. N. (1999). An Overview of Statistical Learning Theory. IEEE Transactions on Neural Networks 10 988–999. MR1880032
Varoquaux, G., Raamana, P. R., Engemann, D. A., Hoyos-Idrobo, A., Schwartz, Y. and Thirion, B. (2017). Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines. NeuroImage 145 166–179.
Wang, H. and Leng, C. (2007). Unified LASSO Estimation by Least Squares Approximation. Journal of the American Statistical Association 102 1039–1048. MR2411663
Wang, H., Li, G. and Jiang, G. (2007). Robust Regression Shrinkage and Consistent Variable Selection through the LAD-Lasso. Journal of Business and Economic Statistics 25 347–355. MR2380753
Wang, S., Nan, B., Rosset, S. and Zhu, J. (2011). Random lasso. Ann. Appl. Stat. 5 468–485. MR2810406
Wegkamp, M. (2003). Model selection in nonparametric regression. Ann. Statist. 31 252–273. MR1962506
Xu, H., Caramanis, C. and Mannor, S. (2011). Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem. IEEE transactions on pattern analysis and machine intelligence 34.
Yi, C. and Huang, J. (2017). Semismooth Newton Coordinate Descent Algorithm for Elastic-Net Penalized Huber Loss Regression and Quantile Regression. Journal of Computational and Graphical Statistics 26 547-557. MR3698665
Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the Lasso. Annals of Statistics 35 2173-2192. MR2363967