Reference : Fast cross-validation for multi-penalty ridge regression
Scientific journals : Article
Physical, chemical, mathematical & earth Sciences : Mathematics
http://hdl.handle.net/10993/45319
Fast cross-validation for multi-penalty ridge regression
English
van de Wiel, Mark A. [Amsterdam University Medical Centers > Department of Epidemiology and Data Science ; University of Cambridge > MRC Biostatistics Unit]
van Nee, Mirrelijn M. [Amsterdam University Medical Centers > Epidemiology and Data Science]
Rauschenberger, Armin [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Biomedical Data Science]
In press
Journal of Computational and Graphical Statistics
Yes
[en] Prediction based on multiple high-dimensional data types needs to account for the potentially strong differences in predictive signal. Ridge regression is a simple, yet versatile and interpretable model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, in particular in dense settings. Moreover, it allows using a specific penalty per data type to account for differences between those. Then, the largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional loop for fitting the model by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in the low-dimensional sample space. We show that our approach is several orders of magnitude faster than more naive ones. We developed a very flexible framework that includes prediction of several types of response, allows for unpenalized covariates, can optimize several performance criteria and implements repeated CV. Moreover, extensions to pair data types and to allow a preferential order of data types are included and illustrated on several cancer genomics survival prediction problems. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners.
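The idea that nearly all computations can take place in the sample space can be illustrated for the simplest case: linear multi-penalty ridge without sample weights. The sketch below is not the paper's sample-weighted IWLS formula and does not use the multiridge package; it only relies on the standard identity X (X'X + Λ)^{-1} X' = K (K + I)^{-1}, with K = Σ_b X_b X_b' / λ_b. All function and variable names are hypothetical, and the data are simulated purely for illustration.

```python
import numpy as np

def fitted_naive(blocks, y, lambdas):
    """Fitted values via the p x p penalized normal equations (feature space)."""
    X = np.hstack(blocks)
    # One penalty per feature, constant within each data type (block).
    penalty = np.concatenate(
        [np.full(Xb.shape[1], lam) for Xb, lam in zip(blocks, lambdas)]
    )
    beta = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y)
    return X @ beta

def gram_blocks(blocks):
    """n x n Gram matrix per data type; computed once, reused for every penalty."""
    return [Xb @ Xb.T for Xb in blocks]

def fitted_sample_space(grams, y, lambdas):
    """Fitted values H y with H = K (K + I)^{-1}, using only n x n matrices."""
    n = len(y)
    K = sum(G / lam for G, lam in zip(grams, lambdas))
    return K @ np.linalg.solve(K + np.eye(n), y)

# Simulated example (assumption): 20 samples, two data types with 50 and 80 features.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(20, 50)), rng.normal(size=(20, 80))]
y = rng.normal(size=20)
lambdas = [2.0, 30.0]

grams = gram_blocks(blocks)  # the only step that touches the feature dimension
fit_naive = fitted_naive(blocks, y, lambdas)
fit_fast = fitted_sample_space(grams, y, lambdas)
print(np.allclose(fit_naive, fit_fast))
```

Because the Gram matrices do not depend on the penalties, trying a new set of penalties in this sketch only requires operations on n x n matrices, which is what makes a search over penalties in cross-validation cheap when the number of features far exceeds the number of samples.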
10.1080/10618600.2021.1904962

File(s) associated to this reference

Fulltext file(s):

File: 2005.09301.pdf
Version: Author preprint
Size: 1.72 MB
Access: Open access


All documents in ORBilu are protected by a user license.