transfer learning; co-data; prior information; ridge regression
Abstract:
In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provides insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. We propose an approach for integrating multiple sources of such prior information into penalised regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. The proposed method is implemented in the R package 'transreg' (https://github.com/lcsb-bds/transreg).
Research centre:
- Luxembourg Centre for Systems Biomedicine (LCSB): Biomedical Data Science (Glaab Group)
Disciplines:
Mathematics; Human health sciences: Multidisciplinary, general & others
Author, co-author:
RAUSCHENBERGER, Armin ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Biomedical Data Science
LANDOULSI, Zied ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
van de Wiel, Mark A. ✱; Amsterdam University Medical Centers > Epidemiology and Data Science ; University of Cambridge > Medical Research Council Biostatistics Unit
GLAAB, Enrico ✱; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Biomedical Data Science
✱ These authors contributed equally to the publication.
External co-authors:
yes
Document language:
English
Title:
Penalized regression with multiple sources of prior effects
Publication date:
10 November 2023
Journal title:
Bioinformatics
ISSN:
1367-4803
eISSN:
1367-4811
Publisher:
Oxford University Press, Oxford, United Kingdom
Volume:
39
Issue:
12
Pages:
btad680
Peer reviewed:
Peer reviewed, verified by ORBi
Focus area:
Systems Biomedicine
European project:
H2020 - 779282 - ERA PerMed - ERA-Net Cofund in Personalised Medicine
FNR project:
FNR14599012 - Validating Digital Biomarkers For Better Personalized Treatment Of Parkinson's Disease, 2020 (01/05/2021-30/04/2024) - Enrico Glaab
Research project name:
DIGIPD > Validating Digital Biomarkers For Better Personalized Treatment Of Parkinson's Disease > 01/05/2021 > 30/04/2024 > 2020
U-AGR-7276 - NCER/23/16695277/CLINNOVA (01/01/2023 - 31/12/2025) - GLAAB Enrico
Funders:
FNR - Fonds National de la Recherche; EC - European Commission; European Union