multi-task learning, transfer learning, sparse regression, feature selection, adaptive penalisation
Abstract:
Here we propose a simple two-stage procedure for sharing information between related high-dimensional prediction or classification problems. In both stages, we perform sparse regression separately for each problem. While this is done without prior information in the first stage, we use the coefficients from the first stage as prior information for the second stage. Specifically, we designed feature-specific and sign-specific adaptive weights to share information on feature selection, effect directions and effect sizes between different problems. The proposed approach is applicable to multi-task learning as well as transfer learning. It provides sparse models (i.e., with few non-zero coefficients for each problem) that are easy to interpret. We show by simulation and application that it tends to select fewer features while achieving predictive performance similar to that of available methods. An implementation is available in the R package ‘sparselink’ (https://github.com/rauschenberger/sparselink).
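The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the ‘sparselink’ implementation and does not reproduce the authors' weighting scheme: the weight formula (inverse of the mean positive or negative part of the first-stage coefficients, plus a small constant `eps`), the fixed penalties `lam1` and `lam2`, and all function names are assumptions made only for illustration. Sign-specific penalties are obtained here by splitting each coefficient into non-negative positive and negative parts.

```python
import numpy as np

def nonneg_weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for min (1/2n)||y - Xb||^2 + lam * sum(w_j * b_j), b >= 0."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of feature j with the partial residual (excluding j)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = max(0.0, rho - lam * weights[j]) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta

def sign_split_fit(X, y, lam, w_pos, w_neg):
    """Lasso with separate penalty weights for positive and negative effects."""
    p = X.shape[1]
    Z = np.hstack([X, -X])  # beta = beta_plus - beta_minus, both >= 0
    b = nonneg_weighted_lasso(Z, y, lam, np.concatenate([w_pos, w_neg]))
    return b[:p] - b[p:]

def two_stage(Xs, ys, lam1, lam2, eps=1e-3):
    """Stage 1: plain lasso per problem. Stage 2: lasso per problem with
    feature- and sign-specific weights built from all stage-1 coefficients.
    The weight formula below is a hypothetical choice for illustration."""
    ones = np.ones(Xs[0].shape[1])
    first = np.vstack([sign_split_fit(X, y, lam1, ones, ones)
                       for X, y in zip(Xs, ys)])
    w_pos = 1.0 / (np.maximum(first, 0.0).mean(axis=0) + eps)
    w_neg = 1.0 / (np.maximum(-first, 0.0).mean(axis=0) + eps)
    second = np.vstack([sign_split_fit(X, y, lam2, w_pos, w_neg)
                        for X, y in zip(Xs, ys)])
    return first, second

# Simulated example: two related problems sharing three relevant features.
rng = np.random.default_rng(0)
n, p = 40, 20
truth = np.zeros(p)
truth[:3] = [1.5, -1.0, 0.8]
Xs = [rng.standard_normal((n, p)) for _ in range(2)]
ys = [X @ truth + 0.5 * rng.standard_normal(n) for X in Xs]
stage1, stage2 = two_stage(Xs, ys, lam1=0.1, lam2=0.01)
```

In this sketch, a feature with a large positive first-stage coefficient in any problem receives a small penalty on its positive part and a large penalty on its negative part in the second stage, which is one simple way to share information on both selection and effect direction. In practice the penalties would be tuned, for example by cross-validation, rather than fixed.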
Research centre:
Luxembourg Centre for Systems Biomedicine (LCSB): Biomedical Data Science (Glaab Group)
LIH - Luxembourg Institute of Health
RAUSCHENBERGER, Armin ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine > Biomedical Data Science > Team Enrico GLAAB ; Luxembourg Institute of Health > Department of Medical Informatics > Bioinformatics and Artificial Intelligence
NAZAROV, Petr ✱; Luxembourg Institute of Health > Department of Medical Informatics > Bioinformatics and Artificial Intelligence
GLAAB, Enrico ✱; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Biomedical Data Science
✱ These authors contributed equally to this publication.
External co-authors:
no
Document language:
English
Title:
Estimating sparse regression models in multi-task learning and transfer learning through adaptive penalisation