Keywords:
multi-task learning, transfer learning, sparse regression, feature selection, adaptive penalisation
Abstract:
[en] We propose a simple two-stage procedure for sharing information between related high-dimensional prediction or classification problems. In both stages, we perform sparse regression separately for each problem: without prior information in the first stage, and using the first-stage coefficients as prior information in the second stage. Specifically, we design feature-specific and sign-specific adaptive weights to share information on feature selection, effect directions and effect sizes between problems. The proposed approach is applicable to multi-task learning as well as transfer learning, and it provides sparse models (i.e., with few non-zero coefficients for each problem) that are easy to interpret. Simulations and applications show that it tends to select fewer features while achieving predictive performance similar to that of existing methods. An implementation is available in the R package ‘sparselink’ (https://github.com/rauschenberger/sparselink).
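To make the two-stage idea concrete, below is a minimal sketch in Python rather than R (the paper's actual method lives in the ‘sparselink’ package; the synthetic data, the choice of `sklearn`'s `Lasso`, the pooling rule, and the penalty level are all illustrative assumptions, and the sign-specific part of the weighting is omitted for brevity). Stage 1 fits an independent lasso per task; stage 2 derives feature-specific adaptive weights from the pooled stage-1 coefficients and refits, penalising features with larger pooled effects less via the standard feature-rescaling trick.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Two related tasks sharing a sparse support (synthetic data for illustration).
n, p = 100, 20
beta = np.zeros(p)
beta[:3] = [1.0, -1.0, 0.5]
tasks = []
for _ in range(2):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(scale=0.5, size=n)
    tasks.append((X, y))

# Stage 1: independent sparse regression per task, no prior information.
stage1 = [Lasso(alpha=0.1).fit(X, y).coef_ for X, y in tasks]

# Stage 2: feature-specific adaptive weights from the pooled stage-1
# coefficients; a feature selected with a large effect in any task gets a
# large weight, i.e. a weaker penalty (adaptive-lasso rescaling trick).
pooled = np.abs(np.array(stage1)).sum(axis=0)
weights = pooled / (pooled.max() + 1e-12) + 1e-3
stage2 = []
for X, y in tasks:
    coef_scaled = Lasso(alpha=0.1).fit(X * weights, y).coef_
    stage2.append(coef_scaled * weights)  # map back to the original scale
```

Each entry of `stage2` is a sparse coefficient vector for one task; because noise features receive near-zero weights, the second stage typically retains fewer features than the first while keeping the large shared effects.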