Motivation: Machine learning in the biomedical sciences should ideally provide models that are both predictive and interpretable. When predicting outcomes from clinical or molecular features, applied researchers often want to know which features have effects, whether these effects are positive or negative, and how strong they are. Regression analysis provides this information through its coefficients, but it typically yields less predictive models than more advanced machine learning techniques.
Results: Here we propose an interpretable meta-learning approach for high-dimensional regression. The elastic net provides a compromise between estimating weak effects for many features and strong effects for some features. Its mixing parameter sets the balance between ridge and lasso regularisation. Instead of selecting one weighting by tuning, we combine multiple weightings by stacking. We do this in a way that increases predictivity without sacrificing interpretability.
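The stacking idea above can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the starnet implementation: it assumes one fixed penalty strength `lam` for every base learner (starnet's own penalty tuning may differ), a hypothetical grid of five mixing values, 5-fold cross-validation, and non-negative least squares as the meta-learner. All helper names (`elastic_net`, `cv_predictions`, `nonneg_ls`, `stacked_elastic_net`) are hypothetical. Each base learner minimises the glmnet-style objective (1/2n)||y - b0 - Xb||^2 + lam[(1 - alpha)/2 ||b||_2^2 + alpha ||b||_1], so alpha = 0 gives ridge and alpha = 1 gives lasso. Because the meta-learner combines linear predictors linearly, the stacked model collapses into one intercept and one coefficient per feature, which is what preserves interpretability.

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator used by the lasso part of the penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the glmnet-style elastic net objective
    (1/2n)||y - b0 - Xb||^2 + lam * ((1 - alpha)/2 ||b||^2 + alpha ||b||_1)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += Xc[:, j] * beta[j]      # partial residual without feature j
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= Xc[:, j] * beta[j]
    return y_mean - x_mean @ beta, beta

def cv_predictions(X, y, alphas, lam, n_folds=5, seed=0):
    """Out-of-fold linear predictors, one column per mixing value alpha."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    Z = np.zeros((n, len(alphas)))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        for m, a in enumerate(alphas):
            b0, beta = elastic_net(X[train], y[train], lam, a)
            Z[test, m] = b0 + X[test] @ beta
    return Z

def nonneg_ls(Z, y, n_iter=500):
    """Meta-learner (assumption): least squares with intercept and
    non-negative weights, solved by projected coordinate descent."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    col_sq = (Zc ** 2).sum(axis=0)
    w = np.zeros(Z.shape[1])
    resid = yc.copy()
    for _ in range(n_iter):
        for m in range(Z.shape[1]):
            resid += Zc[:, m] * w[m]
            w[m] = max(Zc[:, m] @ resid / col_sq[m], 0.0)  # clip at zero
            resid -= Zc[:, m] * w[m]
    return y_mean - z_mean @ w, w

def stacked_elastic_net(X, y, alphas=(0.0, 0.25, 0.5, 0.75, 1.0), lam=0.1):
    """Stack elastic nets with different alphas, then merge them into a
    single linear model: coef = sum_m w_m * beta_m."""
    Z = cv_predictions(X, y, alphas, lam)
    w0, w = nonneg_ls(Z, y)
    fits = [elastic_net(X, y, lam, a) for a in alphas]   # refit on all data
    intercept = w0 + sum(wm * b0 for wm, (b0, _) in zip(w, fits))
    coef = sum(wm * beta for wm, (_, beta) in zip(w, fits))
    return intercept, coef, w

# Usage on simulated data: three true effects among 20 features.
rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.5 * rng.standard_normal(n)
intercept, coef, weights = stacked_elastic_net(X, y)
print("stacking weights per alpha:", np.round(weights, 3))
print("combined coefficients:", np.round(coef, 2))
```

In this sketch the stacking weights indicate how much each ridge/lasso balance contributes, while `coef` can be read like ordinary regression coefficients: one sign and one magnitude per feature.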
Availability and Implementation: The R package starnet is available on GitHub: https://github.com/rauschenberger/starnet.
Research centre:
- Luxembourg Centre for Systems Biomedicine (LCSB): Biomedical Data Science (Glaab Group)
Disciplines:
Computer science; Life sciences: Multidisciplinary, general & others; Human health sciences: Multidisciplinary, general & others
Author, co-author:
RAUSCHENBERGER, Armin ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
GLAAB, Enrico ✱; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
van de Wiel, Mark ✱
✱ These authors contributed equally to the publication.
External co-authors:
yes
Document language:
English
Title:
Predictive and interpretable models via the stacked elastic net