Motivation: Multivariate (multi-target) regression has the potential to outperform univariate (single-target) regression at predicting correlated outcomes, which frequently occur in biomedical and clinical research. Here we implement multivariate lasso and ridge regression using stacked generalisation.
Results: Our flexible approach leads to predictive and interpretable models in high-dimensional settings, with a single estimate for each input-output effect. In the simulation, we compare the predictive performance of several state-of-the-art methods for multivariate regression. In the application, we use clinical and genomic data to predict multiple motor and non-motor symptoms in Parkinson’s disease patients. We conclude that stacked multivariate regression, with our adaptations, is a competitive method for predicting correlated outcomes.
Availability and Implementation: The R package joinet is available on GitHub (https://github.com/rauschenberger/joinet) and CRAN (https://CRAN.R-project.org/package=joinet).
Research centre:
- Luxembourg Centre for Systems Biomedicine (LCSB): Biomedical Data Science (Glaab Group)
Disciplines:
- Biotechnology
- Human health sciences: Multidisciplinary, general & others
- Computer science
- Life sciences: Multidisciplinary, general & others
Author, co-author:
RAUSCHENBERGER, Armin ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Biomedical Data Science
GLAAB, Enrico ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Biomedical Data Science
External co-authors:
no
Document language:
English
Title:
Predicting correlated outcomes from molecular data