ORBi

Predictive and interpretable models via the stacked elastic net
Rauschenberger, Armin ; Glaab, Enrico ;
in Bioinformatics (in press)

Motivation: Machine learning in the biomedical sciences should ideally provide predictive and interpretable models. When predicting outcomes from clinical or molecular features, applied researchers often want to know which features have effects, whether these effects are positive or negative, and how strong these effects are. Regression analysis includes this information in the coefficients but typically renders less predictive models than more advanced machine learning techniques. Results: Here we propose an interpretable meta-learning approach for high-dimensional regression. The elastic net provides a compromise between estimating weak effects for many features and strong effects for some features. It has a mixing parameter to weight between ridge and lasso regularisation. Instead of selecting one weighting by tuning, we combine multiple weightings by stacking. We do this in a way that increases predictivity without sacrificing interpretability. Availability and Implementation: The R package starnet is available on GitHub: https://github.com/rauschenberger/starnet.

Fast cross-validation for multi-penalty ridge regression
; ; Rauschenberger, Armin
E-print/Working paper (2020)

Prediction based on multiple high-dimensional data types needs to account for the potentially strong differences in predictive signal. Ridge regression is a simple, yet versatile and interpretable model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, in particular in dense settings. Moreover, it allows using a specific penalty per data type to account for differences between those. Then, the largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional loop for fitting the model by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in the low-dimensional sample space. We show that our approach is several orders of magnitude faster than more naive ones. We developed a very flexible framework that includes prediction of several types of response, allows for unpenalized covariates, can optimize several performance criteria and implements repeated CV. Moreover, extensions to pair data types and to allow a preferential order of data types are included and illustrated on several cancer genomics survival prediction problems. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners.

Sparse classification with paired covariates
Rauschenberger, Armin ; ; et al
in Advances in Data Analysis and Classification (2020), 14

This paper introduces the paired lasso: a generalisation of the lasso for paired covariate settings. Our aim is to predict a single response from two high-dimensional covariate sets. We assume a one-to-one correspondence between the covariate sets, with each covariate in one set forming a pair with a covariate in the other set. Paired covariates arise, for example, when two transformations of the same data are available. It is often unknown which of the two covariate sets leads to better predictions, or whether the two covariate sets complement each other. The paired lasso addresses this problem by weighting the covariates to improve the selection from the covariate sets and the covariate pairs. It thereby combines information from both covariate sets and accounts for the paired structure. We tested the paired lasso on more than 2000 classification problems with experimental genomics data, and found that for estimating sparse but predictive models, the paired lasso outperforms the standard and the adaptive lasso. The R package palasso is available from CRAN.

Testing for association between RNA-Seq and high-dimensional data
Rauschenberger, Armin ; ; et al
in BMC Bioinformatics (2016), 17

Background: Testing for association between RNA-Seq and other genomic data is challenging due to high variability of the former and high dimensionality of the latter. Results: Using the negative binomial distribution and a random-effects model, we develop an omnibus test that overcomes both difficulties. It may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size. Conclusions: The proposed test can detect genetic and epigenetic alterations that affect gene expression. It can examine complex regulatory mechanisms of gene expression. The R package globalSeq is available from Bioconductor.
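
Illustration for the multi-penalty ridge entry above. Its abstract notes that nearly all computations can be moved to the low-dimensional sample space. The following is a minimal sketch of that general idea for plain linear ridge only, not the paper's sample-weighted hat-matrix formula and not the multiridge API. The function names, the simulated data, and the penalty values are assumptions made for illustration. With one penalty per data type, the fitted values from the p x p feature-space system coincide with K(K + I)^{-1}y, where K is the sum over data types of X_b X_b' / lambda_b, an n x n matrix.

```python
import numpy as np


def ridge_fit_feature_space(blocks, penalties, y):
    """Naive route: solve the p x p system with one penalty per data type."""
    X = np.hstack(blocks)
    # Repeat each data type's penalty once per column of that data type.
    lam = np.concatenate(
        [np.full(Xb.shape[1], l) for Xb, l in zip(blocks, penalties)]
    )
    beta = np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)
    return X @ beta


def ridge_fit_sample_space(blocks, penalties, y):
    """Sample-space route: only n x n matrices are formed and solved."""
    n = y.shape[0]
    # Penalty-weighted sum of the per-data-type Gram matrices.
    K = sum((Xb @ Xb.T) / l for Xb, l in zip(blocks, penalties))
    return K @ np.linalg.solve(K + np.eye(n), y)


# Hypothetical example: 20 samples, two data types with 200 and 300 features.
rng = np.random.default_rng(0)
demo_blocks = [rng.normal(size=(20, 200)), rng.normal(size=(20, 300))]
demo_penalties = [5.0, 50.0]
demo_y = rng.normal(size=20)

fit_p = ridge_fit_feature_space(demo_blocks, demo_penalties, demo_y)
fit_n = ridge_fit_sample_space(demo_blocks, demo_penalties, demo_y)
print("max abs difference:", np.max(np.abs(fit_p - fit_n)))
```

Changing a penalty in the sample-space version only reweights Gram matrices that can be computed once, which is why this route suits repeated evaluation during cross-validation when the number of features far exceeds the number of samples.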