Abstract:
Background: Guidelines for the prevention of cardiovascular disease (CVD) recommend assessing total CVD risk with risk scores. Current risk algorithms have limited sensitivity and specificity, and they do not incorporate emerging CVD risk markers. We propose that CVD risk assessment can still be improved. We developed a long-term risk prediction model for cardiovascular mortality in patients with stable coronary artery disease (CAD), based on recent machine learning methods and an extended dataset of new biomarkers.

Methods: A total of 2953 participants of the Ludwigshafen Risk and Cardiovascular Health (LURIC) study were included. In total, 184 laboratory and 21 demographic markers were ranked by their contribution to the risk of cardiovascular (CV) mortality using different data mining approaches. A self-learning bioinformatics workflow comprising seven different machine learning algorithms was developed for CV risk prediction. The study population was stratified into patients with and without significant CAD, defined as a lumen narrowing of 50% or more in at least one coronary segment or a history of definite myocardial infarction. In both subpopulations, the machine learning models were compared with established CV risk assessment tools.

Results: After a follow-up of 10 years, 603 (20.4%) patients had died of cardiovascular causes. 95% patients without CAD died within ten years, and 247 (13.2%) patients with CAD died within five years. Overall and in patients without CAD, NT-proBNP (N-terminal pro B-type natriuretic peptide), TnT (troponin T), cystatin C-based estimated GFR (glomerular filtration rate) and age were the highest-ranked predictors, whereas in patients with CAD, NT-proBNP, GFR, CT-proAVP (C-terminal pro-arginine vasopressin) and TnT were the most predictive.
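The stratification rule and the headline event rate can be written as a minimal Python sketch. The record layout (`Patient`, `stenosis_percent`, `definite_mi_history`) and the function names are hypothetical illustrations. They are not LURIC variable names or the authors' code. Only the 50% narrowing threshold, the myocardial infarction criterion and the 603/2953 counts come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Patient:
    # Hypothetical record layout for illustration; not the LURIC data dictionary.
    stenosis_percent: Sequence[float]  # maximal lumen narrowing per coronary segment, in %
    definite_mi_history: bool          # history of definite myocardial infarction


def has_significant_cad(p: Patient, threshold: float = 50.0) -> bool:
    """Significant CAD: >= threshold % narrowing in at least one segment, or a definite prior MI."""
    return p.definite_mi_history or any(s >= threshold for s in p.stenosis_percent)


def stratify(patients: Sequence[Patient]) -> Tuple[List[Patient], List[Patient]]:
    """Split a cohort into (with significant CAD, without significant CAD)."""
    cad: List[Patient] = []
    no_cad: List[Patient] = []
    for p in patients:
        if has_significant_cad(p):
            cad.append(p)
        else:
            no_cad.append(p)
    return cad, no_cad


def event_rate_percent(events: int, total: int) -> float:
    """Crude event rate in percent, rounded to one decimal place."""
    return round(100.0 * events / total, 1)


# Usage: three made-up patients, two of whom meet the CAD definition.
cohort = [
    Patient([30.0, 50.0], False),  # 50% narrowing in one segment -> CAD
    Patient([20.0], True),         # prior definite MI -> CAD
    Patient([40.0, 10.0], False),  # neither criterion met -> no CAD
]
with_cad, without_cad = stratify(cohort)
print(len(with_cad), len(without_cad))
print(event_rate_percent(603, 2953))  # CV deaths over the whole cohort
```

Rounding 603/2953 to one decimal reproduces the 20.4% figure quoted for 10-year CV mortality.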
Compared with the FRS (Framingham Risk Score), PROCAM and ESC risk scores, the machine learning workflow produced more accurate and robust CV mortality prediction in patients without CAD. In the CAD subpopulation, its CV risk prediction was equivalent to that of the Marschner risk score. Overall, the existing algorithms tend to assign more patients to the medium-risk groups, whereas the machine learning algorithms tend to produce a clearer risk/no-risk assignment. The framework is available upon request.

Conclusion: We developed a fully automated, self-validating computational framework of machine learning techniques using an extensive database of clinical data and routinely and non-routinely measured laboratory data. Our framework predicts long-term CV mortality at least as accurately as existing CVD risk scores. A combination of four highly ranked biomarkers with the random forest approach gave the best predictive results. Moreover, a dynamic computational model has several advantages over static CVD risk prediction tools: it is freeware, transparent, flexible, transferable and expandable to any population, type of event and time frame.
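The abstract does not state which accuracy metric or risk-category cut-offs underlie the comparison with the FRS, PROCAM, ESC and Marschner scores. The sketch below is illustrative only and makes two assumptions. Discrimination is measured with the ROC AUC, computed as the Mann-Whitney probability. Patients are binned with hypothetical 10% and 20% cut-offs into low, medium and high risk. This is one way to quantify the observation that conventional scores place more patients in the medium-risk group. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], died: Sequence[bool]) -> float:
    """ROC AUC as P(score of a random case > score of a random non-case); ties count 1/2."""
    pos = [s for s, y in zip(scores, died) if y]
    neg = [s for s, y in zip(scores, died) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def risk_category(prob: float, low: float = 0.10, high: float = 0.20) -> str:
    # Hypothetical cut-offs (10% / 20% predicted risk); each published score defines its own.
    if prob < low:
        return "low"
    if prob < high:
        return "medium"
    return "high"


def category_shares(probs: Sequence[float]) -> Dict[str, float]:
    """Fraction of patients falling into each risk category."""
    if not probs:
        raise ValueError("need at least one prediction")
    counts = Counter(risk_category(p) for p in probs)
    return {c: counts.get(c, 0) / len(probs) for c in ("low", "medium", "high")}


# Usage with made-up predicted probabilities and outcomes.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [True, True, False, False]))
print(category_shares([0.05, 0.15, 0.15, 0.30]))
```

A score whose predictions cluster between the two cut-offs yields a large `"medium"` share from `category_shares`. A model with a clearer risk/no-risk separation concentrates patients in `"low"` and `"high"`. The pairwise loop in `roc_auc` is chosen for readability. A rank-based computation scales better for cohorts of several thousand patients.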