ethnicity; internal validation; machine learning; probable dementia; transfer learning; underrepresented groups
Abstract :
Algorithmic estimations of dementia status are widely used in public health and epidemiological research; however, inadequate algorithm performance across racial/ethnic groups has been a barrier. We present improvements in the accuracy of group-specific "probable dementia" estimation using a transfer learning approach. Transfer learning combines a model trained on a large "source" dataset with imprecise outcome assessments with a model trained on a smaller "target" dataset with high-quality outcome assessments. It improves model accuracy by leveraging the large source dataset while refining estimates with the smaller target dataset. We illustrate with data from the Health and Retirement Study (source data: N=6,630) and the Harmonized Cognitive Assessment Protocol (target data: N=2,388). Models for dementia status estimation were evaluated on overall accuracy (Brier score, where lower is better), calibration (intercept, slope), and discriminative ability (area under the receiver operating characteristic curve, AUC; area under the precision-recall curve, AUPRC). The transfer-learned algorithm showed higher accuracy than the best previously reported algorithm among both non-Hispanic Black participants (Brier 0.049 vs. 0.061; AUC 0.84 vs. 0.81; AUPRC 0.52 vs. 0.39) and Hispanic participants (Brier 0.052 vs. 0.056; AUC 0.89 vs. 0.87; AUPRC 0.61 vs. 0.56). Transfer learning can improve dementia status estimation for groups historically underrepresented in research.
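The transfer-learning idea and the evaluation measures described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and not the glmtrans R package. It assumes a lasso (L1-penalized) logistic model and a simplified two-step scheme in the spirit of Tian and Feng's Trans-GLM: first fit on the pooled source and target data, then estimate a sparse correction on the target data alone. All function names, penalty values, and step sizes are hypothetical. The metrics use textbook definitions: the Brier score, calibration intercept (with logit(p) as offset) and slope, AUC as the Mann-Whitney statistic, and average precision as one common estimator of AUPRC.

```python
# Hypothetical sketch: two-step transfer learning for an L1-penalized logistic
# model, plus the evaluation metrics named in the abstract. Not glmtrans.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_l1_logistic(X, y, lam, offset=None, lr=0.5, n_iter=1000):
    """Lasso logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||beta||_1; the intercept is unpenalized.
    `offset` is a fixed linear predictor added to b0 + X @ beta.
    """
    n, p = X.shape
    off = np.zeros(n) if offset is None else offset
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = sigmoid(b0 + X @ beta + off) - y  # d(log-loss)/d(linear predictor)
        b0 -= lr * resid.mean()
        beta = beta - lr * (X.T @ resid) / n
        # Soft-thresholding: the proximal step for the L1 penalty.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return b0, beta


def transfer_fit(Xs, ys, Xt, yt, lam_pool, lam_delta):
    # Step 1 (transfer): pooled lasso fit on source + target data.
    w0, w = fit_l1_logistic(np.vstack([Xs, Xt]), np.concatenate([ys, yt]), lam_pool)
    # Step 2 (debias): sparse correction fit on target data only,
    # holding the pooled linear predictor fixed as an offset.
    d0, d = fit_l1_logistic(Xt, yt, lam_delta, offset=w0 + Xt @ w)
    return w0 + d0, w + d


def brier(y, p):
    return np.mean((p - y) ** 2)


def auc(y, p):
    # Probability that a random case outranks a random non-case (ties count 1/2).
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))


def auprc(y, p):
    # Average precision: mean precision at the rank of each true case (ignores ties).
    ys = y[np.argsort(-p, kind="stable")]
    prec = np.cumsum(ys) / np.arange(1, len(ys) + 1)
    return prec[ys == 1].mean()


def _logistic_newton(Z, y, offset, n_iter=50):
    """Unpenalized logistic regression (Newton-Raphson) with a fixed offset."""
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(Z @ coef + offset)
        hess = Z.T @ (Z * (mu * (1 - mu))[:, None])
        coef = coef + np.linalg.solve(hess, Z.T @ (y - mu))
    return coef


def calibration(y, p):
    """Calibration intercept (ideal 0) and slope (ideal 1)."""
    lp = np.log(p / (1 - p))  # logit of predicted risk
    ones = np.ones((len(y), 1))
    # Intercept: intercept-only model with logit(p) as a fixed offset.
    (intercept,) = _logistic_newton(ones, y, lp)
    # Slope: coefficient on logit(p) in the model y ~ 1 + logit(p).
    _, slope = _logistic_newton(np.hstack([ones, lp[:, None]]), y, np.zeros(len(y)))
    return intercept, slope
```

In a real analysis, the two penalty strengths would typically be tuned, for example by cross-validation. The metrics would be computed on held-out target data rather than on the data used for fitting. The simulated settings one might use to try the sketch are assumptions and bear no relation to HRS or HCAP.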
Research center :
Integrative Research Unit: Social and Individual Development (INSIDE) > PEARL Institute for Research on Socio-Economic Inequality (IRSEI)
Disciplines :
Public health, health care sciences & services; Sociology & social sciences; Neurosciences & behavior
Author, co-author :
KIM, Jung Hyun; University of Luxembourg > Faculty of Humanities, Education and Social Sciences > Department of Social Sciences > Team Anja LEIST
Glymour, M Maria; Department of Epidemiology, Boston University, MA, USA
Langa, Kenneth M; Department of Internal Medicine, School of Medicine, University of Michigan, Ann Arbor, MI, USA
LEIST, Anja; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Social Sciences (DSOC) > Socio-Economic Inequality
External co-authors :
yes
Language :
English
Title :
Improving accuracy in the estimation of probable dementia in racially and ethnically diverse groups with penalized regression and transfer learning.
Publication date :
06 January 2025
Journal title :
American Journal of Epidemiology
ISSN :
0002-9262
eISSN :
1476-6256
Publisher :
Oxford University Press (OUP), United States
Peer reviewed :
Peer Reviewed verified by ORBi
Development Goals :
10. Reduced inequalities; 3. Good health and well-being