artificial intelligence; exploratory data analysis; feature engineering; machine learning; orthopedic research; Humans; Algorithms; Data Analysis; Machine Learning; Artificial Intelligence; Surgery; Orthopedics and Sports Medicine
Abstract :
[en] Exploratory data analysis (EDA) is a critical step in scientific projects, aiming to uncover valuable insights and patterns within data. Traditionally, EDA involves manual inspection, visualization, and various statistical methods. The advent of artificial intelligence (AI) and machine learning (ML) has the potential to improve EDA, offering more sophisticated approaches that enhance its efficacy. This review explores how AI and ML algorithms can improve feature engineering and selection during EDA, leading to more robust predictive models and data-driven decisions. Tree-based models, regularized regression, and clustering algorithms were identified as key techniques. These methods automate feature importance ranking, handle complex interactions, perform feature selection, reveal hidden groupings, and detect anomalies. Real-world applications include risk prediction in total hip arthroplasty and subgroup identification in scoliosis patients. Recent advances in explainable AI and EDA automation show potential for further improvement. The integration of AI and ML into EDA accelerates tasks and uncovers sophisticated insights. However, effective utilization requires a deep understanding of the algorithms, their assumptions, and limitations, along with domain knowledge for proper interpretation. As data continues to grow, AI will play an increasingly pivotal role in EDA when combined with human expertise, driving more informed, data-driven decision-making across various scientific domains. Level of Evidence: Level V - Expert opinion.
Disciplines :
Physical, chemical, mathematical & earth Sciences: Multidisciplinary, general & others
Author, co-author :
Oettl, Felix C ; Hospital for Special Surgery, New York, New York, USA ; Department of Orthopedic Surgery, Balgrist University Hospital, University of Zürich, Zurich, Switzerland
Oeding, Jacob F ; Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden ; Sahlgrenska Sports Medicine Center, Göteborg, Sweden ; Mayo Clinic Alix School of Medicine, Mayo Clinic, Rochester, Minnesota, USA
Feldt, Robert ; Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
LEY, Christophe ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Mathematics (DMATH)
Hirschmann, Michael T ; Department of Orthopedic Surgery and Traumatology, Kantonsspital Baselland, Liestal, Switzerland
Samuelsson, Kristian ; Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden ; Sahlgrenska Sports Medicine Center, Göteborg, Sweden ; Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden
ESSKA Artificial Intelligence Working Group
External co-authors :
yes
Language :
English
Title :
The artificial intelligence advantage: Supercharging exploratory data analysis.