Systematic Review and Meta-Analysis of Machine Learning Models for Acute Kidney Injury Risk Classification

Key Points

Pooled discrimination metrics were acceptable (area under the receiver operating characteristic curve >0.70) for all AKI risk classification categories in both internal and external validation.
Better performance was observed in the most recently published studies and in those with a low or unclear risk of bias.
Significant heterogeneity in patient populations, definitions, clinical predictors, and methods limits implementation in real-world clinical scenarios.

Background

Machine learning models, a form of artificial intelligence, seem to provide accurate and precise AKI risk classification in some clinical settings, but their performance and implementation in real-world settings have not been established.

Methods

PubMed, the Excerpta Medica database (EMBASE), Web of Science, and Scopus were searched through August 2023. Articles reporting on externally validated models for prediction of AKI onset, AKI severity, and post-AKI complications in hospitalized adult and pediatric patients were identified using text words related to AKI, artificial intelligence, and machine learning. Two independent reviewers screened article titles, abstracts, and full texts. Areas under the receiver operating characteristic curves (AUCs) were used to compare model discrimination and were pooled using a random-effects model.
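As an illustration of random-effects pooling, the sketch below combines per-study AUCs with the DerSimonian-Laird estimator and reports the I² heterogeneity statistic. The choice of estimator, the pooling on the raw AUC scale (rather than a logit scale), the function name `pool_random_effects`, and the example AUCs and standard errors are all assumptions made for illustration; they are not taken from the review.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    Hypothetical helper for illustration only; the review does not state
    which estimator or scale was used.
    """
    k = len(estimates)
    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Made-up AUCs and standard errors, not data from the included studies.
    aucs = [0.82, 0.78, 0.85, 0.74]
    ses = [0.010, 0.015, 0.020, 0.012]
    result = pool_random_effects(aucs, ses)
    print({name: round(value, 3) for name, value in result.items()})
```

When studies disagree by more than their sampling error would explain, the between-study variance grows, the weights become more equal across studies, and the confidence interval widens relative to a fixed-effect analysis.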

Results

Of the 4816 articles initially identified and screened, 95 were included, representing 3.8 million admissions. The Kidney Disease Improving Global Outcomes (KDIGO)-AKI criteria were most frequently used to define AKI (72%). We identified 302 models, with the most common being logistic regression (37%), neural networks (10%), random forest (9%), and eXtreme gradient boosting (9%). The most frequently reported predictors of hospitalized incident AKI were age, sex, diabetes, serum creatinine, and hemoglobin. The pooled AUCs for AKI onset were 0.82 (95% confidence interval, 0.80 to 0.84) and 0.78 (95% confidence interval, 0.76 to 0.80) for internal and external validation, respectively. Pooled AUCs across multiple clinical settings, AKI severities, and post-AKI complications ranged from 0.78 to 0.87 for internal validation and 0.73 to 0.84 for external validation. Although data were limited, results in the pediatric population aligned with those observed in adults. Between-study heterogeneity was high for all outcomes (I² >90%), and most studies (86%) had a high risk of bias according to the Prediction Model Risk of Bias Assessment Tool.

Conclusions

Most externally validated models performed well in predicting AKI onset, AKI severity, and post-AKI complications in hospitalized adult and pediatric populations. However, heterogeneity in clinical settings, study populations, and predictors limits their generalizability and implementation at the bedside.