First experiences with machine learning predictions of accelerated declining eGFR slope of living kidney donors 3 years after donation

Abstract

Background

Living kidney donors are screened pre-donation to estimate the risk of end-stage kidney disease (ESKD). We evaluate Machine Learning (ML) to predict the progression of kidney function deterioration over time using the estimated GFR (eGFR) slope as the target variable.


Methods

We included 238 living kidney donors who underwent donor nephrectomy. We divided the dataset based on the eGFR slope in the third follow-up year, resulting in 185 donors with an average eGFR slope and 53 donors with an accelerated declining eGFR-slope. We trained three Machine Learning-models (Random Forest [RF], Extreme Gradient Boosting [XG], Support Vector Machine [SVM]) and Logistic Regression (LR) for predictions. Predefined data subsets served for training to explore whether parameters of an ESKD risk score alone suffice or additional clinical and time-zero biopsy parameters enhance predictions. Machine learning-driven feature selection identified the best predictive parameters.


Results

None of the four models classified the eGFR slope with an AUC greater than 0.6 or an F1 score surpassing 0.41 despite training on different data subsets. Following machine learning-driven feature selection and subsequent retraining on these selected features, random forest and extreme gradient boosting outperformed other models, achieving an AUC of 0.66 and an F1 score of 0.44. After feature selection, two predictive donor attributes consistently appeared in all models: smoking-related features and glomerulitis of the Banff Lesion Score.


Conclusions

Training machine learning-models with distinct predefined data subsets yielded unsatisfactory results. However, the efficacy of random forest and extreme gradient boosting improved when trained exclusively with machine learning-driven selected features, suggesting that the quality, rather than the quantity, of features is crucial for machine learning-model performance. This study offers insights into the application of emerging machine learning-techniques for the screening of living kidney donors.


Graphical abstract