Balancing Accuracy and Interpretability in Credit Risk Modeling: Evidence from Peer-to-Peer Lending
Abstract
Accurate credit risk assessment is crucial to the stability and growth of peer-to-peer (P2P) lending platforms. This study investigates how effectively machine learning models predict loan defaults using historical Lending Club data. We evaluate logistic regression, decision tree, and random forest classifiers, combined with feature-engineering techniques such as one-hot encoding and weight-of-evidence (WoE) encoding. Model performance is assessed with K-fold cross-validation, using metrics including accuracy and the area under the ROC curve (AUC). To enhance interpretability, we apply the explainable AI techniques LIME and SHAP, which help lenders and borrowers understand the factors driving loan defaults. Our findings show that, although more complex models achieve higher predictive accuracy, simpler models such as logistic regression with WoE encoding strike a suitable balance between accuracy and interpretability, fostering trust and responsible lending in the P2P lending ecosystem.
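The abstract credits weight-of-evidence (WoE) encoding with keeping logistic regression interpretable but does not state the formula used. Below is a minimal Python sketch assuming the common credit-scoring convention WoE = ln(share of non-defaults / share of defaults) per category, together with the related information value (IV) summary. The function names, the additive `smoothing` parameter, and the toy loan-grade data are hypothetical illustrations, not the study's implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def category_shares(categories: Sequence[str], defaulted: Sequence[int],
                    smoothing: float = 0.5) -> Dict[str, Tuple[float, float]]:
    """Return {level: (share of all non-defaults, share of all defaults)}.

    `smoothing` (an assumption, not from the study) is added to every cell
    count so levels with no defaults or no non-defaults do not give log(0).
    With smoothing=0 such a level raises ZeroDivisionError in the callers.
    """
    if len(categories) != len(defaulted):
        raise ValueError("categories and defaulted must have the same length")
    levels = sorted(set(categories))
    good = Counter(c for c, y in zip(categories, defaulted) if y == 0)
    bad = Counter(c for c, y in zip(categories, defaulted) if y == 1)
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)
    return {
        level: ((good[level] + smoothing) / total_good,
                (bad[level] + smoothing) / total_bad)
        for level in levels
    }


def woe_table(categories: Sequence[str], defaulted: Sequence[int],
              smoothing: float = 0.5) -> Dict[str, float]:
    """WoE per level; positive values mark lower-risk levels (assumed sign)."""
    shares = category_shares(categories, defaulted, smoothing)
    return {level: math.log(g / b) for level, (g, b) in shares.items()}


def information_value(categories: Sequence[str], defaulted: Sequence[int],
                      smoothing: float = 0.5) -> float:
    """IV = sum over levels of (share_good - share_bad) * WoE."""
    shares = category_shares(categories, defaulted, smoothing)
    return sum((g - b) * math.log(g / b) for g, b in shares.values())


def woe_encode(categories: Sequence[str], table: Dict[str, float]) -> List[float]:
    """Replace each category with its WoE; unseen levels raise KeyError."""
    return [table[c] for c in categories]


if __name__ == "__main__":
    # Hypothetical toy data: loan grade per loan, 1 = defaulted.
    grades = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
    labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
    table = woe_table(grades, labels, smoothing=0.0)
    print({k: round(v, 3) for k, v in table.items()})  # {'A': 1.099, 'B': 0.0, 'C': -1.099}
    print(round(information_value(grades, labels, smoothing=0.0), 3))  # 0.732
```

Without smoothing, each level's WoE equals the log-odds of non-default within that level minus the overall log-odds of non-default, so a logistic regression fitted on WoE-encoded columns works directly on the log-odds scale and each coefficient can be read as a scaling of that category's evidence. Sign conventions differ between sources; some define WoE as ln(share of defaults / share of non-defaults), which flips every sign.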
Keywords
Peer-to-Peer Lending; Credit Risk Assessment; Logistic Regression; Random Forest; Weight of Evidence Encodings; Explainable AI; LIME; SHAP; Model Interpretability; Lending Club