Statistical and AI-Based Hybrid Modeling for Predicting Chronic Disease Progression

Authors

  • Khalid Talib Othman, Aliraqia University / College of Medicine / Iraq

DOI:

https://doi.org/10.22399/ijcesen.2791

Keywords:

Chronic Diseases, Artificial Intelligence, Statistical Modeling, Supervised Learning, Health Prediction

Abstract

Non-communicable diseases such as diabetes, hypertension, and cardiovascular disease are a growing worldwide health concern. Early prediction of disease progression is critical for improving clinical outcomes and reducing healthcare costs. Conventional statistical models help identify significant risk predictors, yet they cannot adequately model the high-dimensional, nonlinear structure of large clinical datasets. Machine learning methods such as Random Forests, Support Vector Machines, and Neural Networks provide high predictive power but tend to offer low interpretability. This study aims to develop a hybrid model for forecasting chronic illness progression with supervised learning, combining statistical and AI-based approaches. We will use freely available datasets such as the UCI Chronic Disease, NHANES, and MIMIC-III datasets. The study will compare the predictive performance of logistic regression and Cox regression against the AI models, treated here as the existing standard, and against a new model that integrates aspects of both approaches. Feature encoding techniques and class-imbalance corrections such as SMOTE are applied during preprocessing. Model performance will be assessed with accuracy, precision, recall, F1-score, and ROC-AUC. The results are expected to show that hybrid models can offer better predictive performance while retaining statistical interpretability. This research supports the development of transparent and efficient clinical decision-support systems that give medical staff realistic opportunities for early risk detection and tailored care planning.
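
The comparison described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the author's method: the data are synthetic (generated with scikit-learn, not UCI, NHANES, or MIMIC-III), the oversampler is a simplified SMOTE-style stand-in written for this example, and the "hybrid" is taken to be a stacking ensemble whose final estimator is a logistic regression, since the abstract does not state how the two approaches are integrated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def smote_like_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper).

    Assumes label 1 is the minority class. Synthetic points are drawn on the
    segment between a minority sample and one of its k nearest minority
    neighbours until both classes have the same size.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[neigh] - X_min[base])
    y_new = np.ones(n_new, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Synthetic, imbalanced stand-in for a clinical dataset (about 10% positives).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Oversample the training split only, so the test split stays untouched.
X_bal, y_bal = smote_like_oversample(X_train, y_train)


def make_lr():
    # Standardised logistic regression: the statistical baseline.
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


def make_rf():
    # Random forest: the machine learning baseline.
    return RandomForestClassifier(n_estimators=100, random_state=0)


models = {
    "logistic": make_lr(),
    "random_forest": make_rf(),
    # Assumed hybrid: base-model predictions combined by a logistic regression.
    "hybrid_stack": StackingClassifier(
        estimators=[("lr", make_lr()), ("rf", make_rf())],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "f1": f1_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
    }
    print(name, results[name])
```

In this sketch the stacked model keeps some interpretability because the final estimator's coefficients show how much weight each base model receives. A more careful setup would oversample inside each cross-validation fold, since balancing before the stacker's internal folds lets synthetic points derived from one fold appear in another.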

References

[1] K. Ogasawara et al., (2023). A Logistic Regression Model for Predicting the Risk of Subsequent Surgery among Patients with Newly Diagnosed Crohn’s Disease Using a Brute Force Method, Diagnostics, vol. 13(23), 3587. https://www.mdpi.com/2075-4418/13/23/3587

[2] S. Bussy et al., (2018). Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework, arXiv preprint, arXiv:1807.09821, https://arxiv.org/abs/1807.09821

[3] E. W. Steyerberg, (2009). Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating, Springer.

[4] S. Shankar et al., (2024). An Overview on the Advancements of Support Vector Machine Applications in Medical Field, Information, vol. 15(4), 235. https://www.mdpi.com/2078-2489/15/4/235

[5] M. H. Ghaffari et al., (2022). Diagnosis of Coronary Artery Disease Based on Machine Learning Algorithms, Advanced Biomedical Research, vol. 11, 383, https://journals.lww.com/10.4103/abr.abr_383_21

[6] A. Sharma et al., (2022). A Systematic Review on Machine Learning and Neural Network Based Approaches for Disease Prediction. Journal of Integrative Science and Technology, vol. 10(1), 1-10. https://pubs.thesciencein.org/journal/index.php/jist/article/view/a787

[7] G. P. Martin et al., (2020). Logistic regression was as good as machine learning for predicting major chronic diseases, ResearchGate, https://www.researchgate.net/publication/339834336_Logistic_regression_was_as_good_as_machine_learning_for_predicting_major_chronic_diseases

[8] UCI Machine Learning Repository: Chronic Kidney Disease Data Set. https://archive.ics.uci.edu/ml/datasets/chronic+kidney+disease

[9] MIMIC-III Clinical Database v1.4. https://physionet.org/content/mimiciii/

[10] National Health and Nutrition Examination Survey (NHANES). https://www.cdc.gov/nchs/nhanes/index.html

[11] Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique, Journal of Artificial Intelligence Research, vol. 16, 321–357.

[12] Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, vol. 12, 2825–2830.

[13] Therneau, T. M. (2020). A Package for Survival Analysis in R, R package version 3.2-7.

[14] Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag New York.

Published

2025-06-27

How to Cite

Khalid Talib Othman. (2025). Statistical and AI-Based Hybrid Modeling for Predicting Chronic Disease Progression. International Journal of Computational and Experimental Science and Engineering, 11(3). https://doi.org/10.22399/ijcesen.2791

Issue

Section

Research Article