Optimizing Type II Diabetes Prediction Through Hybrid Big Data Analytics and H-SMOTE Tree Methodology
DOI: https://doi.org/10.22399/ijcesen.727
Keywords: Hybrid Big Data Analytics, Type II Diabetes Prediction, H-SMOTE Tree, Data Preprocessing, Feature Selection, Healthcare Decision Making
Abstract
Over the last few years, Type II diabetes mellitus (T2DM) has become much more common worldwide, posing major problems for both healthcare systems and individuals. Big data analytics has shown promise for forecasting and managing chronic illnesses such as T2DM. This paper proposes a novel hybrid approach that combines big data analytics techniques with an H-SMOTE tree algorithm to predict T2DM. The method addresses the class imbalance common in medical datasets and improves prediction accuracy by combining data preprocessing, feature selection, and classification. First, raw data are cleaned, standardised, and transformed to prepare them for analysis. Next, feature selection techniques identify the factors most predictive of T2DM, which simplifies the predictive model and reduces its dimensionality. In the classification phase, the H-SMOTE tree algorithm combines two existing techniques: the Hoeffding Adaptive Tree (HAT) and the Synthetic Minority Oversampling Technique (SMOTE). It tackles imbalanced data by generating synthetic samples for the under-represented class while adapting the decision tree structure as new data arrive. Experiments show that the approach predicts T2DM accurately: the H-SMOTE tree model outperformed both classic and recent machine learning methods on several metrics, including sensitivity (the proportion of T2DM cases correctly identified), specificity (the proportion of non-T2DM cases correctly identified, i.e. how well false positives are avoided), and overall discrimination as measured by the area under the ROC curve (AUC-ROC).
Additionally, the proposed method is robust and scalable, making it well suited to the large medical datasets frequently encountered in healthcare.
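The abstract does not name the specific cleaning, standardisation, or feature selection techniques used. As a minimal sketch under those gaps, the Python below drops incomplete records, z-score standardises each column, and ranks features by the absolute Pearson correlation with the binary T2DM label, a simple filter method chosen here purely for illustration. The function names (`clean`, `standardise`, `pearson`, `select_top_k`) and the toy records are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev

# Hypothetical helpers illustrating the preprocessing and feature-selection
# steps described in the abstract; the paper's actual techniques are not stated.

def clean(rows, labels):
    """Drop records that contain a missing value (None), together with their labels."""
    kept = [(r, y) for r, y in zip(rows, labels) if None not in r]
    return [r for r, _ in kept], [y for _, y in kept]

def standardise(rows):
    """Z-score every column: (x - mean) / population std; constant columns become 0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def select_top_k(rows, labels, k):
    """Indices of the k columns with the largest |Pearson r| against the label."""
    scores = [abs(pearson(col, labels)) for col in zip(*rows)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

if __name__ == "__main__":
    # Toy records: [plasma glucose, BMI, an unrelated column]; label 1 = T2DM.
    raw = [[148, 33.6, 3], [85, 26.6, 1], [183, 23.3, 2],
           [89, 28.1, 5], [137, 43.1, 4], [None, 25.0, 1]]
    y = [1, 0, 1, 0, 1, 0]
    X, y = clean(raw, y)
    X = standardise(X)
    print(select_top_k(X, y, 2))  # → [0, 1]
```

In practice the standardisation statistics and the feature ranking would be computed on the training split only and then applied unchanged to the test split, so that no information leaks from the evaluation data.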
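SMOTE creates a synthetic minority-class sample by taking a minority example, choosing one of its k nearest minority-class neighbours, and placing a new point at a random position on the line segment between them. The minimal sketch below illustrates that interpolation step in plain Python. It picks the base sample at random rather than looping over every minority sample a fixed number of times, which is a common variant; the function name `smote`, the default `k=5`, and the toy points are illustrative assumptions, and the abstract does not say how the H-SMOTE tree applies SMOTE to incoming data.

```python
import math
import random

def smote(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic

if __name__ == "__main__":
    # Hypothetical minority (T2DM) points in a 2-feature space.
    minority = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0], [2.0, 0.0]]
    majority_size = 10
    new_points = smote(minority, majority_size - len(minority), k=2, seed=42)
    print(len(new_points))  # → 6
```

Because every synthetic point is an interpolation between two real minority samples, the new data stay inside the region the minority class already occupies instead of duplicating records, which is what distinguishes SMOTE from plain random oversampling.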
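The Hoeffding tree family, on which the Hoeffding Adaptive Tree builds, decides when a leaf has seen enough streamed examples to split by using the Hoeffding bound ε = sqrt(R² · ln(1/δ) / (2n)), where R is the range of the split criterion (log2 of the number of classes for information gain), δ is the allowed probability of choosing the wrong attribute, and n is the number of examples seen at the leaf. A leaf splits when the best attribute's gain exceeds the runner-up's by more than ε, or when ε falls below a tie-breaking threshold τ. The sketch below covers only this split test; HAT additionally monitors nodes for concept drift (with ADWIN in its original formulation) and replaces subtrees when drift is detected, which is not shown. The defaults δ = 1e-7 and τ = 0.05 and all function names are assumptions for illustration, not values reported in the paper.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the true mean of a variable with range
    value_range lies within this epsilon of its mean over n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def entropy(counts):
    """Shannon entropy (bits) of a class-count vector."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, children_counts):
    """Information gain of splitting parent_counts into the given child count vectors."""
    total = sum(parent_counts)
    remainder = sum(sum(ch) / total * entropy(ch) for ch in children_counts if sum(ch) > 0)
    return entropy(parent_counts) - remainder

def should_split(gains, n, n_classes=2, delta=1e-7, tie_threshold=0.05):
    """Hoeffding-tree split test for a leaf that has observed n examples.

    gains: information gain of each candidate attribute at this leaf.
    delta and tie_threshold are illustrative defaults (assumptions).
    """
    ranked = sorted(gains, reverse=True)
    best = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    eps = hoeffding_bound(math.log2(n_classes), delta, n)
    return best - runner_up > eps or eps < tie_threshold

if __name__ == "__main__":
    # Hypothetical leaf: 100 T2DM and 100 non-T2DM examples seen so far.
    g_glucose = info_gain([100, 100], [[80, 20], [20, 80]])
    g_bmi = info_gain([100, 100], [[60, 40], [40, 60]])
    print(should_split([g_glucose, g_bmi], n=200))  # → True
```

Because ε shrinks as 1/sqrt(n), a leaf with two closely matched attributes simply waits for more data, which is what lets the tree grow incrementally on a stream without storing past examples.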
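The abstract reports sensitivity, specificity, and AUC-ROC without giving their definitions. For reference, sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and AUC-ROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half (the normalised Mann-Whitney U statistic). The plain-Python sketch below computes all three for binary labels where 1 denotes T2DM; the function names and toy data are illustrative, and the pairwise AUC loop is O(P·N), so a rank-based implementation would be preferable for large datasets.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP) for binary labels where 1 is the positive (T2DM) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp

def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def specificity(y_true, y_pred):
    """True negative rate: TN / (TN + FP)."""
    _, _, tn, fp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)

def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative), ties counted as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]          # hypothetical predicted probabilities
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity(y_true, y_pred), specificity(y_true, y_pred), roc_auc(y_true, scores))
    # → 0.5 0.5 0.75
```

Sensitivity and specificity depend on the chosen decision threshold (0.5 here), whereas AUC-ROC summarises ranking quality across all thresholds, which is why the three are usually reported together for imbalanced medical data.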
License
Copyright (c) 2024 International Journal of Computational and Experimental Science and Engineering
This work is licensed under a Creative Commons Attribution 4.0 International License.