COMPARATIVE ANALYSIS OF ENSEMBLE MACHINE LEARNING WITH THE SMOTE TECHNIQUE FOR DIABETES PREDICTION

Nur Tri Ramadhanti Adiningrum, Nisa Hanum Harani

Abstract


Diabetes is a chronic disease characterized by high blood glucose levels. Over time, patients with diabetes go on to develop further health problems, which shows that earlier detection and better diagnosis are needed. Although several Machine Learning (ML) models have been widely used for diabetes diagnosis, their reported performance still ranges from 70% to 79%. To determine whether a person has diabetes, this study evaluates Ensemble Machine Learning for diabetes prediction on the Pima Indian Diabetes dataset. The models compared are Support Vector Machine, Linear Regression, Naive Bayes, Random Forest, AdaBoost, K-Nearest Neighbor, and Decision Tree. To reduce the accuracy bias caused by class imbalance, the dataset is balanced with the Synthetic Minority Over-sampling Technique (SMOTE). The study follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. Random Forest with Bagging and the Hard-Voting ensemble achieve the best accuracy of all the models compared, each reaching 81.16%.
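The abstract balances the dataset with SMOTE but does not show how the synthetic samples are produced. Below is a minimal, standard-library sketch of the core SMOTE idea, assuming numeric feature vectors and a binary label: each new minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function names `smote` and `balance_with_smote`, the toy data, and the parameter defaults are illustrative assumptions, not the authors' code. In practice a library implementation (for example, the `SMOTE` class in imbalanced-learn) would normally be used, and oversampling is usually applied to the training split only, so that synthetic points do not leak into the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class points by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats), all from the minority
    class; at least two samples are needed so every point has a neighbour.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point cannot have more neighbours than the others
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of `base` (Euclidean), excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_with_smote(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have the same count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    new_points = smote(minority, n_majority - len(minority), k, seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy two-feature data (made up): 3 minority (label 1) vs 6 majority (label 0)
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
     [5.0, 5.0], [5.1, 4.9], [4.8, 5.2], [5.3, 5.1], [4.9, 4.7], [5.2, 5.3]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_with_smote(X, y, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

Because every synthetic point is a convex combination of two real minority samples, it always falls inside the region already occupied by the minority class, which is what distinguishes SMOTE from simply duplicating minority rows.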



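The abstract reports that a Hard-Voting ensemble ties for the best accuracy. Under hard voting, each base model casts one vote (its predicted label) per sample, and the ensemble outputs the majority label. The sketch below assumes the base models' predictions are already available; the prediction lists are made-up values for illustration, not results from the study, and breaking ties toward the smaller label is an arbitrary choice made here.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label predictions by majority (hard) voting.

    predictions_per_model[m][i] is the label that model m predicts for sample i.
    """
    combined = []
    for votes in zip(*predictions_per_model):  # all models' votes for one sample
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break (assumption): pick the smallest of the most-voted labels
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions from three base models on six samples
y_true = [1, 0, 1, 1, 0, 0]
svm = [1, 0, 0, 1, 0, 1]
forest = [1, 0, 1, 0, 0, 0]
bayes = [0, 0, 1, 1, 1, 0]

ensemble = hard_vote([svm, forest, bayes])
print(ensemble)  # [1, 0, 1, 1, 0, 0]
for name, pred in [("svm", svm), ("forest", forest), ("bayes", bayes), ("vote", ensemble)]:
    print(name, round(accuracy(y_true, pred), 3))
```

In this toy case every base model makes at least one mistake, yet the majority vote is correct on all six samples because the models err on different samples. Voting only helps in this way when the base models' errors are not strongly correlated.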


DOI: https://doi.org/10.56486/jeis.vol5no1.681




Copyright (c) 2025 Nur Tri Ramadhanti Adiningrum, Nisa Hanum Harani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
