COMPARATIVE ANALYSIS OF ENSEMBLE MACHINE LEARNING WITH THE SMOTE TECHNIQUE FOR DIABETES PREDICTION
Abstract
Diabetes is a chronic disease characterized by high blood glucose levels. Patients with diabetes eventually develop further health problems, so early detection and better diagnosis are needed. Although several Machine Learning (ML) models have been widely used in diabetes diagnosis, their reported performance remains between 70% and 79%. This study evaluates the use of Ensemble Machine Learning to predict whether a person has diabetes, using the Pima Indian Diabetes dataset. The models compared are Support Vector Machine, Linear Regression, Naive Bayes, Random Forest, AdaBoost, K Nearest Neighbour, and Decision Tree. To reduce accuracy bias, the dataset is also balanced using the Synthetic Minority Over-sampling Technique (SMOTE). The study follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The results show that Random Forest with Bagging and Hard-Voting achieve the best accuracy among the compared models: each produces an accuracy of 81.16%.
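The two techniques named in the abstract, SMOTE balancing and hard voting, can be illustrated with a minimal sketch. The abstract does not state which implementation the authors used, so everything below is an assumption for illustration only: the function names smote and hard_vote, the parameters n_new, k, and seed, and the toy data (which is not the Pima Indian Diabetes dataset). The sketch uses only the Python standard library.

```python
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.
    Hypothetical helper; requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def hard_vote(predictions):
    """Majority vote, per sample, over the label lists of several classifiers.
    Hypothetical helper; use an odd number of classifiers to avoid ties."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Toy example (assumed data): 3 minority samples, majority class of size 8.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
majority_count = 8
new_points = smote(minority, n_new=majority_count - len(minority), k=2)
balanced_minority = minority + new_points

# Three classifiers' predicted labels for four samples.
votes = hard_vote([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]])
```

Because each synthetic point lies on a line segment between two existing minority samples, oversampling adds no values outside the range already observed in that class. In a leakage-free evaluation, oversampling of this kind is normally applied to the training split only; the abstract does not say how the study handled this.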
DOI: https://doi.org/10.56486/jeis.vol5no1.681
Copyright (c) 2025 Nur Tri Ramadhanti Adiningrum, Nisa Hanum Harani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.