Comparative Analysis of Random Forest, Decision Tree, and Naive Bayes Algorithms for SMS Spam Detection

Vincentius Jason Nyoto
Yosefina Finsensia Riti
Rafael Valentino Patrick Tantokusumo

Abstract

This study presents an in-depth comparative analysis of three machine learning algorithms for SMS spam detection: Random Forest, Decision Tree, and Naive Bayes. It uses the UCI SMS Spam Collection dataset of 5,572 messages and runs a full pipeline covering duplicate removal, TF-IDF feature extraction, and data augmentation where needed. During preprocessing, 403 duplicate messages (about 7.2%) were discarded, leaving 5,169 unique samples. The models were trained with an 80-20 train-test split and evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. All three algorithms performed very well: Naive Bayes came first with an accuracy of 98.1%, followed by Random Forest at 97.8% and Decision Tree at 96.4%. Learning-curve analysis shows that the models converge well without substantial overfitting. The study also provides six visualizations: duplicate analysis, data distribution, word cloud, breakdown of the train-test split, learning curves, and confusion matrices. The results indicate that classical machine learning approaches remain highly effective for SMS spam detection.
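
The pipeline described in the abstract (duplicate removal, 80-20 split, TF-IDF features, three classifiers, and the listed metrics) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not name a library, so the use of scikit-learn, the function names `deduplicate` and `compare_models`, the stratified split, the random seed, the default hyperparameters, and the tiny toy corpus are all assumptions made for the example. The augmentation step is omitted because the abstract does not describe it in detail.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier


def deduplicate(messages, labels):
    """Drop repeated message texts, keeping the first occurrence of each."""
    seen, texts, kept_labels = set(), [], []
    for text, label in zip(messages, labels):
        if text not in seen:
            seen.add(text)
            texts.append(text)
            kept_labels.append(label)
    return texts, kept_labels


def compare_models(messages, labels, seed=42):
    """Train three classifiers on TF-IDF features and report test metrics."""
    texts, y = deduplicate(messages, labels)

    # 80-20 split; stratification and the seed are assumptions of this sketch.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, y, test_size=0.2, stratify=y, random_state=seed)

    # Fit TF-IDF on the training messages only, then reuse it for the test set.
    vectorizer = TfidfVectorizer(lowercase=True)
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    # Default hyperparameters; the abstract does not state the ones used.
    models = {
        "Naive Bayes": MultinomialNB(),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train_vec, y_train)
        pred = model.predict(X_test_vec)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, pred, pos_label="spam", average="binary", zero_division=0)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision,
            "recall": recall,
            "f1": f1,
            # Rows are true labels, columns are predictions, order: ham, spam.
            "confusion_matrix": confusion_matrix(
                y_test, pred, labels=["ham", "spam"]),
        }
    return results


# Hypothetical toy corpus standing in for the real dataset (one duplicate).
HAM = [f"see you at lunch tomorrow friend number{i}" for i in range(10)]
SPAM = [f"win free cash prize now call code{i}" for i in range(10)]
MESSAGES = HAM + SPAM + [HAM[0]]
LABELS = ["ham"] * 10 + ["spam"] * 10 + ["ham"]

if __name__ == "__main__":
    for name, metrics in compare_models(MESSAGES, LABELS).items():
        print(name, round(metrics["accuracy"], 3), round(metrics["f1"], 3))
```

To reproduce figures like those reported in the abstract, the toy lists would be replaced by the message texts and ham/spam labels loaded from the SMS Spam Collection file; the numbers printed for the toy corpus say nothing about performance on the real data.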
