Patient Prevention Prediction and Diagnosis Using Data Mining in Healthcare Quality Management

Noviyanty
Juvinal Ximenes Guterres
Adozinda Soares Gusmao
Domingas Soares
Anita Guterres
Recardina Freitas da Silva

Abstract

The growth of digital medical records and clinical data has encouraged the development of intelligent analytical systems that support early disease detection and improve diagnostic accuracy. This study evaluates the performance of three classification algorithms, Random Forest, Support Vector Machine, and Logistic Regression, in predicting stroke risk from multidimensional patient clinical information. The dataset consists of 224 patient records derived from the Kaggle Stroke Dataset and additional questionnaire data collected from hospitals and primary health centers. The variables cover demographic characteristics, clinical history, lifestyle factors, and physiological indicators. The methodology proceeds in five stages: data preprocessing, feature selection using the ANOVA F-value, class balancing with the Synthetic Minority Oversampling Technique (SMOTE), model training, and performance evaluation using Accuracy, Precision, Recall, F1 Score, Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC). The Random Forest model achieves the highest performance, with an accuracy of 0.91 and an AUC of 0.91, outperforming Support Vector Machine and Logistic Regression. This outcome is consistent with the effectiveness of ensemble-based approaches in identifying complex nonlinear patterns and handling imbalanced data. The study contributes to healthcare quality improvement by providing a prediction framework that supports early clinical decision-making, reduces diagnostic delays, and improves patient care outcomes.
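
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`) are hypothetical, the labels and scores are invented toy values rather than study data, and the 0.5 decision threshold is an assumption. The sketch computes Accuracy, Precision, Recall, F1 Score, and MCC from binary predictions, and estimates AUC as the fraction of (stroke, non-stroke) pairs that the scores rank correctly.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = stroke and 0 = no stroke."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute Accuracy, Precision, Recall, F1 and MCC; undefined ratios fall back to 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values, not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(classification_metrics(toy_labels, toy_preds))
print(roc_auc(toy_labels, toy_scores))
```

MCC is included alongside Accuracy because it uses all four cells of the confusion matrix, which makes it less misleading than Accuracy alone when stroke cases are a small minority of the records.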
