Development of a Prediction Model for High School Students' Admission to State Universities (PTN) Using Machine Learning Based on CRISP-DM
DOI:
https://doi.org/10.51278/aj.v8i2.2632
Keywords:
CRISP-DM, Decision Tree, machine learning, Naive Bayes
Abstract
Machine learning is increasingly used as a basis for data-driven decision making in education. The objective of this research is to build a model that predicts the admission of high school students to State Universities (PTN), using machine learning within the Cross Industry Standard Process for Data Mining (CRISP-DM) framework. The data were obtained from the Student Center tutoring institution and consist of academic, non-academic, and socio-economic attributes. Three classification algorithms were used: Decision Tree, Random Forest, and Naive Bayes. The models were evaluated with the percentage split method, using training-to-testing ratios of 70:30, 80:20, and 90:10. Random Forest achieved the highest average accuracy at 96.63%, followed by Decision Tree at 95.48% and Naive Bayes at 94.06%. Class ranking, try-out scores, school accreditation, and attendance rate were among the variables that significantly affect students’ chances of being accepted to a PTN. These findings indicate that academic success is determined not only by learning achievement but also by study discipline and the quality of the educational environment. This research contributes to educational data mining and supports educational innovation by using machine learning as a decision support system for student guidance and academic evaluation.
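The percentage split evaluation described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the records are synthetic and made up, the four yes/no features are hypothetical stand-ins for attributes such as class ranking or attendance, and only a small categorical Naive Bayes is shown. A Decision Tree or Random Forest would plug into the same loop. The printed accuracies describe the toy data only and say nothing about the study's reported results.

```python
import math
import random
from collections import Counter, defaultdict


def percentage_split(rows, train_percent, seed=0):
    """Shuffle a copy of rows, then cut it into (train, test) at train_percent."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows):
        self.class_counts = Counter(label for _, label in rows)
        self.value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.seen_values = defaultdict(set)       # feature index -> values seen in training
        for features, label in rows:
            for i, value in enumerate(features):
                self.value_counts[(i, label)][value] += 1
                self.seen_values[i].add(value)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for i, value in enumerate(features):
                numerator = self.value_counts[(i, label)][value] + 1
                denominator = count + len(self.seen_values[i])
                score += math.log(numerator / denominator)  # smoothed log likelihood
            scores[label] = score
        return max(scores, key=scores.get)


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(model.predict(features) == label for features, label in rows)
    return correct / len(rows)


def make_toy_rows(n=200, seed=42):
    """Synthetic, made-up records: four high/low indicators and a rule-based label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        features = tuple(rng.choice(("high", "low")) for _ in range(4))
        label = "accepted" if features.count("high") >= 3 else "not_accepted"
        rows.append((features, label))
    return rows


rows = make_toy_rows()
results = {}
for train_percent in (70, 80, 90):  # the three split schemes named in the abstract
    train, test = percentage_split(rows, train_percent)
    model = CategoricalNaiveBayes().fit(train)
    results[train_percent] = accuracy(model, test)

# Average accuracy across the three split schemes.
average_accuracy = sum(results.values()) / len(results)
for train_percent, acc in results.items():
    print(f"{train_percent}:{100 - train_percent} split -> accuracy {acc:.4f}")
print(f"average accuracy over splits: {average_accuracy:.4f}")
```

The split uses integer percentages so that the cut point is exact, and the same seed is reused so that each scheme shuffles the records identically. Whether the study shuffled, stratified, or repeated its splits is not stated in the abstract, so those choices here are assumptions.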
License
Copyright (c) 2026 Achmad Arif Mulyana, Samidi Samidi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

