Development of a Model for Predicting High School Students' Admission to State Universities (PTN) Using Machine Learning Based on CRISP-DM

Authors

  • Achmad Arif Mulyana, Universitas Padjadjaran, Indonesia
  • Samidi Samidi, Universitas Padjadjaran, Indonesia | Universitas Budi Luhur, Indonesia

DOI:

https://doi.org/10.51278/aj.v8i2.2632

Keywords:

CRISP-DM, Decision Tree, machine learning, Naive Bayes

Abstract

Machine learning is increasingly used in education as a basis for data-driven decision making. This research builds a model that predicts the admission of high school students to State Universities (PTN), using machine learning within the Cross Industry Standard Process for Data Mining (CRISP-DM) framework. The data were obtained from the Student Center tutoring institution and comprise academic, non-academic, and socio-economic attributes. Three classification algorithms were used: Decision Tree, Random Forest, and Naive Bayes. The models were evaluated with the split percentage method under training-to-testing ratios of 70:30, 80:20, and 90:10. Random Forest achieved the highest average accuracy at 96.63%, followed by Decision Tree at 95.48% and Naive Bayes at 94.06%. Class ranking, try-out scores, school accreditation, and attendance rate were among the variables that significantly affect a student’s chances of being accepted to a PTN. These findings suggest that academic success depends not only on learning achievement but also on study discipline and the quality of the educational environment. The research contributes to educational data mining and supports educational innovation by using machine learning as a decision support system for student guidance and academic evaluation.
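The split percentage evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the software used, so the sketch relies only on the Python standard library, substitutes a small hand-written Gaussian Naive Bayes for the three classifiers, and runs on synthetic data. The function and class names (`percentage_split`, `GaussianNaiveBayes`, `average_accuracy`), the two made-up score features, and the choice to average accuracy across the three split schemes are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, pvariance


def percentage_split(rows, labels, train_fraction, seed=0):
    """Shuffle with a fixed seed, then cut the data at the training fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * train_fraction)
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


class GaussianNaiveBayes:
    """Illustrative stand-in classifier; assumes numeric, independent features."""

    def fit(self, rows, labels):
        self.stats = {}
        for cls in set(labels):
            members = [r for r, l in zip(rows, labels) if l == cls]
            columns = list(zip(*members))
            self.stats[cls] = (
                len(members) / len(rows),               # class prior
                [mean(c) for c in columns],             # per-feature mean
                [pvariance(c) + 1e-9 for c in columns]  # per-feature variance, smoothed
            )
        return self

    def _log_posterior(self, cls, row):
        prior, means, variances = self.stats[cls]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    def predict(self, rows):
        return [max(self.stats, key=lambda cls: self._log_posterior(cls, r))
                for r in rows]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_accuracy(rows, labels, fractions=(0.7, 0.8, 0.9)):
    """Evaluate once per split scheme (70:30, 80:20, 90:10) and average."""
    scores = []
    for fraction in fractions:
        x_tr, y_tr, x_te, y_te = percentage_split(rows, labels, fraction)
        model = GaussianNaiveBayes().fit(x_tr, y_tr)
        scores.append(accuracy(y_te, model.predict(x_te)))
    return scores, mean(scores)


# Synthetic data only: two hypothetical score features, label 1 = admitted.
rng = random.Random(42)
rows, labels = [], []
for _ in range(200):
    admitted = rng.random() < 0.5
    centre = 80.0 if admitted else 60.0
    rows.append([rng.gauss(centre, 8.0), rng.gauss(centre + 5.0, 8.0)])
    labels.append(int(admitted))

scores, avg = average_accuracy(rows, labels)
print("accuracy per split:", scores)
print("average accuracy:", avg)
```

The same loop would apply to a Decision Tree or Random Forest by swapping the model inside `average_accuracy`. The accuracies it prints describe the synthetic data only and are unrelated to the figures reported in the abstract.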

Published

2026-06-21

How to Cite

Achmad Arif Mulyana, & Samidi, S. (2026). Pengembangan Model Prediksi Kelulusan Siswa SMA ke Perguruan Tinggi Negeri (PTN) Menggunakan Machine learning Berbasis CRISP-DM. Attractive : Innovative Education Journal, 8(2), 46–60. https://doi.org/10.51278/aj.v8i2.2632
