Improving assessment quality: Development of evaluation instruments for sixth-grade Mathematics learnin
Abstract
Valid and reliable assessment instruments are crucial for measuring student competencies objectively, yet many existing instruments do not represent those competencies accurately. This study aims to develop a Mathematics learning outcome assessment instrument based on the Kurikulum Merdeka for sixth-grade elementary school students, with a focus on validity and reliability as evaluated with the Rasch model. The instrument is designed to cover cognitive, affective, and psychomotor competencies in line with the requirements of the Kurikulum Merdeka. Responses from 213 students were analyzed quantitatively with Winsteps software to evaluate item quality, unidimensionality, reliability, difficulty level, and potential differential item functioning (DIF). The results indicate that the instrument is highly reliable and meets the unidimensionality criteria. Item difficulty ranges from very easy to very difficult, which reflects the instrument's ability to measure students across a range of ability levels. All items meet the fit criteria based on the fit statistics (Outfit MNSQ, ZSTD, and Point Measure Correlation). However, two items (Item 1 and Item 5) show significant gender-based DIF. The study concludes that this assessment instrument based on the Kurikulum Merdeka is valid, reliable, and suitable for assessing students' mathematical abilities.
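As a rough illustration of the fit statistics named in the abstract, the sketch below computes the response probability of the dichotomous Rasch model and an Outfit mean-square (MNSQ) for a single item. It is not the authors' procedure, since the study used Winsteps; the function names, person abilities, responses, and item difficulty are all hypothetical values chosen for illustration.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def outfit_mnsq(responses, thetas, b):
    """Outfit mean-square for one item.

    This is the unweighted mean of squared standardized residuals,
    z^2 = (x - P)^2 / (P * (1 - P)), taken over all persons.
    """
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)


# Hypothetical data (not from the study): five persons answering one item.
thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]   # assumed person abilities (logits)
responses = [0, 0, 1, 1, 1]            # assumed scored responses (0 = wrong, 1 = right)
item_difficulty = 0.0                  # assumed item difficulty (logits)

print(rasch_probability(0.0, 0.0))     # 0.5: ability equals difficulty
print(outfit_mnsq(responses, thetas, item_difficulty))
```

Outfit MNSQ values near 1 indicate that responses are about as noisy as the model expects. The acceptance range applied to such values is a convention that varies between studies, so none is assumed here.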
Keywords: instrument evaluation; Kurikulum Merdeka; Rasch model; educational assessment; reliability; validity
DOI: https://doi.org/10.17509/jik.v22i1.77242
Copyright (c) 2025 Eny Cahyaningsih, Wardani Rahayu, Riyan Arthur
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Inovasi Kurikulum
Published by Himpunan Pengembang Kurikulum Indonesia (HIPKIN)
in collaboration with Curriculum Development Study Program
Faculty of Education - Universitas Pendidikan Indonesia
Gedung FIP UPI Lt. 9 Jl. Dr. Setiabudhi Bandung 40154
Indexed by: Google Scholar
p-ISSN 1829-6750 | e-ISSN 2798-1363