Answer-based and reference-based BERT models for automatic scoring of Turkish short answers: The decisive role of task complexity
Date
2025
Publisher
Izzet Kara
Access Rights
info:eu-repo/semantics/openAccess
Abstract
In measurement and evaluation, natural language responses are often avoided because of time, workload, and reliability concerns. However, the growing body of research on automatic short-answer grading (ASAG) means such responses can now be scored more quickly and reliably. This study builds models that predict short-answer scores using the pre-trained BERT deep learning language model and examines their effectiveness. Two score prediction models were created: an answer-based approach that aligns student answers with expert judgements, and a reference-based approach that matches student answers with reference answers. The dataset comprises answers from 246 Physics department students to four physics-related questions representing varying levels of cognitive complexity. Model performance was evaluated on these questions using Cohen's Kappa for statistical comparison of agreement with expert scores. Our findings reveal a clear interaction between model architecture and task complexity. The answer-based model was unequivocally superior for the most complex, multi-class task, effectively capturing diverse, nuanced responses. Conversely, the reference-based model demonstrated a statistically significant advantage for a well-defined, medium-complexity binary task. This study concludes that the optimal model for ASAG in Turkish is contingent on the cognitive demands of the assessment task, suggesting that a one-size-fits-all solution may not be the most effective approach. This provides a critical framework for practitioners, demonstrating not only that effective models are feasible for complex languages, but also that their selection must be guided by task complexity.
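The abstract reports agreement with expert scores as Cohen's Kappa. As an illustration only, the statistic can be sketched as below; the function name `cohen_kappa` and the sample score lists are hypothetical and are not taken from the study, whose actual implementation is not described in this record.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of answers")
    n = len(rater_a)
    # Observed agreement: share of answers given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions per category.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined,
        # so this sketch treats the case as perfect agreement (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical binary scores (0 = incorrect, 1 = correct), not data from the study.
expert_scores = [0, 1, 1, 0, 1, 0, 1, 1]
model_scores = [0, 1, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(expert_scores, model_scores)
```

With these made-up lists the two raters agree on 7 of 8 answers, chance agreement is 0.5, and kappa is 0.75. The same function accepts multi-class scores, since it only compares labels for equality.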
Keywords
Artificial intelligence, Deep learning, Automatic scoring, Short answer, BERT LLM models
Source
International Journal of Assessment Tools in Education
WoS Quartile
Q3
Volume
12
Issue
4












