Answer-based and reference-based BERT models for automatic scoring of Turkish short answers: The decisive role of task complexity

Date

2025

Publisher

Izzet Kara

Access Rights

info:eu-repo/semantics/openAccess

Abstract

In measurement and evaluation processes, natural language responses are often avoided due to time, workload, and reliability concerns. However, the growing body of work on automatic short-answer grading (ASAG) means that such answers can now be scored more quickly and reliably. This study aims to build models that predict short-answer scores using the pre-trained BERT deep learning language model and to assess their effectiveness. For this purpose, two score prediction models were created: an answer-based approach that aligns student answers with expert judgements, and a reference-based approach that matches student answers with reference answers. The dataset consists of answers from 246 Physics department students to four physics-related questions representing varying levels of cognitive complexity. Model performance was evaluated on each question using Cohen's Kappa for statistical comparison of agreement with expert scores. Our findings reveal a clear interaction between model architecture and task complexity. The answer-based model was unequivocally superior for the most complex, multi-class task, effectively capturing diverse, nuanced responses. Conversely, the reference-based model demonstrated a statistically significant advantage for a well-defined, medium-complexity binary task. The study concludes that the optimal model for ASAG in Turkish is contingent on the cognitive demands of the assessment task, suggesting that a one-size-fits-all solution may not be the most effective approach. This provides a critical framework for practitioners, demonstrating not only that effective models are feasible for complex languages, but also that their selection must be guided by task complexity.
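
The abstract reports agreement with expert scores using Cohen's Kappa. As a minimal sketch of what that statistic measures, the unweighted form can be computed as below. The function name `cohens_kappa` and the sample score lists are hypothetical illustrations, not the study's code or data, and the abstract does not say whether the authors used a weighted or an unweighted variant.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Assumption: both raters used one identical label throughout;
        # kappa is undefined here and is reported as 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up binary scores for illustration only (not data from the study).
expert_scores = [1, 0, 1, 1, 0, 1, 0, 0]
model_scores = [1, 0, 1, 0, 0, 1, 1, 0]
kappa = cohens_kappa(expert_scores, model_scores)
print(round(kappa, 3))
```

In this made-up example the two score lists agree on 6 of 8 answers, while the marginal frequencies alone would produce 50% agreement by chance, so kappa corrects the raw 0.75 agreement down to 0.5.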

Keywords

Artificial intelligence, Deep learning, Automatic scoring, Short answer, BERT LLM models

Source

International Journal of Assessment Tools in Education

WoS Q Value

Q3

Volume

12

Issue

4

Citation