Answer-based and reference-based BERT models for automatic scoring of Turkish short answers: The decisive role of task complexity

Kara, Abdulkadir; Kara, Zeynep Avinc; Yildirim, Serkan

Date

2025

Authors

Kara, Abdulkadir

Kara, Zeynep Avinc

Yildirim, Serkan

Publisher

Izzet Kara

Access Rights

info:eu-repo/semantics/openAccess

Abstract

In measurement and evaluation processes, natural language responses are often avoided due to time, workload, and reliability concerns. However, the increasing popularity of automatic short-answer grading studies for natural language responses means such answers can now be measured more quickly and reliably. This study aims to build models for predicting automatic short answer scores using the pre-trained BERT deep learning language model and to reveal their effectiveness. For this purpose, two different score prediction models were created using an answer-based approach that aligns student answers with expert judgements and a reference-based approach that matches student answers with reference answers. The dataset includes answers from 246 Physics department students responding to four physics-related questions. The performance of these models was evaluated on four physics questions representing varying levels of cognitive complexity, using Cohen's Kappa for statistical comparison of agreement with expert scores. Our findings reveal a clear interaction between model architecture and task complexity. The answer-based model was unequivocally superior for the most complex, multi-class task, effectively capturing diverse, nuanced responses. Conversely, the reference-based model demonstrated a statistically significant advantage for a well-defined, medium-complexity binary task. This study concludes that the optimal model for ASAG in Turkish is contingent on the cognitive demands of the assessment task, suggesting that a one-size-fits-all solution may not be the most effective approach. This provides a critical framework for practitioners, demonstrating not only that effective models are feasible for complex languages, but that their selection must be guided by task complexity.
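
The abstract names Cohen's Kappa as the measure of agreement between model predictions and expert scores. The following is a minimal sketch of that statistic, not the authors' evaluation code. The function name `cohen_kappa` and the two score lists are hypothetical and invented purely for illustration; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters need the same non-zero number of scores.")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary scores (illustrative only, not taken from the study).
expert_scores = [0, 1, 1, 0, 1, 0, 1, 1]
model_scores = [0, 1, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa(expert_scores, model_scores)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this invented example the two raters agree on 7 of 8 answers (observed agreement 0.875) while chance agreement is 0.5, so kappa is 0.75. The same function applies unchanged to multi-class scores, since it only compares labels for equality.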

Keywords

Artificial intelligence, Deep learning, Automatic scoring, Short answer, BERT LLM models

Source

International Journal of Assessment Tools in Education

WoS Q Value

Q3

Volume

12

Issue

4

Link

https://doi.org/10.21449/ijate.1687429
https://search.trdizin.gov.tr/tr/yayin/detay/1364894
https://hdl.handle.net/20.500.12403/6150

Collection

WoS Indexed Publications Collection
TR-Dizin Indexed Publications Collection
