Answer-based and reference-based BERT models for automatic scoring of Turkish short answers: The decisive role of task complexity

dc.authorid: 0000-0003-3255-1408
dc.contributor.author: Kara, Abdulkadir
dc.contributor.author: Kara, Zeynep Avinc
dc.contributor.author: Yildirim, Serkan
dc.date.accessioned: 2026-02-28T12:18:11Z
dc.date.available: 2026-02-28T12:18:11Z
dc.date.issued: 2025
dc.department: Bayburt Üniversitesi
dc.description.abstract: In measurement and evaluation processes, natural language responses are often avoided due to concerns about time, workload, and reliability. However, the growing body of work on automatic short-answer grading (ASAG) means such responses can now be scored more quickly and reliably. This study builds models for automatic short-answer score prediction using the pre-trained BERT deep learning language model and evaluates their effectiveness. Two score-prediction models were created: an answer-based approach that aligns student answers with expert judgements, and a reference-based approach that matches student answers with reference answers. The dataset comprises answers from 246 Physics department students to four physics questions representing varying levels of cognitive complexity. Model performance was evaluated using Cohen's kappa to statistically compare agreement with expert scores. Our findings reveal a clear interaction between model architecture and task complexity. The answer-based model was clearly superior for the most complex, multi-class task, effectively capturing diverse, nuanced responses. Conversely, the reference-based model showed a statistically significant advantage on a well-defined, medium-complexity binary task. The study concludes that the optimal ASAG model for Turkish is contingent on the cognitive demands of the assessment task, suggesting that a one-size-fits-all solution may not be the most effective approach. This provides a practical framework for practitioners, demonstrating not only that effective models are feasible for complex languages, but also that model selection must be guided by task complexity.
dc.identifier.doi: 10.21449/ijate.1687429
dc.identifier.endpage: 925
dc.identifier.issn: 2148-7456
dc.identifier.issue: 4
dc.identifier.startpage: 905
dc.identifier.trdizinid: 1364894
dc.identifier.uri: https://doi.org/10.21449/ijate.1687429
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/1364894
dc.identifier.uri: https://hdl.handle.net/20.500.12403/6150
dc.identifier.volume: 12
dc.identifier.wos: WOS:001649220600003
dc.identifier.wosquality: Q3
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.publisher: Izzet Kara
dc.relation.ispartof: International Journal of Assessment Tools in Education
dc.relation.publicationcategory: Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_WoS_20260218
dc.subject: Artificial intelligence
dc.subject: Deep learning
dc.subject: Automatic scoring
dc.subject: Short answer
dc.subject: BERT LLM models
dc.title: Answer-based and reference-based BERT models for automatic scoring of Turkish short answers: The decisive role of task complexity
dc.type: Article