Answer-based and reference-based BERT models for automatic scoring of Turkish short answers: The decisive role of task complexity

dc.authorid: 0000-0003-3255-1408
dc.contributor.author: Kara, Abdulkadir
dc.contributor.author: Kara, Zeynep Avinc
dc.contributor.author: Yildirim, Serkan
dc.date.accessioned: 2026-02-28T12:18:11Z
dc.date.available: 2026-02-28T12:18:11Z
dc.date.issued: 2025
dc.department: Bayburt Üniversitesi
dc.description.abstract: In measurement and evaluation processes, natural language responses are often avoided due to concerns about time, workload, and reliability. However, the growing body of work on automatic short-answer grading (ASAG) means such responses can now be scored more quickly and reliably. This study builds models for automatic short-answer score prediction using the pre-trained BERT deep learning language model and evaluates their effectiveness. Two score-prediction models were created: an answer-based approach that aligns student answers with expert judgements, and a reference-based approach that matches student answers with reference answers. The dataset comprises answers from 246 Physics department students to four physics questions representing varying levels of cognitive complexity. Model performance was evaluated using Cohen's kappa to statistically compare agreement with expert scores. Our findings reveal a clear interaction between model architecture and task complexity. The answer-based model was clearly superior for the most complex, multi-class task, effectively capturing diverse, nuanced responses. Conversely, the reference-based model showed a statistically significant advantage on a well-defined, medium-complexity binary task. The study concludes that the optimal ASAG model for Turkish is contingent on the cognitive demands of the assessment task, suggesting that a one-size-fits-all solution may not be the most effective approach. This provides a practical framework for practitioners, demonstrating not only that effective models are feasible for complex languages, but also that model selection must be guided by task complexity.
dc.identifier.doi: 10.21449/ijate.1687429
dc.identifier.endpage: 925
dc.identifier.issn: 2148-7456
dc.identifier.issue: 4
dc.identifier.startpage: 905
dc.identifier.trdizinid: 1364894
dc.identifier.uri: https://doi.org/10.21449/ijate.1687429
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/1364894
dc.identifier.uri: https://hdl.handle.net/20.500.12403/6150
dc.identifier.volume: 12
dc.identifier.wos: WOS:001649220600003
dc.identifier.wosquality: Q3
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.publisher: Izzet Kara
dc.relation.ispartof: International Journal of Assessment Tools in Education
dc.relation.publicationcategory: Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_WoS_20260218
dc.subject: Artificial intelligence
dc.subject: Deep learning
dc.subject: Automatic scoring
dc.subject: Short answer
dc.subject: BERT LLM models
dc.title: Answer-based and reference-based BERT models for automatic scoring of Turkish short answers: The decisive role of task complexity
dc.type: Article