The Effect of Data Types' on the Performance of Machine Learning Algorithms for Cryptocurrency Prediction

dc.authorid: 0000-0003-2267-5175
dc.authorid: 0000-0002-5994-2874
dc.contributor.author: Tanrikulu, Hulusi Mehmet
dc.contributor.author: Pabuccu, Hakan
dc.date.accessioned: 2026-02-28T12:17:41Z
dc.date.available: 2026-02-28T12:17:41Z
dc.date.issued: 2025
dc.department: Bayburt Üniversitesi
dc.description.abstract: Forecasting cryptocurrencies as a financial issue is crucial as it provides investors with possible financial benefits. A slight improvement in forecasting performance can lead to increased profitability; therefore, obtaining a realistic forecast is very important for investors. Bitcoin, frequently mentioned in recent years due to its volatility and chaotic behavior, has become an investment tool, especially during and after the COVID-19 pandemic. In this study, selected ML techniques were investigated for predicting cryptocurrency movements by using technical indicator-based data sets and measuring the applicability of the techniques to cryptocurrencies that do not have sufficient historical data. In order to measure the effect of data size, Bitcoin's last 1 year and 7 years of data were used. Following the related literature, Google Trends and the number of tweets were used as input features, in addition to the twelve most commonly used technical indicators. Random Forest, K-Nearest Neighbors, Extreme Gradient Boosting (XGBoost-XGB), Support Vector Machine (SVM), Naive Bayes (NB), Artificial Neural Networks (ANN), and Long Short-Term Memory (LSTM) network were optimized for best results. Accuracy, F1, and area under the ROC curve values were used to compare model performance. For continuous data, ANN and SVM performed the best with the highest accuracy and outperformed the other ML models for complete and reduced sets. LSTM reached the best accuracy for trend data, but SVM, NB, and XGB models showed similar performance. The research shows that some indicators significantly affect prediction performance, and the data discretization process also improved model accuracy. While the number of samples affects the results of many ML models, correctly optimized and fine-tuned models may also give excellent results even with less data.
dc.description.sponsorship: Scientific and Technological Research Council of Turkiye (TUBITAK)
dc.description.sponsorship: Open access funding provided by the Scientific and Technological Research Council of Turkiye (TUBITAK). We declare that we have no relevant or material financial interests that relate to the research described in this paper. The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
dc.identifier.doi: 10.1007/s10614-025-10919-y
dc.identifier.issn: 0927-7099
dc.identifier.issn: 1572-9974
dc.identifier.scopus: 2-s2.0-105000615246
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1007/s10614-025-10919-y
dc.identifier.uri: https://hdl.handle.net/20.500.12403/5925
dc.identifier.wos: WOS:001449431000001
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer
dc.relation.ispartof: Computational Economics
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_WoS_20260218
dc.subject: Financial prediction
dc.subject: Machine learning
dc.subject: Bitcoin
dc.subject: Continuous data
dc.subject: Trend data
dc.title: The Effect of Data Types' on the Performance of Machine Learning Algorithms for Cryptocurrency Prediction
dc.type: Article