Kochliaridis, V., Pierros, I., Romanos, G., & Vlahavas, I. (2025). ViT²: Pre-training Vision Transformers for Visual Time Series Forecasting. In International Conference on Pattern Recognition (pp. 217-231). Springer, Cham.
Tags:
Vision Transformer, Time Series Forecasting, Visual Time Series, Gramian Angular Fields
Abstract:
Computer Vision has witnessed remarkable advancements through the utilization of large Transformer architectures, such as the Vision Transformer (ViT). These models achieve impressive performance and generalization capability when trained on large datasets and can be fine-tuned on custom image datasets through transfer learning techniques. On the other hand, time series forecasting models have struggled to achieve a similar level of generalization across diverse datasets. This paper presents ViT², a framework composed of four modules that addresses probabilistic price forecasting and generalization for cryptocurrency markets. The first module injects noise into the time series data to increase sample availability. The second module transforms the time series data into visual data using Gramian Angular Fields. The third module converts the ViT architecture into a probabilistic forecasting model. Finally, the fourth module employs transfer learning and fine-tuning techniques to enhance its performance on low-resource datasets. Our findings reveal that ViT² outperforms state-of-the-art time series forecasting models across the majority of the datasets evaluated, highlighting the potential of Computer Vision models in the probabilistic time series forecasting domain. The code and models are publicly available at: https://github.com/kochlisGit/VIT2.
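The Gramian Angular Field encoding mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that); it only shows the standard GAF construction: rescale the series to [-1, 1], map each value to an angle via arccos, and form the pairwise cosine (summation) or sine (difference) matrix, yielding an N×N image that a ViT can consume.

```python
import numpy as np

def gramian_angular_field(series, method="summation"):
    """Encode a 1-D time series as a Gramian Angular Field image.

    Standard GAF construction (illustrative sketch): min-max rescale
    to [-1, 1], convert values to polar angles with arccos, then build
    the pairwise angular matrix.
    """
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so arccos is defined for every value.
    x_min, x_max = x.min(), x.max()
    x = 2.0 * (x - x_min) / (x_max - x_min) - 1.0
    x = np.clip(x, -1.0, 1.0)  # guard against floating-point drift
    phi = np.arccos(x)  # angle of each point in polar coordinates
    if method == "summation":
        # GASF: cos(phi_i + phi_j), a symmetric N x N matrix
        return np.cos(phi[:, None] + phi[None, :])
    # GADF: sin(phi_i - phi_j)
    return np.sin(phi[:, None] - phi[None, :])

# A toy "price" window becomes a 5x5 single-channel image.
prices = [1.0, 2.0, 3.0, 2.5, 2.0]
img = gramian_angular_field(prices)
```

The summation field is symmetric, which preserves temporal correlations in a form that 2-D convolutional or patch-based models such as ViT can exploit.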