Large Language Model-Guided Multi-modal Time Series-Semantic Prediction Framework
Author: YE Shimin, LIU Feifei, ZHANG Yan
Affiliation:

1. College of Business, Suzhou University of Technology, Suzhou 215500, China; 2. Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China

CLC Number:

TP391


    Abstract:

    Multi-modal prediction tasks typically require the simultaneous modeling of heterogeneous data, including text, images, and structured numerical information, to achieve robust inference and explainable decision-making in complex environments. Traditional uni-modal or weak-fusion methods struggle to handle semantic alignment, information complementation, and cross-source reasoning consistently, while the inherent black-box nature of deep models limits the interpretability of their results. Meanwhile, large language models (LLMs) have demonstrated strong capabilities in semantic understanding, instruction following, and reasoning, yet a gap remains in their performance on time series modeling, cross-modal alignment, and real-time knowledge integration. To address these challenges, this paper proposes an LLM-guided multi-modal time series-semantic prediction framework. By combining variational inference-based time series modeling with LLM-driven semantic analysis, the approach establishes a collaborative “temporal-semantic-decision” mechanism: the temporal module extracts historical behavior patterns using recurrent latent variables and attention mechanisms; the semantic module distills high-level semantics and interpretations through domain-specific language models and multi-modal encoders; and the two components are jointly optimized via a learnable fusion module, which also provides uncertainty annotations and explainable reports. Experiments on the StockNet, CMIN-US, and CMIN-CN datasets demonstrate that the approach achieves an accuracy of 63.54%, an improvement of 5.31 percentage points over the best baseline, and raises the Matthews correlation coefficient (MCC) to 0.223. This study offers a unified paradigm for multi-modal time series prediction and underscores its promising application in the field of financial technology.
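
    The two metrics reported in the abstract, accuracy and MCC, can be illustrated for a binary up/down prediction task with the minimal sketch below. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the binary label encoding (1 for "up", 0 for "down") is an assumption.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, assuming 1 = up and 0 = down."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions; 0.0 if there are no samples."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy labels, purely illustrative (not taken from any dataset in the paper).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), mcc(*counts))
```

    MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced, which is one reason it is commonly reported alongside accuracy for stock movement prediction.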

Get Citation

YE Shimin, LIU Feifei, ZHANG Yan. Large Language Model-Guided Multi-modal Time Series-Semantic Prediction Framework[J].,2025,40(5):1193-1206.

History
  • Received: June 15, 2025
  • Revised: August 10, 2025
  • Online: October 15, 2025