Large Language Model-Guided Multi-modal Time Series-Semantic Prediction Framework
Authors: YE Shimin, LIU Feifei, ZHANG Yan
Affiliations:

1. College of Business, Suzhou University of Technology, Suzhou 215500, China
2. Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China

    Abstract:

    Multi-modal prediction tasks typically require the simultaneous modeling of heterogeneous data, including text, images and structured numerical information, to achieve robust inference and explainable decision-making in complex environments. Traditional uni-modal or weak-fusion methods struggle to handle semantic alignment, information complementarity and cross-source reasoning consistently, while the black-box nature of deep models limits the interpretability of their results. Meanwhile, large language models (LLMs) have demonstrated strong capabilities in semantic understanding, instruction following and reasoning, yet a gap remains between LLMs and the demands of time series modeling, cross-modal alignment and real-time knowledge integration. To address these challenges, this paper proposes an LLM-guided multi-modal time series-semantic prediction framework. By combining variational inference-based time series modeling with LLM-driven semantic analysis, the approach establishes a collaborative "temporal-semantic-decision" mechanism: the temporal module extracts historical behavior patterns using recurrent latent variables and attention mechanisms; the semantic module distills high-level semantics and interpretations through domain-specific language models and multi-modal encoders; and both components are jointly optimized via a learnable fusion module, which also provides uncertainty annotations and explainable reports. Experiments on the StockNet, CMIN-US and CMIN-CN datasets show that the approach achieves an accuracy of 63.54%, an improvement of 5.31 percentage points over the best baseline, and raises the Matthews correlation coefficient (MCC) to 0.223. This study offers a unified paradigm for multi-modal time series prediction and demonstrates its application potential in financial technology.
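The abstract names a learnable fusion module, uncertainty annotations and the MCC metric, but it does not give their formulas. What follows is a minimal Python sketch under stated assumptions. The scalar sigmoid gate in the hypothetical `gated_fusion`, the logistic head `predict_up` and the entropy-based uncertainty score `binary_entropy` are illustrative choices, not the paper's published design. Only `mcc` follows the standard binary definition of the Matthews correlation coefficient. The sketch also assumes the labels are encoded as 1 for a price rise and 0 for a fall.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gated_fusion(h_temporal: Sequence[float], h_semantic: Sequence[float],
                 gate_w: Sequence[float], gate_b: float) -> list[float]:
    """Hypothetical learnable fusion: a scalar gate g in (0, 1), computed from
    the concatenated features, mixes the two representations as
    g * h_temporal + (1 - g) * h_semantic."""
    if len(h_temporal) != len(h_semantic):
        raise ValueError("representations must have the same dimension")
    concat = list(h_temporal) + list(h_semantic)
    if len(gate_w) != len(concat):
        raise ValueError("gate weights must match the concatenated dimension")
    g = sigmoid(dot(gate_w, concat) + gate_b)
    return [g * t + (1.0 - g) * s for t, s in zip(h_temporal, h_semantic)]


def predict_up(fused: Sequence[float], out_w: Sequence[float], out_b: float) -> float:
    """Hypothetical logistic head: probability that the next move is 'up'."""
    return sigmoid(dot(out_w, fused) + out_b)


def binary_entropy(p: float) -> float:
    """Predictive entropy in bits, used here as an assumed uncertainty score:
    0 for a certain prediction, 1 at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = up, 0 = down)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any marginal count is zero (e.g. a constant predictor).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_w=[0.0] * 4, gate_b=0.0)
    p_up = predict_up(fused, out_w=[0.4, -0.2], out_b=0.0)
    print(fused, round(p_up, 3), round(binary_entropy(p_up), 3))
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

With all gate weights at zero, the gate outputs 0.5, so the fused vector is simply the average of the two inputs. Training would push the gate toward whichever modality is more informative. MCC ranges from -1 to 1 and is 0 for a constant predictor, whereas accuracy can look acceptable for such a predictor when the classes are imbalanced. This is why MCC is commonly reported alongside accuracy for up/down prediction.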

Cite this article:

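YE Shimin, LIU Feifei, ZHANG Yan. Large language model-guided multi-modal time series-semantic prediction framework[J]. Journal of Data Acquisition and Processing, 2025, 40(5): 1193-1206.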

History:
  • Received: 2025-06-15
  • Last revised: 2025-08-10
  • Published online: 2025-10-15