Large Language Model-Guided Multi-modal Time Series-Semantic Prediction Framework

doi:10.16337/j.1004-9037.2025.05.007

Affiliation: 1. College of Business, Suzhou University of Technology, Suzhou 215500, China; 2. Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China

Abstract:

Multi-modal prediction tasks typically require jointly modeling heterogeneous data, including text, images, and structured numerical information, to achieve robust inference and explainable decision-making in complex environments. Traditional uni-modal or weak-fusion methods struggle to handle semantic alignment, information complementarity, and cross-source reasoning consistently, and the black-box nature of deep models limits the interpretability of their results. Meanwhile, large language models (LLMs) have demonstrated strong capabilities in semantic understanding, instruction following, and reasoning, yet a gap remains between these capabilities and time series modeling, cross-modal alignment, and real-time knowledge integration. To address these challenges, this paper proposes an LLM-guided multi-modal time series-semantic prediction framework. By combining variational inference-based time series modeling with LLM-driven semantic analysis, the approach establishes a collaborative “temporal-semantic-decision” mechanism: the temporal module extracts historical behavior patterns using recurrent latent variables and attention mechanisms; the semantic module distills high-level semantics and explanations through domain-specific language models and multi-modal encoders; and the two modules are jointly optimized in a learnable fusion module, which also provides uncertainty annotations and explainable reports. Experiments on the StockNet, CMIN-US, and CMIN-CN datasets show that the approach achieves an accuracy of 63.54%, an improvement of 5.31 percentage points over the best baseline, and raises the Matthews correlation coefficient (MCC) to 0.223. This study offers a unified paradigm for multi-modal time series prediction and shows application potential in financial technology.
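The abstract reports results as accuracy and MCC. As a reading aid, the following minimal sketch shows how these two metrics are commonly computed for a two-class prediction task. It assumes binary labels (1 for one class, 0 for the other), which the abstract does not state explicitly; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when the denominator is zero (for example, when
    every prediction falls in a single class), a common convention.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels for illustration only; these are not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc_value = accuracy(y_true, y_pred)
mcc_value = mcc(y_true, y_pred)
print(f"accuracy={acc_value:.4f}, mcc={mcc_value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so a classifier that always predicts the majority class scores 0 even when its accuracy looks reasonable. This is one reason both numbers are often reported side by side for class-imbalanced prediction tasks.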

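Cite this article

Ye Shimin (叶诗敏), Liu Feifei (刘非菲), Zhang Yan (张岩). Large Language Model-Guided Multi-modal Time Series-Semantic Prediction Framework[J]. Journal of Data Acquisition and Processing, 2025, 40(5): 1193-1206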

History

Received: 2025-06-15
Last revised: 2025-08-10
Published online: 2025-10-15
