Multimodal Aspect-Level Sentiment Analysis Based on GCN and Target Visual Feature Enhancement
DOI:
Author:
Affiliation:

Jiangsu Ocean University

Author biography:

Corresponding author:

Fund project:

The National Natural Science Foundation of China (General Program, Key Program, Major Research Plan)

Abstract:

Multimodal aspect-level sentiment analysis aims to integrate image and text modalities to accurately predict the sentiment polarity of aspect terms. However, existing methods still have significant limitations in precisely locating text-relevant image region features and in effectively handling inter-modal information interaction, while biased understanding of intra-modal context introduces additional noise. To address these problems, this paper proposes a multimodal aspect-level sentiment analysis model based on GCN and target visual feature enhancement (GCN-TVFE). First, the CLIP model is used to process the text, aspect terms, and image data; the text-image similarity and the aspect-image similarity are computed and combined to quantitatively evaluate how well the text and the aspect terms match the image. A Faster R-CNN model is then used to quickly and accurately detect and localize target regions in the image, further strengthening the model's ability to extract text-relevant image features. Second, through a text-image GCN network, a text graph is constructed from the dependency-syntax relations within the text, and an image graph is generated with the KNN algorithm, so as to mine intra-modal feature information in depth. Finally, a multimodal interactive attention mechanism is adopted to capture the associations between aspect terms and the text, and between the target visual features and the features of the textual descriptions generated from the image, which significantly reduces noise interference and enhances inter-modal feature interaction. Experimental results show that the proposed model achieves superior overall performance on the public Twitter-2015 and Twitter-2017 datasets, verifying its effectiveness for multimodal sentiment analysis.
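The abstract does not give implementation details for the CLIP-based matching step. As a rough illustration only, the sketch below (not the authors' code) computes a text-image similarity and an aspect-image similarity with the open-source Hugging Face CLIP checkpoint and combines them; the checkpoint choice, the weighted-sum fusion, and the weight alpha are assumptions made for this example.

```python
# Illustrative sketch (not the paper's implementation): combine CLIP
# sentence-image and aspect-image similarities into one matching score.
# The checkpoint, the weighted-sum fusion, and alpha are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_match_score(sentence: str, aspect: str, image: Image.Image,
                     alpha: float = 0.5) -> float:
    inputs = processor(text=[sentence, aspect], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = torch.nn.functional.normalize(text_emb, dim=-1)
    image_emb = torch.nn.functional.normalize(image_emb, dim=-1)
    sim = text_emb @ image_emb.T          # shape (2, 1): row 0 = sentence, row 1 = aspect
    sent_img, aspect_img = sim[0, 0].item(), sim[1, 0].item()
    return alpha * sent_img + (1 - alpha) * aspect_img  # combined matching score

# score = clip_match_score("the food was great but the service was slow", "service",
#                          Image.open("tweet.jpg"))
```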

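For the intra-modal graph step, the following is a minimal sketch of the general idea rather than the paper's exact architecture: a text adjacency matrix built from dependency arcs (obtained here with spaCy, an assumed choice of parser), a k-nearest-neighbour adjacency over image region features, and a single standard GCN layer that can be applied to either graph. The parser, k, and layer sizes are illustrative assumptions.

```python
# Illustrative sketch only: dependency-based text graph, KNN image graph,
# and one standard GCN layer; the spaCy parser, k, and dimensions are
# assumptions, not details taken from the paper.
import spacy
import torch
import torch.nn as nn

nlp = spacy.load("en_core_web_sm")

def dependency_adjacency(sentence: str) -> torch.Tensor:
    """Symmetric adjacency matrix (with self-loops) built from dependency arcs."""
    doc = nlp(sentence)
    adj = torch.eye(len(doc))
    for tok in doc:
        if tok.i != tok.head.i:                    # skip the root's self-arc
            adj[tok.i, tok.head.i] = 1.0
            adj[tok.head.i, tok.i] = 1.0
    return adj

def knn_adjacency(region_feats: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Connect each image region to its k most similar regions (cosine similarity)."""
    feats = nn.functional.normalize(region_feats, dim=-1)
    sim = feats @ feats.T
    idx = sim.topk(k + 1, dim=-1).indices          # +1 because each node is its own neighbour
    adj = torch.zeros_like(sim)
    adj.scatter_(1, idx, 1.0)
    return torch.maximum(adj, adj.T)               # make the graph undirected

class GCNLayer(nn.Module):
    """Standard GCN propagation: H' = ReLU(D^-1/2 A D^-1/2 H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg_inv_sqrt = adj.sum(-1).clamp(min=1.0).pow(-0.5)
        norm_adj = deg_inv_sqrt.unsqueeze(-1) * adj * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(norm_adj @ self.linear(h))

# Usage sketch: node features would come from text/image encoders in the full model.
# text_h = GCNLayer(768, 768)(token_embeddings, dependency_adjacency(sentence))
# img_h  = GCNLayer(768, 768)(region_features, knn_adjacency(region_features))
```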
History
  • Received: 2024-12-21
  • Revised: 2025-02-17
  • Accepted: 2025-03-31
  • Published online: