Image Tampering Perception and Multi-view Fusion Network based Fake News Detection
DOI:
Author:
Affiliation:

Nanjing University of Posts and Telecommunications

Fund Project:

National Natural Science Foundation of China (General Program, Key Program, Major Program)

    Abstract:

    The rapid advancement of AI and image editing technologies has led to a surge in tampered images, posing new challenges for fake news detection (FND). Prevailing multimodal FND methods focus on modeling text-image consistency and neglect the possibility that the image itself has been manipulated, which leaves them insufficiently robust to tampered images. To address this limitation, this paper proposes an FND approach based on an Image Tampering Perception and Multi-view Fusion Network (ITPMFN). The method consists of three components. (1) A multi-view feature extraction module employs BERT and Swin-T to capture modality-specific features from text and images, respectively, and leverages CLIP to extract cross-modally aligned semantic features, constructing a four-channel multi-view representation. (2) A collaborative-reasoning-based image tampering perception and interpretable analysis generation module first uses a lightweight model to extract low-level statistical tampering features, then designs enhanced prompts from these features to guide a multimodal large language model in generating structured, high-level tampering explanations that cover the manipulated objects, the manipulation type, and related information. These explanations provide human-interpretable decision rationales and are encoded as fusion-ready features for downstream fake news classification. (3) A cross-modal interaction and fusion module applies attention mechanisms at both the intra-modal and inter-modal levels to let the multi-view features interact and fuse, yielding more discriminative fused features, which are combined with the tampering reasoning features for multimodal fake news detection. Experiments on two widely used public datasets, Weibo and Fakeddit, show that the proposed method outperforms existing mainstream methods on both, and ablation studies further verify the effectiveness of each module.
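
The prompt-enhancement step in component (2) might look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not specify which statistics the lightweight model produces, how the prompt is worded, or what schema the explanation follows, so `TamperingCues`, `build_enhanced_prompt`, `parse_explanation`, and the JSON keys are all hypothetical names chosen for this example. The call to the multimodal large language model itself is omitted.

```python
import json
from dataclasses import dataclass


@dataclass
class TamperingCues:
    """Hypothetical low-level statistics from a lightweight tampering detector."""
    tamper_probability: float     # image-level score in [0, 1] (assumed)
    suspicious_area_ratio: float  # fraction of pixels flagged (assumed)
    suspicious_regions: list      # (x, y, width, height) boxes in pixels (assumed)


def build_enhanced_prompt(news_text: str, cues: TamperingCues) -> str:
    """Fold the statistical cues into an instruction for a multimodal LLM."""
    regions = "; ".join(
        f"(x={x}, y={y}, w={w}, h={h})" for x, y, w, h in cues.suspicious_regions
    ) or "none"
    return (
        "You are an image forensics assistant.\n"
        f"News text: {news_text}\n"
        f"Detector tampering probability: {cues.tamper_probability:.2f}\n"
        f"Suspicious area ratio: {cues.suspicious_area_ratio:.2%}\n"
        f"Suspicious regions: {regions}\n"
        "Inspect the attached image and answer in JSON with the keys "
        '"tampered" (true/false), "manipulated_objects" (list of strings), '
        '"manipulation_type" (string), and "rationale" (string).'
    )


def parse_explanation(raw_reply: str) -> dict:
    """Parse the model's JSON reply, falling back to an empty explanation."""
    defaults = {
        "tampered": False,
        "manipulated_objects": [],
        "manipulation_type": "unknown",
        "rationale": "",
    }
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        return dict(defaults)
    if not isinstance(parsed, dict):
        return dict(defaults)
    # Keep only the expected keys and fill in any that the model left out.
    return {key: parsed.get(key, default) for key, default in defaults.items()}
```

In a pipeline of this shape, the parsed explanation would then be passed through a text encoder to obtain the tampering reasoning feature that the fusion module consumes. Requesting a fixed set of keys is one simple way to keep the explanation structured. The abstract does not say whether the authors use JSON or another format.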

History
  • Received: 2026-02-09
  • Last revised: 2026-04-28
  • Accepted: 2026-05-21
  • Published online: