

A Survey on Risks and Governance of Content Generated by Visual Generation Models
Author: Liu An'an, Zhang Chenyu, Wang Lanjun, Li Wenhui
Affiliation:

1 School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; 2 School of New Media and Communication, Tianjin University, Tianjin 300072, China

Fund Project:

National Natural Science Foundation of China (Nos. 62425307, 62572346, U21B2024).


    Abstract:

With breakthroughs in deep generative technologies such as diffusion models, visual generation models have achieved significant leaps in image generation quality and semantic consistency, finding extensive applications in fields such as artistic creation and industrial design. However, this powerful generative capability also introduces severe content safety risks: malicious users can induce models to generate pornographic, violent, or copyright-infringing images, creating an urgent need for the safety governance of generative AI. This paper provides a systematic review of two core attack-and-defense tasks facing visual generation models: (1) jailbreak attacks, which aim to induce models to breach their safety guardrails, and (2) concept erasure, which aims to remove risky knowledge from within the models. First, we establish a taxonomy of jailbreak attacks along four dimensions (technical category, perturbation strategy, query type, and adversary knowledge), revealing an evolutionary trend in attack methods from feature-space perturbations toward semantic-space reasoning. Second, regarding risk governance, we examine concept erasure techniques in depth, comparatively analyzing three mainstream technical routes (model fine-tuning, model editing, and inference guidance) and elucidating the trade-offs among erasure effectiveness, computational efficiency, and the preservation of general generation capability. Finally, we summarize the benchmark datasets commonly used in this field and identify current challenges and future directions concerning adversarial robustness and multi-concept joint governance, aiming to provide theoretical references and technical guidance for building safe and controllable generative visual systems.

Cite this article:

Liu A A, Zhang C Y, Wang L J, Li W H. A survey on risks and governance of content generated by visual generation models[J]. Journal of Data Acquisition and Processing, 2026, (2): 620-640.

History
  • Received: 2026-01-10
  • Revised: 2026-02-25
  • Published online: 2026-04-15