Text-Guided Image Editing Method Based on Diffusion Model with Mapping-Fusion Embedding
Author:
Affiliation:

1. College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; 2. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; 3. School of Computer Science, Wuhan University, Wuhan 430072, China

Biography:

Corresponding author:

Funding:

National Natural Science Foundation of China (62076139); Open Fund of the National Key Laboratory of Information Systems Engineering (05202305); 1311 Talent Program of Nanjing University of Posts and Telecommunications.




    Abstract:

    Text-guided editing of real images, given only the image and a target text prompt as input, is an extremely challenging task. Previous approaches that fine-tune large pre-trained diffusion models typically guide the generation process with a simple interpolation of the source and target text features, which limits their editing capability; moreover, fine-tuning a large diffusion model is time-consuming and prone to overfitting. This paper proposes a text-guided image editing method based on a diffusion model with mapping-fusion embedding (MFE-Diffusion). The method consists of two components: (1) a joint learning framework for the large pre-trained diffusion model and the source text feature vectors, which enables the model to quickly learn to reconstruct the given original image; (2) a feature mapping-fusion module, which deeply fuses the feature information of the target text and the original image to generate a conditional embedding that guides the image editing process. Experimental validation on the challenging text-guided image editing benchmark TEdBench shows that the proposed method achieves superior image editing performance.
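The two-stage idea in the abstract can be illustrated with a deliberately tiny numerical sketch: a frozen linear map stands in for the pre-trained diffusion model's decoder, a source embedding is optimized so that the frozen model reconstructs the image (the joint learning stage), and the target embedding is then fused with image-derived features through a mapping instead of plain interpolation. All names, dimensions, and the random fusion matrix here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen pre-trained decoder: a fixed linear map from a
# text-embedding vector to a flattened "image". Purely illustrative.
D_EMB, D_IMG = 16, 8
W_dec = rng.normal(size=(D_IMG, D_EMB))
decode = lambda e: W_dec @ e

image = rng.normal(size=D_IMG)   # the real image to be reconstructed
e_src = rng.normal(size=D_EMB)   # initial source-text embedding

# Stage 1: keep the decoder frozen and optimize only the source embedding
# until the model reconstructs the image (analogous to the joint learning
# framework that quickly fits the original image).
lr = 0.01
for _ in range(500):
    grad = 2.0 * W_dec.T @ (decode(e_src) - image)  # gradient of squared error
    e_src -= lr * grad

recon_err = float(np.linalg.norm(decode(e_src) - image))

# Stage 2: rather than interpolating e_src and e_tgt, pass the target
# embedding together with image-derived features through a mapping (random
# here; learned in the paper) to produce the conditional embedding.
e_tgt = rng.normal(size=D_EMB)
img_feat = W_dec.T @ image  # toy "image feature"
W_fuse = rng.normal(size=(D_EMB, 2 * D_EMB)) / np.sqrt(2 * D_EMB)
e_cond = W_fuse @ np.concatenate([e_tgt, img_feat])
```

After stage 1 the reconstruction error is driven close to zero, and `e_cond` is the fused conditioning vector that would steer the (here fictitious) editing process.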

Cite this article:

WU Fei, MA Yongheng, DENG Zheying, WANG Yinjie, JI Yimu, JING Xiaoyuan. Text-guided image editing method based on diffusion model with mapping-fusion embedding[J]. Journal of Data Acquisition and Processing, 2025, 40(4): 1035-1045.

History
  • Received: 2024-06-19
  • Revised: 2024-10-29
  • Accepted:
  • Published online: 2025-08-15