Low Bit Rate Generative Drone Video Compression
Affiliation:

1. Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; 2. Visual Intelligence+X International Cooperation Joint Laboratory, Beijing Jiaotong University, Beijing 100044, China

Fund Project:

National Natural Science Foundation of China (62372036).

    Abstract:

    In complex air, space, land, and sea environments, the massive volume of video data places tremendous pressure on limited transmission bandwidth and storage. Improving the coding efficiency of video compression at low bit rates is therefore crucial. In recent years, deep learning-based video compression algorithms have made significant progress, yet model design flaws, mismatches between optimization objectives and perceptual quality, and biases in training data distributions degrade visual perceptual quality at extremely low bit rates. Generative coding learns the data distribution and thereby improves the restoration of texture and structure at low bit rates, alleviating the blur artifacts of deep video compression. However, existing research still faces two major bottlenecks: first, temporal correlation is insufficiently modeled, so inter-frame feature correlation is lost; second, the lack of a dynamic bit allocation mechanism makes it difficult to extract key information adaptively. To address these problems, this paper proposes a video compression algorithm based on a conditional guided diffusion model (conditional guided diffusion model-video compression, CGDM-VC), which aims to improve the perceptual quality of video at low bit rates while strengthening inter-frame feature modeling and preserving key information. Specifically, the algorithm uses an implicit inter-frame alignment strategy in which a diffusion model captures latent inter-frame features, reducing the computational complexity of estimating explicit motion information. In addition, an adaptive spatio-temporal importance-aware encoder dynamically allocates bit rate to optimize the generation quality of key regions. Furthermore, a perceptual loss function is introduced, combined with a learned perceptual image patch similarity (LPIPS) constraint, to improve the visual fidelity of the reconstructed frames. Experimental results show that, compared with algorithms such as deep contextual video compression (DCVC), the proposed method reduces LPIPS by 36.49% on average at low bit rates (<0.1 BPP), yielding richer texture details and more natural visual quality.
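
For readers unfamiliar with the two quantities quoted in the abstract, the following minimal sketch shows how bits per pixel (BPP) and a relative LPIPS reduction are conventionally computed. The function names, the frame size, and every numeric value are hypothetical illustrations, not data or code from the paper, and the paper's own averaging procedure may differ.

```python
def bits_per_pixel(total_bytes: int, width: int, height: int, num_frames: int) -> float:
    """Average number of bits spent per pixel over a coded sequence."""
    return total_bytes * 8 / (width * height * num_frames)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`.

    For LPIPS a lower score means better perceptual quality,
    so a positive result indicates an improvement.
    """
    return (baseline - proposed) / baseline * 100.0


# Hypothetical example: 100 frames of 1920x1080 video coded into 1 MB.
bpp = bits_per_pixel(total_bytes=1_000_000, width=1920, height=1080, num_frames=100)
is_low_bit_rate = bpp < 0.1  # the low bit rate regime referred to in the abstract

# Hypothetical LPIPS scores for a baseline codec and a proposed codec.
reduction = relative_reduction(baseline=0.40, proposed=0.25)

print(f"BPP = {bpp:.4f}, low bit rate: {is_low_bit_rate}")
print(f"Relative LPIPS reduction = {reduction:.2f}%")
```

An average reduction such as the one reported would typically be obtained by repeating this comparison over several sequences or rate points and averaging the results; that aggregation step is an assumption here.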

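Cite this article

Liu Meiqin, Chen Hongyu, Zhou Yiming, Ni Wenhao. Low bit rate generative drone video compression[J]. Journal of Data Acquisition and Processing, 2025, 40(2): 320-333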

History
  • Received: 2025-02-15
  • Last revised: 2025-03-14
  • Published online: 2025-04-11