Multi-jammer Cooperative Decision-Making via Joint Beam-Channel-Power Optimization

Affiliation:

College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China

Fund Project:

National Natural Science Foundation of China (Nos. 62401625, 62571548).

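    Abstract (translated from Chinese):

    With the rapid development of cognitive electronic warfare, multi-jammer cooperation has become an important means of improving jamming effectiveness in complex electromagnetic environments. Existing methods, however, commonly suffer from energy dispersion, coupled decisions, and an explosion in action-space dimensionality. This paper proposes a deep-reinforcement-learning-based method for joint decision-making over beam direction, channel, and power. It builds a multi-agent architecture with “distributed execution and centralized optimization”, in which each jammer decides independently from its local observations and all jammers share a global reward to coordinate their policies. An improved deep Q-network (DQN) algorithm that combines double target networks with a Boltzmann exploration strategy is designed to address Q-value overestimation and to adaptively balance exploration and exploitation, achieving three-dimensional joint optimization of beam pointing, channel selection, and power allocation. Simulation results show that, compared with independent Q-learning and independent deep reinforcement learning, the proposed method raises the jamming success rate to approximately 90%, effectively addressing the multi-jammer cooperative decision-making problem and providing a new technical approach for intelligent electronic countermeasures.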

    Abstract:

    This study addresses three challenges in multi-jammer cooperative jamming in complex electromagnetic environments: energy dispersion, resource conflicts, and high-dimensional action spaces. Conventional omnidirectional jamming is severely energy-inefficient, while independent decision-making among multiple jammers frequently results in interference overlap. Furthermore, the joint optimization of beam direction, jamming channel, and transmit power creates an exponentially growing action space that traditional reinforcement learning methods struggle to handle. To overcome these limitations, we propose a collaborative decision-making framework based on deep reinforcement learning that achieves three-dimensional joint resource optimization with minimal communication overhead. The proposed method constructs a multi-agent architecture featuring “centralized training with decentralized execution” (CTDE), in which each jammer uses an independent deep Q-network to approximate its action-value function from local observations. Centralized training is achieved through a shared global reward signal, defined as the total number of successfully jammed users, which aligns individual policies with system-wide objectives without high-bandwidth data exchange. To mitigate Q-value overestimation, double target networks with soft parameter updating are integrated. An adaptive Boltzmann exploration strategy with an exponentially decaying temperature dynamically balances exploration and exploitation. The action space is formulated as a three-dimensional joint space combining beam direction, frequency channel, and power level. Simulations in a 400 m × 400 m scenario with four communication user pairs and two intelligent jammers demonstrate the effectiveness of the proposed approach. Quantitative results indicate that the jamming success rate reaches approximately 90%, representing a 50% improvement over independent deep reinforcement learning and an 80% improvement over independent Q-learning. The approach resolves resource conflicts in multi-jammer systems through global reward sharing while keeping communication overhead low. The integration of double target networks and adaptive Boltzmann exploration addresses training instability in high-dimensional spaces. By jointly optimizing spatial, spectral, and power resources, the method significantly improves energy utilization efficiency, providing a robust technical foundation for intelligent electronic countermeasures.

    Highlights:

    1. A novel “distributed execution with centralized optimization” multi-agent architecture is proposed to achieve collaborative jamming with minimal communication overhead and exposure risk.
    2. An improved deep Q-network algorithm integrating double target networks and adaptive Boltzmann exploration is designed to address Q-value overestimation and balance the exploration-exploitation trade-off.
    3. A three-dimensional joint optimization framework for beam direction, jamming channel, and transmit power is proposed, and simulation results show that the proposed method achieves a jamming success rate of approximately 90%, outperforming independent learning.
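
    The three mechanisms named in the abstract (a joint beam-channel-power action space, Boltzmann exploration with an exponentially decaying temperature, and soft target-network updates) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grid sizes `N_BEAMS`, `N_CHANNELS`, `N_POWERS`, the temperature parameters `t0`, `t_min`, `decay`, the update rate `tau`, and all function names are assumptions chosen for illustration, since the abstract does not state them. Network weights are represented as plain lists of floats.

```python
import math
import random

# Hypothetical grid sizes; the abstract does not give the actual values.
N_BEAMS, N_CHANNELS, N_POWERS = 8, 4, 3
N_ACTIONS = N_BEAMS * N_CHANNELS * N_POWERS


def decode_action(index):
    """Map a flat action index to (beam, channel, power) indices."""
    beam, rest = divmod(index, N_CHANNELS * N_POWERS)
    channel, power = divmod(rest, N_POWERS)
    return beam, channel, power


def temperature(step, t0=1.0, t_min=0.05, decay=1e-3):
    """Exponentially decaying temperature with a lower bound (assumed form)."""
    return max(t_min, t0 * math.exp(-decay * step))


def boltzmann_probs(q_values, temp):
    """Softmax over Q-values; subtracting the max keeps exp() stable."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temp) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, step, rng=random):
    """Sample a flat action index from the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature(step))
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def soft_update(target, online, tau=0.01):
    """Soft update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


if __name__ == "__main__":
    # Placeholder Q-values standing in for one jammer's network output.
    q = [random.random() for _ in range(N_ACTIONS)]
    action = select_action(q, step=0)
    print(decode_action(action))
```

    Flattening the three choices into one index lets a single Q-network output layer score every joint action, at the cost of an output size equal to the product of the three grid sizes. A high temperature early in training makes the sampling nearly uniform, and the decay shifts it toward the highest-valued action.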

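Cite this article

DAI Jin, FENG Zhibin, YU Shuai, TONG Xiaobing, XU Yifan, GONG Yuping, LI Xinran. Multi-jammer Cooperative Decision-Making via Joint Beam-Channel-Power Optimization[J]. Journal of Data Acquisition and Processing, 2026, (3): 687-700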

History
  • Received: 2026-04-12
  • Last revised: 2026-05-10
  • Accepted:
  • Published online: 2026-06-10