Abstract: To cope with the challenges that increasingly intelligent multifunctional radars pose to the opposing side, this article proposes a jamming decision-making method based on the Proximal Policy Optimization (PPO) algorithm and a Mask-TIT network. First, starting from a realistic scenario, the confrontation between the jammer and the radar is modeled as a Partially Observable Markov Decision Process (POMDP); a new state transition function and reward function are designed based on the working principles of the radar, and the observation space is designed according to the hierarchical structure of the multifunctional radar model. Second, a Mask-TIT network structure is designed by combining the Transformer's capacity to represent sequence data with the characteristics of radar jamming patterns, and it is used to build a more powerful Actor-Critic network architecture. Finally, the PPO algorithm is used to optimize the policy. Experiments show that, compared with existing methods, the proposed algorithm reduces the average amount of interaction data required for convergence by 25.6% and significantly reduces the variance after convergence.
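
For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective from the general PPO literature. It is not taken from this article: the function name `ppo_clip_loss`, its argument names, and the default `clip_eps=0.2` are illustrative assumptions, and the article's own loss may include further terms (for example value and entropy terms) that are not shown here.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    All names and the default clip_eps are illustrative assumptions,
    not values reported in the article.
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = np.exp(new_logp - old_logp)

    # Unclipped and clipped surrogate terms.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # PPO maximizes the elementwise minimum; negate to obtain a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the usual reason PPO is chosen when interaction data is costly.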