The emergence of the Transformer architecture has substantially improved the accuracy of automatic speech recognition (ASR), yet Transformer-based ASR systems remain susceptible to universal adversarial perturbations. This work addresses that vulnerability by enhancing the transferability of universal speech adversarial examples. We propose a novel attack framework that exploits the structural commonalities of Transformer architectures. First, we implement a feature-level disruption strategy that maximizes the dissimilarity between perturbed and original speech in the middle-layer representations; by altering these latent patterns, the attack shifts the internal decision boundaries of the target models. Second, because sample-dependent semantic information tends to inhibit the generalization of universal noise, we introduce an attention gradient control mechanism that strategically weakens the gradients associated with semantic context features, forcing the perturbation to capture sample-independent acoustic vulnerabilities instead. Finally, experiments on LibriSpeech demonstrate the effectiveness of the proposed method: our approach achieves an average word error rate of 80.6% across multiple target models, a 36.6% improvement in transferability over existing baseline universal attacks. These findings indicate that manipulating middle-layer features while suppressing semantic dependencies is a highly effective strategy for mounting cross-model adversarial threats.

Highlights:

1. Propose a novel universal speech adversarial attack framework that maximizes middle-layer feature dissimilarity to exploit the structural similarities inherent in Transformer-based speech recognition models.
2. Introduce a targeted attention gradient control mechanism that decouples sample-independent acoustic features from sample-dependent semantic context, significantly boosting attack transferability.
3. Achieve a substantial increase in universal attack success rates across diverse Transformer architectures, outperforming traditional universal perturbation methods.
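As a rough illustration of the feature-level disruption strategy described above, the sketch below optimizes a universal perturbation `delta` to maximize the cosine distance between the clean and perturbed middle-layer encoder features. This is a minimal PyTorch-style sketch under assumed interfaces: the `encoder(..., output_hidden_states=True)` call, the layer index, the L-infinity budget `epsilon`, and the step size are all illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def feature_disruption_loss(encoder, clean_wav, delta, layer_idx=6):
    # Hypothetical interface: the encoder is assumed to return a list of
    # per-layer hidden states when asked for them.
    h_clean = encoder(clean_wav, output_hidden_states=True)[layer_idx].detach()
    h_adv = encoder(clean_wav + delta, output_hidden_states=True)[layer_idx]
    # 1 - cosine similarity: larger values mean more dissimilar features.
    return 1.0 - F.cosine_similarity(
        h_adv.flatten(1), h_clean.flatten(1), dim=1
    ).mean()

def update_delta(encoder, batch, delta, epsilon=8e-3, lr=1e-3):
    # One ascent step on the average dissimilarity over a batch of
    # utterances, followed by projection onto the L-infinity ball.
    delta = delta.detach().requires_grad_(True)
    loss = torch.stack(
        [feature_disruption_loss(encoder, wav, delta) for wav in batch]
    ).mean()
    loss.backward()
    with torch.no_grad():
        delta = (delta + lr * delta.grad.sign()).clamp(-epsilon, epsilon)
    return delta
```

Because this objective depends only on encoder features rather than transcripts, the perturbation update is label-free, which is consistent with the goal of a universal, sample-independent attack.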
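The attention gradient control mechanism can likewise be sketched as a backward hook that down-weights gradients flowing through self-attention, so that perturbation updates are dominated by sample-independent acoustic features rather than utterance-specific semantic context. The module path `model.encoder.layers[i].self_attn` and the scaling factor are assumptions for illustration only.

```python
def attach_attention_gradient_control(model, scale=0.1):
    # Scale down gradients at each self-attention module during
    # backpropagation; 0 < scale < 1 weakens the semantic-context signal.
    handles = []
    for layer in model.encoder.layers:  # assumed module layout
        handle = layer.self_attn.register_full_backward_hook(
            lambda _mod, grad_in, grad_out: tuple(
                g * scale if g is not None else g for g in grad_in
            )
        )
        handles.append(handle)
    return handles  # call .remove() on each handle to restore normal gradients
```

In this sketch the hooks are attached once before the perturbation updates run; gradients through the attention modules are attenuated while the rest of the network contributes at full strength.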