Speech Transmission System Based on Personalized Federated Learning and Semantic Communication
Author: LIU Yuezhao, GUO Haiyan, WANG Tianshun, CHEN Feifei
Affiliation:

School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

CLC Number:

TN929.5

Fund Project:

National Natural Science Foundation of China (No. 62071242).

    Abstract:

    In multi-user speech transmission scenarios, statistical heterogeneity among users’ data degrades transmission performance when all users share a single semantic-communication-based speech transmission model. To address this problem, this paper proposes a novel deep learning-based semantic communication system that uses hypernetwork-based federated learning (DeepSC-FedHN), enabling each user to obtain a personalized model adapted to its own data characteristics without compromising data privacy. Specifically, because different modules of the semantic encoder play different roles in extracting semantic information, the edge server employs a per-user hypernetwork that generates a personalized aggregation weight matrix by dynamically evaluating the importance of each module in the semantic encoder. The generated aggregation weight matrix is then used to update the corresponding model parameters, tailoring the global knowledge to each user’s needs. Meanwhile, since the channel codec and semantic decoder are not involved in extracting the semantic features of each local user’s data, the standard federated averaging (FedAvg) algorithm is used to perform weighted aggregation and updating of the channel codecs and semantic decoders of all users. Experimental results on the TIMIT and Edinburgh DataShare datasets show that the proposed DeepSC-FedHN scheme significantly improves speech transmission performance. Specifically, it outperforms conventional local training, the standard FedAvg approach, the federated proximal (FedProx) method, and the layer-wise personalized federated learning scheme (DeepSC-pFedLA) in terms of perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), and short-time objective intelligibility (STOI), particularly in non-independent and identically distributed (non-IID) data settings. Additionally, the proposed DeepSC-FedHN model generalizes better to unseen speakers’ data and incurs significantly lower computational overhead for model aggregation than DeepSC-pFedLA. We conclude that using a hypernetwork to generate personalized weights is an effective mechanism for tackling data heterogeneity in federated semantic communication systems, yielding better and more adaptable speech transmission performance while preserving user data privacy.
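
The two aggregation paths described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each model part is a flat list of floats, that the personalized weight matrix for one target user has one row per semantic-encoder module and one column per user, and that each row is softmax-normalized. The function names (fedavg, personalized_aggregate) are hypothetical, and fixed scores stand in for the hypernetwork output, which is not modeled here.

```python
import math


def fedavg(params_per_user, num_samples):
    """Standard FedAvg: average flat parameter vectors, weighted by sample count."""
    total = sum(num_samples)
    dim = len(params_per_user[0])
    return [
        sum((n / total) * p[i] for p, n in zip(params_per_user, num_samples))
        for i in range(dim)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def personalized_aggregate(user_modules, scores):
    """Per-module weighted aggregation for ONE target user.

    user_modules[k][m]: flat parameters of encoder module m held by user k.
    scores[m][k]: unnormalized importance of user k for module m.
    ASSUMPTION: in the paper these weights come from a per-user hypernetwork;
    here they are plain numbers supplied by the caller.
    """
    num_users = len(user_modules)
    aggregated = []
    for m, module_scores in enumerate(scores):
        weights = softmax(module_scores)  # one row of the weight matrix
        dim = len(user_modules[0][m])
        aggregated.append([
            sum(weights[k] * user_modules[k][m][i] for k in range(num_users))
            for i in range(dim)
        ])
    return aggregated


# Shared parts (channel codec, semantic decoder): plain FedAvg.
shared = fedavg([[1.0, 2.0], [3.0, 4.0]], num_samples=[1, 3])

# Semantic encoder with two modules and two users: module 0 is averaged
# equally, module 1 is taken almost entirely from user 0.
encoder = personalized_aggregate(
    user_modules=[[[1.0, 1.0], [10.0]], [[3.0, 5.0], [20.0]]],
    scores=[[0.0, 0.0], [50.0, -50.0]],
)
```

In this sketch, equal scores reduce a module to a plain average, while a dominant score lets the target user keep a module close to one contributor's parameters. That is the kind of per-module tailoring the abstract attributes to the generated weight matrix.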

Get Citation

LIU Yuezhao, GUO Haiyan, WANG Tianshun, CHEN Feifei. Speech Transmission System Based on Personalized Federated Learning and Semantic Communication[J]. Journal of Data Acquisition and Processing,2026,(1):117-131.

History
  • Received: September 29, 2024
  • Revised: February 18, 2025
  • Online: March 1, 2026