Speech Transmission System Based on Personalized Federated Learning and Semantic Communication

doi:10.16337/j.1004-9037.2026.01.008


Author: LIU Yuezhao, GUO Haiyan, WANG Tianshun, CHEN Feifei
Affiliation: School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
CLC Number: TN929.5
Fund Project: National Natural Science Foundation of China (No. 62071242).


Abstract:

In multi-user speech transmission scenarios, statistical heterogeneity among users' data degrades transmission performance when all users share a single semantic-communication-based speech transmission model. To address this problem, this paper proposes a novel deep learning-based semantic communication system using federated learning based on hypernetworks (DeepSC-FedHN), which enables each user to obtain a personalized model adapted to its own data characteristics without compromising data privacy. Specifically, because different modules of the semantic encoder play different roles in extracting semantic information, the edge server employs a per-user hypernetwork that generates a personalized aggregation weight matrix by dynamically evaluating the importance of each module in the semantic encoder. The generated aggregation weight matrix is then used to update the corresponding model parameters, tailoring the global knowledge to each user's needs. Meanwhile, since the channel codec and semantic decoder are not involved in extracting the semantic features of each user's local data, the standard federated averaging (FedAvg) algorithm is used to perform weighted aggregation and updates of the channel codecs and semantic decoders of all users. Experimental results on the TIMIT and Edinburgh DataShare datasets show that the proposed DeepSC-FedHN scheme significantly improves speech transmission performance. Specifically, it outperforms conventional local training, the standard FedAvg approach, the federated proximal (FedProx) method, and the layer-wise personalized FL scheme (DeepSC-pFedLA) in terms of perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), and short-time objective intelligibility (STOI), particularly in non-independent and identically distributed (non-IID) data settings. Additionally, the proposed DeepSC-FedHN model generalizes better to unseen speakers' data and incurs significantly lower computational overhead for model aggregation than DeepSC-pFedLA. We conclude that integrating a hypernetwork to generate personalized weights is an effective mechanism for tackling data heterogeneity in federated semantic communication systems, yielding better and more adaptable speech transmission performance while preserving user data privacy.
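
The two aggregation paths described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the hypernetwork architecture or the shape of the weight matrix, so the code below assumes a toy linear hypernetwork that maps a user embedding to one softmax-normalized row of weights per encoder module (one weight per contributing user). All function names, the embedding, and the numeric values are hypothetical. Only the FedAvg step (a sample-count-weighted average) follows the standard definition.

```python
import math


def fedavg(params_per_user, num_samples):
    """Standard FedAvg: average flat parameter vectors, weighted by sample count."""
    total = sum(num_samples)
    dim = len(params_per_user[0])
    return [
        sum(n * p[d] for n, p in zip(num_samples, params_per_user)) / total
        for d in range(dim)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def hypernetwork_weights(embedding, hyper_w):
    """Assumed toy hypernetwork (linear map + softmax), not the paper's design.

    hyper_w[m][j] is a vector dotted with the user embedding to produce the
    logit for (encoder module m, contributing user j). Each returned row sums
    to 1 over users.
    """
    return [
        softmax([sum(w * e for w, e in zip(w_vec, embedding)) for w_vec in rows])
        for rows in hyper_w
    ]


def personalized_encoder(encoder_modules_per_user, weight_matrix):
    """Mix each encoder module across users with that module's weight row.

    encoder_modules_per_user[j][m] is user j's flat parameter vector for module m.
    """
    num_users = len(encoder_modules_per_user)
    mixed = []
    for m, row in enumerate(weight_matrix):
        dim = len(encoder_modules_per_user[0][m])
        mixed.append([
            sum(row[j] * encoder_modules_per_user[j][m][d] for j in range(num_users))
            for d in range(dim)
        ])
    return mixed


# Hypothetical data: two users, two encoder modules, two parameters per module.
encoders = [
    [[1.0, 1.0], [0.0, 0.0]],  # user 0
    [[3.0, 3.0], [4.0, 4.0]],  # user 1
]
embedding = [1.0, 0.0]  # learnable embedding of the target user (assumed)
hyper_w = [
    [[0.0, 0.0], [0.0, 0.0]],  # module 0: equal logits, so equal weights
    [[5.0, 0.0], [0.0, 0.0]],  # module 1: logits favor user 0's parameters
]

alpha = hypernetwork_weights(embedding, hyper_w)   # personalized weight matrix
personal = personalized_encoder(encoders, alpha)   # per-module personalized encoder
# Channel codec / semantic decoder parameters are shared via plain FedAvg.
shared_decoder = fedavg([[1.0, 2.0], [3.0, 6.0]], num_samples=[1, 3])
```

In this sketch the personalization lives entirely in `alpha`: each encoder module gets its own mixing row, while the decoder-side parameters receive the same averaged update for every user.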

Citation:

LIU Yuezhao, GUO Haiyan, WANG Tianshun, CHEN Feifei. Speech Transmission System Based on Personalized Federated Learning and Semantic Communication[J]. Journal of Data Acquisition and Processing, 2026, (1): 117-131.

History

Received: September 29, 2024
Revised: February 18, 2025
Online: March 16, 2026
