Fine-Tuning Method for Pre-trained Model RoBERTa Based on Federated Split Learning and Low-Rank Adaptation

Authors: 谢思静, 文鼎柱

Affiliation: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China

Abstract:

Fine-tuned large language models (LLMs) perform exceptionally well across a variety of tasks, but centralized training carries the risk of leaking user privacy. Federated learning (FL) avoids data sharing through local training, yet the enormous parameter count of LLMs strains resource-constrained devices and communication bandwidth, making deployment in edge networks difficult. Combined with split learning (SL), federated split learning can effectively address this problem. Motivated by the observations that the weights of the deeper layers have a more pronounced influence on the model and that training only part of the layers yields accuracy only slightly lower than training the entire model, we split the model along its Transformer layers and introduce low-rank adaptation (LoRA) to further reduce resource overhead and enhance security. On each device, only the last few layers are adapted with LoRA and trained, and the adapted parameters are then uploaded to the server for aggregation. To reduce cost while preserving model performance, we propose a fine-tuning method for the pre-trained model RoBERTa based on federated split learning and LoRA. By jointly optimizing the computing frequency of the edge devices and the rank used for fine-tuning, the rank is maximized under the resource constraints, which improves model accuracy. Simulation results show that, when only the last three layers of the LLM are trained, increasing the rank within a certain range (1 to 32) improves model accuracy; increasing the per-round delay tolerance and the device energy threshold improves it further.
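To make the scheme concrete, the sketch below illustrates the device-side setup the abstract describes: the pre-trained RoBERTa backbone is frozen, rank-r LoRA adapters are attached only to the last few Transformer layers, and the server aggregates nothing but the clients' LoRA parameters. This is a minimal illustration, not the authors' implementation; the Hugging Face `transformers` package, the `roberta-base` checkpoint, the choice of query/value projections as LoRA targets, rank 8, the last three layers, and uniform FedAvg weighting are all assumptions made for the example.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel  # assumes the Hugging Face `transformers` package


class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""

    def __init__(self, base: nn.Linear, r: int, alpha: float = 16.0):
        super().__init__()
        self.base = base                                   # frozen pre-trained projection
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


def build_client_model(rank: int = 8, num_adapted_layers: int = 3):
    """Freeze RoBERTa and attach LoRA only to the last `num_adapted_layers` Transformer layers."""
    model = RobertaModel.from_pretrained("roberta-base")
    for p in model.parameters():                           # freeze the whole backbone
        p.requires_grad = False
    for layer in model.encoder.layer[-num_adapted_layers:]:
        attn = layer.attention.self
        attn.query = LoRALinear(attn.query, rank)          # illustrative target modules
        attn.value = LoRALinear(attn.value, rank)
    return model


def lora_state(model: nn.Module):
    """Only the LoRA matrices are uploaded; the frozen backbone never leaves the device."""
    return {k: v.detach().clone() for k, v in model.state_dict().items() if "lora_" in k}


def fedavg_lora(client_states):
    """Server side: uniform averaging of the clients' LoRA parameters."""
    return {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
            for k in client_states[0]}
```

Because only the low-rank matrices leave the device, the per-round uplink payload grows with the rank r rather than with the size of the frozen backbone, which is where the communication and memory savings referred to in the abstract come from.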

Cite this article:

谢思静, 文鼎柱. Fine-Tuning Method for Pre-trained Model RoBERTa Based on Federated Split Learning and Low-Rank Adaptation[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2024(3): 577-587.

History
  • Received: 2024-03-30
  • Revised: 2024-04-27
  • Published online: 2024-06-14