Improved Few-Shot Sound Event Detection Algorithm Based on MAML
Authors:

Chen Haojie, Yang Rui, Pan Shanliang

Affiliation:

College of Information Science and Engineering, Ningbo University, Ningbo 315211, China

Fund Project:

Ningbo Key Research and Development Program (2022Z189).

Abstract:

Sound event detection models based on deep learning typically require a substantial amount of labeled data to train from scratch. However, acquiring task-specific data is costly because of restrictions on data access, usage licenses, and the scarcity of rare samples. To address the few-shot challenge in sound event detection, this paper proposes a model-agnostic, gradient-balanced meta-learning algorithm built on model-agnostic meta-learning (MAML). The algorithm trains the model on a large number of N-way K-shot tasks so that it learns to adapt quickly, recognizing unseen sound events in an N-way K-shot target task after only a few gradient updates. In the outer loop, the multi-gradient descent algorithm is used to estimate dynamic loss balance factors, which encourages the model to focus on the few-shot tasks that are harder to train and thereby strengthens its shared representation. Data augmentation and label smoothing are also incorporated to further reduce the overfitting caused by the scarcity of training samples. Experimental results show that the algorithm achieves accuracies of 73.56%, 82.86%, and 57.48% in the 5-way 1-shot setting on the ESC50, NSynth, and DCASE2020 datasets, respectively, a relative accuracy improvement of about 10% over the original MAML algorithm.
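The abstract names three building blocks without giving their formulas: N-way K-shot task sampling, label smoothing, and loss balance factors estimated by multi-gradient descent. The sketch below illustrates each one in NumPy under stated assumptions. It is not the authors' implementation: the function names are hypothetical, the smoothing factor of 0.1 is an arbitrary choice, and the balance factor shown is only the well-known closed form for two task gradients, which may differ from the formulation used in the paper.

```python
import numpy as np


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode as index arrays (hypothetical helper)."""
    # Pick N distinct classes, then K support and n_query query items per class.
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:k_shot + n_query])
    return np.array(support), np.array(query), classes


def smooth_labels(class_ids, n_classes, eps=0.1):
    """Label smoothing: (1 - eps) * one_hot + eps / n_classes."""
    one_hot = np.eye(n_classes)[class_ids]
    return one_hot * (1.0 - eps) + eps / n_classes


def min_norm_weight(g1, g2):
    """Weight a in [0, 1] that minimizes ||a * g1 + (1 - a) * g2||^2.

    Two-gradient special case only; an assumption made for illustration.
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        return 0.5  # identical gradients: any weight gives the same result
    return float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(10), 6)  # toy data: 10 classes, 6 clips each
    support, query, classes = sample_episode(labels, 5, 1, 3, rng)
    print(len(support), len(query))  # 5-way 1-shot with 3 queries per class
    print(smooth_labels(np.array([0]), 3))
    print(min_norm_weight(np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```

In a 5-way 1-shot episode the support set holds a single labeled clip per class, which is why regularizers such as label smoothing and data augmentation matter in this setting.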

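Cite this article

Chen Haojie, Yang Rui, Pan Shanliang. Improved few-shot sound event detection algorithm based on MAML[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2025, 40(3): 741-753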

History
  • Received: 2024-04-28
  • Last revised: 2024-07-17
  • Published online: 2025-06-13