Logical Access Attack Audio Detection Based on LSTM-GRU

Affiliation:

1.Video and Audio Material Examination Department, Criminal Investigation Police University of China, Shenyang 110854, China; 2.Criminal Science and Technology Institute of Guangzhou, Guangzhou 510030, China

Fund Project:

National Key Research and Development Program of China (2017YFC0821000); Guangzhou Science and Technology Program (2019030004); Liaoning Collaborative Innovation Center for Network Security Law Enforcement (WXZX-201807003); Open Fund of the Key Laboratory of Forensic Science, Ministry of Justice (Academy of Forensic Science); Graduate Innovation Ability Improvement Project of Criminal Investigation Police University of China.

    Abstract:

    To further improve the accuracy of speech spoofing detection, this paper proposes a method for detecting logical access attacks on speech (voice conversion and speech synthesis) based on a fused LSTM-GRU network. The LSTM-GRU network is a hybrid network that connects a long short-term memory (LSTM) layer, a gated recurrent unit (GRU) layer, a dropout layer, a batch normalization layer and a fully connected (dense) layer in series. The LSTM layer addresses long-term dependencies in the speech sequence, while the GRU layer reduces the number of model parameters. Experiments are conducted on the ASVspoof2019 LA dataset: 20-dimensional Mel-frequency cepstral coefficient (MFCC) features are extracted for model training, and in the test stage the trained LSTM-GRU model is used to detect spoofed speech in the test set. Comparison with standalone GRU and LSTM networks shows that the LSTM-GRU network achieves the highest correct recognition rate among the three models; its equal error rate (EER) is 27.07% lower than that of the baseline system provided by the ASVspoof2019 challenge; and its average accuracy in detecting logical access attack speech reaches 98.04%. The LSTM-GRU network also offers short training time, resistance to over-fitting and high stability. The results indicate that the proposed method can be effectively applied to speech logical access attack detection.
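The abstract reports the equal error rate (EER) as its headline metric. As a reminder of what that number means, the following is a minimal sketch of an EER computation, not the authors' evaluation code. It assumes that each utterance receives a score where higher means "more likely bona fide", and that a trial is accepted when its score is at or above the threshold. The function name `equal_error_rate` and the toy scores are hypothetical. Official ASVspoof tooling may interpolate between thresholds, so values can differ slightly from this discrete version.

```python
from typing import Sequence, Tuple


def equal_error_rate(bonafide_scores: Sequence[float],
                     spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold), assuming higher score = more likely bona fide.

    The EER is the operating point where the false acceptance rate (spoofed
    speech accepted) equals the false rejection rate (bona fide speech
    rejected). With finite data the two rates rarely match exactly, so this
    sketch picks the threshold with the smallest gap and averages the rates.
    """
    if len(bonafide_scores) == 0 or len(spoof_scores) == 0:
        raise ValueError("both score lists must be non-empty")

    best_gap = float("inf")
    eer = 1.0
    best_threshold = float("nan")
    # Error rates only change at observed scores, so those are the only
    # thresholds that need to be tried.
    for threshold in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide speech scored below the threshold.
        frr = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed speech scored at or above the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
            best_threshold = threshold
    return eer, best_threshold


# Toy scores (made up for illustration, not taken from the paper).
toy_bonafide = [0.9, 0.8, 0.7, 0.4]
toy_spoof = [0.6, 0.3, 0.2, 0.1]
toy_eer, toy_threshold = equal_error_rate(toy_bonafide, toy_spoof)
print(f"EER = {toy_eer:.2%} at threshold {toy_threshold}")
# prints "EER = 25.00% at threshold 0.6"
```

In the toy data, one of four bona fide utterances falls below 0.6 and one of four spoofed utterances reaches it, so both error rates are 25%. A lower EER means the detector separates the two classes better at its balanced operating point.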

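Cite this article

Yang Haitao, Wang Huapeng, Niu Jinlin, Chu Xianteng, Lin Nuanhui. Logical Access Attack Audio Detection Based on LSTM-GRU[J]. Journal of Data Acquisition and Processing, 2022, 37(2): 396-404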

History
  • Received: 2021-06-22
  • Last revised: 2021-11-05
  • Published online: 2022-04-11