Sound Source Localization and Tracking Based on Deep Learning: A Survey

Affiliation:

School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, China

Fund Project:

National Natural Science Foundation of China (Nos. 62271103, 61871066).

    Abstract:

    Sound source localization and tracking are among the key means by which machine hearing acquires spatial information. As multi-microphone devices spread through applications such as speech interaction, conference systems, and acoustic monitoring, demand keeps growing for stable estimation of a sound source’s direction and position in complex acoustic environments. This paper therefore presents a systematic review of deep-learning-based techniques for sound source localization and tracking. Existing surveys focus mainly on sound source localization, and deep-learning-based sound source tracking has not yet been reviewed systematically. To fill this gap, the paper analyzes localization and tracking within a unified framework. First, the basic problem formulation and the framework of traditional approaches are outlined. Then, from the perspectives of input representation, model architecture, and learning objectives, the main lines of deep learning methods are introduced with respect to feature design, network modeling, and training strategies. Next, commonly used datasets, experimental settings, and evaluation metrics are summarized, and key considerations for comparing results obtained under different conditions are discussed. Finally, the reviewed techniques are summarized and potential future research directions are outlined.

    Highlights:

    1. This paper systematically reviews research on deep-learning-based sound source localization and tracking, with particular emphasis on the technological evolution from instantaneous spatial localization to continuous trajectory estimation.
    2. The development of mainstream methods is summarized from the perspectives of input representation, network architecture, and temporal modeling, covering typical deep learning models such as CNN, RNN/LSTM, CRNN, and Transformer.
    3. The performance advantages of deep-learning-based methods in noisy, reverberant, multi-source-overlapping, and dynamic scenarios are summarized, and future directions are identified, including robustness in real-world environments, generalization ability, and lightweight deployment.

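Cite this article

Chen Zhe, Song Deng'ao, Wang Yiyu, Yin Fuliang. Sound source localization and tracking based on deep learning: A survey[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2026, (2): 371-396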

History
  • Received: 2026-01-09
  • Last revised: 2026-02-26
  • Published online: 2026-04-15