Sound Source Localization and Tracking Based on Deep Learning: A Survey
Author:

CHEN Zhe, SONG Dengao, WANG Yiyu, YIN Fuliang

Affiliation:

School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, China

CLC Number:

TN912.3

Fund Project:

National Natural Science Foundation of China (Nos. 62271103, 61871066).

Abstract:

Sound source localization and tracking are an important means by which machine hearing acquires spatial information. With the growing adoption of multi-microphone devices in applications such as speech interaction, conferencing systems, and acoustic monitoring, there is increasing demand for stable estimation of sound source direction and position in complex acoustic environments. Existing reviews have focused mainly on sound source localization, and deep-learning-based sound source tracking has not yet been systematically reviewed. To fill this gap, this paper presents a systematic, unified review of deep-learning-based sound source localization and tracking. First, the fundamental problem formulation and the framework of traditional approaches are outlined. Then, the main lines of deep learning methods are introduced from three perspectives: input representation (feature design), model architecture (network modeling), and learning objectives (training strategies). Next, commonly used datasets, experimental settings, and evaluation metrics are summarized, and key considerations for comparing results obtained under different conditions are discussed. Finally, the reviewed techniques are summarized and potential directions for future research are outlined.

Highlights:

1. This paper systematically reviews research on deep-learning-based sound source localization and tracking, with particular emphasis on the technological evolution from instantaneous spatial localization to continuous trajectory estimation.
2. The development of mainstream methods is summarized from the perspectives of input representation, network architecture, and temporal modeling, covering typical deep learning models such as CNN, RNN/LSTM, CRNN, and Transformer.
3. The performance advantages of deep-learning-based methods in noisy, reverberant, overlapping multi-source, and dynamic scenarios are summarized, and future directions are identified, including robustness in real-world environments, generalization ability, and lightweight deployment.

Citation:

CHEN Zhe, SONG Dengao, WANG Yiyu, YIN Fuliang. Sound Source Localization and Tracking Based on Deep Learning: A Survey[J]. Journal of Data Acquisition and Processing, 2026, (2): 371-396.

History
  • Received: January 9, 2026
  • Revised: February 26, 2026
  • Online: April 15, 2026