Speech Deepfake Attribution: The State of the Art and Prospects
Authors: ZHANG Xiongwei, ZHANG Qiang, SUN Meng, YANG Jibin, LI Yihao, GE Xiaoyi
Affiliations: 1. Army Engineering University of PLA, Nanjing 210007, China; 2. Information Support Force Engineering University, Wuhan 430000, China

CLC Number: TN912

Fund Project: National Natural Science Foundation of China (Nos. 62371469, 62071484).

    Abstract:

    With the rapid evolution of generative artificial intelligence, speech deepfake technologies have achieved unprecedented realism, enabling the synthesis of highly natural, speaker-specific speech from only a few seconds of reference audio. Traditional countermeasures have focused primarily on binary detection, but such approaches are insufficient for forensic investigation, legal accountability, and security governance. In real-world adversarial scenarios, it is not enough to determine whether speech is fake; it is equally critical to identify how it was generated, whose voice characteristics were exploited, and which specific model instance may have been involved. This paradigm shift from “detection” to “attribution” marks a fundamental transformation in speech security research. This paper presents a comprehensive survey of speech deepfake attribution, organizing the field into a hierarchical forensic framework comprising three progressive tasks: forgery method attribution, source speaker tracing, and model inversion. Forgery method attribution identifies the generative architecture or vocoder family responsible for producing the fake speech by exploiting intrinsic “model fingerprints” embedded in the spectral, temporal, and phase domains. Source speaker tracing recovers or verifies the identity of the original speaker whose voice was converted, leveraging residual prosodic, behavioral, and physiological cues that survive imperfect disentanglement in voice conversion systems. Model inversion pursues a deeper forensic objective: inferring specific model parameters or configurations from generated speech, thereby bridging the gap between class-level attribution and instance-level accountability. The core principles that make each subtask feasible are elaborated from the perspectives of both generative model mechanisms and the physical acoustic characteristics of speech signals.
The research status, mainstream methodologies, and technological evolution paths of each subtask are systematically organized along different dimensions, such as architectural frameworks and training strategies. Furthermore, benchmark datasets and evaluation metrics for both closed-set and open-set scenarios are summarized. Finally, the paper discusses emerging challenges, including open-world generalization, robustness under complex channel distortions and neural codecs, adversarial attacks, and ethical constraints related to privacy and legal admissibility. Future directions are outlined toward proactive traceability, model-level reverse engineering, robust feature disentanglement, and the integration of active watermarking with passive forensic techniques. The survey aims to provide a structured roadmap for advancing speech deepfake attribution and fostering a trustworthy digital speech ecosystem.

Highlights:
1. A hierarchical framework for speech deepfake attribution is systematically established, unifying forgery method attribution, source speaker tracing, and model inversion into a progressive forensic paradigm beyond binary real/fake detection.
2. The intrinsic mechanisms of attribution are analyzed from generative model fingerprints and acoustic signal characteristics, revealing how architectural design, training strategies, and inference processes leave distinguishable trace patterns.
3. Open-world robustness, complex channel conditions, and model instance reverse engineering are identified as key challenges, with future directions proposed toward proactive traceability and a comprehensive speech security defense ecosystem.
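To make the "model fingerprint" idea concrete, the sketch below is a minimal toy illustration (this editor's assumption, not a method from the survey): each generator is summarized by its average log-magnitude spectrum, and a query is attributed to the nearest fingerprint by cosine similarity. The two synthetic "vocoder" signals and the name `spectral_fingerprint` are hypothetical stand-ins for the spectral-domain artifacts the abstract describes.

```python
import numpy as np

def spectral_fingerprint(frames: np.ndarray) -> np.ndarray:
    """Average log-magnitude spectrum over frames: a crude 'model fingerprint'."""
    spectra = np.abs(np.fft.rfft(frames, axis=-1))
    return np.log(spectra + 1e-8).mean(axis=0)

def attribute(query_frames: np.ndarray, fingerprints: dict) -> str:
    """Closed-set attribution: pick the reference fingerprint nearest in cosine similarity."""
    q = spectral_fingerprint(query_frames)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(fingerprints, key=lambda name: cos(q, fingerprints[name]))

rng = np.random.default_rng(0)
n_frames, frame_len = 200, 256
t = np.arange(frame_len)

# Two toy "generators" with distinct spectral artifacts (hypothetical stand-ins
# for vocoder families): a low-frequency tone vs. a high-frequency ripple.
def gen_a():
    return np.sin(2 * np.pi * 0.03 * t) + 0.3 * rng.standard_normal(frame_len)

def gen_b():
    return np.sin(2 * np.pi * 0.35 * t) + 0.3 * rng.standard_normal(frame_len)

fingerprints = {
    "vocoder_A": spectral_fingerprint(np.stack([gen_a() for _ in range(n_frames)])),
    "vocoder_B": spectral_fingerprint(np.stack([gen_b() for _ in range(n_frames)])),
}
query = np.stack([gen_a() for _ in range(20)])
print(attribute(query, fingerprints))  # prints "vocoder_A"
```

Real attribution systems replace these averaged spectra with learned embeddings and must additionally handle open-set queries from unseen generators, which this closed-set sketch deliberately ignores.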

Get Citation

ZHANG Xiongwei, ZHANG Qiang, SUN Meng, YANG Jibin, LI Yihao, GE Xiaoyi. Speech Deepfake Attribution: The State of the Art and Prospects[J]. Journal of Data Acquisition and Processing, 2026, (2): 347-370.

History
  • Received: January 10, 2026
  • Revised: February 27, 2026
  • Online: April 15, 2026