Abstract: A noise-robust audio feature based on fingerprint factors and a semi-supervised audio dictionary training algorithm are proposed to mitigate the performance loss that noise causes in content-based audio retrieval. The proposed method extracts an audio fingerprint from Mel spectra and uses non-negative matrix factorization to decompose the fingerprint into noise-robust spectral and temporal factors that serve as features. The semi-supervised dictionary training algorithm uses a set of audio effects to estimate the distribution of basic sound effects, which serves as the initial dictionary. Quantization is then performed while the dictionary is dynamically updated, so that the dictionary better characterizes the data. Experimental results show that at low signal-to-noise ratio (SNR), the proposed method significantly improves average precision compared with other algorithms.
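The factorization step can be illustrated with a minimal sketch. It assumes the fingerprint is a non-negative matrix with frequency bands as rows and time frames as columns, and it uses the standard Lee-Seung multiplicative updates for the Frobenius objective; the abstract does not state which update rule, rank, or matrix layout the authors use. The function names (`nmf`, `matmul`, `frobenius_error`) and the toy matrix are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H, used here only to monitor the fit."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (bands x frames) into W (bands x rank)
    and H (rank x frames) with multiplicative updates (assumed rule)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


# Toy stand-in for a fingerprint: 4 bands x 6 frames, built from two patterns.
true_spectral = [[1, 0], [1, 0], [0, 1], [0, 1]]
true_temporal = [[1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 0, 1]]
fingerprint = matmul(true_spectral, true_temporal)

# W plays the role of the spectral factor, H the temporal factor (assumed naming).
spectral, temporal = nmf(fingerprint, rank=2)
print("reconstruction error:", frobenius_error(fingerprint, spectral, temporal))
```

In this layout each column of `spectral` describes how strongly a component is present in each band, and each row of `temporal` describes when that component is active. A real system would replace the toy matrix with a fingerprint computed from Mel spectra and would likely use an optimized numerical library.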