Abstract: This paper presents a study of posteriorgram feature optimization based on the nonnegative matrix factorization (NMF) algorithm, together with a modified segmental dynamic time warping (SDTW) detection method, for unsupervised query-by-example spoken term detection. First, a Gaussian mixture model (GMM) is trained on frequency domain linear prediction (FDLP) acoustic features instead of Mel-frequency cepstral coefficients (MFCCs). The NMF algorithm is then applied to the resulting Gaussian posteriorgram matrix, and the derived base matrix serves as a subspace transform matrix for projecting the raw features. This projection emphasizes the dominant components of the features and smooths the distance matrix. In the detection phase, the best matching score is computed from multiple adjacent output scores rather than from the single 1-best output score used in standard SDTW. Experimental results show that, without affecting detection time, the proposed method consistently outperforms the baseline systems using MFCC and FDLP features, improving detection precision by 18.6% and 18.1%, respectively.
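The following is a minimal NumPy sketch of the feature-side pipeline and the modified scoring idea described above. It is not the authors' implementation. Several details are not given in the abstract and are filled in here as assumptions. NMF uses the Lee and Seung multiplicative updates for the Euclidean cost. The projection is taken as `W.T @ P`, with each frame renormalized afterward. Frame distance is the common `-log` inner-product measure for posteriorgrams. The "multiple adjacent scores" rule is modeled as averaging the best segment score with its immediate neighbours. The function names `nmf`, `project`, `distance_matrix` and `adjacent_score` are hypothetical, and the full SDTW band search is omitted.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize nonnegative V (K Gaussians x T frames) as W @ H using
    Lee-Seung multiplicative updates for the Euclidean cost (assumed)."""
    rng = np.random.default_rng(seed)
    K, T = V.shape
    W = rng.random((K, rank)) + eps  # base matrix (K x r)
    H = rng.random((rank, T)) + eps  # activations (r x T)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def project(P, W, eps=1e-9):
    """Project posteriorgram frames P (K x T) onto the NMF base W (K x r),
    then renormalize each frame to sum to 1 (assumed normalization)."""
    Y = W.T @ P
    return Y / (Y.sum(axis=0, keepdims=True) + eps)


def distance_matrix(Q, X, eps=1e-9):
    """-log inner-product distance between query frames Q (r x M) and
    test frames X (r x N); returns an M x N matrix for SDTW."""
    return -np.log(Q.T @ X + eps)


def adjacent_score(scores, width=1):
    """Hypothetical modified score: average the best (lowest) SDTW
    segment score with up to `width` neighbours on each side."""
    scores = np.asarray(scores, dtype=float)
    best = int(np.argmin(scores))
    lo, hi = max(0, best - width), min(len(scores), best + width + 1)
    return scores[lo:hi].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for GMM posteriorgrams (each column sums to 1).
    P_train = rng.dirichlet(np.ones(16), size=200).T
    W, _ = nmf(P_train, rank=4)
    Q = project(rng.dirichlet(np.ones(16), size=20).T, W)
    X = project(rng.dirichlet(np.ones(16), size=100).T, W)
    D = distance_matrix(Q, X)
    print(D.shape)  # (20, 100)
    print(adjacent_score([5.0, 2.0, 3.0, 6.0]))  # mean of 5, 2, 3
```

Averaging around the best-scoring segment, rather than keeping only the 1-best, is one simple way to make the decision less sensitive to a single spuriously low path. The paper's exact combination rule may differ.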