2016, 31(2):231-241.
Abstract:Acoustic event detection is the task of detecting each semantic segment in an audio stream and associating it with a classification label. It is a fundamental technique for sound scene recognition and semantic understanding, and it is promising in many application fields, such as semantic understanding of environmental sounds for human-like robots and sound context awareness in the travelling environment for unmanned vehicles. In this paper, the history of acoustic event detection is reviewed from the perspectives of related fields and application requirements, typical works are introduced, and future research is analyzed. For related fields, we focus on research in speech recognition, computational music processing, and auditory-based sound processing. For application requirements, we introduce work on sound context awareness and multimedia information retrieval. Finally, the state of the art in acoustic event detection is analyzed, and future research directions are predicted.
Zou Cairong , Liang Ruiyu , Xie Yue
2016, 31(2):242-251.
Abstract:As the world population ages, hearing impairment has become a high-incidence chronic disease. Hearing aids are one of the most effective means of hearing intervention and rehabilitation for patients with presbycusis. Hearing aid techniques have advanced significantly over the past decades, primarily thanks to the maturing of signal processing and electronic technology. Among these techniques, sound classification, filter decomposition, noise suppression and echo cancellation are four basic algorithms for hearing aids. We elaborate on these algorithms in terms of their basic principles, current research status, features and problems. In addition, by analyzing the current problems of hearing aids, three new research directions, namely auditory bionics, auditory cognition and self-fitting hearing aids, are briefly introduced.
Bao Yongqiang, Liang Ruiyu, Cong Yun, Gao Chonghong, Wang Qinyun
2016, 31(2):252-259.
Abstract:The latest research progress in audio forensics is introduced, with emphasis on audio authenticity. First, the history of audio forensics research is reviewed and the classification of audio forensics is discussed. Then, a framework for audio forensics is designed. Several key technologies of audio forensics are summarized, including active audio forensics technology, audio tamper technology based on electrical network frequency (ENF), audio tamper detection with different sampling rates, audio tamper detection with the same sampling rate under passive power grid frequency components, characteristic parameters of recording equipment, pattern recognition, the state of database construction, and recording environment identification. Finally, the prospects of audio forensics technology are presented.
Tao Zhi , Zeng Xiaoliang , Gu Lingling , Zhang Xiaojun , Wu Di , Xue Longji
2016, 31(2):260-267.
Abstract:To provide a basis for parameter selection in pathological voice recognition, an asymmetric modeling method is proposed to simulate diseased vocal folds. According to the layered structure and tissue properties of the vocal fold, a mechanical model is set up to produce the voice source from the straight airflow expelled by the lungs. An inversion procedure adopting genetic particle swarm optimization based on the quasi-Newton method (GPSO-QN) is developed to adjust the parameters of the vocal fold model and to reproduce the targeted voice source. Experimental results show that the vocal fold mechanical model can produce a voice source that is consistent with the target. In addition, the optimized parameter sets show that asymmetries between the two opposing vocal folds result in the pathological voice.
Zhang Xiaofei , Li Shu , Zheng Wang
2016, 31(2):268-275.
Abstract:We combine the parallel factor framework with compressed sensing theory to solve the problem of direction of arrival estimation for the electromagnetic vector sensor array. We first rearrange the received data matrix as a parallel factor model and compress it to a smaller one based on compressed sensing theory. Then the trilinear alternating least squares algorithm is used to decompose the compressed parallel factor model. Finally, the angle estimates are obtained by exploiting sparsity. Owing to compression, the computational complexity of the algorithm is lower than that of the conventional parallel factor model-based algorithm, and less storage is required. The algorithm needs no peak searching and is applicable to both uniform and non-uniform linear arrays. Moreover, simulations verify that the angle estimation performance of the proposed algorithm is better than that of the ESPRIT algorithm and close to that of the conventional parallel factor model-based algorithm.
Shu Feng , Cui Yudi , Qian Zhenyu , Lu Zaoyu , Zhou Ye , Hu Jinsong , Liu Miao
2016, 31(2):276-281.
Abstract:Compared to half-duplex relay systems, full-duplex relay systems can greatly improve spectral efficiency. However, information leakage between the transmitter and receiver of the relay degrades the performance of the full-duplex system. To deal with the self-interference and enhance the achievable rate in a full-duplex MIMO relay system with the decode-and-forward strategy, an iterative beamforming structure at the relay is proposed. In this structure, the receive and transmit beamforming at the relay are optimized with the minimum mean square error (MMSE) criterion over both uplink and downlink (called MMSE plus MMSE). The two beamforming matrices are then combined for optimal solutions. Simulation results show that the proposed MMSE plus MMSE performs better than existing schemes such as null space projection (NSP) and maximum signal-to-interference ratio (Max-SIR). For example, the proposed algorithm achieves a gain of about 0.8 bps/Hz over Max-SIR when the SNR is high. At BER = 10^-3, the proposed scheme achieves an SNR gain of about 1.5 dB over Max-SIR.
Sun Tongjing, He Jinpeng, Gu Yu
2016, 31(2):282-288.
Abstract:To address the problem of processing weak underwater signals at ultra-low SNR, an underwater echo signal processing method is presented based on sparse decomposition theory and the matching pursuit method. The focus is on how to integrate prior information, such as the incident signal and the echo model, into the sparse dictionary (atoms). First, the highlight model of the underwater echo signal is established, the relation between the echo model and the incident signal is obtained, and an overcomplete dictionary fitted to the echo signal characteristics is constructed by discretizing, energy-normalizing and shifting the known transmitted signal. Then, the sparse decomposition of the underwater echo signal is conducted with the matching pursuit method, and the processing results are compared with those of the commonly used matched filter methods. The simulation results show that the proposed method can accurately reconstruct the original echo signal and has an obvious advantage in processing underwater echo signals at ultra-low SNR.
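The dictionary construction and greedy decomposition described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pulse shape, the signal length, the iteration count and the function names `build_shift_dictionary` and `matching_pursuit` are all assumptions made for illustration, and the echo here is a noise-free toy with two well-separated highlights.

```python
import numpy as np

def build_shift_dictionary(pulse, signal_len):
    """Columns are unit-energy, time-shifted copies of the known transmitted pulse."""
    pulse = pulse / np.linalg.norm(pulse)           # energy normalization
    n_shifts = signal_len - len(pulse) + 1
    atoms = np.zeros((signal_len, n_shifts))
    for k in range(n_shifts):
        atoms[k:k + len(pulse), k] = pulse          # shift by k samples
    return atoms

def matching_pursuit(echo, atoms, n_iter):
    """Greedy matching pursuit: pick the best-correlated atom, subtract it, repeat."""
    residual = np.asarray(echo, dtype=float).copy()
    coeffs = np.zeros(atoms.shape[1])
    for _ in range(n_iter):
        corr = atoms.T @ residual                   # correlation with every atom
        best = int(np.argmax(np.abs(corr)))
        coeffs[best] += corr[best]
        residual -= corr[best] * atoms[:, best]
    return coeffs, residual

# Toy usage (hypothetical values): two highlights at delays 10 and 100 samples.
n = np.arange(32)
pulse = np.hanning(32) * np.cos(2 * np.pi * 0.2 * n)
atoms = build_shift_dictionary(pulse, 200)
echo = 1.0 * atoms[:, 10] + 0.5 * atoms[:, 100]
coeffs, residual = matching_pursuit(echo, atoms, n_iter=2)
```

Because every atom is a shifted copy of the transmitted signal, the indices of the nonzero coefficients can be read directly as echo delays.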
Guo Dongliang , Huang Chao , Li Zhonghua , Zhang Tiejun
2016, 31(2):289-295.
Abstract:To address the low accuracy of interferometer direction finding (DF) at low signal-to-noise ratio (SNR), a new adaptive direction finding method is proposed based on SNR estimation and phase-difference vector averaging. The method enhances the accuracy and stability of the phase-difference measurement by taking multiple measurements and averaging the phase-difference complex vectors, which improves the direction finding performance. The proposed adaptive criterion estimates the SNR of the arriving signal and quickly determines the required sample size, so that the sample size is adjusted adaptively at different SNRs and a stable DF accuracy is obtained. The effect of the SNR threshold on the performance of the proposed method is analyzed. The method has low computational complexity and little influence on the real-time performance of DF. Theoretical analysis and simulation results show that the method achieves high accuracy even under very low SNR conditions and clearly improves DF performance.
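The averaging step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a single-baseline two-channel interferometer with the standard relation phase = 2*pi*d*sin(theta)/lambda, a baseline short enough to avoid phase ambiguity, and a hypothetical function name `estimate_doa`. The adaptive, SNR-driven choice of sample size from the abstract is not reproduced here.

```python
import numpy as np

def estimate_doa(x1, x2, spacing, wavelength):
    """Estimate the arrival angle (radians) from two-channel complex snapshots."""
    # Average the phase-difference complex vectors rather than the wrapped
    # phase angles, so noisy snapshots near +/-pi do not bias the mean.
    mean_vector = np.mean(x2 * np.conj(x1))
    phase_diff = np.angle(mean_vector)
    # Single-baseline interferometer: phase_diff = 2*pi*d*sin(theta)/lambda.
    return np.arcsin(phase_diff * wavelength / (2 * np.pi * spacing))

# Noise-free toy usage (hypothetical values): half-wavelength baseline.
rng = np.random.default_rng(0)
true_theta = 0.3
x1 = np.exp(1j * rng.uniform(0, 2 * np.pi, 64))
x2 = x1 * np.exp(1j * 2 * np.pi * 0.5 * np.sin(true_theta) / 1.0)
theta_hat = estimate_doa(x1, x2, spacing=0.5, wavelength=1.0)
```

Averaging more snapshots reduces the variance of `mean_vector`, which is why the required sample size can be tied to the estimated SNR.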
Liu Quanchao , Huang Heyan , Wang Yashen , Feng Chong
2016, 31(2):296-306.
Abstract:An algorithm based on statistics and rules is proposed to automatically identify maximal-length evaluation phrases. The identification of evaluation phrases is treated as a sequence tagging problem. A conditional random field model is used to recognize evaluation phrases with simple structure. A rule database is then established, and maximal-length evaluation phrases with complex structure are identified automatically. The F-measure reaches 72.38%. Based on this work, a rule base is constructed for extracting opinion targets and appraisal expressions. A rule-based appraisal expression extraction method is proposed to automatically extract opinion targets and maximal-length evaluation phrases. Experiments conducted on the NetEase car portal achieved higher precision.
Yuan Fei , Chen Weiling , Li Ye , Cheng En
2016, 31(2):307-314.
Abstract:Real-time measurement of the quality of underwater acoustic voice communication is crucial to communication quality. Real-time measurements help to adjust voice modulation parameters in time and improve the adaptive ability of the link. An objective assessment model for voice quality is proposed based on parameter extraction and the characteristics of the underwater acoustic channel. Three feature parameters of voice are extracted: Mel-frequency cepstrum coefficient (MFCC), linear predictive cepstrum coefficient (LPCC) and log spectral deviation (LSD). The three parameters form the weighted spectral distortion evaluation. The mapping between the distortion evaluation and the perceptual evaluation of speech quality mean opinion score (PESQ-MOS) of the received voice is used for quantization. A dynamic Mel-frequency cepstrum coefficient (DMFCC) spectral distortion evaluation is also introduced as a regulatory factor, which improves adaptability. The results of simulations and sea tests show that the MOS measured via the assessment model is close to PESQ-MOS, which indicates that the model has practical value.
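Of the three features named in this abstract, the log spectral measure is the simplest to illustrate. The sketch below uses one common textbook definition (root-mean-square difference of log power spectra in dB); whether the paper uses exactly this form, and how it weights LSD against MFCC and LPCC, is not stated in the abstract, so the function name `log_spectral_distance` and the `eps` guard are assumptions.

```python
import numpy as np

def log_spectral_distance(ref_power, test_power, eps=1e-12):
    """RMS difference, in dB, between two power spectra of equal length."""
    ref_power = np.asarray(ref_power, dtype=float)
    test_power = np.asarray(test_power, dtype=float)
    # eps avoids log(0) for empty frequency bins.
    diff_db = 10.0 * np.log10((ref_power + eps) / (test_power + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Toy usage: a received spectrum that is uniformly 10 dB stronger than the reference.
lsd = log_spectral_distance(np.ones(8), 10.0 * np.ones(8))
```

A value of zero means the two spectra are identical; larger values indicate stronger spectral distortion on the received side.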
Feng Chao, Wen Yimin, Tang Lingbing
2016, 31(2):315-324.
Abstract:Recurring concept drift is one of the sub-types of concept drift. In recurring concept drift detection, it is very important to represent concepts and to select the most appropriate classifier. We propose an algorithm, conceptual clustering and prediction through main feature extraction (MFCCP), for classifying data streams with recurring concept drifts. MFCCP recognizes recurring concepts by computing the differences between the main features and impact factors of different batches of samples. It maintains a classifier for each concept and monitors the classification accuracy to select a classifier according to the Hoeffding inequality, in order to enhance the ability to adapt to concept drift. The experimental results on three datasets show that MFCCP achieves better classification accuracy, adapts faster to concept drift, and detects concept drift more accurately than four other algorithms on data streams with recurring concept drifts. Therefore, MFCCP is apt to classify data streams without recurring concept drift.
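The accuracy monitoring mentioned in this abstract relies on the Hoeffding inequality. A minimal sketch of how such a check might look is given below; the decision rule, the confidence level `delta` and the function names are assumptions for illustration and are not taken from the paper.

```python
import math

def hoeffding_bound(n_samples, delta, value_range=1.0):
    """Deviation epsilon such that a mean of n bounded samples is within
    epsilon of its expectation with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n_samples))

def accuracy_dropped(reference_acc, recent_acc, n_samples, delta=0.05):
    """Flag a drop when recent accuracy falls below the reference by more than the bound."""
    return (reference_acc - recent_acc) > hoeffding_bound(n_samples, delta)

# Toy usage (hypothetical numbers): batches of 200 samples, accuracies in [0, 1].
eps = hoeffding_bound(200, 0.05)
big_drop = accuracy_dropped(0.90, 0.70, 200)
small_drop = accuracy_dropped(0.90, 0.85, 200)
```

When a drop is flagged, a stream classifier of this kind would typically look for a stored classifier whose concept matches the new batch, or train a new one.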
Song Peng, Jin Yun, Zha Cheng, Zhao Li
2016, 31(2):325-330.
Abstract:In speech emotion recognition systems, recognition rates drop drastically when the training and testing utterances come from different corpora. To solve this problem, a novel sparse feature transfer approach is proposed. By employing a sparse coding algorithm, a common sparse feature representation of emotion features from different corpora is obtained. Meanwhile, the maximum mean discrepancy (MMD) is introduced to measure the distance between different distributions and is used as the regularization term in the objective function of sparse coding. Finally, robust sparse features are obtained for recognition. Experimental results show that, compared to traditional methods, the proposed approach significantly improves cross-corpus recognition rates.
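The MMD term used as a regularizer in this abstract measures how far apart two feature distributions are. The sketch below shows its simplest form, with a linear kernel, where the squared MMD reduces to the squared distance between the two sample means. The kernel actually used in the paper is not given in the abstract, so this choice and the name `linear_mmd2` are assumptions.

```python
import numpy as np

def linear_mmd2(source, target):
    """Squared MMD with a linear kernel: squared distance between feature means.

    source, target: arrays of shape (n_samples, n_features).
    """
    gap = np.mean(source, axis=0) - np.mean(target, axis=0)
    return float(gap @ gap)

# Toy usage: two small "corpora" whose feature means differ by (1, 1).
source = np.array([[0.0, 0.0], [2.0, 2.0]])
target = np.array([[1.0, 1.0], [3.0, 3.0]])
mmd2 = linear_mmd2(source, target)
```

Adding a term like this to a sparse coding objective penalizes codes whose statistics differ between the training and testing corpora.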
2016, 31(2):331-337.
Abstract:Bottleneck (BN) features based on the middle layer of a deep neural network have been widely applied to large vocabulary continuous speech recognition (LVCSR), because they can use the traditional Gaussian mixture model hidden Markov model (GMM-HMM) for acoustic modeling. In order to extract discriminative bottleneck features, the parameters of the BN feature extractor and the GMM-HMM are optimized jointly using the minimum phone error (MPE) criterion after training the GMM-HMM on the conventional BN features. Unlike other discriminative training methods, large batches are used to obtain the statistics instead of the mini-batches of conventional neural network optimization, which accelerates training. Experiments demonstrate that the proposed bottleneck feature extractor outperforms the traditional methods with a 9% relative word error reduction.
Zhang Xinran, Dai Yuehua, Zhang Mengbo, Yang Xiaojing
2016, 31(2):338-346.
Abstract:Since SNR plays an important role in signal processing for OFDM, an SNR estimation method based on the guard interval is proposed. The channel model and the guard interval are analyzed. First, two OFDM signals with different guard intervals are discriminated using the correlation function. Then, according to the structural characteristics of the two guard intervals, two SNR estimators are put forward. Finally, the SNR estimate is obtained by combining the discrimination result with the corresponding estimation algorithm. The simulation results show that the algorithm needs no auxiliary data and performs well under different SNR conditions.
Zhang Jian , Qu Dan , Li Zhen
2016, 31(2):347-354.
Abstract:The recurrent neural network language model (RNNLM) is an important method among statistical language models because it can tackle the data sparseness problem and capture longer-distance constraints. However, it lacks practicability because the lattice has to be expanded too many times, which makes the search space explode. Therefore, an N-best rescoring algorithm is proposed that uses the RNNLM to rerank the recognition results and optimize the decoding process. Experimental results show that the proposed method can effectively reduce the word error rate of the speech recognition system.
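The reranking idea in this abstract can be sketched in a few lines. The example is illustrative only: `toy_lm_score` is a hypothetical stand-in for an RNNLM log-probability, and the linear interpolation with `lm_weight` is one common way to combine scores, not necessarily the combination used in the paper.

```python
def rescore_nbest(hypotheses, lm_score, lm_weight=0.5):
    """Rerank an N-best list.

    hypotheses: list of (text, first_pass_log_score) pairs.
    lm_score:   callable returning a log-score for a text (stand-in for an RNNLM).
    Returns (text, combined_score) pairs, best first.
    """
    rescored = [
        (text, (1.0 - lm_weight) * score + lm_weight * lm_score(text))
        for text, score in hypotheses
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy usage with made-up scores: the first pass prefers the wrong hypothesis.
TOY_LM = {"recognize speech": -2.0, "wreck a nice beach": -8.0}

def toy_lm_score(text):
    return TOY_LM[text]

nbest = [("wreck a nice beach", -9.5), ("recognize speech", -10.0)]
reranked = rescore_nbest(nbest, toy_lm_score)
best_text = reranked[0][0]
```

Rescoring a fixed N-best list keeps the number of language model evaluations at N per utterance, which avoids expanding the lattice.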
Zhu Tianyi , Lu Jing , Chen Kai
2016, 31(2):355-361.
Abstract:One of the most important functions of a two-loudspeaker system is to realize stereo playback in a specific region by sending the ipsilateral channel of the stereo signal to the corresponding ear of the listener. The fundamental challenge is to cancel the crosstalk between the signals from the two loudspeakers. Usually, a crosstalk cancellation filter is employed based on the inverse of the transfer matrix from sources to ears. Nevertheless, commonly used crosstalk cancellation causes severe spectral coloration, which brings many negative effects, e.g. weakening system robustness, shrinking the sweet spot area and losing considerable dynamic range. Therefore, two different regularization methods based on measured transfer functions are investigated. Experiments are carried out to compare the overall performance in terms of spectral coloration and crosstalk cancellation. Interaural time difference (ITD) is also used to validate the efficacy of the improved stereo sound reproduction system.
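The abstract does not say which two regularization methods are compared, so the sketch below shows only the generic idea with Tikhonov regularization, a common choice for inverting a transfer matrix. The 2x2 matrix values, the parameter `beta` and the function name `crosstalk_canceller` are assumptions for illustration.

```python
import numpy as np

def crosstalk_canceller(transfer, beta):
    """Tikhonov-regularized inverse of a transfer matrix at one frequency bin.

    transfer: complex matrix C (ears x loudspeakers).
    Returns H = (C^H C + beta I)^(-1) C^H.
    """
    c_h = transfer.conj().T
    gram = c_h @ transfer + beta * np.eye(transfer.shape[1])
    return np.linalg.solve(gram, c_h)

# Toy usage (hypothetical values): symmetric setup with 0.5 crosstalk gain.
C = np.array([[1.0, 0.5], [0.5, 1.0]], dtype=complex)
H_exact = crosstalk_canceller(C, beta=0.0)   # plain inverse
H_reg = crosstalk_canceller(C, beta=0.1)     # regularized, lower filter gain
```

With `beta = 0` the filter is the exact inverse, so `C @ H_exact` is the identity. A positive `beta` trades some cancellation for smaller filter gains, which is the usual lever against spectral coloration and loss of dynamic range.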
Zhang Qingfang , Zhao Heming , Gong Chenghui
2016, 31(2):362-369.
Abstract:To address the mismatch between training speech and test speech from different speaking manners, a feature processing algorithm is proposed based on joint factor analysis and feature mapping. The speaking mode information is extracted by the joint factor analysis algorithm, and the speaking mode factor and space are then optimized. Before training and testing, the features are mapped using the speaking mode information to reduce speaking mode effects. The experimental results show that the proposed algorithm can effectively extract the speaker information of the training speech and improve the recognition rate of the whispered speaker recognition system.
Xu Chenglong , Cheng Yunpeng , Dong Wenbin , Sun Hao
2016, 31(2):370-376.
Abstract:The channel exploration problem is analyzed for the distributed HF opportunistic spectrum access (OSA) system. Owing to the scarcity of spectrum resources, applying cognitive radio technology to HF communication systems has attracted extensive attention. Multiple secondary users (SUs) sequentially sense multiple licensed channels. The system then decides whether the channels can be used based on the sensing results. Thus, data can be transmitted over the available spectrum bands by using spectrum aggregation technology. However, the ability of spectrum aggregation is constrained by hardware limitations. Therefore, a dynamic stopping approach is proposed that considers the interaction among the SUs under the hardware constraints. In the proposed stopping approach, the channel-free probability can vary during channel exploration, and the SUs can periodically release previously sensed channels. Moreover, simulation results show that the throughput of the HF communication system can be effectively improved by the proposed dynamic stopping approach.
Zhang Chaoran, Cheng Jinfang, Xiao Dawei
2016, 31(2):377-384.
Abstract:Since the conventional amplitude and phase estimation (APES) algorithm cannot be applied to a non-uniform array, a vector APES (VAPES) algorithm based on the vector hydrophone is proposed. The phase difference between the acoustic pressure and the acoustic particle velocity received by a vector hydrophone is independent of the sensor locations, so the pressure and velocity channels can be used as two sub-arrays and applied to a non-uniform array. The array gain and its stability are analyzed. The simulations indicate that, compared to the conventional APES algorithm, the method can be applied to a non-uniform array and its array gain is higher; compared to the minimum variance distortionless response (MVDR) algorithm, the method is more robust, can handle the coherent situation, and obtains a more accurate signal power. Practical data verify the validity of VAPES.
Sun Xiaohui , Ling Zhenhua , Dai Lirong
2016, 31(2):385-392.
Abstract:A unit selection speech synthesis method using automatic error detection is presented. It aims to design a unit selection criterion consistent with the subjective perception of listeners so as to improve the naturalness of synthetic speech. First, a crowdsourcing platform, instead of the linguistics experts of the traditional approach, is used to collect large amounts of perceptual data efficiently. Then, a synthetic error detector based on a support vector machine (SVM) classifier is constructed using speech features such as syllable duration, unit cost and acoustic parameter distance extracted from the subjective evaluations. During speech synthesis, the N-best unit selection results given by conventional unit selection algorithms are rescored by the trained synthetic error detector in order to select the optimal one. Preference test results show that the proposed method can effectively improve the naturalness of synthetic speech.
Hu Jianwei , Cai Yueming , Wang Lei
2016, 31(2):393-399.
Abstract:Different wireless channels may experience different kinds of fading owing to the complexity of practical scenarios, which are also subject to channel estimation error and co-channel interference. To match practical scenarios, the outage performance of the two-way relaying system is investigated under the Rayleigh fading environment and the Rayleigh-Rice mixed fading environment, respectively. First, the system model and the system protocol are presented. Then, based on the expression of the signal-to-interference-plus-noise ratio at the receiver, closed-form expressions for the outage probabilities are derived for the two fading environments. Finally, the optimum relay position selection is discussed considering the channel estimation error and the co-channel interference in the two-way relaying system, and a closed-form expression for the optimum relay position is derived. Simulation results verify the correctness of the outage probability and optimum relay position analysis, and reveal that channel estimation error and co-channel interference have a great impact on system performance. Furthermore, the performance declines visibly when the channel estimation quality order is less than 1, and improves when the fading coefficient increases.
Li Li , Chen Yu , Shu Feng , Yu Hai , Gui Linqing , Kang Qiju
2016, 31(2):400-406.
Abstract:A power allocation scheme based on the generalized signal-to-leakage-and-noise ratio (SLNR) is proposed for cognitive radio networks to address the problem of interference between the cognitive network and the primary network, and thereby to guarantee the quality of service of primary users (PUs) and improve the performance of secondary users (SUs). Beamforming matrices are introduced at the transmitter and the receiver of the cognitive network to obtain a higher sum rate. Then a joint alternating iterative structure is developed to cascade transmit beamforming, power allocation and receive beamforming into an iteration loop. Simulation results show that the iteration loop converges fast and performs better than the traditional beamforming scheme with equal power allocation in terms of the sum rate and the bit error rate.
Li Bohao, Zhang Lianhai, Zheng Yongjun
2016, 31(2):407-414.
Abstract:A study of acoustic segment models (ASMs) for unsupervised query-by-example spoken term detection is presented. First, a Gaussian mixture model (GMM) is trained without any transcription information to label speech frames with Gaussian posteriorgrams. Hierarchical agglomerative clustering is used to decompose the posterior features into acoustically homogeneous segments. A label is assigned to each resulting segment by k-means clustering, and the posteriorgrams are then used to train the ASMs. In the query matching phase, Viterbi decoding is used to represent the query and test posteriorgrams as ASM sequences. Dynamic match lattice spotting based on minimum edit distance is used to locate possible occurrences of the query term. Experimental results show that the proposed method outperforms traditional GMM and ASM tokenizers.
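The matching stage in this abstract compares ASM label sequences by minimum edit distance. The sketch below shows the standard dynamic-programming edit distance with unit costs over sequences of labels; dynamic match lattice spotting itself (lattice traversal and learned substitution costs) is not reproduced, and the ASM label names in the usage lines are hypothetical.

```python
def edit_distance(query, candidate):
    """Minimum number of insertions, deletions and substitutions (unit cost)
    needed to turn one label sequence into the other."""
    prev = list(range(len(candidate) + 1))      # distance from empty query prefix
    for i, q in enumerate(query, start=1):
        curr = [i]                              # distance to empty candidate prefix
        for j, c in enumerate(candidate, start=1):
            cost = 0 if q == c else 1
            curr.append(min(prev[j] + 1,        # delete q
                            curr[j - 1] + 1,    # insert c
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# Toy usage with hypothetical ASM labels: one unit is missing from the test segment.
d_units = edit_distance(["asm3", "asm7", "asm1"], ["asm3", "asm1"])
d_chars = edit_distance("kitten", "sitting")
```

A query term is then hypothesized wherever a stretch of the test sequence lies within a chosen distance threshold of the query sequence.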
Mailing Address: 29 Yudao Street, Nanjing, China
Post Code:210016 Fax:025-84892742
Phone:025-84892742 E-mail:sjcj@nuaa.edu.cn