Volume 31, Issue 2, 2016 Table of Contents

History and State of Art of Acoustic Event Detection

2016, 31(2):231-241.

Abstract: Acoustic event detection refers to the task of detecting each semantic segment in an audio stream and associating it with a classification label. It is a fundamental technique for sound scene recognition and semantic understanding, and it is promising in many application fields, such as semantic understanding of environmental sounds for human-like robots and context awareness of sounds in the travelling environment for unmanned vehicles. This paper reviews the history of acoustic event detection from the point of view of related fields and application requirements, introduces typical works, and analyzes future research. For related fields, we focus on research in speech recognition, computational music processing, and auditory-based sound processing. For application requirements, we introduce work on sound context awareness and multimedia information retrieval. Finally, the state of the art in acoustic event detection is analyzed, and its future research directions are predicted.

Research Progress and Outlook of Speech Processing Algorithms for Digital Hearing Aids

Zou Cairong , Liang Ruiyu , Xie Yue

2016, 31(2):242-251.

Abstract: As the world population ages, hearing impairment has become a high-incidence chronic disease. Hearing aids are one of the most effective means of hearing intervention and hearing rehabilitation for patients with presbycusis. Hearing aid techniques have advanced significantly over the past decades, primarily thanks to the maturing of signal processing and electronic technology. Among these technologies, sound classification, filter decomposition, noise suppression and echo cancellation are four basic algorithms for hearing aids. We elaborate on these algorithms in terms of their basic principles, current research status, features and problems. In addition, by analyzing the current problems of hearing aids, three new research directions are briefly introduced: auditory bionics, auditory cognition and self-fitting hearing aids.

Research Progress on Key Technologies of Audio Forensics

Bao Yongqiang , Liang Ruiyu , Cong Yun , Gao Chonghong , Wang Qinyun

2016, 31(2):252-259.

Abstract: The latest research progress in audio forensics is introduced, with an emphasis on audio authenticity. First, the history of audio forensics research is reviewed and the classification of audio forensics is discussed. Then, a framework for audio forensics is designed. Several key technologies of audio forensics are summarized, including active audio forensics technology, audio tamper detection technology based on electrical network frequency (ENF), audio tamper detection technology with different sampling rates, audio tamper detection technology with the same sampling rates under the passive power grid frequency components, the characteristic parameters of recording equipment, pattern recognition, the state of database construction, and recording environment identification. Finally, the prospects of audio forensics technology are presented.

Research on Asymmetric Model of Vocal Fold in Pathology Voice

Tao Zhi , Zeng Xiaoliang , Gu Lingling , Zhang Xiaojun , Wu Di , Xue Longji

2016, 31(2):260-267.

Abstract: To provide a basis for parameter selection in pathological voice recognition, an asymmetric modeling method is proposed to simulate diseased vocal folds. According to the layered structure and tissue properties of the vocal fold, a mechanical model is set up to produce the voice source from the straight airflow expelled by the lungs. An inversion procedure adopting genetic particle swarm optimization based on the quasi-Newton method (GPSO-QN) is developed to adjust the parameters of the vocal fold model and to reproduce the targeted voice source. Experimental results show that the vocal fold mechanical model can produce a voice source that is consistent with the target. In addition, the optimized parameter sets show that asymmetries between the two opposing vocal folds result in the pathological voice.

Angle Estimation for Electromagnetic Vector Sensor Array via Compressed Sensing-Parallel Factor

Zhang Xiaofei , Li Shu , Zheng Wang

2016, 31(2):268-275.

Abstract: We combine the parallel factor framework with compressed sensing theory to solve the problem of direction of arrival estimation for the electromagnetic vector sensor array. We first rearrange the received data matrix as a parallel factor model and compress it to a smaller one based on compressed sensing theory. Then the trilinear alternating least squares algorithm is used to decompose the compressed parallel factor model. Finally, the angle estimates are obtained by exploiting sparsity. Owing to compression, the computational complexity of the algorithm is lower than that of the conventional parallel factor model-based algorithm, and less storage is required. The algorithm needs no peak searching and is applicable to both uniform and non-uniform linear arrays. Moreover, simulations verify that the angle estimation performance of the proposed algorithm is better than that of the ESPRIT algorithm and close to that of the conventional parallel factor model-based algorithm.

High-Performance Beamforming Algorithm for Full-Duplex MIMO Relay System

Shu Feng , Cui Yudi , Qian Zhenyu , Lu Zaoyu , Zhou Ye , Hu Jinsong , Liu Miao

2016, 31(2):276-281.

Abstract: Compared to half-duplex relay systems, full-duplex relay systems can greatly improve spectral efficiency. However, the information leakage between the transmitter and receiver of the relay degrades the performance of the full-duplex system. To deal with the self-interference and enhance the achievable rate in a full-duplex MIMO relay system with a decode-and-forward strategy, an iterative beamforming structure at the relay is proposed. In this structure, the receive and transmit beamforming at the relay are optimized with the minimum mean square error (MMSE) criterion over both uplink and downlink (called MMSE plus MMSE). The two beamforming matrices are then combined for optimal solutions. Simulation results show that the proposed MMSE plus MMSE performs better than existing schemes such as null space projection (NSP) and maximum signal-to-interference ratio (Max-SIR). For example, the proposed algorithm achieves a gain of about 0.8 bps/Hz over Max-SIR when the SNR is high. At BER = 10^-3, the proposed scheme achieves an SNR gain of about 1.5 dB over Max-SIR.

Underwater Echo Signal Processing Method Based on Sparse Decomposition

Sun Tongjing , He Jinpeng , Gu Yu

2016, 31(2):282-288.

Abstract: For the problem of processing weak underwater signals at ultra-low SNR, an underwater echo signal processing method is presented based on the theory of sparse decomposition and the matching pursuit method. The focus is on how to integrate prior information, such as the incident signal and the echo model, into the sparse dictionary (atoms). First, the highlight model of the underwater echo signal is established, the relation between the echo model and the incident signal is obtained, and an overcomplete dictionary fitted to the echo signal characteristics is constructed by discretizing, energy-normalizing and shifting the known transmitted signal. Then, the sparse decomposition of the underwater echo signal is carried out with the matching pursuit method, and the results are compared with those of the commonly used matched filter methods. Simulation results show that the proposed method can accurately reconstruct the original echo signal and has a clear advantage in processing underwater echo signals at ultra-low SNR.
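
The dictionary construction and greedy decomposition described in this abstract can be illustrated with a minimal sketch. It assumes a toy three-sample transmit pulse and a noise-free two-highlight echo; the function names `shifted_atoms` and `matching_pursuit` are hypothetical and are not taken from the paper, whose actual dictionary and pursuit variant are not specified here.

```python
import math

def shifted_atoms(pulse, length):
    """Build unit-energy atoms by shifting a known transmit pulse (assumed dictionary)."""
    norm = math.sqrt(sum(x * x for x in pulse))
    unit = [x / norm for x in pulse]
    atoms = []
    for shift in range(length - len(pulse) + 1):
        atom = [0.0] * length
        atom[shift:shift + len(unit)] = unit
        atoms.append(atom)
    return atoms

def matching_pursuit(signal, atoms, n_iter):
    """Greedy matching pursuit: returns (atom index, coefficient) pairs and the residual."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the strongest match.
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        picks.append((best, corr[best]))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return picks, residual

pulse = [1.0, -1.0, 1.0]            # toy transmit pulse (assumption)
atoms = shifted_atoms(pulse, 12)
# Synthetic echo: two scaled, delayed copies of the pulse (two "highlights").
echo = [0.0] * 12
for delay, gain in ((2, 2.0), (7, -1.0)):
    for k, p in enumerate(pulse):
        echo[delay + k] += gain * p
picks, residual = matching_pursuit(echo, atoms, n_iter=2)
```

In this toy case the two selected atom indices correspond to the two echo delays, and the residual is numerically zero. Real echoes would add noise and overlapping highlights, which this sketch does not model.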

Noise-Robust Interferometer Direction Finding Method Based on SNR Estimation and Vector Averaging

Guo Dongliang , Huang Chao , Li Zhonghua , Zhang Tiejun

2016, 31(2):289-295.

Abstract: To address the low accuracy of interferometer direction finding (DF) at low signal-to-noise ratio (SNR), a new adaptive direction finding method is proposed based on SNR estimation and phase-difference vector averaging. The method enhances the accuracy and stability of the phase-difference measurement by taking multiple measurements and averaging the phase-difference complex vectors, which improves direction finding performance. The proposed adaptive criterion estimates the SNR of the arriving signal and quickly determines the required sample size, so that the sample size is adjusted adaptively at different SNRs and a stable DF accuracy is obtained. The effect of the SNR threshold on the performance of the proposed method is analyzed. The method has low computational complexity and little impact on the real-time performance of DF. Theoretical analysis and simulation results show that the method achieves high accuracy even under very low SNR conditions and clearly improves DF performance.
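
A small sketch can show why averaging phase-difference complex vectors is more stable than averaging the phase values directly. It covers only the averaging step; the SNR estimator and the adaptive sample-size rule are not described in enough detail here to reproduce. The function name `vector_average_phase` and the sample values are illustrative assumptions.

```python
import cmath
import math

def vector_average_phase(phases):
    """Average phase differences by summing unit complex vectors, then taking the angle."""
    total = sum(cmath.exp(1j * p) for p in phases)
    return cmath.phase(total)

# Noisy measurements clustered around +/- pi, where the phase wraps.
measurements = [math.pi - 0.1, -math.pi + 0.1, math.pi - 0.05, -math.pi + 0.05]
naive = sum(measurements) / len(measurements)   # arithmetic mean collapses toward 0
robust = vector_average_phase(measurements)     # stays near +/- pi
```

Near the wrap-around point the arithmetic mean of the phases lands far from every measurement, while the angle of the summed vectors remains at the true cluster.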

Method of Extracting Maximal-length Evaluation Phrase and Appraisal Expression

Liu Quanchao , Huang Heyan , Wang Yashen , Feng Chong

2016, 31(2):296-306.

Abstract: An algorithm based on statistics and rules is proposed to automatically identify maximal-length evaluation phrases. The identification of evaluation phrases is treated as a sequence tagging problem, and a conditional random field model is used to recognize evaluation phrases with simple structure. Next, a rule database is established, and maximal-length evaluation phrases with complex structure are identified automatically. The F-measure reaches 72.38%. Based on this work, a rule base is constructed for extracting opinion targets and appraisal expressions, and a rule-based method is proposed to automatically extract opinion targets and maximal-length evaluation phrases. Experiments conducted on the NetEase car portal achieved higher precision.

Real-Time Measurement for Experience Quality of Underwater Acoustic Voice Communication

Yuan Fei , Chen Weiling , Li Ye , Cheng En

2016, 31(2):307-314.

Abstract: Real-time measurement of the quality of underwater acoustic voice communication is crucial to communication quality, because it helps adjust voice modulation parameters in time and improves the adaptive ability of the link. Based on the characteristics of the underwater acoustic channel, an objective assessment model for voice quality is proposed using parameter extraction. Three voice feature parameters are extracted: Mel-frequency cepstrum coefficient (MFCC), linear predictive cepstrum coefficient (LPCC) and log spectral deviation (LSD). The three parameters form the weighted spectral distortion evaluation. The mapping between the distortion evaluation and the received voice quality, expressed as the perceptual evaluation of speech quality mean opinion score (PESQ-MOS), is used for quantization. A dynamic Mel-frequency cepstrum coefficient (DMFCC) spectral distortion evaluation is also introduced as a regulatory factor, which improves adaptability. Simulation and sea test results show that the MOS measured by the assessment model is close to PESQ-MOS, indicating that the model has practical value.

Algorithm of Recurring Concept Drift Based on Main Feature Extraction

Feng Chao , Wen Yimin , Tang Lingbing

2016, 31(2):315-324.

Abstract: Recurring concept drift is one of the sub-types of concept drift. In recurring concept drift detection, it is very important to represent concepts and to select the most appropriate classifier. We propose an algorithm, conceptual clustering and prediction through main feature extraction (MFCCP), for classifying data streams with recurring concept drifts. MFCCP recognizes recurring concepts by computing the differences between the main features and impact factors of different batches of samples. It maintains a classifier for each concept and monitors the classification accuracy, selecting a classifier according to the Hoeffding inequality in order to adapt better to concept drift. Experimental results on three datasets show that MFCCP achieves better classification accuracy, adapts faster to concept drift, and detects concept drift more accurately than the other four algorithms on data streams with recurring concept drifts, and therefore MFCCP is apt to classify data streams without recurring concept drift.
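
The Hoeffding inequality mentioned in this abstract gives a deviation bound that can be turned into a simple accuracy-monitoring rule. The sketch below uses the standard one-sided bound for a mean of n bounded observations; the decision rule `accuracy_dropped`, its default confidence level, and the treatment of the reference accuracy as a fixed value are assumptions for illustration and may differ from how MFCCP applies the inequality.

```python
import math

def hoeffding_bound(n, delta, value_range=1.0):
    """One-sided Hoeffding deviation: with probability >= 1 - delta, the mean of n
    observations of the given range lies within this distance of its expectation."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def accuracy_dropped(reference_acc, recent_acc, n, delta=0.05):
    """Hypothetical rule: flag a drop when the gap exceeds the Hoeffding bound."""
    return (reference_acc - recent_acc) > hoeffding_bound(n, delta)

eps = hoeffding_bound(200, 0.05)
large_gap = accuracy_dropped(0.90, 0.78, n=200)
small_gap = accuracy_dropped(0.90, 0.85, n=200)
```

With 200 samples and a 5% risk level the bound is roughly 0.087, so a 12-point accuracy gap is flagged while a 5-point gap is treated as noise.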

Speech Emotion Recognition Using Sparse Feature Transfer

Song Peng , Jin Yun , Zha Cheng , Zhao Li

2016, 31(2):325-330.

Abstract: In speech emotion recognition systems, recognition rates drop drastically when the training and testing utterances come from different corpora. To solve this problem, a novel sparse feature transfer approach is proposed. A sparse coding algorithm is used to obtain a common sparse feature representation of emotion features from different corpora. Meanwhile, the maximum mean discrepancy (MMD) is introduced to measure the distance between different distributions and is used as the regularization term in the objective function of sparse coding. Finally, robust sparse features are obtained for recognition. Experimental results show that, compared to traditional methods, the proposed approach significantly improves cross-corpus recognition rates.
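
As a rough illustration of the distribution distance used as the regularizer, the sketch below computes the biased squared MMD estimate with a linear kernel, which reduces to the squared distance between the two sample means. The kernel choice, the function name `linear_mmd2`, and the toy feature vectors are assumptions; the paper's kernel and its integration into the sparse coding objective are not given in this abstract.

```python
def linear_mmd2(xs, ys):
    """Biased squared MMD with a linear kernel: squared distance between sample means."""
    dim = len(xs[0])
    mean_x = [sum(v[d] for v in xs) / len(xs) for d in range(dim)]
    mean_y = [sum(v[d] for v in ys) / len(ys) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))

# Toy "source corpus" and "target corpus" features (illustrative values only).
source = [[0.0, 0.0], [2.0, 0.0]]
target = [[1.0, 3.0], [1.0, 5.0]]
gap = linear_mmd2(source, target)
same = linear_mmd2(source, source)
```

A regularizer of this kind penalizes representations whose source and target means differ, pushing the learned features of the two corpora toward a shared distribution.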

Discriminative Criterion Based Bottleneck Feature and Its Application in LVCSR

Liu Diyuan , Guo Wu

2016, 31(2):331-337.

Abstract: Bottleneck (BN) features based on the middle layer of a deep neural network have been widely applied to large vocabulary continuous speech recognition (LVCSR), because they allow the traditional Gaussian mixture density hidden Markov model (GMM-HMM) to be used for acoustic modeling. To extract discriminative bottleneck features, the parameters of the BN feature extractor and the GMM-HMM are optimized jointly using the minimum phone error (MPE) criterion, after the GMM-HMM has been trained on the conventional BN features. Unlike other discriminative training methods, large batches are used to obtain the statistics instead of the mini-batches of conventional neural network optimization, which accelerates training. Experiments demonstrate that the proposed bottleneck feature extractor outperforms the traditional methods with a 9% relative word error reduction.

SNR Estimation for OFDM Signal Based on Guard Interval

Zhang Xinran , Dai Yuehua , Zhang Mengbo , Yang Xiaojing

2016, 31(2):338-346.

Abstract: Since SNR plays an important role in signal processing for OFDM, an SNR estimation method based on the guard interval is proposed. The channel model and the guard interval are analyzed. First, two OFDM signals with different guard intervals are discriminated using the correlation function. Then, according to the structural characteristics of the two guard intervals, two SNR estimators are put forward. Finally, the SNR estimate is obtained by combining the discrimination result with the corresponding estimation algorithm. Simulation results show that the algorithm needs no auxiliary data and performs well under different SNR conditions.

N-best Rescoring Algorithm Based on Recurrent Neural Network Language Model

Zhang Jian , Qu Dan , Li Zhen

2016, 31(2):347-354.

Abstract: The recurrent neural network language model (RNNLM) is an important statistical language modeling method because it can tackle the data sparseness problem and capture longer-distance constraints. However, it lacks practicability in decoding because the lattice has to be expanded too many times, which makes the search space explode. Therefore, an N-best rescoring algorithm is proposed that uses the RNNLM to rerank the recognition results and optimize the decoding process. Experimental results show that the proposed method can effectively reduce the word error rate of the speech recognition system.
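
The general idea of N-best rescoring can be sketched as follows: each first-pass hypothesis keeps its original log score, a second language model adds a weighted log score, and the list is re-sorted. The function `rescore_nbest`, the interpolation weight, and the dictionary standing in for an RNNLM are all hypothetical; the score combination actually used in the paper is not stated in this abstract.

```python
def rescore_nbest(nbest, lm_score, lm_weight=0.5):
    """Rerank (hypothesis, first_pass_log_score) pairs with an extra LM log score."""
    rescored = [(hyp, score + lm_weight * lm_score(hyp)) for hyp, score in nbest]
    return sorted(rescored, key=lambda item: item[1], reverse=True)

# Toy stand-in for an RNNLM: made-up sentence log-probabilities.
toy_lm = {"recognize speech": -2.0, "wreck a nice beach": -9.0}
nbest = [("wreck a nice beach", -10.0), ("recognize speech", -11.0)]
best = rescore_nbest(nbest, lambda hyp: toy_lm[hyp], lm_weight=1.0)[0][0]
```

Because the language model only scores N complete sentences, the cost grows with N rather than with the number of lattice paths, which is the practical appeal of rescoring over lattice expansion.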

Improvement of Crosstalk Cancellation for Stereo Reproduction System Based on Two Loudspeakers

Zhu Tianyi , Lu Jing , Chen Kai

2016, 31(2):355-361.

Abstract: One of the most important functions of the two-loudspeaker system is to realize stereo playback in a specific region by sending the ipsilateral channel of the stereo signal to the corresponding ear of the listener. The fundamental challenge is to cancel the crosstalk between the signals from the two loudspeakers. Usually, a crosstalk cancellation filter is employed based on the inverse of the transfer matrix from sources to ears. Nevertheless, commonly used crosstalk cancellation causes severe spectral coloration, which brings many negative effects, e.g. weakening system robustness, shrinking the sweet spot area and losing considerable dynamic range. Therefore, two different regularization methods based on measured transfer functions are investigated. Experiments are carried out to compare the combined performance in terms of spectral coloration and crosstalk cancellation. Interaural time difference (ITD) is also used to validate the efficacy of the improved stereo sound reproduction system.

Whispered Speaker Identification Based on Factor Analysis and Feature Mapping

Zhang Qingfang , Zhao Heming , Gong Chenghui

2016, 31(2):362-369.

Abstract: To address the mismatch between training speech and test speech produced in different speaking manners, a feature processing algorithm is proposed based on joint factor analysis and feature mapping. The speaking mode information is extracted by the joint factor analysis algorithm, and then the speaking mode factor and space are optimized. Before training and testing, the features are mapped using the speaking mode information to reduce speaking mode effects. Experimental results show that the proposed algorithm can effectively extract the speaker information of the training speech and improve the recognition rate of the whispered speaker recognition system.

Multi-User HF Opportunistic Spectrum Access Based on Spectrum Aggregation

Xu Chenglong , Cheng Yunpeng , Dong Wenbin , Sun Hao

2016, 31(2):370-376.

Abstract: The channel exploration problem is analyzed for the distributed HF opportunistic spectrum access (OSA) system. Due to the scarcity of spectrum resources, applying cognitive radio technology to the HF communication system has attracted extensive attention. Multiple secondary users (SUs) sequentially sense multiple licensed channels, and the system then decides whether the channels can be used based on the sensing results. Data can thus be transmitted over the available spectrum bands using spectrum aggregation technology. However, the ability to aggregate spectrum is constrained by hardware limitations. Therefore, a dynamic stopping approach is proposed that considers the interaction among the SUs under the hardware constraints. In the proposed approach, the channel-free probability can vary during channel exploration, and the SUs can periodically release previously sensed channels. Simulation results show that the proposed dynamic stopping approach effectively improves the throughput of the HF communication system.

Non-uniform Array Vector-APES Beamforming Algorithm Based on Vector-Hydrophone

Zhang Chaoran , Cheng Jinfang , Xiao Dawei

2016, 31(2):377-384.

Abstract: Since the conventional amplitude and phase estimation (APES) algorithm cannot be applied to a non-uniform array, a vector APES (VAPES) algorithm based on the vector hydrophone is proposed. The phase difference between the acoustic pressure and the acoustic particle velocity received by a vector hydrophone is independent of the sensor locations, so the pressure and velocity channels can be used as two sub-arrays and applied to a non-uniform array. The array gain and its stability are analyzed. Simulations indicate that, compared to the conventional APES algorithm, the method can be applied to non-uniform arrays and its array gain is higher; compared to the minimum variance distortionless response (MVDR) algorithm, the method is more robust, can handle coherent sources, and obtains a more accurate signal power. Practical data verify the validity of VAPES.

Unit Selection Speech Synthesis Integrating Automatic Error Detection

Sun Xiaohui , Ling Zhenhua , Dai Lirong

2016, 31(2):385-392.

Abstract: A unit selection speech synthesis method using automatic error detection is presented. It aims to design a unit selection criterion consistent with the subjective perception of listeners so as to improve the naturalness of synthetic speech. First, a crowdsourcing platform, instead of the linguistics experts of the traditional approach, is used to collect large amounts of perceptual data efficiently. Then, a synthetic error detector based on a support vector machine (SVM) classifier is constructed from speech features such as syllable duration, unit cost and acoustic parameter distance, using the labels obtained from the subjective evaluations. During speech synthesis, the N-best unit selection results given by conventional unit selection algorithms are rescored by the trained synthetic error detector in order to select the optimal one. Preference test results show that the proposed method can effectively improve the naturalness of synthetic speech.

Performance Analysis of Two-Way AF Relaying with Channel Estimation Error and Co-channel Interference

Hu Jianwei , Cai Yueming , Wang Lei

2016, 31(2):393-399.

Abstract: Different wireless channels may experience different kinds of channel fading due to the complexity of practical situations, and they are exposed to the impact of channel estimation error and co-channel interference. To match practical scenarios, the outage performance of the two-way relaying system is investigated under the Rayleigh fading environment and the Rayleigh-Rice mixed fading environment, respectively. First, the system model and the system protocol are presented. Then, based on the expression of the signal-to-interference-plus-noise ratio at the receiver, closed-form expressions for the outage probabilities are derived for the two fading environments. Finally, the optimum relay position selection is discussed considering the channel estimation error and the co-channel interference in the two-way relaying system, and a closed-form expression for the optimum relay position is derived. Simulation results verify the correctness of the outage probability and optimum relay position analysis, and reveal that channel estimation error and co-channel interference have a great impact on system performance. Furthermore, the performance declines visibly when the channel estimation quality order is less than 1, and improves when the fading coefficient increases.

Power Allocation Based Iterative Structure of Joint Rx&Tx Beamforming for Cognitive Networks

Li Li , Chen Yu , Shu Feng , Yu Hai , Gui Linqing , Kang Qiju

2016, 31(2):400-406.

Abstract: A power allocation scheme based on the generalized signal-to-leakage-and-noise ratio (SLNR) is proposed for cognitive radio networks to address the problem of interference between the cognitive network and the primary network, so as to guarantee the quality of service of primary users (PUs) and improve the performance of secondary users (SUs). Beamforming matrices are introduced at the transmitter and the receiver of the cognitive network to obtain a higher sum rate. Then a joint alternating iterative structure is developed to cascade transmit beamforming, power allocation and receive beamforming into an iteration loop. Simulation results show that the iteration loop converges quickly and performs better than the traditional beamforming scheme with equal power allocation in terms of the sum rate and the bit error rate.

Unsupervised Query-by-Example Spoken Term Detection Based on Acoustic Segment Models

Li Bohao , Zhang Lianhai , Zheng Yongjun

2016, 31(2):407-414.

Abstract: A study of acoustic segment models (ASMs) for unsupervised query-by-example spoken term detection is presented. First, a Gaussian mixture model (GMM) is trained without any transcription information to label speech frames with Gaussian posteriorgrams. Hierarchical agglomerative clustering is used to decompose the posterior features into acoustically exhibiting segments. A label is assigned to each resulting segment by k-means clustering, and the posteriorgrams are then used to train ASMs. In the query matching phase, Viterbi decoding is used to represent the query and test posteriorgrams as ASM sequences. Dynamic match lattice spotting based on minimum edit distance is used to locate possible occurrences of the query term. Experimental results show that the proposed method outperforms traditional GMM and ASM tokenizers.
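
The matching step relies on minimum edit distance between label sequences. The sketch below is a simplified stand-in, not dynamic match lattice spotting itself: it slides a fixed-length window over a single best test sequence and reports windows within a distance threshold. The ASM label names, the threshold, and the helper `spot_query` are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def spot_query(query, test, max_dist):
    """Return (start, distance) for each query-length window within max_dist."""
    hits = []
    for start in range(len(test) - len(query) + 1):
        d = edit_distance(query, test[start:start + len(query)])
        if d <= max_dist:
            hits.append((start, d))
    return hits

query = ["a3", "a7", "a1", "a4"]                      # hypothetical ASM labels
test = ["a9", "a3", "a7", "a2", "a4", "a5", "a5"]
hits = spot_query(query, test, max_dist=1)
```

Here the only hit starts at position 1, where the window differs from the query by a single substituted label. A lattice-based search would additionally consider alternative decodings and variable-length matches.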
