Ben De, Zhang Gong, Tan Xiaoyang, Liu Yandong, Huang Zhiqiu
2015, 30(2):239-243. DOI: 10.16337/j.1004-9037.2015.02.001
Abstract:The DAPJ (Journal of Data Acquisition and Processing) has focused on publishing papers on the theory and applications of signal, data and information processing for 30 years, ranging from A/D converters in the early days to modern DSP and FPGA processing chips, from fractal technology and wavelet theory to speech recognition and image processing, from neural networks to deep learning, and from wireless communication to optical communication. The DAPJ, like a mirror, always reflects the latest developments in the field of signal, data and information processing. This paper briefly summarizes the history and current research of signal, data and information processing, and looks forward to the future development of the DAPJ, which will serve as a guideline for the contents to be published in this journal.
Liu Yandong, Chen Jun, Zhang Huangqun
2015, 30(2):244-251. DOI: 10.16337/j.1004-9037.2015.02.002
Abstract:Three development stages of the "Journal of Data Acquisition and Processing", namely the start-up phase, the adjustment phase and the stable development phase, are reviewed. The evolution over 30 years of the number of published papers, the proportion of fund-supported articles, the subject distribution, the impact factor, the SJR indicator, etc. is introduced. The journal's style and philosophy over 30 years are also summarized. New ideas for promoting the "Journal of Data Acquisition and Processing" are explored from the following aspects: enhancing topic planning, improving invited articles in both quality and quantity, optimizing the review process and standardizing the publishing process.
Zhao Li, Zhang Xinran, Liang Ruiyu, Wang Qingyun
2015, 30(2):252-265. DOI: 10.16337/j.1004-9037.2015.02.003
Abstract:Digital hearing aids are the main means of compensating for residual hearing and speech impairments. In recent years, as the associated digital signal processing (DSP) technologies have been widely researched, the control, modification and enhancement of voice signals and the filtering techniques applied to digital hearing aids have developed significantly. From the aspects of loudness compensation, noise suppression and echo cancellation, recent studies of the algorithms and implementations at home and abroad are reviewed according to the structures of current mainstream digital hearing aid systems. Then, according to the problems and solutions of the different strategies, corresponding comparative analyses and evaluations are carried out. Finally, the issues that remain unresolved at the present stage of digital hearing aid algorithms are discussed, and the future direction and development of the related technologies are predicted.
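Of the three algorithm families above, loudness compensation is the easiest to illustrate: it is commonly realized as wide dynamic range compression (WDRC). The sketch below is a generic static compression curve with made-up threshold and ratio values, not any specific algorithm surveyed in the paper.

```python
import numpy as np

def wdrc_gain_db(level_db, threshold_db=50.0, ratio=3.0):
    """Static wide-dynamic-range-compression curve: below the knee
    (threshold) the gain is 0 dB; above it, every `ratio` dB of extra
    input level yields only 1 dB of extra output level."""
    level_db = np.asarray(level_db, dtype=float)
    over = np.maximum(level_db - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)   # attenuation in dB

# An 80 dB input against a 50 dB knee at 3:1 is attenuated by about
# 20 dB, so the output sits near 60 dB: loud sounds are squeezed into
# the wearer's reduced dynamic range.
print(wdrc_gain_db(80.0))
```

In a real hearing aid this curve is applied per frequency band with attack/release smoothing; the static curve only fixes the steady-state input/output mapping.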
He Qianhua, Pan Weiqiang, Hu Yongjian, Zhu Zhengyu, Li Yanxiong, Feng Xiaohui
2015, 30(2):266-274. DOI: 10.16337/j.1004-9037.2015.02.004
Abstract:Biometric authentication systems, including fingerprint and face recognition systems, are widely used today. For off-site user authentication, speech has some advantages over other biometric traits, such as high acceptability, low equipment requirements, flexible access anywhere and anytime, low computational complexity and suitability for remote authentication, which promotes the application of speech-based authentication systems. However, playback has become a general risk, because it is easy to carry out without any training, and cheap, high-quality audio recorders and loudspeakers are readily available. This paper gives a thorough review of the methods for playback detection. It shows that the research is still at an early stage, but the demand is increasing.
2015, 30(2):275-288. DOI: 10.16337/j.1004-9037.2015.02.005
Abstract:Compressed sensing technology, especially compressed speech sensing technology, has gradually become a research hotspot in signal processing. The key issues of compressed speech sensing currently include the construction of the sparse decomposition matrix, the selection of the measurement matrix and the design of the reconstruction algorithm for speech signals. The important representatives of the sparse decomposition matrix are the orthogonal basis, the linear prediction matrix based on speech characteristics and the overcomplete dictionary. For the measurement matrix, the performance of speech signals reconstructed from random measurement matrices is analyzed. For the reconstruction algorithm, robust reconstruction algorithms with noisy measurements or noisy speech signals are researched. In this paper, these three aspects of compressed speech sensing technology are introduced and compared, and the main applications of compressed speech sensing are also presented. Finally, possible future research directions of compressed speech sensing are discussed.
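For readers unfamiliar with the reconstruction step, here is a numpy-only sketch of orthogonal matching pursuit (OMP), one standard greedy reconstruction algorithm for random Gaussian measurements. It is generic background, not necessarily one of the algorithms compared in the paper, and the sizes below are arbitrary.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily add the column of Phi most
    correlated with the residual, then least-squares refit on the
    selected support."""
    r, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ r))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k = 64, 32, 4                   # signal length, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
y = Phi @ x                                      # compressed measurements
x_hat = omp(Phi, y, k)
print("reconstruction error:", float(np.linalg.norm(x_hat - x)))
```

For speech, the sparse decomposition matrix (orthogonal basis, prediction matrix or overcomplete dictionary, as the abstract notes) replaces the identity basis assumed in this toy example.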
Tao Chao, Yin Jie, Liu Xiaojun
2015, 30(2):289-298. DOI: 10.16337/j.1004-9037.2015.02.006
Abstract:Photoacoustic imaging (PAI) is a state-of-the-art biomedical imaging technique of the 21st century, for it simultaneously inherits the high resolution of ultrasonography in deep tissue and the ability of optical imaging to detect biochemical information. Recent progress of PAI in biomedicine is reviewed. The basic principles and the two major implementations of PAI, photoacoustic tomography and photoacoustic microscopy, are introduced. Then the capability of multi-wavelength PAI in evaluating chemical components in tissue, and the feasibility of photoacoustic spectral analysis in evaluating histological microstructures in biological tissue, are demonstrated; at the same time, several analysis methods and clinical applications of PAI in biomedical imaging are discussed. Finally, the advantages and potential applications of PAI in biology and medicine are summarized.
Zou Yuexian, Guo Yifan, Zheng Weiqiao
2015, 30(2):299-306. DOI: 10.16337/j.1004-9037.2015.02.007
Abstract:A robust high-resolution speaker direction of arrival (DOA) estimation method is proposed based on a single acoustic vector sensor (AVS) and spatial sparse representation. Under reverberation and additive noise conditions, the array covariance vector model of the signals received by the AVS is first derived. Then the sparse representation model of the covariance vector is developed. Finally, the robust DOA estimate is obtained by recovering the sparse vector. A large number of simulation experiments are carried out under different reverberation and additive noise conditions, as well as DOA estimation experiments in a real environment. The results show that the proposed speaker DOA estimation achieves a root mean square error (RMSE) below 1° when the SNR ranges from 5 dB to 30 dB, and an error of 2—10° in the real scenario.
Zhang Xiongwei, Wu Haijia, Zhang Liangliang, Zou Xia
2015, 30(2):307-318. DOI: 10.16337/j.1004-9037.2015.02.008
Abstract:In order to improve the reconstruction performance of deep models, a reconstruction error constraint based on cross entropy is added to the traditional contrastive divergence (CD) algorithm. The improved algorithm is used to train a reconstructive deep auto-encoder (RDAE), which replaces the vector quantization of the LSF parameters in the MELP speech coding algorithm. Experimental results show that the improved CD algorithm improves the reconstruction performance of the deep model at the cost of some model likelihood. When the hidden layer of the RDAE is set to 19 bits, the indicators, including the weighted LSF distance, the quality of the reconstructed speech and the spectral distortion, are better on both the training set and the testing set for the proposed method than for vector quantization at 25 bits. That is to say, the coding bitrate of the MELP coder is reduced from 2.5 kb/s to 2.1 kb/s. The reduction rate of the coding bitrate is up to 12.5%, while the speech quality is maintained.
Ta Dean, Li Ying, Liu Chengcheng
2015, 30(2):319-327. DOI: 10.16337/j.1004-9037.2015.02.009
Abstract:The ultrasonic backscatter signal is quite sensitive to the microstructure of cancellous bone. Trabecular bone spacing (TbSp) is an important parameter for characterizing bone microstructure. In order to acquire TbSp accurately from the ultrasonic backscatter signal of cancellous bone, a TbSp estimation method is proposed that combines the Hilbert transform with fundamental frequency estimation (HFE). The HFE results from cancellous bone in vitro are compared with the TbSp obtained from μ-CT. The HFE results are accurate (estimation error<3%) and stable (standard deviation<4%) at higher frequencies (5 MHz and 10 MHz), more accurate when the standard TbSp is large, and highly correlated (r2=0.75—0.99, p<0.01, n=16) with the standard TbSp at different frequencies. This shows that the HFE method is accurate and stable for TbSp estimation, and the method is verified to characterize cancellous bone TbSp.
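The core of the HFE idea — take the envelope of the backscatter via the Hilbert transform, estimate the fundamental frequency of the echo periodicity, and convert the echo period to a distance via d = cT/2 — can be sketched on a synthetic narrowband echo train. The probe frequency, sampling rate and spacing below are invented, and the paper's actual in-vitro method is more elaborate.

```python
import numpy as np

def analytic_signal(x):
    """Hilbert-transform analytic signal computed via the FFT."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(np.fft.fft(x) * h)

def estimate_spacing(backscatter, fs, c=1540.0):
    """Envelope via the Hilbert transform, fundamental frequency of the
    envelope via the spectral peak, and echo period converted to a mean
    scatterer spacing with d = c * T / 2."""
    env = np.abs(analytic_signal(backscatter))
    spec = np.abs(np.fft.rfft(env - env.mean()))
    f0_bin = np.argmax(spec[1:]) + 1           # skip the DC bin
    period = backscatter.size / (f0_bin * fs)  # seconds per echo cycle
    return c * period / 2.0

fs = 25e6                                      # hypothetical 25 MHz sampling
t = np.arange(4096) / fs
spacing_true = 0.6e-3                          # 0.6 mm scatterer spacing
echo_period = 2.0 * spacing_true / 1540.0
carrier = np.sin(2 * np.pi * 5e6 * t)          # hypothetical 5 MHz probe
env_mod = 1.0 + 0.8 * np.cos(2 * np.pi * t / echo_period)
print(estimate_spacing(carrier * env_mod, fs) * 1e3)   # in mm, close to 0.6
```

Real backscatter has broadband noise and overlapping echoes, so the spectral peak of the envelope is far less clean than in this toy case.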
2015, 30(2):328-335. DOI: 10.16337/j.1004-9037.2015.02.010
Abstract:Focusing on real human-computer interaction (HCI) scenarios, a method for three-dimensional acoustic source localization and speech enhancement is proposed. Combined with the receiver model of a six-microphone parallel uniform linear array (ULA), the target acoustic source is located in three dimensions by a time-difference of arrival (TDOA) estimation method improved from generalized cross correlation (GCC). On the basis of the located target source, the target speech is enhanced by delay-and-sum beamforming (DSBF). Simulation results show that the method can position the target acoustic source accurately and enhance the target speech effectively. When the SNR is greater than 1.5 dB, the positioning accuracy of the target acoustic source reaches more than 98% and the SNR improvement reaches 5 dB, with low computational cost and easy hardware implementation.
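The TDOA stage can be illustrated with the PHAT-weighted variant of generalized cross correlation, a common robustness improvement over plain GCC; this is a generic two-microphone sketch, not the paper's specific improved estimator or its six-microphone 3-D geometry.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time difference of arrival between two microphone
    signals with PHAT-weighted generalized cross-correlation."""
    n = sig.size + ref.size
    SIG, REF = np.fft.rfft(sig, n=n), np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.maximum(np.abs(R), 1e-12)           # PHAT: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 16000
rng = np.random.default_rng(0)
src = rng.standard_normal(2048)
delay = 8                                       # samples of extra travel time
mic1, mic2 = src, np.roll(src, delay)           # second mic hears the source later
print(gcc_phat(mic2, mic1, fs) * fs)            # estimated delay in samples
```

Once pairwise delays are known, the source position follows from the array geometry, and delay-and-sum beamforming aligns and sums the channels using those same delays.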
Zhang Linghua, Yao Shaoqin, Xie Weichao
2015, 30(2):336-343. DOI: 10.16337/j.1004-9037.2015.02.011
Abstract:Voice conversion is a technique for changing the personality characteristics of a source speaker′s voice into those of a target speaker, while preserving the original semantic information. An adaptive particle swarm optimization (PSO) based method is proposed to model voice features by training a radial basis function (RBF) neural network, in order to capture the spectral envelope mapping function between speakers. In addition, the pitch transformation is captured by modeling pitch jointly with the spectral feature parameters in the RBF neural network, which makes the converted pitch contain more target details. Finally, the performance of the improved voice conversion system is tested by subjective and objective methods respectively. Experimental results show that the performance of the proposed method is better than that of the Gaussian mixture model (GMM) based system, especially for male-to-female conversion.
Guo Yecai, Song Gong Kunkun, Wu Lifu, Sun Xinyu, Wang Lihua
2015, 30(2):344-349. DOI: 10.16337/j.1004-9037.2015.02.012
Abstract:In view of the inaccurate localization and ambiguous orientation of the traditional cross power spectrum (CS) method in direction of arrival estimation, an improved sound source localization algorithm using a circular microphone array is proposed and demonstrated by practical experiments. In the proposed algorithm, a twelve-element circular microphone array is first designed; then a phase rotation factor is defined by combining the time delay with the phase information of the speech signals received by each pair of microphones, and is introduced into the traditional cross power spectrum to define a novel circular integrated cross power spectrum (CICS); finally, the sound source orientation is estimated by the CICS algorithm. Simulation and practical experiment results show that the proposed algorithm greatly improves the estimation accuracy of the source orientation compared with the traditional cross spectrum algorithm.
Lü Xiaoqi, Shi Jing, Ren Xiaoying, Zhang Chuanting
2015, 30(2):350-358. DOI: 10.16337/j.1004-9037.2015.02.013
Abstract:Aiming at the complicated internal structure of the abdomen and the mutual infiltration between tissues, an improved level set method is used to segment the liver region from 3D abdominal magnetic resonance images. The level set method tends to leak at weak edges, so this problem is alleviated by a preprocessing threshold segmentation. Interference in further processing is reduced and the edges become clearer by remapping the intensity of the pre-segmentation results, and then the improved level set method is used for further segmentation. Experiments prove that the method is superior to traditional level set methods: the leaking problem at weak edges is reduced and the desired segmentation result is obtained. The applicability of the level set method is thus extended.
Xu Chundong, Zhan Ge, Ying Dongwen, Li Junfeng, Yan Yonghong
2015, 30(2):359-364. DOI: 10.16337/j.1004-9037.2015.02.014
Abstract:Noise estimation is a fundamental part of speech enhancement. Most traditional methods are heuristic and cannot achieve optimal estimation. An unsupervised noise power estimation method based on maximum likelihood is presented. A statistical model of the log power is constructed using a hidden Markov model (HMM) in each subband. The model comprises speech and nonspeech Gaussian components, and the mean of the nonspeech Gaussian component is the estimate of the noise power. Moreover, since speech may be absent for long periods, some constraints are introduced into the model for stability. The experiments validate that the proposed method obtains the maximum likelihood noise estimate and outperforms conventional heuristic methods.
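Dropping the HMM temporal structure and the stability constraints, the heart of the model — a speech and a nonspeech Gaussian fitted to log-power values by maximum likelihood, with the nonspeech mean taken as the noise estimate — can be sketched with plain EM on i.i.d. frames:

```python
import numpy as np

def noise_log_power(logp, iters=50):
    """Fit a two-component 1-D Gaussian mixture to per-frame log-power
    values with plain EM and return the lower mean, which models the
    nonspeech (noise-only) state."""
    mu = np.array([logp.min(), logp.max()])
    var = np.full(2, logp.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each frame
        lik = (w * np.exp(-0.5 * (logp[:, None] - mu) ** 2 / var)
               / np.sqrt(2 * np.pi * var))
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0)
        mu = (r * logp[:, None]).sum(axis=0) / nk
        var = (r * (logp[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        w = nk / logp.size
    return mu.min()

rng = np.random.default_rng(0)
frames = np.concatenate([rng.normal(-3.0, 0.3, 800),    # noise-only frames
                         rng.normal(1.0, 0.8, 200)])    # speech-present frames
print(noise_log_power(frames))   # close to the true noise mean of -3.0
```

The paper's HMM additionally models the temporal transitions between the speech and nonspeech states, which is what keeps the estimate stable during long speech absences.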
Maimaitiaili Tuerxun, Dai Lirong
2015, 30(2):365-371. DOI: 10.16337/j.1004-9037.2015.02.015
Abstract:Two methods employing deep neural networks are proposed for Uyghur large vocabulary continuous speech recognition: a hybrid architecture (DNN-HMM) in which a deep neural network (DNN) replaces the Gaussian mixture model (GMM) in GMM-HMM to compute the state emission probabilities; and a DNN used as a front-end acoustic feature extractor producing bottleneck (BN) features, which provides more effective acoustic features for the traditional GMM-HMM modeling framework (BN-GMM-HMM). Experimental results show that DNN-HMM and BN-GMM-HMM reduce the word error rate (WER) by 8.84% and 5.86% respectively compared with the GMM-HMM baseline system, which demonstrates that both methods achieve significant performance improvements.
He Saijuan, Chen Huawei, Yin Mingjie, Ding Shaowei
2015, 30(2):372-381. DOI: 10.16337/j.1004-9037.2015.02.016
Abstract:Differential microphone arrays have become a promising approach to multiple sound source localization. Among differential microphone array methods, the existing typical one is the histogram approach, which utilizes the time-frequency sparseness of speech signals. A direction-finding algorithm for multiple sound sources based on short-time average complex sound intensity estimation is proposed using time-frequency masking and fuzzy clustering. The frequency bounds for time-frequency masking under various array sizes are also discussed. The advantages of the proposed method are that it has a closed-form solution, is superior to the histogram approach, and is less sensitive to array size. Based on the idea of time-frequency masking, an improved histogram approach is also presented. The performance of the proposed methods is verified by simulation results under noisy and reverberant environments.
2015, 30(2):382-389. DOI: 10.16337/j.1004-9037.2015.02.017
Abstract:An improved affine projection algorithm for hearing aids is proposed to increase the convergence speed and reduce the misalignment. The algorithm establishes a new nonlinear function between the step-size and the estimation error. The step-size of the proposed algorithm is adjusted automatically according to the change of the estimation error, which leads to high convergence speed and low misalignment. In order to improve the accuracy of estimating the error power, a rule for selecting the forgetting factor is proposed. Mathematical analysis provides the theoretical basis for its outstanding ability to converge to the objective system. Simulation results show that the proposed algorithm achieves faster convergence and lower final misalignment than traditional adaptive methods and the fixed step-size affine projection algorithm.
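The mechanism — a step-size driven by a nonlinear function of the estimation error — can be sketched with a generic variable-step affine projection filter on a system identification task. The sigmoid-like step rule and all constants here are invented for illustration and are not the paper's function or its forgetting-factor rule.

```python
import numpy as np

def vss_apa(x, d, order=8, proj=2, mu_max=0.5, alpha=10.0):
    """Affine projection adaptive filter whose step-size is a nonlinear
    (sigmoid-like) function of the current estimation error: large
    errors give a large step for fast convergence, small errors a small
    step for low steady-state misalignment."""
    w = np.zeros(order)
    for n in range(order + proj - 1, x.size):
        # stack the `proj` most recent input regressor vectors
        X = np.array([x[n - k - order + 1:n - k + 1][::-1]
                      for k in range(proj)])
        e = d[n - proj + 1:n + 1][::-1] - X @ w
        mu = mu_max * (1.0 - np.exp(-alpha * e[0] ** 2))  # error-driven step
        w += mu * X.T @ np.linalg.solve(X @ X.T + 1e-6 * np.eye(proj), e)
    return w

rng = np.random.default_rng(0)
h = rng.standard_normal(8)               # unknown system, e.g. a feedback path
x = rng.standard_normal(4000)
d = np.convolve(x, h)[:x.size]           # desired signal: the system's output
w = vss_apa(x, d)
print(np.linalg.norm(w - h))             # final misalignment (small)
```

As the error shrinks, `mu` collapses toward zero, which is exactly the fast-convergence/low-misalignment trade-off the abstract describes.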
Lu Lihua, Zhang Lianhai, Chen Qi
2015, 30(2):390-398. DOI: 10.16337/j.1004-9037.2015.02.018
Abstract:An indexing method based on confusion networks instead of lattices is proposed in the weighted finite-state transducer (WFST) framework to improve the efficiency of a spoken term detection system. In the indexing stage, confusion networks are first extracted from lattices and transformed into automata; then, timed factor transducers are constructed from these automata; finally, the index is obtained by taking the union of the factor transducers and optimizing the union. In the searching stage, the queries are transformed into automata and composed with the index. After optimization, the automaton representing the search results is obtained. Experimental results show that, compared with the lattice-based WFST index, the confusion network based index has a smaller index size, faster search speed and better performance while maintaining retrieval accuracy.
Pan Haiqi, Yang Zhen, Xu Longting, Zhu Junhua
2015, 30(2):399-407. DOI: 10.16337/j.1004-9037.2015.02.019
Abstract:A solution is proposed to the problem that "fewer features cannot coexist with a higher recognition rate" in traditional speaker recognition. A ladder observation matrix projection is used to transform the traditional Mel-frequency cepstral coefficient (MFCC) parameters based on compressed sensing theory, yielding new recognition parameters named compressed sensing MFCC (CS-MFCC) parameters. These parameters reduce the storage requirement to less than 1/n of the original, where n is the compression ratio of the ladder matrix, and also greatly increase the robustness of the system. Furthermore, simulation results show that when n is 4, the recognition rate increases to above 96%.
You Xing, Sun Qi, Liu Jianpo, Tong Guanjun, Yuan Xiaobing
2015, 30(2):408-416. DOI: 10.16337/j.1004-9037.2015.02.020
Abstract:There are many constraints on video steganography in sensor networks. Based on an analysis of previous related research, a novel video steganography algorithm is proposed to improve steganography efficiency and reduce energy consumption. Considering the characteristics of H.264/AVC video streams in multimedia sensor networks, the algorithm proposes a novel classification strategy with the advantages of error-drift prevention and covering-code adjustment at simplified computational complexity. Furthermore, for different types of data features, a covering coding method is designed that satisfies the constrained application conditions of multimedia sensor networks. Both data analysis and experimental results show that the proposed algorithm is robust against statistical analysis attacks and minimizes the embedding error. With less communication redundancy, the algorithm doubles the embedding efficiency and decreases the computational complexity from exponential to linear.
He Jie, Feng Dazheng, Meng Chao, Ma Lun
2015, 30(2):417-423. DOI: 10.16337/j.1004-9037.2015.02.021
Abstract:A two-stage dimension-reduced space-time adaptive processing (STAP) method based on the correlation matrix is proposed for clutter suppression and moving target detection. Firstly, to reduce the degrees of freedom of the clutter, the full-dimensional space-time received data is pre-filtered. Secondly, the correlation matrix of the preprocessed output data is divided into submatrices, and a further reduction of both the computational complexity and the training requirement is achieved by optimizing two low-dimensional weight vectors. Theoretical analysis and computer simulation results illustrate that the proposed method obtains fast convergence and better clutter suppression performance. The method shows good robustness at a small computational cost when there are clutter fluctuations and random amplitude and phase errors in the array elements. Experimental results on measured data demonstrate the effectiveness and robustness of the proposed method.
Liu Huaming, Bi Xuehui, Wang Weilan, Wang Xiuyou
2015, 30(2):424-433. DOI: 10.16337/j.1004-9037.2015.02.022
Abstract:To repair Thangka relics by digital technology, the damaged regions of the Thangka must first be segmented. An algorithm based on maximum entropy and local priority is proposed to segment the damaged regions of torn Thangkas, considering the extent of the damage features and the color contrast with neighboring regions. Firstly, the gray image of the Thangka is segmented using the maximum entropy algorithm; the false damaged regions are removed and the seriously damaged regions are obtained. Secondly, the gray image is segmented by the local priority algorithm; the false damaged regions are removed and the transition zones of the damaged regions are obtained. Finally, the seriously damaged regions are combined with the transition zones of the damaged regions, and the final segmentation result is gained. Experimental results demonstrate that the proposed algorithm can effectively segment not only damaged Thangkas but also damaged murals and the like, which shows its effectiveness and robustness.
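The first stage, maximum-entropy segmentation of the gray image, can be illustrated by Kapur's classic maximum-entropy threshold on synthetic bimodal intensities; this is a generic stand-in for that stage, not the paper's full pipeline.

```python
import numpy as np

def max_entropy_threshold(gray):
    """Kapur's criterion: choose the gray level that maximizes the sum
    of the entropies of the background and foreground histogram
    partitions."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    c = np.cumsum(p)
    best_t, best_h = 0, -np.inf
    for t in range(1, 255):
        p0, p1 = c[t], 1.0 - c[t]
        if p0 < 1e-9 or p1 < 1e-9:
            continue
        a = p[:t + 1] / p0
        b = p[t + 1:] / p1
        a, b = a[a > 0], b[b > 0]
        h = -(a * np.log(a)).sum() - (b * np.log(b)).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

rng = np.random.default_rng(0)
# synthetic bimodal intensities: dark "damage" pixels and bright paint
img = np.clip(np.concatenate([rng.normal(60, 10, 5000),
                              rng.normal(180, 10, 5000)]), 0, 255)
print(max_entropy_threshold(img))   # lands between the two modes
```

On a real Thangka, this global threshold only yields candidate regions; the local-priority stage and false-region removal described above refine it.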
2015, 30(2):434-440. DOI: 10.16337/j.1004-9037.2015.02.023
Abstract:The standard C-support vector machine (C-SVM) algorithm has certain limitations when dealing with many practical pattern classification problems, especially in extreme cases where the costs of different recognition errors differ greatly. A kind of generalized C-SVM algorithm is introduced. By estimating the cost of each recognition error, the optimal separating hyperplane is shifted toward the low-cost class, leaving more margin for the high-cost class to increase its recognition rate and thus reduce the damage caused by recognition errors. The new method improves the applicability of C-SVM and the sample recognition rate. When applied to radar high resolution range profile recognition, experimental results show that the proposed method achieves a better recognition effect than the traditional method.
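The idea of shifting the hyperplane toward the low-cost class can be illustrated with a linear hinge-loss classifier trained by subgradient descent, where the high-cost class simply receives a larger penalty weight. This is a generic cost-sensitive sketch with invented costs, not the paper's generalized C-SVM formulation.

```python
import numpy as np

def cost_svm(X, y, c_pos=10.0, c_neg=1.0, epochs=300, lr=0.05):
    """Linear hinge-loss SVM via subgradient descent. Errors on the
    high-cost class (y=+1) are penalized c_pos/c_neg times harder,
    which pushes the separating hyperplane toward the low-cost side."""
    w, b = np.zeros(X.shape[1]), 0.0
    c = np.where(y > 0, c_pos, c_neg)          # per-sample error cost
    for _ in range(epochs):
        active = y * (X @ w + b) < 1           # margin violators
        grad_w = w - (c * active * y) @ X / y.size
        grad_b = -(c * active * y).sum() / y.size
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)),    # low-cost class (y=-1)
               rng.normal(+1.0, 1.0, (100, 2))])   # high-cost class (y=+1)
y = np.r_[-np.ones(100), np.ones(100)]
w, b = cost_svm(X, y)
recall_high = float(((X @ w + b > 0) & (y > 0)).sum()) / 100
print("high-cost class recall:", recall_high)
```

With symmetric costs the boundary sits between the classes; raising `c_pos` trades extra errors on the cheap class for fewer misses on the expensive one.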
Chang Huijun, Shan Hong, Man Yi, Mao Mao
2015, 30(2):441-451. DOI: 10.16337/j.1004-9037.2015.02.024
Abstract:A decomposition method for user packet time-stamp sequences is proposed. Firstly, the method samples the time sequence and utilizes a low-pass filter to extract the burst component, based on the different attenuation characteristics of different types of sampled signals. Then, it uses a traversal-and-matching method to extract the periodic sub-sequence based on the Euclidean distance between vectors. Finally, it decomposes the encrypted packet sequence into burst, periodic and random components. The method does not need to parse the packet payload, while the periodic component can be used to analyze the user′s routine behavior and the burst component can be used for burst anomaly detection. Simulation results show that the method can effectively distinguish the different components in the sequence.
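The first step — low-pass filtering a sampled packet-count sequence so that the residual exposes the burst component — can be sketched with a simple moving-average filter on synthetic traffic; the window size and traffic shape are invented, and the paper's filter and matching stages are more sophisticated.

```python
import numpy as np

def decompose(counts, win=8):
    """Split a per-slot packet-count sequence into a smooth component
    (routine/periodic traffic) extracted by a moving-average low-pass
    filter, and the residual, which exposes bursts."""
    smooth = np.convolve(counts, np.ones(win) / win, mode="same")
    return smooth, counts - smooth

t = np.arange(256)
counts = 5 + 3 * np.sin(2 * np.pi * t / 32)   # routine periodic traffic
counts[100:104] += 40                          # a short traffic burst
smooth, burst = decompose(counts)
print(int(np.argmax(burst)))                   # burst located near slot 100
```

The periodic sub-sequence extraction by Euclidean-distance matching then operates on the smooth component, leaving the random component as the final residual.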
Liu Feng, Xuan Shibin, Liu Xiangpin
2015, 30(2):452-463. DOI: 10.16337/j.1004-9037.2015.02.025
Abstract:To improve the accuracy and robustness of video target tracking under occlusion and fast motion, a tracking algorithm is proposed based on a particle filter optimized by a new cloud adaptive particle swarm optimization (CAPSO). The possible position of the moving target in the next frame is predicted by the particle filter, and the target template and candidate regions are matched with color histogram statistical characteristics to ensure tracking accuracy. Then the proposed CAPSO divides the particles into three groups according to their fitness, so as to adopt different inertia weight generating strategies. The inertia weight in the general group is adaptively varied by an X-conditional cloud generator, and the cloud model gives the inertia weight a randomness property. Therefore, the re-sampling frequency of the particle filter is reduced, its computational cost is effectively decreased, and the tracking problem under occlusion is handled effectively. In addition, by adopting three different inertia weight generating strategies, the algorithm balances its global and local search abilities and can adjust the particle search range, thus adapting to different motion levels. Experimental results show that the proposed algorithm has good tracking accuracy and real-time performance under occlusion and fast motion in video target tracking.
2015, 30(2):464-468. DOI: 10.16337/j.1004-9037.2015.02.026
Abstract:Potential high-value information is carried by short text information flows in transmission. A decision tree model for hot topic detection is established using the information entropy of the training data set, according to the characteristics of short text information flows. The decision tree algorithm first computes the average amount of information of each topic category and the information gain ratio of each characteristic word for distinguishing short text information flows. Then, the characteristic word with the maximum information gain ratio is selected as the test attribute, and the top-down construction of the decision tree is accomplished. Finally, the hot topic is determined according to the leaf node type. Experimental results on real short text information flows show that the proposed algorithm is more stable and faster than others.
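The quantities driving the tree construction — entropy and the information gain ratio of a characteristic word — can be sketched directly; this is a C4.5-style computation on a toy two-topic example, not the paper's data or its full algorithm.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_ratio(feature, labels):
    """Information gain of splitting `labels` on the discrete `feature`,
    normalized by the split entropy of the feature itself (C4.5)."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    cond = sum(w * entropy(labels[feature == v])
               for v, w in zip(values, weights))
    split_info = entropy(feature)
    return (entropy(labels) - cond) / split_info if split_info > 0 else 0.0

# toy flow: the word "match" separates the "sports" topic perfectly, so
# it earns the maximal gain ratio; a filler word earns almost none
contains_match = np.array([1, 1, 1, 0, 0, 0])
contains_the = np.array([1, 0, 1, 0, 1, 0])
topic = np.array(["sports"] * 3 + ["finance"] * 3)
print(gain_ratio(contains_match, topic), gain_ratio(contains_the, topic))
```

Picking the word with the largest gain ratio at each node, as the abstract describes, is exactly the top-down construction step this function supports.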
Mailing Address:29Yudao Street,Nanjing,China
Post Code:210016 Fax:025-84892742
Phone:025-84892742 E-mail:sjcj@nuaa.edu.cn