2017, 32(2):205-220. DOI: 10.16337/j.1004-9037.2017.02.001
Abstract:Low-resource speech recognition is a current research hotspot in the speech recognition community and one of the main challenges in applying speech recognition technology to multilingual and minority-language settings. This paper reviews the history and current state of low-resource speech recognition and introduces several key technologies, including articulatory features, multilingual bottleneck features, the subspace Gaussian mixture model, convolutional neural network based acoustic models, and recurrent neural network based language models. The Open Keyword Search (OpenKWS) evaluation is then introduced. Finally, the prospects for low-resource speech recognition are discussed.
Dai Lirong , Zhang Shiliang , Huang Zhiying
2017, 32(2):221-231. DOI: 10.16337/j.1004-9037.2017.02.002
Abstract:This paper briefly introduces deep learning and then reviews research progress in deep learning based speech recognition from five perspectives: training criteria for deep learning based acoustic models, model architectures for acoustic modeling, scalable and distributed optimization methods for acoustic model training, speaker adaptation for deep learning based acoustic models, and deep learning based end-to-end speech recognition. Finally, possible future research directions for deep learning based speech recognition are proposed.
2017, 32(2):232-245. DOI: 10.16337/j.1004-9037.2017.02.003
Abstract:Compressed sensing (CS) is widely used in many areas. Its key technologies include the selection of the sparse matrix, the construction of the measurement matrix, and the design of the reconstruction algorithm. For speech signals, the measurement matrix and the reconstruction algorithm usually have special structural characteristics. In practical applications noise is inevitable, and in compressed sensing theory the reconstruction system is nonlinear and sensitive to noise, so robust compressed sensing techniques need to be studied. Compressed sensing would have good application prospects if the robustness problem were solved. This paper begins with the concept of compressed sensing and then analyzes the effects of various kinds of noise. For handling noise in speech signals, it focuses on robust projection operators and robust recovery algorithms. Finally, possible future research directions are discussed.
Zhao Li , Liang Ruiyu , Xie Yue , Zhuang Dongzhe
2017, 32(2):246-257. DOI: 10.16337/j.1004-9037.2017.02.004
Abstract:Early polygraph technology is easily affected by personal factors and the external environment, and especially by anti-polygraph techniques. Lie detection based on electroencephalography can directly observe neural activity in the relevant brain regions and reveal what happens internally when a person lies, but the required professional equipment is too large and expensive to use conveniently. Compared with these technologies, speech-based lie detection can operate across space and time and is highly covert. This paper describes the current state of development and the basic principles of polygraph technology, and introduces and analyzes the types and characteristics of non-speech and speech-related indicators. Several public speech databases for lie detection are then introduced, and research progress on lie detection algorithms is highlighted. Finally, future directions for speech-based lie detection are summarized in five areas: Chinese corpora, speech feature extraction, anti-polygraph techniques, theoretical research, and related auxiliary work.
Ye Zhongfu , Luo Dawei , Wei Jinqiang , Xu Xu
2017, 32(2):258-265. DOI: 10.16337/j.1004-9037.2017.02.005
Abstract:Signals received by an array are coherent because of reflection and refraction in multipath propagation. Coherent signals cause rank loss in the signal covariance matrix, which invalidates conventional high-resolution direction of arrival (DOA) estimation methods. A variety of algorithms have therefore been presented to solve this problem; they recover the rank of the covariance matrix by exploiting special properties of the steering vector. Recently, there has been growing interest in new algorithms that reduce the loss of array aperture and increase both the number of resolvable signals and the accuracy of DOA estimation. This paper introduces how coherent signals arise and how they affect DOA estimation, and establishes the data models. The coherent DOA estimation algorithms are then presented class by class according to their decorrelation approaches. Finally, future research directions are discussed.
Zhang Xiongwei , Li Yinan , Shi Wenhua , Hu Yonggang , Chen Xushan
2017, 32(2):266-277. DOI: 10.16337/j.1004-9037.2017.02.006
Abstract:Non-negative compositional models are of great importance in artificial intelligence, data mining, and intelligent information processing research. In recent years they have gradually become one of the most representative and frequently used classes of models for acoustic source separation. The additive combination of non-negative components built into these models matches the characteristics of human perception well, and techniques based on them have become increasingly popular in acoustic source separation. Starting from the most basic non-negative compositional model, non-negative matrix factorization (NMF), we first review the principles of non-negative compositional models, including the basic problem to be solved, the measures used in the objective function, and some typical solution methods. Based on these principles, we systematically discuss the various extensions of NMF designed for particular applications in acoustic source separation. Finally, some open problems are presented and discussed.
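As an illustration of the basic NMF model mentioned in the abstract above, the following is a minimal sketch of the classic multiplicative-update rules for the squared Euclidean objective. It is not taken from the reviewed paper: the function names (`nmf`, `squared_error`), the random initialization, and the iteration count are all assumptions chosen for the example.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def squared_error(V, W, H):
    """Squared Frobenius distance between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m).

    Uses multiplicative updates, which keep every entry non-negative.
    The initialization and iteration count are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H
```

In a source separation setting, `V` would typically be a magnitude spectrogram, the columns of `W` spectral templates, and the rows of `H` their activations over time; the extensions surveyed in the paper are not covered by this sketch.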
Guo Jichang , Qiu Linyao , Zhang Xue
2017, 32(2):278-285. DOI: 10.16337/j.1004-9037.2017.02.007
Abstract:An improved algorithm is proposed based on unsymmetrical-cross multi-hexagon-grid search (UMHexagonS), a fast motion estimation algorithm recommended for H.264. In the stage that predicts the initial search points, sets of prediction vectors are established, and the subsequent search strategy is determined by the length of the prediction vector set. In the global search stage, the correlation between predicted motion vectors is used to skip some search steps, and some search templates are replaced. In addition, based on the characteristics of the integer transform and quantization, all-zero coefficient blocks are detected so that motion estimation can be terminated early. Experimental results show that, when the quantization step size is 28, the proposed algorithm reduces motion estimation time by 34.80% compared with UMHexagonS while maintaining performance. The algorithm also adapts to video sequences with different motion intensities under different quantization steps, making it a fast motion estimation algorithm with good performance for H.264.
Xu Kailiang , Zhang Zhenggang , Liu Dan , Ta Dean , Hu Bo
2017, 32(2):286-292. DOI: 10.16337/j.1004-9037.2017.02.008
Abstract:As one of the most promising technologies, ultrasonic Lamb waves have been widely studied in non-destructive evaluation. However, because of guided-wave dispersion and mode overlap, mode identification and dispersion determination remain challenging and have attracted considerable attention. Here, we use two traditional spectrum estimation methods, the Yule-Walker method and the Burg method, to analyze array-transducer signals for high-resolution measurement of ultrasonic Lamb wave dispersion. An inverse method is also designed to estimate the thickness of an aluminum plate from the extracted wideband dispersion curves. Experiments on three aluminum plates with different thicknesses (3 mm, 4 mm and 5 mm) demonstrate that the spectrum estimation methods are helpful for non-destructive evaluation based on ultrasonic Lamb waves.
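The Yule-Walker method named in the abstract above fits an autoregressive (AR) model by solving a Toeplitz system built from autocorrelation values. The sketch below shows that step for a one-dimensional series using the Levinson-Durbin recursion. It is a generic illustration under stated assumptions, not the paper's array-signal processing: the function names and the biased autocorrelation estimator are choices made for this example.

```python
def autocorr(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a mean-removed series."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[i] * xc[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations r[m] = sum_k a[k] * r[m - 1 - k].

    The AR model convention assumed here is
        x[n] = a[0] * x[n-1] + ... + a[order-1] * x[n-order] + e[n].
    Returns the coefficient list a and the prediction-error variance.
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for model order m
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        k_m = acc / err
        # Update the lower-order coefficients, then append the new one
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= 1.0 - k_m ** 2
    return a, err
```

Given the coefficients, the AR power spectrum is proportional to the error variance divided by the squared magnitude of `1 - sum_k a[k] * exp(-1j * w * (k + 1))`; peaks of that spectrum are what give such methods their resolution.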
Li Junshan , Yang Yawei , Wang Rui , Hu Shuangyan , Sui Zhongshan , Ren Xinbo
2017, 32(2):293-299. DOI: 10.16337/j.1004-9037.2017.02.009
Abstract:Motion deblurring of images captured by a traditional camera performs unstably. To tackle this problem, the principle and coding strategy of the coded exposure camera are studied, and a novel point spread function (PSF) estimation and motion deblurring approach based on camera-optimized codes and efficient marginal estimation is proposed. Firstly, the alpha matting deblurring approach for traditional cameras is investigated and extended to the coded exposure camera. Then the coding factors that influence deblurring performance are analyzed to find the optimized code that suits both PSF estimation and invertibility. Finally, a PSF estimation approach based on efficient margin and maximum a posteriori estimation is modified, and image motion deblurring is accomplished in a coarse-to-fine way with a spatial prior on the efficient marginal gradient. Experimental results on simulated and real images show that the proposed algorithm estimates the PSF effectively and that its motion deblurring performance is superior to that of existing approaches.
Wu Bo , He Shiwen , Yu Denggao , Li Yuanwen , Huang Yongming , Yang Lüxi
2017, 32(2):300-306. DOI: 10.16337/j.1004-9037.2017.02.010
Abstract:A full-dimension massive multiple-input multiple-output (MIMO) system can significantly improve spatial resolution, reduce interference, and improve power efficiency. For a base station equipped with a uniform planar array, and with path loss taken into consideration, this paper proposes a nonsymmetrical, dynamic, and adaptive codebook design method based on the Kronecker product codebook. The method requires that the amplitude anywhere in a beam's coverage area be attenuated by no more than δ dB relative to the beam center. Starting from the edge of the base station's coverage, the method calculates each codeword and adds it to the codebook, enlarging the covered region until the whole area is covered. The characteristics of the proposed codebook are analyzed in comparison with the conventional DFT codebook. The new codebook effectively partitions the area into several circular areas and increases the density of beams at the cell edge. Finally, the size of the proposed codebook is analyzed in terms of the δ dB attenuation and the number of antennas.
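For context on the baseline that the abstract above builds on, here is a minimal sketch of a conventional Kronecker-product DFT codebook for a uniform planar array. It does not implement the paper's nonsymmetrical δ dB design; the function names, the vertical-then-horizontal antenna ordering, and the beam counts are assumptions made for illustration.

```python
import cmath
import math

def dft_codeword(n_ant, index, n_beams):
    """Unit-norm DFT steering vector for a uniform linear array of n_ant elements."""
    return [cmath.exp(2j * math.pi * a * index / n_beams) / math.sqrt(n_ant)
            for a in range(n_ant)]

def kron(u, v):
    """Kronecker product of two vectors."""
    return [a * b for a in u for b in v]

def kronecker_codebook(n_h, n_v, beams_h, beams_v):
    """Build all beams_v * beams_h codewords for an n_v x n_h planar array.

    Each codeword is the Kronecker product of a vertical and a horizontal
    DFT vector (ordering is an assumption of this sketch).
    """
    return [kron(dft_codeword(n_v, iv, beams_v), dft_codeword(n_h, ih, beams_h))
            for iv in range(beams_v) for ih in range(beams_h)]
```

This baseline spaces beams uniformly in the DFT index domain, which is the behavior the proposed codebook departs from when it concentrates beams toward the cell edge.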
2017, 32(2):307-313. DOI: 10.16337/j.1004-9037.2017.02.011
Abstract:In speech enhancement algorithms based on the generalized sidelobe canceller (GSC), an error in direction estimation means that the blocking matrix (BM) module cannot block the target speech completely. Part of the target speech is then eliminated in the multiple-input canceller (MC) module, which causes leakage of the target speech. This paper proposes a new optimization algorithm for the speech leakage caused by errors in the signal direction of arrival (DOA). First, the spectrum of the signal is adjusted with time delay compensation. The blocking matrix is then adjusted adaptively according to the correlation between the final output of the MC module and the output of the BM module, so that the estimated direction moves closer to the true direction of the target speech and the leakage is reduced. Simulation results show that the proposed algorithm achieves better speech enhancement performance in both objective and subjective evaluations.
Du Xiuli , Jiang Huancheng , Chen Bo , Qiu Shaoming
2017, 32(2):314-320. DOI: 10.16337/j.1004-9037.2017.02.012
Abstract:To improve the low efficiency of existing adaptive filter algorithms in handling high-speed data, a least mean squares (LMS) adaptive filter algorithm based on parallel and pipeline techniques is proposed. The proposed algorithm accelerates data processing, significantly speeding up the computation of the weight coefficients, and shortens the critical path, effectively raising the system's working clock. In an FPGA-based experiment, an LMS adaptive filter with a 4-channel parallel structure and 4-stage pipelines increases the data processing rate by eight times, and at the same data processing rate the power consumption can be reduced to 16%. The algorithm can thus realize real-time LMS adaptive filtering of high-speed or hyper-speed data streams.
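The abstract above concerns a hardware restructuring of the LMS update. As a reference point, the following minimal sketch shows the plain sequential LMS recursion that such designs accelerate. It does not model the paper's 4-channel parallel or pipelined structure; the function name, the tap count, and the step size are assumptions for the example.

```python
def lms_filter(x, d, num_taps, mu):
    """Sequential LMS adaptive filter.

    x        : input samples
    d        : desired (reference) samples, same length as x
    num_taps : number of filter weights
    mu       : step size (must be small enough for stability)
    Returns the final weights and the per-sample error sequence.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps          # buf[k] holds x[n - k]
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]       # shift the newest sample into the delay line
        y = sum(wi * bi for wi, bi in zip(w, buf))   # filter output
        e = dn - y                                   # estimation error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # weight update
        errors.append(e)
    return w, errors
```

The weight update depends on the error computed from the current weights, and that feedback loop is what makes the critical path long in hardware and motivates parallel and pipelined variants.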
Shi Chenguang , Wang Fei , Zhou Jianjiang , Li Hailin
2017, 32(2):321-329. DOI: 10.16337/j.1004-9037.2017.02.013
Abstract:With the rapid development of passive radar systems on the modern battlefield, airborne radar faces serious threats and challenges. For radio frequency (RF) stealth in modern electronic warfare, a novel RF stealth performance optimization algorithm based on cooperative noise jamming in airborne radar systems is proposed. Firstly, the influence of cooperative noise jamming on the RF stealth performance of an airborne radar system is investigated in detail on the basis of the power rule. Then the probability of intercept is formulated. In the proposed algorithm, the probability of intercept is minimized by optimizing the transmitting power and the cooperative jamming power while guaranteeing system performance. Numerical simulation results demonstrate the feasibility and effectiveness of the proposed algorithm.
Guo Changjian , Zheng Yicheng , Liang Jiawei , Hong Xuezhi , Li Rong
2017, 32(2):330-336. DOI: 10.16337/j.1004-9037.2017.02.014
Abstract:A novel long-reach passive optical network (LR-PON) with a high power budget and high capacity is investigated. Based on the negative chirp induced by self-phase modulation (SPM) and the aliasing induced by super-Nyquist images, diversity is first introduced into the aliased components using the first-order super-Nyquist image. Fractional sampling and per-subcarrier maximum ratio combining (MRC) are then adopted to harvest the diversity gain. Simulation and experimental results show that, with the proposed scheme, the transmission length of a QPSK-modulated orthogonal frequency-division multiplexing (OFDM) signal with 10 GHz bandwidth can be extended from 45 km to more than 80 km without any forbidden area. It is also shown that, with adaptive modulation, the proposed LR-PON system achieves a data rate of more than 32 Gb/s and a power budget of more than 32 dB.
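Per-subcarrier maximum ratio combining, mentioned in the abstract above, weights each received copy of a subcarrier by the conjugate of its channel gain before summing. The sketch below shows that generic operation. It is not the paper's receiver: how the diversity branches are obtained (here simply a list of branches) and the assumption of known channel gains are simplifications made for illustration.

```python
def mrc_combine(received, channels):
    """Combine diversity branches subcarrier by subcarrier.

    received[b][k] : complex sample on branch b, subcarrier k
    channels[b][k] : complex channel gain for that branch and subcarrier
                     (assumed known, e.g. from pilots)
    Returns one combined symbol estimate per subcarrier.
    """
    n_sub = len(received[0])
    combined = []
    for k in range(n_sub):
        # Conjugate-weighted sum across branches
        num = sum(h[k].conjugate() * r[k] for r, h in zip(received, channels))
        # Normalize by total channel energy so the estimate is unbiased
        den = sum(abs(h[k]) ** 2 for h in channels)
        combined.append(num / den)
    return combined
```

Because the normalization is done independently for every subcarrier, a branch that is in a deep fade on one subcarrier contributes little there while still helping on others.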
Wang Shengchun , Luo Siwei , Wang Xu , Huang Yaping , Dai Peng
2017, 32(2):337-345. DOI: 10.16337/j.1004-9037.2017.02.015
Abstract:Although automatic detection based on machine vision has been widely used for railway infrastructure, the fence, an important safeguard against intrusion that ensures the safe running of trains, is still inspected manually rather than automatically. Based on panoramic stitching techniques, we acquire a panorama of the fence along the railway and then extract gray-level statistical features, such as the mean and variance, to construct a two-dimensional statistical histogram of the panoramic image. On the basis of these data, we propose a segmentation method that uses the maximum entropy of the two-dimensional gray mean-variance histogram to achieve rapid fence defect detection from the panorama. Experimental results verify the validity and accuracy of the proposed approach, which achieves a precision of 87.5% and a recall of 92.1%.
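The segmentation step in the abstract above relies on maximum-entropy thresholding. As a simplified illustration, the following sketch applies the maximum-entropy criterion to a one-dimensional gray-level histogram; the paper works on a two-dimensional mean-variance histogram, which this example does not reproduce. The function names are assumptions.

```python
import math

def _entropy(counts):
    """Shannon entropy (natural log) of the distribution given by bin counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def max_entropy_threshold(hist):
    """Return the bin index t that maximizes the summed class entropies.

    Bins 0..t form one class and bins t+1.. form the other. Thresholds that
    leave either class empty are skipped. Returns None if no split is valid.
    """
    best_t, best_score = None, float("-inf")
    for t in range(len(hist) - 1):
        low, high = hist[: t + 1], hist[t + 1:]
        if sum(low) == 0 or sum(high) == 0:
            continue
        score = _entropy(low) + _entropy(high)
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

In the two-dimensional case the same criterion is evaluated over pairs of thresholds, one per feature axis, so the search space grows accordingly.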
2017, 32(2):346-353. DOI: 10.16337/j.1004-9037.2017.02.016
Abstract:To make effective use of the complementarity of different keyword spotting systems, and to solve the problem that the confidence scores from different subsystems are not in the same range, a keyword spotting system based on score normalization and system combination is proposed. Firstly, to avoid missing keywords because of pruning errors in a large-vocabulary recognition system, a keyword-aware soft beam pruning method is presented. Secondly, score normalization is applied to transform the confidence scores into a common domain before they are combined. Finally, the normalized outputs of the different subsystems are combined. Results show that score normalization improves keyword search performance by 30% on average. Experiments also show that combining the outputs of diverse systems performs 10% better than the best normalized KWS system.
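The two-step idea in the abstract above, normalizing scores to a common range and then combining systems, can be sketched as follows. The abstract does not state which normalization or combination rule the authors use, so min-max normalization and a weighted sum are assumptions here, as are the function names and the dictionary-based hit representation.

```python
def min_max_normalize(scores):
    """Map raw scores linearly onto [0, 1]; constant inputs map to 0.5."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def combine_systems(system_hits, weights=None):
    """Combine keyword hits from several systems.

    system_hits : list of dicts mapping a hit identifier (hypothetical, e.g.
                  keyword plus time slot) to that system's raw confidence
    weights     : optional per-system weights; defaults to equal weights
    A hit missing from a system contributes zero from that system.
    """
    if weights is None:
        weights = [1.0 / len(system_hits)] * len(system_hits)
    combined = {}
    for hits, weight in zip(system_hits, weights):
        ids = list(hits)
        normalized = min_max_normalize([hits[i] for i in ids])
        for hit_id, score in zip(ids, normalized):
            combined[hit_id] = combined.get(hit_id, 0.0) + weight * score
    return combined
```

Without the normalization step, a system whose raw scores span a wider numeric range would dominate the sum regardless of how reliable it is.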
Feng Zhe , Yang Xubing , Zhang Fuquan
2017, 32(2):354-362. DOI: 10.16337/j.1004-9037.2017.02.017
Abstract:In the Plane-Gaussian (PG) artificial neural network, the network parameters are generated by the k-plane clustering algorithm in the training phase. Compared with the random parameters of the extreme learning machine (ELM), PG training is time-consuming and easily trapped in local optima. To improve the performance of the PG network, and inspired by ELM, this paper proposes a new training method for the PG network based on random projection, termed RandPG. For a typical three-layer network, the weight matrix between the input and hidden layers is selected by random projection to speed up training, and the weight matrix between the hidden and output layers is obtained by the Moore-Penrose generalized inverse. It is proved theoretically that the network has the global approximation property. The effectiveness of the network is tested on line-distributed datasets, plane-distributed datasets, and several UCI datasets. The results indicate that RandPG provides a simple and convenient way to train the parameters of a neural network; it not only inherits the advantage of the PG network, which is better suited to classifying subspace-distributed datasets, but also significantly accelerates learning.
Li Na , Pan Zhisong , Shi Lei , Xue Jiao , Ren Yiqiang
2017, 32(2):363-374. DOI: 10.16337/j.1004-9037.2017.02.018
Abstract:Real-world objects can carry multiple meanings and therefore multiple labels. In multi-label learning, although current work may use the reuse score to analyze the relationships between labels, it can find neither the label structure nor the main labels and their importance rankings. The nonnegative matrix factorization (NMF) method can effectively divide associated nodes into communities and explore the potential relationships between them. Consequently, it is worth studying how to use NMF in multi-label community detection. Here, an algorithm for multi-label community detection is proposed that analyzes labels effectively, discovers the community structure within them, and then obtains the relation communities. In addition, the multi-label nodes can be sorted by their importance scores to obtain the master-slave structure of the labeled nodes. The effectiveness of the algorithm is verified, and it helps reveal the hidden information.
Jiang Rui , Gong Qingyong , Zhu Daiyin , Zhu Zhaoda
2017, 32(2):375-381. DOI: 10.16337/j.1004-9037.2017.02.019
Abstract:By exploiting the relationships between different baselines, multi-baseline interferometric synthetic aperture radar (InSAR) can measure elevation significantly more accurately than traditional single-baseline InSAR. A novel filtering method for multi-baseline InSAR interferograms based on signal subspace processing is presented here. In this method, each pixel unit of the interferograms from the different baselines is regarded as a training sample, and a filter based on subspace tracking then effectively filters out the noise in all the interferograms. Monte Carlo tests and simulated InSAR data show that the new approach achieves a better filtering effect than the pivoting average filter and the pivoting median filter, with a similar execution time.
Tang Mengmeng , Ji Genlin , Zhao Bin
2017, 32(2):382-389. DOI: 10.16337/j.1004-9037.2017.02.020
Abstract:Trajectory outlier detection is important in the field of trajectory data mining. TOP-EYE (Top-k evolving trajectory outlier detection) is an efficient algorithm for detecting abnormal trajectories. Unlike other algorithms, TOP-EYE uses evolutionary computation to detect anomalies from the viewpoints of direction and density. To improve the efficiency of mining trajectory outliers from massive trajectory datasets, a parallel algorithm for detecting trajectory outliers based on evolutionary computation, called PDAT-TOP (Parallel detecting abnormal trajectory based on TOP-EYE), is proposed. The algorithm takes advantage of parallel computation to detect abnormal trajectories more efficiently and is implemented on Hadoop. Experimental results demonstrate that the algorithm detects abnormal trajectories effectively and has high scalability and good speedup.
Wang Zhizhao , Lu Yuliang , Yang Guozheng , Guan Yongyao
2017, 32(2):390-398. DOI: 10.16337/j.1004-9037.2017.02.021
Abstract:Active measurement has so far been the primary method for measuring network path bandwidth, offering greater flexibility and easier deployment than passive measurement. Existing probe models for available bandwidth measurement do not handle background traffic precisely and suffer from high error rates. We therefore define the communication protocol and the probe packet structure, and propose a method for calculating the rate of increase of the sequence delays and the outlier range. The algorithm reduces the interference of background traffic on the network path. Its effectiveness has been verified by NS2 simulation.
Kong Chao , Zhang Huaxiang , Sheng Haidi
2017, 32(2):399-407. DOI: 10.16337/j.1004-9037.2017.02.022
Abstract:In view of the application requirements of visual dictionaries in image representation and retrieval, this paper proposes an image retrieval method that combines multiple visual dictionaries with saliency weights and can represent image features with both saliency and sparsity. Firstly, the image is divided into blocks, and different kinds of low-level features are extracted from the image blocks. Secondly, the block features are used to learn the multiple visual dictionaries through non-negative sparse coding. Spatial information and saliency are introduced into the sparse vectors of the image blocks by a saliency pooling method, and saliency weights are used to form the sparse representation of the entire image. Finally, a proposed SDD distance is used for image retrieval. Experimental results on the common image datasets Corel and Caltech demonstrate that, compared with the single visual dictionary method, the proposed method effectively improves image retrieval accuracy.
Li Tiancai , Wang Bo , Xi Yaoyi , Zhang Jiaming
2017, 32(2):408-416. DOI: 10.16337/j.1004-9037.2017.02.023
Abstract:Text segmentation has important applications in many fields, including text summarization and information retrieval. The topic model is an important tool in text segmentation. However, previous text segmentation methods based on topic models generally rely on a manually set number of topics, which significantly influences the results. To solve this problem, a novel text segmentation method based on the hierarchical Dirichlet process (HDP) model is proposed. Firstly, texts are modeled with the HDP model to obtain their representation as topic vectors. Then the topic vectors are used in the C99 segmentation algorithm to segment the text. Finally, two optimization strategies are applied to refine the result. Experimental results show that the presented method removes the need to set the number of topics manually and improves text segmentation performance.
2017, 32(2):417-423. DOI: 10.16337/j.1004-9037.2017.02.024
Abstract:Building detection in disaster areas is pivotal for collecting disaster information and carrying out post-disaster rescue. To detect buildings in disaster areas from high-resolution remote sensing images, an improved multi-directional and multi-scale segmentation algorithm based on morphological features is proposed for automated building detection. Firstly, we integrate the properties of morphological operators (e.g., reconstruction, granularity, and directionality) with the implicit characteristics of buildings (e.g., brightness, size, and contrast) to extract bright, high-contrast buildings. Then regional image edge information is combined to extract potential buildings. Experimental results show that the proposed method achieves a higher detection rate and a low false detection rate for buildings in disaster areas.
Chen Chuixiong , Yan Yunyang , Liu Yi′an , Gao Shangbing , Zhou Jingbo
2017, 32(2):424-430. DOI: 10.16337/j.1004-9037.2017.02.025
Abstract:In flame detection, the extraction of motion regions and the analysis of flicker are usually carried out separately. A novel method is proposed here in which flame flicker is detected while the motion regions are extracted. Firstly, candidate fire regions are detected in the Ohta color space with a flame color model. Then motion regions with the flicker frequency feature are extracted according to the degree and number of changes at a given position over a period of time. Finally, whether a connected region is in flames is determined from the intersection between the flame color region and the motion region. Experimental results show that the proposed method can ignore regions that lack the flame flicker feature once the motion regions have been extracted. It also performs well, with a high flame detection rate and a lower false detection rate, even when the motion region is incomplete.
Mailing Address: 29 Yudao Street, Nanjing, China
Post Code: 210016 Fax: 025-84892742
Phone: 025-84892742 E-mail: sjcj@nuaa.edu.cn