• Volume 36, Issue 5, 2021 Table of Contents
    • Attack Methods in Speaker Verification System: The State of the Art and Prospects

      2021, 36(5):831-849. DOI: 10.16337/j.1004-9037.2021.05.001

      Abstract (1628) HTML (1753) PDF 1.55 M (2071) Comment (0) Favorites

      Abstract: Automatic speaker verification (ASV) technology is profoundly changing human-computer interaction. As the core speech function of some smart devices, ASV accepts the voice of a target speaker and accurately identifies the speaker's identity. In recent years, the rapid development of artificial intelligence has driven a leap forward in ASV systems. However, with advances in artificial neural networks and deep learning, more and more researchers have begun to study ways of attacking ASV systems, and attacking them by processing raw speech has become a hot topic in speech research. At present, attack methods against ASV systems can be roughly divided into spoofing attacks and adversarial attacks. This paper summarizes the typical methods and basic principles of both kinds of attack, sorts out problems with current attack methods, reveals the security problems of ASV systems, gives a brief outlook on the future of ASV system security, and suggests directions for improving the security and reliability of ASV systems.

    • Review of Multicomponent Gas Sensors Based on Photoacoustic Spectroscopy Technology

      2021, 36(5):850-871. DOI: 10.16337/j.1004-9037.2021.05.002

      Abstract (2264) HTML (2203) PDF 5.05 M (2904) Comment (0) Favorites

      Abstract: Multicomponent trace gas detection has important research and application value in industrial, military, agricultural, medical and other fields. Photoacoustic spectroscopy is favored by researchers because of its high sensitivity, fast response, high selectivity, and non-contact, real-time, continuous measurement. First, the basic principle of photoacoustic spectroscopy and the demand for multicomponent gas monitoring are described. Then, the latest research progress in multicomponent gas measurement technology is introduced from the perspective of light source classification. Commonly used photoacoustic spectroscopy techniques, including multiplexing and Fourier transform infrared interferometry, are also reviewed, and their application scope, advantages and disadvantages are compared and analyzed. In addition, spectral interference, the adsorption-desorption effect, and the corresponding solutions for gas sensing systems are introduced in view of practical application environments. Finally, the future development of multicomponent photoacoustic spectroscopic detection methods is summarized and discussed.

    • Review on Sound Field Reproduction Based Three-Dimensional Audio Technology

      2021, 36(5):872-883. DOI: 10.16337/j.1004-9037.2021.05.003

      Abstract (1874) HTML (1729) PDF 791.51 K (2122) Comment (0) Favorites

      Abstract: In recent years, spatial audio technology has been widely used and developed in a variety of fields. Many studies have focused on the flexibility of creating an immersive spatial auditory experience. This review analyzes the advantages and disadvantages of state-of-the-art approaches, covering the dimension and representation of the sound field, sound field reproduction, and the corresponding room equalization technology. Finally, current problems are discussed and promising directions for this research topic are suggested.

    • Improved GSC Speech Enhancement Algorithm Based on Kalman Filtering

      2021, 36(5):884-890. DOI: 10.16337/j.1004-9037.2021.05.004

      Abstract (1059) HTML (1045) PDF 1.37 M (1749) Comment (0) Favorites

      Abstract: To address the poor performance of the generalized sidelobe canceller (GSC) in cancelling incoherent noise, an improved GSC denoising algorithm with a Kalman post-filter is proposed. The algorithm corrects the adaptive noise canceller using the normalized least mean square (NLMS) algorithm, passes the speech signal, with directional interference filtered out, to the Kalman filter, and iteratively processes the residual background noise under the minimum mean square error (MMSE) criterion to suppress incoherent noise and the thermal noise generated by the microphone array elements. Objective speech quality evaluation, perceptual evaluation of speech quality (PESQ), and spectrogram analysis under different signal-to-noise ratio conditions show that, compared with the traditional GSC and the improved GSC with spectral-subtraction post-filtering, the proposed algorithm removes noise more effectively and yields an enhanced signal closer to the target signal.
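
The NLMS step mentioned in the abstract above can be illustrated with a minimal sketch. It shows only a generic single-channel NLMS noise canceller, not the paper's GSC or Kalman post-filter; the function name `nlms_cancel` and the parameters `taps`, `mu`, and `eps` are assumptions made for illustration.

```python
def nlms_cancel(primary, reference, taps=4, mu=0.5, eps=1e-8):
    """Generic NLMS noise canceller (illustrative, not the paper's GSC).

    Removes from `primary` the component that can be predicted
    linearly from `reference`. Returns the error signal and the
    final filter weights.
    """
    w = [0.0] * taps
    out = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))   # noise estimate
        e = primary[n] - y                         # enhanced output sample
        norm = sum(xk * xk for xk in x) + eps      # input power normalization
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        out.append(e)
    return out, w
```

Normalizing the step by the input power is what distinguishes NLMS from plain LMS; it keeps the update stable when the reference level varies.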

    • High-Speed Ultrasound Endoscopic Imaging System Based on High Repetition Frequency Transmit Circuit

      2021, 36(5):891-897. DOI: 10.16337/j.1004-9037.2021.05.005

      Abstract (876) HTML (976) PDF 1.02 M (1583) Comment (0) Favorites

      Abstract: Optical coherence tomography and intravascular ultrasound (OCT-IVUS) imaging compensates for the low imaging depth of optical phase interference imaging and the low resolution of ultrasound imaging, and can comprehensively identify vulnerable plaques in blood vessels. However, because the IVUS transmit repetition frequency is limited, it is difficult for OCT-IVUS imaging to acquire a large number of A-lines at a high frame rate, which limits the display resolution. To improve the imaging speed of IVUS without reducing the display resolution, a high-repetition-frequency ultrasound excitation method is applied. An ultrasound transmit circuit with a 50 kHz repetition frequency is designed, and a high-speed ultrasound endoscopic imaging system running at 50 frames/s is developed based on this circuit. High-voltage pulse tests and signal-to-noise ratio (SNR) tests show that the transmit circuit can excite a 25 MHz transducer and obtain a signal with high SNR, and that the imaging system built on it increases the imaging speed without reducing the display resolution. The system has clinical value for making more effective use of OCT-IVUS, for the early detection, diagnosis, and prevention of cardiovascular diseases, and for the detection of vulnerable plaques.

    • Survey on Facial Expression Synthesis Algorithms

      2021, 36(5):898-920. DOI: 10.16337/j.1004-9037.2021.05.006

      Abstract (1261) HTML (2692) PDF 2.21 M (2414) Comment (0) Favorites

      Abstract: Facial expression synthesis aims to reconstruct face images with new expressions while retaining identity information. The development of deep learning provides new solutions for facial expression synthesis. This paper reviews the development of facial expression synthesis from three aspects: feature extraction, expression synthesis based on generative adversarial networks (GANs), and experimental evaluation. First, facial feature extraction, the key technology in expression synthesis, is introduced; facial features can describe facial expressions objectively and comprehensively. Second, state-of-the-art facial expression synthesis methods based on deep learning are analyzed, with a focus on GAN-based methods. Then, widely used facial expression datasets and objective evaluation methods are presented. Finally, future work is discussed in light of the existing problems of facial expression synthesis methods.

    • Lightweight Model for Bone-Conducted Speech Enhancement Based on Convolution Network and Residual Long Short-Time Memory Network

      2021, 36(5):921-931. DOI: 10.16337/j.1004-9037.2021.05.007

      Abstract (893) HTML (1017) PDF 2.56 M (1693) Comment (0) Favorites

      Abstract: Bone-conducted speech enhancement based on deep learning has recently reached a milestone. However, issues such as large models and high computational complexity still prevent its real-world application. In this paper, a lightweight deep learning model is proposed to improve the efficiency of bone-conducted speech enhancement. Because convolutional networks can extract features with few parameters, convolution structures are applied along the frequency dimension of the spectrogram. These structures extract time-frequency details of the spectrogram and explore the potential relationship between high- and low-frequency components. The features extracted by the convolutional neural network (CNN) are fed into an improved long short-term memory (LSTM) network to recover high-frequency information and reconstruct the speech signal. Experiments on a bone-conducted speech database show that the proposed model can reconstruct the time-frequency details of the high-frequency components, and that it reduces model size and computational complexity while improving enhancement performance.

    • Spoken Document Classification Based on Fusion of Acoustic Features and Deep Features

      2021, 36(5):932-938. DOI: 10.16337/j.1004-9037.2021.05.008

      Abstract (903) HTML (706) PDF 603.98 K (1375) Comment (0) Favorites

      Abstract: Traditional spoken document classification systems usually operate on text transcribed by a speech recognition system and therefore suffer from recognition errors. Although fusing speech with the recognized text can reduce the impact of recognition errors to some extent, fusion at the level of representation vectors does not take full advantage of the complementarity between speech and text information. This paper proposes a neural network spoken document classification system based on the fusion of acoustic features and deep features. During training, a trained acoustic model is first used to generate deep features that contain semantic information for each document. The acoustic features and deep features of each spoken document are then fused frame by frame through a gating mechanism. Finally, the fused features are used for spoken document classification. The proposed system is evaluated on a broadcast news speech corpus. Experimental results show that it clearly outperforms spoken document classification systems based on the fusion of speech and text, reaching a final accuracy of 97.27%.

    • Low-Complexity Echo Cancellation Algorithm Based on L0-IPNLMS for Hearing Aids

      2021, 36(5):939-949. DOI: 10.16337/j.1004-9037.2021.05.009

      Abstract (894) HTML (826) PDF 1.91 M (1423) Comment (0) Favorites

      Abstract: An L0-norm constrained improved proportionate normalized least-mean-square (L0-IPNLMS) algorithm based on set-membership filtering (SMF) theory, denoted SM-L0-IPNLMS, is proposed to reduce the computational complexity of echo cancellation in digital hearing aids. The variable step size of set-membership (SM) theory is introduced into the L0-IPNLMS algorithm to achieve faster convergence. Moreover, by updating the filter coefficients selectively under a bounded error margin, unnecessary iterations are avoided and the power consumption of the digital hearing aid is decreased. Experiments demonstrate that, compared with the L0-IPNLMS algorithm, the new algorithm reduces computation by 15.3%. With random-signal and real-speech inputs, respectively, the convergence speed is improved by 28% and 32.8%, the misalignment is reduced by 1 dB and 3 dB, the mean square error is reduced by 0.66 dB and 1.68 dB, and the echo return loss enhancement is improved by 0.7 dB and 1.79 dB. Furthermore, the SM-L0-IPNLMS algorithm is highly robust under low-SNR input conditions.
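
The selective update under a bounded error margin can be sketched with the basic set-membership NLMS rule. This is a simplified illustration only: it omits the proportionate gains and the L0-norm penalty of the proposed SM-L0-IPNLMS, and the names `sm_nlms`, `gamma`, and `taps` are assumptions.

```python
def sm_nlms(desired, inputs, taps=4, gamma=0.05, eps=1e-8):
    """Basic set-membership NLMS (illustrative simplification).

    The filter is updated only when the error magnitude exceeds the
    bound `gamma`; otherwise the coefficients are left unchanged,
    which is where the computational saving comes from.
    """
    w = [0.0] * taps
    updates = 0
    for n in range(len(desired)):
        x = [inputs[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        e = desired[n] - sum(wk * xk for wk, xk in zip(w, x))
        if abs(e) > gamma:
            # Variable step size: moves the error back onto the bound.
            mu = 1.0 - gamma / abs(e)
            norm = sum(xk * xk for xk in x) + eps
            w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
            updates += 1
    return w, updates
```

The returned `updates` count, compared with the number of samples, gives a rough sense of how many iterations the error bound skips.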

    • Heart Sound Segmentation Algorithm of HSMM Based on SVM and Shannon Energy

      2021, 36(5):950-959. DOI: 10.16337/j.1004-9037.2021.05.010

      Abstract (846) HTML (1162) PDF 1.49 M (1590) Comment (0) Favorites

      Abstract: To address the burrs in the heart sound envelope produced by the Hilbert transform in the logistic-regression-based hidden semi-Markov model (HSMM), an HSMM combining a support vector machine (SVM) and Shannon energy is proposed. First, wavelet denoising is applied to the heart sound, the heart sound is labeled according to the R peak and T wave, and the Shannon energy envelope and other features are extracted. Then, the HSMM parameters are trained based on the logistic regression (LR) model, and the most likely state sequence is inferred with the Viterbi algorithm. Finally, the first heart sound (S1) and the second heart sound (S2) are identified by the SVM model. The algorithm requires no hard threshold, suppresses noise effectively, and facilitates envelope extraction. Experimental results show that the proposed algorithm segments significantly more accurately than the reference algorithm and has good noise robustness.
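
A Shannon energy envelope of the kind referred to above can be sketched as follows. The abstract does not give the paper's framing or normalization, so the non-overlapping frames, the peak normalization, and the name `shannon_energy_envelope` are assumptions.

```python
import math

def shannon_energy_envelope(signal, frame=4):
    """Average Shannon energy, -x^2 * log(x^2), per non-overlapping frame.

    The signal is first normalized to [-1, 1]. Shannon energy weights
    medium-intensity samples more than very small or very large ones.
    """
    peak = max(abs(s) for s in signal) or 1.0
    x = [s / peak for s in signal]
    env = []
    for start in range(0, len(x) - frame + 1, frame):
        seg = x[start:start + frame]
        # The term is defined as 0 when the sample is exactly 0.
        e = [-(v * v) * math.log(v * v) if v != 0 else 0.0 for v in seg]
        env.append(sum(e) / frame)
    return env
```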

    • Environmental Sound Classification Method Based on Multilevel Residual Network

      2021, 36(5):960-968. DOI: 10.16337/j.1004-9037.2021.05.011

      Abstract (991) HTML (666) PDF 1.37 M (1463) Comment (0) Favorites

      Abstract: To better identify and classify environmental sounds, a multilevel residual network (Mul-EnvResNet) is proposed for environmental sound classification. After time stretching and pitch shifting are applied to the sound events, Mel-frequency cepstral coefficients (MFCCs) and their deltas are extracted as features and fed into Mul-EnvResNet for classification. On the ESC-50 dataset, Mul-EnvResNet is compared with the end-to-end convolutional neural network (EnvNet), the attention-based convolutional recurrent neural network (ACRNN), and unsupervised filterbank learning using a convolutional restricted Boltzmann machine (ConvRBM). Experimental results show that Mul-EnvResNet achieves the best classification accuracy, 89.32%, which is 18.32%, 3.22% and 2.82% higher than the three models above, respectively. It also has clear advantages over other sound classification methods.

    • Application Research of Underdetermined Mixed Matrix Estimation Based on Improved DBSCAN Algorithm

      2021, 36(5):969-977. DOI: 10.16337/j.1004-9037.2021.05.012

      Abstract (734) HTML (682) PDF 1.52 M (1441) Comment (0) Favorites

      Abstract: In underdetermined blind source separation (UBSS), the density-based spatial clustering of applications with noise (DBSCAN) algorithm easily falls into a local optimum when estimating cluster centers. This reduces the accuracy of the mixing matrix formed from the cluster center coordinates and leads to unsatisfactory signal separation. This paper proposes a cuckoo adaptive search swarm optimization based on DBSCAN (CASSO-DBSCAN) algorithm. The algorithm enhances global adaptive search through the Levy flight strategy and refines the optimization by learning from the group to obtain the optimal solution, so that cluster centers are estimated more accurately. The algorithm is verified by simulating blind source separation of speech signals. Results show that it effectively improves the estimation accuracy of the underdetermined mixing matrix and is robust, which demonstrates its feasibility.

    • Multi-AUV Cooperative Localization Method Based on Factor Graph Combined with Chi-Square Detection

      2021, 36(5):978-985. DOI: 10.16337/j.1004-9037.2021.05.013

      Abstract (744) HTML (558) PDF 1.52 M (1572) Comment (0) Favorites

      Abstract: To handle outliers in underwater acoustic communication noise caused by the complex underwater environment, a multi-AUV cooperative localization algorithm based on a factor graph and chi-square detection is proposed. A factor graph model is developed to transform the global function estimation problem into the estimation of a product of local functions, and chi-square detection is used to detect ranging noise outliers. In the presence of ranging noise outliers, the proposed algorithm significantly reduces the localization error compared with the conventional Kalman filtering algorithm. Numerical simulations show that the proposed algorithm effectively improves the positioning stability of the system and mitigates the effect of ranging noise outliers on positioning performance.
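
Chi-square detection of a ranging outlier can be sketched for a single scalar range measurement. The abstract does not specify how the test is embedded in the factor graph, so this shows only the generic residual test; the function name, its parameters, and the 95% significance level are assumptions.

```python
# 95% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_95_1DOF = 3.841

def is_range_outlier(measured, predicted, innovation_var,
                     threshold=CHI2_95_1DOF):
    """Flag a scalar range measurement as an outlier.

    The squared residual, normalized by its variance, follows a
    chi-square distribution with 1 degree of freedom when the
    measurement noise is Gaussian and no outlier is present.
    """
    residual = measured - predicted
    return (residual * residual) / innovation_var > threshold
```

A measurement flagged this way would typically be discarded or down-weighted before the position update.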

    • Data Collection and Feature Analysis of Server Energy Consumption in Data Center

      2021, 36(5):986-995. DOI: 10.16337/j.1004-9037.2021.05.014

      Abstract (1234) HTML (1179) PDF 1.33 M (1727) Comment (0) Favorites

      Abstract: The high energy consumption and low energy efficiency of data centers have received extensive attention from researchers. However, there is no public dataset of server energy consumption for researchers to use, and current filter-based feature selection cannot satisfy the requirements of engineers. Here, a simulation environment architecture is proposed to simulate the running state of servers in a data center. Based on this architecture, performance parameters and energy consumption data are collected while the server runs various tasks. Causal feature selection is applied to the feature analysis of the energy consumption datasets, yielding an interpretable feature subset and energy consumption forecasts. Experimental results show that the causal feature subset is about 1/6 to 1/3 the size of the filter feature subset, and that the model trained on the causal feature subset achieves the best prediction accuracy in 75% of cases.

    • Improved Flower Pollination Algorithm Based Virtual Machine Allocation Approach in Cloud Data Centers

      2021, 36(5):996-1006. DOI: 10.16337/j.1004-9037.2021.05.015

      Abstract (763) HTML (684) PDF 1.22 M (1451) Comment (0) Favorites

      Abstract: Minimizing resource wastage and increasing resource utilization efficiency are two important aims for a cloud data center, so an efficient virtual machine allocation strategy is necessary. A flower pollination algorithm based virtual machine allocation (FPA-VMA) approach is proposed. In FPA-VMA, each plant has only one flower, and each flower produces only one pollen gamete. The flower and the pollen gamete correspond to the virtual machine and the physical machine in the cloud data center. The cloud client resource request model and the multi-dimensional resource energy consumption model are also analyzed and described. FPA-VMA uses a strategy called dynamic switching probability (DSP), which finds a near-optimal solution quickly and balances the exploration of global search against the exploitation of local search, thus improving the global convergence of FPA-VMA. Experimental results on real virtual machine workloads show that FPA-VMA outperforms previous VMA strategies in resource wastage and energy consumption.

    • INTER-VMM: An Interrelation Approach in Virtual Machine Selection and Placement for Virtual Machine Migration

      2021, 36(5):1007-1019. DOI: 10.16337/j.1004-9037.2021.05.016

      Abstract (718) HTML (588) PDF 1.06 M (1210) Comment (0) Favorites

      Abstract: Low energy consumption and full utilization of physical resources are two primary objectives in building green cloud data centers, so a virtual machine migration model is required to achieve this optimization. This paper proposes an interrelation approach to virtual machine migration (INTER-VMM), which interrelates virtual machine selection and placement. INTER-VMM designs an energy consumption model based on multi-dimensional physical resources for cloud data centers. It is a virtual machine migration strategy that combines host detection, virtual machine selection, and virtual machine placement. For virtual machine selection, the CPU utilization selection (HPS) policy is adopted, which selects the virtual machine with the highest CPU utilization on the overloaded physical host and adds it to the list of candidate virtual machines for migration. For virtual machine placement, the space aware placement (SAP) policy is adopted, which aims to make full use of the spare time of the physical host. Simulation results show that INTER-VMM achieves better performance indices than common virtual machine migration strategies of recent years, which is valuable for cloud service providers.

    • Key Management Method in Mobile Scenarios for Heterogeneous Wireless Sensor Networks

      2021, 36(5):1020-1029. DOI: 10.16337/j.1004-9037.2021.05.017

      Abstract (683) HTML (489) PDF 1.00 M (1360) Comment (0) Favorites

      Abstract: Mobile wireless sensor networks (MWSNs) support node mobility and therefore face more complex security challenges. It is difficult to prevent attackers from launching highly destructive attacks such as node replication attacks and Sybil attacks. This paper proposes a secure and efficient key management method for the mobile heterogeneous wireless sensor network model. The method uses elliptic curve cryptography to upload mobile node location information securely to the base station, and uses a keyed-hash message authentication code to authenticate the message source. The base station statistically analyzes the collected location information of the mobile nodes to assist identity authentication and session key establishment between fixed nodes and mobile nodes. Experimental results show that the proposed method saves network resources during key establishment and effectively defends against replay attacks, node replication attacks, and Sybil attacks, thus enhancing network security.
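
Message-source authentication with a keyed-hash message authentication code (HMAC) can be sketched with Python's standard library. The choice of SHA-256, the function names, and the message layout are assumptions; the abstract does not state which hash or message format the paper uses.

```python
import hashlib
import hmac

def tag_message(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_message(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time to avoid timing side channels."""
    return hmac.compare_digest(tag_message(key, message), tag)
```

Only a party holding the shared key can produce a tag that verifies, so a receiver that checks the tag learns that the message came from a key holder and was not modified in transit.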

    • Non-stationary Financial Time Series Prediction Based on Self-adaptive Incremental Ensemble Learning

      2021, 36(5):1030-1040. DOI: 10.16337/j.1004-9037.2021.05.018

      Abstract (981) HTML (484) PDF 1.14 M (1612) Comment (0) Favorites

      Abstract: The financial market is very important to socioeconomic development, so financial time series prediction (FTSP) has long been a research focus. Many methods based on statistical analysis and soft computing have been proposed for FTSP, most of which treat financial time series (FTS) as stationary or convert them into stationary time series. However, since most FTSs are non-stationary, these methods often suffer from spurious regression or poor prediction performance. Therefore, this paper proposes a novel self-adaptive incremental ensemble learning (SIEL) method for non-stationary FTSP (NS-FTSP). The main idea of the SIEL algorithm is to incrementally train a base model for each non-stationary financial time series (NS-FTS) subset and then combine the base models using an adaptive weighting rule. The focus of the SIEL algorithm is the update of the data weights and the base model weights. The data weights are updated based on the performance of the current ensemble model on the latest dataset; their purpose is not to sample the data but to weight the error. The weight of each base model is adaptively updated based on its environment, with its performance in newer environments given a higher weight. In addition, in view of the characteristics of NS-FTS, the SIEL algorithm includes a strategy to coordinate new and old knowledge and to cope with recurring environments. Finally, experimental results of the SIEL algorithm on three NS-FTS datasets are reported and compared with existing algorithms. They show that the SIEL algorithm solves the NS-FTSP problem well.

    • Insulator Mask Acquisition and Defect Detection Based on Improved U-Net and YOLOv5

      2021, 36(5):1041-1049. DOI: 10.16337/j.1004-9037.2021.05.019

      Abstract (1012) HTML (1833) PDF 1.68 M (2033) Comment (0) Favorites

      Abstract: Regular inspection of transmission line insulators is indispensable, but traditional manual inspection suffers from low efficiency and high work intensity. This paper therefore designs an improved U-Net model to segment insulators and uses an improved YOLOv5 to locate blasting insulators in complex backgrounds. Based on the U-Net semantic segmentation model, an improved network structure, SERes-Unet, is proposed. The model introduces a residual structure to reduce the influence of vanishing gradients and the loss of structural information during convolution, and introduces an attention mechanism to correct feature weights, thereby improving network performance. To detect blasting insulators in high-resolution images, the images are cut into tiles, detection is run on each tile, and the results are filtered by non-maximum suppression (NMS) to obtain the positions of all blasting insulators in the image. Multiple sets of controlled experiments are designed to verify the effectiveness and efficiency of the model. The method achieves an insulator segmentation accuracy of 0.96, a blasting insulator detection accuracy of 0.97, and a recall of 0.99.
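
The NMS filtering step can be sketched in plain Python. This is generic greedy NMS, not the paper's implementation; the `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, and the function names are assumptions, and boxes detected on tiles are assumed to have already been mapped back to full-image coordinates.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep  # indices of retained boxes, highest score first
```

For example, two heavily overlapping detections of the same insulator from adjacent tiles collapse to the higher-scoring one, while a distant detection is kept.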

    • Capture Methods of Gambling Related Illegal Websites in Massive Websites

      2021, 36(5):1050-1061. DOI: 10.16337/j.1004-9037.2021.05.020

      Abstract (708) HTML (2631) PDF 1.86 M (1963) Comment (0) Favorites

      Abstract: To detect illegal gambling websites among massive numbers of websites, this paper proposes a classification method based on BERT-BiLSTM and multi-classifier decision-level fusion. The method improves classification performance in three steps. First, it extracts high-priority textual information, i.e., meta information in the HTML head and hyperlink titles on a web page, to enrich the textual features. Second, a novel text classification model based on BERT-BiLSTM is designed, which is shown to learn better sentence feature representations and to boost performance. Finally, decision-level fusion is performed on the classification results from multiple dimensions (i.e., website title, keywords, and page text) to further improve the performance and robustness of the whole system. Moreover, a variety of strategies for generating suspicious domain names are used to improve the ability to actively detect illegal websites. Experimental results and results from deployment in real cyberspace demonstrate the effectiveness of the proposed method.

    • An Acoustic Wave Equation Emotion Recognition Model Based on Image Saliency

      2021, 36(5):1062-1072. DOI: 10.16337/j.1004-9037.2021.05.021

      Abstract (772) HTML (976) PDF 1.32 M (1465) Comment (0) Favorites

      Abstract: Speech emotion recognition (SER) is key to enabling computers to understand human emotion and is important in human-computer interaction. When an emotional speech signal is transmitted through different media, traditional deep learning models do not recognize it accurately enough and transfer poorly. Here, an acoustic wave equation emotion recognition model, the image saliency gated recurrent acoustic wave equation emotion recognition (ISGR-AWEER) model, is designed. The model consists of image saliency extraction and a gated recurrent model. The first part simulates the attention mechanism and is used to extract the salient regions in speech. The second part, the acoustic wave equation emotion recognition model, simulates the recurrent neural network; it effectively improves the accuracy of cross-media SER and allows rapid cross-media model transfer. The effectiveness of the model is verified by experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a self-built multi-media emotional speech corpus. Compared with the recurrent neural network, the accuracy of emotion recognition is improved by 25%, and the model has a strong cross-media transfer ability.
