2022, 37(2):247-278. DOI: 10.16337/j.1004-9037.2022.02.001
Abstract: Deep learning has recently achieved major breakthroughs in several areas of computer vision. Many new deep learning methods and deep neural network models have been proposed, and their performance records are continually being surpassed. This paper surveys the progress in applications of deep learning to computer vision since 2016, with emphasis on typical networks and models. We first review the mainstream deep neural network models for image classification, including standard and lightweight models. We then introduce the main methods and models for several computer vision tasks, including object detection, image segmentation and image super-resolution. Finally, we summarize neural architecture search methods.
MA Lun, WANG Ruiping, ZHAO Bin, LIU Xin, LIAO Guisheng, ZHANG Yajing
2022, 37(2):279-287. DOI: 10.16337/j.1004-9037.2022.02.002
Abstract: The impaired behaviors of people with special needs place heavy psychological pressure and economic burden on individuals, families and society as a whole. This paper explores the possibility of sensing the impaired behaviors of people with special needs by combining advanced AI techniques with a wearable device embedded with 9-axis motion sensors, in order to prevent accidents and reduce nursing costs. Firstly, the self-collected data are analyzed and preprocessed to extract 108-dimensional features. Secondly, in the feature selection stage, the features are divided into three feature subsets using two methods, prior analysis and random forest. The purpose is to reduce the time cost while maintaining the recognition accuracy. Finally, two validation methods and six classifiers are used for evaluation. Experimental results show that multi-sensor data fusion greatly improves the recognition rate of the classifiers, and that feature selection preserves the recognition rate with only a small performance loss. Feature subset 3 is the most suitable for representing the impaired behaviors of people with special needs. The light gradient boosting machine (LightGBM) has a clear performance advantage: its average recognition rate under 10-fold cross-validation reaches 93%, which makes it the most feasible and practical choice when both computation cost and classification accuracy are considered.
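The 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `k_fold_indices` and `cross_validate` and the `fit`/`predict` callables are hypothetical, and the folds here are contiguous rather than shuffled or stratified.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(samples, labels, fit, predict, k=10):
    """Return the mean accuracy over k folds.

    fit(train_samples, train_labels) -> model   (hypothetical callable)
    predict(model, sample) -> label             (hypothetical callable)
    """
    accuracies = []
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

In practice the sample order would normally be shuffled (or stratified by class) before the folds are formed; that step is omitted here to keep the sketch deterministic.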
ZHAO Jianchuan, YANG Haoquan, XU Yong, WU Lian, CUI Zhongwei
2022, 37(2):288-297. DOI: 10.16337/j.1004-9037.2022.02.003
Abstract: The key to language identification is extracting useful features from speech fragments. The time-delay neural network (TDNN) can extract feature vectors that contain rich context and effectively improve system performance. This paper proposes a multi-task learning method for language identification based on an ECAPA (emphasized channel attention, propagation and aggregation)-TDNN + contrastive predictive coding (CPC) network. ECAPA-TDNN is the main network and extracts the global features of the language. The improved CPC model is the auxiliary network, which performs contrastive prediction on the frame-level features extracted by ECAPA-TDNN. Finally, a joint loss function is used to optimize the network. The proposed method is tested on the 10 languages of the AP17-OLR data set. The results show that the identification accuracy of the proposed network is higher than that of the baseline on the 1 s, 3 s and All test sets of AP17-OLR.
SONG Wei, XIE Jianping, GAO Qian, XIE Liangxu, XU Xiaojun
2022, 37(2):298-307. DOI: 10.16337/j.1004-9037.2022.02.004
Abstract: The high performance of artificial intelligence (AI) usually depends on large amounts of data for training parameters. How to improve predictive performance when data are insufficient, i.e., few-shot learning, is an important research subject in the AI field. An image interpolation-based few-shot learning strategy is proposed, and its feasibility is verified on the task of handwritten digit image recognition. The few-shot learning performance of a dense neural network and a convolutional neural network on MNIST and USPS handwritten digit recognition is systematically studied. The results show that the image interpolation-based data augmentation method clearly improves the feature extraction ability and learning efficiency of neural networks on small-sample data. Moreover, selecting an appropriate scaling coefficient for image interpolation can further improve the few-shot learning performance of the neural network.
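As an illustration of interpolation-based augmentation with a scaling coefficient, the sketch below rescales a grayscale image with bilinear interpolation. The abstract does not state which interpolation kernel is used or how the rescaled images are fed to the network, so the bilinear kernel, the corner-aligned coordinate mapping and the names `resize_bilinear` and `augment` are all assumptions made for the example.

```python
def resize_bilinear(img, scale):
    """Rescale a 2-D grayscale image (list of rows) by `scale`.

    Uses bilinear interpolation with corner-aligned coordinates, so the four
    corner pixels of the input map exactly onto the corners of the output.
    """
    h, w = len(img), len(img[0])
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    out = []
    for r in range(new_h):
        # Source row coordinate for output row r.
        y = r * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(new_w):
            # Source column coordinate for output column c.
            x = c * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def augment(img, scales):
    """Return one rescaled copy of `img` per scaling coefficient."""
    return [resize_bilinear(img, s) for s in scales]
```

A call such as `augment(img, [0.5, 1, 2])` yields three versions of the same digit at different resolutions; before training they would still need to be cropped or padded to the network's input size, which is left out here.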
Tian Jialue, Zhu Yulian, Chen Feiyue, Liu Jiahui
2022, 37(2):308-320. DOI: 10.16337/j.1004-9037.2022.02.005
Abstract: Whitening is a preprocessing method that removes the correlation between the variables of the data. Two-dimensional whitening reconstruction (TWR) is a recent whitening method for a single image. In this paper, we demonstrate the equivalence between TWR and column-based ZCA whitening; that is, TWR removes the correlation within image columns. However, the correlation within a local block of an image is often much greater than that within a column. To remove the correlation within local blocks, this paper proposes two improved TWR methods: reshaped-based TWR (RTWR) and patch-based TWR (PTWR). RTWR first reshapes an image into a new matrix in which each column vector corresponds to a sub-block of the original image, and then performs TWR on the reshaped matrix. In the PTWR method, TWR is applied directly to each sub-block of the image. Experimental results on the ORL, CMU PIE and AR face datasets show that RTWR and PTWR improve subsequent classification performance more than TWR does.
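The reshaping step described for RTWR can be sketched as below. Only the rearrangement of sub-blocks into columns is shown; the whitening itself is not implemented. The function name `blocks_to_columns`, the non-overlapping square blocks and the row-major ordering inside each block are assumptions, since the abstract does not specify them.

```python
def blocks_to_columns(img, block):
    """Rearrange non-overlapping block x block sub-blocks into columns.

    Returns a matrix with block*block rows and one column per sub-block,
    so each column holds the pixels of one sub-block in row-major order.
    """
    h, w = len(img), len(img[0])
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    flattened = []
    for top in range(0, h, block):
        for left in range(0, w, block):
            flattened.append([img[r][c]
                              for r in range(top, top + block)
                              for c in range(left, left + block)])
    # Transpose: each flattened sub-block becomes one column.
    return [list(row) for row in zip(*flattened)]
```

For a 4x4 image and `block=2`, the result is a 4x4 matrix whose first column contains the four pixels of the top-left sub-block. A column-wise whitening method applied to this matrix then acts on within-block correlation rather than within-column correlation.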
Li Hongyu, Shen Feng, Han Lu, Zhu Qiuming, Ding Guoru, Du Xiaofu
2022, 37(2):321-335. DOI: 10.16337/j.1004-9037.2022.02.006
Abstract: Spectrum data are often characterized by multiple dimensions, such as frequency, time, space and signal strength, which poses challenges for data acquisition and visualization. The electromagnetic spectrum situation is introduced to characterize the distribution of signal power spectral density in electromagnetic space and thereby realize spectrum situation awareness in the target region. At present, spectrum data are usually acquired by deploying a large number of discretely distributed sensors in the target area, which leads to low sampling efficiency and high sampling cost and is impractical when resources are limited. Therefore, taking both sampling time and sampling coverage into account, a method of electromagnetic spectrum situation cartography based on UAV sampling and driven jointly by model and data is proposed for the target region. Simulation results show that the proposed method can effectively complete the electromagnetic spectrum situation mapping in the target region, and that its completion accuracy and mapping quality are both better than those of the traditional interpolation algorithm and the tensor completion algorithm.
2022, 37(2):336-345. DOI: 10.16337/j.1004-9037.2022.02.007
Abstract: To address the problem of user pairing in uplink non-orthogonal multiple access (NOMA), this paper proposes a NOMA user pairing scheme based on a bilateral matching model. Unlike existing NOMA user pairing schemes, this scheme pre-groups users according to their channel gains, which avoids pairing users with an excessively large channel gain gap. At the same time, pairing users with a small channel gain difference is also avoided, which improves the overall performance of the system. Considering that users with very small channel gains cannot communicate in real scenarios, a channel gain threshold is set as the condition for deciding whether communication can be achieved. After grouping, the channel gain difference is used as the preference degree for pairing users between groups. Simulation results show that, compared with the traditional NOMA pairing scheme and the classical orthogonal multiple access (OMA) network, the proposed scheme effectively improves the ergodic sum rate of the system. The proposed scheme remains superior to the other schemes when the channel gain threshold changes.
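The threshold and pre-grouping steps can be illustrated with a deliberately simplified sketch. It is not the bilateral matching procedure of the paper: here users below the threshold are dropped, the remaining users are sorted by channel gain and split into a strong and a weak group, and the i-th strong user is paired with the i-th weak user so that every pair has a moderate gain gap. The function name `pair_users` and this fixed pairing rule are assumptions.

```python
def pair_users(gains, threshold):
    """Pair users across a strong group and a weak group.

    gains: dict mapping user id -> channel gain (linear scale).
    threshold: users with gain below this value are treated as unable
               to communicate and are excluded.
    Returns (pairs, unpaired), where pairs is a list of (strong, weak).
    """
    eligible = sorted((u for u in gains if gains[u] >= threshold),
                      key=lambda u: gains[u], reverse=True)
    half = len(eligible) // 2
    strong = eligible[:half]          # larger channel gains
    weak = eligible[half:2 * half]    # smaller channel gains
    # Pairing by rank keeps the gain gap of every pair similar, avoiding
    # both the largest gaps (best with worst) and the smallest (neighbours).
    pairs = list(zip(strong, weak))
    unpaired = eligible[2 * half:]    # at most one user when the count is odd
    return pairs, unpaired
```

With gains `{"a": 9.0, "b": 7.0, "c": 4.0, "d": 2.0, "e": 0.1}` and a threshold of 1.0, user `e` is excluded and the pairs are `("a", "c")` and `("b", "d")`. A matching-based scheme would replace the rank rule with preference lists built from the gain differences.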
FU Weicheng, WU Wei, QIU Fasheng, Li Zhe
2022, 37(2):359-370. DOI: 10.16337/j.1004-9037.2022.02.009
Abstract: Magnetic acoustic emission (MAE) is an acoustic emission signal generated during the magnetization of ferromagnetic materials, and it has a wide range of applications in component stress detection and micro-damage detection. In view of the instability, complexity and attenuation of MAE signals, a denoising method based on the seagull optimization algorithm combined with variational mode decomposition (SOA-VMD) is proposed. To overcome the tendency of the seagull algorithm to become trapped in local optima during the solving process, we use the Cauchy variation operator to generate random iterations, enabling the Cauchy variation seagull optimization algorithm (CVSOA) to escape premature convergence. The amplitude spectrum entropy is used as the fitness function, and the SOA is used to optimize the number of decomposed modes K and quadratic penalty term
YIN Dexin, ZHANG Damin, ZHANG Linna, CAI Pengchen, QIN Weina
2022, 37(2):371-382. DOI: 10.16337/j.1004-9037.2022.02.010
Abstract: To address the spectrum shortage caused by massive data exchange in the industrial internet of things, this paper applies cognitive radio technology to the industrial internet of things and proposes a spectrum allocation strategy based on an improved sparrow algorithm and power control for the cognitive industrial internet of things (CIIOT). The strategy aims to maximize fairness and energy efficiency. First, an improved binary sparrow search algorithm (IBSSA), based on an improved map-compass operator and step-size factor, is used to allocate spectrum to CIIOT users. Then, to optimize the transmit power during communication, a closed-loop power control algorithm based on the received SINR is used to adjust the users' power dynamically. Finally, the energy efficiency and fairness of the system are taken as evaluation indexes, and the binary sparrow search algorithm (BSSA) and the binary bat algorithm (BBA) are used for comparison. Simulation results demonstrate that IBSSA achieves higher system energy efficiency and user fairness than BSSA and BBA, showing that the proposed optimization strategy significantly improves the fairness and energy efficiency of the CIIOT.
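A closed-loop power control loop driven by the received SINR can be sketched as follows. The abstract does not give the update rule, so this example assumes the common fixed-step up/down scheme: the receiver compares the measured SINR with a target and the transmitter raises or lowers its power by one step. The function name, the parameter names and the default limits are hypothetical.

```python
def closed_loop_power_control(p_dbm, gain_db, noise_interf_dbm, target_sinr_db,
                              step_db=1.0, p_min_dbm=-10.0, p_max_dbm=23.0,
                              iterations=20):
    """Fixed-step closed-loop power control in the dB domain.

    p_dbm:            initial transmit power (dBm)
    gain_db:          link gain, negative for path loss (dB)
    noise_interf_dbm: noise plus interference power at the receiver (dBm)
    Returns the list of transmit powers, starting with the initial one.
    """
    history = [p_dbm]
    for _ in range(iterations):
        # Received SINR in dB for the current transmit power.
        sinr_db = p_dbm + gain_db - noise_interf_dbm
        # Feedback command: step up if below the target, otherwise step down.
        if sinr_db < target_sinr_db:
            p_dbm += step_db
        else:
            p_dbm -= step_db
        # Keep the power inside the allowed range.
        p_dbm = min(p_max_dbm, max(p_min_dbm, p_dbm))
        history.append(p_dbm)
    return history
```

With a static link the power climbs to the level that meets the target and then toggles within one step of it; when the target is unreachable, the power saturates at `p_max_dbm`.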
Ren Minjie, Jin Guoqing, Wang Xiaowen, Chen Ruidong, Yuan Yunxin, Nie Weizhi, Liu An’an
2022, 37(2):383-395. DOI: 10.16337/j.1004-9037.2022.02.011
Abstract: With the advent of the all-media era and the development of social networks, popularity prediction has begun to play an important role in public opinion monitoring and in the competition for data discourse power. Existing popularity prediction research mostly focuses on foreign media, and predicting the popularity of domestic mainstream media such as microblog is an emerging and challenging direction. In this paper, we study microblog, a domestic social media platform, by analyzing microblog content and users, and design a variety of popularity prediction schemes. We also propose a microblog popularity prediction algorithm based on XGBoost, which converts the popularity prediction problem into an interaction-value level classification problem and uses the extracted and fused features for model training under the classification framework, so that the popularity of microblogs can be predicted more accurately with user information. The proposed algorithm is verified on a microblog popularity prediction dataset, where its accuracy reaches 85.69%.
YANG Haitao, WANG Huapeng, NIU Jinlin, CHU Xianteng, LIN Nuanhui
2022, 37(2):396-404. DOI: 10.16337/j.1004-9037.2022.02.012
Abstract: To improve the accuracy of speech spoofing detection, a speech spoofing detection method based on an LSTM-GRU network is proposed. The LSTM-GRU network is a hybrid network that connects a long short-term memory (LSTM) layer, a gated recurrent unit (GRU) layer, a dropout layer, a batch normalization layer and a dense layer in series. The LSTM layer addresses the problem of long-term dependence in speech sequences, while the GRU layer reduces the number of model parameters. The experiment is conducted on the ASVspoof2019 LA dataset, and 20-dimensional Mel-frequency cepstral coefficient features are extracted for model training. In the test stage, the trained LSTM-GRU model is used for spoofing detection on the speech in the test set. Comparison with separate GRU and LSTM networks shows that the LSTM-GRU network achieves the highest correct recognition rate among the three network models; the equal error rate is 27.07% lower than that of the baseline system provided by the ASVspoof2019 challenge; the average accuracy of speech detection for logical access attacks is 98.04%; and the LSTM-GRU network has the advantages of short training time, resistance to over-fitting and high stability. The results show that the proposed method can be effectively applied to the speech logical access attack detection task.
ZHAO Fan, LI Linyun, WEI Renjie, ZHANG Zhiwei
2022, 37(2):405-414. DOI: 10.16337/j.1004-9037.2022.02.013
Abstract: Existing dam defect detection methods can only roughly locate the area in which a crack lies. To address this problem, a dam crack extraction method based on a general-purpose object detector is proposed. Firstly, a two-target detector is designed to detect the crack area and the water stain area as two independent targets in the image at the same time. Secondly, the geometric position relationship between the crack area and the water stain area associated with the same crack is established. Finally, the upper boundary of the water stain box contained in the crack box is uniformly sampled, and curve fitting is performed on the sampling points to obtain the crack curve. Experimental results show that the proposed algorithm can not only accurately detect the crack box and the water stain box, but also fit the crack curve completely, and it has been effectively verified in the detection of dam defects of millimeter-level width.
LI Yunhong, YAN Junhong, HU Lei
2022, 37(2):415-425. DOI: 10.16337/j.1004-9037.2022.02.014
Abstract: The shape, orientation and category of text in natural scenes vary widely, and scene text detection remains a challenge. To better separate text from non-text and accurately locate text areas in natural scene images, this paper proposes a text detection network that fuses local and global features. Multi-scale global feature fusion is realized through skip connections, and the identity residual block is improved to realize local fine-grained feature fusion, thereby reducing the loss of feature information and strengthening feature extraction in text regions. The polygon offset text field is combined with text edge information to locate text regions accurately. To evaluate the effectiveness of the proposed method, multiple sets of comparative experiments are conducted on the classic data sets ICDAR2015 and CTW1500. Experimental results show that the method performs better in text detection in complex scenes.
Xiang Jianhong, Liu Zhuo, Wang Linyu, ZHONG Yu
2022, 37(2):426-436. DOI: 10.16337/j.1004-9037.2022.02.015
Abstract: Autonomous driving is one of the most difficult tasks in computer vision, and semantic segmentation of road scenes is one of its core technologies. This paper proposes an upsampling method based on an enhanced semantic flow field, which learns the semantic flow field between adjacent feature maps so that the semantic information of the generated map is more detailed and its boundaries are clearer. In addition, to handle target scale changes and the difficulty of identifying small targets in road scenes, a new multi-level feature fusion method is proposed, which fully integrates deep semantic information and shallow detail information to adapt to targets of different scales. CamVid is used as the data set, with data augmentation applied. Experiments show that both proposed methods bring significant improvements in accuracy. Compared with PSPNet, Deeplabv3+ and other models, the overall network achieves higher accuracy and its segmentation results are closer to the ground truth.
2022, 37(2):437-445. DOI: 10.16337/j.1004-9037.2022.02.016
Abstract: The double-talk scenario degrades the performance of the echo canceller in acoustic echo cancellation, and traditional double-talk detection and other methods of controlling the adaptive step-size cannot deal with it effectively. To solve this problem, a method of adjusting the adaptive step-size according to the spectral signal-to-interference ratio (the ratio of the near-end speech’s power spectrum to the echo’s power spectrum) is proposed. To reduce computational complexity and processing delay, the partitioned frequency block least mean square (PFBLMS) algorithm is used as the adaptive filtering algorithm, so the adaptive step-size is adjusted in the frequency domain. First, the relationship between the spectral signal-to-interference ratio and the coherence function is established. Second, the step-size is obtained through the coherence function. Third, the adaptive step-size of each frequency point is adjusted in real time according to the calculated value. In addition, the dual-filter and sparse control algorithms are combined to further improve the robustness and convergence performance of the system. Computer simulation shows that the system not only guarantees good echo suppression in the double-talk scenario, but also tracks changes of the echo channel in time. Compared with the double-talk detection method based on the normalized cross-correlation function and the echo cancellation algorithm in the open source project Speex, the proposed system achieves better echo return loss enhancement (ERLE) and perceptual evaluation of speech quality (PESQ).
2022, 37(2):446-455. DOI: 10.16337/j.1004-9037.2022.02.017
Abstract: In modern trauma treatment, making a reasonable and accurate pre-hospital assessment of the injury and taking the corresponding treatment decisions are of great significance for reducing disability and mortality among patients. To overcome the shortcomings of manual decision-making and achieve accurate, reasonable and standardized trauma treatment decisions, this study, after in-depth analysis of the treatment decision process, uses a multi-label learning method to divide the overall treatment decision into sub-decisions and extracts the judgment factors corresponding to the sub-decisions as label sets. Next, to better capture the relationship between labels, this paper combines the chain idea of the Classifier Chains algorithm with the ML-KNN algorithm and proposes an improved multi-label learning algorithm named layer chains multi-label K-nearest neighbor (LCML-KNN). The LCML-KNN algorithm divides the labels into two layer chains according to their characteristics. After the predicted label information of the first layer chain is output, it is uniquely encoded, and the transformed labels are fed into the second layer chain as new features for prediction and judgment. The LCML-KNN algorithm not only better accounts for the relationship between the labels but also expands the feature dimension through the label conversion. Experimental comparisons with various existing multi-label learning algorithms on two trauma datasets verify the robustness and superiority of the LCML-KNN algorithm.
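The hand-over between the two layer chains, where first-layer predictions become extra features for the second layer, can be sketched as below. The sketch reads "uniquely encoded" as one-hot encoding, which is an assumption; the function names `one_hot` and `extend_features` are hypothetical, and the ML-KNN classifiers themselves are not implemented.

```python
def one_hot(label, classes):
    """Encode `label` as a 0/1 vector over the ordered list `classes`."""
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1 if label == c else 0 for c in classes]


def extend_features(features, first_layer_predictions, classes_per_label):
    """Append encoded first-layer predictions to the original feature vector.

    features:                original feature vector of one sample
    first_layer_predictions: one predicted value per first-layer label
    classes_per_label:       allowed values of each first-layer label,
                             in the same order as the predictions
    """
    extended = list(features)
    for prediction, classes in zip(first_layer_predictions, classes_per_label):
        extended.extend(one_hot(prediction, classes))
    return extended
```

For example, `extend_features([0.5, 1.2], ["high", 0], [["low", "high"], [0, 1]])` returns `[0.5, 1.2, 0, 1, 1, 0]`, so the second layer chain sees both the original features and the first-layer decisions.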
XIE Miao, DENG Yulin, LYU Jie
2022, 37(2):456-462. DOI: 10.16337/j.1004-9037.2022.02.018
Abstract: To improve the performance of personalized recommendation systems, a personalized recommendation method based on a deep restricted Boltzmann machine is proposed. Firstly, the characteristics of the users and resources of the recommendation system are extracted and a multi-layer restricted Boltzmann machine (RBM) network is constructed, forming a personalized recommendation model based on the deep restricted Boltzmann machine. Secondly, the maximum likelihood of the training samples to be recommended is calculated from the marginal probability distribution of the visible and hidden layers. Then, combined with contrastive divergence (CD) reconstruction, the main parameter update rule of the RBM is derived, and a stable RBM structure is obtained by updating the visible and hidden layers in both directions. Finally, personalized recommendation is realized by calculating the user-resource score. Experimental results show that, within a reasonable range of training sample sparsity, the proposed method achieves better root mean squared error (RMSE) performance than commonly used personalized recommendation algorithms when the depth of the RBM is reasonably controlled and an appropriate number of hidden layer nodes is set.
2022, 37(2):463-470. DOI: 10.16337/j.1004-9037.2022.02.019
Abstract: To address the low estimation accuracy and poor robustness of the traditional static soft measurement model in wind tunnel flow measurement, an Attention-LSTM-Kalman measurement model combining the attention mechanism (Attention), long short-term memory (LSTM) and Kalman filtering (Kalman) is proposed. A static soft measurement model is first established with an LSTM network. On this basis, an improved scheme based on the attention mechanism is proposed. Considering the dynamic characteristics of the system, a Kalman filter is used to dynamically adjust the output sequence of the soft measurement model. Experimental results show that LSTM performs better than the recurrent neural network (RNN) and gated recurrent unit (GRU) models. A comparison of the prediction results of the LSTM, Attention-LSTM and Attention-LSTM-Kalman models shows that the attention mechanism effectively improves the accuracy of the model, and that introducing the Kalman filter improves the dynamic measurement characteristics of the model. The feasibility and effectiveness of the proposed model are verified by flow measurement in a wind tunnel system.
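The idea of post-processing a model's output sequence with a Kalman filter can be sketched with a scalar filter. The abstract does not describe the state model, so this example assumes a one-dimensional random-walk state observed directly with noise; the function name `kalman_filter` and the variance parameters `q` and `r` are illustrative choices, not values from the paper.

```python
def kalman_filter(measurements, q=1e-3, r=1e-1, p0=1.0):
    """Filter a sequence with a scalar random-walk Kalman filter.

    measurements: raw output sequence (e.g. predictions of a soft sensor)
    q:  process noise variance (how fast the true value may drift)
    r:  measurement noise variance (how noisy the raw outputs are)
    p0: initial estimate variance
    Returns the filtered sequence, one estimate per measurement.
    """
    if not measurements:
        return []
    x = measurements[0]  # initial state estimate
    p = p0               # initial estimate variance
    filtered = []
    for z in measurements:
        # Predict: the state is assumed constant, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        filtered.append(x)
    return filtered
```

A larger `q` makes the filter follow rapid changes more closely, while a larger `r` makes it smooth more aggressively; tuning that trade-off is what lets the filter adapt a static model's output to a dynamic process.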
Mailing Address: 29 Yudao Street, Nanjing, China
Post Code: 210016 Fax: 025-84892742
Phone: 025-84892742 E-mail: sjcj@nuaa.edu.cn
Supported by:Beijing E-Tiller Technology Development Co., Ltd.
Copyright: ® 2026 All Rights Reserved