LIU Anan , LI Tianbao , WANG Xiaowen , SONG Dan
2021, 36(1):1-21. DOI: 10.16337/j.1004-9037.2021.01.001
Abstract:In recent years, deep learning has been widely applied and has achieved significant progress in various fields. How to use deep learning to effectively manage the explosively growing number of 3D models has become a hot topic. This paper introduces the mainstream algorithms for deep learning based 3D model retrieval and analyzes their advantages and disadvantages according to experimental performance. In terms of the retrieval task, 3D model retrieval algorithms are classified into two categories: (1) Model-based 3D model retrieval algorithms, which require both the query and the gallery to be 3D models. They can be further divided into voxel-based, point cloud-based and view-based methods according to the representation of the 3D models. (2) 2D image-based cross-domain 3D model retrieval algorithms, in which the query is a 2D image while the gallery consists of 3D models. They can be classified into 2D real image-based and 2D sketch-based methods. Finally, we analyze and discuss the existing issues of deep learning based 3D model retrieval methods and predict promising directions for this research topic.
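To make the view-based branch of the taxonomy concrete, the following is a minimal sketch of a multi-view descriptor (MVCNN-style view pooling), not any specific method surveyed in the paper; the backbone and feature size are illustrative assumptions.

# Minimal sketch of a view-based 3D model descriptor (MVCNN-style view pooling).
# All names are illustrative; this is not any specific method from the survey.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiViewDescriptor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)                    # per-view 2D CNN
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # drop the classifier

    def forward(self, views):                          # views: (B, V, 3, H, W)
        b, v, c, h, w = views.shape
        feats = self.cnn(views.reshape(b * v, c, h, w))  # (B*V, 512, 1, 1)
        feats = feats.reshape(b, v, -1)
        return feats.max(dim=1).values                 # view pooling -> (B, 512)

# Retrieval then ranks gallery models by cosine similarity of these descriptors.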
YE Jihua , LIU Kai , ZHU Jintai , JIANG Aiwen
2021, 36(1):22-34. DOI: 10.16337/j.1004-9037.2021.01.002
Abstract:With the rapid development of the animation industry, generating animation-style faces has become a key technology. Existing painting style transfer techniques cannot produce satisfactory animation results due to the following characteristics of animation: (1) Animation has a highly simplified and abstract unique style, and (2) animation tends to have clear edges, smooth shading and relatively simple textures, which poses great challenges to the loss functions in existing methods. This paper proposes a novel loss function suitable for animation. In the loss function, the semantic loss is expressed in a regularized form on the high-level feature maps of the VGG network to cope with the style gap between real and animation images, and the edge sharpness loss with edge enhancement preserves the edge sharpness of animation images. Experiments on four public data sets show that clear and vivid animation images can be generated with the proposed loss function. Moreover, compared with the existing method, the recognition rate of the proposed method is increased by 0.43% (Miyazaki Hayao style) and 3.29% (Makoto Shinkai style) on the CK+ data set, by 0.85% (Miyazaki Hayao style) and 2.42% (Makoto Shinkai style) on the RAF data set, and by 0.71% (Miyazaki Hayao style) and 3.14% (Makoto Shinkai style) on the SFEW data set, respectively. The generation effect on the CelebA data set is also demonstrated. The above results show that the proposed method combines the advantages of deep learning models to produce more accurate recognition results.
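The two loss terms can be sketched as follows. This is a hedged illustration only: the VGG layer, the use of an L1 distance, and the Sobel-based edge comparison are assumptions, not the authors' published settings.

# Hedged sketch of a VGG-based semantic loss and an edge sharpness loss.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

vgg_feats = vgg19(weights="IMAGENET1K_V1").features[:26].eval()  # up to conv4_4 (assumed layer)
for p in vgg_feats.parameters():
    p.requires_grad_(False)

def semantic_loss(generated, real):
    # L1 distance between high-level VGG feature maps acts as a content regularizer.
    return F.l1_loss(vgg_feats(generated), vgg_feats(real))

def edge_sharpness_loss(generated, target_edges):
    # Penalize blurry edges: compare Sobel gradient magnitudes of the generated
    # image with an edge-enhanced target map (precomputed elsewhere, shape (B,1,H,W)).
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gray = generated.mean(dim=1, keepdim=True)
    gx, gy = F.conv2d(gray, kx, padding=1), F.conv2d(gray, ky, padding=1)
    return F.l1_loss(torch.sqrt(gx ** 2 + gy ** 2 + 1e-6), target_edges)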
GUO Jichang , WANG Yudong , LIU Di , AI Yufeng , JIA Weiguang
2021, 36(1):35-44. DOI: 10.16337/j.1004-9037.2021.01.003
Abstract:Due to insufficient underwater salient object detection datasets, deep learning based underwater salient object detection networks are prone to overfitting, which degrades their performance. To address this problem, this paper introduces an image style transfer method and proposes an underwater salient object detection network based on CycleGAN. The network generator is composed of an image style transfer subnetwork and a salient object detection subnetwork. First, the network trains the domain transfer subnetwork in an unsupervised cascaded manner and uses it to perform style transfer on in-air and underwater images to construct training and testing datasets, so as to alleviate the shortage of underwater salient object detection data. Then, the in-air and style-transferred salient object detection datasets are used to train the salient object detection subnetwork and enhance its feature extraction ability. Finally, the output results of the two image styles are fused and optimized to improve the performance of the saliency detection network. Experimental results show that, compared with in-air and underwater salient object detection networks, the mean absolute error (MAE) and F-measure are relatively improved by at least 10.4% and 2.4%, respectively.
LIU Ju , DU Ruohua , WU Qiang , HE Zekun , YU Luyue
2021, 36(1):45-52. DOI: 10.16337/j.1004-9037.2021.01.004
Abstract:In the process of multi-modality magnetic resonance imaging (MRI) data acquisition, different degrees of modality data missing occur. However, most existing completion methods only target random missing and cannot recover strip and block missing. Therefore, this paper proposes a classification framework based on a smooth tensor completion algorithm with multi-directional delay embedding. Firstly, the folded tensor is obtained by multi-directional delay embedding of the missing data. Then, the completed tensor is obtained by smooth tensor CP decomposition. Finally, the inverse of the multi-directional delay embedding is applied to obtain the completed data. The algorithm is used to classify high-grade and low-grade tumors on the BraTS glioma image data set and compared with seven baseline models. The average classification accuracy of the proposed method reaches 91.31%, and experimental results show that the method achieves better accuracy than traditional completion algorithms.
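As a small aid to the folding step, the following sketches single-direction delay embedding (Hankelization), the building block that the method applies along several tensor directions; the delay length tau is an assumption.

# Sketch of single-direction delay embedding and its inverse.
import numpy as np

def delay_embed(x, tau):
    """Fold a length-N vector into a (tau, N - tau + 1) Hankel matrix."""
    n = len(x)
    return np.stack([x[i:i + n - tau + 1] for i in range(tau)], axis=0)

def delay_unembed(H):
    """Invert the embedding by averaging the duplicated (anti-diagonal) entries."""
    tau, m = H.shape
    out, cnt = np.zeros(tau + m - 1), np.zeros(tau + m - 1)
    for i in range(tau):
        out[i:i + m] += H[i]
        cnt[i:i + m] += 1
    return out / cnt

x = np.arange(8, dtype=float)
assert np.allclose(delay_unembed(delay_embed(x, 3)), x)   # round trip recovers the signal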
LI Lanlan , LIU Xiaolin , WU Kexin , LIN Liqun , WEI Hongan , ZHAO Tiesong
2021, 36(1):53-62. DOI: 10.16337/j.1004-9037.2021.01.005
Abstract:The hybrid coding framework of the third generation audio and video coding standard (AVS3) plays an important role in eliminating temporal/spatial redundancy in video, but it needs further improvement in eliminating perceptual redundancy to further raise coding performance. This paper proposes a data-driven pixel-domain just noticeable distortion (JND) prediction model to optimize the AVS3 video encoder while ensuring subjective visual quality. Firstly, based on a large subjective JND database, the pixel-domain perceptual distortion threshold is obtained according to the characteristics of the human eye. Secondly, a pixel-domain JND prediction model based on a deep neural network is constructed. Finally, a residual filter built on the predicted pixel-domain JND threshold is used to eliminate perceptual redundancy in AVS3 and reduce the coding bitrate. Experimental results show that, compared with the AVS3 standard test model HPM5.0, the proposed JND model saves up to 21.52% of the bitrate, with an average saving of 5.11%.
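The idea of the JND-guided residual filter can be illustrated as below: residual values whose magnitude falls under the predicted per-pixel JND threshold are treated as imperceptible. The zeroing rule and the example numbers are placeholders, not the encoder's actual filter.

# Illustrative JND-guided residual filtering.
import numpy as np

def jnd_filter(residual, jnd_threshold):
    """Zero out residual values the JND model predicts as invisible."""
    return np.where(np.abs(residual) > jnd_threshold, residual, 0.0)

residual = np.array([[ 3., -1.,  6.],
                     [-2.,  8., -4.]])
jnd = np.full_like(residual, 4.0)          # per-pixel JND map from the predictor
print(jnd_filter(residual, jnd))           # small residuals removed -> fewer bits to code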
Feng Yan , Liu Shuai , Wang Chuanxu
2021, 36(1):63-75. DOI: 10.16337/j.1004-9037.2021.01.006
Abstract:Optical flow is the motion representation of image pixels. It is difficult for existing optical flow estimation methods to ensure high precision in complex situations such as occlusion, large displacement and fine details. To overcome these problems, a new convolutional neural network is proposed, which improves estimation accuracy by improving the convolution form and the feature fusion. Firstly, deformable convolution, with stronger adjustment and optimization ability, is added to extract spatial features such as large displacements and details from adjacent frames. Then, a feature correlation layer is generated by an attention mechanism to fuse the features of the two adjacent frames; it serves as the input of the decoding part composed of deconvolution and upsampling, and aims to overcome the low accuracy of traditional optical flow estimation methods based on feature matching. Finally, the estimated optical flow is refined with a stacked network. Experiments show that the proposed network model performs better than existing methods in dealing with occlusion, large displacement and detail presentation.
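For reference, the following sketches a plain local feature-correlation layer (cost volume) between two frames' feature maps, without the attention weighting described above; the displacement range is an assumption.

# Sketch of a local correlation layer between two feature maps.
import torch
import torch.nn.functional as F

def correlation(f1, f2, max_disp=4):
    """f1, f2: (B, C, H, W). Returns (B, (2*max_disp+1)**2, H, W) of local dot products."""
    b, c, h, w = f1.shape
    f2 = F.pad(f2, [max_disp] * 4)
    vols = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = f2[:, :, dy:dy + h, dx:dx + w]
            vols.append((f1 * shifted).sum(dim=1, keepdim=True) / c)
    return torch.cat(vols, dim=1)

cost = correlation(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(cost.shape)   # torch.Size([1, 81, 32, 32])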
PU Tian , ZHANG Ziye , PENG Zhenming
2021, 36(1):76-84. DOI: 10.16337/j.1004-9037.2021.01.007
Abstract:Some existing enhancement methods enhance unevenly illuminated images by bringing out the details in dark areas, but they easily cause over-enhancement. In this paper, an extended form of Retinex is proposed from a new viewpoint and applied to uneven lighting image enhancement. Taking the center-surround Retinex output as the perceived reflectance, the proposed algorithm decomposes an image into a perceived reflectance image and a perceived illumination image. Enhancement is achieved by adjusting the perceived illumination image and recombining the two images. Experimental comparisons with several state-of-the-art methods show that the proposed method performs well in enhancing brightness and details and improving image quality for unevenly illuminated images.
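A minimal sketch of such a decompose-adjust-recombine pipeline is given below, assuming a grayscale image, a Gaussian surround as the illumination estimate and a simple gamma adjustment; the paper's extended Retinex form is not reproduced here.

# Center-surround Retinex style decomposition and recombination (grayscale, illustrative).
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance(image, sigma=30, gamma=0.6):
    image = image.astype(np.float64) + 1.0               # avoid division by zero
    illumination = gaussian_filter(image, sigma)          # center-surround estimate
    reflectance = image / illumination                    # perceived reflectance
    adjusted = 255.0 * (illumination / 255.0) ** gamma    # brighten dark illumination
    return np.clip(reflectance * adjusted, 0, 255).astype(np.uint8)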
YANG Yunlu , ZHOU Yajian , NING Hua
2021, 36(1):85-94. DOI: 10.16337/j.1004-9037.2021.01.008
Abstract:Aiming at the privacy leakage problem in data mining models and the opacity of existing privacy protection technologies, a more universal image differential privacy-generative adversarial network (IDP-GAN), which combines differential privacy with the generative adversarial network (GAN) image generation model, is proposed. IDP-GAN uses the Laplace mechanism to reasonably allocate Laplace noise to the input features of the affine transformation layers and to the polynomial approximation coefficients of the output layer's loss function. While achieving differential privacy protection, IDP-GAN effectively reduces the consumption of the privacy budget during training. Experiments on the standard data sets MNIST and CelebA verify that IDP-GAN can generate higher quality image data. In addition, membership inference attack experiments show that IDP-GAN has better resistance to attacks.
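The Laplace mechanism underlying this perturbation is sketched below; how IDP-GAN splits the privacy budget across layers is not reproduced, and the sensitivity and epsilon values are placeholders.

# The basic Laplace mechanism: noise scale = L1 sensitivity / epsilon.
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon):
    """Release `value` with Laplace noise calibrated to sensitivity/epsilon."""
    rng = np.random.default_rng()
    return value + rng.laplace(0.0, sensitivity / epsilon, size=np.shape(value))

features = np.array([0.2, -1.3, 0.8])
print(laplace_mechanism(features, sensitivity=1.0, epsilon=0.5))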
2021, 36(1):95-102. DOI: 10.16337/j.1004-9037.2021.01.009
Abstract:Aiming at the low detection accuracy of occluded faces in real scenes, an occluded face detection method based on a convergent convolutional neural network (CNN) and an attention enhancement network is proposed. First, on the multi-layer original feature maps of the main network, the response values of the visible parts of faces are enhanced by supervised learning. Then, multiple enhanced feature maps are combined into an additional enhancement network that converges with the main network to accelerate the detection of multi-scale occluded faces. Finally, supervision information is distributed to feature maps of various sizes for supervised learning, and loss functions based on anchor box sizes are set for feature maps of different sizes. Experimental results on the WIDER FACE and MAFA datasets show that the detection accuracy of the proposed method is higher than that of current mainstream face detection methods.
2021, 36(1):103-113. DOI: 10.16337/j.1004-9037.2021.01.001
Abstract:As security issues receive widespread attention, research on person re-identification has become more practical and is gradually being applied to video surveillance, intelligent security and other fields. The increasing number of monitoring devices provides massive data for research, but manual labeling or detector-based recognition inevitably introduces noisy labels. When training large-scale deep neural networks, label noise causes non-negligible damage to model training as the amount of data increases. To solve the noisy label problem in person re-identification, this paper combines noisy and clean data to train discriminative features, and proposes a noise-label adaptive person re-identification method that requires no priors such as additional validation sets, noise ratios or noise types. In addition, the method adaptively learns the weights of noisy data to further reduce their influence. On the noisy Market1501 and DukeMTMC-reID data sets, state-of-the-art methods are severely affected by noise, while the proposed method improves the evaluation metrics by about 10% on this basis.
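One common way to realize loss-based adaptive weighting of noisy samples is sketched below; samples with unusually large loss (likely mislabeled) are down-weighted. The abstract does not specify the paper's exact weighting rule, so this softmax form and the temperature are assumptions.

# Hedged illustration of adaptive sample weighting against label noise.
import torch

def weighted_noisy_loss(per_sample_loss, temperature=1.0):
    """per_sample_loss: (N,) unreduced losses. Returns a noise-aware weighted mean."""
    with torch.no_grad():
        weights = torch.softmax(-per_sample_loss / temperature, dim=0) * len(per_sample_loss)
    return (weights * per_sample_loss).mean()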
ZHANG Wenjing , SONG Peng , CHEN Dongliang , ZHENG Wenming , ZHAO Li
2021, 36(1):113-121. DOI: 10.16337/j.1004-9037.2021.01.011
Abstract:In practical facial expression recognition systems, recognition rates drop significantly when the data are collected from different scenarios. To tackle this problem, we propose a sparse subspace transfer learning method for cross-domain facial expression recognition. Firstly, inspired by the idea of sparse reconstruction, we aim to learn a common projection matrix, and impose an
ZONG Jiaping , WU Yan , CHEN Jianqiang , ZHANG Linna , ZHANG Yue , CEN Yigang
2021, 36(1):122-132. DOI: 10.16337/j.1004-9037.2021.01.012
Abstract:Multi-object tracking of weak-appearance targets in surveillance video is an important issue for intelligent biology laboratories. However, due to occlusion and subtle differences among objects, missed or false detections are prone to cause tracking failure. In addition, the computational cost of deep learning is too high to be realized on embedded platforms. Therefore, a new lightweight multi-object tracking algorithm is proposed, which uses YOLOv3 as the basic object detection network. A layer-compression pruning algorithm based on batch normalization layer weight evaluation is proposed to reduce the computational cost of the detection network, so that the detection speed can be significantly improved on the embedded platform. Besides, according to previous tracking results, missed detections in the current frame can be corrected, which improves the accuracy of the detection results. Furthermore, a convolutional neural network is employed to extract object features, and these features are combined with the intersection-over-union (IoU) between candidate and predicted boxes for data association. Experimental results show that the proposed lightweight multi-object tracking algorithm achieves better results than other methods. In particular, the network achieves a high compression rate while only slightly reducing detection accuracy, which ensures that the proposed network can be easily implemented on the embedded platform.
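The general idea of ranking channels by their BatchNorm scale (gamma) is sketched below: channels whose learned gamma is near zero contribute little and are pruning candidates. The global pruning ratio is an assumed hyperparameter, and the paper's layer-level compression rule is not reproduced.

# Sketch of BN-gamma based channel ranking for pruning.
import torch
import torch.nn as nn

def bn_prune_mask(model, prune_ratio=0.3):
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)        # global gamma threshold
    return {name: m.weight.detach().abs() > threshold      # True = keep channel
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}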
DONG Huiwen , YU Jing , GUO Lening , XIAO Chuangbai
2021, 36(1):133-146. DOI: 10.16337/j.1004-9037.2021.01.013
Abstract:The robust principal component analysis (RPCA) model aims to estimate underlying low-rank and sparse structures from the degraded observation data. Both the rank function and the
TAN Yun , QIN Jiaohua , HUANG Lixia , XIANG Xuyu , LIU Qiang
2021, 36(1):147-155. DOI: 10.16337/j.1004-9037.2021.01.014
Abstract:Coverless steganography hides secret information without modifying the carrier by extracting features of the carrier and mapping them to the information sequence. Therefore, it has strong resistance to steganalysis, but existing algorithms are still limited in hiding capacity, and most of them need to construct a library containing a large number of images. In this paper, a coverless steganography algorithm based on image block chaotic scrambling and the discrete wavelet transform (DWT) is proposed. The parameters of the chaotic transform are derived from the secret key, and the cover image is scrambled to generate multiple new images. Then, block DWT is applied, the corresponding hash sequence is generated according to the relationship between the low-frequency DWT coefficients of adjacent image blocks, and the corresponding index library is constructed. The cover image and secret key are sent to the receiver to realize the transmission of the secret information. Experimental results show that, compared with existing algorithms, the proposed algorithm not only greatly improves the hiding capacity and success rate, but also has strong robustness. At the same time, the algorithm has a simple architecture and a small transmission load, showing strong practical value.
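The hash-generation step can be illustrated as follows: compare the mean low-frequency DWT coefficients of adjacent image blocks to produce one bit per comparison. The block size, wavelet and comparison rule are assumptions, and the chaotic scrambling step is omitted.

# Illustrative block-DWT hash for a grayscale image.
import numpy as np
import pywt

def block_dwt_hash(gray, block=32):
    h, w = gray.shape
    means = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            cA, _ = pywt.dwt2(gray[y:y + block, x:x + block], 'haar')  # low-frequency sub-band
            means.append(cA.mean())
    return ''.join('1' if a > b else '0' for a, b in zip(means, means[1:]))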
XIE Xingyu , HE Hui , XING Haihua
2021, 36(1):156-163. DOI: 10.16337/j.1004-9037.2021.01.015
Abstract:Hyperspectral remote sensing images have dozens or even hundreds of bands. They are comprehensive carriers of many kinds of information, including rich radiometric, spatial and spectral information, and are widely used in terrain mapping. However, traditional hyperspectral image classification methods mostly focus on spectral feature extraction while ignoring spatial features, which limits classification accuracy. The three-dimensional convolutional neural network (3D-CNN) can convolve data in three dimensions simultaneously, so this paper uses a 3D-CNN to classify land cover in hyperspectral images and proposes an improved 3D-CNN based algorithm for hyperspectral remote sensing land-cover classification. The method reuses the extracted spatial and spectral features to make full use of them. In addition, this paper introduces the idea of shallow feature preservation and proposes a deep image classification network model integrating shallow feature preservation, which further improves the accuracy of hyperspectral remote sensing land-cover classification. Experimental results on two commonly used hyperspectral remote sensing data sets (Indian Pines and Pavia University) under the TensorFlow framework show that, compared with the basic 3D-CNN network, the classification accuracy of the proposed method is improved by nearly 2%.
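The basic 3D-CNN idea (joint spectral-spatial convolution over a hyperspectral patch) is sketched below in PyTorch for compactness; the layer sizes, patch size and band count are illustrative and do not reflect the paper's improved architecture.

# Minimal 3D-CNN over a (bands x height x width) hyperspectral patch.
import torch
import torch.nn as nn

class Small3DCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(0, 1, 1)), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=(5, 3, 3), padding=(0, 1, 1)), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):          # x: (B, 1, bands, H, W)
        return self.classifier(self.features(x).flatten(1))

logits = Small3DCNN(num_classes=16)(torch.randn(2, 1, 200, 9, 9))
print(logits.shape)                # torch.Size([2, 16])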
Song Yanyan , Zhu Qian , Zhu Jianwei , Mu Chenguang
2021, 36(1):164-175. DOI: 10.16337/j.1004-9037.2021.01.016
Abstract:Aiming at the problem that a single transillumination energy cannot completely capture all the information in digital radiography (DR) of complex structures with large thickness ratios, we propose a pulse coupled neural network (PCNN) image fusion algorithm based on regional characteristics, taking aero-engine turbine blades as the research object. First, the sub-images acquired at multiple incremental tube voltages are decomposed into low-frequency and high-frequency sub-bands at multiple scales by the non-subsampled contourlet transform (NSCT). Second, the PCNN is deployed to adjust the connection strength along the directions with the most obvious characteristics in the improved spatial frequency of each sub-band. Third, as external excitation, the low-frequency sub-bands are evaluated by the regional mean square error and the high-frequency sub-bands by the sum-modified Laplacian, and the two results are processed through fire mapping following the maximum principle. Finally, the fused images are obtained by the inverse NSCT. Experimental results show that, compared with classical fusion algorithms including those based on the Laplacian pyramid transform, the proposed method improves fusion results in terms of entropy, standard deviation, average gradient, clarity and spatial frequency. Our method improves image-fusion performance by enriching the detailed information of the images and obtaining higher quality.
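The two activity measures used as external excitation can be sketched as below; the window sizes are assumptions, and the window average stands in for the window sum (they differ only by a constant factor).

# Regional MSE and sum-modified Laplacian (SML) activity maps for fusion decisions.
import numpy as np
from scipy.ndimage import uniform_filter

def regional_mse(band, window=3):
    mean = uniform_filter(band, window)
    return uniform_filter((band - mean) ** 2, window)     # local deviation from the local mean

def sum_modified_laplacian(band, window=3):
    ml = np.abs(2 * band - np.roll(band, 1, 0) - np.roll(band, -1, 0)) \
       + np.abs(2 * band - np.roll(band, 1, 1) - np.roll(band, -1, 1))
    return uniform_filter(ml, window)                      # averaged over a local window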
Liu Ziquan , Fu Hui , Li Yujie , Zhang Guojiang , Hu Chengbo , Zhang Zhaohui
2021, 36(1):176-183. DOI: 10.16337/j.1004-9037.2021.01.017
Abstract:Infrared fault image recognition is an important method for diagnosing electrical equipment, but recognition relies on manually created bounding boxes over objects. In this paper, to improve detection efficiency, automatic semantic segmentation of infrared images is investigated to recognize one or more electrical equipment objects. The proposed method is based on Mask R-CNN, which has demonstrated good performance on instance segmentation. Our main contribution is applying transfer learning to Mask R-CNN, where importance sampling and parameter mapping are conducted to alleviate the data shortage caused by pixel-level annotation. Experimental results on real-world datasets show that the improved version of Mask R-CNN is able to extract the shapes of electrical equipment even with limited pixel-level annotated data. The proposed algorithm provides an efficient basis for the subsequent steps of fault region detection and classification.
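A common transfer-learning recipe for torchvision's Mask R-CNN is shown below: start from COCO-pretrained weights and replace the box and mask heads for the equipment classes. This mirrors the general idea only; the paper's importance-sampling and parameter-mapping procedure is not reproduced, and the class count is an assumption.

# Replace the heads of a pretrained Mask R-CNN for a small custom class set.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 4  # background + 3 equipment types (assumed)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)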
LIU Jianfeng , GUO Jinhong , WANG Guangyu , XU Guowei , FENG Hua
2021, 36(1):184-198. DOI: 10.16337/j.1004-9037.2021.01.018
Abstract:Based on an in-depth study of the recursive construction principle, code tree structure model, mathematical foundation and allocation principle of orthogonal variable spreading factor (OVSF) codes, a blind recognition algorithm based on the fast Walsh-Hadamard transform is proposed for non-cooperative reception of wideband code division multiple access (WCDMA) signals. By using the inheritance relation and orthogonality of OVSF codes as well as cyclic shifts of the data, and combining them with the fast Walsh-Hadamard transform, the proposed algorithm eliminates the ambiguity of the de-spread data and reduces computational complexity. Theoretical analysis and experimental results show that the proposed algorithm can rapidly de-spread and blindly recognize multiple OVSF codes in the WCDMA downlink channel under non-cooperative conditions with no prior information and low signal-to-noise ratio, and that it is reliable, effective and practical. In actual measurements, the algorithm takes 8.2 ms to recognize at least 20 OVSF codes in three frames of data simultaneously, with a recognition accuracy of more than 95%, which gives it high engineering application value.
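For reference, a textbook fast Walsh-Hadamard transform (the core batch de-spreading routine such an approach relies on) is sketched below; the toy chip sequence is illustrative, not WCDMA data.

# In-place fast Walsh-Hadamard transform (input length must be a power of two).
import numpy as np

def fwht(x):
    """Return the (unnormalized) Walsh-Hadamard transform of a power-of-two length vector."""
    a = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

chips = np.array([1, -1, 1, -1, 1, -1, 1, -1])   # toy spread symbol
print(fwht(chips))   # a single dominant bin indicates the matching code index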