• Volume 36, Issue 1, 2021: Table of Contents
    • Review of 3D Model Retrieval Algorithms Based on Deep Learning

      2021, 36(1):1-21. DOI: 10.16337/j.1004-9037.2021.01.001


      Abstract:In recent years, deep learning has been widely applied and has achieved significant development in various fields. How to utilize deep learning to effectively manage the explosively growing number of 3D models has become a hot topic. This paper introduces the mainstream algorithms for deep learning based 3D model retrieval and analyzes their advantages and disadvantages according to experimental performance. In terms of retrieval tasks, 3D model retrieval algorithms are classified into two categories: (1) Model-based 3D model retrieval algorithms, which require both the query and the gallery to be 3D models; these can be further divided into voxel-based, point cloud-based and view-based methods according to the representation of the 3D models. (2) 2D image-based cross-domain 3D model retrieval algorithms, in which the query is a 2D image while the gallery consists of 3D models; these can be classified into 2D real image-based and 2D sketch-based methods. Finally, we analyze and discuss existing issues of deep learning based 3D model retrieval methods, and predict promising directions for this research topic.

    • RD-GAN: A High Definition Animation Face Generation Method Combined with Residual Dense Network

      2021, 36(1):22-34. DOI: 10.16337/j.1004-9037.2021.01.002


      Abstract:With the rapid development of the animation industry, generating faces of animation characters has become a key technology. Existing painting-style transfer techniques cannot obtain satisfactory animation results because of the following characteristics of animation: (1) animation has a highly simplified and abstract unique style, and (2) animation tends to have clear edges, smooth shading and relatively simple textures, which poses great challenges to the loss functions of existing methods. This paper proposes a novel loss function suitable for animation. In this loss function, the semantic loss is expressed as a regularized form over the high-level feature maps of the VGG network to handle the style gap between real and animation images, and an edge sharpness loss with edge enhancement preserves the edge sharpness of animation images. Experiments on four public data sets show that clear and vivid animation images can be generated with the proposed loss function. Moreover, compared with the existing method, the recognition rate of the proposed method is increased by 0.43% (Miyazaki Hayao style) and 3.29% (Makoto Shinkai style) on the CK+ data set, by 0.85% (Miyazaki Hayao style) and 2.42% (Makoto Shinkai style) on the RAF data set, and by 0.71% (Miyazaki Hayao style) and 3.14% (Makoto Shinkai style) on the SFEW data set. The generation effect on the CelebA data set is also demonstrated. These results show that the proposed method combines the advantages of deep learning models to make the recognition results more accurate.
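The abstract does not give the exact form of the edge sharpness loss; a minimal numpy sketch of one plausible edge-aware loss term (the `edge_map` and `edge_sharpness_loss` names are hypothetical, and a Laplacian filter stands in for whatever edge operator the paper uses) might look like:

```python
import numpy as np

def edge_map(img):
    """Approximate edges with a 3x3 Laplacian filter (zero-padded)."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
    return out

def edge_sharpness_loss(generated, target):
    """Mean absolute difference between edge maps: penalizes blurred or
    missing edges in the generated animation image."""
    return np.abs(edge_map(generated) - edge_map(target)).mean()
```

In a full training setup this term would be summed with the VGG-feature semantic loss before backpropagation.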

    • Underwater Image Salient Object Detection Algorithm Based on Image Style Transfer

      2021, 36(1):35-44. DOI: 10.16337/j.1004-9037.2021.01.003


      Abstract:Because underwater salient object detection datasets are insufficient, deep learning based underwater salient object detection networks are prone to overfitting, which degrades their performance. In response to this problem, this paper introduces an image style transfer method and proposes an underwater salient object detection network based on CycleGAN. The network generator is composed of an image style transfer subnetwork and a salient object detection subnetwork. First, the network trains the domain transfer subnetwork in an unsupervised cascade manner and uses it to perform style transfer on in-air and underwater images to construct training and testing datasets, alleviating the data insufficiency of underwater salient object detection. Then, the style-transferred in-air salient object detection datasets are used to train the salient object detection subnetwork and enhance its feature extraction ability. Finally, the outputs for the two image styles are fused and optimized to improve the performance of the saliency detection network. Experimental results show that, compared with in-air and underwater salient object detection networks, the mean absolute error (MAE) and F-measure are improved by at least 10.4% and 2.4%, respectively.

    • A Complete Algorithm for Missing Modalities of Medical Data Based on Tensor Decomposition

      2021, 36(1):45-52. DOI: 10.16337/j.1004-9037.2021.01.004


      Abstract:In the process of multi-modality magnetic resonance imaging (MRI) data acquisition, modality data are often missing to different degrees. However, most existing completion methods only target random missing entries and cannot recover strip-wise or block-wise missing data. Therefore, this paper proposes a classification framework built on a smooth tensor completion algorithm with multi-directional delay embedding. First, a folded tensor is obtained by multi-directional delay embedding of the missing data. Then, the completed tensor is obtained by smooth tensor CP decomposition. Finally, the inverse of the multi-directional delay embedding is applied to obtain the completed data. The algorithm is used to classify high-grade and low-grade tumors on the BraTS glioma image data set and is compared with seven baseline models. The average classification accuracy of the proposed method reaches 91.31%, and experimental results show that the method is more accurate than traditional completion algorithms.
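The delay-embedding step can be illustrated for a 1-D signal; in the paper it is applied along each direction (mode) of the tensor, and the smooth CP decomposition and inverse embedding are omitted here. A sketch under those assumptions (the function name is hypothetical):

```python
import numpy as np

def delay_embed(x, tau):
    """Hankelize a 1-D signal: stack length-tau sliding windows as rows.
    Entries on each anti-diagonal are copies of the same sample, which is
    what lets low-rank completion propagate information into gaps."""
    n = len(x)
    return np.array([x[i:i + tau] for i in range(n - tau + 1)])

x = np.arange(6.0)
H = delay_embed(x, 3)   # shape (4, 3); anti-diagonals are constant
```

The inverse operation averages the duplicated entries back to a single sample per position.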

    • Just Noticeable Distortion Prediction Model of Data-Driven AVS3 Pixel Domain

      2021, 36(1):53-62. DOI: 10.16337/j.1004-9037.2021.01.005


      Abstract:The hybrid coding framework of the third-generation audio and video coding standard (AVS3) plays an important role in eliminating redundant information in the temporal and spatial domains of video, but it needs further improvement in eliminating perceptual redundancy to raise coding performance. This paper proposes a data-driven pixel-domain just noticeable distortion (JND) prediction model to optimize the AVS3 video encoder while preserving subjective visual quality. First, based on a current large subjective JND database, pixel-domain JND thresholds are obtained according to the characteristics of the human eye. Second, a pixel-domain JND prediction model based on a deep neural network is constructed. Finally, a residual filter built on the predicted pixel-domain JND thresholds is used to eliminate perceptual redundancy in AVS3 and reduce the coding bitrate. Experimental results show that, compared with the AVS3 standard test model HPM5.0, the proposed JND model saves up to 21.52% bitrate and 5.11% bitrate on average.
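The abstract does not specify the residual filter's exact form; one common choice for JND-guided coding is dead-zone suppression, where residuals below the per-pixel JND threshold are treated as imperceptible and removed. A hypothetical sketch of that idea:

```python
import numpy as np

def jnd_residual_filter(residual, jnd):
    """Dead-zone filter: zero residuals whose magnitude is below the JND
    threshold and shrink the rest by it, so imperceptible detail costs
    no bits after quantization."""
    return np.sign(residual) * np.maximum(np.abs(residual) - jnd, 0.0)
```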

    • Optical Flow Estimation Model Based on Convolutional Neural Network

      2021, 36(1):63-75. DOI: 10.16337/j.1004-9037.2021.01.006


      Abstract:Optical flow is the motion representation of image pixels. Existing optical flow estimation methods struggle to maintain high precision in complex situations such as occlusion, large displacement and detail preservation. To overcome these problems, a new convolutional neural network is proposed. The model improves estimation accuracy by improving the convolution form and the feature fusion. First, deformable convolutions, which have stronger adjustment and optimization ability, are added to extract spatial features such as large displacements and details from adjacent frames. Then, a feature correlation layer generated with an attention mechanism fuses the features of the two adjacent frames and serves as the input of the decoding part, composed of deconvolution and upsampling, to overcome the low accuracy of traditional feature-matching based optical flow estimation. Finally, the estimated optical flow is refined with a stacked network. Experiments show that the proposed network model outperforms existing methods in handling occlusion, large displacement and detail preservation.
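The attention-based weighting is the paper's contribution and is not detailed in the abstract; the underlying feature correlation (cost volume) layer, as used in FlowNet-style networks, can be sketched as follows (names and neighbourhood handling are illustrative):

```python
import numpy as np

def correlation(f1, f2, d):
    """Local cost volume: dot product of each feature vector in f1 with
    the feature vectors of f2 within a (2d+1)^2 neighbourhood, zero-padded
    at the borders."""
    h, w, c = f1.shape
    pad = np.pad(f2, ((d, d), (d, d), (0, 0)))
    vol = np.zeros((h, w, (2 * d + 1) ** 2))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * d + 1, j:j + 2 * d + 1]  # (2d+1, 2d+1, c)
            vol[i, j] = (patch * f1[i, j]).sum(-1).ravel()
    return vol

vol = correlation(np.ones((2, 2, 1)), np.ones((2, 2, 1)), 1)
```

An attention mechanism would reweight these matching costs before they enter the decoder.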

    • Enhancing Uneven Lighting Images with Naturalness Preserved Retinex Algorithm

      2021, 36(1):76-84. DOI: 10.16337/j.1004-9037.2021.01.007


      Abstract:Some existing enhancement methods enhance unevenly lighted images by bringing out the details in dark areas, but they easily result in over-enhancement. In this paper, an extended form of Retinex is proposed from a new viewpoint and applied to uneven lighting image enhancement. Taking the center-surround Retinex output as the perceived reflectance, the proposed algorithm decomposes an image into a perceived reflectance image and a perceived illumination image. Enhancement is achieved by adjusting the perceived illumination image and recombining the two images. Experimental comparisons with state-of-the-art methods show that the proposed method performs well in enhancing brightness and details and in improving the quality of unevenly lighted images.
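The decompose-adjust-recombine pipeline can be sketched with a box filter standing in for the center-surround (typically Gaussian) surround; the function names and the gamma adjustment are illustrative, not the paper's exact formulation:

```python
import numpy as np

def retinex_decompose(img, k=3):
    """Center/surround decomposition: perceived illumination is a local
    (box-filtered) average; perceived reflectance is the ratio image."""
    p = np.pad(img, k // 2, mode="edge")
    h, w = img.shape
    illum = np.array([[p[i:i + k, j:j + k].mean() for j in range(w)]
                      for i in range(h)])
    refl = img / np.maximum(illum, 1e-6)
    return refl, illum

def enhance(img, gamma=0.5, k=3):
    """Brighten only the illumination (gamma < 1 lifts dark regions),
    then recombine with the untouched reflectance to avoid over-enhancement."""
    refl, illum = retinex_decompose(img, k)
    return refl * illum ** gamma
```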

    • Image Data Mining Method Supporting Differential Privacy

      2021, 36(1):85-94. DOI: 10.16337/j.1004-9037.2021.01.008


      Abstract:Aiming at the privacy leakage problem in data mining models and the opacity of existing privacy protection technologies, a more universal image differential privacy generative adversarial network (IDP-GAN), which combines differential privacy with the generative adversarial network (GAN) image generation model, is proposed. IDP-GAN uses the Laplace mechanism to reasonably allocate Laplace noise to the input features of the affine transformation layer and to the polynomial approximation coefficients of the output layer loss function. While achieving differential privacy protection, IDP-GAN effectively reduces the privacy budget consumed during training. Experiments on the standard data sets MNIST and CelebA verify that IDP-GAN generates higher-quality image data. In addition, membership inference attack experiments show that IDP-GAN resists such attacks better.
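The Laplace mechanism the abstract names has a standard form: noise drawn from Laplace(0, sensitivity/epsilon) is added to each protected quantity. A minimal sketch (the function name and seeding are illustrative):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Add Laplace(0, sensitivity/epsilon) noise to a query result, the
    standard mechanism for epsilon-differential privacy; smaller epsilon
    means stronger privacy and larger noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    return value + rng.laplace(0.0, sensitivity / epsilon, np.shape(value))
```

In a GAN, repeatedly noising the same quantities consumes the budget under composition, which is why allocating noise carefully across layers matters.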

    • Occlusion Face Detection Based on Convergent CNN and Attention Enhancement Network

      2021, 36(1):95-102. DOI: 10.16337/j.1004-9037.2021.01.009


      Abstract:Aiming at the low detection accuracy for occluded faces in real scenes, an occluded face detection method based on a convergent convolutional neural network (CNN) and an attention enhancement network is proposed. First, on the multi-layer original feature maps of the main network, the response values of the visible parts of faces are enhanced by supervised learning. Then, multiple enhanced feature maps are combined into an additional enhancement network that converges with the main network to accelerate the detection of multi-scale occluded faces. Finally, supervision information is distributed to feature maps of various sizes for supervised learning, and loss functions based on anchor box sizes are set for feature maps of different sizes. Experimental results on the WIDER FACE and MAFA datasets show that the detection accuracy of the proposed method is higher than that of current mainstream face detection methods.

    • Noise Label Based Self-adaptive Person Re-identification

      2021, 36(1):103-113. DOI: 10.16337/j.1004-9037.2021.01.010


      Abstract:As security issues receive widespread attention, research on person re-identification has become more practical and is gradually being applied to video surveillance, intelligent security and other fields. The increasing number of monitoring devices provides massive data for research, but manual labeling or detector-based recognition inevitably introduces noisy labels. When training large-scale deep neural networks, as the amount of data increases, label noise causes non-negligible damage to model training. To solve the noisy-label problem in person re-identification, this paper combines noisy and clean data to train discriminative features, and proposes a noise-label adaptive person re-identification method that requires no additional validation set and no priors such as the noise ratio or noise type. In addition, the method adaptively learns weights for noisy data to further reduce their influence. On the noisy Market1501 and DukeMTMC-reID data sets, state-of-the-art methods are severely affected by noise, while the proposed method improves the evaluation metrics by about 10%.

    • Cross-Domain Facial Expression Recognition Based on Sparse Subspace Transfer Learning

      2021, 36(1):113-121. DOI: 10.16337/j.1004-9037.2021.01.011


      Abstract:In practical facial expression recognition systems, recognition rates drop significantly when the data are collected from different scenarios. To tackle this problem, we propose sparse subspace transfer learning for cross-domain facial expression recognition. First, inspired by the idea of sparse reconstruction, we learn a common projection matrix and impose an L2,1-norm constraint on the reconstruction coefficient matrix. Second, we introduce Laplacian regularization to preserve the local discriminative structure. Last, by utilizing the rich label information of the source domain, we project the source samples into a subspace guided by the label information. We conduct extensive experiments on three popular facial expression datasets. The results show that our proposed method outperforms several state-of-the-art subspace transfer learning methods in facial expression recognition.
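The L2,1 norm used in the constraint is simply the sum of L2 norms of the matrix's rows, which drives whole rows of the coefficient matrix to zero (whether rows or columns are grouped depends on the paper's convention; rows are assumed here):

```python
import numpy as np

def l21_norm(M):
    """L2,1 norm: sum of the L2 norms of the rows. Penalizing it promotes
    row sparsity, i.e. selecting a few reconstruction atoms entirely."""
    return np.sqrt((M ** 2).sum(axis=1)).sum()
```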

    • Lightweight Tracking Network of Weak Appearance Multi-object for Intelligent Biology Laboratory

      2021, 36(1):122-132. DOI: 10.16337/j.1004-9037.2021.01.012


      Abstract:Multi-object tracking of weak-appearance targets in surveillance video is an important issue for intelligent biology laboratories. However, due to occlusion and the subtle differences among objects, missed or false detections are prone to cause tracking failures. In addition, the computational cost of deep learning is too high for embedded platforms. Therefore, a new lightweight multi-object tracking algorithm is proposed, which uses YOLOv3 as the basic object detection network. A layer compression pruning algorithm based on evaluating batch normalization layer weights is proposed to reduce the computational cost of the detection network, so that the detection speed can be significantly improved on the embedded platform. Besides, according to previous tracking results, missed detections in the current frame can be corrected, which improves the accuracy of the detection results. Furthermore, a convolutional neural network is employed to extract object features, and object features and the intersection-over-union (IoU) between candidate and predicted boxes are combined for data association. Experimental results show that the proposed lightweight multi-object tracking algorithm achieves better results than others. In particular, the network achieves a high compression rate with only a slight reduction in detection accuracy, which ensures it can be easily implemented on the embedded platform.
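The BN-weight evaluation criterion can be sketched at the channel level: the scale factor gamma of a batch-normalization layer measures how much each channel contributes, and low-|gamma| channels are pruned. The paper compresses at the layer level, so treat this as an illustrative variant with hypothetical names:

```python
import numpy as np

def prune_channels(gamma, keep_ratio):
    """Rank channels by |gamma| of the BN layer and keep the top fraction.
    Returns a boolean mask over channels (True = keep)."""
    k = max(1, int(round(len(gamma) * keep_ratio)))
    order = np.argsort(-np.abs(gamma))
    mask = np.zeros(len(gamma), dtype=bool)
    mask[order[:k]] = True
    return mask
```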

    • Dual-Weighted Lp-Norm RPCA Model and Its Application in Salt-and-Pepper Noise Removal

      2021, 36(1):133-146. DOI: 10.16337/j.1004-9037.2021.01.013


      Abstract:The robust principal component analysis (RPCA) model aims to estimate underlying low-rank and sparse structures from degraded observation data. Both the rank function and the L0-norm minimization in the RPCA model are nondeterministic polynomial (NP)-hard problems, which are usually solved via convex approximation, leading to an undesirable over-shrinkage problem. This paper proposes a dual-weighted Lp-norm RPCA model by combining the weighting method and the Lp-norm. We use a weighted Sp-norm low-rank term and a weighted Lp-norm sparse term to model the low-rank and sparse recovery problems under the RPCA framework, respectively, which provides better approximations to the rank minimization and the L0-norm minimization, thus improving the accuracy of the rank and sparsity estimation. To further demonstrate the performance of the proposed model, we apply the dual-weighted Lp-norm RPCA model to salt-and-pepper noise removal by exploiting the nonlocal self-similarity of images and combining the low rank of similar image-patch matrices with the sparsity of salt-and-pepper noise. Both qualitative and quantitative experiments demonstrate that the proposed model outperforms other state-of-the-art models, and the singular-value over-shrinkage analysis experiment also demonstrates that the model better handles the over-shrinkage of rank components.
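The weighted low-rank step reduces to weighted shrinkage of singular values: giving large singular values small weights shrinks them less, which is exactly how weighting mitigates over-shrinkage. The Lp generalization replaces the plain subtraction with a p-power proximal step, omitted here; an illustrative sketch of the weighted case:

```python
import numpy as np

def weighted_svt(M, weights, tau):
    """Weighted singular-value thresholding: shrink each singular value by
    its own weight, so dominant structure (large singular values, small
    weights) is preserved while noise-level values are zeroed."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shr = np.maximum(s - tau * weights, 0.0)
    return U @ np.diag(s_shr) @ Vt
```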

    • Coverless Information Hiding Based on Chaotic Scrambling of Image Blocks and DWT Transform

      2021, 36(1):147-155. DOI: 10.16337/j.1004-9037.2021.01.014


      Abstract:Coverless steganography can hide secret information without modifying the carrier by extracting the features of the carrier and mapping them to the information sequence. It therefore resists steganalysis strongly, but existing algorithms are still limited in hiding capacity, and most of them need to construct a library with a large number of images. In this paper, a coverless steganography algorithm based on chaotic scrambling of image blocks and the discrete wavelet transform (DWT) is proposed. The parameters of the chaotic transform are derived from the secret key, and the cover image is scrambled to generate multiple new images. Then, a block-wise DWT is used to generate the corresponding hash sequence according to the relationship between the low-frequency DWT coefficients of adjacent image blocks, and the corresponding index library is constructed. The cover image and secret key are sent to the receiver to realize the transmission of the secret information. Experimental results show that, compared with existing algorithms, the proposed algorithm not only greatly improves the capacity and success rate of information hiding but also has strong robustness. At the same time, the algorithm has a simple architecture and a small transmission load, which gives it strong practical value.
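The hash construction can be illustrated with the Haar low-frequency approximation of each block, which is proportional to the block mean: comparing adjacent blocks emits one bit per pair. The scan order and comparison rule here are hypothetical, not the paper's exact scheme:

```python
import numpy as np

def block_hash(img, b=8):
    """Hash bits from adjacent blocks: bit = 1 if a block's low-frequency
    (Haar approximation ~ block mean) value exceeds its right neighbour's."""
    h, w = img.shape
    crop = img[:h - h % b, :w - w % b]
    means = crop.reshape(h // b, b, w // b, b).mean(axis=(1, 3))
    bits = (means[:, :-1] > means[:, 1:]).astype(int)
    return bits.ravel()
```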

    • Hyperspectral Remote Sensing Land-Cover Classification Based on Improved 3D-CNN

      2021, 36(1):156-163. DOI: 10.16337/j.1004-9037.2021.01.015


      Abstract:A hyperspectral remote sensing image has dozens or even hundreds of bands. It is a comprehensive carrier of many kinds of information, including rich radiation, spatial and spectral information, and is widely used in terrain mapping. However, traditional hyperspectral image classification methods mostly focus on feature extraction in the spectral dimension and ignore spatial features, which limits classification accuracy. A three-dimensional convolutional neural network (3D-CNN) can convolve data in three dimensions simultaneously, so this paper uses a 3D-CNN deep network for land-cover classification of hyperspectral images and proposes an improved 3D-CNN based algorithm for hyperspectral remote sensing land-cover classification. The method reuses the extracted spatial and spectral features to fully exploit their value. In addition, this paper introduces the idea of shallow feature preservation and proposes a deep image classification network integrating shallow feature preservation, which further improves the accuracy of hyperspectral remote sensing land-cover classification. Experimental results on two commonly used hyperspectral remote sensing image data sets (Indian Pines and Pavia University) under the TensorFlow framework show that, compared with the basic 3D-CNN network, the classification accuracy of the proposed method is improved by nearly 2%.

    • DR Image Fusion of Aero-engine Turbine Blades Based on Regional Feature Pulse Coupled Neural Network

      2021, 36(1):164-175. DOI: 10.16337/j.1004-9037.2021.01.016


      Abstract:Aiming at the problem that a single transillumination energy cannot completely cover all the information in digital radiography (DR) of complex structures with large thickness ratios, we propose a pulse coupled neural network (PCNN) image fusion algorithm based on regional characteristics, taking aero-engine turbine blades as the research object. First, the sub-images acquired with multiple incremental tube voltages are decomposed into low-frequency and high-frequency sub-bands at multiple scales by the non-subsampled contourlet transform (NSCT). Second, the PCNN algorithm adjusts the connection strength toward the directions with the most obvious characteristics in the improved spatial frequency of each sub-band. Third, for the external excitation, the low-frequency sub-bands are evaluated with the regional mean square error, while the high-frequency sub-bands are evaluated with the sum-modified Laplacian; the two results are then processed through the firing map following the maximum principle. Finally, the fused images are obtained by the inverse NSCT. The experimental results show that the proposed method improves fusion results in terms of entropy, standard deviation, average gradient, clarity and spatial frequency, compared with classical fusion algorithms including methods based on the Laplacian pyramid transform. Our method improves image fusion performance by enriching the detailed information of the images and yielding higher quality.
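The sum-modified Laplacian used as the high-frequency activity measure has a standard form: |2I(x,y) − I(x−1,y) − I(x+1,y)| + |2I(x,y) − I(x,y−1) − I(x,y+1)|, accumulated over a region. A numpy sketch with replicate edge handling assumed:

```python
import numpy as np

def sum_modified_laplacian(img):
    """SML focus/activity measure: sum of horizontal and vertical modified
    Laplacian magnitudes; high values indicate sharp detail in the region."""
    p = np.pad(img, 1, mode="edge")
    ml = (np.abs(2 * p[1:-1, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:]) +
          np.abs(2 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1]))
    return ml.sum()
```

In fusion, the coefficient (or sub-band region) with the larger SML is the one selected by the maximum principle.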

    • Electrical Equipment Detection in Infrared Images Based on Transfer Learning of Mask-RCNN

      2021, 36(1):176-183. DOI: 10.16337/j.1004-9037.2021.01.017


      Abstract:Infrared fault image recognition is an important method for diagnosing electrical equipment, but recognition relies on manually created bounding boxes over objects. In this paper, to improve detection efficiency, automatic semantic segmentation of infrared images is investigated to recognize one or more electrical equipment objects. The proposed method is based on Mask-RCNN, which has demonstrated good performance on instance segmentation. Our main contribution is applying transfer learning to Mask-RCNN, where importance sampling and parameter mapping are conducted to alleviate the data shortage in pixel-level annotation. Experimental results on real-world datasets show that the improved version of Mask-RCNN can extract the shapes of electrical equipment even with limited pixel-level annotated data. The proposed algorithm provides an efficient basis for the subsequent steps of fault region detection and classification.

    • Blind Recognition Algorithm for OVSF Code Based on Fast Walsh-Hadamard Transform

      2021, 36(1):184-198. DOI: 10.16337/j.1004-9037.2021.01.018


      Abstract:Based on an in-depth study of the recursive construction principle, code tree structure model, mathematical foundations and allocation principle of orthogonal variable spreading factor (OVSF) codes, a blind recognition algorithm based on the fast Walsh-Hadamard transform is proposed for non-cooperative reception of wideband code division multiple access (WCDMA) signals. By using the inheritance relation and orthogonality of OVSF codes together with cyclic shifts of the data, and combining them with the fast Walsh-Hadamard transform, the proposed algorithm eliminates the ambiguity of the de-spread data and reduces the computational complexity. Theoretical analysis and experimental results show that the proposed algorithm can rapidly de-spread and blindly recognize multiple OVSF codes in the downlink channel of the WCDMA system under non-cooperative conditions, with no prior information and at low signal-to-noise ratio, and that it is reliable, effective and practical. In actual measurements, the algorithm takes 8.2 ms to recognize at least 20 OVSF codes in three frames of data simultaneously, with a recognition accuracy above 95%, which gives it high engineering application value.
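The fast Walsh-Hadamard transform itself is standard: a length-n transform computed in O(n log n) butterfly passes, which lets a receiver correlate a data block against all Walsh-ordered codes of a given length at once instead of testing codes one by one. A minimal sketch:

```python
def fwht(a):
    """Iterative fast Walsh-Hadamard transform (Hadamard ordering).
    len(a) must be a power of two; returns the unnormalized transform."""
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                # Butterfly: sum and difference of paired elements.
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a
```

A large-magnitude output coefficient then flags the spreading code most correlated with the received chips.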
