Special issue

1 Pedestrian Detection and Tracking Algorithm Based on GhostNet and Attention Mechanism

WANG Lihui , YANG Xianzhao , LIU Huikang , HUANG Jingjing

2022, 37(1):108-121. DOI: 10.16337/j.1004-9037.2022.01.009

[Abstract](925) [HTML](1912) [PDF 4.40 M](2360)

Abstract:
To address the low accuracy and slow speed of traditional object detection and tracking algorithms in complex scenes, a pedestrian detection and tracking algorithm based on GhostNet and an attention mechanism is proposed. First, the backbone network of YOLOv3 is replaced with GhostNet while the multi-scale prediction part is retained; the Ghost module reduces the parameters and computation of the deep network model, and the attention mechanism is integrated into the Ghost module to give higher weight to important features. Then, generalized intersection over union (GIoU), a direct evaluation index for object detection, is introduced to guide the regression task. Finally, the Deep-Sort algorithm is used for tracking. Experiments on public datasets show that the mean average precision (mAP) of the improved model reaches 92.53% and the frame rate is 2.5 times that of the YOLOv3 model; the tracking accuracy of the proposed algorithm is better than that before the improvement and that of other algorithms; and the algorithm tracks multiple pedestrians in complex scenes accurately and effectively, with strong robustness.
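
The GIoU index mentioned in this abstract can be illustrated with a minimal sketch. It assumes axis-aligned boxes with non-zero area given as (x1, y1, x2, y2); the function names `giou` and `giou_loss` are illustrative and are not taken from the paper.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2) with non-zero area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """A common regression loss built on GIoU (assumed form: 1 - GIoU)."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the value stays informative for non-overlapping boxes: it becomes negative and approaches -1 as the boxes move apart, which is why it can guide box regression.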

2 Power Target Detection in Aerial Images Based on SSD Deep Neural Network

SHI Xin , Hua Chenbing , ZHANG Kai , WANG Caijian , WANG Shiyong

2022, 37(1):207-216. DOI: 10.16337/j.1004-9037.2022.01.018

[Abstract](892) [HTML](1112) [PDF 2.64 M](2434)

Abstract:
To improve the intelligent design of rural power distribution networks, this paper proposes using deep neural networks to identify, in aerial images, the typical power targets that affect distribution network design. First, an unmanned aerial vehicle (UAV) is used to obtain high-spatial-resolution aerial images of the distribution network planning area, and a dataset containing 11 categories and 32 118 typical power targets is constructed. Then, after a practical comparison of the Faster-RCNN, YOLO and single shot multibox detector (SSD) methods, SSD is selected to detect and identify typical power targets. Finally, feasible areas for distribution network pole planning are obtained. Experimental results show that, compared with Faster-RCNN and YOLO, SSD can effectively detect and identify typical power targets such as substations, distribution rooms and box transformers, and the recognition accuracy reaches 68.5%, which meets practical requirements. The proposed method provides technical support for power design, reduces labor cost and improves the efficiency of distribution network design.

3 JPEG Image Digital Watermarking System Based on FPGA

CHEN Xin , SHI Dong , ZHANG Ying

2022, 37(1):240-246. DOI: 10.16337/j.1004-9037.2022.01.021

[Abstract](1014) [HTML](832) [PDF 1.74 M](2073)

Abstract:
This paper designs a JPEG compressed-domain digital watermarking system based on FPGA that embeds watermark information in JPEG images in real time. After the watermark information is preprocessed by binarization and the Arnold transform, the watermark is embedded into the quantized DCT coefficients with an improved LSB embedding algorithm. Then, to complete the JPEG compressed-domain watermark embedding, the modified DCT coefficients are entropy coded and the JPEG file is generated. Finally, the design is implemented and tested on a joint system consisting of an FPGA development board and a host computer. The results show that the proposed algorithm offers good invisibility and robustness, as well as high throughput.
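
The two preprocessing and embedding steps named in this abstract can be sketched in software. The sketch assumes a square binary watermark and the standard Arnold cat map; the embedding shown is plain LSB substitution, not the paper's improved variant, which the abstract does not describe. All function names are illustrative.

```python
def arnold(img, iterations=1):
    """Scramble a square N x N image with the Arnold cat map."""
    n = len(img)
    for _ in range(iterations):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(x + y) % n][(x + 2 * y) % n] = img[x][y]
        img = out
    return img


def arnold_inverse(img, iterations=1):
    """Undo arnold() by applying the inverse map the same number of times."""
    n = len(img)
    for _ in range(iterations):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(2 * x - y) % n][(y - x) % n] = img[x][y]
        img = out
    return img


def embed_lsb(coeffs, bits):
    """Write one watermark bit into the least significant bit of each
    quantized coefficient (plain LSB, assumed here for illustration)."""
    out = list(coeffs)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out


def extract_lsb(coeffs, count):
    """Read back the first `count` embedded bits."""
    return [c & 1 for c in coeffs[:count]]
```

Because each coefficient changes by at most 1, the visual impact is small; the scrambling spreads the watermark so that localized damage does not erase a contiguous part of it.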

4 Overview of Non-Line-of-Sight Imaging Technology Based on Transient Images

LIANG Yun , SONG Boyan

2022, 37(1):21-34. DOI: 10.16337/j.1004-9037.2022.01.002

[Abstract](1190) [HTML](2220) [PDF 3.26 M](3061)

Abstract:
A transient image is a fast image sequence that records how a scene responds to light pulses. By capturing information in the time dimension, transient images make use of the scene information contained in the time domain, and non-line-of-sight imaging is their most typical application in the field of scene analysis. Non-line-of-sight imaging is a technology for imaging objects or scenes outside the line of sight, and it has emerged at home and abroad in recent years. This paper classifies transient imaging methods according to their imaging mechanisms, and compares a variety of non-line-of-sight imaging algorithms based on transient images according to their algorithmic principles or implementation effects. Finally, the challenges of non-line-of-sight imaging technology based on transient images are summarized, and future development directions are discussed.

5 Change Detection of Remote Sensing Image Based on Siamese Multi-scale Attention Network and Its Anti-noise Ability Research

DU Junhan , LAI Jian , WANG Xue , TAN Kun

2022, 37(1):35-48. DOI: 10.16337/j.1004-9037.2022.01.003

[Abstract](820) [HTML](1720) [PDF 4.94 M](2347)

Abstract:
Remote sensing image change detection has led to great breakthroughs in land cover observation. However, noise in remote sensing images degrades the performance of change detection methods. To improve the accuracy of change detection, a change detection method based on the Siamese multi-scale attention network (SMA-Net) is proposed. First, atrous convolutional layers with different dilation rates are combined with a spatial attention module to form the multi-scale feature extraction module. Then, the feature maps on the same layer are subtracted to obtain difference feature maps, and a channel attention mechanism is used to enhance feature extraction. Finally, the change detection result is output by fully connected layers. The proposed method is compared with other change detection methods on the original remote sensing image data with and without noise. The experimental results show that change detection methods that use the spectral information of a single pixel as input, such as the support vector machine method, are susceptible to image noise, whereas convolutional neural network (CNN) based methods are much less susceptible. The proposed SMA-Net outperforms the other methods in accuracy and is less susceptible to image noise.

6 Multi-size Occlusion Face Detection Based on Hierarchical Attention Enhancement Network

WANG Linge , JIANG Baojun , PAN Tiejun

2022, 37(1):73-81. DOI: 10.16337/j.1004-9037.2022.01.006

[Abstract](801) [HTML](1708) [PDF 3.28 M](1938)

Abstract:
Building on the single shot multibox detector (SSD) single-stage face detection model, this paper proposes a multi-size occluded face detection method based on a hierarchical attention enhancement network to address the poor accuracy of face detection under complex partial occlusion. First, an attention enhancement mechanism is introduced on the multi-layer original feature maps of the SSD base network to increase the response of the visible region of the face. Then, different anchor sizes are designed for the different enhanced feature layers to improve the hierarchical recognition of multi-scale occluded faces. During training, the attention loss function, the classification loss function and the regression loss function are fused into a multi-task loss function to jointly optimize the network parameters. Experiments on the WIDER FACE dataset and the MAFA occluded face dataset show that the detection accuracy and timeliness of the method are better than those of current mainstream occluded face detection methods.

7 Defogging Algorithm Based on Power Exponent Stretching

LI Zhongguo , WU Haochen , FU Qigao , XI Qian , WU Jinkun

2022, 37(1):62-72. DOI: 10.16337/j.1004-9037.2022.01.005

[Abstract](638) [HTML](1584) [PDF 2.75 M](1861)

Abstract:
After comparing the three RGB (red-green-blue) channels and the three HSV (hue-saturation-value) channels of clear and foggy pictures of the same scene, a haze removal algorithm based on power exponent stretching is proposed. First, the image is transformed from RGB to HSV space. Then the saturation component and the brightness component are stretched with a power exponent between 1 and 3, and both are adjusted to a suitable range. After the saturation and brightness are stretched, the image is transformed from HSV back to RGB space to generate the enhanced, defogged image. Taking the mean saturation, brightness index, information entropy and contrast as defogging evaluation indexes, the optimal combination of stretching power exponents is determined and used to complete the defogging process. Whether to search for the optimal power exponents again is decided according to the change in the average image saturation or the length of the time interval. Finally, the fog removal algorithm is implemented in Python with multi-process programming. When the image resolution is 400 pixel×300 pixel, it takes 5.077 to 6.160 s to optimize the power exponent parameters on the Raspberry Pi. For single-frame defogging, the first frame takes longer, at 0.308 s, and each of the other frames takes 0.077 to 0.168 s.
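
The RGB to HSV, power stretch, HSV to RGB pipeline described in this abstract can be sketched with the Python standard library. The abstract does not say how the stretched components are "adjusted to a suitable range", so the min-max rescaling below is an assumption, as are the function names and the default exponents.

```python
import colorsys


def rescale(values):
    """Min-max rescale to [0, 1] (assumed range adjustment, not from the paper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    return [(v - lo) / (hi - lo) for v in values]


def power_stretch(pixels, s_power=2.0, v_power=2.0):
    """Stretch saturation and value of 8-bit RGB pixels with power exponents.

    pixels: list of (r, g, b) tuples with channels in 0..255.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0) for r, g, b in pixels]
    hues = [h for h, _, _ in hsv]
    sats = rescale([s ** s_power for _, s, _ in hsv])
    vals = rescale([v ** v_power for _, _, v in hsv])

    out = []
    for h, s, v in zip(hues, sats, vals):
        r, g, b = colorsys.hsv_to_rgb(h, s, v)
        out.append((round(r * 255), round(g * 255), round(b * 255)))
    return out
```

In a search for the best exponent pair, a caller would evaluate `power_stretch` over a grid of `(s_power, v_power)` values in the range 1 to 3 and score each result with the evaluation indexes listed in the abstract.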

8 Methane Premixed Flame Equivalence Ratio Measurement Based on Feature Engineering and Support Vector Machine

CHEN Changyou , FU Yuwen , TU Peichi , SHU Wen , YANG Jiansheng

2022, 37(1):194-206. DOI: 10.16337/j.1004-9037.2022.01.017

[Abstract](855) [HTML](1757) [PDF 1.35 M](1930)

Abstract:
Measuring the flame equivalence ratio by flame color modeling is an emerging research direction in combustion diagnosis technology. At present, modeling methods mainly use the blue/green color feature (B/G) in the RGB (red-green-blue) model as the modeling input; however, modeling the equivalence ratio by fitting a single color ratio has large uncertainty and measurement errors. Therefore, this paper proposes using multi-color feature parameters under different color models as the modeling inputs. First, the digital flame color distribution (DFCD) technology is used to process the methane premixed flame image and obtain the region of interest (RoI) of the flame images. Second, the flame color feature variables are comprehensively analyzed, and 36 multi-color features under different color models are designed and extracted. Then, Spearman rank correlation analysis and the random forest (RF) algorithm are used to screen the color features, and 16 high-quality features are selected. Finally, the optimal support vector machine (SVM) parameters are selected using the grid search method (GSM), and the equivalence ratio measurement model of the premixed methane flame is trained by SVM on the constructed feature subset. The algorithm is compared with the traditional BP neural network and the extreme learning machine (ELM) algorithm. Experimental results show that the algorithm has better regression prediction performance, and the mean square error (MSE) decreases to 0.023.
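
The Spearman rank correlation used for feature screening in this abstract can be computed as the Pearson correlation of the ranks. The sketch below is a generic implementation with average ranks for ties; how the paper thresholds or combines the scores with the random forest step is not stated, so no selection rule is shown.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A color feature whose Spearman coefficient against the equivalence ratio is close to +1 or -1 varies monotonically with it, even when the relationship is not linear.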

9 Comparative Analysis of EEG Time-Frequency Features of Motor Execution and Motor Imagination Under Visual Guidance

WU Biao , QIN Bing , WU Xin , ZHOU Lu , QIAN Zhiyu , LI Weitao , GAO Fan , ZHU Qiaoqiao

2022, 37(1):164-172. DOI: 10.16337/j.1004-9037.2022.01.014

[Abstract](1145) [HTML](1876) [PDF 2.03 M](2400)

Abstract:
Brain-computer interface (BCI) technology based on motor imagery (MI) has developed rapidly in the past few decades and has been widely used in various fields. To compare the difference in brain electrical activity between motor execution (ME) and MI, a method based on time-frequency domain analysis of the electroencephalogram (EEG) is proposed. Visually guided upper limb ME and MI control experiments are conducted, and the EEG signals of ten healthy subjects are collected and preprocessed. The signals are then decomposed and converted into feature values for each band through time-frequency analysis. Finally, the power values of each band for ME and MI are analyzed and the power differences between ME and MI in each band are computed. The results show that the alpha wave is the dominant wave during MI, while the delta wave is the dominant wave during ME. Compared with MI, the alpha wave during ME shows a downward trend and the delta wave shows an upward trend. The results show that there is a significant difference in EEG between ME and MI, which is important for improving the real-time performance and generality of MI-based BCI systems.
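
The per-band power comparison in this abstract can be illustrated with a simplified sketch. It uses a plain discrete Fourier transform over one window rather than the paper's time-frequency analysis method, which the abstract does not specify, and the band limits in `BANDS` are conventional values assumed here.

```python
import cmath
import math

# Conventional EEG band limits in Hz (assumed; the abstract gives none).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def band_powers(signal, fs):
    """Sum the DFT power of a real signal inside each band.

    signal: list of samples, fs: sampling rate in Hz.
    """
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal))
        power = abs(coeff) ** 2 / n ** 2
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += power
    return powers


def dominant_band(signal, fs):
    """Name of the band holding the most power."""
    powers = band_powers(signal, fs)
    return max(powers, key=powers.get)
```

Computing `band_powers` for an ME trial and an MI trial and subtracting the dictionaries entry by entry gives the kind of per-band power difference the abstract reports.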

10 Dual-Path Siamese Network Visual Tracking Method with Attention Mechanism

XIE Jiang , ZHU Yan , SHEN Tao , ZENG Kai , LIU Yingli

2022, 37(1):94-107. DOI: 10.16337/j.1004-9037.2022.01.008

[Abstract](1057) [HTML](2038) [PDF 4.01 M](2098)

Abstract:
Traditional visual tracking methods based on the Siamese network extract pairs of frames from a large number of videos and train on them independently offline at the training stage. They lack updates of the model features and neglect background information, so the tracking accuracy is low in complex environments such as background clutter. In response to these problems, this paper proposes a dual-path Siamese network visual tracking method with an attention mechanism. The method consists mainly of a feature extractor part and a feature fusion part. In the feature extractor part, the residual network is improved and a dual-path network model is designed. By combining the residual network's reuse of features from earlier layers with the dense network's extraction of new features, the two networks are spliced for feature extraction. At the same time, dilated convolution replaces traditional convolution, which improves the resolution while maintaining a given receptive field. This dual-path feature extraction method can implicitly update the model features and thus obtain more accurate image feature information. Moreover, the attention mechanism is introduced into the feature fusion part, where it assigns different weights to different parts of the feature maps. In the channel domain, the method screens valuable target image information and enhances the interdependence between channels. In the spatial domain, it pays more attention to locally important information and learns richer contextual connections, which effectively improves the accuracy of object tracking. To confirm the effectiveness of the method, experiments are conducted on the OTB100 and VOT2016 datasets. Precision, success rate and expected average overlap rate are used as the evaluation criteria; their values are 0.868, 0.641 and 0.350, respectively, on the two datasets, which are increases of 5.1%, 2.0% and 0.9% over the benchmark model. Experimental results show that the proposed method makes full use of the advantages of the different networks and, while ensuring model accuracy, adapts well to target deformation, reduces interference between similar objects, and achieves more stable tracking.

11 Medical Image Synthesis Based on Optimized Cycle-Generative Adversarial Networks

CAO Guogang , LIU Shunkun , MAO Hongdong , ZHANG Shu , CHEN Ying , DAI Cuixia

2022, 37(1):155-163. DOI: 10.16337/j.1004-9037.2022.01.013

[Abstract](1066) [HTML](1654) [PDF 1.56 M](2304)

Abstract:
The radiation treatment planning system needs CT images to calculate the dose distribution accurately, but sometimes only clinical MR images can be obtained. Image synthesis creates images of a new modality from another modality, which enriches image information. This paper presents a new method for synthesizing high-precision, high-definition CT images from MR images. To synthesize clear pseudo CT images, an improved cycle-consistent generative adversarial network (CycleGAN) with a densely connected convolutional network (DenseNet) is proposed. By avoiding the loss of input information and the vanishing of gradient information, the improved network can synthesize more credible CT images. Trained and tested on a dataset of 18 patients, the proposed method reduces the mean absolute error by 5.9%, increases the structural similarity by 1.1% and increases the peak signal-to-noise ratio by 4.4% compared with the original method. Compared with the deep convolutional neural network and the atlas-based method, the improved CycleGAN reduces the relative error by 0.065% and 0.55%, respectively. Owing to the advantages of the deep learning model, the proposed method can synthesize more realistic CT images, which better meets the requirements of dose calculation in the radiation treatment planning system.
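
Two of the image quality metrics reported in this abstract have simple closed forms. The sketch below treats an image as a flat list of intensities; the `max_value` default of 255 is an assumption for 8-bit data and would differ for CT intensities in Hounsfield units.

```python
import math


def mean_absolute_error(reference, synthetic):
    """Mean absolute intensity difference between two equal-length images."""
    return sum(abs(r - s) for r, s in zip(reference, synthetic)) / len(reference)


def psnr(reference, synthetic, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthetic)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Lower mean absolute error and higher peak signal-to-noise ratio both indicate that the synthetic CT is closer to the reference CT.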

12 Person Re-identification Based on Hard Negative Sample Confusion to Enhance Robustness of Features

Hao Ling , Duan Jizhong , Pang Jian

2022, 37(1):122-133. DOI: 10.16337/j.1004-9037.2022.01.010

[Abstract](661) [HTML](1570) [PDF 11.46 M](2443)

Abstract:
With the rise of deep learning, person re-identification has gradually become a hot topic in computer vision. Given a query image, it performs cross-camera retrieval to find the images that match the query identity. However, owing to factors such as background and illumination differences across cameras, the collected pedestrian datasets contain a large number of hard negative samples, and models trained on these samples perform poorly and lack robustness. Therefore, to improve the model's ability to discriminate such negative samples, a novel method is designed that synthesizes images carrying hard negative sample information through a confusion factor. For each input batch of images, a similarity measure is used to find the hard negative sample corresponding to each image, new images carrying cues from the negative samples are synthesized through the confusion factor, and the model is prompted to mine the negative sample information in a supervised manner, thereby improving its robustness. Extensive comparative experiments show that the proposed method achieves high performance on mainstream datasets, and the ablation study demonstrates its effectiveness.

13 Dam Crack Detection Method Based on Universal Target Detector

ZHAO Fan , LI Linyun , WEI Renjie , ZHANG Zhiwei

2022, 37(2):405-414. DOI: 10.16337/j.1004-9037.2022.02.013

[Abstract](933) [HTML](1759) [PDF 4.23 M](3184)

Abstract:
To address the problem that existing dam defect detection methods can only roughly locate the area where a crack lies, a dam crack extraction method based on a universal target detector is proposed. First, a two-target detector is designed to detect the crack area and the water stain area simultaneously as two independent targets in the image. Second, the geometric position relationship between the crack area and the water stain area associated with the same crack is established. Finally, the upper boundary of the water stain bounding box contained in the crack bounding box is uniformly sampled, and curve fitting is performed on the sampling points to obtain the crack curve. The experimental results show that the proposed algorithm not only accurately detects the crack and water stain bounding boxes but also fits the crack curve completely, and it has been verified in the detection of dam defects with millimeter-level width.

14 Natural Scene Text Detection Based on Local and Global Dual-feature Fusion

LI Yunhong , YAN Junhong , HU Lei

2022, 37(2):415-425. DOI: 10.16337/j.1004-9037.2022.02.014

[Abstract](796) [HTML](1292) [PDF 1.89 M](1954)

Abstract:
Text in natural scenes varies in shape, orientation and category, so scene text detection remains a challenge. To better separate text from non-text and accurately locate text areas in natural scene images, this paper proposes a text detection network that fuses local and global features. Multi-scale global feature fusion is realized through skip connections, and the identity residual block is improved to realize local fine-grained feature fusion, thereby reducing the loss of feature information and strengthening feature extraction in text regions. The polygon offset text field is combined with text edge information to locate text regions accurately. To evaluate the effectiveness of the method, multiple sets of comparative experiments are conducted on the classic datasets ICDAR2015 and CTW1500. The experimental results show that the method performs better at text detection in complex scenes.

15 Survey on New Progresses of Deep Learning Based Computer Vision

LU Hongtao , LUO Mukun

2022, 37(2):247-278. DOI: 10.16337/j.1004-9037.2022.02.001

[Abstract](3777) [HTML](4404) [PDF 12.48 M](5273)

Abstract:
Deep learning has recently achieved great breakthroughs in several areas of computer vision. Various new deep learning methods and deep neural network models have been proposed, and their performance has been continually improved. This paper surveys the progress in applications of deep learning to computer vision since 2016, with emphasis on some typical networks and models. We first review the mainstream deep neural network models for image classification, including standard models and lightweight models. Then, we introduce the main methods and models for different computer vision tasks, including object detection, image segmentation and image super-resolution. Finally, we summarize deep neural network architecture search methods.

16 Image Interpolation-Based Few-Shot Learning of Handwritten Digit Recognition

SONG Wei , XIE Jianping , GAO Qian , XIE Liangxu , XU Xiaojun

2022, 37(2):298-307. DOI: 10.16337/j.1004-9037.2022.02.004

[Abstract](1018) [HTML](1198) [PDF 1.80 M](1902)

Abstract:
The high performance of artificial intelligence (AI) usually depends on large amounts of data for training parameters. How to improve predictive performance when data are insufficient, i.e., few-shot learning, is an important research subject in the AI field. An image interpolation-based few-shot learning strategy is proposed, and its feasibility is verified on the task of handwritten digit image recognition. The few-shot learning performance of a dense neural network and a convolutional neural network on MNIST and USPS handwritten digit image recognition is systematically studied. The results show that the image interpolation-based data augmentation method clearly improves the feature extraction ability and learning efficiency of neural networks on small-sample data. Moreover, selecting an appropriate scaling coefficient for image interpolation can further optimize the few-shot learning performance of the neural network.
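
Image interpolation with a scaling coefficient, as mentioned in this abstract, can be illustrated with bilinear resizing. The abstract does not state which interpolation kernel the authors use or how the rescaled images enter training, so the bilinear kernel, the corner-aligned sampling and the function name are assumptions.

```python
def bilinear_resize(img, new_h, new_w):
    """Resize a grayscale image (list of rows) with bilinear interpolation.

    Output corners are aligned with input corners.
    """
    h, w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        # Source row coordinate for this output row.
        y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

For example, a 28×28 MNIST digit rescaled by a coefficient of 2 would be produced by `bilinear_resize(digit, 56, 56)`.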

17 Dynamic Visual SLAM Based on Unified Geometric-Semantic Constraints

Shen Yehu , Chen Jiahao , Li Xing , Jiang Quansheng , Xie Ou , Niu Xuemei , Zhu Qixin

2022, 37(3):597-608. DOI: 10.16337/j.1004-9037.2022.03.010

[Abstract](1580) [HTML](1191) [PDF 1.53 M](8851)

Abstract:
Traditional visual simultaneous localization and mapping (SLAM) algorithms rely on the scene rigidity assumption. When dynamic objects exist in the scene, however, the stability of the SLAM system is affected and the accuracy of pose estimation is reduced. Most existing methods apply probabilistic strategies and geometric constraints to reduce the impact of a small number of dynamic objects, but these methods fail when the number of dynamic objects in the scene is high. To deal with this problem, a novel algorithm that combines the dynamic visual SLAM algorithm with a multi-target tracking algorithm is proposed. First, a semantic instance segmentation network together with geometric constraints is introduced to help the visual SLAM module effectively separate static feature points from dynamic ones, while also achieving better multi-target tracking performance. Furthermore, the trajectory and velocity of the moving objects can be estimated, which provides decision information for autonomous robot navigation. The experimental results on the KITTI dataset show that the localization accuracy of the proposed algorithm is improved by about 28% compared with the ORB-SLAM2 algorithm in dynamic environments.

18 Virtual Try-on Network for Graduation Photo Generation

SHENG Peizhuo , LI Tingyu , LI Tianbao , SONG Dan , LIU An’an

2022, 37(5):1145-1156. DOI: 10.16337/j.1004-9037.2022.05.019

[Abstract](993) [HTML](839) [PDF 2.98 M](1920)

Abstract:
To solve the problem that existing virtual try-on methods cannot be applied to academic dress, a virtual try-on method for academic dress is proposed. The method first trains an image-based virtual try-on network composed of a clothing deformation module and a virtual try-on module, and then feeds the portrait and the academic dress image through the trained network to generate try-on results. Next, the generated academic dress try-on results are composited with a specific background through the background fusion module. For the experiments, a new dataset of academic dress and long skirts is constructed. The experimental results show that the proposed algorithm greatly reduces the impact of the clothes in the original portrait on the academic dress try-on, completes the try-on task better and generates better fitting results.

19 A Survey on Application of Deep Learning in Photoacoustic Image Reconstruction from Limited-View Sparse Data

SUN Zheng , HOU Yingsa

2022, 37(5):971-983. DOI: 10.16337/j.1004-9037.2022.05.001

[Abstract](1507) [HTML](1052) [PDF 4.04 M](3982)

Abstract:
Photoacoustic imaging (PAI) is a newly emerging hybrid functional imaging modality. High-quality image reconstruction is the key to improving imaging accuracy. Incomplete photoacoustic (PA) measurements usually reduce the imaging depth and the quality of images rendered by conventional reconstruction techniques such as back projection (BP), time reversal (TR), and delay and sum (DAS). Iterative algorithms can solve this issue to a certain extent, at the cost of a high computational burden and the need for a properly selected regularization tool. In recent years, deep learning (DL) has exhibited promising performance in the field of medical imaging. It has also shown great potential for reconstructing images with high quality and high efficiency. This paper surveys DL-based PA image reconstruction from sparsely sampled data in a limited view. The current methods are summarized and classified, and their advantages and limitations are discussed.

20 A Privacy-Preserving Medical Image Classification Scheme Based on Gray Code Scrambling and Block Chaotic Scrambling

Chen Guoming , Yuan Zeduo , Long Shun , Mai Shutao

2022, 37(5):984-996. DOI: 10.16337/j.1004-9037.2022.05.004

[Abstract](965) [HTML](793) [PDF 4.70 M](1984)

Abstract:
This paper proposes a medical image encryption scheme based on Gray code scrambling and block chaotic scrambling (GBCS), which is optimized for medical image encryption and applied to privacy-preserving classification. First, the image is sliced into bit-planes. Then, the different bit-planes of the image are scrambled by the Gray code and divided into blocks, and chaotic encryption is carried out on these blocks. Finally, the encrypted images are classified by a deep learning network. We quantitatively analyze the privacy protection and classification performance of GBCS through cross-validation simulations on public breast cancer and glaucoma datasets, and perform a security analysis of the method in terms of histogram, information entropy and resistance to attack. The experimental results demonstrate the effectiveness of our method. The performance gap of medical images before and after GBCS encryption is within an acceptable range. The proposed scheme better balances the conflict between performance and privacy protection requirements, and effectively resists adversarial sample attacks.
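
The building blocks named in this abstract (Gray code, bit-planes, chaotic block scrambling) can be sketched as follows. The abstract does not give the exact GBCS construction, so this shows only generic primitives: the reflected binary Gray code, bit-plane slicing, and a block order derived from a logistic map whose parameters `x0` and `r` are assumed values acting as a key.

```python
def to_gray(value):
    """Reflected binary Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def from_gray(gray):
    """Inverse of to_gray()."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bit_planes(pixels, bits=8):
    """Split pixel values into `bits` binary planes, least significant first."""
    return [[(p >> k) & 1 for p in pixels] for k in range(bits)]


def merge_planes(planes):
    """Rebuild pixel values from bit-planes produced by bit_planes()."""
    count = len(planes[0])
    return [sum(plane[i] << k for k, plane in enumerate(planes)) for i in range(count)]


def logistic_permutation(n, x0=0.3141, r=3.99):
    """Permutation of range(n) obtained by sorting a logistic-map sequence."""
    sequence = []
    x = x0
    for _ in range(n):
        x = r * x * (1 - x)
        sequence.append(x)
    return sorted(range(n), key=lambda i: sequence[i])
```

One plausible pipeline is to Gray-encode each pixel, slice the result into bit-planes, cut each plane into blocks, and reorder the blocks with `logistic_permutation`; decryption regenerates the same permutation from the key and inverts each step.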

21 Blind Ultrasound Image Deblurring via Quadratic Sparse Extreme Channel Prior

MA Qian , HUANG Chengquan , ZHENG Zehong

2022, 37(5):1092-1100. DOI: 10.16337/j.1004-9037.2022.05.014

[Abstract](719) [HTML](523) [PDF 1.90 M](1933)

Abstract:
After deblurring with the extreme channel prior, a blurry ultrasound image is not sparse enough, so the extreme channel sparsity constraint may not hold. Therefore, to make full use of the image channel information, a blind ultrasound image deblurring algorithm via a quadratic sparse extreme channel prior is proposed, which enhances the sparsity of the ultrasound image obtained after deblurring. First, relevant theoretical proofs and experiments are presented to illustrate the feasibility of the quadratic sparse extreme channel prior for constraining blurry ultrasound images. Then, making full use of the prior information of the dark and bright channels, the half-quadratic splitting method is used to estimate the intermediate image and the blur kernel. Finally, the Fourier transform is used to obtain the final clear image and blur kernel. Experimental results on the ultrasound image set show the feasibility and superiority of the proposed algorithm compared with other current ultrasound image deblurring methods.
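
The dark and bright channels that this abstract relies on have a standard definition: the minimum (or maximum) over the color channels and over a local patch around each pixel. The sketch below computes both for a small RGB image; the patch radius is an assumed parameter, and the quadratic sparsity term of the paper is not reproduced.

```python
def extreme_channels(img, radius=1):
    """Return (dark, bright) channels of an RGB image.

    img: H x W list of (r, g, b) tuples.
    radius: half-width of the square patch around each pixel (assumed value).
    """
    h, w = len(img), len(img[0])
    # Per-pixel minimum and maximum over the color channels.
    min_c = [[min(p) for p in row] for row in img]
    max_c = [[max(p) for p in row] for row in img]

    dark = [[0] * w for _ in range(h)]
    bright = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            dark[y][x] = min(min_c[j][i] for j in ys for i in xs)
            bright[y][x] = max(max_c[j][i] for j in ys for i in xs)
    return dark, bright
```

Blur averages neighboring intensities, which raises dark-channel values and lowers bright-channel values; priors of this family exploit that shift to favor sharp solutions.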

22 A Polarization Image Fusion Method of Visible Light in Water Navigation Scene

JIANG Yang , XIAO Changshi , WEN Yuanqiao , ZHAN Wenqiang , CHEN Qianqian

2022, 37(6):1376-1390. DOI: 10.16337/j.1004-9037.2022.06.018

[Abstract](660) [HTML](552) [PDF 3.68 M](2060)

Abstract:
To improve the visual perception ability of unmanned surface vehicles (USVs) in harsh navigation scenes, a visible-light polarization image fusion method for water navigation scenes is proposed based on the hue, saturation, value (HSV) color space. Fusion rules for different regions are formulated in accordance with the polarization characteristics of the water navigation scene. The color information of the original scene is then fused in the HSV color space, and the fused result is tested by performing semantic segmentation of harsh navigation scene images. The most striking result is that the pixel accuracy (PA) in the flare scene is 0.768 2. The experimental results indicate that the proposed method can enhance image contrast, highlight edge contour information, and stably obtain feature information with strong contrast and better target characteristics in harsh navigation scenes, which improves the USV’s performance in such scenes to a certain extent.

23 Improved Faster RCNN Algorithm for Moyamoya Disease Detection

XU Jiawei , WU Jie , LEI Yu , GU Yuxiang

2022, 37(6):1391-1400. DOI: 10.16337/j.1004-9037.2022.06.019

[Abstract](773) [HTML](562) [PDF 1.34 M](1597)

Abstract:
To prevent complications caused by moyamoya disease from threatening patients’ lives, timely and effective diagnosis of moyamoya disease is needed. An improved Faster RCNN algorithm for moyamoya disease detection is presented. First, digital subtraction angiography (DSA) images of the internal carotid artery are extracted and enhanced. The ratio of the training set, validation set and test set is 6:2:2. The ResNet101 network is used as the feature extraction network to avoid blurring or loss of vascular features during convolution and pooling. Combined with the region proposal network (RPN), the moyamoya disease lesion is located. Then, ROI pooling in the Faster RCNN model is replaced with ROI Align for feature mapping to avoid the error caused by quantization. The average precision (AP) is used as the evaluation index of the detection performance of the algorithm. The APs of normal samples and moyamoya disease samples are 99.23% and 89.39%, respectively. Experimental results show that the proposed method can detect moyamoya disease rapidly and effectively. It can accurately detect the location of moyamoya disease lesions in the complex vascular network, and provides technical support for the auxiliary diagnosis of moyamoya disease.

24 Blind Image Denoising and Blurring by Total Variational Extreme Channels Prior

HU Xue , HUANG Chengquan , FENG Run , ZHOU Lihua , ZHENG Lan

2022, 37(3):643-656. DOI: 10.16337/j.1004-9037.2022.03.014

[Abstract](842) [HTML](649) [PDF 4.22 M](2314)

Abstract:
The image prior is the key to solving ill-posed problems in image restoration. Because the extreme channels prior deblurring algorithm easily produces ringing artifacts and cannot suppress noise when the image is significantly noisy, we take advantage of the total variation based method, which can remove noise while preserving edge features, and propose an effective blind image denoising and deblurring model based on a total variation extreme channels prior. First, we introduce the total variation model into the dark channel and the bright channel to protect image edges and eliminate noise and ringing artifacts. Second, the half-quadratic splitting technique is used to solve the non-convex problem of the model and estimate the clear image. Finally, the blur kernel of the image is estimated by iterative multi-scale blind deconvolution. Experimental results show that the proposed model can effectively protect the edge details of the image and eliminate ringing artifacts while suppressing noise. Compared with representative methods of recent years, the robustness, subjective visual quality and objective evaluation indexes of the model are significantly improved.

25 Dual-Attention Network for Acute Pancreatitis Diagnosis with CT Images

Zhang Jinyi , Wan Peng , Sun Liang , Zhang Daoqiang

2022, 37(1):147-154. DOI: 10.16337/j.1004-9037.2022.01.012

[Abstract](887) [HTML](1702) [PDF 2.27 M](2430)

Abstract:
Acute pancreatitis (AP) is one of the most common digestive diseases, yet medical image analysis of AP still depends on simple hand-crafted features with low efficiency and accuracy, which is not commensurate with the harmfulness of AP. Owing to the anatomical variation of the pancreas and the complications of AP, AP has complex imaging manifestations, and the appearance of lesions varies widely among patients and lesion types. This makes diagnosing acute pancreatitis from CT images challenging. To address these issues, we propose a dual-attention network for acute pancreatitis diagnosis. Specifically, the dual-attention network uses the global feature to generate a local attention feature for each local feature at different stages, and the final classification is facilitated by fusing multi-scale attention features that focus on lesions of different scales. Meanwhile, channel-domain attention is used to produce attention features based on the dependencies between channels to improve the model’s feature representation ability. We evaluate the proposed method on a collected real-world acute pancreatitis dataset. The results show that the proposed network achieves superior performance in acute pancreatitis diagnosis compared with several competing methods, with the sensitivity improved by 3.4%. In addition, the improvement in area under the curve (AUC) that the proposed network brings to ResNet is 2.7% higher than that of other attention models such as SENet.
