Volume 40, Issue 5, 2025 Table of Contents
    • Research Progress on Multimodal Continual Learning Methods

      2025, 40(5):1122-1138. DOI: 10.16337/j.1004-9037.2025.05.002

      Abstract:Multimodal continual learning (MMCL), as a significant research direction in the fields of machine learning and artificial intelligence, aims to achieve continuous knowledge accumulation and task adaptation through the integration of multiple modal data (such as images, text, and audio). Compared with traditional single-modal learning methods, MMCL not only enables parallel processing of multi-source heterogeneous data, but also effectively retains existing knowledge while adapting to new task requirements, demonstrating immense application potential in intelligent systems. This paper provides a systematic review of multimodal continual learning. Firstly, the fundamental theoretical framework of MMCL is elaborated from three dimensions: basic concepts, evaluation systems, and classical single-modal continual learning methods. Secondly, the advantages and challenges of MMCL in practical applications are thoroughly analyzed: despite its significant advantages in multimodal information fusion, it still faces critical challenges such as modal imbalance and heterogeneous fusion, which not only constrain the performance of current methods but also indicate future research directions. The paper then comprehensively reviews the research status and latest advances in MMCL methods from four main aspects: replay-based, regularization-based, parameter isolation-based, and large model-based approaches (a replay sketch follows below). Finally, a forward-looking perspective on the future development trends of MMCL is presented.
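      As a concrete illustration of the replay-based family surveyed above, the following minimal Python sketch implements a reservoir-sampled memory buffer that mixes stored old-task examples into new-task batches. The class and its parameters are illustrative assumptions, not a method from the surveyed literature.

```python
# Hypothetical sketch of the replay-based strategy: a reservoir-sampled
# memory buffer keeps a uniform sample of everything seen so far, so old
# modalities and tasks can be replayed alongside new-task data.
import random

class ReplayBuffer:
    """Reservoir-sampling memory for replay-based continual learning."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []          # stored (input, label, modality) triples
        self.seen = 0           # total examples observed so far

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Replace a random slot with probability capacity / seen, so
            # every observed example is retained with equal probability.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

buf = ReplayBuffer(capacity=2)
for i in range(10):
    buf.add((f"x{i}", i, "image"))
print(buf.sample(2))
```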

    • Multi-agent Collaborative Recognition Algorithm for Open-Domain Multimodal 3D Models

      2025, 40(5):1139-1152. DOI: 10.16337/j.1004-9037.2025.05.003

      Abstract:To address the challenge of recognizing unlabeled 3D models in open-domain settings, this paper proposes a multi-agent collaborative algorithm for open-domain multimodal 3D model recognition. The algorithm employs a reinforcement learning framework to simulate human cognitive processes. Within this framework, a multi-agent system extracts and fuses multimodal information, enabling a comprehensive understanding of the feature space while leveraging the similarity of multimodal samples to enhance model training. Additionally, a progressive pseudo-label generation method is introduced into the reinforcement learning environment: it dynamically adjusts clustering constraints to generate reliable pseudo-labels for a subset of unlabeled data during training (sketched below), mimicking human exploratory learning of unknown data. These mechanisms collectively update the network parameters based on environmental feedback rewards, effectively controlling the extent of exploratory learning and ensuring accurate learning of unknown categories. Experimental results show that the average recognition accuracy of the proposed method on the 3D dataset OS-MN40 reaches 65.6%. After transfer to the image domain, the classification accuracy on the CIFAR10 dataset reaches 95.6%, providing a universal and efficient solution for open-domain 3D model recognition research.
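      A minimal sketch of how a progressive pseudo-labeling step of this kind could look. The clustering backend (KMeans), the distance-based constraint, and the linear threshold schedule are all assumptions, not the paper's exact mechanism.

```python
# Progressive pseudo-label generation (illustrative): only samples close to
# a cluster centre receive a label, and the distance threshold is relaxed
# as training progresses so more unlabeled data is gradually admitted.
import numpy as np
from sklearn.cluster import KMeans

def progressive_pseudo_labels(features, n_clusters, epoch, max_epoch,
                              tight=0.5, loose=1.5):
    """features: (N, D) array; returns labels with -1 for 'still unknown'."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    dists = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
    # Dynamic clustering constraint: the cut-off grows from `tight` to
    # `loose` standard deviations over the course of training.
    frac = epoch / max_epoch
    threshold = (tight + (loose - tight) * frac) * dists.std()
    return np.where(dists < threshold, km.labels_, -1)

feats = np.random.rand(200, 64)
labels = progressive_pseudo_labels(feats, n_clusters=5, epoch=10, max_epoch=50)
print((labels >= 0).mean())   # fraction currently pseudo-labeled
```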

    • Multi-modal Adaptive Fusion Method for Aerospace Target Tracking and Control Information

      2025, 40(5):1153-1164. DOI: 10.16337/j.1004-9037.2025.05.004

      Abstract:Conventional single-mode information fusion methods for aerospace target tracking and control often struggle with complex scenarios such as sudden sensor failure, abrupt target interference, and intense electromagnetic jamming. To address these limitations, this paper proposes a multi-modal adaptive fusion method that accommodates the kinematic characteristics of typical aerospace targets, including ballistic missiles, near-space hypersonic glide vehicles, and aircraft. To further enhance the sensor network’s adaptability to dynamic environments, a dynamic-threshold-based multi-modal switching strategy is designed (sketched below), serving as the core mechanism for adaptive fusion while effectively preventing frequent false or delayed mode switching. Through systematic innovation, a multi-modal fusion architecture is established that surpasses the adaptive capability of any single algorithm. Experimental results demonstrate that the proposed method significantly improves both the full-course tracking and control capability for aerospace targets and the accuracy of trajectory fusion processing.
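      The following sketch shows one way a dynamic-threshold switching rule with hysteresis can suppress false or delayed mode switches. The class, the hold count, and the noise-scaled threshold are illustrative assumptions, not the paper's design.

```python
# Dynamic-threshold mode switching with hysteresis (illustrative): the
# motion model changes only after the innovation statistic stays beyond an
# adaptive threshold for `hold` consecutive steps, which debounces switching.
class ModeSwitcher:
    def __init__(self, hold=3, scale=3.0):
        self.hold = hold        # consecutive exceedances required to switch
        self.scale = scale      # threshold = scale * running noise estimate
        self.count = 0
        self.mode = "cruise"    # e.g. ballistic / glide / cruise dynamics

    def update(self, innovation, noise_level, candidate_mode):
        threshold = self.scale * noise_level   # dynamic, not fixed
        if abs(innovation) > threshold and candidate_mode != self.mode:
            self.count += 1
            if self.count >= self.hold:        # debounce: avoid frequent flips
                self.mode, self.count = candidate_mode, 0
        else:
            self.count = 0
        return self.mode

sw = ModeSwitcher()
for z in [0.1, 5.0, 5.2, 5.1]:
    print(sw.update(z, noise_level=1.0, candidate_mode="glide"))
# cruise, cruise, cruise, glide: the switch fires only after 3 exceedances
```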

    • Emotional Video Captioning Based on Fine-Grained Visual and Audio-Visual Dual-Branch Fusion

      2025, 40(5):1165-1176. DOI: 10.16337/j.1004-9037.2025.05.005

      Abstract:Emotional video captioning, a cross-modal task integrating visual semantic parsing and emotional perception, faces the core challenge of accurately capturing the emotional cues embedded in visual content. Existing methods have two notable limitations: first, they insufficiently explore the fine-grained semantic correlations between video subjects (such as humans and objects) and their appearance and motion features, leaving visual content understanding without refined support; second, they neglect the auxiliary value of the audio modality in emotional discrimination and content semantic alignment, which restricts the comprehensive utilization of cross-modal information. To address these issues, this paper proposes a framework based on fine-grained visual and audio-visual dual-branch fusion. Specifically, the fine-grained visual feature fusion module models the fine-grained semantic associations between video entities and visual contexts through pairwise interactions and deep integration of visual, object, and motion features, thereby achieving refined parsing of video content. The audio-visual dual-branch global fusion module constructs a cross-modal interaction channel (sketched below) to deeply fuse the integrated visual features with audio features, fully leveraging the supplementary role of audio information in emotional cue transmission and semantic constraint. Validation experiments on public benchmark datasets show that the proposed method outperforms comparative methods such as CANet and EPAN across evaluation metrics, achieving an average improvement of 4% over EPAN in emotional metrics, an average increase of 0.5 in semantic metrics, and an average boost of 0.7 in comprehensive metrics. The results demonstrate that the proposed method effectively enhances the quality of emotional video captioning.
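      A minimal PyTorch sketch of the audio-visual global fusion idea: visual tokens attend to audio tokens through cross-attention, with a residual connection keeping the visual stream dominant. Dimensions and the single-block design are assumptions, not the paper's architecture.

```python
# Cross-modal attention fusion (illustrative): audio acts as key/value so
# it supplements, rather than replaces, the visual representation.
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual, audio):
        # visual: (B, Tv, dim), audio: (B, Ta, dim)
        fused, _ = self.attn(query=visual, key=audio, value=audio)
        return self.norm(visual + fused)   # residual cross-modal fusion

v, a = torch.randn(2, 16, 512), torch.randn(2, 32, 512)
print(AudioVisualFusion()(v, a).shape)     # torch.Size([2, 16, 512])
```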

    • Multimodal Aspect-Level Sentiment Analysis Based on GCN and Target Visual Feature Enhancement

      2025, 40(5):1177-1192. DOI: 10.16337/j.1004-9037.2025.05.006

      Abstract:Multimodal aspect-level sentiment analysis aims to integrate image and text data to accurately predict the emotional polarity of aspect words. However, existing methods still have significant limitations in accurately locating text-related image region features and effectively processing the information interaction between modalities. At the same time, biased understanding of context information within modalities introduces additional noise. To solve these problems, a multimodal aspect-level sentiment analysis model based on graph convolutional network and target visual feature enhancement (GCN-TVFE) is proposed. First, the contrastive language-image pre-training (CLIP) model is used to process text, aspect words, and image data; by computing the text-image similarity and the aspect-image similarity and combining the two, a quantitative evaluation of how well the image matches both the text and the aspect words is obtained. Then, the Faster R-CNN model is used to quickly and accurately identify and locate target regions in the image, further enhancing the model’s ability to extract image features related to the text. Next, the text graph structure is constructed from the dependency syntactic relations of the text, and the image graph structure is generated by the K-nearest neighbor (KNN) algorithm (sketched below), so that the GCN can deeply mine the feature information within each modality. Finally, a multi-layer multimodal interactive attention mechanism is used to effectively capture the correlations between aspect words and text, and between target visual features and image-generated text descriptions, which significantly reduces noise interference and enhances feature interaction between modalities. Experimental results show that the proposed model achieves superior comprehensive performance on the public datasets Twitter-2015 and Twitter-2017, verifying its effectiveness in multimodal sentiment analysis.
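      A sketch of the KNN image-graph construction step: region features become graph nodes, and each node is linked to its K nearest neighbours to form the adjacency fed to the GCN branch. The parameter choices and the symmetrisation step are assumptions.

```python
# KNN image-graph construction (illustrative): build an undirected
# adjacency over visual region features for downstream graph convolution.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_image_graph(region_feats, k=5):
    """region_feats: (N, D) visual region features -> symmetric adjacency."""
    adj = kneighbors_graph(region_feats, n_neighbors=k, mode="connectivity")
    adj = adj.maximum(adj.T)          # symmetrise for an undirected GCN graph
    return adj.toarray()

feats = np.random.rand(20, 768)       # e.g. 20 Faster R-CNN region features
print(build_image_graph(feats).sum(axis=1))  # each node has >= k neighbours
```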

    • Large Language Model-Guided Multi-modal Time Series-Semantic Prediction Framework

      2025, 40(5):1193-1206. DOI: 10.16337/j.1004-9037.2025.05.007

      Abstract:Multi-modal prediction tasks typically require the simultaneous modeling of heterogeneous data, including text, images, and structured numerical information, to achieve robust inference and explainable decision-making in complex environments. Traditional uni-modal or weak-fusion methods struggle to consistently address semantic alignment, information complementation, and cross-source reasoning, while the inherent black-box nature of deep models limits result interpretability. Meanwhile, the large language model (LLM) has demonstrated strong capabilities in semantic understanding, instruction following, and reasoning, yet gaps remain in its performance on time series modeling, cross-modal alignment, and real-time knowledge integration. To address these challenges, this paper proposes an LLM-guided multi-modal time series-semantic prediction framework. By combining variational inference-based time series modeling with LLM-driven semantic analysis, the approach establishes a collaborative “temporal-semantic-decision” mechanism: the temporal module extracts historical behavior patterns using recurrent latent variables and attention mechanisms; the semantic module distills high-level semantics and interpretations through domain-specific language models and multi-modal encoders; and both components are jointly optimized via a learnable fusion module (sketched below), which also provides uncertainty annotations and explainable reports. Experiments on the StockNet, CMIN-US, and CMIN-CN datasets demonstrate that the approach achieves an accuracy of 63.54%, an improvement of 5.31 percentage points over the best baseline, and a Matthews correlation coefficient (MCC) of 0.223. This study offers a unified paradigm for multi-modal time series prediction and underscores its promising applications in financial technology.
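      One plausible shape for a learnable temporal-semantic fusion module is a gated combination: a gate learned from both representations decides, per dimension, how much of the temporal versus the LLM-derived semantic embedding to keep. All names here are illustrative assumptions, not the paper's implementation.

```python
# Gated temporal-semantic fusion (illustrative sketch).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, temporal, semantic):
        # g close to 1 trusts the temporal branch; close to 0, the semantic.
        g = self.gate(torch.cat([temporal, semantic], dim=-1))
        return g * temporal + (1 - g) * semantic

t, s = torch.randn(4, 256), torch.randn(4, 256)
print(GatedFusion()(t, s).shape)   # torch.Size([4, 256])
```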

    • Incremental Attribute Reduction Algorithm Based on Single-Valued Neutrosophic Dominance Conditional Entropy

      2025, 40(5):1207-1221. DOI: 10.16337/j.1004-9037.2025.05.008

      Abstract:In the big data environment, the continuous growth of data in ordered decision information systems leads to dynamic changes in the dominance relationship between objects, and efficient computation of attribute reduction has become a pressing problem. Therefore, an incremental single-valued neutrosophic dominance conditional entropy is proposed, and a new incremental attribute reduction algorithm is constructed accordingly. First, the single-valued neutrosophic dominance conditional entropy is defined for the single-valued neutrosophic ordered decision information system. Subsequently, for four different types of newly arriving objects, the incremental update mechanism of this entropy is studied in depth, and an incremental attribute reduction algorithm is designed according to the update mechanism (a simplified, non-incremental entropy sketch follows below). Finally, six UCI datasets with dominance relations are selected for a comparative experimental analysis of the effectiveness and efficiency of the incremental and non-incremental algorithms. Experimental results show that the proposed incremental attribute reduction algorithm significantly improves the computational efficiency of data processing while maintaining the same classification accuracy.
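      For orientation, the sketch below computes dominating sets and one common textbook form of dominance conditional entropy, without the single-valued neutrosophic membership handling or the incremental update that are the paper's actual contributions; the exact entropy formula used in the paper may differ.

```python
# Dominance conditional entropy, simplified non-incremental version
# (assumption: the common form H(D|B) averaging per-object class entropy
# within dominating sets; the neutrosophic machinery is omitted).
import numpy as np

rng = np.random.default_rng(0)

def dominating_set(X, i):
    """Indices of objects at least as good as object i on every attribute."""
    return np.where((X >= X[i]).all(axis=1))[0]

def dominance_conditional_entropy(X, y):
    n, H = len(X), 0.0
    for i in range(n):
        dom = dominating_set(X, i)        # always non-empty: contains i
        _, counts = np.unique(y[dom], return_counts=True)
        p = counts / counts.sum()
        H -= (p * np.log2(p)).sum()       # class entropy within the set
    return H / n

X = rng.random((30, 4))
y = rng.integers(0, 3, 30)
print(dominance_conditional_entropy(X, y))
```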

    • Stacked Random Forest Based on Multiple Randomness and Privacy Protection

      2025, 40(5):1222-1238. DOI: 10.16337/j.1004-9037.2025.05.009

      Abstract:As an effective ensemble learning algorithm for classification and regression tasks, the random forest (RF) still faces challenges in generalization ability and privacy protection. In response, this paper proposes an improved Bernoulli-multinomial stacked random forest (BMS-RF) algorithm based on multiple randomness and privacy protection. The basic idea is to apply a Bernoulli-distribution dropout to part of the feature vectors when selecting candidate splitting features and splitting points for each decision tree. Splitting features and splitting points are then drawn randomly from two multinomial distributions (see the sketch below), and each decision tree adds noise through a non-numerical query index mechanism to maintain its privacy protection. When integrating classifiers, a multi-layer stacked structure randomly projects the output of the previous layer and concatenates it with the source training set as new input, so that each forest shares the spatial information of the source samples and the classification performance of the base learners improves layer by layer. Theoretical analysis of the consistency and privacy guarantees of the algorithm shows that BMS-RF can significantly improve classification performance through the stacked structure. Experimental results on 14 small and medium-sized datasets verify that the algorithm not only reduces running time but also has better generalization performance; under strong privacy protection, it achieves classification performance similar to RF variants while simplifying the structure and improving running speed.
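      The two randomisation stages described above can be sketched as follows: a Bernoulli mask drops part of the feature set before candidate selection, then the split feature and split point are drawn from multinomial distributions. The scoring and fallback details are assumptions, not the paper's exact procedure.

```python
# Bernoulli dropout + multinomial split selection (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)

def random_split(X, scores, keep_prob=0.7):
    """X: (n, d) data; scores: (d,) positive per-feature quality (e.g. gain)."""
    mask = rng.random(X.shape[1]) < keep_prob        # Bernoulli dropout
    cand = np.flatnonzero(mask)
    if cand.size == 0:                               # keep at least one feature
        cand = np.arange(X.shape[1])
    w = scores[cand] / scores[cand].sum()
    feat = rng.choice(cand, p=w)                     # multinomial over features
    point = rng.choice(np.unique(X[:, feat]))        # multinomial over points
    return feat, point

X = rng.random((100, 8))
print(random_split(X, scores=np.ones(8)))
```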

    • Service Traffic Prediction Algorithm Based on GCN-LSTM Network for 6G Wireless Networking

      2025, 40(5):1239-1249. DOI: 10.16337/j.1004-9037.2025.05.010

      Abstract:With the rapid development of mobile communication technology, wireless networks face multiple challenges, including resource allocation, traffic analysis, and 6G base station optimization. Effective prediction of wireless network traffic helps allocate network resources reasonably and provides users with more stable and efficient services, ensuring network performance. To solve the problem of low prediction accuracy in current wireless network traffic prediction caused by insufficient mining of spatial and temporal features, this paper studies intelligent traffic prediction algorithms based on deep learning and proposes a prediction algorithm based on a graph convolutional network-long short-term memory (GCN-LSTM) model (sketched below). Experimental results show that the accuracy of this algorithm reaches 84.71% in actual network applications, outperforming other deep learning-based traffic prediction methods and providing strong support for the rational allocation of 6G network resources and efficient service.
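      A compact PyTorch sketch of the GCN-LSTM pattern: a graph convolution mixes traffic across neighbouring base stations at each time step, and an LSTM models the resulting sequence. Layer sizes and the normalised adjacency are illustrative assumptions.

```python
# GCN-LSTM traffic predictor (illustrative): spatial mixing per step, then
# temporal modeling over the sequence, then a next-step forecast per node.
import torch
import torch.nn as nn

class GCNLSTM(nn.Module):
    def __init__(self, n_nodes, feat=1, hidden=64):
        super().__init__()
        self.w = nn.Linear(feat, hidden)
        self.lstm = nn.LSTM(n_nodes * hidden, 128, batch_first=True)
        self.out = nn.Linear(128, n_nodes)            # next-step traffic

    def forward(self, x, adj):
        # x: (B, T, N, feat); adj: (N, N) normalised adjacency
        h = torch.relu(torch.einsum("ij,btjf->btif", adj, self.w(x)))
        h, _ = self.lstm(h.flatten(2))                # (B, T, N*hidden)
        return self.out(h[:, -1])                     # (B, N)

x, adj = torch.randn(2, 12, 10, 1), torch.eye(10)
print(GCNLSTM(10)(x, adj).shape)                      # torch.Size([2, 10])
```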

    • Unsupervised Specific Emitter Identification Method Based on Directed Graph Connectivity

      2025, 40(5):1250-1260. DOI: 10.16337/j.1004-9037.2025.05.011

      Abstract:Specific emitter identification (SEI) refers to the technique of distinguishing emitters by utilizing unique and subtle features in received electromagnetic signals. Due to its powerful feature extraction ability, deep learning has gradually become the main means of implementing SEI. However, in non-cooperative scenarios, labeled samples generally cannot be obtained to train the neural network, and the number of emitters to be identified is unknown. Therefore, this paper proposes an unsupervised SEI method based on directed graph connectivity that does not require the number of emitters to be specified. Drawing inspiration from hierarchical clustering, the radio frequency fingerprint feature space is first divided into multiple sub-clusters based on local density, and the relationships between feature vectors are mapped to a directed graph. Then, based on the connectivity of the directed graph, the sub-clusters are automatically merged to obtain the final identification result (sketched below). Experimental results show that under low signal-to-noise ratio conditions, the proposed method accurately identifies individual emitters, and its identification performance improves by 7.1% to 53.1% over the benchmark algorithms.
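      The merging stage can be sketched with a standard graph library: directed links between sub-clusters are collapsed by connected-component analysis, so the emitter count falls out of the graph structure rather than being specified. How the edges are generated from density and distance is assumed to come from the earlier clustering stage.

```python
# Merging sub-clusters via directed-graph connectivity (illustrative).
import networkx as nx

def merge_subclusters(edges, n_subclusters):
    """edges: list of (src, dst) directed links between sub-cluster ids."""
    g = nx.DiGraph()
    g.add_nodes_from(range(n_subclusters))
    g.add_edges_from(edges)
    label = {}
    for k, comp in enumerate(nx.weakly_connected_components(g)):
        for c in comp:
            label[c] = k          # sub-clusters in one component merge
    return label

print(merge_subclusters([(0, 1), (2, 1), (3, 4)], 5))
# {0: 0, 1: 0, 2: 0, 3: 1, 4: 1} -> two emitters found automatically
```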

    • Few-Shot Specific Communication Emitter Identification Method Based on Broad Learning and Attention Mechanism

      2025, 40(5):1261-1269. DOI: 10.16337/j.1004-9037.2025.05.012

      Abstract:Under few-shot conditions, existing deep learning algorithms struggle to extract individual features of specific communication emitters, and the recognition rate decreases. To solve this problem, this paper proposes a recognition method that constructs a shallow neural network by fusing an attention mechanism with broad learning. First, broad learning is introduced to simplify the network model and reduce the overfitting caused by small samples (a minimal broad learning sketch follows below). Second, a node attention module is constructed to improve the feature extraction ability of the broad network under few-shot conditions. Finally, the effectiveness of the proposed method is verified on a public dataset. The results show that, compared with deep learning methods trained on a small number of samples, the proposed method alleviates overfitting, strengthens the feature extraction ability of broad learning, and improves recognition accuracy.
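      A minimal broad learning system sketch: random feature nodes and enhancement nodes are generated once, and only the output weights are solved by ridge regression, so the network stays shallow and cheap to train on few samples. The node counts and activation are assumptions, and the paper's attention module is omitted.

```python
# Broad learning system, minimal sketch (no attention module).
import numpy as np

rng = np.random.default_rng(0)

def broad_learning_fit(X, Y, n_feat=100, n_enh=200, lam=1e-3):
    Wf = rng.standard_normal((X.shape[1], n_feat))
    Z = np.tanh(X @ Wf)                        # random feature nodes
    We = rng.standard_normal((n_feat, n_enh))
    H = np.tanh(Z @ We)                        # random enhancement nodes
    A = np.hstack([Z, H])
    # Ridge regression for the only trained weights: the output layer.
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return Wf, We, W

def broad_learning_predict(X, Wf, We, W):
    Z = np.tanh(X @ Wf)
    return np.hstack([Z, np.tanh(Z @ We)]) @ W

X, Y = rng.random((50, 20)), rng.random((50, 3))
params = broad_learning_fit(X, Y)
print(broad_learning_predict(X, *params).shape)   # (50, 3)
```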

    • Outage Performance of Multiple User Unmanned Aerial Vehicle Relay Networks Based on MIMO-NOMA

      2025, 40(5):1270-1282. DOI: 10.16337/j.1004-9037.2025.05.013

      Abstract:Multiple-input-multiple-output (MIMO) and non-orthogonal multiple access (NOMA) techniques are widely applied to unmanned aerial vehicle (UAV) communications due to their superior spectral efficiency. UAVs can serve as relays to provide flexible and reliable connections for users, which gives these networks significant research value. To overcome the problems of multi-user interference and clustering in MIMO-NOMA UAV relay networks, a new downlink transmission model based on amplify-and-forward (AF) relaying is proposed. First, a three-dimensional stochastic geometry framework is applied to user clustering, and an AF-based precoding scheme is proposed according to the NOMA principle. Second, analytical expressions for the outage probability (OP) of paired users are derived from the statistics of the equivalent propagation channel of the AF relay transmission model, and the asymptotic results and diversity order of the OP at high signal-to-noise ratio (SNR) are obtained using a first-order Taylor expansion. Finally, simulations validate the theoretical analysis through the impact of key variables on OP performance. Compared with existing MIMO-NOMA transmission schemes, the proposed scheme effectively improves the OP performance of multi-user UAV relay networks.
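      As a numerical counterpart to this kind of closed-form OP analysis, the sketch below runs a Monte Carlo outage check for a plain two-user downlink NOMA pair under Rayleigh fading with a fixed power split; the relay, precoding, and MIMO details of the paper are deliberately omitted, and all parameter values are assumptions.

```python
# Two-user NOMA outage probability, Monte Carlo sketch (no relay/MIMO).
import numpy as np

rng = np.random.default_rng(1)

def noma_outage(snr_db, a_far=0.8, r_far=0.5, r_near=1.0, trials=200_000):
    snr = 10 ** (snr_db / 10)
    g_far = rng.exponential(0.5, trials)       # weaker-channel (far) user
    g_near = rng.exponential(1.0, trials)
    th_far, th_near = 2**r_far - 1, 2**r_near - 1
    # Far user decodes its own signal, treating the near signal as noise.
    sinr_far = a_far * snr * g_far / ((1 - a_far) * snr * g_far + 1)
    # Near user first cancels the far signal (SIC), then decodes its own.
    sic = a_far * snr * g_near / ((1 - a_far) * snr * g_near + 1) >= th_far
    sinr_near = (1 - a_far) * snr * g_near
    return np.mean(sinr_far < th_far), np.mean(~sic | (sinr_near < th_near))

print(noma_outage(20))    # (OP of far user, OP of near user)
```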

    • Robust Detection Method for AI-Generated Images Based on CNN-Transformer Hybrid Architecture

      2025, 40(5):1283-1293. DOI: 10.16337/j.1004-9037.2025.05.014

      Abstract:With the rapid development of deep generative models, the realism of synthetic images has been continuously improving, and generative technologies, from image generation to face manipulation, have become deeply integrated into daily life, raising concerns about image authenticity. Moreover, mainstream image classification models are mainly pre-trained on natural scene datasets with rich and varied styles, whereas a single prompt can generate a large amount of data with an obvious homogeneity problem; this skews the balance of learning difficulty and leaves traditional binary classification training with insufficient generalization ability in generated-image detection. To address these issues, we propose a detection method for the imbalance between hard and easy samples that requires no modification of the existing classification model: it establishes an effective data augmentation paradigm that self-enhances the generated data to expand its diversity, thereby balancing the learning difficulty of the model. At the same time, we use a corrected class cross-entropy loss that penalizes hard and easy samples with different sensitivities (sketched below). The proposed method achieved the best results in the Real and Fake Image Recognition competition of the Computer Vision Application Challenge held by the Artificial Intelligence Society of Shandong Province in November 2023.
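      One standard way to penalize hard and easy samples with different sensitivities is a focal-style modulation of cross-entropy; the sketch below uses that form as a stand-in for the paper's corrected loss, whose exact definition is not given here.

```python
# Focal-style corrected cross-entropy (assumption: stands in for the
# paper's loss). Easy, well-classified samples are down-weighted so the
# abundant homogeneous generated images stop dominating the gradient.
import torch
import torch.nn.functional as F

def corrected_ce(logits, targets, gamma=2.0):
    logp = F.log_softmax(logits, dim=-1)
    logp_t = logp.gather(1, targets.unsqueeze(1)).squeeze(1)
    p_t = logp_t.exp()                       # confidence on the true class
    return (-(1 - p_t) ** gamma * logp_t).mean()

logits = torch.randn(8, 2)                   # real-vs-generated logits
targets = torch.randint(0, 2, (8,))
print(corrected_ce(logits, targets))
```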

    • Improved F-LOAM Algorithm Based on Three-Stage De-distortion and Hierarchical Downsampling Mechanism

      2025, 40(5):1294-1305. DOI: 10.16337/j.1004-9037.2025.05.015

      Abstract:The traditional fast LiDAR odometry and mapping (F-LOAM) algorithm performs a two-stage de-distortion process, but only the first stage de-distorts the feature points, while the second stage is used for building the map, which leads to insufficient accuracy in pose estimation. To solve this problem, this paper proposes an improved three-stage de-distortion mechanism combined with a voxelized-grid hierarchical downsampling mechanism (sketched below) to improve the real-time performance of the algorithm. The improved F-LOAM algorithm shows excellent results on the KITTI dataset. The three-stage de-distortion mechanism and the hierarchical downsampling strategy not only effectively reduce the computational burden, but also ensure the validity of feature points and the accuracy of the global map.
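      A minimal voxel-grid downsampling sketch: points are bucketed into voxels and replaced by their centroids; running it at more than one voxel size gives the hierarchical behaviour. The voxel sizes are illustrative assumptions.

```python
# Voxel-grid downsampling (illustrative): one centroid per occupied voxel.
import numpy as np

def voxel_downsample(points, voxel=0.2):
    """points: (N, 3) -> centroids of occupied voxels."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    cnt = np.bincount(inv).astype(float)
    out = np.zeros((inv.max() + 1, 3))
    for d in range(3):
        out[:, d] = np.bincount(inv, weights=points[:, d]) / cnt
    return out

pts = np.random.rand(10_000, 3) * 10
coarse = voxel_downsample(pts, voxel=0.5)   # hierarchical: rerun with 0.2, 0.5...
print(len(pts), "->", len(coarse))
```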

    • A Few-Shot Learning Algorithm for Defect Image Generation and Data Augmentation Based on DID-AugGAN

      2025, 40(5):1306-1321. DOI: 10.16337/j.1004-9037.2025.05.016

      Abstract:To address the issues of low quality, lack of realism, and poor diversity in defect images generated by generative adversarial network (GAN) under small-sample conditions, this paper proposes a defect image generation algorithm, named defect image data augmentation GAN (DID-AugGAN), aiming at enhancing defect image data under limited sample conditions. First, to overcome the difficulty of traditional convolutional networks in effectively learning non-rigid features in images from limited datasets, we design a learnable offset convolution to improve the model’s capability in capturing semantic information. Second, to prevent the loss of critical defect features and enhance the correlation among local features, we introduce a multi-scale coordinate attention module, which focuses on defect location information. Third, to enhance the discriminator’s ability to distinguish local details in input images, we redesign its architecture, transforming it from a conventional feedforward network into a UNet-like structure with symmetric encoding and decoding pathways. Finally, we conduct comparative experiments between DID-AugGAN and the baseline algorithm on the Rail-4c track fastener defect dataset, and validate the generated images using the MobileNetV3 classification network. Experimental results demonstrate that the proposed method significantly improves inception score (IS) while effectively reducing Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS). Moreover, the classification accuracy and F1-score of MobileNetV3 are also improved. The proposed DID-AugGAN can stably generate high-quality defect images, effectively augment defect data samples, and meet the requirements of downstream tasks.
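      The multi-scale coordinate attention module can be pictured through the plain coordinate attention block it builds on, sketched below in PyTorch: features are pooled along height and width separately so the attention map keeps precise defect-location information. Channel sizes, the reduction ratio, and the omission of the multi-scale wrapper are all assumptions.

```python
# Coordinate attention block (illustrative, single-scale).
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    def __init__(self, ch, red=8):
        super().__init__()
        mid = max(ch // red, 8)
        self.conv1 = nn.Conv2d(ch, mid, 1)
        self.act = nn.ReLU()
        self.conv_h = nn.Conv2d(mid, ch, 1)
        self.conv_w = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = x.mean(dim=3, keepdim=True)                  # pool along W -> (B,C,H,1)
        xw = x.mean(dim=2, keepdim=True).transpose(2, 3)  # pool along H -> (B,C,W,1)
        y = self.act(self.conv1(torch.cat([xh, xw], dim=2)))
        yh, yw = y.split([h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                   # (B,C,H,1)
        aw = torch.sigmoid(self.conv_w(yw.transpose(2, 3)))   # (B,C,1,W)
        return x * ah * aw         # direction-aware, location-preserving gate

print(CoordAttention(64)(torch.randn(1, 64, 32, 32)).shape)
```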

    • MonoDI: Monocular 3D Object Detection Based on Fusing Depth Instances

      2025, 40(5):1322-1332. DOI: 10.16337/j.1004-9037.2025.05.017

      Abstract:Monocular 3D object detection aims to locate the 3D bounding boxes of objects in a single 2D input image, an extremely challenging task in the absence of image depth information. To address the poor detection performance caused by missing depth information during inference on 2D images and by background noise in depth maps, this paper proposes a monocular 3D object detection method called MonoDI, which integrates depth instances. The key idea is to utilize depth information generated by an effective depth estimation network, combine it with instance segmentation masks to obtain depth instances, and then integrate the depth instances with 2D image information to aid the regression of 3D object information. To better use the depth instance information, this paper designs an iterative depth-aware attention fusion module (iDAAFM), integrating depth instance features with 2D image features to obtain a feature representation with clear object boundaries and depth information. A residual convolutional structure (sketched below) replaces the usual single convolution during training and inference to ensure stability and efficiency when processing the fused information. Further, a 3D bounding box uncertainty auxiliary task assists the main task in learning bounding box generation during training and improves the accuracy of monocular 3D object detection. Finally, the effectiveness of the method is validated on the KITTI dataset: the proposed method improves 3D detection accuracy for the vehicle class at the moderate difficulty level by 4.41 percentage points over the baseline, and outperforms comparative methods such as MonoCon and MonoLSS. It also achieves superior results on the KITTI-nuScenes cross-dataset evaluation.
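      A minimal sketch of the residual convolutional structure mentioned above: two convolutions plus an identity shortcut replace a single convolution, stabilising the processing of the fused depth-image features. Channel sizes and layer choices are assumptions.

```python
# Residual convolutional block (illustrative replacement for a single conv).
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))   # identity shortcut stabilises fusion

print(ResidualConvBlock(64)(torch.randn(1, 64, 24, 80)).shape)
```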

    • A Lightweight Road Crack Detection Model Based on Improved YOLOv8n

      2025, 40(5):1333-1347. DOI: 10.16337/j.1004-9037.2025.05.018

      Abstract:To address the challenges that road crack appearance is susceptible to environmental interference, that fine cracks have a high miss-detection rate, and that inspection equipment has limited computational resources, a lightweight detection model, MCA-YOLO-A, is proposed. The model is based on YOLOv8n, replacing the original backbone with the lighter MobileNetV3 feature extraction network and integrating a coordinate attention (CA) module that accurately captures spatial information, thereby enhancing feature extraction. Meanwhile, the Alpha-IoU loss function, well suited to lightweight networks, is introduced (sketched below), improving the overall performance of the network. In addition, a small-target detection layer is added to improve the recognition accuracy of fine cracks. On road crack datasets, the mAP@0.5 and F1 score of the MCA-YOLO-A model reach 0.930 and 0.893, respectively, 7.0% and 9.7% higher than those of the original YOLOv8n model, while the parameter count is only 6.0 M (4.8% lower) and the detection speed reaches 95 frames/s. Experimental results demonstrate that the model is accurate, lightweight, and capable of generalization, making it well suited for deployment in computation-constrained scenarios such as embedded systems and mobile devices.
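      The Alpha-IoU idea generalises the ordinary IoU loss by a power parameter alpha (alpha = 1 recovers 1 - IoU; alpha = 3 is a common choice), up-weighting high-IoU boxes, which helps light networks localise thin cracks. The sketch below assumes corner-format boxes; hyperparameters are assumptions.

```python
# Alpha-IoU loss sketch: loss = 1 - IoU**alpha, boxes as (x1, y1, x2, y2).
import torch

def alpha_iou_loss(pred, target, alpha=3.0, eps=1e-7):
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1 - iou.clamp(min=eps) ** alpha).mean()

p = torch.tensor([[0., 0., 10., 10.]])
t = torch.tensor([[1., 1., 11., 11.]])
print(alpha_iou_loss(p, t))
```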

    • A Semi-supervised Detection Method for Airport Runways in PolSAR Images Based on Bidirectional Co-training

      2025, 40(5):1348-1360. DOI: 10.16337/j.1004-9037.2025.05.019

      Abstract:A bidirectional co-training teacher-student framework is developed to mitigate the performance degradation caused by the scarcity of labeled polarimetric synthetic aperture radar (PolSAR) runway detection data. Within this framework, a teaching assistant module is constructed to integrate distillation loss and feedback loss (a distillation sketch follows below). Underutilized feature representations are identified through a systematic comparison of model inferences and the generation of directional feature vectors. Experimental results demonstrate that the proposed method achieves a detection accuracy of 83.11% on the UAVSAR dataset, with improvements of 15.63%, 6.46%, and 17.25% over the Unet, D-Unet, and Unet++ baselines, respectively.
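      The distillation half of such a teaching-assistant objective can be sketched with a standard temperature-softened KL term between teacher and student segmentation logits; the feedback term in the reverse direction, and the temperature and weighting used here, are assumptions rather than the paper's definition.

```python
# Distillation loss sketch: student matches the teacher's softened
# per-pixel class distribution (one half of the bidirectional scheme).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

s = torch.randn(2, 2, 64, 64)   # (B, classes, H, W): runway / background
t = torch.randn(2, 2, 64, 64)
print(distillation_loss(s, t))
```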

    • EEG-TCNet for Motor Imagery Classification Based on Nonnegative Matrix Factorization

      2025, 40(5):1361-1370. DOI: 10.16337/j.1004-9037.2025.05.020

      Abstract:In response to the limitations of deep learning approaches in motor imagery classification from electroencephalogram (EEG) signals, such as the failure to explore inter-channel correlations and to fully exploit frequency, temporal, and spatial information, this study proposes a classification method named NTEEGNet, which combines nonnegative matrix factorization (NMF) with a temporal convolutional network (TCN) and the compact convolutional neural network EEGNet to enhance motor imagery classification with a relatively small number of parameters. The NMF component (sketched below) effectively extracts channel features and fully utilizes frequency, temporal, and spatial information. Additionally, the network’s receptive field grows exponentially under the action of the TCN, leading to stronger feature extraction with fewer parameters. Experimental results on the BCI Competition Ⅳ 2a dataset demonstrate that NTEEGNet achieves a classification accuracy of 83.99%, an improvement of 6.64% over EEG-TCNet.
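      A sketch of an NMF front end of this kind: a nonnegative EEG power matrix is factorised so that each basis captures a correlated group of channels, giving compact channel features for the downstream network. The matrix shapes and the component count are illustrative assumptions.

```python
# NMF channel-feature extraction sketch for EEG.
import numpy as np
from sklearn.decomposition import NMF

power = np.abs(np.random.randn(22, 500))   # 22 channels x 500 time-freq bins
nmf = NMF(n_components=8, init="nndsvda", max_iter=500)
W = nmf.fit_transform(power)               # (22, 8) channel loading patterns
H = nmf.components_                        # (8, 500) temporal-spectral bases
print(W.shape, H.shape)
```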

    • System Calibration and Probe Pose Optimization for Robotic-Arm-Assisted Optical Coherence Tomography

      2025, 40(5):1371-1380. DOI: 10.16337/j.1004-9037.2025.05.021

      Abstract:To address challenges of traditional handheld scanning, such as difficulty in aligning the optical coherence tomography (OCT) probe with the target, inaccurate probe posture, and operator hand tremors, this article studies high-precision localization and flexible posture adjustment of a robotic-arm-assisted OCT probe to meet dynamic imaging requirements in surgical procedures. To improve localization accuracy and achieve posture adjustment, we propose a method for system calibration and OCT probe pose optimization, ensuring that the probe is positioned in the optimal imaging posture. First, the system calibration is implemented with pixel-to-spatial-domain conversion coefficients and the Tsai-Lenz method. The pose of the OCT probe is then optimized with image processing: the Laplacian random walk algorithm extracts the skin phantom surface contour, from which the normal vector of the skin surface is calculated; meanwhile, the contour of the vessel phantom is segmented with a convolutional neural network, and the radius vector of the vessel is obtained from the segmentation. Two vectors determine the attitude of the probe: Vp and Ve, the unit vectors parallel to the optical axis and the B-scan direction, respectively. When the OCT probe is in the optimal imaging pose, Vp should be parallel to the surface normal vector and Ve should be parallel to the vessel radius vector (see the geometric sketch below). Using the robotic-assisted OCT imaging system to image vascular and skin phantoms, 20 repeated experiments are conducted: the positioning error of the probe in the X-Y plane is ±0.21 mm, and the positioning error in the Z direction is ±0.33 mm. The system offers multiple degrees of freedom, high imaging accuracy, and good stability, and can assist doctors in quantitatively assessing the condition of blood vessels.
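      The pose construction described above can be illustrated geometrically: given the surface normal p and the vessel radius vector b, build a rotation whose optical axis Vp aligns with p and whose B-scan axis Ve aligns with the component of b orthogonal to p. This is a hedged geometric sketch, not the system's actual code.

```python
# Probe rotation from the two reference vectors (illustrative geometry).
import numpy as np

def probe_rotation(p, b):
    vp = p / np.linalg.norm(p)                 # optical-axis direction
    ve = b - (b @ vp) * vp                     # project b off the normal
    ve /= np.linalg.norm(ve)                   # B-scan direction
    # Columns [ve, vp x ve, vp] form a right-handed orthonormal frame.
    return np.column_stack([ve, np.cross(vp, ve), vp])

R = probe_rotation(np.array([0., 0., 1.]), np.array([1., 0., 0.2]))
print(np.round(R, 3))    # identity here: probe axis already along the normal
```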
