Special issue

1 Burmese OCR Method Based on Knowledge Distillation

MAO Cunli, XIE Xuyang, YU Zhengtao, GAO Shengxiang, WANG Zhenhan, LIU Fuhao

2022, 37(1):173-182. DOI: 10.16337/j.1004-9037.2022.01.015

Abstract:
Unlike traditional image text recognition tasks, Burmese optical character recognition (OCR) requires the computer to recognize complex characters formed by nesting and combining multiple characters within a single receptive field, which makes Burmese OCR particularly challenging. To address this problem, a Burmese OCR method based on knowledge distillation is proposed. A teacher network and a student network are both built on a convolutional neural network (CNN) + recurrent neural network (RNN) framework and trained by ensemble learning. During training, the teacher ensemble sub-network is coupled with the student network so that the local character image features corresponding to a single receptive field in the student network are aligned with the overall character image features in the teacher network, which enhances the acquisition of local features in long-sequence character images. Experimental results show that the proposed model outperforms the baseline by 2.9% and 2.7% when trained without and with background-noise images, respectively.

2 Layer Chains Decision of Trauma Treatment Based on Multi-label Learning

ZHAO Pengfei, LIU Hua

2022, 37(2):446-455. DOI: 10.16337/j.1004-9037.2022.02.017

Abstract:
In modern trauma treatment, a reasonable and accurate pre-hospital assessment of the injury, together with the corresponding treatment decisions, is of great significance for reducing disability and mortality. To overcome the shortcomings of manual decision-making and achieve accurate, reasonable and standardized trauma treatment decisions, this study analyzes the treatment decision in depth, uses multi-label learning to divide the overall treatment decision into sub-decisions, and extracts the judgment factors corresponding to the sub-decisions as label sets. To better account for the relationship between labels, the chain idea of the Classifier Chains algorithm is combined with the ML-KNN algorithm, yielding an improved multi-label learning algorithm named layer chains multi-label K-nearest neighbor (LCML-KNN). The LCML-KNN algorithm divides the labels into two layer chains according to their characteristics. After the predicted label information of the first layer chain is output, it is uniquely encoded, and the transformed labels are fed into the second layer chain as new features for prediction and judgment. LCML-KNN thus not only takes better account of the relationship between labels but also expands the feature dimension through the label conversion. Comparative experiments against various existing multi-label learning algorithms on two trauma datasets verify the robustness and superiority of the LCML-KNN algorithm.
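
The hand-off between the two layer chains can be illustrated with a minimal sketch. The abstract does not say how the first-layer labels are "uniquely encoded", so the binary-to-integer encoding below is an assumption, and the function names `encode_labels` and `extend_features` are hypothetical.

```python
def encode_labels(first_layer_labels):
    """Map a binary label vector to a unique integer (assumed encoding scheme)."""
    code = 0
    for bit in first_layer_labels:
        code = (code << 1) | int(bit)
    return code


def extend_features(features, first_layer_labels):
    """Append the encoded first-layer prediction as one extra feature
    for the second layer chain."""
    return list(features) + [encode_labels(first_layer_labels)]


# A sample with two original features and three first-layer label predictions.
extended = extend_features([0.2, 0.7], [1, 0, 1])
```

Any injective encoding would serve the same purpose here; the point is only that the second chain sees the first chain's output as an additional input dimension.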

3 Local-Feature-Based Two-Dimensional Whitening Reconstruction

Tian Jialue, Zhu Yulian, Chen Feiyue, Liu Jiahui

2022, 37(2):308-320. DOI: 10.16337/j.1004-9037.2022.02.005

Abstract:
Whitening is a preprocessing method that removes the correlation between the variables of data. Two-dimensional whitening reconstruction (TWR) is a recent whitening method for a single image. This paper shows that TWR is equivalent to column-based ZCA whitening, that is, TWR removes the correlation within image columns. However, the correlation within a local block of an image is often much greater than that within a column. To remove the correlation within local blocks, this paper proposes two improved TWR methods: reshaped-based TWR (RTWR) and patch-based TWR (PTWR). RTWR first reshapes an image into a new matrix in which each column vector corresponds to a sub-block of the original image, and then performs TWR on the reshaped matrix. PTWR applies TWR directly to each sub-block of the image. Experimental results on the ORL, CMU PIE and AR face datasets show that RTWR and PTWR improve subsequent classification performance more than TWR does.
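
As a rough illustration of the two building blocks mentioned in this abstract, the sketch below shows plain ZCA whitening with the image columns treated as samples, and a reshape that turns non-overlapping sub-blocks into column vectors. It assumes NumPy is available; the function names, the `eps` regularizer and the non-overlapping block layout are assumptions, and this is not the authors' TWR implementation.

```python
import numpy as np


def zca_whiten_columns(matrix, eps=1e-8):
    """ZCA-whiten a matrix whose columns are treated as samples."""
    X = np.asarray(matrix, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)        # center each row variable
    cov = X @ X.T / X.shape[1]                   # covariance across columns
    vals, vecs = np.linalg.eigh(cov)             # symmetric eigendecomposition
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return W @ X


def blocks_to_columns(image, b):
    """Reshape an H x W image into a (b*b) x n_blocks matrix, one
    non-overlapping b x b sub-block per column (b must divide H and W)."""
    img = np.asarray(image)
    H, W = img.shape
    return img.reshape(H // b, b, W // b, b).swapaxes(1, 2).reshape(-1, b * b).T


# Whitening the reshaped matrix decorrelates pixels inside a block
# rather than pixels inside a column.
rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))
whitened_blocks = zca_whiten_columns(blocks_to_columns(image, 2))
```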

Ren Minjie, Jin Guoqing, Wang Xiaowen, Chen Ruidong, Yuan Yunxin, Nie Weizhi, Liu An'an

2022, 37(2):383-395. DOI: 10.16337/j.1004-9037.2022.02.011

Abstract:
With the advent of the all-media era and the development of social networks, popularity prediction has begun to play an important role in public opinion monitoring and in the competition for data discourse power. Existing popularity prediction research mostly focuses on foreign media, and predicting the popularity of domestic mainstream media such as microblog is an emerging and challenging direction. This paper studies microblog, a domestic social media platform, by analyzing microblog content and users, and designs a variety of popularity prediction schemes. It also proposes a microblog popularity prediction algorithm based on XGBoost, which converts the popularity prediction problem into an interaction-value level classification problem and uses the extracted and fused features for model training under the classification framework, so that the popularity of microblogs with user information can be predicted more accurately. The proposed algorithm is verified on a microblog popularity prediction dataset, where its accuracy reaches 85.69%.
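
Turning a regression target into a classification target, as described above, amounts to binning the interaction value into levels. The sketch below shows one way to do that; the threshold values and the function name `popularity_level` are hypothetical, since the abstract does not give the bin boundaries.

```python
import bisect


def popularity_level(interaction_value, thresholds):
    """Return the index of the level that an interaction value falls into.

    `thresholds` must be sorted in ascending order; a value equal to a
    threshold is assigned to the higher level.
    """
    return bisect.bisect_right(thresholds, interaction_value)


# Hypothetical boundaries giving four levels: <10, 10-99, 100-999, >=1000.
THRESHOLDS = [10, 100, 1000]
labels = [popularity_level(v, THRESHOLDS) for v in (5, 10, 250, 5000)]
```

The resulting integer labels could then be used as class targets for any multi-class classifier.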

5 Survey on New Progresses of Deep Learning Based Computer Vision

LU Hongtao, LUO Mukun

2022, 37(2):247-278. DOI: 10.16337/j.1004-9037.2022.02.001

Abstract:
Deep learning has recently achieved great breakthroughs in some fields of computer vision. Various new deep learning methods and deep neural network models have been proposed, and their performance has been continually improved. This paper surveys the progress in applying deep learning to computer vision since 2016, with emphasis on some typical networks and models. We first review the mainstream deep neural network models for image classification, including standard models and light-weight models. Then, we introduce the main methods and models for different computer vision fields, including object detection, image segmentation and image super-resolution. Finally, we summarize neural architecture search methods for deep networks.

6 Language Identification Method for Multi-task Learning Based on Contrastive Predictive Coding Model

ZHAO Jianchuan, YANG Haoquan, XU Yong, WU Lian, CUI Zhongwei

2022, 37(2):288-297. DOI: 10.16337/j.1004-9037.2022.02.003

Abstract:
The key to language identification is extracting useful features from speech segments. The time-delay neural network (TDNN) can extract feature vectors that contain rich context and effectively improve system performance. This paper proposes a multi-task learning method for language identification based on an ECAPA (emphasized channel attention)-TDNN + contrastive predictive coding (CPC) network. ECAPA-TDNN is the main network and extracts the global features of the language. The improved CPC model is the auxiliary network, which performs contrastive prediction on the frame-level features extracted by ECAPA-TDNN. Finally, a joint loss function is used to optimize the network. The proposed method is tested on the 10-language data provided by the AP17-OLR dataset. The results show that the identification accuracy of the proposed network is higher than that of the baseline on the 1 s, 3 s and All test sets of AP17-OLR.

7 Topic Opinion Leader Mining Based on Multi-relational Networks

Duan Zhen, Ni Yunpeng, Chen Jie, Zhang Yanping, Zhao Shu

2022, 37(3):576-585. DOI: 10.16337/j.1004-9037.2022.03.008

Abstract:
Opinion leaders in social networks play an important role in information dissemination. Traditional opinion leader mining is based on network structure and does not consider the role of a specific topic or event, while current topic-based opinion leader mining relies on a single network structure and ignores the multiple interactive relationships between nodes. This paper proposes a topic opinion leader mining method based on multi-relational networks (MRTRank), which incorporates topic factors and a variety of interactive relationships between nodes. An attributed network representation learning algorithm is used to obtain the similarity of different nodes in the multi-relational network, from which the transition probability matrix of the nodes is formed. Finally, the top-k opinion leaders are obtained with the PageRank algorithm. Experimental results on real Twitter datasets verify that the proposed method is superior to traditional opinion leader mining algorithms.
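
The last two steps of this pipeline (similarities to transition probabilities, then PageRank to a top-k list) can be sketched in plain Python as follows. This is a generic illustration rather than MRTRank itself: the similarity matrix is taken as given, row normalization is an assumed way of forming the transition matrix, and the damping factor of 0.85 is the conventional default, not a value from the paper.

```python
def transition_matrix(similarity):
    """Row-normalize a non-negative similarity matrix into transition probabilities."""
    n = len(similarity)
    rows = []
    for row in similarity:
        total = sum(row)
        # A node with no similar neighbours jumps uniformly to any node.
        rows.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    return rows


def pagerank(P, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a row-stochastic matrix P."""
    n = len(P)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n
               + damping * sum(rank[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


def top_k(scores, k):
    """Indices of the k highest-scoring nodes."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


# Toy network: node 0 is similar to every other node.
similarity = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
scores = pagerank(transition_matrix(similarity))
leaders = top_k(scores, 1)
```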

8 Hot Topic Detection Method of Microblog Short Text Stream Based on Feature Extension

LI Yanhong, XIE Mengna, WANG Suge, LI Deyu

2022, 37(3):621-632. DOI: 10.16337/j.1004-9037.2022.03.012

Abstract:
With the rapid development of social networks and the Internet, a large amount of microblog short text stream data has been produced. Discovering hot topics from microblog text streams in a timely manner plays an important role in topic recommendation and public opinion monitoring. To address the sparse features of microblogs, a feature extension-based hot topic detection (FE-HTD) method for microblog short text streams is proposed, which uses microblog comments to extend the features of the microblog. To extend the features of the microblog text, comment texts are first selected according to the influence of the commenting users and the number of likes on each comment, and feature words are extracted from the selected comments using word co-occurrence and the term frequency-inverse document frequency (TF-IDF) method. Then the word pair speed, word pair acceleration and microblog text strength of the short text stream are computed, and the burst feature is calculated from the word pair acceleration and the microblog text strength. Finally, the variable-length window range of a hot topic is determined according to the speed of the burst word pairs, and the topic structure of the hot topic in the window is obtained by clustering. In the experiments, the proposed algorithm is compared with the text-based topic detection (T-TD) method and the burst words-based topic detection (BW-TD) method. The results show that the accuracy of the proposed algorithm is 76.4% and the recall rate is 78.7%, which are 10% higher than those of the T-TD and BW-TD methods.
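
The TF-IDF step used for picking feature words from comments can be sketched as below. This uses the textbook definition (term frequency times the log of inverse document frequency) on already-tokenized comments; the tokenization, the function names and the toy data are assumptions, and the word co-occurrence part of the method is not modeled.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF scores for each tokenized document in `docs`."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))          # count each word once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_feature_words(doc_scores, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical tokenized comments on one microblog.
comments = [
    ["flood", "rescue", "team"],
    ["flood", "warning"],
    ["flood", "rescue", "donation"],
]
scores = tf_idf(comments)
extension_words = top_feature_words(scores[0], 1)
```

A word that appears in every comment, such as "flood" here, receives a score of zero and is never selected as an extension feature.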

9 Feature Selection Based on Rough Hypercuboid and Binary PSO

WANG Sizhao, LUO Chuan, LI Tianrui, CHEN Hongmei

2022, 37(3):668-679. DOI: 10.16337/j.1004-9037.2022.03.016

Abstract:
Feature selection aims to choose a subset that contains no redundant features while keeping the classification performance of the data unchanged. Rough hypercuboid approaches can comprehensively evaluate feature subsets in terms of the relevance, dependency and significance of features, and have been used successfully for feature selection. However, evaluating all combinations of feature subsets is NP-hard, and the results obtained by traditional forward search methods are only locally optimal. Therefore, a new algorithm based on the rough hypercuboid approach is designed by integrating binary particle swarm optimization. The algorithm first uses feature relevance to generate a set of particles, then takes the improved objective function of the rough hypercuboid method as the optimization function, and finally finds the optimal feature subset through iterative binary particle swarm optimization. Comparisons with traditional rough hypercuboid methods, the rough set method based on particle swarm optimization and other methods demonstrate that the proposed algorithm obtains feature subsets with fewer features and higher classification performance.
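
For readers unfamiliar with binary particle swarm optimization, the following is a minimal generic sketch of the search loop over 0/1 feature masks. It differs from the method in the abstract in two labeled ways: particles are initialized at random rather than from feature relevance, and the fitness function is a toy stand-in for the rough hypercuboid objective. All parameter values are conventional defaults chosen for illustration.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over binary masks with a sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, vel[i][d]))   # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))    # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask):
    """Stand-in objective: reward features 0 and 2, penalize subset size."""
    return 2.0 * (mask[0] + mask[2]) - 0.5 * sum(mask)


best_mask, best_score = binary_pso(toy_fitness, n_features=5)
```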

10 Review on Domain Adaptation Methods Based on Deep Learning

Tian Qing, Zhu Yanan, Ma Chuang

2022, 37(3):512-541. DOI: 10.16337/j.1004-9037.2022.03.004

Abstract:
Domain adaptation mainly deals with making decisions for similar tasks across different data distributions. As an emerging branch of machine learning, domain adaptation has received much attention. With the rise of deep learning in recent years, the deep domain adaptation paradigm, which combines deep learning with traditional domain adaptation, has attracted more and more research. Although a variety of deep domain adaptation methods have been proposed, few systematic reviews have been published. To this end, this paper systematically reviews, analyzes and summarizes the existing deep domain adaptation work to provide a reference for relevant researchers. The main contributions of this work are as follows. Firstly, the background, concepts and application fields of domain adaptation are summarized. Secondly, according to whether model training involves an adversarial mechanism, the existing deep domain adaptation methods are grouped into two categories, namely deep adversarial domain adaptation and deep non-adversarial domain adaptation, which are reviewed and analyzed separately. Then, the benchmark datasets commonly used in domain adaptation research are tabulated with profiles. Finally, the issues in existing deep domain adaptation work are summarized and analyzed, and future research directions are given.

11 Dynamic Visual SLAM Based on Unified Geometric-Semantic Constraints

Shen Yehu, Chen Jiahao, Li Xing, Jiang Quansheng, Xie Ou, Niu Xuemei, Zhu Qixin

2022, 37(3):597-608. DOI: 10.16337/j.1004-9037.2022.03.010

Abstract:
Traditional visual simultaneous localization and mapping (SLAM) algorithms rely on the scene rigidity assumption. When dynamic objects exist in the scene, however, the stability of the SLAM system is affected and the accuracy of pose estimation is reduced. Most existing methods apply probabilistic strategies and geometric constraints to reduce the impact of a small number of dynamic objects, but they fail when the number of dynamic objects in the scene is high. To deal with this problem, a novel algorithm is proposed that combines the dynamic visual SLAM algorithm with a multi-target tracking algorithm. Firstly, a semantic instance segmentation network together with geometric constraints is introduced to help the visual SLAM module effectively separate static feature points from dynamic ones, which at the same time yields better multi-target tracking performance. Furthermore, the trajectory and velocity of the moving objects can also be estimated, which provides decision information for autonomous robot navigation. Experimental results on the KITTI dataset show that the localization accuracy of the proposed algorithm is improved by about 28% compared with the ORB-SLAM2 algorithm in dynamic environments.

12 Frequency Division Duplex Massive Multiple-input Multiple-output Downlink Channel State Information Acquisition Techniques Based on Deep Learning

GUI Guan, WANG Jie, YANG Jie, LIU Miao, SUN Jinlong

2022, 37(3):502-511. DOI: 10.16337/j.1004-9037.2022.03.003

Abstract:
The evolution of massive multiple-input multiple-output (MIMO) techniques is an important support for further improving the performance of sixth-generation (6G) wireless communication systems. However, with the continuous expansion of large-scale antenna arrays, frequency division duplex (FDD) massive MIMO systems face severe challenges in acquiring downlink channel state information (CSI). Deep learning has a powerful ability to learn from and process high-dimensional data, which provides a potential solution to this challenge. In this paper, we survey FDD massive MIMO downlink CSI acquisition techniques based on deep learning, including CSI feedback and prediction techniques. Firstly, the theoretical frameworks of CSI feedback and prediction based on deep learning are presented. Then, the performance of relevant research results at home and abroad is analyzed, providing a reference scheme for solving the problem of acquiring downlink CSI in FDD massive MIMO systems towards 6G. Finally, open problems in FDD massive MIMO downlink CSI acquisition are discussed, together with potential solutions.

13 Granular Computing-Driven Support Vector Data Description Approach to Classification

Fang Yu, Cao Xuemei, Yang Mei, Wang Xuan, Min Fan

2022, 37(3):633-642. DOI: 10.16337/j.1004-9037.2022.03.013

Abstract:
The effect of classification learning is closely related to the distribution of the limited training samples. Support vector data description (SVDD), as a single-boundary solution model, cannot describe the actual distribution characteristics of the data well, so some target objects fall outside the hypersphere. To improve its classification ability, this paper proposes a granular computing-driven SVDD (GrC-SVDD) classification method, which constructs attribute sets at multiple granularity levels and the corresponding multi-granular hyperspheres. Firstly, the importance of each attribute within the current granularity level is calculated through neighborhood self-information. Secondly, the best attribute set is chosen to retrain the hyperspheres that did not meet the purity criterion at the previous granularity level, and this is repeated until all hyperspheres meet the conditions or the attributes are exhausted. The experimental section discusses the effect of the parameters on classification performance and learns the hyperparameters. The experimental results show that GrC-SVDD achieves better classification performance than SVDD and popular classification methods.

14 Urban Facility Locating Method Based on Ranking Learning

Han Wenjun, Zhang Yaping, Chen Hong, Chen Dan, Sun Wanting, Zhao Bin

2022, 37(3):609-620. DOI: 10.16337/j.1004-9037.2022.03.011

Abstract:
A locating method based on learning to rank is proposed to solve the urban facility location problem, with human mobility features introduced to improve its effectiveness. First, a representation vector is extracted with two-stream autoencoders, fusing the human mobility features with other features. Then the plots are ranked based on the representation vectors of the candidate set and the ranking network. Extensive experiments on a real multi-source dataset verify the effectiveness of the proposed locating method.

15 Few-Shot Learning Method Based on Topic Model and Dynamic Routing Algorithm

ZHANG Shufang, TANG Huanling, ZHENG Han, LIU Xiaoyan, DOU Quansheng, LU Mingyu

2022, 37(3):586-596. DOI: 10.16337/j.1004-9037.2022.03.009

Abstract:
Few-shot learning has too few training samples, which leads to weak feature expression. To address this problem, a novel dynamic routing prototypical network based on SLDA (DRP-SLDA) is proposed, built on the supervised topic model (supervised LDA, SLDA) and the dynamic routing algorithm. The SLDA topic model is used to establish the semantic mapping between words and categories, enhance the category distribution characteristics of words, and obtain the semantic representation of samples at word granularity. The dynamic routing prototypical network (DR-Proto) is then presented. The network makes full use of the semantic relationship between samples by extracting cross features, and uses the dynamic routing algorithm to iteratively generate dynamic prototypes with category representation, thereby addressing the problem of feature expression. The experimental results show that the DRP-SLDA model can effectively extract the category distribution characteristics of words and obtain dynamic prototypes that increase category discrimination, which clearly improves the generalization ability of few-shot text classification.

16 Multi-scale Domain Adversarial Network for Transfer Learning

LIN Jiawei, WANG Shitong

2022, 37(3):555-565. DOI: 10.16337/j.1004-9037.2022.03.006

Abstract:
The effectiveness of deep learning algorithms depends on a large amount of labeled data. The purpose of transfer learning is to use a dataset with known labels (source domain) to classify a dataset with unknown labels (target domain), so research on deep transfer learning has become a hotspot. To address the problem of insufficient labels in the training data, a multi-scale domain adversarial network (MSDAN) model based on multi-scale feature fusion is proposed. The method uses the ideas of generative adversarial networks and multi-scale feature fusion to obtain the feature representations of the source domain data and the target domain data in a high-dimensional feature space. These feature representations capture the common geometric features and common semantic features of the source domain data and the target domain data. The feature representation of the source domain data and the source domain labels are input into the classifier for classification, and better performance is finally obtained in tests on the target domain dataset.

17 Data Science: From Digital World to Digital Intelligent World

ZHANG Qinghua, GAO Yu, SHEN Qiuping

2022, 37(3):471-487. DOI: 10.16337/j.1004-9037.2022.03.001

Abstract:
With the development of big data, data has become a major strategic resource for countries, and its social impact is increasingly obvious. Data science has therefore been proposed to explore and study the basic scientific problems contained in big data. In this paper, the development of big data and the rise and connotation of data science are first introduced. Second, the research status of big data and data science is analyzed, and the application of data in various industries is discussed. Third, the big data proving ground constructed to explore the laws and problems of data science is briefly described. Finally, to promote the development of data science, accelerate the transformation from the real world to the digital world, and realize intelligent living, the key issues of data science and the new thinking in the digital world are discussed.

18 Improved Grey Correlation Model for Performance Evaluation of Radar Emitter Signal Sorting and Recognition Features

PU Yunwei, WU Haixiao, JIANG Ying, YU Yongpeng

2022, 37(3):657-667. DOI: 10.16337/j.1004-9037.2022.03.015

Abstract:
To address the insufficiently objective evaluation and the lack of an evaluation basis in the classification and identification of radar emitter signals, an improved grey correlation feature evaluation model that incorporates interval-valued intuitionistic fuzzy thinking is constructed. Firstly, the model introduces the signal-to-noise ratio (SNR) dimension to examine the dynamic differences of the data at different levels, describes feature information with interval data, and establishes an interval-valued intuitionistic fuzzy comprehensive decision matrix. Secondly, an optimization model that maximizes the total deviation between features is used to determine the weight of each indicator. Finally, based on the improved grey correlation framework, the feature schemes are ranked by combining it with the approach to ideal points. The simulation results show that the proposed method gives sorting and identification feature evaluations and rankings that are consistent with the actual situation and basically consistent with the analysis results of the unimproved grey correlation method, which verifies the feasibility and effectiveness of the proposed method.

19 A Model for Extracting Evaluation Objects of Case-Involved Microblog Based on Keyword Structured Encoding

Wang Jingyun, Yu Zhengtao, Xiang Yan, Chen Long

2022, 37(5):1026-1035. DOI: 10.16337/j.1004-9037.2022.05.008

Abstract:
Evaluation object extraction for case-involved microblogs aims to identify, from microblog comments, the case-related terms that users are evaluating, which helps to grasp public opinion on different aspects of a case. Existing methods generally treat evaluation object extraction as a sequence labeling task, but they do not take into account a domain characteristic of case-involved microblogs, namely that comments usually revolve around the case keywords that appear in the microblog text. For this reason, this paper proposes a sequence labeling model based on case keyword structured encoding to extract the evaluation objects of case-involved microblogs. First, a number of case keywords are obtained from the microblog text, and a structured encoding mechanism is used to convert them into keyword structural representations. These representations are then integrated into the comment sentence representation through a cross-attention mechanism. Finally, the evaluation object terms are extracted by a conditional random field (CRF). Experiments are conducted on the datasets of two cases, and the improvements over multiple baselines validate the effectiveness of the proposed approach.

20 Survey of Interpretable Deep TSK Fuzzy Systems

Wang Shitong, Xie Runshan, Zhou Erhao

2022, 37(5):935-951. DOI: 10.16337/j.1004-9037.2022.05.001

Abstract:
While existing deep neural networks have achieved great success in various application scenarios, they still face the black-box challenge, which makes them poorly suited to application fields such as healthcare, finance and transportation. Explainable artificial intelligence (XAI) has therefore become a hot research topic in recent years. Among existing XAI approaches, fuzzy AI systems can achieve an excellent trade-off between performance and interpretability, and interpretable deep Takagi-Sugeno-Kang (TSK) fuzzy systems have accordingly been drawing more and more attention. We first state the concept of classical TSK fuzzy systems, then give a comprehensive overview of interpretable deep TSK fuzzy systems based on the stacked generalization principle, including their structures, representative models and application scenarios, and finally discuss their future development directions in light of their existing problems.

21 Approximate Aggregate Query Method Based on Two-Stage Stratified Sampling

Fang Jun, Zhao Bo, Zuo Changqi

2022, 37(5):1049-1058. DOI: 10.16337/j.1004-9037.2022.05.010

Abstract:
Interactive query analysis technology, represented by data warehouse applications, provides support for intelligent decision-making. As the data scale keeps increasing, the exact calculation of query results often requires a global data scan, so group-by queries suffer from insufficient real-time response. Providing fast approximate answers to aggregate queries from pre-extracted sample data is a feasible solution to this problem in many scenarios. This paper analyzes the specific conditions under which stratified sampling is better than random sampling, and proposes a two-stage stratified sampling method. In the first stage, the data are grouped according to business characteristics; within each group, random sampling is performed first and the sampling effect is evaluated. To improve the approximate query results, a second sampling stage is carried out, in which the self-organizing feature mapping (SOM) clustering method is used to group the values. Experimental results on a public dataset and on actual power grid data show that, compared with random sampling, stratified random sampling and the congressional sampling algorithm, the proposed method improves performance by up to 15% at the same sampling rate. SOM also yields better approximate query results than the K-means and density-based spatial clustering of applications with noise (DBSCAN) clustering methods.
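
The basic mechanism behind sample-based approximate aggregation can be sketched as follows: sample within each stratum, then scale each stratum's sample mean by the stratum size to estimate a SUM. This illustrates ordinary single-stage stratified sampling only; the two-stage scheme and the SOM grouping from the abstract are not reproduced, and the function name, the proportional allocation and the toy data are assumptions.

```python
import random
from collections import defaultdict


def stratified_sum_estimate(rows, key, value, rate, seed=0):
    """Estimate SUM(value) from a per-stratum random sample.

    Each stratum contributes (stratum size) * (sample mean), and at least
    one row is sampled from every stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(value(row))

    total = 0.0
    for values in strata.values():
        n = max(1, round(rate * len(values)))
        sample = rng.sample(values, n)
        total += len(values) * sum(sample) / n
    return total


# Hypothetical table: many small readings and a few very large ones.
rows = [("residential", 10)] * 100 + [("industrial", 1000)] * 5
estimate = stratified_sum_estimate(
    rows, key=lambda r: r[0], value=lambda r: r[1], rate=0.2)
exact = sum(v for _, v in rows)
```

Because the values are constant within each stratum in this toy table, the stratified estimate matches the exact sum even at a 20% sampling rate, whereas a uniform random sample could easily miss the five large rows. This is the kind of condition (low variance within strata, high variance between them) under which stratification pays off.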

22 Difference Analysis Research of Threshold Selection in Principal Component Analysis

ZHANG Jing, LIU Qian

2022, 37(5):1012-1017. DOI: 10.16337/j.1004-9037.2022.05.006

Abstract:
Principal component analysis (PCA) is a commonly used method for feature extraction and data dimension reduction. In many applications, the components whose eigenvalues are greater than the average eigenvalue are retained. However, there has been no specific analysis of the relationship between the number of principal components and the application results. Therefore, an experimental analysis of the differences caused by PCA threshold selection is carried out to provide a basis for choosing the PCA threshold in different applications. In the experiments, PCA is used to reduce the dimension of the MNIST handwritten digit dataset, and different neural networks are constructed for classification according to different thresholds. The change in classification accuracy under different thresholds is then analyzed. The experimental results show that the classification accuracy is highest when the PCA threshold is between 79% and 81% (dimension 41 to 50), and that the accuracy decreases when the threshold is lower or higher than this range. This shows that there is no positive correlation between application results and the PCA threshold, and that the average of the eigenvalues is not a mandatory criterion.
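
The two retention rules contrasted in this abstract can be written down in a few lines. The sketch below takes the eigenvalues as given and assumes that "threshold" means the cumulative explained-variance ratio, which is the usual reading but is not spelled out in the abstract; the function names and the toy eigenvalues are hypothetical.

```python
def components_for_threshold(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative
    explained-variance ratio reaches `threshold` (between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def components_above_mean(eigenvalues):
    """Number of components whose eigenvalue exceeds the average eigenvalue."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for value in eigenvalues if value > mean)


eigenvalues = [4.0, 2.0, 1.0, 1.0]
k_threshold = components_for_threshold(eigenvalues, 0.75)
k_mean_rule = components_above_mean(eigenvalues)
```

On this toy spectrum the 75% threshold keeps two components while the above-average rule keeps only one, which illustrates how the two criteria can disagree.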

23 Improved Self-paced Deep Incomplete Multi-view Clustering

Cui Jinrong, Huang Cheng

2022, 37(5):1036-1048. DOI: 10.16337/j.1004-9037.2022.05.009

Abstract:
As the volume of data increases, multi-view clustering with missing view data, known as incomplete multi-view clustering, is becoming increasingly common. Powered by the development of deep learning, clustering models that incorporate deep learning usually outperform shallow models. A novel deep incomplete multi-view clustering model, called improved self-paced deep incomplete multi-view clustering, is proposed. The model fully considers the complementarity of multi-view data, and the missing views are completed by a nearest neighbor imputation scheme based on the characteristics of multi-view data. Multiple encoders are used to obtain the low-dimensional latent features of the views. Meanwhile, a graph embedding strategy is introduced to preserve the geometric structure among the latent features. The consistency principle is applied to fuse the latent features from different views into consistent latent features. Experimental results indicate that, compared with existing incomplete multi-view clustering models, the proposed model handles various incomplete multi-view clustering tasks more flexibly and efficiently, thus improving the robustness and performance of incomplete multi-view clustering.

24 A Survey on Application of Deep Learning in Photoacoustic Image Reconstruction from Limited-View Sparse Data

SUN Zheng , HOU Yingsa

2022, 37(5):971-983. DOI: 10.16337/j.1004-9037.2022.05.001

[Abstract](1632) [HTML](1093) [PDF 4.04 M](4288)

Abstract:
Photoacoustic imaging （PAI） is a newly emerging hybrid functional imaging modality. High-quality image reconstruction is the key to improving imaging accuracy. Incomplete photoacoustic （PA） measurements usually reduce the imaging depth and the quality of images rendered by conventional reconstruction techniques such as back projection （BP）， time reversal （TR）， and delay and sum （DAS）. Iterative algorithms can solve this issue to a certain extent， at the cost of a high computational burden and the need for a properly selected regularization tool. In recent years， deep learning （DL） has exhibited promising performance in the field of medical imaging. It has also shown great potential for reconstructing high-quality images efficiently. This paper provides a survey of DL-based PA image reconstruction from sparsely sampled data in a limited view. The current methods are summarized and classified， and their advantages and limitations are discussed.

25 Sparse Principal Component Analysis Algorithm Based on Same Sparse Pattern

SHAO Jianfei , PU Rong , Huang Wei , JI Jianjie , GUO Peng

2022, 37(5):1084-1091. DOI: 10.16337/j.1004-9037.2022.05.013

[Abstract](978) [HTML](561) [PDF 966.74 K](1728)

Abstract:
Sparse principal component analysis is an unsupervised method for dimensionality reduction and feature selection. When multiple principal components are calculated， the loading vectors do not share the same sparse pattern， which makes it difficult to determine the small number of variables in the original feature space that contribute the most to the principal components. To address this， an adaptive sparse principal component analysis （ASPCA） algorithm is proposed. Firstly， the group lasso model is used， and the ASPCA formulation is obtained by applying block sparse constraints on the loading vectors. Subsequently， different adjustment parameters are used for different columns of the sparse matrix to obtain an adaptive penalty. Finally， the block-coordinate descent method is used to optimize the ASPCA formulation in two stages， so as to find the sparse loading matrix and the orthogonal matrix and achieve optimized dimensionality reduction. A comparison of the sparse principal component analysis （SPCA） algorithm， the structured and sparse principal component analysis （SSPCA） algorithm and the ASPCA algorithm shows that the ASPCA algorithm has better dimensionality reduction performance and can extract more valuable features， thereby effectively improving the average classification accuracy of the classification model.
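A group lasso penalty produces a shared sparse pattern because it zeroes whole rows of the loading matrix. The sketch below shows row-wise group soft-thresholding, the standard proximal operator of that penalty. It is not the ASPCA algorithm itself: the function name, the single penalty `lam`, and the toy matrix are assumptions made for illustration, and the adaptive per-column parameters described in the abstract are not modeled.

```python
import math


def group_soft_threshold(matrix, lam):
    """Shrink each row toward zero by `lam` in Euclidean norm.
    Rows whose norm is at most `lam` become exactly zero, so the
    corresponding variable is dropped from every component at once."""
    result = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        result.append([scale * v for v in row])
    return result


# Rows are variables, columns are principal components (toy values).
loadings = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
shrunk = group_soft_threshold(loadings, lam=1.0)
```

The first row has norm 5 and is only shrunk, while the second row has norm 0.5 and is removed from both components together.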

26 Chinese Event Detection with Syntax and Full Text Information Enhancement

Wang Hong , WU Haozheng

2022, 37(5):1059-1069. DOI: 10.16337/j.1004-9037.2022.05.011

[Abstract](664) [HTML](463) [PDF 923.46 K](1728)

Abstract:
To address the insufficient use of syntactic dependencies between words and the lack of global semantic information in Chinese event detection， a Chinese event detection model based on syntactic and full-text information enhancement is proposed. Firstly， the model introduces a graph convolutional network （GCN） to enhance the feature representation of words by capturing the dependency syntactic relationships between words. Then， a bidirectional gated recurrent unit （Bi-GRU） is used to learn the context information within and between sentences respectively， and a sentence vector containing the global information of the article is obtained. Finally， the word， phrase and sentence information is dynamically fused through a gate structure， and a conditional random field （CRF） is used to identify and label the trigger words in the sentence. Experimental results on the ACE2005 and CEC Chinese data sets show that the proposed method effectively improves the performance of Chinese event detection.

27 Deep and Shallow Feature Fusion Based on Graph Convolution for Cross-Corpus Emotion Recognition

YANG Zixiu , JIN Yun , MA Yong , DAI Yanyan , YU Jiajia , GU Yu

2023, 38(1):111-120. DOI: 10.16337/j.1004-9037.2023.01.009

[Abstract](750) [HTML](497) [PDF 2.53 M](1735)

Abstract:
The training and testing data for speech emotion recognition often come from different corpora. In this case， the recognition performance of the model decreases greatly due to the domain mismatch. To address this problem， we present a new composition method that uses a graph convolutional network to represent the topological structure between the source and target databases for cross-corpus speech emotion recognition. In addition， to address the low accuracy of a single feature in emotion recognition， a novel feature fusion method is proposed. Firstly， we extract acoustic features with OpenSMILE， and then extract deep features with a graph convolutional neural network. As the convolutional layers proceed， nodes transmit feature information to other nodes， so that the deep features contain clearer feature information and more detailed semantic information. Finally， we fuse the shallow and deep features. Two classification experiments are carried out. With the eNTERFACE corpus for training and the Berlin corpus for testing， the recognition rate is 59.375%. With the Berlin corpus for training and the eNTERFACE corpus for testing， the recognition rate is 36.111%. These results are higher than the best results of the baseline system and of the referenced works， which demonstrates the effectiveness of the method proposed in this paper.

28 Vietnamese Speech Recognition Based on Pre-training and Phone-Based Byte-Pair Encoding

SHEN Zhijie , GUO Wu

2023, 38(1):101-110. DOI: 10.16337/j.1004-9037.2023.01.008

[Abstract](1012) [HTML](797) [PDF 893.81 K](1809)

Abstract:
Based on unsupervised pre-training technology， wav2vec 2.0 has become a research hotspot because of its state-of-the-art performance in many low-resource languages. In this paper， Vietnamese continuous speech recognition is carried out on the basis of the pre-trained model. Phonetic information is integrated into acoustic modeling based on the connectionist temporal classification （CTC） loss function， and phones and position-dependent phones are selected as the basic modeling units. To balance the number of modeling units and the refinement of the model， a byte-pair encoding （BPE） algorithm is used to generate phone-based subwords， and contextual information is integrated into the acoustic modeling process. Experiments are carried out on the low-resource Vietnamese development set of NIST’s BABEL task， and the proposed algorithm significantly improves on the wav2vec 2.0 baseline system. The word error rate is reduced from 37.3% to 29.4%.
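To make the idea of phone-based subwords concrete, the following minimal sketch learns BPE merges over sequences of phone symbols instead of characters. It is a generic BPE illustration under stated assumptions, not the authors' implementation: the function name, the `_` joiner, and the toy phone sequences are hypothetical.

```python
from collections import Counter


def learn_phone_bpe(sequences, num_merges):
    """Learn up to `num_merges` BPE merges over phone sequences.
    Each sequence is a list of phone symbols; returns the merge list
    and the corpus re-segmented into phone-based subwords."""
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent phone pairs across the whole corpus.
        pair_counts = Counter()
        for seq in corpus:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged_symbol = best[0] + "_" + best[1]
        # Replace every occurrence of the best pair with one unit.
        new_corpus = []
        for seq in corpus:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(merged_symbol)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_corpus.append(out)
        corpus = new_corpus
    return merges, corpus


# Toy phone sequences (hypothetical symbols, not from the BABEL data).
toy_sequences = [["t", "o", "j"], ["t", "o", "n"], ["t", "o"]]
merges, segmented = learn_phone_bpe(toy_sequences, num_merges=1)
```

The number of merges controls the trade-off mentioned in the abstract: more merges give longer, more context-rich units but a larger inventory of modeling units.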

29 Fusing Matrix Factorization and Cost-Sensitive Microbial Data Augmentation Algorithm

Wang Xi , Wen Liuying , Min Fan

2023, 38(2):401-412. DOI: 10.16337/j.1004-9037.2023.02.015

[Abstract](452) [HTML](548) [PDF 3.49 M](1599)

Abstract:
Microorganisms have a direct impact on human health， and the analysis of relevant data is helpful for disease diagnosis. However， the collected data suffer from two problems： class imbalance and high sparsity. Existing oversampling methods can alleviate the class imbalance of data to a certain extent， but they have difficulty coping with the high sparsity of microbial data. This paper proposes a data augmentation algorithm that fuses matrix factorization and cost-sensitive learning， which consists of three techniques. First， the original matrix is decomposed into a sample subspace and a feature subspace. Second， the positive vectors of the sample subspace and their neighbor vectors are used to generate synthetic vectors. Finally， the synthetic vectors are filtered according to their distance from all negative vectors. The proposed algorithm is compared with five oversampling algorithms on 8 microbial datasets. The results show that the proposed algorithm can enhance the diversity of positive samples and identify more positive samples with lower classification cost.

30 Person Re-identification Method Based on Improved Transformer Encoder and Feature Fusion

ZHAO Qian , XUE Chaochen , ZHAO Yan

2023, 38(2):375-385. DOI: 10.16337/j.1004-9037.2023.02.013

[Abstract](810) [HTML](874) [PDF 2.69 M](1797)

Abstract:
To solve the problem of low accuracy of the Transformer encoder in person re-identification， which is caused by the loss of person image block information and the insufficient expression of local person features， an improved Transformer encoder and feature fusion algorithm for person re-identification is proposed. Firstly， the algorithm uses relative position encoding to address the loss of relative position information of person image blocks during the Transformer attention operation， so that the network can focus on the semantic feature information of person image blocks， thus enhancing the ability to extract pedestrian features. Secondly， a local patch attention module is embedded into the Transformer network to weight and strengthen the key local feature information and highlight the salient features of the person area. Finally， global and local information features are fused so that the features complement each other and the recognition ability of the model is improved. In the training stage， Softmax and triplet loss functions are used to jointly optimize the network. The proposed algorithm is experimentally compared and analyzed on the mainstream datasets Market1501 and DukeMTMC-reID. The Rank-1 accuracy reaches 97.5% and 93.5% respectively， and the mean average precision （mAP） reaches 92.3% and 83.1% respectively. The experimental results show that the improved Transformer encoder and feature fusion algorithm can effectively improve the accuracy of person re-identification.

31 Multi-scale Object Detection Based on Non-local Feature Fusion

MA Qian , ZENG Kai , WU Jiawen , SHEN Tao

2023, 38(2):364-374. DOI: 10.16337/j.1004-9037.2023.02.012

[Abstract](709) [HTML](470) [PDF 3.56 M](1655)

Abstract:
The fusion methods used by existing multi-scale object detection models are inadequate under scale variation and occlusion and do not capture long-distance dependencies. To address this problem， a channel feature fusion aggregation module and a non-local feature interaction module are designed to learn the correlation between different channel features and to capture the long-distance dependence between feature maps. In addition， current detection architectures are based on a single pyramid detection structure， which suffers from information loss. In this paper， a double pyramid structure is designed， and the proposed fusion method is combined with the double feature pyramid structure to supplement the fused feature information while preserving the original feature information. Experimental results on the public datasets KITTI and PASCAL VOC show that the proposed method has higher detection accuracy than other advanced methods， demonstrating its effectiveness in the object detection task.

32 Semantic Segmentation for Real Point Cloud Scenes via Geometric Features

Li Jiaxiang , Xuan Shibin , Liu Lixia , Wang Kuan

2023, 38(2):336-349. DOI: 10.16337/j.1004-9037.2023.02.010

[Abstract](740) [HTML](661) [PDF 3.32 M](1619)

Abstract:
Effective acquisition of the spatial structural features of point cloud data is the key to semantic segmentation of point clouds. To solve the problem that previous methods do not make good use of global and local features， a new spatial structure feature， the point box feature， is proposed for semantic segmentation. A network framework with an encoding-decoding structure is designed. The global spatial features and local neighborhood features of point clouds are learned by the geometric structure feature module during the downsampling process， and the full-size feature map is restored step by step during the upsampling process for semantic segmentation. The geometric structure feature module contains two sub-modules. One is the global features module， which learns the “box” features of points to represent the rough geometric features of point clouds in the sampling space. The other is the local features module， which uses feature extraction with an attention mechanism structure to represent the precise， fine-grained geometric characteristics of point clouds within local neighborhoods. Experiments are performed on the public datasets S3DIS and Semantic3D， and the results are compared with those of other methods. The results show that the mIoU is ahead of most current mainstream methods， and that the IoU of some detail classes is the highest.

33 Multi-channel Speech Enhancement Based on Joint Graph Learning

ZHANG Pengcheng , GUO Haiyan , WANG Tingting , YANG Zhen

2023, 38(2):283-292. DOI: 10.16337/j.1004-9037.2023.02.005

[Abstract](795) [HTML](544) [PDF 1.30 M](1567)

Abstract:
The spatial relationship between channels affects noise reduction， and graph signal processing can capture this latent relationship. However， if the graph of the physical spatial distribution is used directly， its time-varying characteristics cannot be reflected in real time. Therefore， we propose a multi-channel speech enhancement method based on joint graph learning. Firstly， we propose a joint time-space graph learning method， which jointly optimizes the array space graph and the speech frame inner graph by minimizing the sum of the smoothness of the multi-channel noisy speech signal on the spatial graph， the smoothness of the noisy speech signal from the reference channel on the speech frame graph， the sparsity of the Laplace matrix and the sparsity of the adjacency matrix. Based on the learned space graph and frame inner graph， the time-space joint graph of the multi-channel speech signal is constructed. On this basis， the multi-channel speech graph signal is enhanced by applying the joint graph transform and the fixed beamforming （FBF） method. Experimental results show that the proposed joint graph learning based FBF （JGL-FBF） method can significantly improve the signal-to-noise ratio （SNR） and the perceptual evaluation of speech quality （PESQ） score of the enhanced speech compared with the traditional FBF method. In addition， the experimental results show that the accuracy of delay compensation affects the speech enhancement performance of JGL-FBF.

34 Review of Multi-source Information Fusion Methods Based on Granular Computing

Xu Weihua , Huang Xudong , Cai Ke

2023, 38(2):245-261. DOI: 10.16337/j.1004-9037.2023.02.002

[Abstract](1416) [HTML](1432) [PDF 1.33 M](2597)

Abstract:
Multi-source data is a complex data type that integrates multiple information sources or data sets. Its main feature is that different information sources imply different knowledge structures， and represent and describe samples and the relationships between samples from different perspectives. How to fuse and integrate multi-source data cooperatively and how to quickly mine the overall decision-making knowledge for users from different viewpoints have become scientific problems that urgently need to be solved in the field of data science. Classical rough set theory， the multi-granularity method， evidence theory and information entropy are common and effective multi-source information fusion methods， which have attracted wide attention and achieved fruitful results. Therefore， this paper summarizes the work on multi-source information fusion based on granular computing， reviews the basic concepts and main research ideas of each information fusion method， and puts forward some open problems in the field of multi-source information fusion. The results can provide a theoretical reference for follow-up research in this field.

35 Cross-Corpus Emotion Recognition Based on Deep Domain Adaptation and CNN Decision Tree

SUN Linhui , ZHAO Min , WANG Shun

2023, 38(3):704-716. DOI: 10.16337/j.1004-9037.2023.03.018

[Abstract](536) [HTML](504) [PDF 1.39 M](1033)

Abstract:
In cross-corpus speech emotion recognition， the mismatch between target domain and source domain samples leads to poor emotion recognition performance. To improve cross-corpus speech emotion recognition performance， this paper proposes a cross-corpus speech emotion recognition method based on deep domain adaptation and a convolutional neural network （CNN） decision tree model. Firstly， a local feature transfer learning network based on joint constrained deep domain adaptation is constructed. By minimizing the joint difference between the target and source domains in the feature space and Hilbert space， the correlation between the two corpora is mined and the transferable invariant features from the target domain to the source domain are learned. Then， to reduce the classification error of confusable emotions among multiple emotions in the cross-corpus context， a CNN decision tree multi-level classification model is constructed based on the degree of emotional confusion， so that multiple emotions are first coarsely classified and then finely classified. The experiments are validated using three corpora， CASIA， EMO-DB and RAVDESS. The results show that the average recognition rate of the proposed cross-corpus speech emotion recognition method is 19.32% to 31.08% higher than that of the CNN baseline method， and the system performance is greatly improved.

36 Analysis on Evolution of Netizens’ Emotional in Emergencies Based on Epidemic Model

Zhong Zhaoman , Li Heng , Yang Hong , Guan Yan

2023, 38(3):676-689. DOI: 10.16337/j.1004-9037.2023.03.016

[Abstract](653) [HTML](746) [PDF 2.14 M](972)

Abstract:
After an emergency occurs， accurately analyzing the emotional state of netizens and guiding its evolution are of great practical significance for controlling public opinion on the emergency and maintaining social stability. According to the characteristics of netizens’ comments on emergencies， a complete set of netizens’ emotional states is constructed， and different emotional sets are established from the perspectives of stakeholders and of the emergencies themselves. According to the transmission mode of the epidemic model， the evolution models of netizens’ emotional states， EP-SIS and EO-SIS， are established based on the susceptible-infectious-susceptible （SIS） epidemic model. An empirical study is made on the models by using the Weibo comments of netizens on the “New pneumonia virus”， and the weights of the influencing factors are obtained. The negative emotion conversion rate of the model for netizens is 0.72. EP-SIS and EO-SIS， the emotional evolution models of netizens in emergencies constructed in this study， support intervention from different angles in the evolution of netizens’ negative emotional states during emergencies.
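For readers unfamiliar with the underlying dynamics, the sketch below simulates the classic SIS model, dI/dt = beta * S * I - gamma * I with S + I = 1, using forward Euler steps. It is only the textbook model on which EP-SIS and EO-SIS are said to build. Reading "infected" as "holding a negative emotion" is an interpretation, and the function name and parameter values are assumptions, not figures from the paper.

```python
def simulate_sis(beta, gamma, infected0, steps, dt=0.1):
    """Forward-Euler simulation of the classic SIS model.
    `beta` is the transmission rate, `gamma` the recovery rate, and
    `infected0` the initial infected fraction of the population."""
    infected = infected0
    history = [infected]
    for _ in range(steps):
        susceptible = 1.0 - infected
        infected += dt * (beta * susceptible * infected - gamma * infected)
        history.append(infected)
    return history


# Illustrative parameters only; with beta > gamma the infected
# fraction approaches the endemic equilibrium 1 - gamma / beta.
trajectory = simulate_sis(beta=0.5, gamma=0.2, infected0=0.01, steps=2000)
equilibrium = 1.0 - 0.2 / 0.5
```

In this reading, lowering `beta` or raising `gamma` corresponds to interventions that slow the spread of a negative emotion or speed up recovery from it.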

37 Joint Inference of Visual Attention and Semantic Perception for Scene Text Recognition

Tong Guoxiang , Dong Tianrong , HU Hengzhang

2023, 38(3):665-675. DOI: 10.16337/j.1004-9037.2023.03.015

[Abstract](798) [HTML](476) [PDF 2.82 M](1124)

Abstract:
Irregular text recognition in scenes is still a challenging problem. For arbitrarily shaped and low-quality text in scenes， this paper proposes a multimodal network that combines a visual attention module and a semantic perception module. The visual attention module uses a parallel attention-based approach to extract visual features of images combined with positional encoding. The semantic perception module， based on weakly supervised learning， is used to learn linguistic information to compensate for the deficiencies of visual features. The module uses a Transformer-based variant that improves the model’s contextual semantic inference by randomly masking a character in a word for training. The visual semantic fusion module exchanges information between the modalities through a gating mechanism to generate robust features for character prediction. Extensive experiments demonstrate that the proposed approach is effective in recognizing arbitrarily shaped and low-quality scene text， and competitive results are obtained on several benchmark datasets. In particular， accuracy rates of 93.6% and 86.2% are achieved on the datasets SVT and SVTP respectively， both of which contain low-quality text. Compared with the method containing only the visual module， the accuracy is improved by 3.5% and 3.9%， respectively， which demonstrates the importance of semantic information for text recognition.

38 Improved K-means Clustering Algorithm Based on Tukey Rule and Initial Center Point Optimization

Liu Jing , Qiu Ziying , Gao Maozu , Yu Donghua

2023, 38(3):643-651. DOI: 10.16337/j.1004-9037.2023.03.013

[Abstract](635) [HTML](491) [PDF 941.15 K](1150)

Abstract:
To address shortcomings of the K-means algorithm， such as the selection of initial center points and the sensitivity of clustering results to abnormal points and outliers， this paper proposes an improved K-means algorithm based on the Tukey rule and optimized initial center point selection. The proposed algorithm uses the Tukey rule to construct core and non-core subsets， and divides the clustering process into two stages. At the same time， a strategy of adding center points one by one is applied to the core subset to optimize the initial center points. The clustering results on 20 real-world datasets from UCI show that the proposed algorithm outperforms the popular K-means++ clustering algorithm and effectively improves the clustering performance.
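The Tukey rule flags a value as an outlier when it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. The sketch below uses it to split points into core and non-core index sets. The abstract does not say which quantity the rule is applied to, so the per-point `scores` (for example, a mean distance to nearby points) and the function name are assumptions.

```python
import statistics


def tukey_split(scores, k=1.5):
    """Split point indices into core and non-core sets using Tukey
    fences [Q1 - k * IQR, Q3 + k * IQR] on a per-point score."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    core = [i for i, s in enumerate(scores) if low <= s <= high]
    non_core = [i for i, s in enumerate(scores) if s < low or s > high]
    return core, non_core


# Toy scores: the last point is far from the rest.
scores = [1.0, 1.2, 0.9, 1.1, 1.0, 9.0]
core, non_core = tukey_split(scores)
```

In a two-stage scheme of the kind the abstract describes, initial centers would be chosen on the core indices only, so that a point like the last one cannot become a center.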

39 Unsupervised Truth Discovery Method Based on Multi-feature Fusion

Chen Huafeng , Dong Yongquan , Yang Haolin , Zhang Guoxi

2023, 38(3):629-642. DOI: 10.16337/j.1004-9037.2023.03.012

[Abstract](649) [HTML](506) [PDF 1020.11 K](1043)

Abstract:
Truth discovery is one of the challenging research hotspots in the field of data integration. Traditional methods use the interaction between data sources and values to infer the truth， but lack sufficient feature information. Deep learning-based methods can effectively perform feature extraction， but their performance depends on a large number of manual annotations， and it is difficult to obtain a large number of high-quality truth labels in practical applications. To overcome these problems， this paper proposes an unsupervised truth discovery method based on multi-feature fusion （MFUTD）. First， ensemble learning is used to label truth without supervision. Then， the pre-trained BERT model and the one-hot encoding method are used to obtain the semantic features and interactive features of the values. Finally， the initial training set is constructed by fusing multiple features of the values and using their “truth” labels， and the truth prediction model is trained by self-training. Experimental results on two real data sets show that the proposed method has higher truth discovery accuracy than the existing methods.

40 Gaussian Kernel Approximation Model Selection Algorithm Based on Random Fourier Feature Space

Zhang Kai , Men Changqian , Wang Wenjian

2023, 38(3):616-628. DOI: 10.16337/j.1004-9037.2023.03.011

[Abstract](574) [HTML](784) [PDF 1.45 M](1219)

Abstract:
The kernel method transforms a linearly non-separable problem in low-dimensional space into a linearly separable problem in high-dimensional space， and is widely used in a variety of learning models. However， existing kernel selection methods have low computational efficiency and high time cost on large-scale data. To address these problems， this paper introduces random Fourier features to transform the original kernel feature space into a relatively low-dimensional explicit random feature space. Theoretical analyses are given for the upper bound of the kernel approximation error and for the upper bound of the error of training the learning model in the kernel approximation random feature space. The convergence consistency of the kernel approximation and the relationship between the error upper bound and the kernel approximation parameters are obtained. Moreover， the optimal model parameters are selected in the random Fourier feature space， which avoids a large-scale search for the optimal parameters of the original Gaussian kernel model and thus greatly reduces the time cost of model selection. Experiments show that the error upper bound proved in this paper is controlled by the kernel approximation parameters. The optimal model selected by the kernel approximation performs well compared with the original Gaussian kernel model， and the model selection time is greatly reduced compared with the grid search method.
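The explicit random feature space mentioned here can be illustrated with the standard random Fourier feature construction for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2). Frequencies are drawn from N(0, 2 * gamma * I) and phases from U(0, 2 * pi), so that the inner product of two feature vectors approximates the kernel value. This is a minimal sketch of that general construction, not the paper's model selection algorithm. The function names, the seed, and the test points are assumptions.

```python
import math
import random


def make_rff_map(dim, num_features, gamma, seed=0):
    """Return a map x -> z(x) with z(x) . z(y) ~ exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # std of the Gaussian spectral density
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel value, used here only for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


x, y = [0.3, -0.5], [0.1, 0.2]
feature_map = make_rff_map(dim=2, num_features=4000, gamma=0.5)
approx = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
exact = gaussian_kernel(x, y, gamma=0.5)
```

The approximation error shrinks as `num_features` grows, which matches the abstract's statement that the error bound is controlled by the kernel approximation parameters.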

41 Segmentation of Al-Si Alloy Microscopic Image by Fusing Class Attention

SHEN Tao , JIN Kai , SI Changkai , ZHENG Jianfeng , LIU Yingli

2023, 38(3):574-585. DOI: 10.16337/j.1004-9037.2023.03.007

[Abstract](479) [HTML](615) [PDF 4.29 M](1241)

Abstract:
An improved model of class attention network （CA-Net） incorporating a class attention block （CAB） is proposed to extract the primary silicon regions of the microscopic images of Al-Si alloys in this paper. The correlation information of each channel to each class is calculated from the feature map by class attention block， and the correlation information of different classes is fused to generate attention weights for correlating the weights of feature channels with their contributions to the class in the task， thus the representation of important features is enhanced and the interference of irrelevant features is suppressed. Experiments are conducted on the Al-Si alloy microscopic image dataset， and the proposed method obtains results of 94.82%， 90.16%， 94.54%， 98.80%， and 97.97% for Dice coefficient， Jaccard similarity， sensitivity， specificity， and segmentation accuracy， respectively. The proposed CA-Net can effectively improve the segmentation effect of the primary silicon region in Al-Si alloy microscopic images compared with CCNet， SPNet， TA-Net， and other methods.

42 Multi-shapelet : A Multivariate Time Series Classification Method Based on Shapelet

ZHAN Xi , LI Wei , PAN Zhisong

2023, 38(2):386-400. DOI: 10.16337/j.1004-9037.2023.02.014

[Abstract](860) [HTML](963) [PDF 1.85 M](1713)

Abstract:
Shapelets are the most discriminative subsequences in time series and have been extensively studied by researchers from various fields since they were proposed. In this process， many effective shapelet discovery techniques have been proposed for time series classification. However， candidate shapelets of multivariate time series may have different lengths and come from different variables， making it difficult to compare them directly， which presents a unique challenge to shapelet-based classification of multivariate time series. We propose Multi-shapelet， a multivariate time series classification method based on unsupervised representation learning and shapelets. Firstly， Multi-shapelet uses a hybrid model， DC-GNN （dilated convolution neural network and graph neural network）， as an encoder to embed candidate shapelets of different lengths into a unified shapelet selection space so that shapelets can be compared. Secondly， a new loss function is proposed to train the encoder in an unsupervised manner， so that after DC-GNN encodes the shapelets， the topology formed by the relative positions of the embeddings of shapelets belonging to the same class is close to a proportionally reduced version of the topology formed by the relative positions of those shapelets in the original space， which is very important for the subsequent similarity-based pruning process. Finally， K-means clustering and a simulated annealing algorithm are used to prune and select shapelets， yielding a set of shapelets with strong classification ability. Experimental results on 18 UEA multivariate time series datasets show that the overall accuracy of Multi-shapelet is significantly better than that of other methods.

43 Video-Based Person Re-identification Algorithm Based on Feature Block Reconstruction

WANG Jinhua , ZHOU Fei , BAI Menglin , SHU Haofeng

2023, 38(3):565-573. DOI: 10.16337/j.1004-9037.2023.03.006

[Abstract](500) [HTML](446) [PDF 1.48 M](1087)

Abstract:
Video-based person re-identification （Re-ID） aims to match a video track with a clipped video frame， so as to recognize the same pedestrian under different cameras. However， due to the complexity of real scenes， the collected pedestrian trajectories suffer from serious appearance loss and misalignment， and traditional 3D convolution is no longer suitable for the video pedestrian re-identification task. Therefore， a 3D feature block reconstruction model （3D-FBRM） is proposed， which uses the first feature map to align subsequent feature maps at the level of horizontal blocks. To fully mine the temporal information of the trajectory while ensuring the quality of the features， a 3D convolution kernel is added after the FBRM， and it is combined with the existing 3D ConvNets. In addition， a coarse-to-fine feature block reconstruction network （CF-FBRNet） is introduced， which not only enables the model to perform feature reconstruction at two different scales of the spatial dimensions， but also further reduces computational overhead. Experiments show that the CF-FBRNet achieves state-of-the-art results on the MARS and DukeMTMC-VideoReID datasets.

44 Hyperspectral Image Denoising Based on Superpixel Block Clustering and Low-Rank Characteristics

ZHANG Minghua , WU Xuan , SONG Wei , MEI Haibin , HE Qi , SU Cheng

2023, 38(3):549-564. DOI: 10.16337/j.1004-9037.2023.03.005

[Abstract](725) [HTML](562) [PDF 10.70 M](1546)

Abstract:
Hyperspectral images are usually contaminated by Gaussian noise， impulse noise， dead lines and stripes， so denoising is an essential step. Existing denoising methods based on low-rank characteristics introduce spatial information to improve the noise reduction effect. However， because they often use only local similarity or non-local self-similarity， they remove sparse noise with structural information in the spectral dimension poorly. Therefore， we propose a hyperspectral image denoising method based on superpixel block clustering and low-rank characteristics. The method realizes adaptive partitioning and clustering of blocks， and makes full use of non-local spatial self-similarity while retaining local details. The experiments show that a same-object block composed of clustered superpixel blocks has good spatial-spectral dual low-rank attributes. Firstly， a superpixel segmentation method is applied to hyperspectral images， and the superpixel blocks are clustered to obtain the same-object blocks. Secondly， the low-rank matrix restoration model is established and solved， and finally the denoised image is obtained. We conduct experiments on simulated data and real data respectively， and compare the method with other methods based on low-rank characteristics. The results show that this method has better denoising performance for mixed noise， especially sparse noise with structural information.

45 Multi-label Feature Selection Based on Label Complementarity

YU Ying , ZHANG Zhiqiang , QIAN Jin , WAN Ming

2023, 38(3):539-548. DOI: 10.16337/j.1004-9037.2023.03.004

[Abstract](484) [HTML](437) [PDF 1.67 M](1216)

Abstract:
Multi-label feature selection is an important research component in the field of multi-label learning. Existing multi-label feature selection methods mainly measure the importance of each feature based on the dependency between features and labels， and the redundancy among features. Then， feature ranking is performed based on feature importance， often ignoring the influence of label relationships on feature importance. To solve this problem， a multi-label feature selection algorithm based on label complementarity（MLLC） is designed， which introduces neighbourhood mutual information. The algorithm takes dependency， redundancy and label relationships as the evaluation elements of feature importance. And then it redesigns the feature importance evaluation function based on these three elements， so as to select features with stronger discriminative power and achieve better classification performance. Finally， the effectiveness and robustness of the algorithm are verified on six classical multi-label datasets.

46 Solution Method of Gaussian Mixture Model with Statistical-Aware Strategy

Chen Jiaqi , He Yulin , Huang Zhexue , Fournier-Viger Philippe

2023, 38(3):525-538. DOI: 10.16337/j.1004-9037.2023.03.003

[Abstract](694) [HTML](1042) [PDF 3.72 M](1157)

Abstract:
The Gaussian mixture model (GMM) is a classic probability model that is usually used in unsupervised learning to determine the class distribution of unlabeled samples. As an important method for solving GMM parameters, the expectation-maximization (EM) algorithm determines the parameters and component coefficients by calculating the optimal solution of the GMM likelihood function. Using the EM algorithm to solve a GMM has two defects: the EM algorithm is prone to getting stuck in a local optimum, and the parameters of the basic GMM that it determines are unstable, especially for high-dimensional data. For this reason, this paper proposes a GMM solution method based on a statistical-aware (SA) strategy, i.e., the SA-GMM method. Starting from the estimation of the unknown probability density function of a given data set, the method establishes the correlation between kernel density estimation (KDE) and the GMM. To avoid selecting an over-smoothing KDE bandwidth, the goal is to simultaneously minimize the empirical risk between the KDE and the GMM and the structural risk of the KDE bandwidth. Experiments on 11 standard probability distributions confirm the feasibility, rationality, and effectiveness of SA-GMM. They also show that SA-GMM achieves better probability density function estimation than the EM-based GMM and its variant.
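
The link between KDE and GMM described above can be illustrated with a small one-dimensional sketch. The abstract does not define the empirical risk, so the mean squared gap between the two density estimates at the sample points is used here as an assumed stand-in; the function names, the fixed bandwidth and the toy data are all hypothetical, and the structural risk on the bandwidth is not modeled.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate at x with a fixed bandwidth."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)

def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

def empirical_risk(samples, bandwidth, weights, means, stds):
    """Mean squared gap between the KDE and the GMM at the samples (assumed form)."""
    gaps = [
        (kde_pdf(x, samples, bandwidth) - gmm_pdf(x, weights, means, stds)) ** 2
        for x in samples
    ]
    return sum(gaps) / len(gaps)

# Toy data with two clear modes (hypothetical values).
samples = [-2.1, -1.9, -2.0, 1.8, 2.0, 2.2]
two_component = empirical_risk(samples, 0.5, [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])
one_component = empirical_risk(samples, 0.5, [1.0], [0.0], [0.5])
```

On this toy data, a mixture whose components sit on the two modes gives a smaller gap to the KDE than a single component centred between them, which is the kind of signal a KDE-guided fit could exploit.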

47 Unsupervised Learning Pedestrian Re-identification Based on Localized Instance Matching

WU Haili , Zhang Yueqin , PANG Junqi

2023, 38(4):947-958. DOI: 10.16337/j.1004-9037.2023.04.017

[Abstract](586) [HTML](662) [PDF 2.44 M](949)

Abstract:
Unsupervised domain adaptation (UDA) methods leverage global feature distribution matching to transfer knowledge from the source domain to the target domain, while ignoring fine-grained local instance information. An unsupervised person re-identification method based on two-tiered domain adaptation (TTDA) is proposed, in which the omni-scale network (OSNet) is selected as the backbone network, and global feature distribution matching and localized instance matching between the source and target domains are performed jointly in an end-to-end deep learning framework. In addition, to effectively mine transferable knowledge from associations between different pedestrian IDs in the source and target domains, cross-domain adaptability is improved with a knowledge selection mechanism. Experimental results on multiple large-scale public datasets show that, compared with other state-of-the-art methods, the proposed method achieves significant improvements in mean average precision (mAP) and top-k hit rate for unsupervised cross-domain person re-identification tasks.

48 Residual Inception and Bidirectional ConvGRU Empowered Intelligent Segmentation for Skin Lesion

GU Minjie , LI Xue , CHEN Siguang

2023, 38(4):937-946. DOI: 10.16337/j.1004-9037.2023.04.016

[Abstract](562) [HTML](773) [PDF 1.32 M](829)

Abstract:
The shape, color and texture of skin lesions vary widely, and their boundaries are unclear, which makes it difficult for traditional deep learning methods to segment them accurately. To address this challenge, this paper proposes an intelligent segmentation model for skin lesions empowered by residual Inception and a bidirectional convolutional gated recurrent unit (ConvGRU). Specifically, a cloud-edge collaborative intelligent segmentation service network model for skin lesions is first designed, through which users can obtain quick and accurate segmentation services. Furthermore, a novel intelligent segmentation model for skin lesions is developed. By integrating residual Inception and bidirectional ConvGRU, this model fuses multi-scale features and makes full use of the relationship between low-level features and semantic features. This improves the ability of the model to extract features and capture global context information, and leads to better segmentation performance. Finally, experimental results on the ISIC 2018 dataset show that the proposed model achieves higher accuracy and Jaccard coefficient than several recently proposed U-Net extensions.
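
The Jaccard coefficient reported above is the ratio of the intersection to the union of the predicted and ground-truth lesion masks. A minimal sketch follows, assuming binary masks stored as nested lists; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration.

```python
def jaccard(pred_mask, true_mask):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two binary masks (nested lists)."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred, true))
    union = sum(p or t for p, t in zip(pred, true))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union

pred = [[1, 1],
        [0, 0]]
truth = [[1, 0],
         [1, 0]]
score = jaccard(pred, truth)  # one shared pixel out of three marked pixels
```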

49 Expert Recommendation Method Combining Multi-features and Bi-directional Graph Classification

DING Jingxian , LI Xiang , SUN Jizhou , ZHOU Hong

2023, 38(5):1214-1225. DOI: 10.16337/j.1004-9037.2023.05.019

[Abstract](604) [HTML](716) [PDF 1.15 M](860)

Abstract:
Expert recommendation is a research hotspot in the field of recommender systems. The rationality of expert information feature extraction directly affects recommendation accuracy. However, most expert recommendation methods do not build text graphs of feature relations for multi-source information, and ignore the correlation between attribute features. Additionally, most expert recommendation methods cannot expand the features of the knowledge field according to the relevance of the text graph. Therefore, we propose CMFBG, an expert recommendation method combining multi-features and bi-directional graph classification. Specifically, CMFBG obtains multi-feature information about experts through multi-source information fusion, and constructs text graphs for different attribute features within categories. Then, CMFBG employs bidirectional encoder representations from transformers (BERT) and graph convolutional network (GCN) models to extract features and fuse them. Finally, CMFBG employs a bidirectional attention mechanism to enhance the extension of the source data to the graph features and to classify the graph structure. The experimental analysis on the same expert data set shows that the precision of CMFBG is 91.71% higher than other algorithms in the task of graph classification.

50 Aspect-Based Sentiment Analysis of Emergencies Based on Interactive Attention

ZHONG Zhaoman , HUANG Xianbo , XIONG Yulong

2023, 38(5):1206-1213. DOI: 10.16337/j.1004-9037.2023.05.018

[Abstract](513) [HTML](772) [PDF 1.35 M](875)

Abstract:
To accurately analyze the sentiment of Internet users towards different objects in breaking events, a fine-grained sentiment analysis method for breaking events based on RoBERTa word embedding and interactive attention is proposed. A RoBERTa-CRF comment object extraction model is constructed to extract the comment objects related to breaking events. The RoBERTa-IAN model is constructed using the interactive attention mechanism and a pre-trained model to analyze the sentiment of comment objects. Finally, the sentiments of Internet users towards different objects in breaking events are analyzed and visualized. On the constructed Weibo news comment dataset, the F1 values of the RoBERTa-CRF comment object extraction model and the RoBERTa-IAN sentiment analysis model are 0.76 and 0.79, respectively.

51 Hesitant Fuzzy Linguistic Information Option Prioritizing Method Based on Data-Driven

ZHU Jun , CHEN Lu , XU Haiyan

2023, 38(5):1191-1205. DOI: 10.16337/j.1004-9037.2023.05.017

[Abstract](559) [HTML](454) [PDF 1.08 M](777)

Abstract:
Data-driven approaches make it more convenient and effective for decision-makers to obtain information. Under the theoretical framework of the graph model for conflict resolution, this paper first mines conflict strategies in a data-driven manner so that they are constructed rationally. Second, considering that in real conflicts a decision-maker's choice of a strategy is better described as a possibility of being selected, this paper integrates hesitant fuzzy linguistic information with the theory of the graph model for conflict resolution and uses this information for evaluation. Based on rough set theory, the hesitant fuzzy semantic evaluation information is aggregated to represent this possibility. Furthermore, a new option prioritizing method for the graph model for conflict resolution based on hesitant fuzzy linguistic information is proposed. Finally, the cross-border water pollution of the Shu River is modeled and analyzed to compare the novel and classic methods, so as to verify the rationality of the proposed method.

52 Recognition of Vietnamese Text in Natural Scene Based on Modified DAN

Wang Libing , Feng Yate , Wen Yimin

2023, 38(5):1058-1068. DOI: 10.16337/j.1004-9037.2023.05.005

[Abstract](661) [HTML](551) [PDF 3.88 M](847)

Abstract:
Vietnamese characters, which are composed of Latin characters and diacritic symbols, make recognition more challenging. On the one hand, diacritic symbols are more likely to lead to attention drift. On the other hand, Vietnamese characters include many categories, and the differences between characters are small; for example, some characters differ only in their diacritic symbols, which further increases the difficulty of recognition. Based on the decoupled attention network (DAN) algorithm, this paper designs a visual feature and sequence feature fusion module (VSFM), which uses bidirectional gated recurrent units (Bi-GRU) to model sequences in the horizontal and vertical directions, further alleviating attention drift and enhancing the correlation between diacritics and Latin characters. An enhanced decoupled text decoder module (ETDM) is also designed, which employs more feature information to identify similar characters more effectively. A series of experiments validates the effectiveness of the proposed method.

53 Generate Adversarial Depth Repair Under Structural Constraints

Lu Qi , Gong Xun

2023, 38(5):1048-1057. DOI: 10.16337/j.1004-9037.2023.05.004

[Abstract](579) [HTML](433) [PDF 2.89 M](874)

Abstract:
Unlike in RGB images, pixels in depth images represent the distance from the acquisition device to points in the scene, so directly applying inpainting methods designed for natural images cannot effectively restore the scene structure of missing areas in depth images. This paper proposes a two-stage code-structure generative adversarial network to solve the problem of depth image inpainting. Unlike standard generative adversarial network (GAN) models, the generator network in this paper includes a depth build module G1 and a depth repair module G2. G1 obtains the predicted depth from the RGB image and uses it to replace the missing area of the depth image to be repaired, ensuring the local structure consistency of the repair area. G2 introduces the RGB image edge structure to ensure global structure consistency. The consistency of the missing areas, which is not considered in existing image inpainting methods, is addressed by a structure consistency attention (SCA) module embedded in G2. The proposed depth image repair model is verified on several mainstream data sets, showing that the effect of the structural constraints and of the combination of the generator and discriminator is evident.

54 Deep Learning Based Salient Object Detection: A Survey

SUN Han , LIU Yishan , LIN Yuhan

2023, 38(1):21-50. DOI: 10.16337/j.1004-9037.2023.01.002

[Abstract](2592) [HTML](1318) [PDF 5.89 M](4897)

Abstract:
By simulating the human visual system to find the targets that most attract visual attention, salient object detection has been widely used in computer vision tasks such as image understanding, semantic segmentation, and object tracking. With the rapid development of deep learning technology, salient object detection research has made great breakthroughs. This paper presents a comprehensive and systematic survey of salient object detection over the past five years based on RGB images, RGB-D/T (Depth/Thermal) images, and light field images. First, the task characteristics and research difficulties of the three research branches are analyzed. Then, the technical route of each branch is described, and its advantages and disadvantages are analyzed. The mainstream datasets and common performance evaluation metrics of the three branches are also introduced. Finally, possible future research trends are discussed.

55 An Interference Suppression Scheme Based on Deep Residual Neural Networks for GNSS Receivers

ZHANG Guomei , ZHANG Xin , YIN Jiawen , WANG Hua

2023, 38(2):293-303. DOI: 10.16337/j.1004-9037.2023.02.006

[Abstract](860) [HTML](852) [PDF 2.47 M](1716)

Abstract:
In the complex application environment of the global navigation satellite system (GNSS), where various kinds of suppressive interference and spoofing occur randomly, the traditional interference suppression method first estimates the interference parameters and then cancels the interference signal. Such a method is difficult to design and has low generality, because dedicated parameter estimators and interference reduction methods are needed for each type of interference. Therefore, an interference suppression scheme based on deep residual neural networks (DRNNs) is proposed in this paper. First, a corresponding DRNN is built and trained for each typical type of GNSS interference, so that it can directly extract the target satellite signal from the interfered signal. Second, the corresponding DRNN is selected according to the interference classification and recognition result. The two-dimensional (2D) time-frequency signal obtained by applying the short-time Fourier transform to the received one-dimensional signal is then fed into the chosen DRNN. The output is the 2D time-frequency spectrum of the useful signal, in which the impact of the interference has been suppressed. In our scheme, the same procedure is applied to different kinds of suppressive interference and spoofing, so no dedicated parameter estimation or interference reduction design is required for each type of interference. Experimental results show that, compared with the traditional scheme, the proposed scheme can effectively suppress various kinds of GNSS interference and demonstrates a certain degree of generality.
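
The step that turns the received one-dimensional signal into a 2D time-frequency input can be sketched with a plain short-time Fourier transform. The frame length, hop size, Hann window and test tone below are assumptions chosen for illustration; the abstract does not give the paper's settings, and a practical receiver would use an FFT library rather than this direct DFT.

```python
import cmath
import math

def stft(signal, frame_len, hop):
    """Short-time Fourier transform: one list of complex bins (0..frame_len/2) per frame."""
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

# Hypothetical test tone that falls exactly on frequency bin 2 of an 8-sample frame.
tone = [math.cos(2.0 * math.pi * 2 * n / 8) for n in range(32)]
spectrogram = stft(tone, frame_len=8, hop=4)
peak_bins = [max(range(len(f)), key=lambda k: abs(f[k])) for f in spectrogram]
```

Each row of `spectrogram` is one time frame, so the nested list plays the role of the 2D time-frequency signal that a network would take as input.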

56 Document Level Relationship Extraction Based on Context Coreference Entity Dependence

Xia Zhengxin , Su Chong , Liu Yong

2023, 38(5):1226-1234. DOI: 10.16337/j.1004-9037.2023.05.020

[Abstract](497) [HTML](633) [PDF 1.50 M](779)

Abstract:
Document relationship extraction (DRE) aims to identify the relationships between entities across multiple sentences. An entity may correspond to multiple mentions across sentence boundaries, among which pronoun entity mentions are a common grammatical phenomenon arising from the connection between sentences, and are also an important factor affecting sentence reasoning. However, most previous studies focused on the relationships between common entity references, and paid little attention to the co-reference and relation capture of pronoun entity references. Therefore, we propose a contextual coreference entity dependency (CCED) model, which integrates common entity and pronoun entity representations to build a context graph structure of co-referring entity dependencies, and carries out global interactive reasoning between entity pairs on the graph so as to model the interdependence of entity relations. We evaluated the CCED model on the public datasets DocRED, DialogRE and MPDD. The results show that, compared with DocuNet-BERT, the best baseline model, the CCED model improves Ign F₁ by 0.55% and F₁ by 0.35% on the DocRED dataset. Compared with COLN, the best-performing baseline model, the CCED model improves F₁ by 1.02% on the DialogRE test set and ACC by 1.19% on the MPDD test set. The experimental results verify the effectiveness of the new model for document-level relationship extraction.

57 Research on Self-organizing Map Network Based on Gaussian Neuron

LIU Da , CHEN Songcan

2023, 38(1):85-92. DOI: 10.16337/j.1004-9037.2023.01.006

[Abstract](725) [HTML](332) [PDF 1.66 M](1016)

Abstract:
The self-organizing map network (SOM) is a classic unsupervised learning method with self-organizing and online learning functions. Owing to its simplicity and practicality, SOM variants keep emerging to adapt to various problems. However, these works basically adopt deterministic neurons to build networks, ignoring the uncertainty information implicit in the data itself. As a result, the outputs of these models lack the interpretability that confidence provides, which implies that the uncertainty characterization ability of SOM neurons is insufficient. This article proposes a new variant of SOM, called the Gaussian neuron SOM network (GNSOM). Its neuron nodes are no longer deterministic, but are modeled as Gaussian neurons with Gaussian distributions. Thus, SOM is equipped with the ability to express the uncertainty of the data. In the implementation, the input data are also Gaussianized, and the Jensen-Shannon (JS) divergence replaces the Euclidean distance as the similarity matching metric in GNSOM learning, thereby obtaining the uncertainty representation. The experimental results show that GNSOM has a better training effect, and can reflect the uncertainty of the data through the covariance matrix of the neuron node. Since this Gaussianization of neurons is independent of SOM itself, it can be extended to other neuron models.
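
To make the matching metric concrete, the sketch below evaluates the JS divergence between two Gaussians and uses it to pick a best-matching neuron. It is restricted to one-dimensional Gaussians and uses midpoint numerical integration, because the JS divergence between Gaussians has no general closed form; the function names, integration range and step count are assumptions, and the paper's multivariate computation may differ.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def js_divergence(mean_p, std_p, mean_q, std_q, steps=4000):
    """JS divergence (in nats) between two 1-D Gaussians via midpoint integration."""
    lo = min(mean_p - 8.0 * std_p, mean_q - 8.0 * std_q)
    hi = max(mean_p + 8.0 * std_p, mean_q + 8.0 * std_q)
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p = gaussian_pdf(x, mean_p, std_p)
        q = gaussian_pdf(x, mean_q, std_q)
        m = 0.5 * (p + q)  # mixture density used by the JS divergence
        if p > 0.0:
            total += 0.5 * p * math.log(p / m) * dx
        if q > 0.0:
            total += 0.5 * q * math.log(q / m) * dx
    return total

def best_matching_neuron(sample, neurons):
    """Index of the Gaussian neuron (mean, std) closest to the Gaussianized sample."""
    return min(range(len(neurons)),
               key=lambda i: js_divergence(sample[0], sample[1], *neurons[i]))

neurons = [(0.0, 1.0), (5.0, 1.0), (5.0, 3.0)]   # hypothetical (mean, std) pairs
winner = best_matching_neuron((4.8, 1.1), neurons)
```

Unlike the Euclidean distance between means, this metric separates the second and third neurons even though they share the same mean, because their variances differ.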

58 Zero Resource Korean ASR Based on Acoustic Model Sharing

Wang Haoyu , Jeon Eunah , Zhang Weiqiang , Li Ke , Huang Yukai

2023, 38(1):93-100. DOI: 10.16337/j.1004-9037.2023.01.007

[Abstract](845) [HTML](713) [PDF 1.22 M](1966)

Abstract:
An accurate speech recognition system is usually based on a large amount of training data with manual transcriptions, which sets a barrier to the recognition of many low-resource languages. Acoustic model sharing, which is based on the similarity between a rich-resource language and a low-resource language, provides a new way to solve the problem and helps to build an automatic speech recognition (ASR) system without any training data in the given low-resource language. This paper extends the method to Korean speech recognition. Specifically, we train an acoustic model on Mandarin data, and define a set of mapping rules between Mandarin and Korean phonemes. A character error rate (CER) of 27.33% is achieved on the Zeroth Korean test set without using any Korean speech data. Moreover, we compare source-to-target and target-to-source phoneme mapping rules, and show that the latter is more appropriate for acoustic model sharing.
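
The character error rate quoted above is conventionally the character-level edit distance between the recognizer output and the reference transcript, divided by the reference length. A minimal sketch under that standard definition follows; the function names and the sample strings are illustrative and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

example = cer("abcd", "abxd")  # one substitution over four reference characters
```

Because insertions are counted against the reference length, the value can exceed 1.0 for very poor hypotheses.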

59 Design of FPGA Accelerator for Radar Intelligent Anti-jamming Decision-Making Based on Deep Reinforcement Learning

Li Ziyu , Ge Fen , Zhang Jindong , Zhao Jiachen

2023, 38(5):1151-1161. DOI: 10.16337/j.1004-9037.2023.05.013

[Abstract](972) [HTML](858) [PDF 1.67 M](1107)

Abstract:
To meet the requirements of continuous intelligent anti-jamming decision-making and high real-time performance for radar in highly dynamic environments, this paper constructs a deep Q network (DQN) model for radar intelligent anti-jamming decision-making, and proposes a hardware decision acceleration architecture based on a field programmable gate array (FPGA). In this architecture, an on-chip access mode is designed for the environment interaction of radar intelligent decision-making to improve real-time performance; it simplifies the iterative process of continuous decision-making by the DQN agent through on-chip quantitative storage and state iteration calculation for environment interaction. The architecture also adopts parallel computing and pipeline control to accelerate the deep neural network of the agent, which further improves the real-time performance of decision-making. Simulation and experimental results show that, while ensuring decision-making accuracy, the designed intelligent anti-jamming decision-making accelerator achieves a speedup of nearly 46 times in single decision-making and nearly 84 times in continuous decision-making compared with the existing decision-making system based on the CPU platform.

60 Hyperspectral Image Fusion via Deep Unfolding and Dual-stream Networks

LIU Cong , YAO Jiahao

2023, 38(6):1406-1421. DOI: 10.16337/j.1004-9037.2023.06.015

[Abstract](736) [HTML](463) [PDF 3.02 M](1088)

Abstract:
Hyperspectral image fusion algorithms based on deep learning typically stack multiple convolutional layers to learn mapping relationships; as a result, they do not fully utilize the characteristics of the task and lack interpretability. To address these problems, this paper proposes a deep network combining deep unfolding and dual-stream networks. First, an image fusion model is established using convolutional sparse coding, which maps low-resolution hyperspectral images (LR-HSI) and high-resolution multispectral images (HR-MSI) into a low-dimensional subspace. In the design of the fusion model, we consider the common information of LR-HSI and HR-MSI as well as the unique information of LR-HSI, and add HR-MSI to the model as auxiliary information. Next, the fusion model is unfolded into a learnable, interpretable deep network. Finally, the dual-stream network is used to obtain more accurate high-resolution hyperspectral images (HR-HSI). Experiments show that the network obtains excellent results in the hyperspectral image fusion task.

61 Automatic Sleep Staging Based on Deep Learning: A Review

LIU Ying , CHU Haoran , ZHANG Haowei

2023, 38(4):759-776. DOI: 10.16337/j.1004-9037.2023.04.002

[Abstract](2119) [HTML](1754) [PDF 5.02 M](2436)

Abstract:
Sleep staging is a vital process for analyzing polysomnographic recordings, and plays a key role in sleep monitoring and the diagnosis of sleep disorders. Traditional manual sleep staging requires expertise and is cumbersome and time-consuming. Deep learning constructs models that simulate the way the human brain interprets information, and has powerful automatic feature extraction and feature expression capabilities. Applying deep learning to sleep staging does not rely on manually designed features and can automate sleep staging. This article focuses on typical automatic sleep staging studies since 2017, and systematically reviews deep learning models applied to automatic sleep staging from the two aspects of single-view and multi-view input. Then, the difficulties of deep learning models based on multi-view input are analyzed and their potential research value is pointed out. Finally, possible future research directions are discussed.

62 Robust Nonnegative Matrix Factorization with Local Similarity Learning

HOU Xingrong , PENG Chong

2023, 38(5):1125-1141. DOI: 10.16337/j.1004-9037.2023.05.011

[Abstract](575) [HTML](366) [PDF 2.38 M](793)

Abstract:
Existing nonnegative matrix factorization methods mainly focus on learning the global structure of the data, while ignoring local information. Meanwhile, methods that attempt to exploit local similarity often adopt manifold learning, which suffers from some issues. To solve this problem, a new method named robust nonnegative matrix factorization with local similarity learning (RLS-NMF) is proposed. In this paper, a new local similarity learning method is adopted, which is starkly different from the widely used manifold learning. Moreover, the new method can simultaneously learn the global structural information of the data, and thus exploit the intra-class similarity and the inter-class separability of the data. To address outliers and noise in real-world applications, the l2,1 norm is used to fit the residuals and filter out redundant noise information, ensuring the robustness of the algorithm. Extensive experimental results show the superior performance of the proposed method on several benchmark datasets, further demonstrating its effectiveness.
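
The robustness argument for the l2,1 norm can be illustrated numerically. The sketch assumes the common convention in which each column of the residual matrix is one sample and the norm sums the Euclidean lengths of the columns; the abstract does not state which convention the paper uses, and the residual values are made up.

```python
import math

def l21_norm(matrix):
    """Sum of the Euclidean norms of the columns (one column per sample, assumed)."""
    return sum(math.sqrt(sum(v * v for v in column)) for column in zip(*matrix))

def frobenius_squared(matrix):
    """Squared Frobenius norm, the loss used by standard NMF."""
    return sum(v * v for row in matrix for v in row)

# Hypothetical residual X - WH: the third sample (column) is an outlier.
residual = [[3.0, 0.0, 30.0],
            [4.0, 0.0, 40.0]]

l21 = l21_norm(residual)            # 5 + 0 + 50
fro2 = frobenius_squared(residual)  # 25 + 0 + 2500
outlier_share_l21 = 50.0 / l21
outlier_share_fro2 = 2500.0 / fro2
```

The outlier column contributes linearly to the l2,1 norm but quadratically to the squared Frobenius loss, so it dominates the latter far more strongly.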

63 Contrast-Enhanced Ultrasound Analysis Based on Machine Learning: A Survey

WAN Peng , LIU Han , ZHAO Junyong , XUE Haiyan , LIU Chunrui , SHAO Wei , KONG Wentao , ZHANG Daoqiang

2023, 38(4):741-758. DOI: 10.16337/j.1004-9037.2023.04.001

[Abstract](1410) [HTML](1426) [PDF 3.62 M](1901)

Abstract:
Contrast-enhanced ultrasound (CEUS) is a powerful diagnostic tool that enhances blood flow signals from tumor micro-vessels through the peripheral venous injection of ultrasound contrast agents. This enables clinical physicians to dynamically evaluate tumor angiogenesis in real time. CEUS imaging is widely used for the diagnosis, postoperative evaluation, and treatment planning of multiple organs. In recent years, deep learning techniques have made considerable progress, offering new opportunities for the intelligent analysis of dynamic CEUS. Deep learning methods have greatly widened the scope of clinical applications of CEUS, improving its efficacy in diagnosis and treatment. However, like traditional ultrasound imaging, CEUS faces the challenges of speckle noise interference, respiratory motion, and low standardization, which make the spatial-temporal information of dynamic perfusion difficult to analyze. This paper systematically reviews recent research on the intelligent analysis of CEUS, covering clinical applications such as benign-malignant differentiation, malignant grading, therapeutic prediction, and the selection of diagnosis and treatment plans. We summarize the latest advances in radiomic and deep learning methods for CEUS imaging analysis, and highlight the limitations of current research and future directions for development.

64 Weakly Supervised Video Anomaly Detection Based on Spatio-Temporal Dependence and Feature Fusion

LIU Deyun , LI Ying , ZHOU Zhen , JI Genlin

2024, 39(1):204-214. DOI: 10.16337/j.1004-9037.2024.01.018

[Abstract](706) [HTML](557) [PDF 2.44 M](981)

Abstract:
Weakly supervised video anomaly detection has become a hot spot in video anomaly detection research due to its strong anti-interference ability and low data labeling requirements. Most existing weakly supervised video anomaly detection methods assume that the clips in each video are independently distributed, and determine whether each video clip is abnormal independently, ignoring the temporal and spatial information between video clips. To alleviate these problems, this paper proposes a weakly supervised anomaly detection method based on spatio-temporal dependence and feature fusion. While retaining the original features of video clips, this method uses the index distance and the feature similarity between video clips to fit the temporal and spatial dependencies of the video, and thereby builds the relationship features of video clips. By fusing the original features and relationship features, the dynamic characteristics and temporal relationships of videos can be better expressed. Extensive experiments on two benchmark datasets, UCF-Crime and ShanghaiTech, demonstrate that the proposed method outperforms other methods, with AUC values reaching 80.1% and 94.6%, respectively.
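
The AUC figures above summarize how well anomaly scores rank abnormal items above normal ones. A minimal sketch of the rank-based definition follows, assuming binary labels (1 for abnormal, 0 for normal); the function name and the toy scores are illustrative, and benchmark evaluations typically use a library routine on frame-level scores.

```python
def roc_auc(scores, labels):
    """Probability that a random abnormal item scores above a random normal one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy anomaly scores for four clips (hypothetical values).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)  # three of the four abnormal/normal pairs are ordered correctly
```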

65 A Distributed Local Clustering Method for Large-Scale Resource Discovery

MENG Xinyu , PAN Wenyu , MA Yining

2024, 39(1):215-222. DOI: 10.16337/j.1004-9037.2024.01.019

[Abstract](457) [HTML](368) [PDF 701.27 K](881)

Abstract:
In large-scale resource environments, traditional resource indexing mechanisms lead to a rapid increase in the number of peer nodes and a decrease in load balancing performance, affecting query efficiency and system stability. This paper introduces a centroid model-based local resource clustering method, which clusters similar resources at a single node and selects a representative key value, effectively reducing the scale of peer nodes in the peer-to-peer (P2P) network. Additionally, the local clustering mechanism focuses on processing closely related key values, thus preventing excessive expansion of resource coverage. Experimental results demonstrate that the Skip Graph algorithm based on the centroid model not only reduces query complexity and improves load balancing performance, but also exhibits excellent scalability in terms of network size, data volume, and query complexity, better adapting to the needs of large-scale resource discovery.

66 Polyp Segmentation Network Based on Multiple Attention and schatten-p Norm

LI Su , LIU Guoqi , LIU Dong , ZHAO Manqi

2024, 39(1):223-235. DOI: 10.16337/j.1004-9037.2024.01.020

[Abstract](674) [HTML](490) [PDF 4.76 M](952)

Abstract:
Automatic and accurate polyp localization and segmentation methods can detect polyps in a timely manner in the early stage of colorectal cancer lesions, greatly reducing the risk of cancer transformation. The encoder-decoder architecture, the most mainstream network structure for polyp segmentation in recent years, has been greatly improved, for example by improving the ability of the model to capture global contextual and local features and by using deep features to guide shallow decoding. However, polyps vary in shape and size, and because of their convolutional nature these networks tend to focus too heavily on mining local information and lose long-range dependencies during encoding. Some polyp images also have low contrast and complex spatial characteristics, which makes it easy to confuse the polyp with the background. To address this, this paper proposes a polyp segmentation network based on multiple attention and the Schatten-p norm (MASNet). In this network, the axial multiple attention module uses axial attention to supplement long-range contextual relationships in the image, while also attending to boundary and background information to achieve feature complementarity. It enhances the capture of local detail features while attending to global features. By exploiting the correlation between matrix singular values and the information implicit in the matrix, the Schatten-p norm is introduced as a constraint to analyze the data from a matrix perspective and help the model distinguish foreground from background. Extensive experiments demonstrate the effectiveness of the proposed method, and MASNet achieves the best segmentation results among the compared advanced methods on the Kvasir-SEG dataset.
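
For reference, the standard definition of the Schatten-p norm in terms of singular values is given below. The matrix A and its dimensions m by n are generic symbols introduced here; how the paper forms the matrix from network features and which value of p it uses are not stated in the abstract.

```latex
\|A\|_{S_p} = \left( \sum_{i=1}^{\min(m,n)} \sigma_i(A)^{p} \right)^{1/p},
\qquad A \in \mathbb{R}^{m \times n}
```

Here \sigma_i(A) are the singular values of A. The case p = 1 is the nuclear norm, p = 2 is the Frobenius norm, and for 0 < p < 1 the expression is only a quasi-norm.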

67 Cooperative Cognitive Jamming in Low-Altitude Intelligent Network Based on Digital Twin and Reinforcement Learning

SHEN Gaoqing , CAI Shengsuo , LEI Lei , BEN De

2024, 39(1):15-30. DOI: 10.16337/j.1004-9037.2024.01.003

[Abstract](1668) [HTML](2430) [PDF 2.45 M](1384)

Abstract:
To address the issue of resource allocation for multiple electronic jamming unmanned aerial vehicles (UAVs) against multiple multifunctional radars in the cooperative cognitive jamming decision-making process of low-altitude intelligent networks, a cognitive jamming decision-making approach based on digital twins and deep reinforcement learning is proposed. First, a cognitive jamming decision-making system model is established by treating the cooperative electronic jamming problem as a Markov decision process. The action space, state space, and reward function of the agents are constructed by comprehensively considering the constraints related to jamming target, jamming power, and jamming pattern selection. Second, an adaptive learning rate proximal policy optimization (APPO) algorithm is proposed based on the proximal policy optimization (PPO) algorithm. Additionally, to speed up the training of the deep reinforcement learning algorithm while maintaining high fidelity, a digital twin-based training method for the cooperative electronic jamming decision-making model is presented. Simulation results demonstrate that, compared with existing deep reinforcement learning algorithms, the APPO algorithm improves interference efficiency by more than 30%, and the proposed training method increases the model training speed by more than 50%.
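
As background for the PPO baseline named above, the sketch below computes the standard clipped surrogate objective of PPO for a batch of samples. It covers only plain PPO; the adaptive learning rate rule that distinguishes APPO is not described in the abstract and is not modeled. The function name, the clipping range of 0.2 and the sample values are assumptions.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios: new-policy / old-policy action probabilities for each sample.
    advantages: advantage estimates for the same samples.
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equally long")
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Taking the minimum removes the incentive to move the policy too far.
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)

# A large ratio with positive advantage is capped at (1 + eps) * A.
capped_gain = ppo_clipped_objective([1.5], [1.0])
# A small ratio with negative advantage is held at (1 - eps) * A.
capped_loss = ppo_clipped_objective([0.5], [-1.0])
```

In training, this objective is maximized with respect to the policy parameters, usually together with value-function and entropy terms that are omitted here.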

68 Graph Neural Network-Based Representation and Optimization Techniques for Unmanned Aerial Vehicle Networks

CHENG Nan , FU Lianhao , WANG Xiucheng , YIN Zhisheng

2024, 39(1):44-59. DOI: 10.16337/j.1004-9037.2024.01.005

[Abstract](877) [HTML](2111) [PDF 1.77 M](1201)

Abstract:
As an important component of low-altitude intelligent networking， unmanned aerial vehicles （UAVs） have been widely used in the field of wireless communications. Nevertheless， the existing solutions often encounter numerous challenges when dealing with the continuously evolving scale and topology of UAV networks， such as slow convergence speed， insufficient real-time response capability， high training costs， and limited generalization abilities. To address these issues， this paper proposes an observation representation and decision-making scheme based on graph neural networks （GNNs） for UAV networks. The study initially models the relationships between UAVs and their observational entities using graph modeling techniques， designs a GNN-based representation scheme， and utilizes machine learning algorithms for pre-training to adapt to the dynamically changing observation space. For the dynamic characteristics of the decision space， the paper further introduces an edge-decision-based GNN model， which enhances adaptability to the dynamic decision space through graph modeling and edge weight fitting. Moreover， through the study of two UAV network cases， the effectiveness and superiority of the proposed scheme are validated， demonstrating its potential in practical UAV network applications.

69 High Dynamic Range 3D Reconstruction Based on Event Information and Deep Learning

Wang Jie , Wei Zhendong , Wang Qijiang , Zhang Qican , Wang Yajun

2024, 39(2):337-347. DOI: 10.16337/j.1004-9037.2024.02.007

[Abstract](947) [HTML](1015) [PDF 3.90 M](1044)

Abstract:
Three-dimensional （3D） measurement of high dynamic range （HDR） surfaces， such as metal parts， black objects， and translucent objects， using optical 3D imaging technology remains a challenging problem. Traditional methods have limitations in reconstructing HDR scenes with low-reflection and translucent areas， and have difficulty eliminating the internal reflection noise of translucent objects. Existing deep learning-based methods typically increase the laser intensity， which can damage the sample and overexpose the acquired image， necessitating tedious adjustments to the laser intensity. To address these issues， this paper proposes a 3D measurement method for HDR scenes that uses an event camera and a deep learning algorithm. By asynchronously recording the brightness changes of individual pixels， the event camera has a high dynamic range response and can therefore fully capture the laser fringes of HDR scenes. In addition， we introduce a deep convolutional neural network （DCNN） to eliminate the noise caused by reflections inside transparent objects and by overexposed， highly reflective areas of metallic objects， while enhancing the weak laser stripes on the surface. Experimental results demonstrate that the proposed method achieves high-quality 3D reconstruction of HDR scenes using low-power line laser scanning.

70 Invisible WFRFT Communication Method with Jump Vector

LIU Fang , HUANG Keting , HOU Yu , FENG Yongxin

2024, 39(2):445-455. DOI: 10.16337/j.1004-9037.2024.02.017

[Abstract](581) [HTML](401) [PDF 2.76 M](798)

Abstract:
The weighted fractional Fourier transform （WFRFT） technology can greatly change the characteristics of a signal and diversify its statistical characteristics， thereby ensuring the security of communication information. To solve the problem of insufficient anti-scanning ability in single-parameter WFRFT communication， the formation mechanism of the single-parameter fractional domain is studied in depth， its potential microscopic features and dark features are analyzed， and an implicit WFRFT communication method with jump vector （IWVJ） is proposed. Using the relationship between the modulation order and the constellation diagram， the hopping matrix and the hopping vector are established and the control rules are formulated. The dynamic modulation order is then obtained through hopping vector control to achieve secure communication. Simulation results show that， with the IWVJ method， licensed receivers achieve higher inverse transform demodulation similarity and a lower bit error rate than unlicensed receivers with universal scanning capability. Suggestions are also given for setting the demodulation order error， the basic modulation order and the jump frequency， so that the IWVJ method can be better applied to communication systems and provide secure information transmission with anti-jamming， anti-interception and anti-spoofing capabilities.

71 Semi-supervised Multi-label Classification Method for Financial Events

Yang Zhuofeng , Li Yang , Li Deyu

2024, 39(2):385-394. DOI: 10.16337/j.1004-9037.2024.02.011

[Abstract](574) [HTML](493) [PDF 1.09 M](817)

Abstract:
With the continuous development of the digital financial service industry， the Internet and financial service systems have accumulated a large amount of text data. The automatic classification of financial events described in financial text is a practical demand of financial technology and a widespread concern in natural language processing and machine learning. At present， deep learning methods are widely used in text classification. To address the lack of labeled data in multi-label classification of financial events， the heavy resource consumption of existing deep learning methods， and their failure to exploit the specific characteristics of financial event texts， a semi-supervised multi-label classification method for financial events is proposed that uses ALBERT， TextCNN and other representation tools and introduces a subject word attention mechanism. Firstly， the problem of insufficient labeled data is alleviated through unsupervised data augmentation （UDA）. Secondly， the subject word attention mechanism is introduced， and the ALBERT dynamic word vector representation method is used to represent the words in the text. Then， TextCNN is used to represent the text as a whole. Finally， cross entropy and KL divergence are used to measure the loss on labeled data and unlabeled data， respectively， to train the model. The effectiveness of the proposed method is verified on a financial text dataset.
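
The final training step in this abstract combines a cross-entropy term on labeled data with a KL-divergence term on unlabeled data. A minimal Python sketch of such a combined loss follows. It is not the authors' implementation： the function names， the per-label Bernoulli form of the KL term， and the `consistency_weight` parameter are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # clipping constant to keep logarithms finite (assumed value)


def binary_cross_entropy(targets, probs):
    """Mean binary cross-entropy over the labels of one labeled example."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


def bernoulli_kl(p_probs, q_probs):
    """Mean KL(p || q) between per-label Bernoulli distributions."""
    total = 0.0
    for p, q in zip(p_probs, q_probs):
        p = min(max(p, EPS), 1.0 - EPS)
        q = min(max(q, EPS), 1.0 - EPS)
        total += p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total / len(p_probs)


def semi_supervised_loss(labeled_batch, unlabeled_batch, consistency_weight=1.0):
    """Supervised cross-entropy plus a weighted consistency (KL) term.

    labeled_batch: list of (targets, predicted_probs) pairs.
    unlabeled_batch: list of (probs_on_original, probs_on_augmented) pairs.
    """
    supervised = sum(binary_cross_entropy(t, p) for t, p in labeled_batch) / len(labeled_batch)
    consistency = sum(bernoulli_kl(o, a) for o, a in unlabeled_batch) / len(unlabeled_batch)
    return supervised + consistency_weight * consistency


# Hypothetical model outputs for two labeled and one unlabeled example.
labeled = [([1, 0, 1], [0.9, 0.2, 0.7]), ([0, 0, 1], [0.1, 0.3, 0.8])]
unlabeled = [([0.6, 0.1, 0.9], [0.5, 0.2, 0.8])]
loss = semi_supervised_loss(labeled, unlabeled, consistency_weight=1.0)
```

The consistency term is zero when the predictions on an unlabeled text and on its augmented version agree， so it only penalizes predictions that change under augmentation.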

72 Speech Emotion Recognition with Multi-task Learning

LI Yunfeng , YAN Zulong , GAO Tian , FANG Xin , ZOU Liang

2024, 39(2):424-432. DOI: 10.16337/j.1004-9037.2024.02.015

[Abstract](922) [HTML](820) [PDF 1.60 M](1136)

Abstract:
In recent speech emotion recognition research， deep learning models are used to identify emotion from speech signals. However， traditional single-task learning-based models do not pay enough attention to the acoustic emotional information in speech， resulting in low emotion recognition accuracy. In view of this， this paper proposes a multi-task learning， end-to-end speech emotion recognition network to mine acoustic emotion in speech and improve the accuracy of emotion recognition. To avoid the loss of information caused by using frequency domain features， this paper adopts Wav2vec2.0 as the backbone network of the model to extract the acoustic and semantic features of speech， and an attention mechanism is used to integrate the two kinds of features as self-supervised features. To make full use of the acoustic emotional information in speech， emotion-related phoneme recognition is used as an auxiliary task， and a multi-task learning model mines the acoustic emotion in the self-supervised features. Experimental results on the public dataset IEMOCAP show that the proposed multi-task learning model achieves a weighted accuracy of 76.0% and an unweighted accuracy of 76.9%， a significant improvement over the traditional single-task learning model. Meanwhile， ablation experiments verify the effectiveness of the auxiliary task and the self-supervised network fine-tuning strategy.

73 Recent Advancement in Multi-granulation Three-Way Decisions

Qian Jin , Zheng Mingchen , Zhou Chuanpeng , Liu Caihui , Yue Xiaodong

2024, 39(2):361-375. DOI: 10.16337/j.1004-9037.2024.02.009

[Abstract](981) [HTML](651) [PDF 2.79 M](1482)

Abstract:
Multi-granulation three-way decisions use three-way decision theory to analyze and process complex problems from multiple views and levels， and are gradually becoming an efficient and reliable intelligent decision-making method. This paper reviews the research on multi-granulation three-way decisions. It introduces the multi-granulation fusion strategy， multiview three-way decisions， and multilevel three-way decisions， discusses multi-granulation three-way decisions from both qualitative and quantitative perspectives， illustrates the relationships between different multi-granulation three-way decision models， and points out several open problems in existing work. The results can serve as a reference for further research in this field.

74 Audio Adversarial Examples Generation Method Based on Self-attention Mechanism

LI Zhuhai , Guo Wu

2024, 39(2):416-423. DOI: 10.16337/j.1004-9037.2024.02.014

[Abstract](744) [HTML](815) [PDF 1.40 M](1037)

Abstract:
With the wide spread of personal speech data and the development of automatic speaker recognition algorithms， personal privacy is at high risk. Audio adversarial examples can protect personal voiceprint features by disabling automatic speaker recognition algorithms while leaving the subjective perception of the human ear unchanged. We improve the typical adversarial attack algorithm FoolHD with a multi-head self-attention mechanism and call the result FoolHD-MHSA. First， convolutional neural networks are introduced as the encoder to extract adversarial perturbation spectrograms. Second， the self-attention mechanism extracts correlation features between different parts of the perturbation spectrogram from a global perspective， focusing the network on important information and suppressing useless information. Finally， a decoder steganographically embeds the processed perturbation spectrogram into the input spectrogram to obtain the adversarial example spectrogram. Experimental results show that FoolHD-MHSA generates adversarial examples with a higher attack success rate and a higher average PESQ score than FoolHD.

75 Distributed Sparse Soft Large Margin Clustering

Xie Yunxuan , CHEN Songcan

2024, 39(2):376-384. DOI: 10.16337/j.1004-9037.2024.02.010

[Abstract](434) [HTML](411) [PDF 712.48 K](728)

Abstract:
Soft large margin clustering （SLMC） has been shown to achieve better clustering performance and interpretability than other algorithms， such as K-Means. However， when data are large-scale and stored in a distributed manner， computing the kernel matrix involved incurs a large time cost. One effective strategy to reduce this cost is to approximate the kernel function with the random Fourier feature transform， but the feature dimension on which the approximation accuracy depends is often too high， which implies a risk of overfitting. This paper embeds sparsity into kernel SLMC and combines the alternating direction method of multipliers （ADMM） with SLMC. The result is a distributed sparse soft large margin clustering algorithm （DS-SLMC） that overcomes the scalability problem and achieves better interpretability through sparsity.
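
The random Fourier feature transform mentioned in this abstract can be sketched as follows. This is a generic illustration for the Gaussian （RBF） kernel exp(-gamma * ||x - y||^2)， not the DS-SLMC algorithm itself； the function name `make_rff_map` and all parameter values are assumptions chosen for the example.

```python
import math
import random


def make_rff_map(input_dim, num_features, gamma, seed=0):
    """Build a random feature map z such that z(x) . z(y) approximates
    the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # The spectral density of this kernel is Gaussian with variance 2 * gamma.
    scale = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, scale) for _ in range(input_dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return feature_map


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


gamma = 1.0
feature_map = make_rff_map(input_dim=2, num_features=4000, gamma=gamma)
x, y = [0.2, -0.1], [0.5, 0.4]
approx_kernel = dot(feature_map(x), feature_map(y))
exact_kernel = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

The approximation error shrinks roughly as one over the square root of `num_features`， which is why an accurate approximation needs a high feature dimension， the issue the abstract addresses with sparsity.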

76 Domain-Specific Foundation-Model Customization: Theoretical Foundation and Key Technology

Chen Haolong , Chen Hanzhi , Han Kaifeng , Zhu Guangxu , Zhao Yichen , Du Ying

2024, 39(3):524-546. DOI: 10.16337/j.1004-9037.2024.03.003

[Abstract](3117) [HTML](2892) [PDF 2.11 M](3183)

Abstract:
As ChatGPT and other foundation-model-based products demonstrate powerful general performance， both academia and industry are actively exploring how to adapt these models to specific industries and application scenarios， a process known as the customization of domain-specific foundation models. However， the existing general-purpose foundation models may not fully accommodate the patterns of domain-specific data or fail to capture the unique needs of the field. Therefore， this paper aims to discuss the methodology for customizing domain-specific foundation models， including the definition and types of foundation models， the description of their general architecture， the theoretical foundations behind the effectiveness of foundation models， and several feasible methods for constructing domain-specific foundation models. By presenting this content， we hope to provide guidance and reference for researchers and practitioners in the customization of domain-specific foundation models.

77 Time Series Imputation Method Combining Tensor Completion and Recurrent Neural Network

HE Jun , LAI Zhaoyuan , SHI Kan

2024, 39(3):598-608. DOI: 10.16337/j.1004-9037.2024.03.008

[Abstract](818) [HTML](779) [PDF 1.48 M](1048)

Abstract:
Existing imputation methods are roughly divided into statistical methods and deep learning methods. Statistical methods can only capture linear temporal relationships， so they cannot accurately model non-linear time series data. Deep learning imputation methods usually do not consider the correlation between different time series. To solve these problems， a new model combining tensor completion and a recurrent neural network is proposed. Firstly， the multivariate time series are modeled as a tensor， and the correlation between different time series is captured by low-rank tensor completion. Secondly， a time-based dynamic weight is proposed to fuse the tensor completion results with the predictions of the recurrent neural network， avoiding the accumulation of prediction error caused by consecutive missing values. The proposed method is evaluated on several real time series datasets， and the results show that it outperforms existing models in terms of imputation accuracy， which helps improve classification and regression accuracy.

78 Blind Face Restoration Algorithm Based on Feature Fusion and Embedding

HUO Zhiyong , HU Shanlin

2024, 39(3):609-616. DOI: 10.16337/j.1004-9037.2024.03.009

[Abstract](694) [HTML](702) [PDF 2.70 M](846)

Abstract:
Blind face restoration aims to recover a high-quality face from an input with unknown degradation. Because the problem is ill-posed， restored images often have missing local texture or mismatched facial components. Therefore， a degraded blind face restoration algorithm based on feature fusion and embedding optimization is proposed. The algorithm extracts face prior features from degraded inputs， uses multi-headed cross-attention for feature interaction fusion and global context modeling， embeds the facial priors into the latent space of a pre-trained generative network， and optimizes the result based on loss functions. In this way， local textures lost or damaged by degradation are repaired and a balance between realism and fidelity is achieved. Numerical experiments on three real degraded datasets show that the proposed algorithm outperforms existing methods in terms of objective metrics and subjective quality， and ablation experiments validate its effectiveness.

79 Three-Way Decision Model Based on Intuitionistic Fuzzy Similarity Relation

LYU Mingming , XUE Zhan’ao , YANG Mengli , XIN Xianwei , SUN Lin

2024, 39(3):617-633. DOI: 10.16337/j.1004-9037.2024.03.010

[Abstract](579) [HTML](516) [PDF 2.38 M](780)

Abstract:
Existing intuitionistic fuzzy similarity relations can make the similarity degree between objects in an intuitionistic fuzzy set too concentrated or the dissimilarity degree too high， leading to unreasonable classification results. Moreover， when an intuitionistic fuzzy similarity relation is constructed， the similarity degree and dissimilarity degree between objects are easily affected by unimportant attributes. Therefore， a three-way decision model based on intuitionistic fuzzy similarity relation is proposed according to intuitionistic fuzzy sets and possibility theory. Firstly， the definitions of possibility measure and necessity measure are given. Combining them with the Hausdorff measure， a distance formula is constructed and its properties are proved. The similarity degree and dissimilarity degree between objects in intuitionistic fuzzy sets are defined， and a new intuitionistic fuzzy similarity relation is constructed. Then， the （λ₁，λ₂）-cut set under the intuitionistic fuzzy similarity relation and the similar class under the intuitionistic fuzzy （λ₁，λ₂）-cut set are defined， and the positive， negative and boundary regions of the target set are obtained. Finally， the rationality and effectiveness of the proposed model are verified through UCI data sets and examples.

80 Enhanced Growing Neural Gas Based Many-Objective Evolutionary Algorithm

Xue Ming , Wang Peng , Tong Xiangrong

2024, 39(3):634-648. DOI: 10.16337/j.1004-9037.2024.03.011

[Abstract](643) [HTML](568) [PDF 1.04 M](718)

Abstract:
As research on many-objective optimization deepens， problems with irregular Pareto frontiers pose challenges to existing methods because of their complex frontier distributions. To address this issue， a many-objective evolutionary algorithm based on the enhanced growing neural gas is proposed. This algorithm combines the learning characteristics of growing neural networks with the optimization characteristics of binary quality indicators to enhance the convergence pressure of the population at the irregular Pareto frontier. Firstly， an enhanced growing neural gas network is designed， which uses the topological information of the Pareto optimal frontier to guide the population to converge towards that frontier. Then， a joint metric is proposed to comprehensively evaluate the convergence of individuals in conjunction with Pareto dominance information. Finally， an adaptive reference point based environment selection is proposed to enhance the diversity of the population in high-dimensional objective space. To verify its performance， the proposed algorithm is compared with five advanced many-objective evolutionary algorithms on 44 irregular many-objective optimization problems from the DTLZ and WFG benchmark problem sets. Experimental results show that the overall performance of the proposed algorithm is superior to that of the comparison algorithms.

81 Model Pruning Algorithm Based on Sparse Optimization and Nesterov Momentum Strategy

ZHOU Qiang , CHEN Jun , BAO Lei , TAO Qing

2024, 39(3):659-667. DOI: 10.16337/j.1004-9037.2024.03.013

[Abstract](681) [HTML](539) [PDF 1.51 M](725)

Abstract:
With the rapid development of deep learning， the number of parameters and the computational complexity of models have exploded， which poses challenges for deployment on mobile terminals. Model pruning has become key to the practical application of deep learning models. At present， regularization-based pruning methods usually adopt L2 regularization combined with a magnitude-based importance criterion. This is an empirical approach that lacks a theoretical basis， and its accuracy is difficult to guarantee. Inspired by the proximal gradient method for solving sparse optimization problems， we propose a Prox-NAG optimization method that can directly generate sparse solutions on deep neural networks， and design a corresponding iterative pruning algorithm. This method is based on L1 regularization and uses Nesterov momentum to solve the optimization problem. It removes the dependence of the original regularization pruning method on L2 regularization and magnitude-based criteria， and is a natural extension of sparse optimization from traditional machine learning to deep learning. Pruning experiments are conducted on ResNet series models on the CIFAR10 dataset， and the results show that the Prox-NAG pruning algorithm performs better than the original pruning algorithm.
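
To illustrate how a proximal gradient step with L1 regularization and Nesterov momentum produces exact zeros， here is a minimal sketch on a toy quadratic objective. It is a generic textbook-style scheme， not the paper's Prox-NAG algorithm； the function names， the toy objective and all hyperparameter values are assumptions.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def prox_nesterov(grad_fn, weights, step, l1_strength, momentum=0.9, iterations=200):
    """Proximal gradient descent with a Nesterov look-ahead point."""
    previous = list(weights)
    current = list(weights)
    for _ in range(iterations):
        # Evaluate the gradient at the extrapolated (look-ahead) point.
        lookahead = [c + momentum * (c - p) for c, p in zip(current, previous)]
        gradient = grad_fn(lookahead)
        # Gradient step on the smooth part, then the L1 proximal step.
        updated = [soft_threshold(v - step * g, step * l1_strength)
                   for v, g in zip(lookahead, gradient)]
        previous, current = current, updated
    return current


# Toy smooth loss 0.5 * ||w - target||^2, whose gradient is w - target.
target = [2.0, -0.3, 0.05, -1.5]


def grad_fn(w):
    return [w_i - t_i for w_i, t_i in zip(w, target)]


pruned = prox_nesterov(grad_fn, [0.0] * 4, step=0.5, l1_strength=0.5)
```

For this toy problem the minimizer is the soft-thresholded target， so the two small-magnitude weights become exactly zero and can be pruned without a separate magnitude threshold.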

82 Task-Oriented Dialogue Understanding with Explicit Knowledge Injection

LI Shuaipeng , WANG Pinghui , SUN Wangchun , YANG Yang , DU Youtian , MA Xiaoke , DU Yongjie

2024, 39(3):668-677. DOI: 10.16337/j.1004-9037.2024.03.014

[Abstract](591) [HTML](543) [PDF 1.52 M](752)

Abstract:
Dialogue understanding aims to detect user intent given dialogue history. Due to the lack of domain knowledge， traditional dialogue understanding models fail to understand domain-specific entities. Knowledge-enhanced approaches are proposed to improve model performance with structured knowledge， where the knowledge is implicitly injected with knowledge embeddings. However， knowledge embeddings have to be updated with the update of the knowledge base， which brings extra costs. Besides， existing methods suffer from the knowledge noise and incorporate the context-irrelevant knowledge that changes the semantics of the utterance. To address the above issues， this paper proposes a multi-task learning dialogue understanding model with explicit knowledge injection（K-CAM）. K-CAM injects knowledge into the model using natural language knowledge without retraining the model for updated knowledge embeddings. A multi-task learning objective of joint intent detection， slot filling， and relevant knowledge recognition is further proposed to resist the knowledge noise problem. Extensive experimental results show that the proposed model K-CAM achieves a significant improvement of 4.87% and 2.09% in macro F₁ on the intent detection and slot filling tasks compared to other baselines.

83 Map-Constrained Trajectory Recovery Mechanism Based on Transformer

MEI Yusheng , ZHAO Zhuofeng

2024, 39(3):678-688. DOI: 10.16337/j.1004-9037.2024.03.015

[Abstract](931) [HTML](805) [PDF 1.46 M](932)

Abstract:
Trajectory reconstruction is a research field concerned with restoring trajectories from low-sampling-rate trajectory data. In recent years， some work has used deep learning models such as Seq2Seq to improve the efficiency and accuracy of trajectory recovery. However， most existing work ignores the long-distance dependencies between trajectory points， resulting in poor reconstruction accuracy. Therefore， this paper proposes a Transformer-based trajectory recovery model called ZTrajRec （Zero-based trajectory recovery）， which captures the long-distance dependencies between trajectory points through a Transformer encoder and uses the attention mechanism to take into account the similarity between the current trajectory and historical trajectories， reconstructing the trajectory directly on the road network. Experimental results show that on the real Beijing taxi dataset， ZTrajRec improves the recall rate by 3% to 4% compared with the benchmark models. Finally， the results are visually analyzed to demonstrate their plausibility.

84 Few-Shot Learning Method Based on Class Enhancement and Multi-scale Adaptation

Dong Chijing , Zhang Sunjie , Ren Han

2024, 39(3):689-698. DOI: 10.16337/j.1004-9037.2024.03.016

[Abstract](596) [HTML](561) [PDF 1.55 M](845)

Abstract:
To solve the problems of insufficient feature information extraction and the difficulty of accurately capturing salient local feature information in few-shot learning， a method combining class enhancement and multi-scale adaptation is proposed. Firstly， class enhancement is performed on the image at the feature level， and rich semantic structures are encoded by associating each activation of the feature map with its neighborhood， making the extracted intra-class features more salient and more useful for the current classification task. Secondly， low-level representations of image features at different scales are extracted through multi-scale feature generation. Finally， the semantic correlation matrix at each scale is weighted and the similarity elements are maximized to calculate the semantic similarity between the query image and the images of each support set category. After the multi-scale information is fused， the target images are classified. In the 5-way 1-shot and 5-way 5-shot settings， the mean average precision （mAP） of this method on the miniImageNet dataset is 56.83% and 75.76%， respectively， and it achieves 79.33% and 93.92% on the Stanford Cars benchmark and 66.33% and 85.78% on the CUB-200-2011 benchmark， two commonly used fine-grained image classification datasets. These results are superior to the best results of existing methods.

85 Gender Opposition Speech Recognition Method of Fusing Multi-feature and Emoji Sentiment Lexicon

MA Zichen , ZHANG Shunxiang , LIU Yunduo , ZHU Guangli

2024, 39(3):699-709. DOI: 10.16337/j.1004-9037.2024.03.017

[Abstract](702) [HTML](662) [PDF 2.24 M](810)

Abstract:
To identify relevant extreme speech， a gender opposition speech recognition method fusing multiple features and an emoji sentiment lexicon is proposed. Firstly， BERT （Bidirectional encoder representations from transformers） is used to extract the character features of the input texts， and Word2Vec is used to extract their Wubi， Zhengma and Pinyin features. Then， these features are fused and fed into a Bi-GRU （Bi-directional gated recurrent unit） network to obtain deeper semantic information. Finally， the sentiment polarities are calculated with a fully connected layer and the SoftMax function， combined with the emoji sentiment lexicon， to determine whether the input texts are related to gender opposition. Experiments on a self-collected Chinese gender opposition dataset show that， compared with the method without multiple features and the emoji sentiment lexicon， the proposed model improves the F₁ value by 5.19%. In addition， the generalization of the proposed method is verified by experiments on the public Chinese sentiment analysis dataset Weibo_senti_100k.

86 Epilepsy Identification Method Based on Multi-modal Multi-grained Fusion Network

Qi Xiaoyu , Ding Weiping , Ju Hengrong , Cheng Xueyun , Huang Jiashuang

2024, 39(3):710-723. DOI: 10.16337/j.1004-9037.2024.03.018

[Abstract](1182) [HTML](605) [PDF 2.10 M](1016)

Abstract:
Structural brain network （SC） and functional brain network （FC） can reflect the changes in brain structure information caused by epilepsy from different perspectives. Currently， the fusion of two types of brain network information for auxiliary diagnosis of epilepsy has become one of the important studies in the field. However， common fusion models only fuse the information of the two types of brain networks at a single granularity， ignoring the multi-grained attribute of brain networks. This paper proposes an epilepsy identification method based on multi-modal multi-grained fusion network （MMFN）， which integrates the features of the multi-modal brain network from global and local granularities to take full advantage of multi-modal brain network information. Specifically， at the local granularity， two modules （i.e.， edge features fusion module and node features fusion module） are designed to reconstruct the feature maps of edge layer and node layer of two types of brain network， so that these two modes can learn features interactively. At the global granularity， a multimodal decomposition bilinear pooling module is designed to learn the joint representation of the two types of brain networks. Compared to current methods， experimental results show that the proposed method can improve the accuracy of epilepsy recognition significantly and assist doctors in the diagnosis of epilepsy.

87 Research Progress in Evaluation Techniques for Large Language Models

ZHAO Ruizhuo , QU Zichang , CHEN Guoying , WANG Kunlong , XU Zhewei , KE Wenjun , WANG Peng

2024, 39(3):502-523. DOI: 10.16337/j.1004-9037.2024.03.002

[Abstract](2321) [HTML](1025) [PDF 1.54 M](3540)

Abstract:
With the widespread application of large language models， the evaluation of large language models has become crucial. In addition to the performance of large language models in downstream tasks， some potential risks should also be evaluated， such as the possibility that large language models may violate human values and be induced by malicious input to trigger security issues. This paper analyzes the commonalities and differences between traditional software， deep learning systems， and large model systems. It summarizes the existing work from the dimensions of functional evaluation， performance evaluation， alignment evaluation， and security evaluation of large language models， and introduces the evaluation criteria for large models. Finally， based on existing research and potential opportunities and challenges， the direction and development prospects of large language models evaluation technology are discussed.

88 Coordination Framework for Collaborative Disposal of Multi-intelligent Agents Based on Large Language Models

WU Xiaoning , LI Ruixin , WANG Lang , LIU Wenjie , WANG Hongwei , ZHU Xinli , SONG Jiangfan , YUAN Meng

2024, 39(3):559-576. DOI: 10.16337/j.1004-9037.2024.03.005

[Abstract](1043) [HTML](1979) [PDF 3.29 M](1108)

Abstract:
Addressing the decision-making conundrum faced by commanders in response to major sudden incidents， this paper proposes a coordination framework for collaborative disposal of multi-intelligent agents based on large language models. The framework optimizes collective decision-making efficiency and action planning through strategies such as agent role generation， multi-level Monte-Carlo tree and interactive prompt learning. It introduces hierarchical mechanisms and workflow management concepts， enhancing collaboration efficiency through the reward function shared among agents. A transparent and implicit communication model ensures node status consistency. Experimental results demonstrate that the framework performs well under various scenarios， significantly improving reaction speed and response efficiency compared to traditional task allocation methods.

89 “Aiwu Large Model+”: Development and Empirical Study of Military Large Model System

CUI Xiaolong , GAO Zhiqiang , JI Weitong , SHEN Jianan , ZHANG Min , QIU Xinyuan

2024, 39(3):588-597. DOI: 10.16337/j.1004-9037.2024.03.007

[Abstract](2948) [HTML](2364) [PDF 1.90 M](2866)

Abstract:
Intelligent command is an important direction for new command and control theories， and large language models are important support for realizing intelligent command capabilities such as intelligent interaction， task planning， and auxiliary decision-making. Combining theory and practice， we outline the military capability requirements of the large model and design a large language model application framework for intelligent command. Then， the system architecture， information process， and collaborative architecture of the “Aiwu large model+” system are proposed， and the key technologies for engineering implementation are presented. Empirical cases of intelligent command are used to verify the multimodal interaction and military language understanding of the system. The system can be extended to the collaboration and command and control of manned/unmanned platforms， which provides a reference for research on and implementation of major national defense and military special projects and for intelligent command in the future.

90 Knowledge Distillation of Large Language Models Based on Chain of Thought

Li Ronghan , Pu Rongcheng , Shen Jianan , Li Dongdong , Miao Qiguang

2024, 39(3):547-558. DOI: 10.16337/j.1004-9037.2024.03.004

[Abstract](1427) [HTML](1747) [PDF 1.65 M](1300)

Abstract:
The chain of thought （CoT） prompts enable large language models to process complex tasks according to specific reasoning steps， allowing them to demonstrate stronger capabilities in common sense reasoning， mathematical logic reasoning， and interpretability. However， the main drawback of the CoT approach lies in its reliance on massive language models， which typically have billions of parameters and face challenges in large-scale deployment. To address this issue， this paper proposes a large model knowledge distillation method based on the CoT， aiming to fully leverage the thinking and reasoning capabilities of large language models. Through knowledge distillation techniques， the main goal is to guide smaller models in solving complex tasks. This study adopts a large model as the teacher model and a small model as the student model， fine-tuning the student model by acquiring reasoning data from the teacher model. Through a series of carefully designed methods， such as changing data generation methods， clustering-based sampling of question-answer examples， heuristic correction of examples， and adaptive generation of answers， this study makes the generation process of the teacher model more efficient， resulting in higher-quality and larger quantities of reasoning data. This enables better fine-tuning of the student model， allowing it to acquire strong reasoning capabilities and achieve efficient knowledge distillation. The framework of this study aims to establish an effective knowledge transfer mechanism， allowing the deep thinking of large models to effectively guide smaller models， providing more intelligent and efficient solutions for solving complex tasks. Through this approach， we hope to overcome the challenges of deploying large models and promote the application and advancement of language models in the real world.

91 Fine-Tuning Method for Pre-trained Model RoBERTa Based on Federated Split Learning and Low-Rank Adaptation

XIE Sijing , WEN Dingzhu

2024, 39(3):577-587. DOI: 10.16337/j.1004-9037.2024.03.006

[Abstract](988) [HTML](857) [PDF 1.26 M](940)

Abstract:
Fine-tuned large language models （LLMs） perform exceptionally well in various tasks， but centralized training poses user privacy leakage risks. Federated learning （FL） mitigates data sharing issues through local training， yet the large parameter size of LLMs challenges resource-constrained devices and communication bandwidth， making deployment in edge networks difficult. Considering split learning （SL）， federated split learning can effectively address these issues. Given the more pronounced influence of deep-layer model weights and the discovery that training certain layers yields slightly lower accuracy compared to training the entire model， we opt to split the model based on Transformer layers. Additionally， utilizing low-rank adaptation （LoRA） can further reduce resource overhead and enhance security. Therefore， at each device， we only perform LoRA and training on the final few layers. These adapted layers are then uploaded to the server for aggregation. From the perspective of cost reduction and ensuring model performance， we propose a fine-tuning method for the pre-trained model RoBERTa based on federated split learning and LoRA. By jointly optimizing the computational frequency of edge devices and the rank of model fine-tuning， we maximize the rank to improve model accuracy under resource constraints. Simulation results indicate that only training the last three layers of the LLMs can improve model accuracy within a certain range （1—32） by increasing the rank. Additionally， increasing the per-round delay and the energy threshold of devices can further enhance model accuracy.
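
As a rough illustration of why the rank matters for resource overhead, the sketch below shows the generic low-rank update, in which a frozen weight matrix W is adapted as W + (alpha / r) · B·A. It is a minimal pure-Python sketch of the general LoRA idea, not the authors' implementation; the function names, the scaling factor `alpha`, and the 768 × 768 layer size are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen weight, d_out x d_in
    a: trainable factor, r x d_in
    b: trainable factor, d_out x r
    """
    r = len(a)                      # rank = number of rows of A
    delta = matmul(b, a)            # low-rank update, d_out x d_in
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Return (LoRA parameter count, full fine-tuning parameter count)."""
    return r * (d_in + d_out), d_out * d_in


# Hypothetical 768 x 768 projection with rank 8: only the two thin
# factors are trained and uploaded, not the full matrix.
lora_count, full_count = trainable_params(768, 768, 8)
print(lora_count, full_count)
```

Because only the two thin factors are trained, the number of parameters a device updates and uploads grows linearly with the rank, which is the quantity the abstract optimizes jointly with the device computation frequency.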

92 Artificial Intelligence-Assisted Magnetic Resonance Imaging in Assessment of Neoadjuvant Chemotherapy for Breast Cancer: A Review

LIU Kaiwen , JIN Yingying , WANG Shouju

2024, 39(4):794-812. DOI: 10.16337/j.1004-9037.2024.04.003

[Abstract](1347) [HTML](1266) [PDF 2.75 M](1232)

Abstract:
Neoadjuvant chemotherapy has become a standard treatment strategy for breast cancer， and magnetic resonance imaging （MRI） is the preferred imaging method for assessing the response of breast cancer to neoadjuvant chemotherapy. Although MRI can provide detailed information of tumor， including location， size， and microenvironment， the precise assessment of neoadjuvant chemotherapy of breast cancer suffers from the diverse changes in tumors present in MRI images. Artificial intelligence methods based on machine learning and deep learning have demonstrated the ability to recognize complex patterns in MRI data. Through clinical radiologic feature analysis， radiomics analysis， and habitat analysis， artificial intelligence technology has significantly enhanced the performance and efficiency of assessments for breast cancer neoadjuvant chemotherapy， aiding in the realization of personalized treatment strategies. This paper introduces the MRI data and performance indicators in assessing breast cancer neoadjuvant chemotherapy， summarizes the progress of artificial intelligence applications in this field， and discusses the current challenges and potential future research directions for artificial intelligence technology in practical applications.

93 A Double-Decoding Model for Polyp Segmentation Based on Feature Fusion

WU Gang , QUAN Haiyan

2024, 39(4):954-966. DOI: 10.16337/j.1004-9037.2024.04.015

[Abstract](605) [HTML](682) [PDF 2.84 M](831)

Abstract:
In the early screening of colorectal cancer， diagnostic efficiency and accuracy can be improved by automated polyp detection and segmentation of colonoscopy images. Due to the complexity of the internal environment of the intestines and the limitation of image quality， automated polyp segmentation is still a challenging problem. Aiming at this problem， this paper proposes a dual-decoding model for polyp segmentation using Transformer and dilated convolution to achieve feature fusion （FTDC-Net）. ResNet50 is used as an encoder to better extract deep image features. The Transformer coding module is used， which has a self-attention mechanism to capture long distance dependencies between the inputs， and different dilated convolutions are used in the model to expand the receptive field of the model to allow the model to capture a larger range of information in the colonoscopy image. The decoding part of the network model in this paper uses a dual-decoding structure， including an autoencoder branch that reconstructs the inputs and a coding branch for segmenting the results. The output of the autoencoder is used in the model to generate an attention map as an attention mechanism. This map will be used to guide the segmentation results. Experimental validation is carried out on the Kvasir-SEG and ETIS-LARIBPOLYPDB standard datasets， and experimental results show that FTDC-Net can effectively segment colon polyps， and achieves a high level of improvement in all evaluation metrics compared to the current mainstream polyp segmentation models.
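
To make concrete how dilation widens the area a convolution can see, the following sketch computes the effective kernel size k + (k − 1)(d − 1) of a dilated kernel and the receptive field of a stack of stride-1 layers. The layer configurations are illustrative assumptions; the abstract does not state which kernel sizes or dilation rates FTDC-Net uses.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) tuples, applied in order.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


# Three plain 3x3 layers versus three 3x3 layers with dilations 1, 2, 4
# (hypothetical rates chosen only for illustration).
print(stacked_receptive_field([(3, 1), (3, 1), (3, 1)]))
print(stacked_receptive_field([(3, 1), (3, 2), (3, 4)]))
```

With the same number of weights per layer, the dilated stack covers roughly twice the extent of the plain one along each axis.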

94 Graph Structure Learning Method for Multi-site Autism Diagnosis Based on Multi-view Low-Rank Subspace

HUANG Jianhui , MA Di , ZHANG Li

2024, 39(4):984-995. DOI: 10.16337/j.1004-9037.2024.04.017

[Abstract](442) [HTML](446) [PDF 2.19 M](584)

Abstract:
Autism spectrum disorder （ASD） stands as one of the most prevalent and genetically inherited neurodevelopmental disorders， characterized by a multitude of clinical symptoms， notably featuring social communication deficits. Effective identification of biomarkers holds paramount significance in facilitating early interventions for ASD. Many current methods leverage multi-site imaging data to augment sample size， thereby enhancing diagnostic accuracy. However， the heterogeneity of data across multiple sites， resulting from variations in imaging devices， imaging parameters， and data processing workflows， is frequently overlooked. To overcome the above problem， this paper proposes a graph structure learning method for multi-site autism diagnosis based on multi-view low-rank subspace （MVLL-GSL）. Firstly， the multiple views of brain network are constructed for each sample， encompassing diverse topological information. Subsequently， samples from different classes are projected into their respective low-rank subspaces to mitigate the impact of data heterogeneity. Finally， the integration of graph structure learning with multi-task graph embedding learning， incorporating prior subnetworks and multi-view consistency regularization constraints， aims to extract more discriminative and coherent features from multi-view low-rank subspaces. The autism public ABIDE （Autism brain imaging data exchange） database is used to verify the proposed method. Experimental results show that the MVLL-GSL method improves the performance of ASD diagnosis and explains the association of different prior sub-networks with ASD pathogenesis.

95 Chinese Named Entity Recognition Based on Prompt Learning and Multi-level Feature Fusion

Wang Xin , Wei Chuyuan , Zhang Lei , Wan Shanshan

2024, 39(4):1020-1032. DOI: 10.16337/j.1004-9037.2024.04.020

[Abstract](578) [HTML](486) [PDF 1.46 M](599)

Abstract:
The current named entity recognition task based on the pre-training-fine-tuning model has a gap between pre-training and fine-tuning， which makes it difficult to effectively model the relationship between entities and contexts， and the current Chinese named entity recognition methods cannot obtain sufficient character or word meanings. To address the above problems， this paper proposes a named entity recognition method based on prompt learning and incorporating multi-level feature information. Firstly， the prompt text is constructed based on the prompt learning mechanism， and then the character， word and entity-level feature information of the input text is spliced with it， which is taken as the input of the pre-trained model to effectively capture the semantic information between the contexts， narrow the gap between the pre-trained model and the downstream task， and improve the perceptive ability of the model for named entity recognition. The proposed method makes full use of prior knowledge to increase the learning ability of the model and improve the effectiveness of named entity recognition in the complex and variable semantic environment of Chinese. The F₁ values reach 97.09%， 96.68%， 83.44%， 97.48% and 76.05% on the People’s Daily， MSRA， Weibo， Resume and CMeEE datasets， respectively. Experimental results show that the proposed method is generally better than the current mainstream Chinese named entity recognition methods.

96 Rolling Bearing Fault Detection Based on Few-Shot Learning

Cao Yingying , Huan Zhan , Chen Zhen , Chen Ying

2024, 39(4):1033-1042. DOI: 10.16337/j.1004-9037.2024.04.021

[Abstract](669) [HTML](819) [PDF 1.57 M](780)

Abstract:
Bearing fault types are complex， and it is difficult to obtain enough training samples for each fault type under different working conditions. Convolutional neural network with training interference （TICNN） with wide convolutional kernel is introduced as the subnetwork of the Siamese network used to extract features， reducing the impact of industrial environment noise. Siamese network is a structure commonly used for few-shot learning. By inputting the same or different categories of samples for training， the mapping relationship between different attribute samples and features is learned， and the similarity between samples is used as the measurement index. The test sample is classified by finding the class of the nearest neighbor. Experimental results on the standard Case Western Reserve University （CWRU） bearing fault diagnosis benchmark dataset show that， in the case of limited data， the proposed model shows better results in fault diagnosis. The performance of the proposed few-shot learning model exceeds the baseline model with a reasonable noise level when testing with the least training data in different noise environments， and the accuracy of fault diagnosis reaches 94.41%. When evaluating on test sets with new fault types or new working conditions， the proposed model also performs well.
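
The nearest-neighbor classification step described above can be sketched as follows. The sketch assumes the embeddings have already been produced by some trained feature extractor (the Siamese subnetwork itself is not modeled here), uses Euclidean distance as the similarity measure, and uses made-up fault labels; none of these details are taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify_nearest(query, support):
    """Return the label of the support sample closest to the query.

    support: list of (embedding, label) pairs, one or more per class.
    """
    best_label, best_dist = None, float("inf")
    for embedding, label in support:
        dist = euclidean(query, embedding)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 2-D embeddings of a few labeled support samples.
support_set = [
    ([0.0, 0.0], "normal"),
    ([1.0, 1.0], "inner_race"),
    ([5.0, 5.0], "ball"),
]
print(classify_nearest([0.9, 1.2], support_set))
```

Because classification only compares a query against a handful of labeled support samples, new fault types can in principle be handled by adding support samples rather than retraining a classifier head.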

97 Image Inpainting Based on Perceptual Inference and External Spatial Prior Features

WU Peng , ZHANG Sunjie , WANG Yongxiong , CHEN Yuanfeng , QIN Haiwang

2024, 39(4):933-943. DOI: 10.16337/j.1004-9037.2024.04.013

[Abstract](670) [HTML](682) [PDF 4.41 M](742)

Abstract:
Image inpainting based on deep learning has made a lot of remarkable progress. However， when there is a large area mask， due to the lack of reasonable prior information guidance， the repair results often exhibit artifacts and blurred textures. Therefore， we propose an image inpainting algorithm that combines prior features with image predictive filtering. It consists of two branches： an image filtering kernel prediction branch and a feature inference and image filtering branch. The features are extracted from the decoder part of the image filtering kernel prediction branch. The multi-scale external spatial feature fusion is used to reconstruct the mask region features， and the decoding stage is passed to another branch as a prior feature to provide richer semantic information for image inpainting. Then， a spatial feature-aware inference block is introduced in the feature inference and image filtering branch， which can filter out the distracting features and capture the informative long-distance image context for inference. Finally， the image prediction filter kernel is used to filter and eliminate artifacts. Compared with other repair networks on CelebA and Places2 datasets， the superiority of the method in repair quality is proved.

98 Graph Learning-Based Methods for Generating Missing Brain Networks and Multi-modal Fusion Diagnosis

GONG Rongfang , HUANG Linya , ZHU Qi , LI Shengrong

2024, 39(4):843-862. DOI: 10.16337/j.1004-9037.2024.04.006

[Abstract](954) [HTML](1086) [PDF 6.06 M](863)

Abstract:
The multi-modal brain network， which integrates the brain structural and functional networks， can effectively extract the complementary information from different modalities， significantly improving the diagnostic accuracy of neurological diseases such as epilepsy. However， due to the long acquisition time and high acquisition cost of multi-modal data collection， it often faces the problem of modality missingness in practical applications， leading to decreased diagnostic accuracy and generalization ability of the model. To address the issue of modality data completely missing， we propose a method based on graph learning methods and cycle-consistent generative adversarial networks， named Graph-CycleGAN method. This method captures feature information between different brain regions in the brain network by introducing graph neural networks， such as graph convolutional neural networks and graph attention mechanisms. Besides， it strengthens the feature extraction ability of the generative framework and realizes the mutual generation of brain structural network and functional network. In addition， to address the lack of diagnostic result-based evaluations for the quality of generated data， this paper proposes a classification model that integrates real and generated brain networks. Experimental results on the epilepsy dataset indicate that the proposed Graph-CycleGAN method can effectively realize the generation of missing brain network by utilizing the existing modality information.

99 Image Captioning Method for Fusing Multi-temporal Dimensional Visual and Semantic Information

CHEN Shanxue , WANG Cheng

2024, 39(4):922-932. DOI: 10.16337/j.1004-9037.2024.04.012

[Abstract](600) [HTML](576) [PDF 1.01 M](600)

Abstract:
Traditional image captioning methods use only the visual and semantic information of the current moment to generate prediction words without considering the visual and semantic information of the past moments， which makes the output of the model relatively homogeneous in terms of temporal dimension. As a result， the generated captions lack accuracy. To address this problem， an image captioning method that fuses multi-temporal dimensional visual and semantic information is proposed， which effectively fuses visual and semantic information of past moments and designs a gating mechanism to dynamically select both kinds of information. Experimental validation on the MSCOCO dataset shows that the method is able to generate captions more accurately， and the performance is considerably improved in all evaluation metrics when compared with current state-of-the-art image captioning methods.

100 Fusion Fine-Grained Feature Encoding for Point Cloud Classification and Segmentation

TAO Zhiyong , DOU Miaosen , LI Heng , LIN Sen

2024, 39(4):944-953. DOI: 10.16337/j.1004-9037.2024.04.014

[Abstract](648) [HTML](570) [PDF 1.41 M](655)

Abstract:
Effective acquisition of point cloud features is the key to analyzing and processing 3D point cloud scenes. To address the problem that current deep learning methods have inadequate feature information extraction and difficulty in capturing deep semantic information， a fusion fine-grained feature encoding network is proposed to improve the accuracy of point cloud classification and segmentation tasks. First， the feature extraction module contains two sub-modules， one is the dilation graph convolution module， which can extract richer geometric information than graph convolution； and the other is the fine-grained feature encoding module， which can capture detailed features of local regions. Second， the two modules are dynamically fused by learnable parameters to efficiently learn the contextual information of each point. Finally， all the extracted features are summed and passed through the channel-wise affinity attention module， assisting the feature map to avoid redundancy by emphasizing its distinct channels. Point cloud classification experiments are performed on the ModelNet40 and ScanObjectNN datasets， and the overall accuracy is 93.3% and 80.0%， respectively. The mean intersection over union （mIoU） is 85.6% for part segmentation experiments on the ShapeNet Part dataset. Experimental results show that the proposed method performs better than the current mainstream methods.
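
For readers unfamiliar with the mIoU metric quoted above, the sketch below computes a generic mean intersection over union from per-point predicted and ground-truth labels. It is only an illustration of the metric's definition; part segmentation benchmarks differ in how they average (per instance or per category) and in how they treat absent classes, and the convention used here (skip classes absent from both prediction and ground truth) is an assumption.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target.

    pred, target: equal-length sequences of integer class labels,
    one label per point.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:               # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four points, two part labels: class 0 IoU = 1/2, class 1 IoU = 2/3.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```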

101 Diagnosis of Brain Diseases Based on Multi-scale Residual Fusion Graph Convolutional Networks

HAO Xiaoke , HE Zilong , LU Xinchu , MA Mingming , LIU Shiyu

2024, 39(4):827-842. DOI: 10.16337/j.1004-9037.2024.04.005

[Abstract](904) [HTML](767) [PDF 2.38 M](854)

Abstract:
In recent years， functional brain networks have been used in the diagnosis of brain disorders such as autism spectrum disorder （ASD）. Existing studies have shown that combining resting-state functional magnetic resonance imaging （rs-fMRI） data as well as non-imaging information to form a population graph， and then learning and classifying the data by using graph neural network （GNN） is very effective in the diagnosis of ASD. However， most studies still face two challenges： First， the construction of functional connectivity matrices using methods such as Pearson correlation coefficient cannot effectively identify and analyze localized brain regions and biomarkers associated with diseases； second， it is difficult to efficiently learn multi-scale information about node features in population graphs on GNN. To solve these problems， a multi-scale residual fusion graph convolutional networks （MSRF-GCN） based on the attention mechanism is proposed. The algorithm efficiently localizes and identifies brain regions useful for diagnosis by designing a functional connection generator to extract temporally relevant features with remote dependencies. Meanwhile， the multi-scale information in the population graph is learned by designing a multi-scale residual fusion algorithm. The Edge Sparse strategy is also introduced to increase the sparsity of node connections by randomly discarding edges in the initial population graph， which in turn reduces the risk of overfitting during training. The effectiveness of MSRF-GCN in the diagnosis of ASD is demonstrated by the results of experiments performed on the autism brain imaging data exchange （ABIDE） program.
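
The idea of randomly discarding edges from the initial population graph can be sketched in a few lines. This is a generic random edge-dropping routine written under assumed details (an edge list representation, an independent drop probability per edge, and the parameter name `drop_rate`); the abstract does not specify how the Edge Sparse strategy is actually parameterized.

```python
import random


def drop_edges(edges, drop_rate, seed=None):
    """Return a copy of the edge list with each edge independently
    removed with probability drop_rate.

    edges: list of (node_u, node_v) pairs.
    """
    rng = random.Random(seed)
    return [edge for edge in edges if rng.random() >= drop_rate]


# Hypothetical population graph with five subjects as nodes.
population_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
sparse_edges = drop_edges(population_edges, drop_rate=0.3, seed=0)
print(len(population_edges), len(sparse_edges))
```

Drawing a fresh random subset of edges in each training epoch exposes the network to slightly different neighborhoods, which is the usual rationale for this kind of regularization.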

102 Kalman-Filter-Based Acoustic Feedback Cancellation with State Detection Model for Fast Recovery from Abrupt Path Changes

GUO Haocheng , CHEN Kai , LU Jing

2024, 39(5):1126-1134. DOI: 10.16337/j.1004-9037.2024.05.006

[Abstract](1185) [HTML](604) [PDF 1.89 M](830)

Abstract:
The partitioned block frequency domain Kalman filter （PBFDKF） has been applied in acoustic feedback cancellation （AFC） due to its fast convergence and low steady-state misalignment. However， the Kalman filter at steady state might encounter the issue of deadlock when the feedback path experiences abrupt changes， exhibiting suboptimal tracking capabilities. In this paper， the Kalman-filter-based AFC with state detection model （KFSD） is proposed to effectively improve the robustness against abrupt path changes. The narrowband energy of the microphone signal， the residual signal and the update of Kalman filter are used as the input to the state detection model. And then， the state detection results are merged into the state estimation error covariance matrix of the Kalman filter， achieving better re-convergence performance against the abrupt path changes. Experimental results demonstrate the superior performance of the proposed KFSD algorithm， showcasing a high true positive rate， a low false alarm rate， and a short state detection latency. These advantages lead to faster re-convergence and enhanced acoustic feedback cancellation.
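
The deadlock and its remedy can be illustrated with a deliberately simplified toy: a scalar, time-domain Kalman filter that tracks a single path gain h from observations y = h·x. This is not the PBFDKF or the KFSD algorithm; the detector is replaced by a `path_changed` flag supplied by the caller, and the noise variances and reset value are assumptions chosen for illustration. The point is only that a converged filter has a tiny error covariance and therefore a tiny gain, and that inflating the covariance when a change is detected restores fast adaptation.

```python
def kalman_step(h_est, p, x, y, q=1e-6, r=1e-2,
                path_changed=False, p_reset=1.0):
    """One update of a scalar Kalman filter for y = h * x + noise.

    h_est: current estimate of the path gain
    p:     current state estimation error variance
    q, r:  assumed process and measurement noise variances
    path_changed: flag from a (hypothetical) state detector
    """
    if path_changed:
        p = max(p, p_reset)          # inflate uncertainty on detection
    p = p + q                        # predict step (random-walk model)
    gain = p * x / (x * x * p + r)   # Kalman gain
    h_est = h_est + gain * (y - h_est * x)
    p = (1.0 - gain * x) * p         # posterior variance
    return h_est, p


def track(reset_on_change):
    """Converge to h = 0.5, then switch abruptly to h = -0.3."""
    h_est, p = 0.0, 1.0
    for n in range(200):
        x = 1.0 if n % 2 == 0 else -1.0
        h_est, p = kalman_step(h_est, p, x, 0.5 * x)
    for n in range(20):
        x = 1.0 if n % 2 == 0 else -1.0
        flag = reset_on_change and n == 0
        h_est, p = kalman_step(h_est, p, x, -0.3 * x, path_changed=flag)
    return abs(h_est - (-0.3))       # remaining error after 20 samples


print(track(reset_on_change=False), track(reset_on_change=True))
```

In this toy setting the filter without the covariance reset is still far from the new gain after 20 samples, while the one with the reset has essentially re-converged.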

103 Detection and Classification of Banded Carbide in Steel Based on Improved Cascade R-CNN

HAO Liang , ZHOU Shiyang , MO Yunyang , CHEN Yongyong , XU Yong , SU Jingyong

2024, 39(5):1228-1239. DOI: 10.16337/j.1004-9037.2024.05.014

[Abstract](815) [HTML](765) [PDF 4.23 M](841)

Abstract:
In the steel industry， carbide is a vital constituent， whose distribution in steel materials holds significant reference value for evaluating steel quality. However， the current detection methods for carbide in steel bars primarily rely on manual inspection， which is costly and lacks stability. This study introduces advanced deep learning techniques from the domain of artificial intelligence， which collects and annotates 3 192 high quality images of banded carbides on steel bars， alongside 11 complete samples to create a banded carbide dataset on object detection for steel bars （BCDOD）. Common deep learning methods for object detection are applied to the dataset through experimental analysis. With a focus on the specific characteristics of the application scenario and data， the cascade R-CNN model is enhanced with rotation data augmentation， improvement to the Focal Loss function and negative sample fine-tuning， resulting in performance improvement. The achieved average precision reaches 96%， with 100% recognition accuracy on complete sample data， showcasing promising outcomes that address the existing gap in artificial intelligence technology within the field of carbide metallographic detection.

104 Medical Image Segmentation Method with Integrated Self-attention

ZHAO Fan , ZHANG Xuedian

2024, 39(5):1240-1250. DOI: 10.16337/j.1004-9037.2024.05.015

[Abstract](1009) [HTML](858) [PDF 2.15 M](856)

Abstract:
Aiming at the limitations of the UNet architecture in capturing local features and preserving edge details in medical image segmentation， this paper presents an improved UNet algorithm integrating self-attention mechanism. The proposed algorithm is based on traditional encoder-decoder structure， incorporating a multi-scale convolution （MSC） block for multi-granularity feature extraction， and a convolution mixer attention （CMA） block， which combines the modeling of local features by convolutional layers with global contextual modeling by self-attention layers. In the segmentation task of BUSI and DDTI datasets， compared with the existing classical network architecture， a large number of experimental data verify the excellent segmentation ability of the model. Additionally， statistical data analysis and ablation studies further confirm the effectiveness of the MSC and CMA modules. This research provides an innovative approach for high-precision medical image segmentation， holding significant theoretical and practical implications for enhancing the accuracy and efficiency of medical diagnoses.

105 Robust Optimization Design for Multicast Transmission in IRS-Aided Cognitive Satellite and Terrestrial Network

MA Biao , ZHAO Bai , JI Mingyi , DING Changfeng , LIN Min

2024, 39(5):1251-1259. DOI: 10.16337/j.1004-9037.2024.05.016

[Abstract](722) [HTML](405) [PDF 1.40 M](719)

Abstract:
To improve spectrum efficiency， this paper proposes a robust multicast transmission algorithm for intelligent reflecting surface （IRS） aided cognitive satellite and terrestrial network （CSTN）. Specifically， the satellite uses multicast technology to serve multiple primary users， while the terrestrial base station （BS）， sharing spectrum resources with the satellite network， serves direct users and blocked users through space division multiple access technique and intelligent reflecting surfaces， respectively. Then， a joint optimization problem is formulated to minimize the BS transmit power， while satisfying the outage constraints of both the signal-to-interference-plus-noise ratio of terrestrial users and the interference power of the primary users. To address this nonconvex problem， the nonconvex outage constraint is first transformed into a deterministic form with the assistance of the cumulative distribution function of the exponential distribution. Then， a robust beamforming algorithm combining alternating optimization with semi-positive definite relaxation is proposed to obtain a solution with better performance. Computer simulation results demonstrate the robustness and superiority of the proposed algorithm.

106 Direction-of-Arrival Estimation for Hybrid mMIMO Systems via Sparse Bayesian Learning

MU Xinru , FU Haijun , DAI Jisheng

2024, 39(5):1260-1270. DOI: 10.16337/j.1004-9037.2024.05.017

[Abstract](639) [HTML](387) [PDF 820.57 K](644)

Abstract:
The direction-of-arrival （DOA） estimation is the premise of beamforming for hybrid massive multiple-input multiple-output （mMIMO） systems. The subspace methods based on covariance matrix reconstruction suffer from a large performance loss under the conditions of correlated signals and limited snapshots. To address the above challenges， this paper proposes a DOA estimation method for hybrid mMIMO systems via sparse Bayesian learning （SBL）. It can be seen that the problem of DOA estimation for hybrid mMIMO systems is transformed into the issue of sparse signal recovery， bypassing the spatial covariance matrix reconstruction and avoiding the performance loss caused by the subspace methods. By using variational Bayesian inference （VBI）， unknown parameters are estimated adaptively， which significantly improves the robustness of noise and correlated signals and enhances the performance of DOA estimation in the case of limited snapshots. Numerical simulation results verify the superiority of the proposed method.

107 Data-Driven Decision Support System Construction Based on Graph Model for Conflict Resolution

XU Haiyan , KONG Yang , DAI Sifan

2024, 39(5):1147-1162. DOI: 10.16337/j.1004-9037.2024.05.008

[Abstract](1090) [HTML](459) [PDF 3.87 M](748)

Abstract:
Nowadays， conflicts frequently occur due to issues such as economy， technology， geostrategy， and international order， and the scale of conflicts is shifting from individual and small-scale group conflicts to complex large-scale group conflicts. Compared to conflicts between individuals， large-scale group conflicts have a longer duration and wider scope， which have a negative impact on China’s social order and economic development. Graph model for conflict resolution（GMCR） has been widely applied to water resources， environmental management and economic policy as a theoretical tool for solving conflict problems， and has achieved good results. However， the increasing number of participants and strategies in conflict have led to an exponential increase in situation， and the uncertainty of the subject’s preference behavior is enhanced， so the traditional decision support system GMCRⅡ is difficult to solve such complex conflicts. Based on the algebraic expression of strength preference conflict analysis theory， this paper designs a conflict analysis WEB system SP-GMCRDSS based on .NET platform， including four modules： feasible state generation， state transition setting， strength preference sequence generation and stability analysis engine. Compared with existing systems， SP-GMCRDSS can more efficiently assist conflict analysts in solving large and complex data-driven conflicts. The text mining technology is used to extract strategy， which can assist analysts to determine the input of decision support system， and reduce the subjectivity of model building. Finally， modeling， solving， and analysis functions of the system are demonstrated through the case “Lanzhou Water Pollution Conflict Event”.

108 State of the Art and Prospects of Deep Learning-Based Speaker Verification

LI Jianchen , HAN Jiqing

2024, 39(5):1062-1084. DOI: 10.16337/j.1004-9037.2024.05.003

[Abstract](1397) [HTML](982) [PDF 1.60 M](1245)

Abstract:
With the development of deep learning， speaker verification has made great progress. Compared with other biometric identification technologies， this technology has advantages of remote operation， low cost， easy human-computer interaction， etc.， thus it shows a wide range of application prospects in the fields of public security， criminal investigation， and financial services. A systematic overview of the development lineage of deep learning-based speaker verification techniques is provided. Firstly， the development history and research status of deep learning-based speaker representation model are introduced in four aspects： Model input and structure， pooling layer， supervised loss function， and self-supervised learning and pre-training model. Then， the challenges faced by speaker verification are discussed， such as cross-domain mismatch problems like noise interference， channel mismatch and far-field speech， and the corresponding domain adaptation and domain generalization methods are outlined. Finally， the further research directions are presented.

109 Target Position Detection Based on Bidirectional Fusion of Texture and Depth Information

ZHANG Yawei , FU Dongxiang

2024, 39(5):1214-1227. DOI: 10.16337/j.1004-9037.2024.05.013

[Abstract](639) [HTML](607) [PDF 4.29 M](720)

Abstract:
Aiming at the problem of how to obtain accurate positional information of objects in unstructured scenes by depth cameras with limited hardware device resources， a target position detection method based on bidirectional fusion of texture and depth information is proposed. In the learning phase， two networks adopt the full-flow bidirectional fusion （FFB6D） module， the texture information extraction part introduces the lightweight Ghost module to reduce the computation of the network， and adds the attention mechanism CBAM that can enhance useful features， and the depth information extraction part extends the local features and multilevel feature fusion to obtain more comprehensive features. In the output stage， in order to improve the efficiency， the instance semantic segmentation results are utilized to filter background points， then 3D keypoint detection is performed， and finally the position information is obtained by the least square fitting algorithm. Validations are carried out on LINEMOD， Occlusion LINEMOD and YCB-Video public datasets， whose accuracies reach 99.8%， 66.3% and 94%， respectively， and the amount of parameters is reduced by 31%， showing that the improved position estimation method can reduce the number of parameters while guaranteeing the accuracy.

110 Unsupervised Video Person Re-identification Based on Multiple Kernel Dilated Convolution

LIU Zhongmin , ZHANG Changkai , HU Wenjin

2024, 39(5):1192-1203. DOI: 10.16337/j.1004-9037.2024.05.011

[Abstract](790) [HTML](656) [PDF 3.15 M](704)

Abstract:
Person re-identification aims to identify specific individuals across surveillance cameras， overcoming challenges such as pose variations， occlusions， and background noise that often lead to insufficient feature extraction. This paper proposes a novel unsupervised video-based person re-identification method that utilizes multi-kernel dilated convolution to provide a more comprehensive and accurate representation of individual differences and features. Initially， we employ a pre-trained ResNet50 as an encoder. To further enhance the encoder’s feature extraction capability， we introduce a multiple kernel dilated convolution module. Enlarging the receptive field of convolutional kernels allows the network to more effectively capture both local and global feature information， offering a more comprehensive depiction of a person’s appearance features. Subsequently， a decoder is employed to restore high-level semantic information to a more fundamental feature representation， thereby strengthening feature representation and improving system performance under complex imaging conditions. Finally， a multi-scale feature fusion module is introduced in the decoder output to merge features from adjacent layers， reducing semantic gaps between different feature channel layers and generating more robust feature representations. Offline experiments are conducted on three mainstream datasets， and results show that the proposed method achieves significant improvements in both accuracy and robustness.
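
The receptive-field enlargement mentioned in this abstract follows from standard dilated-convolution arithmetic: a kernel of size `k` with dilation `d` spans `k + (k - 1)(d - 1)` input positions along each axis. The sketch below only illustrates that arithmetic. The function names and the dilation rates are assumptions chosen for illustration and are not taken from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input positions) covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 convolution layers applied in sequence.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective size - 1).
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Hypothetical dilation rates for a 3x3 kernel, for illustration only.
for rate in (1, 2, 3):
    print(rate, effective_kernel_size(3, rate))
```

With a 3 × 3 kernel, dilation rates 1, 2 and 3 cover 3, 5 and 7 positions per axis without adding parameters, which is why dilation is a cheap way to widen context.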

111 Terrain-Adaptive Motion Imitation Based on Multi-task Reinforcement Learning

YU Hao , LIANG Yuchen , ZHANG Chi , LIU Yuehu

2024, 39(5):1182-1191. DOI: 10.16337/j.1004-9037.2024.05.010

[Abstract](839) [HTML](432) [PDF 1.74 M](693)

Abstract:
Terrain adaptability is the basis for the stable movement of agents under complex terrain conditions. Due to the complexity of the dynamical systems of agents such as humanoid robots， it is usually difficult for traditional inverse dynamics methods to provide this ability. Recent research has used the advantages of reinforcement learning in solving sequential decision-making problems to train agents to adapt to terrain. However， these single-task learning methods cannot effectively learn the correlations among various terrains. In fact， complex terrain adaptation can be considered a multi-task problem， in which the relationship between sub-tasks can be measured by different terrain factors， and the problem of incomplete acquisition of data distribution information can be solved by mutual learning of sub-task models. Therefore， this paper proposes a multi-task reinforcement learning method. It contains an execution layer， which consists of pre-trained sub-task models， and a decision layer based on reinforcement learning. Moreover， the decision layer uses soft constraints to fuse the models of the execution layer. Experiments on the LeggedGym terrain simulator show that the agent trained by the proposed method moves more stably and falls less often on complex terrains， showing better generalization performance.

112 Asynchronous Federated Model of Public Health Emergency Monitoring Based on Smart Contract and Federated Storage

LIU Xingchen , DU Junping , LIANG Meiyu , LI Ang

2024, 39(6):1532-1542. DOI: 10.16337/j.1004-9037.2024.06.020

[Abstract](745) [HTML](601) [PDF 1.11 M](466)

Abstract:
With the increasing emphasis on data security in public safety emergencies， federated learning has gained attention for its ability to perform computations without uploading data to a central server， thereby reducing the risk of privacy breaches. However， current federated learning approaches based on smart contracts face challenges such as inefficiency due to their computational demands. To address this， this paper proposes an asynchronous federated learning method for detecting public health emergencies， integrating smart contracts and federated storage. This approach allows federated nodes to join and leave the federated learning process at any time. By leveraging smart contracts and distributed storage， it enhances data security and training efficiency in the public health domain. Furthermore， adaptive differential privacy is employed to dynamically protect the gradients uploaded to distributed storage nodes， further reducing the risk of privacy leakage. Extensive experiments conducted on public datasets and public health security datasets demonstrate that the proposed method outperforms existing approaches in terms of accuracy and requires less time to achieve the same level of precision.

113 An Expression Recognition Model Based on Pyramid Split Attention and Joint Loss

GU Rui , GU Jiale , SONG Cuiling

2024, 39(6):1493-1504. DOI: 10.16337/j.1004-9037.2024.06.017

[Abstract](528) [HTML](656) [PDF 2.10 M](553)

Abstract:
How to extract multi-scale features and model semantic dependencies between distant channels remains a challenge for expression recognition networks. This paper proposes a residual network based on pyramid split attention （PSA-ResNet）， which replaces the 3 × 3 convolution in the ResNet50 residual module with PSA to effectively extract multi-scale features and enhance the correlation of cross-channel information. To reduce the differences within the same expression class and enlarge the distance between different expression classes， a joint loss function combining Softmax loss and Center loss is introduced to optimize the parameters during training. The proposed model is evaluated on two publicly available datasets， Fer2013 and CK+， and achieves accuracies of 74.26% and 98.35%， respectively， confirming that this method has better recognition results than cutting-edge algorithms.
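
A joint Softmax and Center loss is commonly written as the softmax cross-entropy plus a weighted penalty on the distance between each feature and its class center. The sketch below shows that common form for a single sample. The weight value, the function names, and the per-sample formulation are assumptions for illustration. The abstract does not state how the two terms are balanced or how centers are updated.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against the true class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits, feature, label, centers, weight=0.01):
    """Softmax loss plus a weighted center loss for one sample.

    `weight` is a hypothetical balancing coefficient, not a value from the paper.
    """
    return softmax_cross_entropy(logits, label) + weight * center_loss(
        feature, centers[label]
    )
```

The softmax term separates classes, while the center term pulls features of the same class together, which matches the stated goal of shrinking intra-class differences and enlarging inter-class distances.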

114 Deep Reinforcement Learning Model for Job Shop Scheduling Problems with Uncertainty

WU Xinquan , YAN Xuefeng , WEI Mingqiang , GUAN Donghai

2024, 39(6):1517-1531. DOI: 10.16337/j.1004-9037.2024.06.019

[Abstract](884) [HTML](909) [PDF 2.47 M](481)

Abstract:
The job shop scheduling problem （JSSP） is a classical non-deterministic polynomial （NP）-hard combinatorial optimization problem. In JSSP， it is usually assumed that the scheduling environment information is known and remains unchanged during the scheduling process. However， the actual scheduling process is often affected by many uncertain factors （such as machine failures and process changes）. A proximal policy optimization with hybrid prioritized experience replay （HPER-PPO） scheduling algorithm is proposed for solving JSSPs with uncertainties. The JSSP is modeled as a Markov decision process where the state features， reward function， action space， and scheduling policy networks are designed. In order to improve the convergence of the proposed deep reinforcement learning model， a new hybrid prioritized experience replay training method is proposed. The proposed scheduling method is evaluated on standard datasets and datasets generated based on standard datasets. The results show that in static scheduling experiments， the proposed scheduling model achieves more accurate results than existing deep reinforcement learning methods and priority dispatching rules. In dynamic scheduling experiments， the proposed scheduling model can achieve more accurate scheduling results in a reasonable time for JSSP with process order uncertainty.

115 Real-Time Semantic Segmentation of Road Scene Based on Multi-level Attention Feature Optimization

ZHANG Peng , PENG Zongju , ZHANG Wenrui , LUO Yingguo , WEI Wei , WANG Peirong

2024, 39(6):1505-1516. DOI: 10.16337/j.1004-9037.2024.06.018

[Abstract](655) [HTML](562) [PDF 3.81 M](506)

Abstract:
To address the difficulty of segmenting image edges and extracting small-target features caused by overlapping targets in complex and changeable road scenes， a multi-level attention feature optimization method for real-time semantic segmentation of road scenes is proposed. Firstly， a lightweight residual attention module is designed， taking into account the difference in feature weights at different levels， and optimizing local features of the image through a compressed attention mechanism， thereby improving the edge effect between pixels. Then， the channel attention and depth aggregation pyramid pooling module are designed to further strengthen the extraction of semantic context information， thereby solving the problem of small target information loss. Finally， the attention fusion module is designed to fuse feature information at different scales from top to bottom. It can achieve effective interaction of global feature information and enhance the network’s expression of important features. Experiments are carried out on the Cityscapes and CamVid road scene datasets， where the segmentation accuracies are 74.4% and 67.7% and the inference speeds are 138 frames/s and 148 frames/s， respectively. Compared with strong methods from recent years， this method reduces the loss of image edge information and improves the segmentation accuracy of small objects in the image.

116 Convolutional Transformer EEG Emotion Recognition Model Based on Multi-domain Information Fusion

ZHANG Xuejun , WANG Tianchen , WANG Zetian

2024, 39(6):1543-1552. DOI: 10.16337/j.1004-9037.2024.06.021

[Abstract](852) [HTML](963) [PDF 1.93 M](598)

Abstract:
Current emotion recognition methods for electroencephalogram （EEG） signals seldom fuse spatial， temporal and frequency information， and most methods can only extract local EEG features， resulting in limitations in global information correlation. This paper proposes an EEG emotion recognition method based on a 3D-CNN-Transformer mechanism （3D-CTM） model with multi-domain information fusion. The method first designs a 3D feature structure based on the characteristics of EEG signals， simultaneously fusing the spatial， temporal， and frequency information of EEG signals. Then a convolutional neural network module is used to learn the deep features for multi-domain information fusion， and the Transformer self-attention module is connected to extract the global correlations within the feature information. Finally， global average pooling is used to integrate the feature information for classification. Experimental results show that the 3D-CTM model achieves an average accuracy of 96.36% on the SEED dataset for three-class classification and 87.44% on the SEED-Ⅳ dataset for four-class classification， which effectively improves the emotion recognition accuracy.

117 Emotional Video Captioning Based on Fine-Grained Visual and Audio-Visual Dual-Branch Fusion

GONG Yuxuan , HAN Tingting

2025, 40(5):1165-1176. DOI: 10.16337/j.1004-9037.2025.05.005

[Abstract](163) [HTML](760) [PDF 35.75 K](517)

Abstract:
Emotional video captioning， as a cross-modal task integrating visual semantic parsing and emotional perception， faces the core challenge of accurately capturing the emotional cues embedded in visual content. Existing methods have two notable limitations： First， they insufficiently explore the fine-grained semantic correlations between video subjects （such as humans and objects） and their appearance and motion features， leading to a lack of refined support for visual content understanding； second， they neglect the auxiliary value of the audio modality in emotional discrimination and content semantic alignment， which restricts the comprehensive utilization of cross-modal information. To address these issues， this paper proposes a framework based on fine-grained visual and audio-visual dual-branch fusion. Specifically， the fine-grained visual feature fusion module effectively models the fine-grained semantic associations between video entities and visual contexts through pairwise interactions and deep integration of visual， object， and motion features， thereby achieving refined parsing of video content. The audio-visual dual-branch global fusion module constructs a cross-modal interaction channel to deeply fuse the integrated visual features with audio features， fully leveraging the supplementary role of audio information in emotional cue transmission and semantic constraint. Validation experiments on public benchmark datasets show that the proposed method outperforms comparative methods such as CANet and EPAN across evaluation metrics. It achieves an average improvement of 4% over EPAN method in emotional metrics， an average increase of 0.5 in semantic metrics， and an average boost of 0.7 in comprehensive metrics. Experimental results demonstrate that the proposed method can effectively enhance the quality of emotional video captioning.

118 Non-invasive Continuous Chinese Language Semantic Decoding and Reconstruction

MA Lei , CUI Wenhao , YANG Wenwen , WANG Zhaoxin

2025, 40(3):616-636. DOI: 10.16337/j.1004-9037.2025.03.005

[Abstract](371) [HTML](502) [PDF 3.03 M](393)

Abstract:
Language is an important tool for communication and cognition. Multiple functional areas of the brain， connected through complex neural networks， jointly participate in the perception， comprehension， and production of language. Exploring the neural mechanisms of Chinese semantic decoding is crucial for the development of Chinese brain-computer interface （BCI）. This study aims to establish a long-sequence continuous semantic decoding method based on fMRI data， termed Chinese long-sequence continuous semantic decoder（CLCSD）. Through signal processing workflows and algorithm optimization， it seeks to achieve efficient decoding of continuous Chinese semantics. The CLCSD framework is composed of four components： neural response dimensionality reduction， an encoding model， a word rate model， and a beam search decoding model. Neural response dimensionality reduction is performed through cortical reconstruction， image registration， and brain region parcellation to reduce four-dimensional brain response data into a two-dimensional matrix. The encoding model is constructed using L2-regularized regression （ridge regression） to establish the relationship between stimulus features and brain responses， with noise covariance estimated via bootstrapping to enhance generalization. The word rate model follows a similar approach to the encoding model， where brain response features are mapped to predicted word rate. The beam search decoding model uses the prior probability of the language model and likelihood probabilities of the encoding model to generate the most probable semantic sequence through beam search. On publicly available dataset SMN4Lang， CLCSD achieves a mean BERTScore of 0.674， outperforming other long-sequence Chinese continuous semantic decoding models. The proposed method provides an efficient long-sequence continuous Chinese semantic decoding approach， offering both theoretical foundations and methodological references for the advancement of Chinese BCI technologies.

119 Trestle Random Forest Based on Multiple Randomness and Privacy Protection

SONG Yilin , WANG Shitong

2025, 40(5):1222-1238. DOI: 10.16337/j.1004-9037.2025.05.009

[Abstract](176) [HTML](252) [PDF 32.09 K](464)

Abstract:
As an effective ensemble learning algorithm for classification and regression tasks， the random forest （RF） still faces challenges in improving generalization ability and privacy protection. In response， this paper proposes an improved Bernoulli-multinomial stacked random forest （BMS-RF） algorithm based on multiple randomness and privacy protection. The basic idea is to introduce a Bernoulli-distributed Dropout over feature vectors to select candidate feature vectors when choosing the splitting features and splitting points of each decision tree. Splitting features and splitting points are then selected randomly through two multinomial distributions， and each decision tree adopts a non-numerical query index mechanism to add noise for privacy protection. When integrating classifiers， a multi-layer stack structure is introduced to randomly project the output of the previous layer and concatenate it with the source training set as new inputs， so that each forest can share the spatial information of the source samples and improve the classification performance of the base learners layer by layer. Theoretical analysis of the consistency and privacy ability of this algorithm shows that BMS-RF can significantly improve classification performance through the stack structure. Experimental results on 14 small and medium-sized datasets verify that the algorithm not only reduces running time but also has better generalization performance. When the privacy protection is strong， it can achieve classification performance similar to RF variants while simplifying the structure and improving running speed.

120 Alzheimer’s Disease Classification Based on 3D Multi-modal Convolutional Network and Cross-Modal Feature Integration

ZHU Houyuan , ZHENG Lele , SHANG Hao , ZANG Xuefeng , WU Shaoqi , ZHOU Guangchao , SUN Jiande , QIAO Jianping

2025, 40(4):912-921. DOI: 10.16337/j.1004-9037.2025.04.006

[Abstract](330) [HTML](304) [PDF 1.51 M](457)

Abstract:
Multi-modal neuroimaging technology provides crucial technical support for the early and precise diagnosis of Alzheimer’s disease （AD）. However， due to the inherent heterogeneity in imaging principles and feature representations across different neuroimaging modalities， the fusion of inter-modal information poses significant challenges. To address this issue， this study proposes a multi-modal fusion network （MFN） based on a 3D ResNet architecture for the early auxiliary diagnosis of AD. The proposed method first employs a 3D ResNet to separately extract feature representations from T1- and T2-weighted magnetic resonance images. Subsequently， an innovative cross-modal feature integration module （CFIM） is designed to overcome the limitations of direct concatenation. CFIM adopts a hierarchical fusion strategy， consisting of global information fusion module， local feature learning module and key factor module. Finally， the fused multimodal features are fed into a fully connected neural network for classification. Compared to early concatenation （fixed-weight fusion） and late fusion （shallow aggregation）， this strategy more effectively identifies disease-relevant diagnostic features. Experiments conducted on the Alzheimer’s disease neuroimaging initiative （ADNI） database demonstrate that the proposed method achieves higher accuracy and superior performance in AD classification tasks compared to existing approaches. Ablation studies further validate the effectiveness of each module， offering new technical insights for multi-modal neuroimaging analysis.

121 Wuxin： Architecture Design and Empirical Study for Vertical-Domain Large Language Model System

ZHU Xinli , GAO Zhiqiang , JI Weitong , LI Shaohua , LI Songjie

2025, 40(3):637-646. DOI: 10.16337/j.1004-9037.2025.03.006

[Abstract](461) [HTML](425) [PDF 2.68 M](418)

Abstract:
In customized scenarios， it is urgent to enhance the understanding and generation capabilities of large language models （LLMs） in specific vertical domains. We propose a paradigm for developing vertical-domain LLM system named “Wuxin”， which covers a series of development methods for LLM systems， including architecture， data， model， and training. Wuxin utilizes human-in-the-loop data augmentation to improve the quality of military training injury question and answer datasets， and employs the GaLore strategy to perform efficient full-parameter fine-tuning on small LLMs. Experimental results show that the adopted full-parameter fine-tuning method outperforms LoRA fine-tuning in terms of convergence and accuracy. Furthermore，Wuxin demonstrates significant advantages in understanding professional military training injury knowledge， as well as overcoming hallucinations. Our achievements can provide references for the design and application of question-answering LLM systems in vertical domains.

122 LLM-KG Bidirectional Inference Optimization and Hallucination Suppression for Special Equipment

ZHENG Qiang , XU Zhenbin

2025, 40(3):647-658. DOI: 10.16337/j.1004-9037.2025.03.007

[Abstract](403) [HTML](412) [PDF 1016.51 K](422)

Abstract:
Existing studies have constructed knowledge graph （KG） intelligent question-answering systems based on large language models （LLMs） in the field of special equipment. However， limited by the incomplete entity relationships of the KG， LLMs are still prone to hallucination in knowledge-intensive tasks. To suppress hallucinations， a fused KG reasoning technique is proposed to enhance knowledge representation by completing the entity relationship links. Furthermore， in view of the deficiencies of existing KG reasoning methods in semantic association and topological structure parsing， a dynamic reasoning mechanism based on the LLM is introduced. By leveraging its deep semantic understanding ability， high-order logic rules are automatically generated to achieve precise expansion of the KG， thereby constructing a bidirectional collaborative optimization mechanism between the LLM and the KG. The results show that this method significantly outperforms the baseline model in terms of mean reciprocal rank （MRR）， first hit rate （Hits@1）， and top ten hit rate （Hits@10） on the Family， Kinship， and UMLS datasets.
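
The ranking metrics named in this abstract have standard definitions in knowledge-graph completion: MRR averages the reciprocal of the rank assigned to each correct entity, and Hits@k is the fraction of queries whose correct entity is ranked within the top k. A minimal sketch follows. The rank list is made-up illustrative data, not a result from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose correct answer is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test queries.
example_ranks = [1, 3, 12, 2]
print(mean_reciprocal_rank(example_ranks))
print(hits_at_k(example_ranks, 1), hits_at_k(example_ranks, 10))
```

Because MRR weights early ranks heavily, it rewards placing the correct entity first, while Hits@10 only checks whether it appears anywhere in the top ten.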

123 Multi-modal Medical Entity Recognition Based on Multi-scale Attention and Graph Neural Networks

HAN Pu , LIU Senling , CHEN Wenqi

2025, 40(4):922-933. DOI: 10.16337/j.1004-9037.2025.04.007

[Abstract](266) [HTML](256) [PDF 1.38 M](405)

Abstract:
With the rapid development of information technology， multi-modal data such as Chinese texts and images in the medical and health field has shown explosive growth. Multi-modal medical entity recognition （MMER） is a key step in multi-modal information extraction， and has attracted great attention recently. To address image detail loss and insufficient text semantic understanding in multi-modal medical entity recognition tasks， this paper proposes a novel MMER model based on multi-scale attention and dependency parsing graph convolution （MADPG）. The model introduces a multi-scale attention mechanism based on ResNet to jointly extract visual features across different spatial scales and to reduce the loss of important details in medical images. This enhances the image feature representation and complements the text semantic information. Then， the dependency syntactic structure is used to construct the graph neural network to capture the complex grammatical dependencies between words in medical texts， so as to enrich the semantic expression of texts and promote the deep integration of image and text features. Experiments show that the F₁ value of the proposed model reaches 95.12% on the multi-modal Chinese medical dataset， and that its performance is significantly improved compared with mainstream single- and multi-modal entity recognition models.

124 Incremental Attribute Reduction Algorithm Based on Single-Valued Medium-Intelligence Dominance Conditional Entropy

LUO Gongzhi , WANG Cong

2025, 40(5):1207-1221. DOI: 10.16337/j.1004-9037.2025.05.008

[Abstract](158) [HTML](159) [PDF 32.90 K](491)

Abstract:
In the big data environment， the continuous growth of data in the ordered decision information system leads to the dynamic change of the dominance relationship between objects. Efficient calculation of attribute reduction has become a key problem to be solved urgently. Therefore， an incremental single-valued medium-intelligence dominance conditional entropy is proposed， and a new incremental attribute reduction algorithm is constructed accordingly. Firstly， the single-valued medium-intelligence dominance conditional entropy is given under the single-valued medium-intelligence ordered decision information system. Subsequently， for four different types of new objects， the incremental update mechanism of single-valued medium-intelligence dominance conditional entropy is deeply studied， and then an incremental attribute reduction algorithm is designed according to this update mechanism. Finally， six UCI datasets with dominance relations are selected to conduct a comparative experimental analysis on the effectiveness and efficiency of the incremental algorithm and the non-incremental algorithm. Experimental results show that the newly given incremental attribute reduction algorithm can significantly improve the computational efficiency of data processing while maintaining the same classification accuracy.

125 Multimodal Aspect-Level Sentiment Analysis Based on GCN and Target Visual Feature Enhancement

ZHAO Xuefeng , BAI Changze , DI Hengxi , ZHONG Zhaoman , ZHONG Xiaomin

2025, 40(5):1177-1192. DOI: 10.16337/j.1004-9037.2025.05.006

[Abstract](192) [HTML](1149) [PDF 36.72 K](539)

Abstract:
Multimodal aspect-level sentiment analysis aims to integrate image and text data to accurately predict the sentiment polarity of aspect words. However， existing methods still have significant limitations in accurately locating text-related image region features and effectively handling the information interaction between modalities. At the same time， their understanding of context information within each modality is biased， which introduces additional noise. To solve these problems， a multimodal aspect-level sentiment analysis model based on graph convolutional network and target visual feature enhancement （GCN-TVFE） is proposed. First， the contrastive language-image pre-training （CLIP） model is used to process text， aspect words， and image data. The similarity between text and image and the similarity between aspect words and image are calculated and then combined to quantitatively evaluate how well the text and the aspect words match the image. Then， the Faster R-CNN model is used to quickly and accurately identify and locate the target region in the image， which further enhances the ability of the model to extract image features related to the text. Next， through the GCN， the text graph structure is constructed from the dependency syntactic relationships within the text， and the image graph structure is generated by the K-nearest neighbor （KNN） algorithm， so as to mine the feature information within each modality in depth. Finally， a multi-layer multimodal interactive attention mechanism is used to effectively capture the correlation information between aspect words and text， and between target visual features and image-generated text description features， which significantly reduces noise interference and enhances feature interaction between modalities. Experimental results show that the proposed model achieves superior overall performance on the public datasets Twitter-2015 and Twitter-2017， which verifies its effectiveness in the field of multimodal sentiment analysis.

126 Incomplete Multimodal Brain Tumor Segmentation Method Based on the Combination of U-Net and Transformer

TANG Zhanjun , JIAN Hong , WANG Jian

2025, 40(4):934-949. DOI: 10.16337/j.1004-9037.2025.04.008

[Abstract](322) [HTML](337) [PDF 4.07 M](468)

Abstract:
Given inherent variations among patients， discrepancies in imaging protocols， and potential data corruption， existing brain tumor segmentation methods based on magnetic resonance imaging （MRI） are often challenged by the issue of missing modality data， resulting in low segmentation accuracy. To address this， an innovative incomplete multimodal brain tumor segmentation method based on the combination of U-Net and Transformer （IM TransNet） is proposed. Firstly， a modality-specific encoder is developed for four distinct MRI modalities to enhance the model’s ability to capture unique characteristics of each modality. Secondly， a dual-attention Transformer module is embedded within the U-Net to mitigate the issue of incomplete information arising from missing modalities， thus alleviating the limitations imposed by long-range context interactions and spatial dependencies within the U-Net framework. Additionally， a skip-cross attention mechanism is incorporated into the U-Net’s skip connections to dynamically focus on features from various hierarchical levels and modalities， effectively facilitating feature fusion and reconstruction even in the presence of missing modalities. Furthermore， an auxiliary decoding module is devised to counteract the training imbalance induced by missing modalities， ensuring that the model can consistently and effectively segment brain tumors across diverse subsets of incomplete modalities. Finally， the model’s performance is validated on the publicly accessible BRATS dataset. Experimental results indicate that the proposed model attains average Dice scores of 63.19%， 76.42%， and 86.16% for enhancing tumor， tumor core， and whole tumor， respectively， highlighting its superiority and robustness in handling incomplete multimodal data. This approach offers a viable technical solution for accurate， efficient， and reliable brain tumor segmentation in clinical practice.
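
The Dice scores reported in this abstract measure the overlap between a predicted mask and the ground-truth mask as twice the intersection divided by the sum of the two mask sizes. The sketch below computes it for flat binary masks. The function name, the flat-list representation, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, and the benchmark's official evaluation may differ in such details.

```python
def dice_score(prediction, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(prediction, target))
    total = sum(prediction) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical four-voxel masks, for illustration only.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

For segmentation with several tumor sub-regions, the score is typically computed per region (for example enhancing tumor, tumor core, and whole tumor) and then averaged over cases.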

127 Fine-Grained Image Recognition Method Based on Attention and Multi-scale Ensemble Learning

JI Shengyu , JIANG Zhikang , MA Xiang , YANG Lvxi

2025, 40(2):384-400. DOI: 10.16337/j.1004-9037.2025.02.009

[Abstract](498) [HTML](759) [PDF 4.54 M](387)

Abstract:
Fine-grained image recognition （FGIR） is an important research topic in the field of computer vision. Its main goal is to distinguish subclasses with highly similar appearance within the same category. This paper focuses on weakly-supervised fine-grained image recognition. To address the insufficient use of features in fine-grained images and the difficulty of mining discriminative regions in FGIR research， the attention and multi-scale ensemble-learning based network （AMEN） is proposed. This method introduces a progressive learning network， which uses an ensemble learning strategy to construct multi-scale base classifiers in parallel from three levels of output features of a deep neural network， and uses the label smoothing method to train the multi-scale base classifiers progressively， so as to greatly improve the utilization of low-level features. At the same time， efficient dual channel attention is used to impose channel weights on features， so that the network can independently select features at the channel level and improve the utilization of channels with highly correlated information. This method also introduces a self-attention region proposal network， which drives the model to gradually locate more discriminative regions by constructing a circular feedback mechanism， and fuses the feature information of the complete image and the discriminative region in the final classification module. Experimental results show that the recognition accuracy of AMEN on the three fine-grained image datasets CUB-200-2011， FGVC Aircraft and Stanford Cars reaches an advanced level in the field.
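
Label smoothing, mentioned above as part of the training procedure, usually replaces a one-hot target with a mixture of the one-hot vector and a uniform distribution over classes. The sketch below shows that common formulation. The smoothing factor and function name are assumptions for illustration, and the abstract does not say which variant or value the authors use.

```python
def smooth_labels(label, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for the true class index `label`.

    Each class receives epsilon / num_classes, and the true class additionally
    keeps (1 - epsilon), so the entries still sum to 1.
    """
    uniform_share = epsilon / num_classes
    return [
        (1.0 - epsilon if index == label else 0.0) + uniform_share
        for index in range(num_classes)
    ]


# Hypothetical 4-class example with the true class at index 1.
print(smooth_labels(1, 4, epsilon=0.2))
```

Softened targets discourage over-confident predictions, which can help when subclasses look very similar.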

128 Panoramic Image Recognition of Rock Borehole Based on Deep Learning

XIAN Yongli , CHEN Xuejian , PENG Zhenming , WANG Jie , PENG Bo

2025, 40(3):675-685. DOI: 10.16337/j.1004-9037.2025.03.009

[Abstract](420) [HTML](321) [PDF 3.98 M](443)

Abstract:
Geotechnical borehole monitoring， as one of the most common tunneling advanced detection techniques， can truly reflect the material properties， characteristics， and groundwater conditions of geomaterials， which is vital to ensure construction safety. Based on the characteristics of the geotechnical borehole monitoring objectives， a smart visual system based on panoramic cameras is developed. The system is suitable for close-range and dynamic high-resolution imaging of the inner walls of long geotechnical boreholes. Based on the improved EfficientNetV2 network and the sliding window prediction， the rapid intelligent recognition of eight types of rock borehole images is realized. Experimental results show that the visual system can meet the requirements for close-range high-resolution panoramic imaging of long boreholes and achieve intelligent state assessment of rock materials. The recognition success rate reaches 91.49% on the test set， and the system preliminarily possesses the comprehensive intelligent evaluation capability of geotechnical borehole status.

129 Dual Contrastive Learning Model Based Background Debiasing in SAR ATR

ZHANG Wenqing , WANG Jing , HUANG Xueqin , TIAN Sirui , HE Cheng , ZHANG Jingdong , LI Hongtao

2025, 40(3):686-698. DOI: 10.16337/j.1004-9037.2025.03.010

[Abstract](362) [HTML](314) [PDF 2.60 M](374)

Abstract:
Contrastive learning， as a self-supervised approach， enables the extraction of target representations from unlabeled SAR images， serving as a critical technique for automatic target recognition （ATR） in SAR. However， existing models often encode targets and backgrounds holistically， resulting in feature representations contaminated by background interference， which diminishes the model’s ability to focus on targets. To address this issue， this paper proposes a novel multi-branch dual contrastive learning model. Firstly， the model retains the conventional instance contrastive branch while introducing an innovative background correction contrastive branch， establishing a multi-branch contrastive learning framework. Secondly， through a random recombination strategy of targets and backgrounds in positive and negative samples， combined with the ResNet50 backbone network and self-attention pooling to enhance semantic feature extraction， an optimized dual contrastive loss function is employed to refine target feature learning and mitigate spurious correlations between backgrounds and targets. Finally， Shapley value analysis based on the MSTAR dataset validates the model’s effectiveness， and target classification results demonstrate that this approach significantly enhances the causality of feature extraction， substantially improving the generalization performance of SAR ATR algorithms.

130 Context-Aware Image Restoration Based on Fused Semantic Information

ZU Yi , ZHANG Sunjie , WU Peng , MA Yueheng

2025, 40(2):401-416. DOI: 10.16337/j.1004-9037.2025.02.010

[Abstract](338) [HTML](501) [PDF 4.22 M](381)

Abstract:
In recent years， generative adversarial networks have been widely used in the field of image restoration and have achieved good results. However， current methods do not consider the problems of blurred structures and textures in high-resolution images （512×512）， which mainly come from the lack of effective feature information. To address this problem， this paper proposes a generative adversarial network that combines image features with semantic information. Based mainly on semantic information， a context-aware image restoration model is proposed that adaptively fuses semantic information with image features. Adaptive convolution replaces the traditional convolution， and a multi-scale context aggregation module is added after the decoder to capture long-distance information for contextual inference. Experiments on the Places2， CelebA-HQ， Paris Street View， and Openlogo datasets show that the proposed method improves on existing methods in terms of L₁ loss， peak signal-to-noise ratio （PSNR）， and structural similarity （SSIM）.
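
For readers unfamiliar with the PSNR metric cited in the abstract above, the following is a minimal sketch of its standard definition. It assumes images are given as flat lists of pixel intensities with a peak value of 255; the function name and the sample values are hypothetical.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Assumption: both images are flat sequences of pixel intensities.
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical 4-pixel images that differ in one pixel.
    print(psnr([0, 0, 0, 0], [0, 0, 0, 10]))
```

Higher PSNR means the restored image is closer to the reference, which is the direction of improvement the abstract reports.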

131 Path Connectivity Based Neighbor-Awareness Node Classification Algorithm

ZHENG Wenping , WANG Xiaomin , HAN Zhaorong

2025, 40(1):134-146. DOI: 10.16337/j.1004-9037.2025.01.010

[Abstract](426) [HTML](455) [PDF 1.21 M](434)

Abstract:
Graph convolutional neural networks obtain the node representation by aggregating the neighbor node information with high similarity，and selecting the appropriate neighborhood for the node and conducting effective aggregation are the keys to the graph convolutional networks. Most of the existing graph convolutional neural networks directly aggregate the node information in the multi-hop neighborhood，without considering the difference of the aggregation weights of different hop neighborhoods on different nodes in the network. Aiming at this，a path connectivity based neighbor-awareness node classification algorithm （PCNA） is proposed. The node neighborhood is determined by the path connectivity information in the network，and the influence weight of different length paths on the similarity calculation between nodes is adaptively perceived to guide the neighborhood aggregation process of graph convolutional neural network. Specifically，PCNA is composed of a neighborhood perceptron and a node classifier. The neighborhood perceptron adaptively obtains the aggregated neighborhood of each node and the influence weights of paths with different lengths based on the reinforcement learning mechanism，and then uses the path connectivity information between nodes to obtain the similarity matrix. The node classifier uses the obtained similarity matrix to perform neighborhood aggregation to obtain node representation and classify nodes. The comparison experiments with 10 classical algorithms on eight real datasets show that the proposed algorithm has better performance in node classification tasks.

132 Time-Series Decomposition and Attention Graph Neural Network Based Traffic Forecasting

YANG Yongpeng , YANG Zhen , YANG Zhenzhen

2025, 40(2):417-430. DOI: 10.16337/j.1004-9037.2025.02.011

[Abstract](411) [HTML](547) [PDF 1.85 M](331)

Abstract:
To accurately capture the spatial-temporal dependency， dynamic information and spatial heterogeneity information in traffic forecasting， we propose a traffic forecasting model based on time-series decomposition and an attention graph neural network （TDAGNN）. Specifically， the model first adopts the dual time-series decomposition convolutional neural network （DTDCNN） to extract temporal dependency from traffic data. Secondly， the multi-head interactive attention （MIA） network is introduced to capture spatial heterogeneity and dynamicity from traffic data via the interactivity between the original features and the local augmentation features. Thirdly， the self-scaling dynamic diffusion graph neural network （SDDGNN） is introduced to capture the spatial dependence and dynamicity from the traffic data. Finally， extensive experiments are carried out on several datasets. Experimental results show that the average MAE， RMSE and MAPE of the proposed model are improved by up to 14.64， 23.68 and 9.41%， respectively， compared with other classic algorithms， confirming its high prediction accuracy.
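
The three error measures reported in the abstract above follow standard definitions, sketched below. The function names and the sample traffic values are hypothetical; MAPE is returned in percent and, as an assumption of this sketch, skips points whose ground truth is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (zero targets are skipped)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical traffic flow readings and forecasts.
    truth = [100.0, 200.0, 400.0]
    forecast = [110.0, 180.0, 400.0]
    print(mae(truth, forecast), rmse(truth, forecast), mape(truth, forecast))
```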

133 Large Language Model-Guided Multi-modal Time Series-Semantic Prediction Framework

YE Shimin , LIU Feifei , ZHANG Yan

2025, 40(5):1193-1206. DOI: 10.16337/j.1004-9037.2025.05.007

[Abstract](341) [HTML](867) [PDF 36.18 K](499)

Abstract:
Multi-modal prediction tasks typically require the simultaneous modeling of heterogeneous data， including text， images and structured numerical information， to achieve robust inference and explainable decision-making in complex environments. Traditional uni-modal or weak fusion methods struggle to consistently address semantic alignment， information complementation and cross-source reasoning， while the inherent black-box nature of deep models limits the interpretability of results. Meanwhile， large language models （LLMs） have demonstrated strong capabilities in semantic understanding， instruction following， and reasoning， yet a gap remains in their performance for time series modeling， cross-modal alignment， and real-time knowledge integration. To address these challenges， this paper proposes an LLM-guided multi-modal time series-semantic prediction framework. By combining variational inference-based time series modeling with LLM-driven semantic analysis， the approach establishes a collaborative “temporal-semantic-decision” mechanism： the temporal module extracts historical behavior patterns using recurrent latent variables and attention mechanisms； the semantic module distills high-level semantics and interpretations through domain-specific language models and multi-modal encoders； and both components are jointly optimized via a learnable fusion module， which also provides uncertainty annotations and explainable reports. Experiments on the StockNet， CMIN-US， and CMIN-CN datasets demonstrate that the approach achieves an accuracy of 63.54%， an improvement of 5.31 percentage points over the best baseline， and raises the Matthews correlation coefficient （MCC） to 0.223. This study offers a unified paradigm for multi-modal time series prediction and underscores its promising application in the field of financial technology.
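
The Matthews correlation coefficient reported in the abstract above can be computed from a binary confusion matrix as sketched here. The function name and the example counts are hypothetical; returning 0.0 when the denominator vanishes is a common convention adopted as an assumption.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for binary classification."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal count is zero
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for an up/down movement classifier.
    print(mcc(tp=60, tn=55, fp=45, fn=40))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why values that look small can still be meaningful on noisy tasks such as stock movement prediction.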

134 Improved F-LOAM Algorithm Based on Three-Stage De-distortion and Hierarchical Downsampling Mechanism

XU He , ZHANG Kuo , LI Peng

2025, 40(5):1294-1305. DOI: 10.16337/j.1004-9037.2025.05.015

[Abstract](159) [HTML](157) [PDF 25.74 K](460)

Abstract:
The traditional fast LiDAR odometry and mapping （F-LOAM） algorithm performs a two-stage de-distortion process， but only the first stage de-distorts the feature points， while the second stage is used only for building the map， which limits the accuracy of pose estimation. In order to solve this problem， this paper proposes an improved three-stage de-distortion mechanism combined with a voxelized grid-based hierarchical downsampling mechanism to improve the real-time performance of the algorithm. The improved F-LOAM algorithm shows excellent test results on the KITTI dataset. The three-stage de-distortion mechanism and the hierarchical downsampling strategy not only reduce the computational burden effectively， but also ensure the validity of feature points and the accuracy of the global map.
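
As background for the voxelized grid-based downsampling mentioned in the abstract above, the sketch below shows a plain single-level voxel grid filter that replaces all points in a voxel by their centroid. It is not the authors' hierarchical mechanism, whose details the abstract does not give; the function name, voxel size, and sample points are hypothetical.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Keep one centroid per occupied voxel of edge length voxel_size."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))
    # Average the points that fell into the same voxel.
    return [
        tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for pts in buckets.values()
    ]


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, 0.0)]
    print(voxel_downsample(cloud, voxel_size=1.0))
```

A hierarchical variant could apply this filter with different voxel sizes to different point classes or ranges, but that choice is an assumption rather than something stated in the abstract.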

135 A Few-Shot Learning Algorithm for Defect Image Generation and Data Augmentation Based on DID-AugGAN

HUANG Lve , DENG Yafeng , YAN Huabiao , XIAO Wenxiang

2025, 40(5):1306-1321. DOI: 10.16337/j.1004-9037.2025.05.016

[Abstract](230) [HTML](173) [PDF 36.59 K](731)

Abstract:
To address the issues of low quality， lack of realism， and poor diversity in defect images generated by generative adversarial network （GAN） under small-sample conditions， this paper proposes a defect image generation algorithm， named defect image data augmentation GAN （DID-AugGAN）， aiming at enhancing defect image data under limited sample conditions. First， to overcome the difficulty of traditional convolutional networks in effectively learning non-rigid features in images from limited datasets， we design a learnable offset convolution to improve the model’s capability in capturing semantic information. Second， to prevent the loss of critical defect features and enhance the correlation among local features， we introduce a multi-scale coordinate attention module， which focuses on defect location information. Third， to enhance the discriminator’s ability to distinguish local details in input images， we redesign its architecture， transforming it from a conventional feedforward network into a UNet-like structure with symmetric encoding and decoding pathways. Finally， we conduct comparative experiments between DID-AugGAN and the baseline algorithm on the Rail-4c track fastener defect dataset， and validate the generated images using the MobileNetV3 classification network. Experimental results demonstrate that the proposed method significantly improves inception score （IS） while effectively reducing Fréchet inception distance （FID） and learned perceptual image patch similarity （LPIPS）. Moreover， the classification accuracy and F₁-score of MobileNetV3 are also improved. The proposed DID-AugGAN can stably generate high-quality defect images， effectively augment defect data samples， and meet the requirements of downstream tasks.

136 Heuristic Kernel Density Estimator for Modal-Proximity Data

HE Yulin , CHEN Chunjia , HUANG Zhexue , LI Junjie , FOURNIER-VIGER Philippe

2025, 40(3):711-729. DOI: 10.16337/j.1004-9037.2025.03.012

[Abstract](297) [HTML](303) [PDF 2.91 M](328)

Abstract:
Different from the classical probability density estimator construction strategies based on the Parzen window method， we propose a heuristic kernel density estimator （HKDE） based on a nearest neighbor error measurement function to improve the accuracy of fitting the probability density function of modal-proximity data. From the perspective of data and model uncertainties， we analyze the defects of traditional kernel density estimators in estimating the probability density of modal-proximity data. Heuristic probability density values that reduce the uncertainty of observed data are obtained by referring to the convergence of probability density values with respect to the histogram bin width. Based on the heuristic probability density values， we construct an objective function to determine the optimal bandwidth for the kernel density estimator by reducing the model uncertainty. Extensive experiments on 18 modal-proximity datasets are conducted to validate the feasibility， rationality and effectiveness of the designed HKDE. Results show that HKDE approximates the probability distribution better than seven existing representative probability density function estimators. HKDE has lower estimation error and produces probability density function estimates closer to the real density values than other kernel density estimators.
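
For context， the classical Parzen window estimator that the abstract above contrasts with HKDE can be sketched as follows. This is the standard Gaussian kernel density estimator with a fixed bandwidth， not the proposed HKDE; the function name， bandwidth， and sample data are hypothetical.

```python
import math


def gaussian_kde(x, samples, bandwidth):
    """Classical Parzen window density estimate at x with a Gaussian kernel."""
    if bandwidth <= 0 or not samples:
        raise ValueError("need a positive bandwidth and at least one sample")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


if __name__ == "__main__":
    # Hypothetical data with two nearby modes around 0.0 and 1.0.
    data = [-0.1, 0.0, 0.1, 0.9, 1.0, 1.1]
    for h in (0.1, 0.5):
        print(h, gaussian_kde(0.5, data, h))
```

With a large bandwidth the two nearby modes merge into one， which illustrates why bandwidth selection matters for modal-proximity data.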

137 A Simplified Implementation Method of CSI Feedback Transformer Network Based on Data Clustering

HUAN Dongrui , ZHANG Yifan , JIANG Ming

2025, 40(2):431-445. DOI: 10.16337/j.1004-9037.2025.02.012

[Abstract](407) [HTML](591) [PDF 4.37 M](376)

Abstract:
In order to cope with the increasing overhead of channel state information （CSI） feedback in massive multiple-input multiple-output （MIMO） systems， deep learning-based CSI feedback networks （such as Transformer） have received extensive attention and become very promising intelligent transmission technologies. To this end， this paper proposes a simplification method of CSI feedback Transformer network based on data clustering， which uses clustering-based approximate matrix multiplication （AMM） to reduce the computational complexity of the Transformer network in the feedback process. In this paper， we focus on the computation of the fully connected layer in the Transformer network （equivalent to matrix multiplication）， adopt the simplification methods such as product quantization （PQ） and MADDNESS， analyze their influence on the computational complexity and system performance， and optimize the algorithm according to the characteristics of neural network data. Simulation results show that the performance of the CSI feedback network based on the MADDNESS method is close to that of the exact matrix multiplication method with an appropriate parameter adjustment， and the computational complexity can be greatly reduced.

138 Polyphonic Sound Event Detection Based on Transfer Learning Convolutional Retentive Network

CHEN Pengfei , XIA Xiuyu

2025, 40(3):730-740. DOI: 10.16337/j.1004-9037.2025.03.013

[Abstract](287) [HTML](252) [PDF 2.23 M](352)

Abstract:
To address the problems of limited strongly annotated datasets and the sharp degradation of detection performance in real-world scenarios for polyphonic sound event detection tasks， a method for polyphonic sound event detection based on a transfer learning convolutional retentive network is proposed. Firstly， the method utilizes convolutional blocks with pre-trained weights to extract local features of audio data. Subsequently， the local features， along with orientation features， are input into the residual feature enhancement module for feature fusion and channel dimension reduction. The fused features are then fed into the retentive network with regularization methods to further learn the temporal information in the audio data. Experimental results demonstrate that， compared to the champion system model of the DCASE challenge， the method achieves a reduction in error rates by 0.277 and 0.106， and an increase in F₁ scores by 22.6% and 6.6% on the development and evaluation sets of the DCASE 2016 Task3 dataset， respectively. On the development and evaluation sets of the DCASE 2017 Task3 dataset， the error rates are reduced by 0.22 and 0.123， and the F₁ scores increase by 17.2% and 14.4%， respectively.

139 MonoDI：Monocular 3D Object Detection Based on Fusing Depth Instances

ZHAO Ke , DONG Haoran , YE Ning

2025, 40(5):1322-1332. DOI: 10.16337/j.1004-9037.2025.05.017

[Abstract](177) [HTML](179) [PDF 40.42 K](471)

Abstract:
Monocular 3D object detection aims to locate the 3D bounding boxes of objects in a single 2D input image， which is an extremely challenging task in the absence of image depth information. To address the issues of poor detection performance due to the absence of depth information during inference on 2D images and background noise interference in depth maps， this paper proposes a monocular 3D object detection method called MonoDI， which integrates depth instances. The key idea is to utilize depth information generated by an effective depth estimation network and combine it with instance segmentation masks to obtain depth instances， and then integrate the depth instances with 2D image information to aid in regressing 3D object information. To better use the depth instance information， this paper designs an iterative depth aware attention fusion module （iDAAFM）， which integrates depth instance features with 2D image features to obtain a feature representation with clear object boundaries and depth information. Subsequently， a residual convolutional structure is introduced during training and inference to replace the general single convolutional structure to ensure stability and efficiency of the network when processing fused information. Further， we design a 3D bounding box uncertainty auxiliary task that assists the main task in learning the generation of bounding boxes during training and improves the accuracy of monocular 3D object detection. Finally， the effectiveness of the method is validated on the KITTI dataset. Experimental results show that the proposed method improves 3D object detection accuracy for the vehicle class at the moderate difficulty level by 4.41 percentage points compared with the baseline， and outperforms comparative methods such as MonoCon and MonoLSS. It also achieves superior results on the KITTI-nuScenes cross-dataset evaluation.

140 A Lightweight Road Crack Detection Model Based on Improved YOLOv8n

ZHU Jiahui , LIU Yi , ZHANG Dengyin

2025, 40(5):1333-1347. DOI: 10.16337/j.1004-9037.2025.05.018

[Abstract](257) [HTML](246) [PDF 32.44 K](863)

Abstract:
To address the challenges of road crack appearance characteristics being susceptible to environmental interference， the high miss detection rate of fine cracks， and the limited computational resources of inspection equipment， a lightweight detection model， MCA-YOLO-A， is proposed. The model is based on YOLOv8n， replacing the original backbone with a lighter MobileNetV3 feature extraction network， and integrating a coordinate attention （CA） module that accurately captures spatial information， thereby enhancing the capability of feature extraction. Meanwhile， the Alpha-IOU loss function suitable for lightweight networks is introduced， which improves the overall performance of the network. In addition， a small target detection layer is added to improve the recognition accuracy of fine cracks. The mAP_0.5 and F₁ score of the MCA-YOLO-A model on road crack datasets are 0.930 and 0.893， respectively， which are 7.0% and 9.7% higher than those of the original YOLOv8n model， while the parameter quantity is only 6.0 M， which is 4.8% lower， and the detection speed reaches 95 frames/s. Experimental results demonstrate that the model is highly accurate， lightweight， and capable of generalization， making it more suitable for deployment in scenarios with limited computational resources such as embedded systems and mobile devices.
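
The Alpha-IOU loss named in the abstract above is， in its basic form， a power transform of the ordinary IoU loss. The sketch below assumes that basic form， 1 − IoU^α， for axis-aligned boxes given as (x1， y1， x2， y2); any penalty terms or the exact variant used in MCA-YOLO-A are not stated in the abstract， and the function names， the default α， and the sample boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha=3.0):
    """Basic power-IoU loss: 1 - IoU ** alpha (alpha = 1 gives the plain IoU loss)."""
    return 1.0 - iou(pred, target) ** alpha


if __name__ == "__main__":
    # Hypothetical predicted and ground-truth boxes.
    print(alpha_iou_loss((0, 0, 2, 2), (1, 0, 3, 2), alpha=3.0))
```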

141 Heart Sound Classification Using Bi-LSTM and Self-attention Mechanism

LU Guanming , LI Qijian , LU Junhe , QI Jirong , ZHAO Yuhang , WANG Yang , WEI Jinsheng

2025, 40(2):456-468. DOI: 10.16337/j.1004-9037.2025.02.014

[Abstract](445) [HTML](358) [PDF 1.48 M](352)

Abstract:
Heart sound auscultation is an effective diagnostic method for early screening of heart disease. In order to improve the performance of abnormal heart sound detection， this paper proposes a heart sound classification algorithm based on bi-directional long short-term memory （Bi-LSTM） network and self-attention mechanism （SA）. Firstly， the heart sound signal is partitioned into frames， and the Mel-frequency cepstral coefficients （MFCC） features are extracted from each frame of the heart sound signal. Next， the MFCC feature sequence is input into the Bi-LSTM network to extract the temporal contextual features of the heart sound signals. Then， the weights of the features output from the Bi-LSTM network at each time step are dynamically adjusted through the self-attention mechanism， and more discriminative heart sound features that are conducive to classification are obtained. Finally， the Softmax classifier is used to classify normal/abnormal heart sounds. The proposed algorithm is evaluated using 10-fold cross-validation on the heart sound dataset provided by PhysioNet/CinC Challenge 2016， and achieves sensitivity of 0.942 5， specificity of 0.943 7， precision of 0.836 7， F₁ score of 0.886 5， and accuracy of 0.943 4， which are superior to typical comparative algorithms. Experimental results show that the proposed algorithm can effectively detect abnormal heart sounds without the need for heart sound segmentation， and has potential clinical application prospects.
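
The metrics reported in the abstract above can be cross-checked against the definition of the F₁ score as the harmonic mean of precision and recall （sensitivity）. The sketch below treats the reported value 0.836 7 as the precision， which is an assumption; under that reading the reported F₁ score of 0.886 5 is reproduced. The function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values taken from the abstract: 0.8367 (assumed precision), 0.9425 (sensitivity).
    print(round(f1_score(0.8367, 0.9425), 4))
```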

142 Multi-view 3D Reconstruction Network Based on Dilated Attention and Depth Optimal Correction

XU Lei , LEI Youyuan , ZHU Jun , ZHOU Jie , SHAO Genfu , ZHANG Jiaming

2025, 40(4):1023-1034. DOI: 10.16337/j.1004-9037.2025.04.015

[Abstract](228) [HTML](236) [PDF 2.20 M](364)

Abstract:
Compared with MVSNet reconstruction networks， CVP-MVSNet and CasMVSNet reduce memory usage when processing high-resolution images and improve the accuracy of reconstructed point clouds. However， both networks still exhibit significant errors in point cloud completeness. To address this issue， this paper proposes DA-MVSNet， a multi-view 3D reconstruction network based on dilated attention and depth optimal correction. DA-MVSNet uses CasMVSNet as the baseline network， with an additional feature enhancement network that integrates a parallel dilated convolution and attention module， incorporating the concept of depth-wise separable convolutions. This enhancement strengthens the network’s ability to capture global features of input views， improving point cloud completeness. To further enhance the accuracy of output depth maps and prevent the feature enhancement network from extracting irrelevant background information， which can degrade the accuracy of the reconstructed point cloud， an optimization correction mechanism based on nonlinear least squares is introduced at the output stage of the network. The results show that DA-MVSNet reduces the accuracy and completeness errors of the reconstructed point cloud by 2.5% and 4.7%， respectively， on the indoor scene DTU dataset， achieving better overall performance. However， due to the additional feature enhancement network and correction mechanism， the memory and time consumption of DA-MVSNet are slightly higher than those of CVP-MVSNet and CasMVSNet.

143 Campus Bike-Sharing Crowdsourcing Scheduling System Based on Spatio-Temporal Distribution Dynamic Perception

SHEN Ruda , HE Wanyuan , XU Yifan

2025, 40(4):972-985. DOI: 10.16337/j.1004-9037.2025.04.011

[Abstract](280) [HTML](441) [PDF 2.66 M](464)

Abstract:
The bike sharing system （BSS） has become a significant component of urban intelligent transportation systems. This paper proposes a campus bike-sharing resource scheduling system based on dynamic perception of spatio-temporal distribution. To address sudden inventory changes at shared bicycle stations that lead to inventory shortages， the system first models the dynamic changes at bicycle stations using the vector autoregressive moving average （VARMA） model to predict future inventory shortage events at stations. Secondly， to resolve the conflict between bicycle scheduling utility and cost in crowdsourced resource scheduling scenarios， it introduces a task assignment method based on a bipartite optimal matching model and specifically optimizes the Hungarian algorithm for efficient decision-making in task assignment. Simulation results show that the proposed method can effectively improve the system utility of bike-sharing scheduling， reduce the service quality loss caused by inventory shortages at bike stations， and effectively balance the spatio-temporal distribution of bicycles.
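
The task assignment step described in the abstract above is an instance of the minimum-cost assignment problem， which the Hungarian algorithm solves in polynomial time. The sketch below deliberately uses exhaustive search over permutations instead， so it is only suitable for very small instances and is not the authors' optimized Hungarian algorithm. The cost matrix （worker i moving bikes for task j） and the function name are hypothetical.

```python
from itertools import permutations


def best_assignment(cost):
    """Brute-force minimum-cost one-to-one assignment for a square cost matrix.

    Returns (assignment, total_cost), where assignment[i] is the task given
    to worker i. Exhaustive search is O(n!), so use it only for tiny n.
    """
    n = len(cost)
    if any(len(row) != n for row in cost):
        raise ValueError("cost matrix must be square")

    def total(perm):
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)


if __name__ == "__main__":
    # Hypothetical scheduling costs for 3 crowdsourcing workers and 3 tasks.
    costs = [
        [4, 1, 3],
        [2, 0, 5],
        [3, 2, 2],
    ]
    print(best_assignment(costs))
```

For realistic problem sizes a polynomial-time solver， for example an implementation of the Hungarian algorithm， would replace the permutation search while returning the same optimal cost.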

144 Multi-objective Feature Selection Algorithm for Neighborhood Rough Set Under Mixed Hierarchical Dependence

LUO Gongzhi , ZHANG Shanglei

2025, 40(1):117-133. DOI: 10.16337/j.1004-9037.2025.01.009

[Abstract](498) [HTML](324) [PDF 1.40 M](460)

Abstract:
Accuracy and efficiency are the key metrics for evaluating the performance of feature selection algorithms. They correspond to the attribute dependence and reduction scale of neighborhood rough sets respectively. Conventional feature selection algorithms often optimize solely based on maximum attribute dependence reduction， overlooking the significance of reduction scale. However， as data feature dimensions increase and category hierarchies emerge， category information becomes complex and structural relationships become chaotic. Traditional attribute dependency calculations fail to effectively utilize category hierarchy information， leading to suboptimal classification performance. In response to this， a mixed hierarchical dependency that considers the relationship between attribute importance and category hierarchy structure is constructed. This treats mixed hierarchical dependency and reduction scale as two independent optimization objectives， and introduces a multi-objective evolutionary algorithm to optimize them independently. This approach improves attribute reduction performance from both the attribute dependency and attribute scale perspectives， resulting in reduction results that meet target constraints. Experimental results demonstrate that the proposed algorithm achieves higher-quality reduction results within target constraints， leading to the improvement of classification accuracy.
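
The two-objective view in the abstract above （maximize dependency， minimize reduction scale） leads naturally to a Pareto front of candidate reducts. The sketch below only filters non-dominated candidates and is not the authors' evolutionary algorithm; the candidate tuples （dependency， number of selected attributes） and the function name are hypothetical.

```python
def pareto_front(candidates):
    """Return candidates not dominated under (maximize dependency, minimize size).

    Each candidate is a tuple (dependency, size).
    """
    def dominates(a, b):
        # a is at least as good on both objectives and strictly better on one.
        return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    # Hypothetical reducts: (dependency, number of selected attributes).
    reducts = [(0.90, 5), (0.85, 3), (0.80, 4), (0.95, 8), (0.90, 6)]
    print(pareto_front(reducts))
```

A multi-objective evolutionary algorithm would evolve a population toward such a front and leave the final trade-off between accuracy and reduct size to the target constraints.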

145 A Review of Development and Future Directions of Medical Foundation Models

QIAN Bo , LI Fujiang , ZHENG Changle , ZHANG Daoqiang

2025, 40(3):562-584. DOI: 10.16337/j.1004-9037.2025.03.002

[Abstract](1206) [HTML](1255) [PDF 4.44 M](648)

Abstract:
Medical foundation models represent a significant application of large-scale pre-trained model technology in the healthcare domain and have become a key research focus in intelligent medical assistance. By leveraging pretraining on vast amounts of medical data， these models exhibit critical capabilities such as cross-task transfer， multimodal understanding， and complex reasoning， overcoming several limitations of traditional neural networks in medical applications. With these capabilities， medical foundation models are reshaping the implementation of core tasks such as assisted diagnosis， clinical report generation， and medical image analysis. They hold profound implications for achieving general intelligence in healthcare. Based on this， this paper provides a comprehensive review of the current state and future trends of medical foundation models. First， it reviews the development of medical AI models in the context of rapid advancements in artificial intelligence. Then， it highlights research progress of large models in medical subfields such as pathology， ophthalmology， and neurological disorders. Finally， it discusses the challenges currently faced by medical foundation models and explores their future development directions.

146 Few-Shot Specific Communication Emitter Identification Method Based on Broad Learning and Attention Mechanism

CHEN Yupeng , LIU Hui , REN Gaoxing , YANG Junan

2025, 40(5):1261-1269. DOI: 10.16337/j.1004-9037.2025.05.012

[Abstract](186) [HTML](219) [PDF 25.16 K](465)

Abstract:
Under the condition of few-shot specific communication emitter identification， the difficulty of extracting individual features of communication radiation source by the existing deep learning algorithm increases， and the recognition rate decreases. To solve this problem， this paper proposes a recognition method to construct a shallow neural network by fusing attention mechanism and broad learning. Firstly， broad learning is introduced to simplify the network model and reduce the overfitting phenomenon caused by small samples. Secondly， the node attention module is constructed to improve the feature extraction ability of the broad neural network under the condition of small samples. Finally， the effectiveness of the proposed method is verified on the public dataset. The results show that compared with the deep learning method with a small number of samples， the proposed method improves the overfitting phenomenon of the deep learning network， strengthens the feature extraction ability of the broad learning method， and improves the recognition accuracy.

147 Construction of High-Quality Dataset in Aero-engine Domain Based on Large Language Model

ZOU Guanyun , WANG Cunjun , KONG Yinhao , MA Xiaoqing , LI Piji

2025, 40(3):603-615. DOI: 10.16337/j.1004-9037.2025.03.004

[Abstract](735) [HTML](639) [PDF 2.23 M](535)

Abstract:
With the rapid advancement of artificial intelligence technology， large language models （LLMs） are increasingly being applied across various domains. However， the lack of high-quality， manually curated question-answering datasets in the aero-engine field has hindered the practical application of expert-level question-answering models. To address this issue， this paper proposes an automated method for constructing question-answering datasets based on LLMs， which generates high-quality open-domain question-answering data without human intervention. During the data generation phase， the method employs in-context learning and input-priority generation strategies to enhance the stability of the generated data. In the data filtering phase， a dual evaluation mechanism is established， combining faithfulness assessment based on source text similarity and semantic quality evaluation using LLMs， to automatically filter out hallucinated or anomalous data and ensure factual reliability. Experimental results demonstrate that the proposed method significantly improves the quality of the generated dataset. Models fine-tuned on this dataset exhibit notable performance improvements in aero-engine domain knowledge question-answering tasks. The findings of this study not only provide a solid foundation for the application of LLMs in the aero-engine domain but also offer valuable insights for automated dataset construction in other complex engineering fields.

148 EEG-TCNet for Motor Imagery Classification Based on Nonnegative Matrix Factorization

ZHANG Xuejun , SHI Baoming

2025, 40(5):1361-1370. DOI: 10.16337/j.1004-9037.2025.05.020

[Abstract](160) [HTML](116) [PDF 32.37 K](467)

Abstract:
In response to the limitations of deep learning approaches in motor imagery classification using electroencephalogram （EEG） signals， such as the failure to explore inter-channel correlations and fully exploit frequency， temporal， and spatial information， this study proposes a classification method named NTEEGNet， which combines nonnegative matrix factorization （NMF） with a temporal convolutional network （TCN） and the compact convolutional neural network EEGNet to enhance the performance of motor imagery classification with a relatively small number of parameters. The NMF component of the model effectively extracts channel features and fully utilizes frequency， temporal， and spatial information. Additionally， the network’s receptive field increases exponentially under the action of the TCN， leading to stronger feature extraction capabilities with fewer parameters. Experimental results on the BCI Competition Ⅳ 2a dataset demonstrate that NTEEGNet achieves a classification accuracy of 83.99%， an improvement of 6.64% over EEG-TCNet.
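
The exponential growth of the receptive field mentioned in the abstract above follows from stacking dilated convolutions whose dilation doubles at each level. The sketch below assumes a typical TCN layout， with residual blocks that each contain a fixed number of dilated convolutions and dilation 2^i in block i; the function name， the kernel size， and the block counts are hypothetical and are not taken from the paper.

```python
def tcn_receptive_field(kernel_size, num_blocks, convs_per_block=2):
    """Receptive field (in time steps) of a stack of dilated convolutions.

    Assumption: block i uses dilation 2**i, and every convolution with
    dilation d widens the receptive field by (kernel_size - 1) * d.
    """
    dilations = [2 ** i for i in range(num_blocks)]
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Hypothetical settings: kernel size 4, one to four residual blocks.
    for blocks in range(1, 5):
        print(blocks, tcn_receptive_field(kernel_size=4, num_blocks=blocks))
```

Because the dilations sum to 2^L − 1 for L blocks， the receptive field roughly doubles with every added block while the parameter count grows only linearly.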

149 Multi-granularity Intuitionistic Fuzzy Three-Way Decision Model Based on Regret Theory

PANG Wenli , YU Xiao , ZHENG Yu , CHEN Hui , XUE Zhan’ao , XIN Xianwei

2025, 40(2):501-516. DOI: 10.16337/j.1004-9037.2025.02.017

[Abstract](434) [HTML](252) [PDF 1.41 M](340)

Abstract:
When solving complex multi-granularity decision-making problems， traditional three-way decision models based on functions or relationships tend to ignore the multi-granularity characteristics of real-world information and the limited cognitive ability of decision makers. To address this， this paper proposes a multi-granularity intuitionistic fuzzy three-way decision model based on regret theory. Firstly， to deal with the complex calculation of intuitionistic fuzzy numbers， the θ operator is integrated with intuitionistic fuzzy rough sets， multi-granularity upper and lower approximation operators for intuitionistic fuzzy rough sets are proposed， and the corresponding three-way decision rules are given. Secondly， to integrate the cognitive characteristics of the decision maker into the decision-making process， a multi-granularity three-way sorting method under optimistic and pessimistic strategies is constructed based on regret theory. Finally， the effectiveness of the proposed model is verified by a group decision example of the competency assessment of “Chinese+vocational” talents in international Chinese education， which provides a new method for uncertain decision-making problems that integrate decision-maker risk preferences in an intuitionistic fuzzy environment.

150 Named Entity Recognition of Fish Disease Based on Multi-feature and Cross-Modal Knowledge Distillation

SHEN Zhicheng , CHEN Ming

2025, 40(1):230-246. DOI: 10.16337/j.1004-9037.2025.01.018

[Abstract](498) [HTML](403) [PDF 3.70 M](427)

Abstract:
To address the lack of a reasonable organization of multi-modal fish disease knowledge， and at the same time reduce the redundant data in the knowledge distillation process， so as to deploy a recognition model with low storage， small samples， and high accuracy， this paper proposes a new method named FSFDAI-TMRD. In terms of multi-feature collaborative prediction， this paper focuses on improving the original multi-task， multi-feature collaborative prediction architecture. Firstly， the finer-grained begin-middle-end-single （BMES） method is used instead of the coarse labeling of the begin-inside-outside （BIO） method in the original work. Secondly， the formula for calculating the joint probability distribution of the original architecture is modified， so that the model can better recognize nested noun entities. In terms of cross-modal multi-head distillation， this paper proposes to employ a cross-modal attention mechanism. Firstly， it calculates the multi-head relationship matrix after merging， splitting， and dot product， and secondly， it utilizes the relative entropy for knowledge distillation， so that the model can better align the intermediate features between heterogeneous teachers and students. Meanwhile， this paper also applies biaffine attention and an adversarial weight perturbation function to enhance the learning of multi-feature knowledge such as semantics， phonology， word form， and word meaning. Compared with the mainstream model， the precision， recall and F₁ value of the FSFDAI-TMRD method are improved by 0.45%， 3.96% and 2.28%， respectively. The storage optimization ratio is improved by 3.01% and the model parameter size is reduced by 94.86%.
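The difference between the BIO and BMES labeling schemes mentioned above can be made concrete with a small tagger. This is an illustrative sketch only: the function `tag_entities`, the span format `(start, end_exclusive, label)`, and the labels `DIS` and `SYM` are hypothetical and not taken from the paper. Tokens outside any entity are tagged `O` in both schemes, which is a common convention assumed here.

```python
def tag_entities(length, spans, scheme="BMES"):
    """Return one tag per token for the given entity spans.

    spans: list of (start, end_exclusive, label) tuples (hypothetical format).
    scheme: "BIO" or "BMES".
    """
    tags = ["O"] * length
    for start, end, label in spans:
        if scheme == "BIO":
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
        elif end - start == 1:
            # BMES marks single-token entities explicitly.
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"M-{label}"
            # BMES also marks the entity boundary on the last token.
            tags[end - 1] = f"E-{label}"
    return tags


if __name__ == "__main__":
    spans = [(0, 3, "DIS"), (4, 5, "SYM")]
    print(tag_entities(5, spans, scheme="BIO"))
    print(tag_entities(5, spans, scheme="BMES"))
```

BMES encodes the end of each entity and single-token entities directly in the tags, which is the sense in which it is finer-grained than BIO.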

151 Multi-scale Dual-Branch Dual-Attention-Based Point Cloud Classification Network

GU Junhao , ZHANG Sunjie , QIN Chendong

2025, 40(6):1608-1624. DOI: 10.16337/j.1004-9037.2025.06.018

[Abstract](159) [HTML](139) [PDF 3.42 M](309)

Abstract:
Although Transformers have made significant progress in 3D point cloud processing， efficiently and accurately learning valuable low-frequency and high-frequency information remains a challenge. Moreover， most existing methods focus primarily on local spatial information， neglecting global spatial information， which leads to information loss. This paper proposes a novel point cloud learning network， referred to as the multi-scale dual-branch dual-attention network. First， in the feature extraction process of the point cloud， compared to methods that search for neighboring points at a fixed scale， the multi-scale K-nearest neighbor （KNN） approach not only preserves local structural details but also more effectively captures global geometric information. Second， this paper introduces a dual-branch dual-attention architecture to extract different spatial features， proposing a dual-attention mechanism combining local window attention and global channel content attention to extract low-frequency and high-frequency information from the network， respectively. Additionally， on this basis， this paper introduces the group-rational Kolmogorov-Arnold （GR-KAN） layer into the classification head， replacing the traditionally used multilayer perceptron （MLP） layer， which allows for more flexible handling of nonlinear features and makes the network more sensitive to complex datasets. Finally， extensive experiments demonstrate that the proposed model achieves an accuracy of 93.8% on the ModelNet40 dataset and 86.5% on the ScanObjectNN dataset， showcasing its superior performance and broad application prospects in 3D point cloud processing.
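The idea of searching for neighbors at several scales instead of one fixed scale can be sketched as follows. This is a brute-force illustration under assumptions: the function name `multi_scale_knn`, the choice of scales, and the use of squared Euclidean distance are not specified by the abstract.

```python
def multi_scale_knn(points, query_index, scales=(2, 4)):
    """Return, for each k in scales, the indices of the k nearest neighbors.

    points: list of coordinate tuples; the query point itself is excluded.
    A small k keeps local detail, a large k covers more global geometry.
    """
    query = points[query_index]

    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], query))

    # Sort all other points once, then slice the ranking at each scale.
    ranked = sorted((i for i in range(len(points)) if i != query_index),
                    key=squared_distance)
    return {k: ranked[:k] for k in scales}


if __name__ == "__main__":
    cloud = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (5, 0, 0)]
    print(multi_scale_knn(cloud, query_index=0, scales=(1, 3)))
```

Sorting once and slicing at each k keeps the neighborhoods nested, so features from the small scale are always a subset of those from the larger scale.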

152 Research Progress on Multimodal Continual Learning Methods

ZHANG Wei , QIAN Longyue , ZHANG Lin , LI Teng

2025, 40(5):1122-1138. DOI: 10.16337/j.1004-9037.2025.05.002

[Abstract](381) [HTML](347) [PDF 75.38 K](555)

Abstract:
Multimodal continual learning （MMCL）， as a significant research direction in the fields of machine learning and artificial intelligence， aims to achieve continuous knowledge accumulation and task adaptation through the integration of data from multiple modalities （such as images， text， and audio）. Compared with traditional single-modal learning methods， MMCL not only enables parallel processing of multi-source heterogeneous data， but also effectively retains existing knowledge while adapting to new task requirements， demonstrating immense application potential in intelligent systems. This paper provides a systematic review of multimodal continual learning. Firstly， the fundamental theoretical framework of MMCL is elaborated from three dimensions： basic concepts， evaluation systems， and classical single-modal continual learning methods. Secondly， the advantages and challenges of MMCL in practical applications are thoroughly analyzed： despite its significant advantages in multimodal information fusion， it still faces critical challenges such as modal imbalance and heterogeneous fusion， which not only constrain the performance of current methods but also indicate future research directions. Based on this， the paper then comprehensively reviews the research status and latest advancements in MMCL methods from four main aspects： replay-based， regularization-based， parameter isolation-based， and large model-based approaches. Finally， a forward-looking perspective on the future development trends of MMCL is presented.

153 Recognition Algorithm for Multi-agent Collaborative Open-Domain Multimodal 3D Model

LI Qiang , MA Qiuyang , ZHANG Ning , NIE Weizhi

2025, 40(5):1139-1152. DOI: 10.16337/j.1004-9037.2025.05.003

[Abstract](271) [HTML](2162) [PDF 39.08 K](529)

Abstract:
To address the challenge of recognizing unlabeled 3D models in open-domain settings， this paper proposes a multi-agent collaborative algorithm for open-domain multimodal 3D model recognition. The algorithm employs a reinforcement learning framework to simulate human cognitive processes. Within this framework， a multi-agent system is utilized to extract and fuse multimodal information， which enables a comprehensive understanding of the feature space while leveraging the similarity of multimodal samples to enhance model training. Additionally， a progressive pseudo-label generation method is introduced in the reinforcement learning environment. It dynamically adjusts clustering constraints to generate reliable pseudo-labels for a subset of unlabeled data during training， mimicking human exploratory learning of unknown data. These mechanisms collectively update the network parameters based on environmental feedback rewards， effectively controlling the extent of exploratory learning and ensuring accurate learning for unknown categories. Experimental results show that the average recognition accuracy of the proposed method on the three-dimensional dataset OS-MN40 reaches 65.6%. After transferring the method to the image domain， the classification accuracy on the CIFAR10 dataset reaches 95.6%， which provides a universal and efficient solution for research on open-domain three-dimensional model recognition.

154 Prediction Method of EV Charging Demand Power Based on Reinforcement Learning and Variable Weight Combination Model

SONG Zongren , GE Quanbo , LI Chunxi

2025, 40(2):530-544. DOI: 10.16337/j.1004-9037.2025.02.019

[Abstract](405) [HTML](271) [PDF 3.21 M](339)

Abstract:
When an electric vehicle （EV） is connected to a charging pile， it is very important to accurately predict the charging demand power of the EV battery pack to prevent the battery pack from being overcharged. Due to the complexity of the physical model of the battery pack， it is usually difficult to build a power prediction method based on it， and its real-time performance is poor. In addition， the prediction accuracy of a single prediction model is low. To address these problems， this paper combines charging data with machine learning and proposes an EV charging demand power prediction method based on reinforcement learning （RL） and a variable weight combination model. Firstly， based on the traditional grey wolf optimization （GWO） algorithm， chaos mapping and an elite reverse learning strategy are combined to improve the quality of the initial population， and the dynamic weight strategy of reinforcement learning is used to update the individual positions of the grey wolves to optimize the parameters of the least squares support vector machine （LSSVM） algorithm. Then， the weights of the extreme learning machine prediction model and the improved LSSVM prediction model are reasonably distributed by the variable weight combination method based on time-varying weight distribution， so as to overcome the shortcomings of a single prediction model. Finally， the actual charging data of electric vehicles are used to verify the proposed prediction algorithm. Compared with three traditional methods， the prediction accuracy of the new method is improved by 4.75%， 3.84% and 0.38%， respectively.

155 Attribute Reduction of Incomplete Neighborhood Decision Rough Sets Based on Decision-Cost Fusion Measures

ZHANG Wanxiang , ZHANG Xianyong , YANG Jilin , CHEN Benwei

2025, 40(3):807-820. DOI: 10.16337/j.1004-9037.2025.03.019

[Abstract](252) [HTML](212) [PDF 2.01 M](332)

Abstract:
Attribute reduction relies on knowledge granulation and uncertainty measurement， thus facilitating intelligent recognition. For incomplete continuous data， neighborhood decision rough sets induce attribute reduction. However， the related neighborhood relation can be further improved， and the existing decision cost can be strengthened through fusion with other measures. In this paper， a new neighborhood relation is proposed， and three decision-cost fusion measures are constructed， so new incomplete neighborhood decision rough sets are established and the attribute reduction is systematically researched. At first， an improved distance is introduced to produce an incomplete neighborhood relation， so improved rough sets on incomplete neighborhoods are proposed. Then， the dependence degree and neighborhood entropy are introduced based on decision costs， so three fusion measures on decision costs are obtained by multiplication fusion， thus acquiring granulation non-monotonicity. Furthermore， eight heuristic reduction algorithms based on attribute importance are designed from two neighborhood relations and four relevant measures of decision costs. Finally， data experiments verify that five of the seven new algorithms achieve good classification learning performance， thus improving on the basic reduction algorithm.

156 Classification of Pancreatic Cystic Neoplasms by Fusion of Multi-kernel Learning and Multi-source Features

WU Jie , XU Zhenshun , ZHANG Zhiwei , TIAN Hui , BIAN Yun

2025, 40(1):247-257. DOI: 10.16337/j.1004-9037.2025.01.019

[Abstract](446) [HTML](364) [PDF 2.95 M](469)

Abstract:
The classification of pancreatic cystic neoplasms into benign and malignant categories is crucial for medical decision-making. This paper is dedicated to enhancing the accuracy of pancreatic cystic neoplasms classification to assist physicians in formulating more precise diagnostic and therapeutic plans. Utilizing radiomics technology and the ResNet50 neural network， a novel classification method for pancreatic cystic neoplasms is proposed， integrating multi-kernel learning and multi-source feature fusion. The key steps of this method include feature selection， kernel matrix fusion， and the construction of the classification model. Feature selection is performed using the least absolute shrinkage and selection operator （LASSO） to reduce redundant features and improve the model’s generalization ability. Subsequently， multi-source features， screened through feature selection， are mapped in basic kernel functions to construct basic kernel matrices for multi-source features. The weights of these kernel matrices are then optimized and summed up to form a fused kernel matrix. Finally， a support vector machine （SVM） classifier is utilized to categorize pancreatic serous and mucinous cystic tumors. The significance of this process lies in SVM’s ability to use the kernel matrix for inner product operations in high-dimensional spaces， thereby finding a hyperplane to classify data in such spaces. The fused kernel matrix， containing multi-source information after feature mapping， provides higher-dimensional and more complex feature representations. Experimental results demonstrate significant performance improvements in the classification task of pancreatic cystic neoplasms， offering more reliable auxiliary information to physicians and holding substantial clinical application potential.
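The kernel matrix fusion step described above can be sketched as a weighted sum of basic kernel matrices. This is a minimal illustration under assumptions: the abstract does not name the basic kernel functions or the weight optimization procedure, so the sketch uses a linear and an RBF kernel with fixed, hand-chosen weights. All function names are hypothetical.

```python
import math


def linear_kernel(X):
    """Gram matrix of inner products between all pairs of samples."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def rbf_kernel(X, gamma=1.0):
    """Gram matrix of exp(-gamma * squared Euclidean distance)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def fuse_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices.

    With nonnegative weights the result is again a valid (positive
    semidefinite) kernel matrix. In the paper the weights are optimized;
    here they are simply given.
    """
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    samples = [[0.0, 0.0], [1.0, 0.0]]
    fused = fuse_kernels([linear_kernel(samples), rbf_kernel(samples)],
                         [0.5, 0.5])
    print(fused)
```

A fused matrix of this kind can be passed to an SVM that accepts a precomputed kernel, which is one plausible way to connect the fusion step to the classifier.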

157 Algorithm for Constructing Compound Partial Random Measurement Matrices Based on Multidimensional Chaotic Mapping

CHEN Xinglan , LU Jin , ZHANG Yanan

2025, 40(1):258-272. DOI: 10.16337/j.1004-9037.2025.01.020

[Abstract](489) [HTML](422) [PDF 3.51 M](461)

Abstract:
The construction of the measurement matrix is a crucial factor influencing the reconstruction performance of compressive sensing techniques. To address the high storage cost of random measurement matrices and the difficulty in satisfying the restricted isometry property （RIP） with deterministic matrices， an improved method for constructing measurement matrices based on chaotic mapping is proposed. This method combines the random Gaussian matrix with the deterministic matrix and chaotic sequences， taking full advantage of the small number of measurements required by random Gaussian matrices and the lower correlation provided by chaotic mappings. Simultaneously， an analysis is conducted on the phase space characteristics of chaotic sequences， the RIP of measurement matrices， and the computational complexity involved in constructing optimized measurement matrices. Finally， simulation experiments compare random Gaussian matrices， Toeplitz matrices， and existing composite matrices. The results show that the proposed optimized measurement matrices outperform the other three types of matrices in terms of relative error， probability of successful reconstruction， and signal-to-noise ratio for one-dimensional random signals. Additionally， these optimized measurement matrices also exhibit improvements in the reconstruction time complexity， peak signal-to-noise ratio， structural similarity index， and mean structural similarity index for two-dimensional images， indicating better reconstruction performance and significant practical value.

158 Parallel Magnetic Resonance Imaging Reconstruction Based on Multi-scale Feature Fusion Preprocessing and Deep Sparse Networks

XUE Lei , DUAN Jizhong

2025, 40(4):1082-1095. DOI: 10.16337/j.1004-9037.2025.04.020

[Abstract](190) [HTML](174) [PDF 5.42 M](462)

Abstract:
Magnetic resonance imaging （MRI） plays a crucial role in medical diagnosis， but prolonged scanning times can cause patients discomfort and motion artifacts. Parallel imaging techniques and compressed sensing theory indicate that undersampling k-space data can enhance the scanning speed， where parallel MRI accelerates the imaging process by utilizing multiple receiving coils to simultaneously acquire data from multiple channels. Leveraging its powerful feature extraction and pattern recognition capabilities， deep learning demonstrates great potential in undersampled MRI reconstruction. To overcome the limitations of existing technologies （e.g.， the need for automatic calibration signals， reconstruction instability）， this paper proposes an innovative reconstruction method aimed at efficiently and accurately reconstructing high-quality parallel MRI images from undersampled k-space data. The core framework of this method is a deep sparse network that unfolds the iterative process of the iterative shrinkage-thresholding algorithm （ISTA） for solving sparse models into a series of trainable layers within a deep neural network framework. Additionally， this paper introduces an adaptive preprocessing module based on multi-scale feature fusion， which further enhances the sparse representation capability of the network by integrating standard convolutions with heterogeneous convolutional kernels. Experimental results demonstrate that， compared to other advanced methods， the proposed method exhibits superior reconstruction performance across multiple datasets， including higher peak signal-to-noise ratio （PSNR） and structural similarity （SSIM）， as well as lower high-frequency error norms.
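For reference, one iteration of the classical iterative shrinkage-thresholding algorithm that the deep sparse network unfolds can be written as below. This is a sketch of textbook ISTA for minimizing 0.5‖Ax − y‖² + λ‖x‖₁ on real-valued toy data, not the paper's network: in the unfolded version the step size, threshold, and transforms would become trainable, and MRI data are complex-valued. The function names are hypothetical.

```python
def soft_threshold(values, tau):
    """Shrink each entry toward zero by tau (proximal operator of tau * L1)."""
    shrunk = []
    for v in values:
        if v > tau:
            shrunk.append(v - tau)
        elif v < -tau:
            shrunk.append(v + tau)
        else:
            shrunk.append(0.0)
    return shrunk


def ista_step(A, y, x, step, lam):
    """One ISTA update: gradient step on the data term, then shrinkage."""
    # residual = A x - y
    residual = [sum(a * xj for a, xj in zip(row, x)) - yi
                for row, yi in zip(A, y)]
    # gradient = A^T residual
    gradient = [sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(len(x))]
    z = [xj - step * g for xj, g in zip(x, gradient)]
    return soft_threshold(z, step * lam)


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    y = [3.0, 0.5]
    print(ista_step(A, y, x=[0.0, 0.0], step=1.0, lam=1.0))
```

Unfolding treats a fixed number of such updates as layers of a network, so the quantities that are hand-set here can instead be learned from training data.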

159 Event Detection Method Based on Type-Semantic Prompts

DING Yuanyuan , ZHANG Shunxiang , WEN Hua , JIAO Yixuan , ZHANG Jixu , CAO Yuxuan

2025, 40(2):517-529. DOI: 10.16337/j.1004-9037.2025.02.018

[Abstract](446) [HTML](352) [PDF 1.61 M](356)

Abstract:
Addressing the issue of error propagation in existing research that decomposes the event detection process into two staged tasks of trigger recognition and classification， this paper proposes an event detection method based on type-semantic prompts. This method uses event types as prompt information to guide the model in extracting triggers corresponding to the event types from event text. It enables the parallel execution of trigger recognition and classification tasks， thereby mitigating the issue of error propagation between tasks. Firstly， the cross-attention mechanism is utilized to process the representation of the event text and the prompt template consisting of event types， obtaining a fused prompt representation that integrates the event text information. Then， the cosine similarity between the prompt representation and the event context representation is computed to obtain the probability distribution of the trigger positions corresponding to the event types in the event text. Finally， the position of the trigger corresponding to the event type is determined based on the probability distribution of positions， thus achieving parallel execution of trigger recognition and classification tasks. Experimental results on the ACE2005 and MACCROBAT-EE datasets demonstrate an improvement in the F₁ score of the proposed method in event detection tasks.
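The step that turns cosine similarities into a probability distribution over trigger positions can be sketched as follows. This is a hedged illustration: the abstract does not state how similarities are normalized, so a softmax is assumed here, and the function names, the `temperature` parameter, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def trigger_position_distribution(prompt_vec, token_vecs, temperature=1.0):
    """Softmax over the similarity between one event-type prompt and each token.

    Returns one probability per token position; the softmax normalization
    is an assumption, not a detail confirmed by the abstract.
    """
    scores = [cosine_similarity(prompt_vec, t) / temperature for t in token_vecs]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    prompt = [1.0, 0.0]
    tokens = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
    probs = trigger_position_distribution(prompt, tokens)
    print(probs.index(max(probs)), probs)
```

Because each event type has its own prompt representation, one such distribution can be computed per type in parallel, which matches the parallel recognition and classification described above.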

160 Online Semantic Enhancement Hashing

ZHAO Zhijie , KANG Xiao , ZHANG Xuening , WANG Shaohua , LIU Xingbo , NIE Xiushan

2025, 40(4):1096-1106. DOI: 10.16337/j.1004-9037.2025.04.021

[Abstract](172) [HTML](167) [PDF 2.54 M](366)

Abstract:
Batch-based hash learning methods are usually inadequate for real-time online retrieval of large-scale streaming data. Therefore， online hashing has emerged as a promising solution， enabling the learning of hash codes for new data without revisiting old data and adapting hash functions to incoming data. However， several challenges persist， including semantic drift caused by insufficient exploration of inter-class relationships and data forgetting resulting from limited association between new and old data. To address these challenges， this paper proposes a novel supervised method named online semantic enhancement hashing （OSEH）. It designs a triple matrix factorization framework， which bridges the gap between original features and one-hot labels， thereby constructing a fine-grained label matrix. Moreover， by seamlessly integrating label embedding and pairwise similarity， the proposed method effectively embeds enriched semantics into the process of hash learning， optimizing both the hash codes and the hash function. Experimental evaluations conducted on benchmark datasets validate the effectiveness of the proposed method.

161 Learnable Mask and Position Encoding Based Occluded Pedestrian Re-identification

YANG Zhenzhen , CHEN Yanan , YANG Yongpeng , WU Xinyi

2025, 40(1):217-229. DOI: 10.16337/j.1004-9037.2025.01.017

[Abstract](646) [HTML](528) [PDF 3.33 M](460)

Abstract:
Although the pedestrian re-identification task has made significant progress， the occlusion problem caused by different obstacles is still a challenge in practical application scenarios. In order to extract more effective features from occluded pedestrians， a learnable mask and position encoding （LMPE） method is proposed. Firstly， a learnable dual attention mask generator （LDAMG） is introduced to adapt to different occlusion patterns， significantly improving the re-identification accuracy of occluded pedestrians. It makes the network more flexible and better adapted to diverse occlusion situations. At the same time， the network learns contextual information through the mask， which further improves its understanding of the scenes. In addition， an occlusion aware position encoding fusion （OAPEF） module is introduced to solve the problem of losing position information in the Transformer. This module fuses the position encodings of different regions and gives the network stronger expressive ability. The integration of position encoding in all directions enables the network to understand the spatial correlation between pedestrians more accurately， and improves its ability to adapt to occlusion. Finally， simulation experiments are conducted， and the results demonstrate that LMPE performs well on the occluded datasets Occluded-Duke and Occluded-ReID and on the unoccluded datasets Market-1501 and DukeMTMC-ReID， which confirms the effectiveness and superiority of the proposed method.

162 Heterogeneous Vehicle Routing Method Based on Variable Step Multi-neighborhood Search

ZHENG Jiyuan , ZHANG Shaobo , WANG Xin , WANG Xiaobo

2025, 40(6):1650-1660. DOI: 10.16337/j.1004-9037.2025.06.021

[Abstract](93) [HTML](110) [PDF 1.24 M](314)

Abstract:
The vehicle routing problem is a classic combinatorial optimization problem that has been proven to be NP-hard. It is widely applied in the fields of transportation logistics and intelligent manufacturing. However， such problems usually assume the homogeneity of vehicles， making it difficult to characterize the differences in vehicles’ transportation capabilities for different types of commodities in practical scenarios. To address this， a new heterogeneous vehicle routing problem （HVRP） is proposed. By introducing commodity type attributes and vehicle transportation capability constraints， an integer programming model describing the vehicle-order matching relationship is constructed， with the objective of minimizing the total transportation distance. The service relationship between vehicles and customers is formally described by modeling the transport capability of different vehicle types for various product categories. To achieve efficient optimization of the HVRP， a variable step multi-neighborhood search （VSMNS） algorithm is proposed， along with a solution representation method that combines path encoding with linked-list structures. Finally， comparative experiments are conducted on 15 test cases， comparing VSMNS with genetic algorithms， hybrid genetic algorithms and artificial bee colony algorithms. Experimental results show that VSMNS not only achieves excellent solution quality， but its performance advantages also become more significant as the problem scale increases. Ablation experiments further verify the contribution of each component in the algorithm， demonstrating the effectiveness and superiority of the designed local operators.

163 Hybrid Convolutional Enhancement and Content-Aware Attention for Cross-Modality Person Re-identification

YANG Zhenzhen , WU Xinyi

2025, 40(6):1596-1607. DOI: 10.16337/j.1004-9037.2025.06.017

[Abstract](154) [HTML](96) [PDF 2.51 M](309)

Abstract:
Cross-modality person re-identification， as a research hotspot in the field of computer vision， aims to solve the challenge of matching pedestrians across varying imaging conditions. Existing methods focus on extracting modality-shared features， but fail to fully mine the detailed features that are crucial for discriminating person identities. To address this issue， a hybrid convolutional enhancement and content-aware attention （HCECA） method for cross-modality person re-identification is proposed， which aims to extract pedestrian features with more detailed information. First， a hybrid convolutional enhancement （HCE） module is embedded in the backbone network to capture richer cross-modality feature representation， enhancing the distinctiveness and robustness of the features. Second， a content-aware attention （CA） module is employed to mine rich detailed information， thereby improving the discriminability of pedestrian features. Finally， experiments are performed on the SYSU-MM01 and RegDB datasets. The proposed HCECA attains a Rank-1 accuracy of 72.21% and a mean average precision （mAP） of 69.89% in the all-search mode on the SYSU-MM01 dataset， while achieving a Rank-1 accuracy of 92.23% and an mAP of 85.08% in the visible-infrared mode on the RegDB dataset. Both results outperform those of current cross-modality person re-identification methods.

164 Multi-radar Collaborative Anti-deception Jamming Method Based on Convolutional Neural Network

ZHAO Shanshan , SHEN Qi , MIAO Jianing

2025, 40(6):1518-1526. DOI: 10.16337/j.1004-9037.2025.06.011

[Abstract](142) [HTML](104) [PDF 1.73 M](319)

Abstract:
Existing multi-station fusion technologies focus on utilizing intuitive features such as echo amplitude correlation and spatial location. However， manual feature extraction is not comprehensive enough， which can easily lead to wasted signal resources， incomplete feature extraction， and limited generalization of discrimination algorithms. To address this issue， this paper proposes a jamming identification strategy that integrates multi-radar cooperative detection with a convolutional neural network. This approach leverages convolutional neural networks to deeply explore unknown information in echo data， extracting differences between real and false targets in multidimensional deep features， going beyond differences in spatial correlation alone， and achieving deception jamming identification. Finally， simulation experiments validate the feasibility of the proposed method in resisting deception jamming and analyze the effects of target size， multi-station radar deployment and phase errors on the proposed algorithm.

165 A Review of Machine Learning for Brain Imaging Genomic Analysis

WANG Meiling , LIU Qingshan , ZHANG Daoqiang

2025, 40(4):869-886. DOI: 10.16337/j.1004-9037.2025.04.003

[Abstract](376) [HTML](478) [PDF 2.35 M](538)

Abstract:
Brain imaging genomics is a burgeoning domain within data science， where an integrated analytical approach is applied to brain imaging and genomics data， frequently in conjunction with other biomarker， clinical， and environmental datasets. This strategy is employed to glean fresh insights into the phenotypic， genetic， and molecular features of the brain， along with their effects on both typical and atypical brain function and behavior. In light of the escalating significance of machine learning in biomedicine and the swiftly expanding corpus of literature in brain imaging genomics， this paper presents a current and exhaustive review of machine learning methodologies tailored for brain imaging genomics. Firstly， the related background and fundamental work in imaging genomics are reviewed. Then， we summarize the main idea and modelling in genetic-imaging association studies based on multivariate machine learning and present methods for joint association analysis and outcome prediction. Finally， this paper discusses some prospects for future work.
