Special issue

  • 1  Intraoperative Hypothermia Prediction Model Based on Feature Selection and XGBoost Optimization
    CAO Liyuan FAN Qinqin HUANG Jingying
    2022, 37(1):134-146. DOI: 10.16337/j.1004-9037.2022.01.011
    Abstract:
    Given the high incidence of intraoperative hypothermia in patients undergoing anesthesia and the complexity of its influencing factors, a prediction model of intraoperative hypothermia based on feature selection and XGBoost optimization is proposed to better assist doctors in clinical diagnosis. Firstly, random forest (RF) is used to handle the high-dimensional data set, and features are selected by the RF out-of-bag estimation. Then, the XGBoost hyperparameters are optimized using a genetic algorithm based on an elite retention strategy (EGA). Finally, the prediction model is trained with the optimal parameters and can then be used to predict intraoperative hypothermia. The model combines the advantages of the three algorithms to improve generalization ability and prediction accuracy. Experimental results show that the proposed model outperforms seven other machine learning classification models, such as logistic regression and support vector machine, in prediction accuracy, precision, recall and AUC, and also outperforms three representative hyperparameter tuning methods.
    2  Medical Image Synthesis Based on Optimized Cycle-Generative Adversarial Networks
    CAO Guogang LIU Shunkun MAO Hongdong ZHANG Shu CHEN Ying DAI Cuixia
    2022, 37(1):155-163. DOI: 10.16337/j.1004-9037.2022.01.013
    Abstract:
    The radiation treatment planning system needs CT images to calculate the dose distribution accurately, but sometimes only MR images can be obtained clinically. Image synthesis creates images of a new modality from another modality, which enriches image information. This paper presents a new method for synthesizing high-precision, high-definition CT images from MR images. To synthesize clear pseudo-CT images, an improved cycle-consistent generative adversarial network (CycleGAN) with a densely connected convolutional network (DenseNet) is proposed. By avoiding the loss of input information and the vanishing of gradient information, the improved network can synthesize more credible CT images. Trained and tested on a dataset of 18 patients, the proposed method reduces the mean absolute error by 5.9%, increases the structural similarity by 1.1% and increases the peak signal-to-noise ratio by 4.4% compared with the original method. Compared with the deep convolutional neural network and the atlas-based method, the improved CycleGAN reduces the relative error by 0.065% and 0.55%, respectively. Owing to the advantages of the deep learning model, the proposed method can synthesize more realistic CT images, which better meets the requirements of dose calculation in radiation treatment planning systems.
    3  MEL-YOLO: Multi-task Human Eye Attribute Recognition and Key Point Location Network
    WU Dongliang SHEN Wenzhong LIU Linsong
    2022, 37(1):82-93. DOI: 10.16337/j.1004-9037.2022.01.007
    Abstract:
    Existing eye location algorithms handle only a single task, and their performance degrades in complex conditions such as varying illumination, glasses and occlusion. A multi-task, efficient and lightweight neural network, MEL-YOLO, is therefore designed to obtain multiple eye attributes and landmarks. Based on the YOLOV3 network and combined with the enhanced DS-sandglass block, a denormalized encoding and decoding method is used in the key point regression branch to improve the network's positioning depth, and the complete intersection-over-union (CIoU) and the mean square error (MSE) are introduced into the loss function, thereby improving the overall performance of the network. On the near-infrared dataset, the MEL-YOLO network achieves a position accuracy of 100%, an attribute recognition rate of 98.7% and a landmark accuracy rate of 96.5%, and reaches 92% and 91% on the UBIRS dataset. The experimental results demonstrate that the MEL-YOLO network can accurately obtain multiple eye attributes and key point information. They also show that MEL-YOLO is small and robust and generalizes well, making it suitable for low-performance edge computing devices.
    4  Dual-Attention Network for Acute Pancreatitis Diagnosis with CT Images
    Zhang Jinyi Wan Peng Sun Liang Zhang Daoqiang
    2022, 37(1):147-154. DOI: 10.16337/j.1004-9037.2022.01.012
    Abstract:
    Acute pancreatitis (AP) is one of the most common digestive diseases, yet the analysis of AP based on medical images still depends on simple manual features with low efficiency and accuracy, which is not commensurate with the harmfulness of AP. Because of the anatomical variation of the pancreas and the complications of AP, AP has complex imaging manifestations, and the appearance of lesions varies widely among patients and lesion types. Diagnosing acute pancreatitis from CT images is therefore challenging. To address these issues, we propose a dual-attention network for acute pancreatitis diagnosis. Specifically, the dual-attention network uses the global feature to generate a local attention feature for each local feature at different stages, and the final classification is facilitated by the fusion of multi-scale attention features focusing on lesions of different scales. Meanwhile, channel-domain attention is used to produce attention features based on the dependencies between channels to improve the model’s feature representation ability. We evaluate the proposed method on a collected real acute pancreatitis dataset. Results show that the proposed network achieves superior performance in acute pancreatitis diagnosis compared with several competing methods, with the sensitivity improved by 3.4%. Moreover, the improvement in area under the curve (AUC) that the proposed network brings to ResNet is 2.7% higher than that of other attention models such as SENet.
    5  Somatosensory Interaction Technology Based on Limiting Weighted Skeleton Node Filtering
    CHEN Jinyi LUO Shengqin LI Hongjun
    2022, 37(3):715-724. DOI: 10.16337/j.1004-9037.2022.03.020
    Abstract:
    To improve the way robots are operated and the recognition accuracy of somatosensory interaction, a somatosensory interaction technique based on limiting weighted skeleton node filtering is proposed. Firstly, a Kinect sensor acquires the depth information of the scene, which is processed by skeleton tracking to match the joints of the human body and establish their 3D coordinates. Then, the rotation angle of each joint is calculated by space vector mapping, and the proposed limiting weighted filtering algorithm is applied to the calculated joint rotation angles to reduce the influence of skeleton noise. Finally, the rotation angles are converted into control commands, which are sent to the mechanical arm controller through a Bluetooth serial port to control the steering of the mechanical arm. Experimental results show that the method realizes somatosensory interaction, that the recognition rate of the robot arm following human arm movement is 96.3%, and that the limiting weighted filtering algorithm effectively reduces the influence of skeleton noise.
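    The abstract above does not give the formulas, so the following is only a minimal sketch of the two steps it names: a joint angle computed from three 3D joint coordinates via the vectors between them, and a filter that first limits the frame-to-frame change and then takes a weighted average of recent samples. The function names, the `max_step` limit and the `weights` are illustrative assumptions, not the authors' algorithm or parameters.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a-b-c.

    Assumes a and c are both distinct from b (non-zero vectors).
    """
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_t = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_t))

def limiting_weighted_filter(angles, max_step=10.0, weights=(0.2, 0.3, 0.5)):
    """Hypothetical limiting + weighted filter (assumed form, not from the paper).

    Step 1 (limiting): a sample may differ from the previous limited
    sample by at most max_step degrees.
    Step 2 (weighting): output the weighted mean of the most recent
    limited samples, with the newest sample weighted most heavily.
    """
    limited, filtered = [], []
    for x in angles:
        if limited:
            prev = limited[-1]
            x = max(prev - max_step, min(prev + max_step, x))
        limited.append(x)
        window = limited[-len(weights):]
        w = weights[-len(window):]
        filtered.append(sum(wi * xi for wi, xi in zip(w, window)) / sum(w))
    return filtered
```

    With these assumed settings, a single-frame spike such as `[0, 0, 100, 0]` is first limited to a 10-degree jump and then smoothed, so the filtered output never exceeds 10 degrees.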
    6  Multi-structure Segmentation of Intracranial Vessels with Aneurysms Based on Adaptive Sampling and Dense Mechanism
    ZHANG Xuyang YAO Yunchu SHI Yue TONG Xin LIANG Xinyu TONG Xinyu LIU Aihua CHEN Duanduan
    2022, 37(4):766-775. DOI: 10.16337/j.1004-9037.2022.04.006
    Abstract:
    Intracranial aneurysm is a common cerebrovascular disease with relatively high mortality and disability rates. An image-based, intelligent and accurate diagnosis method for the disease has been urgently needed in clinical practice in recent years, for which accurate segmentation of the vessels and aneurysms is essential. In this work, we present a novel segmentation framework for multi-structure intracranial vessels with aneurysms. An adaptive image sampling method is designed using prior gray-level vascular features, and a Dense mechanism-based network is proposed for vessel segmentation. Time-of-flight magnetic resonance angiography images of 135 patients (age: 54.7±12.7, 75 males) with intracranial aneurysms are collected for training and testing the framework. Compared with sampling in the original space and image compression (mean DSC: 0.829 and 0.780), the adaptive sampling clearly improves the accuracy of vessel segmentation (mean DSC: 0.858). The Dense mechanism-based network achieves better segmentation results while using less computation space than the traditional 3D UNet, SegNet and DeepLabV3+ models (mean DSC: 0.854, 0.824 and 0.800). It also shows good robustness in segmenting aneurysms of various locations and sizes.
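    The DSC values quoted above are Dice similarity coefficients, defined as twice the overlap of two binary masks divided by the sum of their sizes. A minimal sketch of the metric follows; the function name and the flat 0/1 list representation are assumptions made for illustration, and the convention of returning 1.0 for two empty masks is a choice, not something stated in the abstract.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention (assumed here).
    return 1.0 if total == 0 else 2.0 * intersection / total
```

    For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])` shares one foreground voxel out of two in each mask and returns 0.5.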
    7  Prediction on Pulse-Taking for H-type Hypertension Under Hybrid Deep Learning Mechanism
    Yang Jingdong Chen Lei Cai Shuchen Xie Tianxiao Yan Haixia
    2022, 37(4):883-893. DOI: 10.16337/j.1004-9037.2022.04.016
    Abstract:
    The diagnosis of H-type hypertension requires measuring the patient’s plasma homocysteine content, which is inefficient and invasive. Chinese pulse diagnosis helps doctors diagnose H-type hypertension by analyzing the patient’s pulse activity and combining it with inquiry information. Therefore, we put forward a pulse-taking diagnosis classification model based on a hybrid deep learning model, which extracts local features via a convolutional neural network (CNN) block and long-term dependency features via a bi-directional long short-term memory (BiLSTM) block. The data come from 325 suspected cases of pulse diagnosis collected by Longhua Hospital affiliated to Shanghai University of Chinese Medicine and Hospital of Integrated Traditional Chinese and Western Medicine. We compare the proposed model with other machine learning models on the pulse diagnosis data. The sensitivity, specificity, accuracy, F1-score and receiver operating characteristic (ROC) area under curve (AUC) of the proposed model are 79.71%, 69.56%, 77.17%, 83.96% and 0.8500, respectively, higher than those of the other machine learning models. The results show that our model performs well and has good reference value for the clinical diagnosis of traditional Chinese medicine.
    8  Brain Disease Prediction Based on Noise Confusion to Enhance Robustness of Features
    HAO Xiaoke TAN Qihao LI Jiawang GUO Yingchun YU Ming
    2022, 37(4):776-786. DOI: 10.16337/j.1004-9037.2022.04.007
    Abstract:
    With the continuous development of medical imaging data, longitudinal data analysis is gradually becoming an important research direction for understanding and tracing the progression of Alzheimer’s disease (AD). Many longitudinal data analysis methods have been proposed, among which multi-task learning is widely used because it can integrate imaging data from multiple time points to improve the generalization ability of the model. Most existing methods can identify shared features at different time points, but these features contain a certain amount of noise. At the same time, potential associations of disease progression at different time points remain underexplored. In this paper, we propose a parameter decomposition and relation-induced multi-task learning (PDRIMTL) method to identify features from longitudinal data. The method not only identifies shared features after noise removal, improving their robustness, but also models the intrinsic associations between different time points. The results show that the model effectively improves the accuracy of AD identification on structural magnetic resonance imaging (sMRI) data at different time points.
    9  Extraction Method of Gait Parameters Based on Kinect System
    ZHANG Xiaoyu CHEN Kai YANG Ying
    2022, 37(4):872-882. DOI: 10.16337/j.1004-9037.2022.04.015
    Abstract:
    Based on the definition of gait parameters, this paper proposes and studies a method of collecting and extracting gait parameters using Microsoft’s Azure Kinect marker-free motion capture system (hereinafter referred to as the Kinect system). Adaptive filtering, exponential filtering, Kalman filtering and a no-filtering condition are used in data processing to improve the smoothness of the gait data. To evaluate the accuracy of the Kinect system and the effectiveness of filtering, the extracted gait parameters are statistically compared with those of the Qualisys marker-based motion capture system (Company of Sweden, hereinafter referred to as Q marker-based) in a synchronous experiment, and the different filtering methods are evaluated accordingly. The results show that, in general, the Kinect system is highly consistent with the Q marker-based system, and the results under the three filtering conditions all fall within the 95% consistency limit. Among the gait parameters, gait speed differs considerably under all filtering conditions and therefore cannot be used. For the other parameters, adaptive filtering and Kalman filtering show good consistency. By applying the proposed method with Kalman filtering for smoothing, the Kinect system can accurately calculate the gait parameters of healthy people, and it can replace the marker-based device in some cases.
    10  Human Activity Recognition Based on Heuristic Integrated Feature Selection
    Dai Jianwei Li Ruixiang Chen Jinyao Le Yanfen Shi Weibin
    2022, 37(4):860-871. DOI: 10.16337/j.1004-9037.2022.04.014
    Abstract:
    Manually extracted feature sets that are redundant or irrelevant degrade the classification performance of human activity recognition with wearable sensors. To address this problem, this paper proposes a human activity recognition method based on heuristic integrated feature selection. The method first selects a feature set containing power spectral density (PSD) for recognizing easily confused activities. On this basis, the method screens out feature subsets with low correlation using the Pearson correlation coefficient (PCC), and then applies an improved sine cosine algorithm (SCA) to the features, obtaining the optimal feature subset through this two-stage screening. Experimental results show that, on the data set collected in the laboratory, the feature subset dimension after applying this method is 34 and the recognition accuracy reaches 98.21%. In comparison experiments on the public SCUT-NAA data set, the feature subset dimension is 39, lower than the feature dimension of previous methods, and the recognition accuracy reaches 96.51%.
    11  Diagnosis of Intracranial Hemorrhage in Brain CT Images Based on Cost-Sensitive Faster R-CNN
    ZHU Xiaowei WAN Peng ZHANG Daoqiang CHENG Le WANG Yi
    2022, 37(4):757-765. DOI: 10.16337/j.1004-9037.2022.04.005
    Abstract:
    An intracranial hemorrhage (ICH) is a severe emergency that occurs suddenly in the patient’s brain, with strong symptoms and high mortality. It is therefore of great significance to diagnose ICH automatically and quickly from brain CT images. However, effective clinical application requires not only accuracy, speed and interpretability of the model, but also particular emphasis on missed detections of bleeding. Therefore, a cost-sensitive Faster R-CNN is proposed in this paper to diagnose ICH, using an automatic adjustment mechanism for the proportion of training samples and a hyperparameter introduced into the loss function to weight the importance of positive samples. The model pays more attention to missed detections to improve the detection performance, and diagnoses ICH by locating the target region. A network structure with optimal performance and appropriate parameters is selected through experiments, and the results are then measured by several indexes. The results show that the cost-sensitive Faster R-CNN model detects bleeding well by focusing on missed detections, thereby improving the diagnostic performance under unbalanced costs.
    12  Clustering Related Factors of Intrinsic Frequency Dynamic Functional Connection in MRI Signal of Mild Cognitive Impairment
    LI Dong WU Haifeng BAO Han MA Jia ZENG Yu
    2022, 37(4):798-813. DOI: 10.16337/j.1004-9037.2022.04.009
    Abstract:
    Functional connectivity (FC) can represent the ability of brain regions to work together. At present, a combination of dynamic functional connectivity (DFC) and cluster analysis is widely used for significant difference analysis and classification of diseases. However, existing studies have no clear standard for determining the number of clusters or selecting the clustering results, and traditional DFC cannot represent the FC information at different frequencies. Therefore, this paper studies the clustering related factors of intrinsic frequency DFC in the MRI signal of mild cognitive impairment (MCI). First, noise-assisted multivariate empirical mode decomposition is performed on the time course (TC) data and the DFC is calculated. Then, the clusters are analyzed by an evaluation-assisted clustering method, and the least square method is used to fit the clustering results. Finally, a classifier is used for classification. The contribution of this paper is to suggest a more reasonable clustering method and number of clusters for obtaining functional connections at different intrinsic frequencies. In the experiment, we use the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database to test the proposed method. The experimental results show that the accuracy of the supervised clustering used in this paper is higher than that of unsupervised clustering, that the classification accuracy of intrinsic frequency DFC is higher than that of traditional DFC, and that least square fitting can improve classification accuracy.
    13  Acquisition Technology of Multimodality Neurophysiological Signals Based on Asynchronous Chip
    Zhu Lixian Tian Fuze Dong Qunxi Zhao Qinglin He Anping Zheng Weihao Hu Bin
    2022, 37(4):848-859. DOI: 10.16337/j.1004-9037.2022.04.013
    Abstract:
    Most psychophysiological computing (PPC) studies are conducted in experimental environments that assume synchronization; however, neurophysiological representations have asynchronous properties, which cannot be described precisely and effectively in real time using synchronized recording technology. The primary issue for PPC is how to record these asynchronous multimodality neurophysiological activities accurately and in real time, with low power and low redundancy. To address this issue, this study focuses on microscopic neurophysiological activities and macroscopic psychological variables, resolves the design challenges of an asynchronous multimodality physiological information recording scheme and the corresponding passive physiological signal sensing technology, and designs and develops the first asynchronous physiological process unit (PPU). The PPU features low power consumption, high time series precision, high computing performance and strong anti-interference ability. Finally, we look ahead to the application of the PPU in brain science and brain-like computing research.
    14  Research on Neurodynamic Coupling Based on Synchronization Analysis Between EEG and IMU Signals
    XIE Ping YU Jian ZHANG Tengyu CHENG Shengcui LYU Yan CHEN Xiaoling
    2022, 37(4):736-746. DOI: 10.16337/j.1004-9037.2022.04.003
    Abstract:
    Motor control is a process of multifaceted coordination and information interaction among neural, motor and sensory functions. Understanding the relationships between motion and physiological information in the motor control system helps to explain the mechanism of human motion control. Therefore, to explore the causal relationship and the evolutionary law between electroencephalogram (EEG) and acceleration (ACC) signals during upper limb movement and rest, we apply the coherence method in this study. Firstly, the EEG and ACC signals of 7 subjects are preprocessed to remove interference components. Secondly, the coherence values between the EEG and ACC signals during the resting, motion-action and motion-maintaining states are calculated, and the significant area is then calculated using the threshold index of significant coherence. The results show that the significant areas in the motion-action state are larger than those in the motion-maintaining state, which in turn are larger than those in the resting state. Furthermore, the significant areas between the EEG signals of the C3 and C4 channels and the ACC signals are larger in the contralateral motor cortex during left and right upper limb movements. These results indicate that the coherence between EEG and ACC signals differs significantly among the resting, motion-action and motion-maintaining states of upper limb movement, which helps to better understand the neuromotor control mechanism and provides a new quantitative index and a theoretical basis for the assessment of motor function and the early diagnosis of motor dysfunction diseases.
    15  EEG Emotion Recognition Based on Convolutional Joint Adaptation Network
    CHEN Jingxia HU Xiuwen TANG Zhezhe LIU Yang HU Kailei
    2022, 37(4):814-824. DOI: 10.16337/j.1004-9037.2022.04.010
    Abstract:
    A new electroencephalogram (EEG) emotion recognition method based on deep convolutional neural network-joint adaptation network (CNN-JAN) is presented. It incorporates the idea of joint adaptation in transfer learning into deep convolutional networks. Firstly, the model uses a rectangular convolution kernel to extract the deep emotion-related spatial features between EEG data channels. Then, the extracted spatial features are input into the adaptation layer with multi-kernel joint maximum mean discrepancy (MK-JMMD) for transfer learning, aiming to reduce the distribution differences between the source and target domains. The experiments are carried out on differential entropy features and differential causality features of EEG data from the SEED dataset to verify the effectiveness and advantages of the proposed method. As a result, the within-subject emotion classification accuracy on differential entropy features reaches 84.01%, and the cross-subject emotion classification accuracy is also improved compared with other current popular transfer learning methods.
    16  Early Diagnosis of Alzheimer’s Disease Based on Feature Enhanced Pyramid Network
    SHI Lei PENG Shaokang ZHANG Yameng ZHAO Guohua GAO Yufei
    2022, 37(4):727-735. DOI: 10.16337/j.1004-9037.2022.04.002
    Abstract:
    Alzheimer’s disease (AD) is an irreversible neurodegenerative disease, for which early medical intervention is of great significance in controlling and improving the condition. In recent years, deep learning methods have been widely used to analyze magnetic resonance imaging (MRI) of AD for early diagnosis. However, in the early stage the brain structure differs only slightly from that of normal people, and existing single-scale analysis methods have difficulty capturing these subtle differences. To address this problem, this paper proposes a feature enhanced pyramid network (FEPN) for early diagnosis of AD. The high-level features are supplemented by the contextual information extracted by the designed shallow feature re-extraction, and fusion weights are calculated to guide the fusion of high-level and low-level feature maps, which enhances the interaction of contextual information and the matching degree of multi-scale feature fusion. The Alzheimer dataset published by Kaggle is employed in comparison experiments to verify the performance of the proposed approach. Compared with related methods, FEPN achieves state-of-the-art classification accuracy on MRI of four AD brain states (non-demented, very mild demented, mild demented, moderate demented).
    17  Cerebral Hematoma Segmentation and Bleeding Volume Measurement Based on Self-attention Mechanism
    LI Yao YU Nannan HU Chunai KE Mingchi YU Jinkou
    2022, 37(4):839-847. DOI: 10.16337/j.1004-9037.2022.04.012
    Abstract:
    Hemorrhage volume is an important indicator for grading intracerebral hemorrhage, determining treatment options and judging prognosis. However, because of the complexity of the brain structure and the variety in morphology and location of the hematoma, accurate and reliable segmentation of the hematoma and measurement of the hemorrhage volume are extremely difficult. This paper presents an algorithm for cerebral hematoma segmentation and hemorrhage volume measurement based on a deep learning network with a self-attention mechanism. First, to cope with the complexity of brain structure and to compensate for the fact that the convolution module can only perform linear operations and extract local features, a self-attention module is introduced at the end of the encoder of the segmentation network; through higher-order operations, it extracts the feature associations across the whole image so that the hematoma is extracted from a global perspective. Second, a channel and spatial attention module is introduced to learn weights on the individual channels and feature regions during training, by which useful information is highlighted and useless information is suppressed. Finally, the hemorrhage volume is calculated from the hematoma segmentation results on the multislice CT images of patients with intracerebral hemorrhage. Experimental results on a real CT data set of intracerebral hemorrhage show that the proposed algorithm achieves better results in cerebral hematoma segmentation and hemorrhage volume measurement in multiple cases, and remains relatively effective even when the hematoma is irregular in shape or close to the skull.
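    The abstract above does not state how the volume is derived from the per-slice segmentation. One common approach, sketched here purely as an assumption, is to count the voxels labeled as hematoma across all slices and multiply by the physical volume of one voxel. The function name, the nested-list mask layout and the spacing parameters are hypothetical.

```python
def hemorrhage_volume_ml(mask_slices, pixel_spacing_mm, slice_thickness_mm):
    """Estimate hematoma volume in milliliters from binary segmentation masks.

    mask_slices: list of 2D masks (lists of rows of 0/1), one per CT slice.
    pixel_spacing_mm: (row spacing, column spacing) in millimeters.
    slice_thickness_mm: distance between slices in millimeters
        (assumes contiguous, equally spaced slices).
    """
    voxel_mm3 = pixel_spacing_mm[0] * pixel_spacing_mm[1] * slice_thickness_mm
    n_voxels = sum(1 for sl in mask_slices for row in sl for v in row if v)
    # 1 mL equals 1000 cubic millimeters.
    return n_voxels * voxel_mm3 / 1000.0
```

    In practice the spacing values would be read from the scan metadata; here they are passed in explicitly to keep the sketch self-contained.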
    18  EEG Signal Classification of Epilepsy Based on Deep Learning
    XU Qing GE Cheng CAI Biao LU Yi CHANG Shan
    2022, 37(4):787-797. DOI: 10.16337/j.1004-9037.2022.04.008
    Abstract:
    Effectively analyzing, processing and accurately classifying epileptic electroencephalographic (EEG) signals can further improve epilepsy detection. Various deep learning approaches have therefore been gradually applied to this problem, such as using the BiLSTM model to process the 1D time series data of epileptic EEG. To further improve the accuracy of epileptic EEG classification, this paper converts the 1D time series data of epileptic EEG into 2D images and uses the EfficientNetV2 model to perform binary classification for epilepsy detection. In addition, gradient-weighted class activation mapping (Grad-CAM) is introduced for visual analysis of the 2D image classification. In classification experiments on a pre-processed version of the epilepsy EEG signal dataset from the University of Bern, Germany, the EfficientNetV2 model achieves an accuracy of 98.69%, which is better than the BiLSTM model. The result indicates that the EfficientNetV2 model can effectively classify epileptic EEG from 2D EEG images with higher classification accuracy.
    19  Attention Training Based on Double Convolutional Neural Network Fusion
    XU Xin ZHANG Jiaxin ZHANG Ruhao
    2022, 37(4):825-838. DOI: 10.16337/j.1004-9037.2022.04.011
    Abstract:
    Students’ learning outcomes are closely related to their attention state in the classroom. To explore whether attention training can improve classroom attention, the electroencephalogram (EEG) signals of ten students in non-attention and attention states are collected before and after α music training and compared. EEG signals are dynamic in nature and have a low signal-to-noise ratio and high redundancy. To avoid the poor recognition that results from feeding EEG signals directly into a neural network, 11 features, comprising the signal sample entropy (SampEn) and the energy and energy ratio of each band, are extracted and fused into multi-feature images as the input of the neural network model. In addition, a weighted fusion of the AlexNet and VGG11 network models is used to form a double convolutional neural network (CNN), which further improves image classification performance. The results show that the fusion model with the double CNN performs better than a model with a single CNN. In particular, the recognition accuracy of the proposed model reaches 97.53%. After α music training, the EEG features of the subjects are significantly different from those before, and the classification accuracy of the network model is 4% higher than before training. These observations show that the α music training considered here can improve the attention level of healthy students.
    20  Neural Network for Parameter Estimation of Intravoxel Incoherent Motion Based on Sparse Coding
    ZHENG Tianshu YAN Guohui YE Chuyang WU Dan
    2022, 37(4):747-756. DOI: 10.16337/j.1004-9037.2022.04.004
    Abstract:
    Diffusion magnetic resonance imaging (dMRI) is an important medical imaging tool for the non-invasive detection of microstructures in biological tissues. Intravoxel incoherent motion (IVIM) is a widely used dMRI model that separates diffusion from microvascular perfusion. Conventional methods for resolving IVIM parameters rely on fitting a bi-exponential model to multi-b-value dMRI data (typically ≥10 b-values), which requires a relatively long acquisition time. Such an acquisition is challenging for IVIM imaging of the body, such as placental IVIM, which is strongly influenced by both fetal and maternal motion. Deep learning models can accelerate the dMRI acquisition by using a subset of the q-space data. However, common deep learning models based on convolutional neural networks are not tied to biophysical models, so the outputs of the network are difficult to interpret. This work combines sparse coding with deep learning to develop a sparse coding based deep neural network for IVIM parameter estimation, which takes advantage of the feature representation of deep networks while incorporating a potential bi-exponential model to estimate the microcirculation parameters of the placenta. Compared with other algorithms, the proposed algorithm demonstrates advantages in accuracy and generalizability.
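    For reference, the bi-exponential IVIM signal model mentioned above is commonly written as follows. The symbols are the conventional ones rather than notation taken from this abstract: S(b) is the signal at diffusion weighting b, S_0 the signal at b = 0, f the perfusion fraction, D the tissue diffusion coefficient and D^* the pseudo-diffusion coefficient.

```latex
\frac{S(b)}{S_0} = f \, e^{-b D^{*}} + (1 - f) \, e^{-b D}
```

    Conventional fitting estimates f, D and D^* from signals measured at many b-values, which is why reducing the number of b-values shortens the acquisition.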
    21  Homogenization Study of Brain Network of Suicidal Patients with Major Depressive Disorder from Multiple Imaging Sites
    LIANG Jun SONG Yanxin WANG Yueyun WAN Chunxiao
    2022, 37(5):1115-1125. DOI: 10.16337/j.1004-9037.2022.05.016
    [Abstract](933) [HTML](868) [PDF 2.65 M](2314)
    Abstract:
    Functional brain images of suicidal patients with major depressive disorder (MDD) collected at multiple imaging sites are heterogeneous, which complicates computation and reduces reliability. Using homogenized multisite resting-state functional magnetic resonance imaging (rfMRI) data from patients with MDD, the influence of suicidal tendencies on the MDD brain functional network is studied. Firstly, rfMRI data of 99 MDD patients (67 non-suicidal MDD (nMDD) and 32 suicidal MDD (sMDD)) and 72 healthy control (HC) subjects from 3 sites are enrolled. After preprocessing of the rfMRI data, whole-brain functional connectivity is calculated with the Pearson correlation, and the multisite functional connectivity is homogenized by the ComBat technique. Then, the brain network topology is established and graph theory analysis is performed, taking the existence of small-world attributes as the criterion for sparsity, the functional connectivity as edges and the brain areas as nodes. Between-group significance is compared on the node degree and node efficiency indicators from graph theory. Experimental results show that the between-site heterogeneity of functional connectivity is effectively eliminated by the homogenization algorithm. Compared with the nMDD and HC groups, the sMDD group shows significant between-group differences (pFDR<0.05) in the inferior cerebellar lobule and vermis cone, indicating abnormal functional activity in these regions associated with suicidal tendencies. Based on the multisite homogenization of MDD network-level functional connectivity, this study effectively extracts the network characteristic indicators of suicidal patients and provides functional imaging markers for suicide risk assessment.
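    The Pearson-correlation functional connectivity step in the abstract above can be sketched as follows. This is a minimal illustration that assumes each brain region is summarized by one time series; the function names and toy data are hypothetical, and the ComBat homogenization step is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant series has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


def connectivity_matrix(region_series):
    """Symmetric matrix of pairwise correlations between region time series."""
    n = len(region_series)
    return [
        [1.0 if i == j else pearson(region_series[i], region_series[j])
         for j in range(n)]
        for i in range(n)
    ]


# Toy time series for three hypothetical regions.
toy_series = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
fc = connectivity_matrix(toy_series)
```

    Each off-diagonal entry becomes an edge weight in the brain network whose node degree and node efficiency are then compared between groups.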
    22  Two-Person Collaborative Brain-Controlled Robotic Arm System for Writing Chinese Character Using P300 and SSVEP Features
    HAN Jin DONG Bowen LIU Miao XU Minpeng MING Dong
    2022, 37(6):1401-1411. DOI: 10.16337/j.1004-9037.2022.06.020
    [Abstract](1183) [HTML](628) [PDF 3.27 M](2188)
    Abstract:
    Brain-controlled technology based on the brain-computer interface (BCI) has developed rapidly. However, existing research mostly adopts a single-person brain-controlled manner, which suffers from poor execution efficiency and a low degree of controllability, making it difficult to meet the needs of complex manipulation tasks. To address this problem, this study adopts a time-frequency-phase hybrid encoding method and designs a collaborative strategy. A two-person collaborative brain-controlled robotic arm system with 108 instructions has been developed, enabling two people to write Chinese characters simultaneously, stroke by stroke. The average online accuracy of the eight subjects is 87.92%, and the corresponding average online information transfer rate (ITR) is 66.00 bits/min. This system extends the ways in which BCI exchanges information, and preliminarily verifies the feasibility and effectiveness of collaborative BCI manipulation of a robotic arm. It provides technical support for collaborative BCI.
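    The abstract above reports an information-transfer rate without stating how it is computed. A minimal sketch, assuming the widely used Wolpaw definition of ITR (an assumption; the paper may use a different convention), is given below. The function name and the selection time are hypothetical.

```python
import math


def itr_bits_per_min(n_targets, accuracy, seconds_per_selection):
    """Wolpaw information-transfer rate in bits per minute.

    n_targets             : number of selectable instructions (N >= 2)
    accuracy              : selection accuracy P, between 0 and 1
    seconds_per_selection : average time needed for one selection
    """
    if n_targets < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_targets >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_targets)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / seconds_per_selection


# 108 instructions and 87.92% accuracy come from the abstract;
# the 5-second selection time is a made-up placeholder.
example_itr = itr_bits_per_min(108, 0.8792, seconds_per_selection=5.0)
```

    An average of per-subject ITRs generally differs from the ITR computed at the average accuracy, so this sketch is not expected to reproduce the reported figure.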
    23  Character Analysis and Unfolding Study of Protein Molecular Machine Structures
    ZHANG Lili JIANG Yifeng XIE Liangxu KONG Ren CHANG Shan
    2022, 37(5):1126-1133. DOI: 10.16337/j.1004-9037.2022.05.017
    [Abstract](836) [HTML](922) [PDF 2.31 M](2254)
    Abstract:
    A molecular machine is a machine composed of molecular-scale materials that can perform a certain processing function. The three-dimensional structure of a protein determines its properties and functions. Understanding how the amino acid (residue) sequence of a protein folds into a specific three-dimensional structure, that is, the folding mechanism and characteristics of protein structure, is of great significance for the study of molecular machines. Therefore, a fast and simple simulation method is needed to study the folding mechanism of protein structure. In this paper, based on the native-state topology of the protein, we use the Gaussian network model to study protein GB1 and analyze its structural characteristics and unfolding process. The results are in good agreement with experimental data and molecular dynamics simulation data, showing that the elastic network model is suitable for the study of protein structure.
    24  Realistic Medical Image Augmentation by Using Multi-loss Hybrid Adversarial Function and Heuristic Projection Algorithm
    WANG Jian CHENG Chufan CHEN Fang
    2023, 38(5):1104-1111. DOI: 10.16337/j.1004-9037.2023.05.009
    [Abstract](647) [HTML](630) [PDF 2.15 M](1147)
    Abstract:
    Early detection of COVID-19 allows medical intervention that improves the survival rate of patients. Using deep neural networks (DNN) to detect COVID-19 can improve the sensitivity and speed of chest CT interpretation for COVID-19 screening. However, applying DNN in the medical field is known to be hampered by limited samples and imperceptible noise perturbations. In this paper, we propose a multi-loss hybrid adversarial function (MLAdv) to search for effective adversarial attack samples that can potentially spoof networks. These adversarial attack samples are then added to the training data to improve the robustness and generalization of the network against unanticipated noise perturbations. In particular, MLAdv not only implements a multiple-loss function including style, origin, and detail losses to craft medical adversarial samples in realistic-looking styles, but also uses the heuristic projection algorithm to produce noise with strong aggregation and interference. These samples are shown to have stronger anti-noise ability and attack transferability. Evaluation on a COVID-19 dataset shows that networks augmented with adversarial attacks from the MLAdv algorithm improve the diagnosis accuracy by 4.75%. Therefore, the augmented network based on MLAdv adversarial attacks improves model performance and is resistant to noise perturbations.
    25  Domain Generalization via Domain-Specific Decoding for Medical Image Segmentation
    Ye Huaize Zhou Ziqi Qi Lei Shi Yinghuan
    2023, 38(2):324-335. DOI: 10.16337/j.1004-9037.2023.02.009
    [Abstract](1424) [HTML](672) [PDF 3.11 M](2066)
    Abstract:
    Multi-source domain generalization (DG) aims to train a model that uses semantic information from different domains and can be generalized to unknown domains. In medical imaging, the gap between different domains is relatively large, and the model suffers a performance drop in the unknown domain. To solve this problem, this paper proposes a network structure that encodes images into features and decodes domain-specific features. The model uses a generic encoder, which learns domain-invariant features from all source domains, and several domain-specific decoders that reconstruct the original images to promote the ability to extract image features. Meanwhile, these decoders also help to generate transferred images that engage in adversarial learning with images of the source domains, improving the encoder’s ability to learn invariant features. In addition, we introduce a special Cutmix strategy, which changes the foreground information of images from different domains to augment the data set, enhancing the generalization ability of the model and further improving the performance of our network structure. In two medical image segmentation tasks, extensive experiments show that the proposed model performs well compared with existing advanced models. In addition, a series of ablation experiments are carried out to verify the effectiveness of the model.
    26  Diagnosis of Primary Insomnia Based on Synchronous Resting-State Brain Network
    JIN Mingyan ZHANG Chi CHANG Yi CONG Fengyu
    2023, 38(4):802-814. DOI: 10.16337/j.1004-9037.2023.04.005
    [Abstract](675) [HTML](902) [PDF 3.37 M](1032)
    Abstract:
    About a third of the world’s population suffers from insomnia, and many studies have shown that elevated high-frequency band activity is an important cause of insomnia. However, due to large disturbance factors, this activity is difficult to evaluate under daily resting-state conditions. Therefore, the Beta and Gamma bands of the electroencephalogram (EEG) are extracted from patients with primary insomnia and normal controls. The phase locking value (PLV), which is well suited to nonlinear and non-stationary signals such as EEG, is used to obtain the adjacency matrix for constructing the resting-state functional brain network. An adaptive threshold technique is used to binarize the adjacency matrix. To fuse various characteristics of brain networks, a comprehensive measurement index of brain networks is proposed for insomnia detection. In the Beta band, the comprehensive index differs significantly between the primary insomnia group and the normal control group (p=0.044). Automatic classification using a support vector machine (SVM) achieves an accuracy of 77.7% and a sensitivity of 90.7% in the Beta band. Compared with the original network characteristics, the classification accuracy and sensitivity of the proposed comprehensive index are increased by 9.4% and 20.7%, respectively. Compared with existing studies, they are increased by 19.4% and 20.7%, respectively. These results show that the proposed method has potential application value in the diagnosis of insomnia.
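    The phase locking value used in the abstract above can be sketched as follows, assuming the standard definition: the magnitude of the time-averaged unit phasor of the phase difference between two channels. The instantaneous phases are taken as given; how they are obtained (for example via a Hilbert transform of band-filtered EEG) is not specified in the abstract, and the function name is hypothetical.

```python
import cmath


def phase_locking_value(phase_a, phase_b):
    """PLV between two channels from their instantaneous phases (radians).

    Returns a value in [0, 1]: 1 means a constant phase difference,
    values near 0 mean the phase difference is spread over the circle.
    """
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


# Toy example: channel b lags channel a by a constant 0.7 rad.
toy_a = [0.1 * k for k in range(100)]
toy_b = [p - 0.7 for p in toy_a]
locked = phase_locking_value(toy_a, toy_b)
```

    Computing this value for every channel pair gives the adjacency matrix that the abstract then binarizes with an adaptive threshold.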
    27  An Adaptive Denoising Algorithm for Few-Channel EEG
    CHEN He ZHANG Hao CHAI Yifan LI Xiaoli
    2023, 38(4):824-836. DOI: 10.16337/j.1004-9037.2023.04.007
    [Abstract](738) [HTML](951) [PDF 3.06 M](1069)
    Abstract:
    Few-channel electroencephalogram (EEG) is more suitable and affordable for practical use as a portable or wearable device, but it is subject to a variety of unpredictable artifacts, which makes artifact removal extremely difficult. In the feature space, the artifact-related components are dispersed while the components related to brain activities are closely distributed. Based on this hypothesis, we propose an outlier detection-based method for artifact removal under the few-channel condition. The underlying components (sources) are extracted using wavelet decomposition and blind source separation methods, and the artifact-related components far from the center of the distribution of all components are treated as outliers and identified using a one-class support vector machine. In quantitative analyses with semi-simulated data, the proposed method outperforms threshold-based methods for various artifacts, including electromyogram (EMG), electro-oculogram (EOG) and power line noise. Visualization of the component clusters supports the hypothesis. This study combines the ideas of blind source separation and outlier detection without setting artifact-specific parameters, and can adaptively remove various artifacts while effectively retaining brain activities, showing excellent performance and usability.
    28  Brain Network Analysis of Patients with ADHD Based on Subnetwork Similarity
    WANG Xinxin SONG Xiaoying CHAI Li
    2023, 38(5):1142-1150. DOI: 10.16337/j.1004-9037.2023.05.012
    [Abstract](600) [HTML](671) [PDF 1.83 M](1025)
    Abstract:
    Attention deficit hyperactivity disorder (ADHD) seriously affects children’s development, so its effective diagnosis has received extensive attention. A new method for calculating graph similarity is proposed, which combines the topological information of brain networks with the signals on the network. The Pearson correlation coefficient is used to construct the fully connected brain network. Based on sparse representation, the node subnetwork is extracted from the underlying structure, and the similarity of the subnetworks is calculated with a graph kernel function. Finally, a global index of brain network similarity is given. Classification experiments on the public ADHD-200 dataset, using the similarity between subjects as the feature, show that the proposed method can distinguish ADHD patients from healthy people with 93.1% accuracy, and that its classification performance is significantly better than that of existing methods. In addition, ADHD patients are found to have stronger connections in brain regions such as the anterior central gyrus, thalamus, hippocampus and insula.
    29  Emotion Recognition Based on Graph Features Extracted from EEG Networks
    Li Cunbo Yang Lei Chen Zhaojin Wang Yifeng Li Peiyang Li Fali Yao Dezhong Xu Peng
    2023, 38(4):815-823. DOI: 10.16337/j.1004-9037.2023.04.006
    [Abstract](1056) [HTML](1132) [PDF 1.09 M](1311)
    Abstract:
    To accurately evaluate individual emotional states, we propose a graph feature learning and recognition algorithm for electroencephalogram (EEG)-based emotion recognition. Firstly, the original EEG data are used to construct the corresponding EEG network. Secondly, a local adjacency graph between different emotional EEG network samples is constructed in the high-dimensional EEG brain network space to capture the distribution of the emotional EEG brain networks, and the graph Laplacian matrix is estimated from the adjacency graph. Thirdly, the optimal low-dimensional graph embeddings of the emotional EEG brain networks are obtained by spectral graph theory, so that the emotional EEG brain network samples can be represented in the low-dimensional space by a set of network features. Finally, based on the extracted emotional EEG brain network features, the optimal support vector machine classifier is trained and used for emotion recognition. Verification experiments are carried out on international public emotional EEG datasets. The results show that, compared with traditional emotion recognition algorithms, the proposed method effectively improves the accuracy of emotion recognition and achieves robust recognition accuracies of 91.85% (SEED dataset, 3-class), 79.36% (MAHNOB-HCI dataset, 3-class) and 79% (DEAP dataset, 4-class) on the three public datasets.
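    The abstract above estimates a graph Laplacian from an adjacency graph between samples. A minimal sketch, assuming the unnormalized Laplacian L = D − W (the paper may use a normalized variant), is shown below. The function name and the toy adjacency matrix are hypothetical.

```python
def graph_laplacian(weights):
    """Unnormalized graph Laplacian L = D - W of a symmetric adjacency matrix.

    weights[i][j] is the edge weight between samples i and j (0 if not
    neighbours); D is the diagonal matrix of row sums (node degrees).
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    return [
        [(degrees[i] if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy adjacency graph: three samples connected in a chain.
toy_w = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
toy_l = graph_laplacian(toy_w)
```

    In spectral embedding methods, the eigenvectors of this matrix with the smallest non-zero eigenvalues supply the low-dimensional coordinates; that eigen-decomposition is omitted here.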
    30  Lightweight Segmentation Algorithm for TAO Diseased Areas Based on DSE-Net
    Chen Jiayu HE Hong ZHU Haipeng SONG Xuefei
    2023, 38(4):915-925. DOI: 10.16337/j.1004-9037.2023.04.014
    [Abstract](637) [HTML](683) [PDF 3.58 M](867)
    Abstract:
    The clinical activity score (CAS) is one of the important assessment methods for the clinical diagnosis of thyroid associated ophthalmopathy (TAO). Manual diagnosis of TAO is susceptible to the subjective experience of ophthalmologists due to the diversity of TAO symptoms and the influence of non-diseased areas. Accurate acquisition of the key facial areas of TAO patients is a significant prerequisite for early diagnosis of TAO. Therefore, this paper proposes a lightweight algorithm for automatic segmentation of TAO diseased areas based on DSE-Net. DSE-Net adopts U-Net as the backbone model. A dense squeeze-and-excitation (DSE) channel attention module, designed to extract low-level features of the encoding structure layer by layer and to fuse high-level features of the decoding structure layers, further enhances the feature extraction capability of the model. Tests on the sclera, eyelid, and lacrimal caruncle datasets demonstrate the effectiveness of DSE-Net, with Dice coefficients reaching 84.8%, 84.7%, and 92.7%, and IoUs reaching 74.0%, 74.7%, and 86.5%, respectively. The superiority of DSE-Net is also supported by extensive comparative experiments. The proposed model has fewer parameters, a simple structure and strong feature extraction ability, providing significant information for the early diagnosis and prognostic treatment of TAO.
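    The Dice coefficient and IoU reported in the abstract above are standard overlap measures for segmentation masks. A minimal sketch for a single pair of binary masks follows; the function name and toy masks are hypothetical, and how the paper averages the scores over images is not stated in the abstract.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two binary masks given as nested lists."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    union = sum(1 for a, b in zip(p, t) if a or b)
    if union == 0:
        # Both masks are empty: treat as perfect agreement by convention.
        return 1.0, 1.0
    dice = 2.0 * intersection / (sum(p) + sum(t))
    iou = intersection / union
    return dice, iou


# Toy 2x3 masks: 1 marks a pixel labelled as diseased area.
toy_pred = [[1, 1, 0], [0, 1, 0]]
toy_target = [[1, 0, 0], [0, 1, 1]]
toy_dice, toy_iou = dice_and_iou(toy_pred, toy_target)
```

    For a single mask pair the two measures are linked by Dice = 2·IoU / (1 + IoU); averages over a dataset need not satisfy this relation exactly.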
    31  An Improved AdaBoost Algorithm for Identifying Breath Signals of Liver Cancer
    HAO Lijun HUANG Gang
    2023, 38(4):860-872. DOI: 10.16337/j.1004-9037.2023.04.010
    [Abstract](452) [HTML](545) [PDF 2.94 M](907)
    Abstract:
    An improved AdaBoost ensemble learning algorithm is proposed for distinguishing the breath signals of healthy controls from those of liver cancer patients. First, the breath signals of volunteers, including healthy controls and liver cancer patients, are collected and their main features are extracted by the Relief algorithm. Then, based on the Stacking model, several groups of base classifiers are trained by traditional machine learning algorithms and sub-classifiers are constructed from them. To reduce the influence of the training samples on classifier performance, K-fold cross-validation is applied, so that K base classifiers are successively obtained to form a base classifier group. The prediction of this base classifier group, i.e., a sub-classifier, on the test set is obtained by voting. Then, according to the prediction error rate of each sub-classifier on the training set, the training set is updated and the weight coefficient of each sub-classifier is obtained. Finally, the predictions of the multiple sub-classifiers are weighted and combined to obtain the final prediction. Experimental results show that the improved AdaBoost algorithm achieves an accuracy of about 90%, with specificity and precision above 95%, in discriminating the breath of liver cancer patients from that of healthy controls. Compared with the traditional AdaBoost algorithm, the proposed algorithm has a significantly lower error rate and improved robustness when used for liver cancer breath detection. Therefore, the improved AdaBoost algorithm can effectively improve the accuracy of liver cancer breath identification, which is important for research on early diagnosis of liver cancer from breath.
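    The abstract above derives classifier weights and an updated training set from each sub-classifier's error rate. As a point of reference, the sketch below shows the corresponding step of classic discrete AdaBoost, not the paper's improved variant, whose exact update rule is not given in the abstract. Function names and toy values are hypothetical.

```python
import math


def classifier_weight(error_rate):
    """Classic AdaBoost weight alpha = 0.5 * ln((1 - e) / e) for error rate e."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error rate must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def update_sample_weights(weights, correct, alpha):
    """Down-weight correctly classified samples, up-weight the rest, renormalize."""
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return [r / total for r in raw]


# Toy round: four equally weighted samples, the last one misclassified.
toy_weights = [0.25, 0.25, 0.25, 0.25]
toy_correct = [True, True, True, False]
toy_alpha = classifier_weight(0.25)  # weighted error of this round
toy_updated = update_sample_weights(toy_weights, toy_correct, toy_alpha)
```

    After the update the misclassified samples carry half of the total weight, which forces the next classifier to focus on them; the final prediction is the alpha-weighted vote of all classifiers.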
    32  Cognitive Development Prediction Algorithm for Healthy Elderly Based on Multivariate Morphological Features
    ZHANG Lingyu WANG Yalin ZHAO Ziyang HUANG Wenjing ZHENG Weihao YAO Zhijun HU Bin
    2023, 38(4):837-848. DOI: 10.16337/j.1004-9037.2023.04.008
    [Abstract](703) [HTML](465) [PDF 1.64 M](846)
    Abstract:
    Because conventional morphological indicators such as volume and surface area are too coarse for the subcortical nuclei, subtle changes in surface morphology are difficult to detect with traditional morphological feature acquisition methods. To solve this problem, we propose a fine feature extraction algorithm for subcortical nuclei and apply it to the cognitive state prediction task of the elderly. Using surface conformal parameterization, surface conformal representation, and surface fluid registration based on mutual information, 15 000×2 morphological features are extracted from the bilateral hippocampus and amygdala of 46 subjects. A dimensionality reduction process, comprising patch selection, sparse coding and dictionary learning, and max-pooling, avoids the curse of dimensionality while preserving the texture information of the nuclei. Finally, taking trees as the weak learners, we build the final strong classifier with the GentleBoost algorithm for cognitive prediction. The results show that a prediction accuracy of 85% can be achieved using only the novel features of the hippocampus and amygdala, providing a new perspective for fine feature mining of subcortical structures.
    33  Early Mycosis Fungoides Recognition Based on Multimodal Image Fusion
    XIE Fengying ZHAO Danpei WANG Ke LIU Zhaorui WANG Yukun ZHANG Yilan LIU Jie
    2023, 38(4):792-801. DOI: 10.16337/j.1004-9037.2023.04.004
    [Abstract](974) [HTML](778) [PDF 1.57 M](1125)
    Abstract:
    Early mycosis fungoides (MF) may present as erythematous scaly skin lesions, which are difficult to distinguish from benign inflammatory skin diseases such as psoriasis and chronic eczema. This paper presents a new method based on multimodal image fusion for early mycosis fungoides recognition. The method adopts the ResNet18 network to extract single-modality features from dermoscopic images and clinical images, designs a cross-modal attention module to fuse the features of the two modalities, and uses a self-attention module to extract the key information and reduce redundant information in the fused features, thereby improving the accuracy of intelligent identification of early mycosis fungoides. Experimental results show that the proposed intelligent diagnosis model outperforms the comparison algorithms. The proposed model is also applied to actual clinical diagnosis by dermatologists. The changes in the average diagnostic accuracy of the experimental group and the control group confirm that the proposed model can effectively improve the level of clinical diagnosis.
    34  Interpretable Deep Learning Model Based on Knowledge Representation Vectors and Its Application in Disease Prediction
    XU He ZHENG Qunli XIE Zuoling CHENG Haitao LI Peng JI Yimu
    2023, 38(4):777-792. DOI: 10.16337/j.1004-9037.2023.04.003
    [Abstract](791) [HTML](936) [PDF 912.54 K](980)
    Abstract:
    In recent years, deep learning methods have been widely applied to various disease prediction tasks, even surpassing human experts in some aspects. However, the black-box nature of these algorithms limits their clinical application. In this paper, knowledge representation and reasoning learning is combined with deep learning to build an interpretable deep learning model that incorporates knowledge representation and reasoning vectors. The model first builds a relationship graph between physical examination indicators and test values according to the normal ranges of the indicators, and encodes this graph through a deep learning model based on knowledge representation and reasoning learning. Then, the patients’ physical examination data are expressed as vectors, which are fed into a classifier built from a self-attention mechanism and a convolutional neural network to predict disease. In a diabetes prediction experiment, the accuracy and recall of the model are better than those of the comparative machine learning methods; compared with the random forest algorithm, the accuracy and recall are improved by 0.81% and 5.21%, respectively. Experimental results show that combining knowledge representation and reasoning learning with deep learning in an interpretable way can support the early detection and auxiliary diagnosis of diabetes.
    35  Automatic Sleep Staging Based on Deep Learning: A Review
    LIU Ying CHU Haoran ZHANG Haowei
    2023, 38(4):759-776. DOI: 10.16337/j.1004-9037.2023.04.002
    [Abstract](2177) [HTML](1798) [PDF 5.02 M](2554)
    Abstract:
    Sleep staging is a vital process for analyzing polysomnographic recordings and plays a key role in sleep monitoring and the diagnosis of sleep disorders. Traditional manual sleep staging requires expertise and is cumbersome and time-consuming. Deep learning constructs models that simulate the mechanism by which the human brain interprets information, and has powerful automatic feature extraction and feature expression capabilities. Applying deep learning to sleep staging does not rely on manually designed features and can automate the staging process. This article focuses on typical automatic sleep staging studies since 2017, and systematically reviews the deep learning models applied to automatic sleep staging from the two aspects of single-view and multi-view input. Then, the difficulties of deep learning models based on multi-view input are analyzed and their potential research value is pointed out. Finally, possible future research directions are discussed.
    36  Contrast-Enhanced Ultrasound Analysis Based on Machine Learning: A Survey
    WAN Peng LIU Han ZHAO Junyong XUE Haiyan LIU Chunrui SHAO Wei KONG Wentao ZHANG Daoqiang
    2023, 38(4):741-758. DOI: 10.16337/j.1004-9037.2023.04.001
    [Abstract](1454) [HTML](1481) [PDF 3.62 M](2022)
    Abstract:
    Contrast-enhanced ultrasound (CEUS) is a powerful diagnostic tool that enhances blood flow signals from tumor micro-vessels through the peripheral venous injection of ultrasound contrast agents. This enables clinical physicians to dynamically evaluate tumor angiogenesis in real time. CEUS imaging is widely used for the diagnosis, postoperative evaluation, and treatment planning of multiple organs. In recent years, deep learning techniques have made considerable progress, offering new opportunities for the intelligent analysis of dynamic CEUS. Deep learning methods have greatly widened the scope of clinical applications of CEUS, improving its efficacy in diagnosis and treatment. However, like traditional ultrasound imaging, CEUS faces the challenges of speckle noise interference, respiratory motion, and low standardization, which make the spatial-temporal information of dynamic perfusion difficult to analyze. This paper systematically reviews recent research on the intelligent analysis of CEUS, covering clinical applications such as benign-malignant differentiation, malignant grading, therapeutic prediction, and the selection of diagnosis and treatment plans. We summarize the latest advances of radiomic and deep learning methods in CEUS imaging analysis, and highlight the limitations of current research and future directions for development.
    37  Prediction of Pulse Wave for Target Organ Damage in Hypertension Based on Frequency-Domain Feature Maps
    Cai Shuchen Yang Jingdong WENG Wenhao Qi Chenhao YAO Minghui Yan Haixia
    2023, 38(4):898-914. DOI: 10.16337/j.1004-9037.2023.04.013
    [Abstract](543) [HTML](773) [PDF 4.95 M](921)
    Abstract:
    To address the low efficiency and accuracy of predicting hypertensive target organ damage, this paper proposes a hypertensive pulse wave prediction method based on Mel-frequency cepstral coefficient (MFCC) feature maps, enabling efficient and non-invasive diagnosis of target organ damage. Because pulse-taking classification in the temporal domain has low accuracy, the pulse wave is transformed into MFCC-based feature maps in the frequency domain, with the triangular filter replaced by a Gaussian filter. An improved EfficientNet model, EfficientNetS, is employed, which enhances global feature extraction by adding the improved SiMAM attention mechanism. A clinical dataset of 608 cases of hypertensive target organ damage with pulse-taking diagnosis is used. Under five-fold cross-validation, the F1 score, accuracy, precision, sensitivity and area under the curve (AUC) are 97.31%, 98.72%, 97.71%, 97.04% and 99.13%, respectively. Compared with typical models, the proposed method has higher classification accuracy and generalization performance. In addition, this paper studies the correlation between pulse wave classification and its features, and analyzes the feature importance ranking of pulse-taking in the temporal and frequency domains, which can help clinicians investigate the mechanisms of target organ damage in hypertension and find effective measures for timely prevention and treatment.
    38  Heart Sound Segmentation Based on Two-Stage Neural Network
    Feng Zhengwei Quan Haiyan
    2023, 38(4):849-859. DOI: 10.16337/j.1004-9037.2023.04.009
    [Abstract](619) [HTML](831) [PDF 1.65 M](1095)
    Abstract:
    The heart sound signal is important for analyzing and diagnosing heart problems, and heart sound segmentation is an essential step before analysis and processing. By separating the heart sound segmentation task into the two subtasks of localization and recognition, this paper proposes a two-stage convolutional neural network composed of a localization network and a discrimination network, which complete the localization and recognition of heart sound signals, respectively. First, the original signal is divided into frames through a sliding window, the spectrum is obtained by the short-time Fourier transform, and the Mel frequency spectral coefficient (MFSC) features are obtained with a Mel filter. These features are first fed into the localization network to judge whether a frame is a heart sound segment. If so, they are fed into the discrimination network to identify the first and second heart sounds, thereby achieving heart sound segmentation. Finally, multi-frame voting is used to reduce misjudgment. In addition, a spatial attention mechanism is introduced into the convolutional neural network. Experimental results show that this two-stage neural network model with attention achieves higher accuracy in heart sound segmentation than a single convolutional neural network classification model, while also being simpler and more lightweight.
    39  Improved FCN Segmentation Method for Thyroid Nodules
    Zhang Yating Shuai Renjun Huang Daohong Zhao Chen Wu Menglin
    2023, 38(4):873-885. DOI: 10.16337/j.1004-9037.2023.04.011
    [Abstract](884) [HTML](673) [PDF 1.71 M](906)
    Abstract:
    To segment thyroid nodules more accurately, this paper proposes an improved fully convolutional network (FCN) segmentation model. Compared with FCN, an atrous spatial pyramid pooling (ASPP) module and a multi-layer feature transfer (FT) module are added. The decoder module of the LinkNet model is used for up-sampling, and the VGG16 backbone network is used for feature extraction and down-sampling. The experiment uses 17 413 ultrasound thyroid nodule images from the Stanford AIMI shared data set for training, validation and testing. Experimental results show that the proposed model achieves 79.7%, 87.6% and 98.42% in mean intersection over union (mIoU), Dice similarity coefficient and F1 score, respectively, giving better segmentation than the other segmentation models compared and effectively improving the segmentation accuracy of thyroid nodules.
    40  Discriminative Domain Adaptation via Low-Rank Representation for Multi-site Autism Spectrum Disorder Identification
    LI Xizhi ZHU Lingyao WANG Mingliang
    2023, 38(4):886-897. DOI: 10.16337/j.1004-9037.2023.04.012
    [Abstract](674) [HTML](606) [PDF 1.88 M](1046)
    Abstract:
    The diagnosis of autism spectrum disorder (ASD) mainly relies on the patient’s medical history and clinical symptoms, and objective evaluation indicators are still lacking. Therefore, the discovery of disease-related biomarkers is essential for early identification and intervention. Although multi-site brain imaging data increase the sample size and statistical power, which helps to improve the diagnostic performance for autism, current research is often plagued by data heterogeneity. To address this issue, a discriminative domain adaptation via low-rank representation (DDA-LRR) framework for multi-site ASD identification is proposed. Specifically, we first transfer both source and target data to a common subspace, where each source data can be represented by a combination of source samples such that the distribution differences can be well relieved. Then, we learn an orthogonal reconstruction matrix, which can preserve the main energy in the obtained low-dimensional embedding space and thus is appropriate for the subsequent learning tasks. Finally, to ensure the discriminative ability of the low-rank representation, we use the label information of the source data to integrate the classification loss into the training stage. An efficient optimization strategy based on the alternating direction method of multipliers (ADMM) is developed to solve the proposed DDA-LRR method. Experimental results show that the proposed method can reduce the differences in data distributions of multiple sites, realize the effective transfer of knowledge, and improve the diagnosis performance of multi-site ASD effectively.
    41  Artificial Intelligence-Assisted Magnetic Resonance Imaging in Assessment of Neoadjuvant Chemotherapy for Breast Cancer: A Review
    LIU Kaiwen JIN Yingying WANG Shouju
    2024, 39(4):794-812. DOI: 10.16337/j.1004-9037.2024.04.003
    [Abstract](1422) [HTML](1308) [PDF 2.75 M](1368)
    Abstract:
    Neoadjuvant chemotherapy has become a standard treatment strategy for breast cancer, and magnetic resonance imaging (MRI) is the preferred imaging method for assessing the response of breast cancer to neoadjuvant chemotherapy. Although MRI can provide detailed information of tumor, including location, size, and microenvironment, the precise assessment of neoadjuvant chemotherapy of breast cancer suffers from the diverse changes in tumors present in MRI images. Artificial intelligence methods based on machine learning and deep learning have demonstrated the ability to recognize complex patterns in MRI data. Through clinical radiologic feature analysis, radiomics analysis, and habitat analysis, artificial intelligence technology has significantly enhanced the performance and efficiency of assessments for breast cancer neoadjuvant chemotherapy, aiding in the realization of personalized treatment strategies. This paper introduces the MRI data and performance indicators in assessing breast cancer neoadjuvant chemotherapy, summarizes the progress of artificial intelligence applications in this field, and discusses the current challenges and potential future research directions for artificial intelligence technology in practical applications.
    42  CI-WGAN: Integrating Clinical Indicators and WGAN for Generating Individualized Brain Functional Connectivity Networks in Autism Spectrum Disorder
    SUN Hailin YAN Jiadong ZHANG Rong KENDRICK Keith JIANG Xi
    2024, 39(4):813-826. DOI: 10.16337/j.1004-9037.2024.04.004
    [Abstract](746) [HTML](740) [PDF 1.77 M](719)
    Abstract:
    Brain functional connectivity (FC) networks serve as potential neuroimaging biomarkers for the auxiliary diagnosis and treatment of autism spectrum disorder (ASD). However, most existing models are based solely on neuroimaging data and neglect individual clinical indicators, leading to the loss of disorder-specific information. Moreover, ASD is a spectrum disorder exhibiting significant individual differences in terms of clinical indicators. Therefore, these traditional generative models are limited in generating accurate individual FC of ASD that reflects specific clinical symptoms. To address this limitation, a novel clinical-indicator-aware Wasserstein generative adversarial network (CI-WGAN) is proposed to generate individual FC of ASD. The proposed model introduces an effective guidance mechanism based on individual clinical indicators to generate individualized FC networks. Extensive experiments are performed on the ABIDE I dataset, one of the largest publicly available ASD brain imaging datasets. The results show that the FC generated by the proposed method achieves a peak signal-to-noise ratio (PSNR) of 19.037, a structural similarity (SSIM) of 0.236 and a mean absolute error (MAE) of 0.178, improvements of 3%, 12% and 2%, respectively, over the traditional models. Additionally, representational similarity analysis (RSA) is performed between the generated FC and two independent clinical indicators. The results show that the RSA values based on the proposed method increase by 0.1 and 3.7 times compared to those based on traditional models, demonstrating that the FC generated via the proposed CI-WGAN contains more individual symptom information of ASD. In summary, the proposed CI-WGAN model achieves high-quality generation of individual FC, and provides a powerful tool for the early diagnosis and personalized treatment of ASD.
    43  Liver Cancer Diagnosis Method Based on Multi-modal Ultrasound Contrast Learning
    YANG Yinkai WAN Peng SHI Hang XUE Haiyan SHAO Wei
    2024, 39(4):874-885. DOI: 10.16337/j.1004-9037.2024.04.008
    [Abstract](1081) [HTML](541) [PDF 3.18 M](960)
    Abstract:
    In recent years, liver cancer has become a disease that seriously threatens human health, and multi-modal ultrasound imaging is one of the important diagnostic tools for it. Similar to how clinicians use multi-modal ultrasound to diagnose liver cancer, using multi-modal fusion methods to integrate the image features of each ultrasound modality is expected to improve the accuracy of liver cancer diagnosis. However, the existing multi-modal fusion methods often isolate the feature information of each modality during the fusion process, failing to fully consider the intra-modal sample similarity and inter-modal semantic consistency, while ignoring modality uncertainty. Therefore, this paper proposes a liver cancer diagnosis method based on multi-modal ultrasound contrast learning, aiming to make full use of the feature information of each ultrasound modality to improve the diagnostic accuracy. Specifically, this method employs supervised contrastive learning to deeply explore modality features, capturing both the similarity information among samples within the modality and the semantic consistency information across different modalities. In addition, this method introduces a measure of modality uncertainty based on Subjective Logic, enabling dynamic fusion of modality information and exhibiting good robustness. Evaluation of multimodal ultrasound imaging shows that the proposed method achieves an 85.21% diagnostic accuracy, demonstrating performance improvement compared to other mainstream multimodal fusion methods.
    44  A Double-Decoding Model for Polyp Segmentation Based on Feature Fusion
    WU Gang QUAN Haiyan
    2024, 39(4):954-966. DOI: 10.16337/j.1004-9037.2024.04.015
    [Abstract](631) [HTML](713) [PDF 2.84 M](951)
    Abstract:
    In the early screening of colorectal cancer, diagnostic efficiency and accuracy can be improved by automated polyp detection and segmentation of colonoscopy images. Due to the complexity of the internal environment of the intestines and the limitation of image quality, automated polyp segmentation is still a challenging problem. To address this problem, this paper proposes a dual-decoding model for polyp segmentation that uses Transformer and dilated convolution to achieve feature fusion (FTDC-Net). ResNet50 is used as the encoder to better extract deep image features. A Transformer encoding module is used, whose self-attention mechanism captures long-distance dependencies between the inputs, and different dilated convolutions are used to expand the receptive field so that the model can capture a larger range of information in the colonoscopy image. The decoding part of the network uses a dual-decoding structure, including an autoencoder branch that reconstructs the inputs and a decoding branch that produces the segmentation results. The output of the autoencoder is used to generate an attention map, which guides the segmentation results. Experimental validation is carried out on the Kvasir-SEG and ETIS-LARIBPOLYPDB standard datasets, and experimental results show that FTDC-Net can effectively segment colon polyps and improves on current mainstream polyp segmentation models in all evaluation metrics.
    45  Feature Compression Analysis of Whole-Brain Functional Connection in Classification of Mild Cognitive Impairment
    MA Jia WU Haifeng LI Shunliang
    2024, 39(4):967-983. DOI: 10.16337/j.1004-9037.2024.04.016
    [Abstract](500) [HTML](663) [PDF 4.66 M](741)
    Abstract:
    Resting-state functional magnetic resonance imaging is widely used to obtain the functional connection (FC) of brain regions in classification studies of mild cognitive impairment (MCI). However, classification based on whole-brain FC usually suffers from information redundancy and the curse of dimensionality. Therefore, a new method of “G-Lasso + feature compression” is proposed to solve the above problems. Firstly, blind source separation is used to obtain the active signal time series of the whole-brain functional regions, and the sparse FC network is constructed by G-Lasso. Secondly, the group-average sparse FC of MCI subjects, normal subjects and all subjects is calculated, and the cluster centers of Class 1 to Class 3 are determined in combination with the Euclidean distance to obtain the difference feature information between clusters. Finally, the sparse FC of each participant is expressed as a linear combination of the cluster centers, and the compressed FC is obtained as the key feature to complete the classification. The results show that the proposed method obtains significant differences in inter-cluster features after the class decision and provides effective sign information. The classification accuracy of the key features obtained by further compression (89.8%) is 5% to 10% higher than that of the sparse method alone. The results show that, to solve the problems of whole-brain FC, feature selection and dimensionality reduction need to be considered, but there are many uncertain factors, and “sparse + compression” can be appropriately combined.
    46  An Attention Mechanism-Based CNN-LSTM Framework for Lower Limb Knee Joint Angle Prediction
    TANG Lu YANG Xilin WANG Xiangrui HU Qianyuan ZHENG Hui
    2024, 39(4):996-1008. DOI: 10.16337/j.1004-9037.2024.04.018
    [Abstract](712) [HTML](828) [PDF 2.96 M](835)
    Abstract:
    Decoding knee motion intention is crucial for the wearable comfort in lower extremity exoskeleton robots. Patients with neurological disorders often have lower limb movement disorders, which can be assessed by surface electromyography (sEMG) signals. To integrate the motion assessment and joint angle prediction for these patients, a novel CNN-LSTM framework based on the attention mechanism is proposed to predict the knee joint angle for three daily motions, i.e., horizontal walking, going uphill, and going up stairs, through 10-channel sEMG signals. The prediction error indicators, i.e., the root mean squared error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R2) reach 2.74, 2.50, and 0.97, respectively, outperforming the traditional network. Furthermore, the ablation experiments show the three indicators have decreased by 20.47%, 34.36% and 6.59% on average, respectively. The proposed end-to-end prediction framework based on the attention mechanism reaches the highest prediction accuracy, providing a reference for the human-robot interaction scheme of the lower limb exoskeleton robot system.
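The three indicators reported in this abstract (RMSE, MAE and R2) follow standard definitions. The sketch below shows them under the assumption that predicted and measured joint angles are plain lists of numbers; `regression_metrics` is a hypothetical helper name and the sample values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R2) for two equal-length sequences of numbers.

    Illustrative helper, not taken from the paper.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    # Made-up knee angles in degrees, for illustration only.
    print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 39]))
```

Note that RMSE and MAE are errors (lower is better) while R2 is a goodness-of-fit score (higher is better).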
    47  A Non-contact HRV Estimation Method Based on TVF-EMD
    MA Xiao LU Xiaoguang ZHANG Zhe SUO Chenhao YANG Lei
    2024, 39(4):1009-1019. DOI: 10.16337/j.1004-9037.2024.04.019
    [Abstract](695) [HTML](826) [PDF 1.91 M](727)
    Abstract:
    The physical health status of civil aviation personnel is an important factor affecting aviation safety, among which respiration and heart rate are extremely important indicators of health. To address the limitations and interference of contact or wearable measurement systems on personnel during working, linear frequency-modulated continuous wave (FMCW) radar can be used to achieve non-contact measurement. Since vital sign signals have the characteristics of time-varying and non-stationary, to solve the problem of mode aliasing in empirical mode decomposition (EMD) in signal decomposition, the time-varying filtering based on EMD (TVF-EMD) can adaptively adjust the local cutoff frequency of the signal, effectively improving the signal separation performance and solving the mode aliasing problem. By using the intrinsic mode functions (IMF) components decomposed by TVF-EMD to reconstruct the time-domain signal corresponding to the heartbeat, the frequency and inter-beat interval (IBI) of the heartbeat signal can be estimated, and further the relevant indicators of heart rate variability (HRV) can be estimated. Simulation experiments and actual measured data processing results show that TVF-EMD can effectively separate respiration and heartbeat signals from millimeter wave radar measurement signals. At the same time, a simulation analysis of the decomposition effects of TVF-EMD and EMD methods from the aspects of mode aliasing degree and signal separation performance has been conducted, and the results show that TVF-EMD can effectively solve the mode aliasing problem. Therefore, the TVF-EMD method can accurately and effectively extract vital sign information from millimeter wave radar measurement signals, provide accurate time-domain information for IBI estimation and HRV analysis, and has a broad application prospect.
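Once inter-beat intervals have been estimated, common time-domain HRV indicators can be derived from them. The abstract does not say which indicators the authors use; the sketch below assumes two widely used ones, SDNN and RMSSD, with IBIs given in milliseconds. The function name `hrv_indicators` and the sample values are hypothetical.

```python
import math


def hrv_indicators(ibi_ms):
    """Compute mean heart rate, SDNN and RMSSD from inter-beat intervals in ms.

    Illustrative helper; the choice of indicators is an assumption.
    """
    n = len(ibi_ms)
    if n < 2:
        raise ValueError("at least two inter-beat intervals are required")
    mean_ibi = sum(ibi_ms) / n
    # SDNN: sample standard deviation of the intervals.
    sdnn = math.sqrt(sum((x - mean_ibi) ** 2 for x in ibi_ms) / (n - 1))
    # RMSSD: root mean square of successive interval differences.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "mean_hr_bpm": 60000.0 / mean_ibi,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
    }


if __name__ == "__main__":
    # Made-up intervals, for illustration only.
    print(hrv_indicators([800, 810, 790, 800]))
```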
    48  Opportunities and Challenges of Diffusion MRI in Traditional Chinese Medicine
    WU Ye HE Lanxiang ZHANG Xinyuan FU Yunhe LIU Xiaoming HE Jianzhong
    2024, 39(4):776-793. DOI: 10.16337/j.1004-9037.2024.04.002
    [Abstract](1209) [HTML](1445) [PDF 937.74 K](1140)
    Abstract:
    Diffusion magnetic resonance imaging (dMRI) represents an advanced medical imaging modality that yields intricate insights into tissue microstructure by assessing the diffusion of water molecules within biological tissues, which is progressively integrated into clinical practices for diagnosis and treatment. Notably, within traditional Chinese medicine (TCM), dMRI has demonstrated unique potential and significance, providing an empirical foundation for TCM’s “differentiation and treatment”. Its utility extends beyond precise disease diagnosis to encompass disease progression monitoring and treatment efficacy evaluation, aligning with TCM’s principles of “preventive treatment” and “individualized treatment”. Nonetheless, the assimilation of dMRI into TCM encounters notable challenges. This review article delves into the recent applications of dMRI within TCM, scrutinizing its prospects and constraints. By fostering interdisciplinary partnerships between medical and engineering disciplines, particularly in the realm of TCM-intelligent imaging technology, this study aims to propel the application and evolution of dMRI within TCM’s diagnostic and therapeutic domains.
    49  Graph Learning-Based Methods for Generating Missing Brain Networks and Multi-modal Fusion Diagnosis
    GONG Rongfang HUANG Linya ZHU Qi LI Shengrong
    2024, 39(4):843-862. DOI: 10.16337/j.1004-9037.2024.04.006
    [Abstract](994) [HTML](1123) [PDF 6.06 M](982)
    Abstract:
    The multi-modal brain network, which integrates the brain structural and functional networks, can effectively extract the complementary information from different modalities, significantly improving the diagnostic accuracy of neurological diseases such as epilepsy. However, because multi-modal data collection is slow and costly, modalities are often missing in practical applications, which decreases the diagnostic accuracy and generalization ability of the model. To address the case where the data of a modality are completely missing, we propose a method based on graph learning and cycle-consistent generative adversarial networks, named Graph-CycleGAN. This method captures feature information between different brain regions in the brain network by introducing graph neural networks, such as graph convolutional neural networks and graph attention mechanisms. Besides, it strengthens the feature extraction ability of the generative framework and realizes the mutual generation of the brain structural network and the functional network. In addition, to address the lack of diagnostic result-based evaluations of the quality of generated data, this paper proposes a classification model that integrates real and generated brain networks. Experimental results on the epilepsy dataset indicate that the proposed Graph-CycleGAN method can effectively generate the missing brain network by utilizing the existing modality information.
    50  White Matter Fiber Tract Segmentation Method Based on T1-Weighted Imaging
    JIAO Ruike ZHANG Xiaofeng YE Chuyang
    2024, 39(4):863-873. DOI: 10.16337/j.1004-9037.2024.04.007
    [Abstract](738) [HTML](1131) [PDF 2.69 M](763)
    Abstract:
    White matter fiber tract segmentation methods provide crucial neural pathway reference information for brain connectivity analysis by identifying white matter tracts connecting distinct brain regions. Traditional segmentation methods predominantly depend on diffusion magnetic resonance imaging (dMRI), but the lengthy acquisition time of dMRI severely restricts its clinical applicability. To address this limitation, this paper introduces a white matter fiber tract segmentation approach based on T1-weighted imaging. This method leverages the structural tensor of T1-weighted images to infer potential fiber orientations, thereby enhancing the segmentation accuracy of white matter tracts. Moreover, the proposed method incorporates privileged information from dMRI during model training to guide the learning process, thus improving the performance of the white matter tract segmentation model, and the segmentation of challenging tracts is improved significantly, with a 5% improvement in Dice score for the left fornix (FX_left) and a 6% improvement in Dice score for the right fornix (FX_right). This approach mitigates the limitations of conducting neural pathway analysis in the absence of dMRI, broadening the application scope of neural pathway analysis.
    51  Diagnosis of Brain Diseases Based on Multi-scale Residual Fusion Graph Convolutional Networks
    HAO Xiaoke HE Zilong LU Xinchu MA Mingming LIU Shiyu
    2024, 39(4):827-842. DOI: 10.16337/j.1004-9037.2024.04.005
    [Abstract](957) [HTML](803) [PDF 2.38 M](951)
    Abstract:
    In recent years, functional brain networks have been used in the diagnosis of brain disorders such as autism spectrum disorder (ASD). Existing studies have shown that combining resting-state functional magnetic resonance imaging (rs-fMRI) data with non-imaging information to form a population graph, and then learning and classifying the data with a graph neural network (GNN), is very effective in the diagnosis of ASD. However, most studies still face two challenges: First, the construction of functional connectivity matrices using methods such as the Pearson correlation coefficient cannot effectively identify and analyze localized brain regions and biomarkers associated with diseases; second, it is difficult for a GNN to efficiently learn multi-scale information about node features in population graphs. To solve these problems, a multi-scale residual fusion graph convolutional network (MSRF-GCN) based on the attention mechanism is proposed. The algorithm efficiently localizes and identifies brain regions useful for diagnosis by designing a functional connection generator to extract temporally relevant features with long-range dependencies. Meanwhile, the multi-scale information in the population graph is learned by designing a multi-scale residual fusion algorithm. The Edge Sparse strategy is also introduced to increase the sparsity of node connections by randomly discarding edges in the initial population graph, which in turn reduces the risk of overfitting during training. The effectiveness of MSRF-GCN in the diagnosis of ASD is demonstrated by the results of experiments performed on the autism brain imaging data exchange (ABIDE) program.
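Randomly discarding edges, as the Edge Sparse strategy in this abstract is described, can be sketched as follows. The sketch assumes the population graph is stored as a list of `(u, v)` node pairs and that each edge is dropped independently with a fixed probability; the function name `drop_edges`, the `drop_rate` parameter and the independence assumption are illustrative choices, not details from the paper.

```python
import random


def drop_edges(edges, drop_rate, seed=None):
    """Return a copy of `edges` with each edge removed with probability `drop_rate`.

    Illustrative helper; the sampling scheme is an assumption.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)  # local generator so global state is untouched
    # Keep an edge when its random draw is at or above the drop rate.
    return [edge for edge in edges if rng.random() >= drop_rate]


if __name__ == "__main__":
    graph_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
    print(drop_edges(graph_edges, drop_rate=0.4, seed=0))
```

In training, such sampling would typically be repeated each epoch so the network sees a different sparse graph every time; whether the paper does so is not stated in the abstract.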
    52  Signal Acquisition and Processing Technology of Flexible Sensor Intelligent Pulse Diagnosis System
    WANG Shidan XU Hong FU Hongbo DING Fuyang WU Daming
    2024, 39(1):236-246. DOI: 10.16337/j.1004-9037.2024.01.021
    [Abstract](1207) [HTML](893) [PDF 2.70 M](1380)
    Abstract:
    The development and application of pulse diagnostic instruments provide an objective basis for the intelligent diagnosis of traditional Chinese medicine. However, the existing pulse diagnostic instruments do not consider the influence of the collection region (Cun, Guan, Chi) and pressure (Fu, Zhong, Chen) on the diagnostic results, and there is still room for the improvement of the diagnostic accuracy. In order to recognize pulse condition more accurately, this paper presents an intelligent pulse diagnosis system based on flexible sensors and the corresponding pulse signal processing method. By installing three array flexible sensors at the collection region of Cun, Guan and Chi and setting different pressure thresholds of Fu, Zhong and Chen, multiple pulse signals are obtained. Signal features are then extracted, and multi-channel features are integrated based on multi-set canonical correlations analysis (MCCA) to get more pulse information. Experimental results show that the proposed method can further improve the accuracy of pulse condition classification in four typical pulse types. The multi-point pulse condition induction designed in this paper based on two aspects of region and pressure can simulate and restore the real Chinese medicine diagnosis process and help to extract real pulse signals, providing a theoretical basis and reference value for the subsequent research and development of intelligent pulse diagnosis instruments based on flexible sensors.
    53  Polyp Segmentation Network Based on Multiple Attention and Schatten-p Norm
    LI Su LIU Guoqi LIU Dong ZHAO Manqi
    2024, 39(1):223-235. DOI: 10.16337/j.1004-9037.2024.01.020
    [Abstract](701) [HTML](520) [PDF 4.76 M](1066)
    Abstract:
    Automatic and accurate polyp localization and segmentation methods can detect polyps in a timely manner in the early stage of colorectal cancer lesions, greatly reducing the risk of cancer transformation. The encoder-decoder architecture, the most mainstream network structure in polyp segmentation in recent years, has been greatly improved, for example by strengthening the model’s ability to capture global contextual and local features and by using deep features to guide shallow decoding. However, polyps vary in shape and size, and convolution-based encoders tend to over-focus on local information and lose long-range dependencies during encoding. Some polyp images also have low contrast and complex spatial characteristics, which makes it easy to confuse the polyp with the background. Based on this, this paper proposes a polyp segmentation network based on multiple attention and Schatten-p norm (MASNet). The axial multiple attention module uses axial attention to supplement long-range contextual relationships in the image, while also attending to boundary and background information to achieve feature complementarity. It enhances the capture of local detail features while attending to global features. By utilizing the correlation between matrix singular values and matrix implicit information, the Schatten-p norm is introduced as a constraint to analyze the data from a matrix perspective and assist the model in distinguishing foreground and background. Extensive experiments demonstrate the effectiveness of the proposed method, and MASNet achieves the best segmentation results among the advanced methods compared on the Kvasir-SEG dataset.
    54  Detection of VR-induced Motion Sickness Levels Based on EEG Rhythm Energy and Fuzzy Entropy
    ZHOU Zhanfeng HUA Chengcheng CHAI Lining YAN Ying LIU Jia FU Rongrong
    2024, 39(2):490-500. DOI: 10.16337/j.1004-9037.2024.02.021
    [Abstract](874) [HTML](712) [PDF 2.33 M](1127)
    Abstract:
    Motion sickness has been a key factor affecting the virtual reality user experience and limiting the growth of the virtual reality industry. To address this issue, this paper investigates the effects of virtual reality motion sickness on neural activity in the brain and uses electroencephalogram (EEG) features to detect levels of motion sickness. To obtain features that can measure the level of vertigo, this paper records the EEG signals of subjects before and during the experience of the vertigo test scene, calculates the rhythm energy and fuzzy entropy, uses statistical analysis for feature selection, and finally classifies and verifies the validity of the features. The results show that the energy in the θ and α bands of CP4 and Oz and the energy in the β and γ bands of C4 are significantly reduced when subjects develop motion sickness (p<0.01); in terms of fuzzy entropy, there are significantly higher values of FC4 and Cz fuzzy entropy in the δ band (p<0.000 1) and significantly lower values of O1 fuzzy entropy in the β band (p< 0.000 1). Compared to linear discriminant analysis (LDA), logistic regression (LR) and support vector machine (SVM), K nearest neighbor (KNN) shows better classification results with 89% and 91% classification accuracy on rhythm energy and fuzzy entropy, respectively. This study shows that EEG rhythm energy and fuzzy entropy are expected to be effective indicators for motion sickness level detection, providing an objective basis for studying the causes of virtual reality motion sickness and mitigation options.
    55  Human Activity Recognition Based on DWT-VMD Hybrid Signal Decomposition
    CHEN Jinyao LI Ruixiang WANG Xing SHI Weibin
    2024, 39(3):736-749. DOI: 10.16337/j.1004-9037.2024.03.020
    [Abstract](682) [HTML](525) [PDF 2.04 M](981)
    Abstract:
    In human activity recognition applications, it is still challenging to extract sufficiently reliable features from the original sensor data. A hybrid signal decomposition technique combining discrete wavelet transform (DWT) and variational mode decomposition (VMD) is used to extract salient feature vectors from the original sensor signals to identify various human activities. Using a variety of machine learning classification algorithms, such as K-nearest neighbor, random forest, LightGBM and XGBoost, the effectiveness of the proposed algorithm is tested on the UCI-HAR and SCUT-NAA data sets. Experimental results show that the hybrid signal decomposition technique improves the recognition accuracy of all classification algorithms. The maximum classification accuracy on the UCI-HAR dataset is 98.91%, an improvement of 1.79% over not using the decomposition algorithm. The maximum classification accuracy on the SCUT-NAA dataset reaches 95.52%, an improvement of 3.2%. In human activity recognition, the DWT-VMD hybrid signal decomposition technique extracts more effective features from the original signal and further improves the recognition accuracy, showing its practical value.
    56  Research on EEG Mental Arithmetic Classification Based on Amplitude Permutation Entropy for Global Graph
    WANG Shenglin QIU Xiangkai WANG Ruqing HUANG Liya
    2024, 39(3):724-735. DOI: 10.16337/j.1004-9037.2024.03.019
    [Abstract](606) [HTML](576) [PDF 2.55 M](792)
    Abstract:
    Mental arithmetic is a skill commonly used in daily life. It involves various cognitive processes that cause changes in brain activity, so studying its electroencephalogram (EEG) can help advance research on cognitive tasks. Amplitude permutation entropy for global graph (APEGG) is proposed for the study of EEG during mental arithmetic. It compensates for the inability of the traditional permutation entropy for graph (PEG) to fully reflect changes in the neighboring nodes around brain network nodes, and overcomes its insensitivity to EEG signal amplitude. First, the EEG brain network is constructed using the phase locking value (PLV), and the synchronization and correlation between multi-lead EEG signals are analyzed; then the amplitude permutation entropy for global graph of the brain network in different frequency bands is calculated. Finally, a support vector machine (SVM) is used for classification. EEG from public data sets is used for simulation, and the entropy scatterplots of the mental arithmetic state and the resting state in different frequency bands are analyzed, showing a clear difference. The proposed method achieves better classification results than other algorithms.
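The phase locking value used here to build the brain network has a compact standard definition: the magnitude of the mean unit phasor of the phase difference between two signals. The sketch below assumes the instantaneous phases (in radians) have already been extracted, for example with a Hilbert transform, which is not shown; `phase_locking_value` is a hypothetical helper name.

```python
import cmath
import math


def phase_locking_value(phase_a, phase_b):
    """PLV between two equal-length sequences of instantaneous phases (radians).

    Returns a value in [0, 1]; 1 means a constant phase difference.
    Illustrative helper, not taken from the paper.
    """
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the unit phasors of the phase differences, then take the magnitude.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


if __name__ == "__main__":
    locked_a = [0.1 * k for k in range(100)]
    locked_b = [0.1 * k - math.pi / 4 for k in range(100)]  # constant lag
    print(phase_locking_value(locked_a, locked_b))
```

Computing this value for every pair of EEG channels gives a symmetric matrix that can serve as the weighted adjacency matrix of the network.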
    57  Epilepsy Identification Method Based on Multi-modal Multi-grained Fusion Network
    QI Xiaoyu DING Weiping JU Hengrong CHENG Xueyun HUANG Jiashuang
    2024, 39(3):710-723. DOI: 10.16337/j.1004-9037.2024.03.018
    [Abstract](1230) [HTML](628) [PDF 2.10 M](1123)
    Abstract:
    Structural brain network (SC) and functional brain network (FC) can reflect the changes in brain structure information caused by epilepsy from different perspectives. Currently, the fusion of two types of brain network information for auxiliary diagnosis of epilepsy has become one of the important studies in the field. However, common fusion models only fuse the information of the two types of brain networks at a single granularity, ignoring the multi-grained attribute of brain networks. This paper proposes an epilepsy identification method based on multi-modal multi-grained fusion network (MMFN), which integrates the features of the multi-modal brain network from global and local granularities to take full advantage of multi-modal brain network information. Specifically, at the local granularity, two modules (i.e., edge features fusion module and node features fusion module) are designed to reconstruct the feature maps of edge layer and node layer of two types of brain network, so that these two modes can learn features interactively. At the global granularity, a multimodal decomposition bilinear pooling module is designed to learn the joint representation of the two types of brain networks. Compared to current methods, experimental results show that the proposed method can improve the accuracy of epilepsy recognition significantly and assist doctors in the diagnosis of epilepsy.
    58  Medical Image Segmentation Method with Integrated Self-attention
    ZHAO Fan ZHANG Xuedian
    2024, 39(5):1240-1250. DOI: 10.16337/j.1004-9037.2024.05.015
    [Abstract](1064) [HTML](889) [PDF 2.15 M](972)
    Abstract:
    To address the limitations of the UNet architecture in capturing local features and preserving edge details in medical image segmentation, this paper presents an improved UNet algorithm integrating a self-attention mechanism. The proposed algorithm is based on the traditional encoder-decoder structure, incorporating a multi-scale convolution (MSC) block for multi-granularity feature extraction, and a convolution mixer attention (CMA) block, which combines the modeling of local features by convolutional layers with global contextual modeling by self-attention layers. Extensive experiments on the BUSI and DDTI datasets verify that the model segments better than existing classical network architectures. Additionally, statistical data analysis and ablation studies further confirm the effectiveness of the MSC and CMA modules. This research provides an innovative approach for high-precision medical image segmentation, holding significant theoretical and practical implications for enhancing the accuracy and efficiency of medical diagnoses.
    59  Convolutional Transformer EEG Emotion Recognition Model Based on Multi-domain Information Fusion
    ZHANG Xuejun WANG Tianchen WANG Zetian
    2024, 39(6):1543-1552. DOI: 10.16337/j.1004-9037.2024.06.021
    [Abstract](887) [HTML](995) [PDF 1.93 M](696)
    Abstract:
    Current emotion recognition methods for electroencephalogram (EEG) signals seldom fuse spatial, temporal and frequency information, and most methods can only extract local EEG features, which limits their ability to capture global correlations. This article proposes an EEG emotion recognition method based on a 3D-CNN-Transformer mechanism (3D-CTM) model with multi-domain information fusion. The method first designs a 3D feature structure based on the characteristics of EEG signals, simultaneously fusing the spatial, temporal, and frequency information of EEG signals. A convolutional neural network module is then used to learn the deep features for multi-domain information fusion, followed by a Transformer self-attention module that extracts the global correlations within the feature information. Finally, global average pooling is used to integrate the feature information for classification. Experimental results show that the 3D-CTM model achieves an average accuracy of 96.36% on the SEED dataset for three-class classification and 87.44% on the SEED-Ⅳ dataset for four-class classification, which effectively improves the emotion recognition accuracy.
    60  Multi-branch Collaborative Segmentation Model for Multi-modal Cardiac Imaging
    XIAO Rui SHAO Wei
    2025, 40(4):887-900. DOI: 10.16337/j.1004-9037.2025.04.004
    [Abstract](375) [HTML](495) [PDF 2.50 M](587)
    Abstract:
    Precise structural segmentation of the heart is important for the adjunctive diagnosis of cardiovascular disease and accurate preoperative evaluation. Images of different modalities differ significantly in spatial distribution and semantic expression, but existing methods mostly use single-branch network structures, which cannot fully integrate multi-modal information and generalize poorly in multi-modal tasks. To address this problem, this paper proposes a multi-branch collaborative segmentation network, the multi-modal collaborative network (MCNet), which fuses the state space model Mamba with a convolutional model. The network is mainly composed of three modules: a dual-branch feature extractor based on Mamba and convolutional neural networks (CNNs), a dynamic feature fusion module, and a Mamba decoder. The two branches of the feature extractor focus on extracting global semantic features and local detail features, respectively, and the dynamic feature fusion module, based on the mixture of experts (MoE) mechanism, adjusts the weights of multiple fusion paths according to the input image, thus integrating the Mamba global features and the CNN local features dynamically. The proposed method is evaluated on the cardiac MRI dataset ACDC and the ultrasound dataset CAMUS. On the ACDC dataset, which has clear boundaries, the average Dice and intersection over union (IoU) values reach 0.845 and 0.779, respectively. On the CAMUS dataset, which has blurred boundaries, the average Dice and IoU values reach 0.883 and 0.796, respectively. Both results outperform current mainstream methods. Additionally, ablation experiments further validate the effectiveness of each module.
MCNet uses the MoE mechanism to dynamically adjust the fusion weights between global and local features in real time, enhancing structural detail integrity while maintaining global perception, thereby providing an efficient and robust solution for multi-modal cardiac image segmentation.
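    For readers unfamiliar with the two overlap metrics reported above, the following is a minimal sketch of how Dice and IoU are usually computed for a pair of binary masks. It is not taken from the paper: the function name `dice_and_iou`, the flat 0/1 mask representation, and the convention for two empty masks are assumptions made for illustration only.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; not the evaluation code of the paper.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example: two 4-pixel masks that share one foreground pixel.
pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
dice, iou = dice_and_iou(pred, target)
print(dice, iou)
```

    For a single pair of masks the two metrics are linked by Dice = 2·IoU / (1 + IoU); this identity does not generally hold for averages over many cases.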
    61  Heart Sound Classification Using Bi-LSTM and Self-attention Mechanism
    LU Guanming LI Qijian LU Junhe QI Jirong ZHAO Yuhang WANG Yang WEI Jinsheng
    2025, 40(2):456-468. DOI: 10.16337/j.1004-9037.2025.02.014
    [Abstract](495) [HTML](393) [PDF 1.48 M](434)
    Abstract:
    Heart sound auscultation is an effective diagnostic method for early screening of heart disease. To improve the performance of abnormal heart sound detection, this paper proposes a heart sound classification algorithm based on a bi-directional long short-term memory (Bi-LSTM) network and a self-attention (SA) mechanism. Firstly, the heart sound signal is partitioned into frames, and Mel-frequency cepstral coefficient (MFCC) features are extracted from each frame. Next, the MFCC feature sequence is input into the Bi-LSTM network to extract the temporal contextual features of the heart sound signal. Then, the weights of the features output by the Bi-LSTM network at each time step are dynamically adjusted through the self-attention mechanism, yielding more discriminative heart sound features that are conducive to classification. Finally, a Softmax classifier is used to classify heart sounds as normal or abnormal. The proposed algorithm is evaluated using 10-fold cross-validation on the heart sound dataset provided by the PhysioNet/CinC Challenge 2016, and achieves a sensitivity of 0.942 5, a specificity of 0.943 7, a precision of 0.836 7, an F1 score of 0.886 5, and an accuracy of 0.943 4, which are superior to typical comparative algorithms. Experimental results show that the proposed algorithm can effectively detect abnormal heart sounds without the need for heart sound segmentation, and has potential clinical application prospects.
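    The reported figures can be cross-checked with a short calculation. The sketch below assumes that the value 0.836 7 is the precision and that sensitivity plays the role of recall; under that reading, the harmonic mean of the two reproduces the reported F1 score. The function name `f1_score` is a local helper, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Values quoted in the abstract: 0.836 7 read as precision (an assumption),
# sensitivity 0.942 5 used as recall.
f1 = f1_score(0.8367, 0.9425)
print(round(f1, 4))
```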
    62  Operator Distraction Detection for UAV Ground Monitoring Missions Based on Uncalibrated Eye Tracker
    XU Tianze SUN Qianru ZHANG Daoqiang CHEN Fang
    2025, 40(4):1055-1064. DOI: 10.16337/j.1004-9037.2025.04.018
    [Abstract](218) [HTML](242) [PDF 3.45 M](505)
    Abstract:
    In unmanned aerial vehicle (UAV) ground monitoring tasks, operators often endure long, monotonous waits and are prone to mistakes caused by distraction. This paper analyzes the effects of calibration on eye movement signals and attempts to evaluate operator distraction using an uncalibrated eye tracker. Firstly, a collaborative search and supervision task involving multiple UAVs is simulated, and an eye movement dataset containing 22 subjects is constructed. Then, an eye movement velocity vector time sequence diagram method that is independent of specific coordinate positions is proposed to visualize and qualitatively analyze the uncalibrated eye movement signals, and eye movement behavior detection is carried out based on double-mean clustering. Finally, the feasibility of using an uncalibrated eye tracker for distraction state detection is preliminarily verified by correlation analysis and by classification with common classifiers.
    63  High-Resolution Diffusion Magnetic Resonance Imaging Techniques in Ex-vivo Mouse Brain
    REN Baoxing FENG Yanqiu ZHANG Xinyuan
    2025, 40(4):901-911. DOI: 10.16337/j.1004-9037.2025.04.005
    [Abstract](308) [HTML](348) [PDF 3.57 M](473)
    Abstract:
    In ex-vivo high-resolution diffusion magnetic resonance studies, the conventional diffusion-weighted spin-echo (DWI-SE) pulse sequence struggles to meet large-sample requirements because of its long scan time. The multi-shot diffusion-weighted echo-planar imaging (MS-DWI-EPI) sequence, which combines echo-planar imaging (EPI) readout with k-space segmented acquisition, not only significantly improves scanning efficiency but also effectively reduces the image aberration and distortion problems common to single-shot EPI. However, the microstructure resolution ability of MS-DWI-EPI in ex-vivo samples still lacks systematic validation. In this study, we perform high-resolution diffusion imaging of ex-vivo mouse brains using a 3D DWI-SE sequence and a 3D MS-DWI-EPI sequence, and evaluate the differences between the two sequences in signal-to-noise ratio, diffusion tensor imaging (DTI) parameter estimation, and tractography performance. Experimental results show that, for the same spatial and angular resolution of acquisition, the scanning time of the MS-DWI-EPI sequence is nearly 50% shorter while the signal-to-noise ratio of its raw b0 images is about three times higher than that of the DWI-SE sequence. In critical anatomical regions such as the corpus callosum and hippocampus, MS-DWI-EPI not only enhances the structural contrast of DTI images, but also improves the tractography. The sequence achieves a good balance between imaging efficiency and quality, providing a more efficient diffusion-weighted imaging protocol for high-throughput microstructural studies.
    64  A Parameter-Sharing Multi-feature Map Interaction Model for EEG Classification
    BI Yingzhou LIU Shanrui HUO Leigang GAN Qiujing LI Yongyu
    2025, 40(4):950-961. DOI: 10.16337/j.1004-9037.2025.04.009
    [Abstract](301) [HTML](327) [PDF 1.50 M](538)
    Abstract:
    Electroencephalography (EEG) signal classification plays a crucial role in emotion recognition and brain-computer interface (BCI) applications. This paper proposes a parameter-sharing cross-map token attention (CMTA) model for intra- and inter-feature map interaction. Firstly, a spatial-temporal convolutional neural network (STCNN) is used to process EEG data, generating multiple EEG feature maps. Each feature map is treated as a token and fed into a parameter-sharing multi-modal module MT, which integrates a multi-layer perceptron (MLP) and a Transformer. The MLP captures intra-feature map interactions, while the Transformer enables information exchange between feature maps, thereby extracting richer features. Finally, an adaptive classifier (Adapt-Classifier) consisting of one-dimensional adaptive pooling and a fully connected layer is used to perform EEG classification. Experimental results show that the proposed method achieves a classification accuracy of 98.86% and a Kappa value of 0.982 9 on the SEED dataset for emotion recognition, an accuracy of 81.20% and a Kappa value of 0.748 4 on the BCI Competition IV Dataset 2a for motor imagery classification, and an accuracy of 86.55% and a Kappa value of 0.735 2 on the BCI Competition IV Dataset 2b. These results demonstrate the superior performance of the proposed method in EEG classification tasks and highlight its broad applicability across different EEG datasets.
    65  Incomplete Multimodal Brain Tumor Segmentation Method Based on the Combination of U-Net and Transformer
    TANG Zhanjun JIAN Hong WANG Jian
    2025, 40(4):934-949. DOI: 10.16337/j.1004-9037.2025.04.008
    [Abstract](372) [HTML](403) [PDF 4.07 M](567)
    Abstract:
    Given inherent variations among patients, discrepancies in imaging protocols, and potential data corruption, existing brain tumor segmentation methods based on magnetic resonance imaging (MRI) are often challenged by the issue of missing modality data, resulting in low segmentation accuracy. To address this, an innovative incomplete multimodal brain tumor segmentation method based on the combination of U-Net and Transformer (IM TransNet) is proposed. Firstly, a modality-specific encoder is developed for four distinct MRI modalities to enhance the model’s ability to capture unique characteristics of each modality. Secondly, a dual-attention Transformer module is embedded within the U-Net to mitigate the issue of incomplete information arising from missing modalities, thus alleviating the limitations imposed by long-range context interactions and spatial dependencies within the U-Net framework. Additionally, a skip-cross attention mechanism is incorporated into the U-Net’s skip connections to dynamically focus on features from various hierarchical levels and modalities, effectively facilitating feature fusion and reconstruction even in the presence of missing modalities. Furthermore, an auxiliary decoding module is devised to counteract the training imbalance induced by missing modalities, ensuring that the model can consistently and effectively segment brain tumors across diverse subsets of incomplete modalities. Finally, the model’s performance is validated on the publicly accessible BRATS dataset. Experimental results indicate that the proposed model attains average Dice scores of 63.19%, 76.42%, and 86.16% for enhancing tumor, tumor core, and whole tumor, respectively, highlighting its superiority and robustness in handling incomplete multimodal data. This approach offers a viable technical solution for accurate, efficient, and reliable brain tumor segmentation in clinical practice.
    66  A Review of Machine Learning for Brain Imaging Genomic Analysis
    WANG Meiling LIU Qingshan ZHANG Daoqiang
    2025, 40(4):869-886. DOI: 10.16337/j.1004-9037.2025.04.003
    [Abstract](415) [HTML](526) [PDF 2.35 M](628)
    Abstract:
    Brain imaging genomics is a burgeoning domain within data science, where an integrated analytical approach is applied to brain imaging and genomics data, frequently in conjunction with other biomarker, clinical, and environmental datasets. This strategy is employed to glean fresh insights into the phenotypic, genetic, and molecular features of the brain, along with their effects on both typical and atypical brain function and behavior. In light of the escalating significance of machine learning in biomedicine and the swiftly expanding corpus of literature in brain imaging genomics, this paper presents a current and exhaustive review of machine learning methodologies tailored for brain imaging genomics. Firstly, the related background and fundamental work in imaging genomics are reviewed. Then, we summarize the main idea and modelling in genetic-imaging association studies based on multivariate machine learning and present methods for joint association analysis and outcome prediction. Finally, this paper discusses some prospects for future work.
    67  Emotional EEG Recognition Using Spatial Connectivity Features and Residual Convolutional Neural Network
    ZHANG Xuejun FU Congwei
    2025, 40(4):1046-1054. DOI: 10.16337/j.1004-9037.2025.04.017
    [Abstract](312) [HTML](275) [PDF 2.09 M](538)
    Abstract:
    As an objective and direct source of information, the electroencephalogram (EEG) is widely used in emotion recognition. To extract the information implicit in the spatial connectivity features of EEG signals, this paper proposes an emotion recognition method based on a spatial connectivity features and residual convolutional neural network (SCF-RCNN) model. In this method, the Pearson correlation coefficient (PCC), phase-locking value (PLV) and mutual information (MI) are extracted from the preprocessed EEG signals as spatial connectivity features, and a convolutional neural network model containing two residual modules is used to extract emotional information. Experimental results on the SEED dataset show that the connectivity matrix constructed from PLV is more closely related to EEG emotion, with an average accuracy of 93.38% and a standard deviation of 3.35%. Compared with traditional algorithms, SCF-RCNN performs better in emotion classification tasks, showing its application potential in the field of emotion recognition.
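    As a rough illustration of the PLV connectivity feature mentioned above, the sketch below computes the phase-locking value between two channels from their instantaneous phase sequences, using the common definition PLV = |mean(exp(i·(φa − φb)))|. It assumes the phases have already been extracted (for example with a Hilbert transform, which is omitted here); the function name and the toy phase sequences are illustrative and do not come from the paper.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV between two equally long sequences of instantaneous phases (radians)."""
    if len(phases_a) != len(phases_b) or len(phases_a) == 0:
        raise ValueError("phase sequences must be non-empty and equally long")
    # Average the unit phasors of the phase differences, then take the magnitude.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


n = 8
channel_a = [2 * math.pi * k / n for k in range(n)]

# Constant phase lag between the channels: PLV is close to 1.
channel_b = [p - 0.7 for p in channel_a]
plv_locked = phase_locking_value(channel_a, channel_b)

# Phase differences spread evenly around the circle: PLV is close to 0.
plv_spread = phase_locking_value(channel_a, [0.0] * n)
print(plv_locked, plv_spread)
```

    Evaluating this quantity for every pair of channels yields a symmetric matrix of the kind that can be fed to a convolutional network as a connectivity feature.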
    68  Lightweight Human Pose Estimation Algorithm Based on Partial Channel Encoding
    XU Xinzhi HE Hong
    2025, 40(6):1625-1636. DOI: 10.16337/j.1004-9037.2025.06.019
    [Abstract](164) [HTML](122) [PDF 3.35 M](416)
    Abstract:
    To address the high computational complexity and large number of parameters of current pose estimation models, this paper proposes a lightweight pose estimation algorithm. Firstly, a partial channel encoding (PCE) module is introduced in the feature extraction process, which extracts the local and global features of the image separately by combining the advantages of a convolutional neural network and a visual encoder. Then, weighted feature fusion is introduced into multi-scale feature fusion to strengthen the multi-scale fusion ability of the model and avoid the accuracy loss caused by making the model lightweight. Finally, in regression prediction, the detection head is shared between the human detection and classification parts to improve the recognition efficiency of the model in the pose estimation task. Experimental results show that, compared with the baseline model, the proposed model reduces the number of parameters by 27% and the amount of computation by 18%, and increases the accuracy by 0.2%. It not only maintains recognition accuracy but also makes the detection algorithm lightweight, providing an effective means of achieving real-time, accurate pose estimation.
    69  EEG-TCNet for Motor Imagery Classification Based on Nonnegative Matrix Factorization
    ZHANG Xuejun SHI Baoming
    2025, 40(5):1361-1370. DOI: 10.16337/j.1004-9037.2025.05.020
    [Abstract](197) [HTML](170) [PDF 32.37 K](564)
    Abstract:
    Deep learning approaches to motor imagery classification using electroencephalogram (EEG) signals often fail to explore inter-channel correlations or to fully exploit frequency, temporal, and spatial information. To address these limitations, this study proposes a classification method named NTEEGNet, which combines nonnegative matrix factorization (NMF) with a temporal convolutional network (TCN) and the compact convolutional neural network EEGNet to enhance motor imagery classification performance with a relatively small number of parameters. The NMF component of the model effectively extracts channel features and fully utilizes frequency, temporal, and spatial information. Additionally, the network’s receptive field grows exponentially with the depth of the TCN, leading to stronger feature extraction capabilities with fewer parameters. Experimental results on the BCI Competition Ⅳ 2a dataset demonstrate that NTEEGNet achieves a classification accuracy of 83.99%, an improvement of 6.64% over EEG-TCNet.
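    The exponential growth of the receptive field can be made concrete with a small calculation. The sketch below assumes the commonly used TCN layout in which each residual block contains two dilated causal convolutions and the dilation doubles from one block to the next; whether NTEEGNet uses exactly this layout is not stated in the abstract, and the kernel size of 4 in the example is purely illustrative.

```python
def tcn_receptive_field(kernel_size, num_blocks, convs_per_block=2):
    """Receptive field (in time steps) of a stack of dilated causal convolutions.

    Assumes block b (0-based) uses dilation 2**b, a common TCN design choice.
    """
    receptive_field = 1
    for block in range(num_blocks):
        dilation = 2 ** block
        # Each convolution widens the field by (kernel_size - 1) * dilation.
        receptive_field += convs_per_block * (kernel_size - 1) * dilation
    return receptive_field


# Receptive field for 1 to 4 residual blocks with an illustrative kernel size of 4.
fields = [tcn_receptive_field(4, blocks) for blocks in range(1, 5)]
print(fields)
```

    Under these assumptions the result matches the closed form 1 + 2(K − 1)(2^L − 1) for kernel size K and L blocks, so each additional block roughly doubles the temporal context while adding only a fixed number of convolution weights.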
    70  Alzheimer’s Disease Classification Based on 3D Multi-modal Convolutional Network and Cross-Modal Feature Integration
    ZHU Houyuan ZHENG Lele SHANG Hao ZANG Xuefeng WU Shaoqi ZHOU Guangchao SUN Jiande QIAO Jianping
    2025, 40(4):912-921. DOI: 10.16337/j.1004-9037.2025.04.006
    [Abstract](371) [HTML](362) [PDF 1.51 M](551)
    Abstract:
    Multi-modal neuroimaging technology provides crucial technical support for the early and precise diagnosis of Alzheimer’s disease (AD). However, due to the inherent heterogeneity in imaging principles and feature representations across different neuroimaging modalities, the fusion of inter-modal information poses significant challenges. To address this issue, this study proposes a multi-modal fusion network (MFN) based on a 3D ResNet architecture for the early auxiliary diagnosis of AD. The proposed method first employs a 3D ResNet to separately extract feature representations from T1- and T2-weighted magnetic resonance images. Subsequently, an innovative cross-modal feature integration module (CFIM) is designed to overcome the limitations of direct concatenation. CFIM adopts a hierarchical fusion strategy, consisting of a global information fusion module, a local feature learning module and a key factor module. Finally, the fused multi-modal features are fed into a fully connected neural network for classification. Compared to early concatenation (fixed-weight fusion) and late fusion (shallow aggregation), this strategy more effectively identifies disease-relevant diagnostic features. Experiments conducted on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database demonstrate that the proposed method achieves higher accuracy and superior performance in AD classification tasks compared to existing approaches. Ablation studies further validate the effectiveness of each module, offering new technical insights for multi-modal neuroimaging analysis.
    71  Multi-modal Medical Entity Recognition Based on Multi-scale Attention and Graph Neural Networks
    HAN Pu LIU Senling CHEN Wenqi
    2025, 40(4):922-933. DOI: 10.16337/j.1004-9037.2025.04.007
    [Abstract](310) [HTML](293) [PDF 1.38 M](490)
    Abstract:
    With the rapid development of information technology, multi-modal data such as Chinese texts and images in the medical and health field have grown explosively. Multi-modal medical entity recognition (MMER) is a key step in multi-modal information extraction and has attracted great attention recently. To address image detail loss and insufficient text semantic understanding in MMER tasks, this paper proposes a novel MMER model based on multi-scale attention and dependency parsing graph convolution (MADPG). The model introduces a ResNet-based multi-scale attention mechanism to extract visual features fused across different spatial scales and to reduce the loss of important details in medical images, thereby enhancing the image feature representation and complementing the semantic information of the text. Then, the dependency syntactic structure is used to construct a graph neural network that captures the complex grammatical dependencies between words in medical texts, enriching the semantic expression of the text and promoting the deep integration of image and text features. Experiments show that the F1 value of the proposed model reaches 95.12% on a multi-modal Chinese medical dataset, a significant improvement over mainstream single- and multi-modal entity recognition models.
    72  Segmentation Methods for Diffusion Magnetic Resonance Imaging Tractography: A Survey
    ZHANG Wei LI Yijie WU Ye CHEN Huafu ZHANG Fan
    2025, 40(4):846-868. DOI: 10.16337/j.1004-9037.2025.04.002
    [Abstract](392) [HTML](621) [PDF 1.81 M](712)
    Abstract:
    Diffusion magnetic resonance imaging (dMRI), as an advanced medical imaging technique, enables the reconstruction of white matter connectivity in the living brain at the macroscopic level. This technology provides an important tool for the quantitative description of brain structural connectivity and allows for quantitative analysis using connectivity or microstructural indices. Over the past two decades, the use of dMRI tractography to study brain connectivity has become a major direction in neuroimaging research. Tract segmentation is key to defining different quantitative regions in the analysis of brain connectivity. It enables the identification of white matter pathways that are meaningful for quantifying brain structural connections and supports quantitative comparisons of white matter pathways across subjects. This paper reviews tract segmentation methods and categorizes them into two major types based on their technical approaches: One type targets specific anatomical fiber bundles, focusing on tracts with clearly defined structures (such as the arcuate fasciculus and corticospinal tract), and is suitable for task-oriented analysis and clinical navigation; the other type involves whole-brain tract segmentation methods, emphasizing data-driven or atlas-guided structural parcellation for the construction of large-scale structural connectivity networks and the implementation of whole-brain hierarchical analyses. In addition, this paper discusses the trade-offs of various methods in terms of applicability, accuracy, reproducibility, and computational cost. Although automated segmentation techniques have made significant progress in recent years, current methods still struggle to balance accuracy, generalizability and efficiency, and challenges remain in anatomical consistency, methodological standardization, and result interpretability. 
Data-driven deep learning methods have been rapidly developing in the field of tract segmentation, showing promising performance and holding potential for significant breakthroughs in the aforementioned areas.
    73  A Review of Development and Future Directions of Medical Foundation Models
    QIAN Bo LI Fujiang ZHENG Changle ZHANG Daoqiang
    2025, 40(3):562-584. DOI: 10.16337/j.1004-9037.2025.03.002
    [Abstract](1275) [HTML](1327) [PDF 4.44 M](741)
    Abstract:
    Medical foundation models represent a significant application of large-scale pre-trained model technology in the healthcare domain and have become a key research focus in intelligent medical assistance. By leveraging pretraining on vast amounts of medical data, these models exhibit critical capabilities such as cross-task transfer, multimodal understanding, and complex reasoning, overcoming several limitations of traditional neural networks in medical applications. With these capabilities, medical foundation models are reshaping the implementation of core tasks such as assisted diagnosis, clinical report generation, and medical image analysis. They hold profound implications for achieving general intelligence in healthcare. Based on this, this paper provides a comprehensive review of the current state and future trends of medical foundation models. First, it reviews the development of medical AI models in the context of rapid advancements in artificial intelligence. Then, it highlights research progress of large models in medical subfields such as pathology, ophthalmology, and neurological disorders. Finally, it discusses the challenges currently faced by medical foundation models and explores their future development directions.