ZHANG Wei, LI Yijie, WU Ye, CHEN Huafu, ZHANG Fan
2025, 40(4):846-868. DOI: 10.16337/j.1004-9037.2025.04.002
Abstract:Diffusion magnetic resonance imaging (dMRI), as an advanced medical imaging technique, enables the reconstruction of white matter connectivity in the living brain at the macroscopic level. This technology provides an important tool for the quantitative description of brain structural connectivity and allows for quantitative analysis using connectivity or microstructural indices. Over the past two decades, the use of dMRI tractography to study brain connectivity has become a major direction in neuroimaging research. Tract segmentation is key to defining different quantitative regions in the analysis of brain connectivity. It enables the identification of white matter pathways that are meaningful for quantifying brain structural connections and supports quantitative comparisons of white matter pathways across subjects. This paper reviews tract segmentation methods and categorizes them into two major types based on their technical approaches: One type targets specific anatomical fiber bundles, focusing on tracts with clearly defined structures (such as the arcuate fasciculus and corticospinal tract), and is suitable for task-oriented analysis and clinical navigation; the other type involves whole-brain tract segmentation methods, emphasizing data-driven or atlas-guided structural parcellation for the construction of large-scale structural connectivity networks and the implementation of whole-brain hierarchical analyses. In addition, this paper discusses the trade-offs of various methods in terms of applicability, accuracy, reproducibility, and computational cost. Although automated segmentation techniques have made significant progress in recent years, current methods still struggle to balance accuracy, generalizability and efficiency, and challenges remain in anatomical consistency, methodological standardization, and result interpretability. Data-driven deep learning methods have been rapidly developing in the field of tract segmentation, showing promising performance and holding potential for significant breakthroughs in the aforementioned areas.
WANG Meiling, LIU Qingshan, ZHANG Daoqiang
2025, 40(4):869-886. DOI: 10.16337/j.1004-9037.2025.04.003
Abstract:Brain imaging genomics is a burgeoning domain within data science, where an integrated analytical approach is applied to brain imaging and genomics data, frequently in conjunction with other biomarker, clinical, and environmental datasets. This strategy is employed to glean fresh insights into the phenotypic, genetic, and molecular features of the brain, along with their effects on both typical and atypical brain function and behavior. In light of the escalating significance of machine learning in biomedicine and the swiftly expanding corpus of literature in brain imaging genomics, this paper presents an up-to-date and comprehensive review of machine learning methodologies tailored for brain imaging genomics. Firstly, the related background and fundamental work in imaging genomics are reviewed. Then, the main ideas and modeling strategies of multivariate machine learning-based genetic-imaging association studies are summarized, and methods for joint association analysis and outcome prediction are presented. Finally, this paper discusses some prospects for future work.
2025, 40(4):887-900. DOI: 10.16337/j.1004-9037.2025.04.004
Abstract:Precise structural segmentation of the heart is important for the adjunctive diagnosis of cardiovascular disease and accurate preoperative evaluation. There are significant differences between images of different modalities in terms of spatial distribution and semantic expression, but existing methods mostly use single-branch network structures, which are unable to fully integrate multi-modal information and lack generalization capabilities in multi-modal tasks. To address this problem, this paper proposes a multi-branch collaborative segmentation network, i.e., the multi-modal collaborative network (MCNet), which fuses the Mamba state space model with a convolutional model. The network is mainly composed of three modules: A dual-branch feature extractor based on Mamba and convolutional neural networks, a dynamic feature fusion module, and a Mamba decoder. The dual branches of the feature extractor focus on extracting global semantic and local detail features, respectively, and the dynamic feature fusion module dynamically adjusts the weights of multiple fusion paths according to the input image, thus realizing dynamic feature integration across branches. The proposed method is extensively evaluated on the cardiac MRI dataset ACDC and the cardiac ultrasound dataset CAMUS. Experimental results show that the proposed method, through a dynamic feature fusion module based on the mixture of experts (MoE) mechanism, dynamically adjusts the fusion weights of Mamba global features and CNN local features. On the ACDC dataset with clear boundaries, the average Dice and intersection over union (IoU) values reach 0.845 and 0.779, respectively. On the CAMUS dataset with blurred boundaries, the average Dice and IoU values reach 0.883 and 0.796, respectively, both outperforming current mainstream methods. Additionally, ablation experiments further validate the effectiveness of each module. MCNet uses the MoE mechanism to dynamically adjust the fusion weights between global and local features in real time, enhancing structural detail integrity while maintaining global perception, thereby providing an efficient and robust solution for multi-modal cardiac image segmentation.
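As a rough illustration of the dynamic fusion idea described above (not the authors' MCNet code; the module name, tensor shapes, and gate design are hypothetical, and PyTorch is assumed), an MoE-style gate can weight a global-context branch against a local-detail branch per image:

    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        """MoE-style fusion of a global-context branch and a local-detail branch (sketch)."""
        def __init__(self, channels: int):
            super().__init__()
            # Gating network: predicts one weight per "expert" branch for each image
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),     # squeeze spatial dimensions
                nn.Flatten(),
                nn.Linear(2 * channels, 2),  # one logit per branch
                nn.Softmax(dim=1),
            )

        def forward(self, global_feat, local_feat):
            w = self.gate(torch.cat([global_feat, local_feat], dim=1))  # (B, 2)
            w_g = w[:, 0].view(-1, 1, 1, 1)
            w_l = w[:, 1].view(-1, 1, 1, 1)
            return w_g * global_feat + w_l * local_feat

    # Dummy feature maps standing in for a Mamba-style branch and a CNN branch
    fusion = GatedFusion(channels=64)
    fused = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
    print(fused.shape)  # torch.Size([2, 64, 32, 32])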
REN Baoxing, FENG Yanqiu, ZHANG Xinyuan
2025, 40(4):901-911. DOI: 10.16337/j.1004-9037.2025.04.005
Abstract:In ex-vivo high-resolution diffusion magnetic resonance studies, the conventional diffusion-weighted spin-echo (DWI-SE) pulse sequence struggles to meet the demands of large-sample studies because of its long scan time. The multi-shot diffusion-weighted echo-planar imaging (MS-DWI-EPI) sequence, which combines echo-planar imaging (EPI) readout with segmented k-space acquisition, not only significantly improves scanning efficiency but also effectively reduces the image artifact and distortion problems common to single-shot EPI. However, the ability of MS-DWI-EPI to resolve microstructure in ex-vivo samples still lacks systematic validation. In this study, we perform high-resolution diffusion imaging of ex-vivo mouse brains using a 3D DWI-SE sequence and a 3D MS-DWI-EPI sequence, and evaluate the differences between the two sequences in signal-to-noise ratio, diffusion tensor imaging (DTI) parameter estimation, and tractography performance. Experimental results show that, at the same spatial and angular resolution, the scan time of the MS-DWI-EPI sequence is nearly 50% shorter, while the signal-to-noise ratio of its raw b0 images is about three times higher than that of the DWI-SE sequence. In critical anatomical regions such as the corpus callosum and hippocampus, MS-DWI-EPI not only enhances the structural contrast of DTI images, but also improves tractography results. The sequence achieves a good balance between imaging efficiency and quality, providing a more efficient diffusion-weighted imaging protocol for high-throughput microstructural studies.
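For readers unfamiliar with the DTI parameter estimation compared in this study, the sketch below shows a standard linear least-squares tensor fit for a single voxel using NumPy; the b-values, gradient directions, and signals are synthetic, and this is not the authors' processing pipeline:

    import numpy as np

    def fit_dti_voxel(signals, S0, bvals, bvecs):
        """Linear least-squares diffusion tensor fit for one voxel; returns (FA, MD)."""
        g = np.asarray(bvecs, dtype=float)            # (N, 3) unit gradient directions
        b = np.asarray(bvals, dtype=float)[:, None]   # (N, 1) b-values
        # Design matrix for the six unique tensor elements Dxx, Dyy, Dzz, Dxy, Dxz, Dyz
        B = b * np.column_stack([g[:, 0]**2, g[:, 1]**2, g[:, 2]**2,
                                 2*g[:, 0]*g[:, 1], 2*g[:, 0]*g[:, 2], 2*g[:, 1]*g[:, 2]])
        y = -np.log(np.asarray(signals, dtype=float) / S0)
        d, *_ = np.linalg.lstsq(B, y, rcond=None)
        D = np.array([[d[0], d[3], d[4]],
                      [d[3], d[1], d[5]],
                      [d[4], d[5], d[2]]])
        ev = np.linalg.eigvalsh(D)                    # tensor eigenvalues
        md = ev.mean()
        fa = np.sqrt(1.5 * np.sum((ev - md)**2) / np.sum(ev**2))
        return fa, md

    # Synthetic isotropic voxel: identical signal decay along six directions
    bvals = np.full(6, 1000.0)
    bvecs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                      [0.707, 0.707, 0], [0.707, 0, 0.707], [0, 0.707, 0.707]])
    signals = np.exp(-bvals * 0.7e-3)
    fa, md = fit_dti_voxel(signals, 1.0, bvals, bvecs)
    print(f"FA = {fa:.3f}, MD = {md:.2e} mm^2/s")     # FA close to 0 for an isotropic voxel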
ZHU Houyuan, ZHENG Lele, SHANG Hao, ZANG Xuefeng, WU Shaoqi, ZHOU Guangchao, SUN Jiande, QIAO Jianping
2025, 40(4):912-921. DOI: 10.16337/j.1004-9037.2025.04.006
Abstract:Multi-modal neuroimaging technology provides crucial technical support for the early and precise diagnosis of Alzheimer’s disease (AD). However, due to the inherent heterogeneity in imaging principles and feature representations across different neuroimaging modalities, the fusion of inter-modal information poses significant challenges. To address this issue, this study proposes a multi-modal fusion network (MFN) based on a 3D ResNet architecture for the early auxiliary diagnosis of AD. The proposed method first employs a 3D ResNet to separately extract feature representations from T1- and T2-weighted magnetic resonance images. Subsequently, an innovative cross-modal feature integration module (CFIM) is designed to overcome the limitations of direct concatenation. CFIM adopts a hierarchical fusion strategy consisting of a global information fusion module, a local feature learning module, and a key factor module. Finally, the fused multi-modal features are fed into a fully connected neural network for classification. Compared to early concatenation (fixed-weight fusion) and late fusion (shallow aggregation), this strategy more effectively identifies disease-relevant diagnostic features. Experiments conducted on the Alzheimer’s disease neuroimaging initiative (ADNI) database demonstrate that the proposed method achieves higher accuracy and superior performance in AD classification tasks compared to existing approaches. Ablation studies further validate the effectiveness of each module, offering new technical insights for multi-modal neuroimaging analysis.
HAN Pu, LIU Senling, CHEN Wenqi
2025, 40(4):922-933. DOI: 10.16337/j.1004-9037.2025.04.007
Abstract:With the rapid development of information technology, multi-modal data such as Chinese texts and images in the medical and health field have shown explosive growth. Multi-modal medical entity recognition (MMER) is a key step in multi-modal information extraction and has attracted great attention recently. Aiming at the problems of image detail loss and insufficient text semantic understanding in multi-modal medical entity recognition tasks, this paper proposes a novel MMER model based on multi-scale attention and dependency parsing graph convolution (MADPG). The model introduces a ResNet-based multi-scale attention mechanism to collaboratively extract visual features fused across different spatial scales and to reduce the loss of important details in medical images, thereby enhancing the image feature representation that complements the textual semantic information. Then, the dependency syntactic structure is used to construct a graph neural network that captures the complex grammatical dependencies between words in medical texts, enriching the semantic representation of the text and promoting the deep integration of image and text features. Experiments show that the F1 score of the proposed model reaches 95.12% on a multi-modal Chinese medical dataset, and its performance is significantly improved compared with mainstream single-modal and multi-modal entity recognition models.
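A minimal sketch of a single graph-convolution layer operating over dependency-parse edges (NumPy only; the edge list, dimensions, and weights are placeholders, not the MADPG implementation):

    import numpy as np

    def dependency_gcn_layer(H, edges, W):
        """One GCN layer over a dependency parse.

        H: (n_tokens, d) token features; edges: list of (head, dependent) pairs;
        W: (d, d_out) weight matrix (a fixed array here, learnable in practice)."""
        n = H.shape[0]
        A = np.eye(n)                            # self-loops
        for head, dep in edges:                  # treat dependency arcs as undirected
            A[head, dep] = A[dep, head] = 1.0
        deg_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
        A_hat = deg_inv_sqrt @ A @ deg_inv_sqrt  # symmetric normalization
        return np.maximum(A_hat @ H @ W, 0.0)    # ReLU activation

    # Toy sentence of 4 tokens with dependency arcs 1->0, 1->2, 2->3
    H = np.random.randn(4, 8)
    W = np.random.randn(8, 8)
    print(dependency_gcn_layer(H, [(1, 0), (1, 2), (2, 3)], W).shape)  # (4, 8)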
TANG Zhanjun, JIAN Hong, WANG Jian
2025, 40(4):934-949. DOI: 10.16337/j.1004-9037.2025.04.008
Abstract:Given inherent variations among patients, discrepancies in imaging protocols, and potential data corruption, existing brain tumor segmentation methods based on magnetic resonance imaging (MRI) are often challenged by missing modality data, resulting in low segmentation accuracy. To address this, an innovative incomplete multimodal brain tumor segmentation method based on the combination of U-Net and Transformer (IM TransNet) is proposed. Firstly, a modality-specific encoder is developed for each of the four distinct MRI modalities to enhance the model’s ability to capture the unique characteristics of each modality. Secondly, a dual-attention Transformer module is embedded within the U-Net to mitigate the incomplete information arising from missing modalities, thus alleviating the U-Net framework’s limitations in modeling long-range context interactions and spatial dependencies. Additionally, a skip-cross attention mechanism is incorporated into the U-Net’s skip connections to dynamically focus on features from various hierarchical levels and modalities, effectively facilitating feature fusion and reconstruction even in the presence of missing modalities. Furthermore, an auxiliary decoding module is devised to counteract the training imbalance induced by missing modalities, ensuring that the model can consistently and effectively segment brain tumors across diverse subsets of incomplete modalities. Finally, the model’s performance is validated on the publicly accessible BRATS dataset. Experimental results indicate that the proposed model attains average Dice scores of 63.19%, 76.42%, and 86.16% for enhancing tumor, tumor core, and whole tumor, respectively, highlighting its superiority and robustness in handling incomplete multimodal data. This approach offers a viable technical solution for accurate, efficient, and reliable brain tumor segmentation in clinical practice.
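A hedged sketch of the skip-cross attention idea, in which decoder features attend to encoder skip features within a U-Net skip connection (PyTorch assumed; 2D feature maps are used for brevity although the paper works with volumetric MRI, and this is not the IM TransNet module itself):

    import torch
    import torch.nn as nn

    class SkipCrossAttention(nn.Module):
        """Decoder features (queries) attend to encoder skip features (keys/values)."""
        def __init__(self, channels: int, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=heads,
                                              batch_first=True)
            self.norm = nn.LayerNorm(channels)

        def forward(self, decoder_feat, skip_feat):
            b, c, h, w = decoder_feat.shape
            q = decoder_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
            kv = skip_feat.flatten(2).transpose(1, 2)
            out, _ = self.attn(q, kv, kv)
            out = self.norm(out + q)                      # residual connection + norm
            return out.transpose(1, 2).reshape(b, c, h, w)

    # 2D dummy feature maps for brevity; the paper operates on 3D MRI volumes
    m = SkipCrossAttention(channels=32)
    print(m(torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16)).shape)
    # torch.Size([1, 32, 16, 16])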
BI Yingzhou, LIU Shanrui, HUO Leigang, GAN Qiujing, LI Yongyu
2025, 40(4):950-961. DOI: 10.16337/j.1004-9037.2025.04.009
Abstract:Electroencephalography (EEG) signal classification plays a crucial role in emotion recognition and brain-computer interface (BCI) applications. This paper proposes a parameter-sharing cross-map token attention (CMTA) model for intra- and inter-feature map interaction. Firstly, a spatial-temporal convolutional neural network (STCNN) is used to process EEG data, generating multiple EEG feature maps. Each feature map is treated as a token and fed into a parameter-sharing multi-modal module MT, which integrates a multi-layer perceptron (MLP) and a Transformer. The MLP captures intra-feature map interactions, while the Transformer enables information exchange between feature maps, thereby extracting richer features. Finally, an adaptive classifier (Adapt-Classifier) consisting of one-dimensional adaptive pooling and a fully connected layer is used to perform EEG classification. Experimental results show that the proposed method achieves a classification accuracy of 98.86% and a Kappa value of 0.9829 on the SEED dataset for emotion recognition, an accuracy of 81.20% and a Kappa value of 0.7484 on the BCI Competition IV Dataset 2a for motor imagery classification, and an accuracy of 86.55% and a Kappa value of 0.7352 on the BCI Competition IV Dataset 2b. These results demonstrate the superior performance of the proposed method in EEG classification tasks and highlight its broad applicability across different EEG datasets.
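An illustrative sketch of treating feature maps as tokens and combining a shared MLP (intra-map) with a Transformer encoder layer (inter-map); PyTorch is assumed and the dimensions are placeholders, not the CMTA configuration:

    import torch
    import torch.nn as nn

    class MapTokenMixer(nn.Module):
        """Each EEG feature map becomes a token: a shared MLP models intra-map
        structure, a Transformer layer exchanges information across maps."""
        def __init__(self, token_dim: int, n_heads: int = 4):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(token_dim, token_dim), nn.GELU(),
                                     nn.Linear(token_dim, token_dim))
            self.inter = nn.TransformerEncoderLayer(d_model=token_dim, nhead=n_heads,
                                                    batch_first=True)

        def forward(self, tokens):                # tokens: (B, n_maps, token_dim)
            tokens = tokens + self.mlp(tokens)    # intra-map interaction (shared weights)
            return self.inter(tokens)             # inter-map interaction

    # Eight feature maps flattened into 64-dimensional tokens
    mixer = MapTokenMixer(token_dim=64)
    print(mixer(torch.randn(2, 8, 64)).shape)     # torch.Size([2, 8, 64])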
YE Zhongfu, GUO Jiayu, YU Runxiang, HUANG Xinyue
2025, 40(4):962-971. DOI: 10.16337/j.1004-9037.2025.04.010
Abstract:For channel estimation in intelligent reflecting surface (IRS)-assisted communication systems where the line-of-sight link between the user equipment and the base station (BS) is blocked, this paper proposes a neural network based on noise suppression in the latent feature space that achieves accurate channel estimation. The network combines a variational auto-encoder (VAE) and a UNet to reduce the noise in the input signal while performing channel estimation. Firstly, the VAE takes noise-free BS received signals as input, with the objective of minimizing the error between the estimated noise-free BS received signals and their true values, so that the VAE encoder maps the input to a feature vector serving as a latent representation of the clean received signal. Secondly, the VAE part is frozen, and the entire network is trained using noisy BS received signals as input to the UNet, in which the noise-free latent feature vectors learned by the VAE assist the UNet encoder in learning noise-free feature representations. Subsequently, the clean feature representations are fed into the UNet decoder to accomplish channel estimation. Finally, during the estimation phase, only the UNet part is used, which effectively reduces computational complexity. Simulation results demonstrate that the proposed channel estimation method can effectively suppress noise in the feature space and estimate the channel information more accurately with lower time complexity.
SHEN Ruda, HE Wanyuan, XU Yifan
2025, 40(4):972-985. DOI: 10.16337/j.1004-9037.2025.04.011
Abstract:The bike sharing system (BSS) has become a significant component of urban intelligent transportation systems. This paper proposes a campus bike-sharing resource scheduling system based on dynamic perception of the spatio-temporal distribution. To address the issue of sudden inventory changes at bike-sharing stations leading to inventory shortages, the system first models the dynamic changes at stations using the vector autoregressive moving average (VARMA) model to predict future inventory shortage events. Secondly, to resolve the contradiction between scheduling utility and cost in crowdsourced resource scheduling scenarios, it introduces a task assignment method based on a bipartite optimal matching model and specifically optimizes the Hungarian algorithm for efficient task assignment decisions. Simulation results show that the proposed method can effectively improve the system utility of bike-sharing scheduling, reduce the service quality loss caused by inventory shortages at stations, and effectively balance the spatio-temporal distribution of bicycles.
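The bipartite assignment step can be illustrated with SciPy's Hungarian-algorithm implementation; the cost matrix below is a placeholder rather than the paper's utility model:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Toy cost matrix: rows are crowdsourced workers, columns are rebalancing tasks;
    # each entry could combine travel distance and expected utility loss (placeholders).
    cost = np.array([[4.0, 1.5, 3.0],
                     [2.0, 0.5, 5.0],
                     [3.0, 2.0, 2.5]])

    workers, tasks = linear_sum_assignment(cost)  # Hungarian (Kuhn-Munkres) algorithm
    for w, t in zip(workers, tasks):
        print(f"worker {w} -> task {t} (cost {cost[w, t]})")
    print("total cost:", cost[workers, tasks].sum())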
CHEN Zihan, YU Guangzheng, WANG Yewei, LI Zhelin
2025, 40(4):986-996. DOI: 10.16337/j.1004-9037.2025.04.012
Abstract:Over-ear headphones (or around-ear headphones) are acoustic wearable devices in direct contact with the surface of the human body. In addition to the shape and material of the earmuffs, the clamping force applied through the earmuffs directly affects the contact force on the scalp and the noise attenuation performance, thereby influencing the user’s wearing comfort and hearing comfort. To address the challenge of measuring and evaluating contact pressure in headphone products, a testing device is designed to apply an adjustable clamping force to a subject, while the contact pressure exerted on the scalp is measured using a pair of pressure-sensitive films. To analyze acoustic parameters during the wearing process, a pair of miniature microphones is positioned at the ear canal entrances to record and analyze the attenuation of the binaural noise exposure dose (i.e., the noise reduction amount) under different noise environments and various clamping forces. Finally, by incorporating comfort rating scales, the study examines the relationship between the objective parameters, including clamping force, contact force, and noise attenuation, and the subjective comfort perception. Based on the findings, an appropriate range for clamping force design is suggested. The experimental methodology and conclusions of this study provide a reference for the design and evaluation of clamping force in over-ear headphones.
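As a simplified illustration of the noise reduction amount mentioned above, the level difference between open-ear and occluded-ear recordings can be computed as follows (synthetic signals; not the study's measurement procedure):

    import numpy as np

    def noise_reduction_db(open_ear, occluded_ear):
        """Noise reduction as the level difference between open-ear and
        occluded-ear recordings at the ear canal entrance."""
        p_open = np.mean(np.asarray(open_ear) ** 2)        # mean-square sound pressure
        p_occluded = np.mean(np.asarray(occluded_ear) ** 2)
        return 10.0 * np.log10(p_open / p_occluded)

    # Synthetic example: the headphone attenuates the noise amplitude to 30%
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(48000)
    print(f"{noise_reduction_db(noise, 0.3 * noise):.1f} dB")  # about 10.5 dB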
WANG Yuchen, JIANG Chengyang, KONG Huaicong, HAN Lue, LIN Min
2025, 40(4):997-1010. DOI: 10.16337/j.1004-9037.2025.04.013
Abstract:For the multi-user scenarios of the high-altitude platform (HAP)-assisted integrated satellite and aerial network, this paper proposes a novel hybrid wireless transmission scheme on terahertz (THz) and millimeter-wave (mmWave) bands, aiming at providing reliable access for heterogeneous users and solving the problem of capacity constraints of the satellite backbone network. Firstly, according to the relevance and service priorities among users, terrestrial users are divided into two groups, each of which includes a primary user and a secondary user. The secondary user adopts the cognitive radio-based non-orthogonal multiple access (CR-NOMA) technique to control its transmit power without compromising the quality of service (QoS) of the primary user, while the inter-group interference is eliminated by the zero-forcing (ZF) based beamforming technique. Secondly, it is assumed that the channel from each user to HAP experiences mmWave band with Nakagami-m distribution, while the channel from HAP to satellite experiences THz band with
ZHAO Yue, TIAN Wen, DING Xufei, DAI Yuewei
2025, 40(4):1011-1022. DOI: 10.16337/j.1004-9037.2025.04.014
Abstract:As a mobile data collection platform, the unmanned aerial vehicle (UAV) has significant application prospects in the field of wireless sensor networks (WSNs) due to its superior mobility. To solve the problem of high energy consumption in wireless sensor networks, this study proposes a UAV-vehicle collaborative data collection optimization method for scenarios where sensor location information is shared, aiming to minimize sensor energy consumption under non-fixed take-off and landing of the UAV. The method addresses the scheduling of each sensor by introducing a block coordinate descent-based convex optimization algorithm that iteratively solves the mixed-integer non-convex problem arising in the alternating optimization of the UAV trajectory and the sensor wake-up strategy. This approach ensures that data collection requirements are met while minimizing the energy consumption of the sensor nodes (SNs). Simulation results demonstrate that the method can effectively reduce the energy consumption of wireless sensor networks under typical sensor distributions, showing its potential in more complex environments as well as significant adaptability and scalability.
XU Lei, LEI Youyuan, ZHU Jun, ZHOU Jie, SHAO Genfu, ZHANG Jiaming
2025, 40(4):1023-1034. DOI: 10.16337/j.1004-9037.2025.04.015
Abstract:To address the memory consumption issue of the MVSNet reconstruction network, CVP-MVSNet and CasMVSNet reduce memory usage when processing high-resolution images and improve the accuracy of reconstructed point clouds. However, both networks still exhibit significant errors in point cloud completeness. To address this issue, this paper proposes DA-MVSNet, a multi-view 3D reconstruction network based on dilated attention and depth optimal correction. DA-MVSNet uses CasMVSNet as the baseline network, with an additional feature enhancement network that integrates a parallel dilated convolution and attention module and incorporates the concept of depth-wise separable convolutions. This enhancement strengthens the network’s ability to capture global features of the input views, improving point cloud completeness. To further enhance the accuracy of the output depth maps and prevent the feature enhancement network from extracting irrelevant background information, which can degrade the accuracy of the reconstructed point cloud, an optimization correction mechanism based on nonlinear least squares is introduced at the output stage of the network. The results show that DA-MVSNet reduces the accuracy and completeness errors of the reconstructed point cloud by 2.5% and 4.7%, respectively, on the indoor-scene DTU dataset, achieving better overall performance. However, due to the additional feature enhancement network and correction mechanism, the memory and time consumption of DA-MVSNet are slightly higher than those of CVP-MVSNet and CasMVSNet.
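A hedged sketch of a parallel dilated-convolution block built from depth-wise separable convolutions, illustrating the feature-enhancement idea rather than reproducing DA-MVSNet (PyTorch assumed; channel counts and dilation rates are placeholders):

    import torch
    import torch.nn as nn

    class ParallelDilatedBlock(nn.Module):
        """Parallel depth-wise separable convolutions with different dilation rates."""
        def __init__(self, channels: int, dilations=(1, 2, 4)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    # depth-wise convolution; padding=d keeps the spatial size
                    nn.Conv2d(channels, channels, 3, padding=d, dilation=d, groups=channels),
                    nn.Conv2d(channels, channels, 1),   # point-wise channel mixing
                    nn.ReLU(inplace=True),
                ) for d in dilations
            ])
            self.project = nn.Conv2d(channels * len(dilations), channels, 1)

        def forward(self, x):
            return self.project(torch.cat([branch(x) for branch in self.branches], dim=1))

    block = ParallelDilatedBlock(channels=32)
    print(block(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])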
WU Fei, MA Yongheng, DENG Zheying, WANG Yinjie, JI Yimu, JING Xiaoyuan
2025, 40(4):1035-1045. DOI: 10.16337/j.1004-9037.2025.04.016
Abstract:Text-guided editing of real images with only images and target text prompts as input is an extremely challenging problem. Previous approaches based on fine-tuning large pre-trained diffusion models often simply interpolate and combine source and target text features to guide the image generation process, which limits their editing capabilities, while fine-tuning large diffusion models is highly susceptible to overfitting and is time-consuming. In this paper, we propose a text-guided image editing method based on a diffusion model with mapping-fusion embedding (MFE-Diffusion). The method consists of two components: (1) A joint learning framework for the large pre-trained diffusion model and the source text feature vectors, which enables the model to quickly learn to reconstruct the original image. (2) A feature mapping-fusion module, which deeply fuses the feature information of the target text and the original image to generate conditional embeddings that guide the image editing process. Experimental validation on the challenging text-guided image editing benchmark TEdBench shows that the proposed method has advantages in image editing performance.
2025, 40(4):1046-1054. DOI: 10.16337/j.1004-9037.2025.04.017
Abstract:As an objective and direct source of information, electroencephalogram (EEG) signals are widely used in emotion recognition tasks. In order to extract the information implicit in the spatial connectivity features of EEG signals, this paper proposes an emotion recognition method based on spatial connectivity features and a residual convolutional neural network (SCF-RCNN) model. In this method, the Pearson correlation coefficient (PCC), phase-locking value (PLV), and mutual information (MI) are extracted from the preprocessed EEG signals as spatial connectivity features, and a convolutional neural network containing two residual modules is used to extract emotional information. Experimental results on the SEED dataset show that the connectivity matrix constructed from PLV is the most closely related to EEG-based emotion, achieving an average accuracy of 93.38% with a standard deviation of 3.35%. Compared with traditional algorithms, SCF-RCNN performs better in emotion classification tasks, showing its important application potential in the field of emotion recognition.
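A minimal sketch of how the PCC and PLV connectivity matrices can be computed from multi-channel EEG (NumPy/SciPy assumed; the synthetic segment and dimensions are placeholders, not the SEED preprocessing used in the paper):

    import numpy as np
    from scipy.signal import hilbert

    def pcc_matrix(eeg):
        """Pearson correlation between channels; eeg has shape (n_channels, n_samples)."""
        return np.corrcoef(eeg)

    def plv_matrix(eeg):
        """Phase-locking value between channels from instantaneous Hilbert phases."""
        phase = np.angle(hilbert(eeg, axis=1))
        n = eeg.shape[0]
        plv = np.ones((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                diff = phase[i] - phase[j]
                plv[i, j] = plv[j, i] = np.abs(np.mean(np.exp(1j * diff)))
        return plv

    # Synthetic 4-channel, 2-second EEG segment sampled at 200 Hz
    eeg = np.random.randn(4, 400)
    print(pcc_matrix(eeg).shape, plv_matrix(eeg).shape)  # (4, 4) (4, 4)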
XU Tianze, SUN Qianru, ZHANG Daoqiang, CHEN Fang
2025, 40(4):1055-1064. DOI: 10.16337/j.1004-9037.2025.04.018
Abstract:In unmanned aerial vehicle (UAV) ground monitoring tasks, operators are often stuck in long, monotonous waits and are prone to errors caused by distraction. This paper analyzes the effects of calibration on eye movement signals and attempts to evaluate operator distraction using an eye tracker without calibration. Firstly, a collaborative search and supervision task involving multiple UAVs is simulated, and an eye movement dataset of 22 subjects is constructed. Then, an eye movement velocity vector time-sequence diagram method that is independent of specific coordinate positions is proposed to visualize and qualitatively analyze the uncalibrated eye movement signals, and eye movement behavior detection is then carried out based on double-mean clustering. Finally, the feasibility of using an uncalibrated eye tracker for distraction state detection is preliminarily verified through correlation analysis and classification experiments with common classifiers.
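A hedged sketch of velocity-based eye movement event detection with a simple one-dimensional two-means split, in the spirit of the double-mean clustering mentioned above (NumPy only; the gaze trace, sampling rate, and threshold rule are illustrative assumptions, not the paper's algorithm):

    import numpy as np

    def two_means_threshold(speeds, n_iter: int = 20):
        """Split velocity magnitudes into slow (fixation-like) and fast (saccade-like)
        clusters with a 1-D two-means iteration; returns the midpoint threshold."""
        lo, hi = speeds.min(), speeds.max()
        for _ in range(n_iter):
            thr = (lo + hi) / 2.0
            slow, fast = speeds[speeds <= thr], speeds[speeds > thr]
            if len(slow) == 0 or len(fast) == 0:
                break
            lo, hi = slow.mean(), fast.mean()
        return (lo + hi) / 2.0

    # Uncalibrated gaze positions (arbitrary units) sampled at 60 Hz
    rng = np.random.default_rng(0)
    gaze = np.cumsum(rng.standard_normal((600, 2)) * 0.1, axis=0)
    gaze[200:205] += 15.0                                    # inject a saccade-like jump
    speed = np.linalg.norm(np.diff(gaze, axis=0), axis=1) * 60.0
    labels = speed > two_means_threshold(speed)              # True marks saccade samples
    print("saccade samples:", int(labels.sum()))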
XU He, YANG Dandan, LIU Sixing, JI Yimu
2025, 40(4):1065-1081. DOI: 10.16337/j.1004-9037.2025.04.019
Abstract:Diabetes is a common chronic disease, and blood glucose control is very important for its prevention and management. However, the uncertainty of continuous glucose monitoring (CGM) data extraction significantly increases the difficulty of blood glucose prediction. Therefore, this article proposes a new deep learning-based blood glucose concentration prediction model aimed at improving adaptability to sensor-extracted data. In this model, the stacked denoising autoencoder (SDAE) is embedded into the structure of the Transformer encoder to achieve reconstruction, denoising, and feature extraction of the input data. Then, a mixed positional encoding strategy is adopted to replace the original single absolute positional encoding, and a lightweight decoder is introduced into the Transformer model to replace the original structurally complex decoder, aggregate feature information from different levels, and obtain local and global features simultaneously. Finally, by constructing the SDAE-improved Transformer network for parallel training on CGM data sequences, temporal patterns and complex correlations in the data are captured more comprehensively, improving predictive performance. Experimental results show that the model achieves significant performance improvements in blood glucose prediction tasks compared with traditional methods, confirming its effectiveness and robustness in processing CGM data.
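A minimal sketch of one denoising-autoencoder stage of the kind stacked in an SDAE: the input window is corrupted with Gaussian noise and the network is trained to reconstruct the clean signal (PyTorch assumed; dimensions and data are placeholders, not the paper's CGM setup):

    import torch
    import torch.nn as nn

    class DenoisingAE(nn.Module):
        """One denoising-autoencoder stage: corrupts the input with Gaussian noise
        and is trained to reconstruct the clean window."""
        def __init__(self, in_dim: int, hidden_dim: int, noise_std: float = 0.1):
            super().__init__()
            self.noise_std = noise_std
            self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            self.decoder = nn.Linear(hidden_dim, in_dim)

        def forward(self, x):
            noisy = x + self.noise_std * torch.randn_like(x) if self.training else x
            code = self.encoder(noisy)
            return self.decoder(code), code

    # Brief training loop on random 12-step windows standing in for CGM sequences
    model = DenoisingAE(in_dim=12, hidden_dim=8)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(64, 12)
    for _ in range(5):
        reconstruction, _ = model(x)
        loss = nn.functional.mse_loss(reconstruction, x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"reconstruction loss: {loss.item():.4f}")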
2025, 40(4):1082-1095. DOI: 10.16337/j.1004-9037.2025.04.020
Abstract:Magnetic resonance imaging (MRI) plays a crucial role in medical diagnosis, but prolonged scanning times can cause patient discomfort and motion artifacts. Parallel imaging techniques and compressed sensing theory indicate that undersampling k-space data can increase the scanning speed, where parallel MRI accelerates the imaging process by utilizing multiple receiving coils to simultaneously acquire data from multiple channels. Leveraging its powerful feature extraction and pattern recognition capabilities, deep learning demonstrates great potential in undersampled MRI reconstruction. To overcome the limitations of existing techniques (e.g., the need for auto-calibration signals and reconstruction instability), this paper proposes an innovative reconstruction method aimed at efficiently and accurately reconstructing high-quality parallel MRI images from undersampled k-space data. The core of this method is a deep sparse network that unfolds the iterative process of the iterative shrinkage-thresholding algorithm (ISTA) for solving sparse models into a series of trainable layers within a deep neural network. Additionally, this paper introduces an adaptive preprocessing module based on multi-scale feature fusion, which further enhances the sparse representation capability of the network by integrating standard convolutions with heterogeneous convolutional kernels. Experimental results demonstrate that, compared with other advanced methods, the proposed method exhibits superior reconstruction performance across multiple datasets, including higher peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as well as lower high-frequency error norms.
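For reference, the classical ISTA iteration that such unrolled networks turn into trainable layers looks as follows on a generic sparse recovery problem (NumPy only; the measurement matrix and regularization weight are illustrative, not the paper's MRI operator):

    import numpy as np

    def soft_threshold(x, tau):
        """Proximal operator of the l1 norm."""
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def ista(A, y, lam, n_iter: int = 200):
        """ISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1: a gradient step on the
        data-fidelity term followed by soft-thresholding (the iteration that
        unrolled networks turn into trainable layers)."""
        L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = A.T @ (A @ x - y)
            x = soft_threshold(x - grad / L, lam / L)
        return x

    # Recover a 5-sparse vector from 40 noisy random measurements
    rng = np.random.default_rng(1)
    A = rng.standard_normal((40, 100)) / np.sqrt(40)
    x_true = np.zeros(100)
    x_true[rng.choice(100, 5, replace=False)] = rng.standard_normal(5)
    y = A @ x_true + 0.01 * rng.standard_normal(40)
    x_hat = ista(A, y, lam=0.01)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))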
ZHAO Zhijie, KANG Xiao, ZHANG Xuening, WANG Shaohua, LIU Xingbo, NIE Xiushan
2025, 40(4):1096-1106. DOI: 10.16337/j.1004-9037.2025.04.021
Abstract:Batch-based hash learning methods are usually inadequate for real-time online retrieval of large-scale streaming data. Therefore, online hashing has emerged as a promising solution, enabling the learning of hash codes for new data without revisiting old data and adapting hash functions to incoming data. However, several challenges persist, including semantic drift caused by insufficient exploration of inter-class relationships and data forgetting resulting from the limited association between new and old data. To address these challenges, this paper proposes a novel supervised method named online semantic enhancement hashing (OSEH). It designs a triple matrix factorization framework that bridges the gap between original features and one-hot labels, thereby constructing a fine-grained label matrix. Moreover, by seamlessly integrating label embedding and pairwise similarity, the proposed method effectively embeds enriched semantics into the hash learning process, optimizing both the hash codes and the hash functions. Experimental evaluations conducted on benchmark datasets validate the effectiveness of the proposed method.
WANG Hongbin, WANG Weiwei, YANG Songhan, ZHU Yijie, SUN Yi, ZOU Yizhen, CHENG Jianhong
2025, 40(4):1107-1120. DOI: 10.16337/j.1004-9037.2025.04.022
Abstract:To address the issues of low efficiency and high error rates associated with traditional manual inspection of chip electronic components, this paper proposes an automatic detection method and system based on the normalized cross-correlation algorithm. The proposed system integrates the normalized cross-correlation algorithm with a vision-guided four-axis robotic arm and a secondary bottom-view positioning technique to achieve efficient recognition and precise grasping of components. Additionally, an innovative rotational center calibration algorithm is developed to effectively compensate for coaxiality errors in vacuum nozzles. The system is compatible with various types of electronic components, including chip tantalum capacitors, ceramic capacitors in 0402 packages, and SOD-323 surface-mounted diodes, achieving a positioning accuracy of 0.0085 mm. Experimental results demonstrate that identifying 100 components takes only 68 ms, with a 100% accuracy rate for orientation and polarity recognition. The rotational center calibration reduces coaxiality errors from 0.4 mm to 0.008 mm, component damage and rejection rates are reduced to 0.01% and 0.02%, respectively, and the successful placement rate reaches 99.98%. Compared with traditional manual inspection, the system improves detection efficiency by more than five times, providing a high-precision and high-reliability technical solution for the automatic inspection of electronic components and significantly advancing the application of machine vision technology in industrial inspection.
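A pure-NumPy sketch of zero-mean normalized cross-correlation template matching, the core operation named above (OpenCV's cv2.matchTemplate with TM_CCOEFF_NORMED is a production-grade equivalent); the toy arrays stand in for the actual component images:

    import numpy as np

    def ncc_match(image, template):
        """Zero-mean normalized cross-correlation; returns the best (row, col) and score."""
        th, tw = template.shape
        t = template - template.mean()
        t_norm = np.sqrt((t ** 2).sum())
        best_score, best_pos = -1.0, (0, 0)
        for r in range(image.shape[0] - th + 1):
            for c in range(image.shape[1] - tw + 1):
                patch = image[r:r + th, c:c + tw]
                p = patch - patch.mean()
                denom = np.sqrt((p ** 2).sum()) * t_norm
                score = (p * t).sum() / denom if denom > 0 else 0.0
                if score > best_score:
                    best_score, best_pos = score, (r, c)
        return best_pos, best_score

    # Toy "image" containing the template at position (5, 7)
    rng = np.random.default_rng(0)
    image = rng.random((40, 40))
    template = image[5:13, 7:15].copy()
    print(ncc_match(image, template))   # ((5, 7), ~1.0)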