• Volume 40, Issue 2, 2025 Table of Contents
    • Research Review on Low-Altitude Visual Datasets for Unmanned Aerial Vehicles

      2025, 40(2):274-302. DOI: 10.16337/j.1004-9037.2025.02.002


      Abstract:Driven by the cross-domain synergy of unmanned aerial vehicle (UAV) technology and artificial intelligence, and supported by national low-altitude economic policies and pilot reforms for airspace opening, low-altitude visual perception has played a significant role in smart cities, inspection, rescue, and other applications. High-quality low-altitude visual data serve as a crucial foundational resource in the field of low-altitude intelligent perception, and the release and application of public datasets have been pivotal in advancing low-altitude perception technologies. Despite the proposal of numerous datasets for low-altitude visual perception, systematic organization and analysis of these datasets remain inadequate. To address this issue, this paper conducts a comprehensive survey of publicly released low-altitude UAV vision-related datasets over the past 11 years, categorizes and explores them based on different data characteristics and application scenarios, and selects representative datasets for detailed analysis. This review covers multiple domains, including single-UAV perception, multi-UAV cooperative perception, multi-task perception, multi-source perception, complex environmental characteristics, and UAV embodied intelligence. To facilitate researchers’ understanding and use, the paper summarizes the basic information of all datasets in graphical form and systematically analyzes their development trends from two main dimensions: (1) metadata analysis, including dataset size distribution, scenario distribution, and supported task types; and (2) basic information analysis, involving total image and video counts, target category distribution, and annotation instance numbers. The analysis fully demonstrates the significant progress in the quality of low-altitude visual perception datasets. 
Meanwhile, it points out that, despite the initial formation of a systematic framework for low-altitude data, issues such as the imbalance between cost and efficiency in low-altitude data annotation, insufficient reusability of multi-source data, inadequate coverage of extreme environments, and fragmented embodied intelligence data still exist. Finally, this paper offers outlooks on the future development of low-altitude datasets.

    • End-to-End Video Compression Technology and Its Application in Unmanned Aerial Vehicles

      2025, 40(2):303-319. DOI: 10.16337/j.1004-9037.2025.02.003


      Abstract:The field of multimedia visual representation and transmission is undergoing profound transformation, with end-to-end optimized intelligent video coding technologies serving as the driving force. The compression of emerging video content represented by unmanned aerial vehicle (UAV) videos has further stimulated the development of core technologies and innovation in application scenarios. Focusing on end-to-end video coding technology and its initial exploration in UAV video coding, this study proposes a hierarchical bi-directional reference structure-based video coding method that addresses the shortcomings of existing models in motion representation efficiency and predictive coding accuracy. The targeted design introduces a parameter-shared motion codec, a bi-directional scaled motion representation method, and credible motion modeling technology, significantly improving the rate-distortion performance of UAV video compression and outperforming traditional video coding standards such as H.266/VVC. This work provides novel insights for the advancement of key intelligent video coding technologies and their practical applications, demonstrating promising potential for future deployment in UAV visual perception and related domains.
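A hierarchical bi-directional reference structure can be pictured as a binary recursion over a group of pictures (GOP): the two anchor frames are coded first, and every remaining frame is coded midway between its two nearest already-coded frames, which act as its forward and backward references. The sketch below illustrates this general structure and the resulting coding order; it is not the paper's exact configuration.

```python
def hierarchical_order(gop_size):
    """Coding order for one GOP with hierarchical bi-directional references.

    Frame 0 and frame `gop_size` are coded first; every other frame is then
    coded midway between its two nearest already-coded frames, which serve
    as its forward and backward references.
    """
    order = [0, gop_size]          # anchor frames are coded first

    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)          # B-frame referencing frames lo and hi
        split(lo, mid)
        split(mid, hi)

    split(0, gop_size)
    return order

# For a GOP of 8, frames are coded in the order 0, 8, 4, 2, 1, 3, 6, 5, 7
print(hierarchical_order(8))
```

Display order is recovered simply by sorting; the depth of each frame in the recursion determines its reference distance and typical quantization level.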

    • Low Bit Rate Generative Drone Video Compression

      2025, 40(2):320-333. DOI: 10.16337/j.1004-9037.2025.02.004


      Abstract:In complex environments across air, space, land, and sea, the massive volume of video data exerts tremendous pressure on limited transmission bandwidth and storage devices. Therefore, improving the coding efficiency of video compression technologies under low bit rate conditions becomes crucial. In recent years, deep learning-based video compression algorithms have made significant progress, yet due to issues such as model design flaws, mismatches between optimization objectives and perceptual quality, and biases in training data distributions, the visual perception quality at extremely low bit rates has been compromised. Generative encoding effectively improves the texture and structure restoration ability at low bit rates through data distribution learning, alleviating the problem of blur artifacts in deep video compression. However, two major bottlenecks remain in existing research: first, temporal correlation modeling is insufficient and inter-frame feature correlation is missing; second, the lack of a dynamic bit allocation mechanism makes it difficult to achieve adaptive extraction of key information. Therefore, this article proposes a video encoding algorithm based on a conditional guided diffusion model for video compression (CGDM-VC), aiming to improve the perceptual quality of videos under low bit rate conditions while enhancing inter-frame feature modeling capabilities and preserving key information. Specifically, the algorithm designs an implicit inter-frame alignment strategy, utilizing a diffusion model to capture potential inter-frame features and reduce the computational complexity of estimating explicit motion information. Meanwhile, the designed adaptive spatio-temporal importance-aware coder can dynamically allocate bit rates to optimize the generation quality of key regions. 
Furthermore, a perceptual loss function is introduced, combined with the learned perceptual image patch similarity (LPIPS) constraint, to improve the visual fidelity of the reconstructed frames. Experimental results demonstrate that, compared to algorithms such as deep contextual video compression (DCVC), the proposed method achieves an average LPIPS reduction of 36.49% under low bit rate conditions (<0.1 BPP), showing richer texture details and more natural visual effects.
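The bit rate figure quoted above is bits per pixel (BPP): total compressed bits divided by the total number of pixels across all frames. A quick sketch (the resolution and file size below are illustrative, not taken from the paper):

```python
def bits_per_pixel(compressed_bytes, width, height, num_frames):
    """Average bit rate of a compressed video in bits per pixel (BPP)."""
    return compressed_bytes * 8 / (width * height * num_frames)

# e.g. 100 frames of 1920x1080 video compressed into 2.5 MB
bpp = bits_per_pixel(2.5e6, 1920, 1080, 100)
print(f"{bpp:.4f} BPP")  # 0.0965 BPP, inside the paper's <0.1 BPP regime
```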

    • Object Detection Algorithm for UAV Maritime Rescue Based on Dynamic Progressive Fusion

      2025, 40(2):334-348. DOI: 10.16337/j.1004-9037.2025.02.005


      Abstract:Unmanned aerial vehicle (UAV) object detection plays a crucial role in maritime rescue missions. However, the varying perspectives and altitudes inherent in UAV aerial photography lead to multi-scale variations in objects such as individuals and vessels. Additionally, the glare resulting from sunlight reflecting off the sea surface can cause false detection issues. To address these challenges and meet the lightweight requirements of real-time object detection algorithms for UAVs, this paper proposes a lightweight UAV maritime rescue object detection algorithm based on dynamic progressive fusion (DPF-YOLO), using YOLOv8n as the baseline network. Firstly, we introduce a lightweight redundant information extraction module (RIEM) that reduces redundant information in feature maps, highlighting key features to mitigate false detections caused by glare. Secondly, we propose a dynamic multi-scale feature extraction module (DMFEM) that dynamically adjusts the receptive field to accommodate objects of varying scales, enhancing multi-scale feature representation capabilities. Finally, by integrating the DMFEM module, we develop a dynamic progressive fusion network (DPFNet). This network employs a progressive fusion structure to reduce semantic differences between non-adjacent layers with objects of different scales, thereby improving multi-scale feature fusion. DPF-YOLO is designed with a P2, P3 and P4 detection-layer structure to accommodate the object sizes in maritime rescue scenarios, enrich multi-scale information, and enhance feature extraction for small objects. Experimental results on the SeaDronesSee v2 dataset demonstrate that DPF-YOLO achieves a detection accuracy of mAP0.5 = 72.2% with only 1.19 M parameters. Compared to the baseline network YOLOv8n, DPF-YOLO reduces the number of parameters by 60.5%, increases the recall rate by 12.4%, and improves precision by 8.2%. 
The generalization experimental results on the VisDrone dataset demonstrate that DPF-YOLO possesses excellent generalization capabilities.
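The mAP0.5 figure above counts a detection as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal IoU sketch for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```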

    • Infrared Small Target Detection Based on Low-Rank Tensor Subspace Learning

      2025, 40(2):349-364. DOI: 10.16337/j.1004-9037.2025.02.006


      Abstract:The infrared target detection system is one of the effective technical means for reliably detecting and identifying high-value targets under the conditions of background radiation and other interferences, and it is widely used in various fields. Infrared weak target detection, as an important part of the system, remains a challenging key core technology at present. In this paper, a method based on low-rank tensor subspace learning is proposed, which preserves the structural integrity of the infrared image while considering the consistency of the sequences in the spatio-temporal continuum. The spatio-temporal tensor block model is obtained through a spatio-temporal sliding window, and the infrared tensor dictionary model is constructed under different scenes using a multi-subspace learning strategy. Finally, an optimization algorithm is used to solve the proposed infrared tensor objective function to obtain the low-rank background and sparse target tensors, and the infrared weak targets of interest are detected by reconstructing the image. Experimental results show that the method outperforms other existing detection algorithms for target detection in complex-background environments with high-reflection-induced false alarms and combined strong interference scenarios.
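Low-rank background / sparse target splits of this kind are typically recovered by alternating a low-rank update for the background with an element-wise soft-threshold that isolates the sparse targets. The sketch below shows only that thresholding step, as a generic illustration of the idea rather than the paper's actual optimization algorithm:

```python
import numpy as np

def soft_threshold(x, lam):
    """Element-wise soft-thresholding: the proximal operator of lam*||x||_1.

    In low-rank-plus-sparse decompositions this step shrinks the residual
    (frame minus estimated background) toward zero, keeping only entries
    large enough to be candidate small targets.
    """
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

residual = np.array([[0.05, -0.02, 0.90],
                     [0.01,  0.80, -0.03]])
print(soft_threshold(residual, 0.1))  # only the two bright entries survive
```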

    • Composite-Cost-Based Fast Light-Field 3D Imaging Method for Handling Spatial Occlusions

      2025, 40(2):365-373. DOI: 10.16337/j.1004-9037.2025.02.007


      Abstract:Light field cameras, with their multi-dimensional imaging capabilities and minimal resource allocation, expand the exploration boundaries of imaging applications in unstructured air-ground-sea environments. The process of light field imaging is susceptible to occlusion and noise, and may produce unreliable depth estimation. This paper proposes a fast, spatial-occlusion-oriented light-field depth estimation method, analyzes in depth the main factors affecting the accuracy of depth estimation, and establishes the optimal light-field fast filtering architecture for different spatial occlusion modes. Then a highly integrated composite cost is constructed using single-bit features of pixel points to achieve depth image refinement and occlusion optimization. The experiments demonstrate that the computational efficiency of this method is significantly better than that of Markov random field methods, and it can reduce the MSE by 51.3%; the reliability of the depth estimation algorithm is improved at a lower operational cost, and this method is expected to provide strong support for the application of light-field imaging technology in complex scenes.
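The abstract does not spell out its single-bit pixel features, but a classic example of the idea is the census transform, where each neighbour contributes one bit recording whether it is darker than the centre pixel, and costs are compared by Hamming distance. The sketch below is that generic construction, offered only as an analogy (assumptions: 3x3 window, darker-than-centre convention):

```python
import numpy as np

def census_3x3(img):
    """Census transform: each pixel becomes an 8-bit code of
    neighbour-vs-centre comparisons (bit = 1 if neighbour < centre).
    Border pixels are left as 0 for simplicity."""
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            bits = 0
            for dy, dx in offsets:
                bits = (bits << 1) | int(img[y + dy, x + dx] < img[y, x])
            codes[y, x] = bits
    return codes

def hamming_cost(a, b):
    """Matching cost between two census codes: number of differing bits."""
    return bin(int(a) ^ int(b)).count("1")

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print(census_3x3(img)[1, 1], hamming_cost(240, 15))  # 240 8
```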

    • Degradation Information-Guided Underwater Light Field Image Enhancement and Angular Reconstruction

      2025, 40(2):374-383. DOI: 10.16337/j.1004-9037.2025.02.008


      Abstract:Unlike traditional 2D RGB imaging, 4D light field imaging captures the scene from multiple angular views and carries its own geometric information. This feature is expected to help solve the problems of underwater imaging. We propose a degradation information-guided underwater 4D light field image enhancement and angular reconstruction network based on the angular properties of 4D light field images. The network learns the degradation information of underwater images from different angular views after downsampling. It converts the degradation information into a convolution kernel to be passed to the original-size underwater light field image, realizing efficient exchange of degradation information between underwater images of different angular views. By fully using the degradation information and spatial-angular information of the underwater light field image, the network proposed in this paper can better complete the image enhancement and angular reconstruction of the underwater light field. Meanwhile, this paper proposes the spatial-angular aggregation convolution for the light field characteristics, which efficiently learns the correlation of texture information between different views by calculating the gradient difference between the centre pixel and other view pixels. The effectiveness of the network design is fully verified through quantitative experiments as well as qualitative experiments.

    • Fine-Grained Image Recognition Method Based on Attention and Multi-scale Ensemble Learning

      2025, 40(2):384-400. DOI: 10.16337/j.1004-9037.2025.02.009


      Abstract:Fine-grained image recognition (FGIR) is an important research topic in the field of computer vision. Its main goal is to distinguish subclasses with high similarity in appearance under the same category. This paper focuses on the research of weakly-supervised fine-grained image recognition technology. Given the problems of insufficient use of features of fine-grained images and difficulty in mining discriminative regions in FGIR research, the attention and multi-scale ensemble-learning based network (AMEN) is proposed. This method introduces a progressive learning network, which uses the strategy of ensemble learning to construct multi-scale base-classifiers in parallel based on three levels of output features of a deep neural network, and uses the label smoothing method to carry out progressive training for the multi-scale base-classifiers, so as to greatly improve the utilization of low-level features. At the same time, efficient dual channel attention is used to impose channel weights on features, so that the network can independently select features at the channel level, thereby improving the utilization of channels with high information correlation. This method also introduces a self-attention region proposal network, which prompts the model to gradually locate the more discriminative region by constructing a circular feedback mechanism, and fuses the feature information of the complete image and the discriminative region in the final classification module. Experimental results show that the recognition accuracy of AMEN on three fine-grained image datasets, CUB-200-2011, FGVC Aircraft and Stanford Cars, reaches the advanced level of the field.
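Label smoothing, used above to train the multi-scale base-classifiers, softens one-hot targets so that a small mass ε is spread uniformly over all classes, discouraging over-confident predictions. A minimal sketch:

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """One-hot targets softened by label smoothing: the true class keeps
    1 - eps of the mass, and eps is spread uniformly over all classes."""
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1.0 - eps) + eps / num_classes

print(smooth_labels(np.array([2]), 4, eps=0.1))
# [[0.025 0.025 0.925 0.025]]
```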

    • Context-Aware Image Restoration Based on Fused Semantic Information

      2025, 40(2):401-416. DOI: 10.16337/j.1004-9037.2025.02.010


      Abstract:In recent years, generative adversarial networks have been widely used in the field of image restoration and have achieved good results. However, current methods do not consider the problems of blurred structures and textures in high-resolution images (512×512), which mainly come from the lack of effective feature information. To address this problem, this paper proposes a generative adversarial network that combines image features with semantic information. A context-aware image restoration model built mainly on semantic information is proposed: it adaptively fuses semantic information with image features, replaces traditional convolution with adaptive convolution, and adds a multi-scale context aggregation module after the decoder to capture long-distance information for contextual inference. Experiments are conducted on the Places2, CelebA-HQ, Paris Street View, and Openlogo datasets, whose results show that the proposed method improves in terms of L1 loss, peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) in comparison with existing methods.
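PSNR, one of the metrics reported above, is the log-scaled ratio of the squared peak value to the mean squared error between the restored image and its reference. A compact sketch:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a reference image and a
    restored image; higher means the restoration is closer to the reference."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((2, 2))
noisy = np.full((2, 2), 16.0)       # every pixel off by 16 -> MSE = 256
print(round(psnr(ref, noisy), 2))   # 24.05 dB
```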

    • Time-Series Decomposition and Attention Graph Neural Network Based Traffic Forecasting

      2025, 40(2):417-430. DOI: 10.16337/j.1004-9037.2025.02.011


      Abstract:In order to address the challenges of accurately capturing the spatial-temporal dependency, dynamic information and spatial heterogeneity information in traffic forecasting, we propose a time-series decomposition and attention graph neural network (TDAGNN) based traffic forecasting model. Specifically, the model first adopts the dual time-series decomposition convolutional neural network (DTDCNN) to extract temporal dependency from traffic data. Secondly, the multi-head interactive attention (MIA) network is introduced to capture spatial heterogeneity and dynamicity from traffic data via the interactivity between the original features and the local augmentation features. Thirdly, the self-scaling dynamic diffusion graph neural network (SDDGNN) is introduced for capturing the spatial dependence and dynamicity from the traffic data. Finally, extensive experiments are carried out on several datasets. Experimental results demonstrate that the average MAE, RMSE and MAPE of the proposed model can be improved by up to 14.64, 23.68 and 9.41%, respectively, compared to other classic algorithms, proving its high prediction accuracy.
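MAE, RMSE and MAPE, the three error metrics quoted above, are the standard point-forecast measures; for reference, a compact implementation:

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """MAE, RMSE and MAPE (%) between observed and predicted series."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    return mae, rmse, mape

print(forecast_errors([100, 200, 400], [110, 190, 380]))
# (13.33..., 14.14..., 6.66...)
```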

    • A Simplified Implementation Method of CSI Feedback Transformer Network Based on Data Clustering

      2025, 40(2):431-445. DOI: 10.16337/j.1004-9037.2025.02.012


      Abstract:In order to cope with the increasing overhead of channel state information (CSI) feedback in massive multiple-input multiple-output (MIMO) systems, deep learning-based CSI feedback networks (such as Transformer) have received extensive attention and become very promising intelligent transmission technologies. To this end, this paper proposes a simplification method of CSI feedback Transformer network based on data clustering, which uses clustering-based approximate matrix multiplication (AMM) to reduce the computational complexity of the Transformer network in the feedback process. In this paper, we focus on the computation of the fully connected layer in the Transformer network (equivalent to matrix multiplication), adopt the simplification methods such as product quantization (PQ) and MADDNESS, analyze their influence on the computational complexity and system performance, and optimize the algorithm according to the characteristics of neural network data. Simulation results show that the performance of the CSI feedback network based on the MADDNESS method is close to that of the exact matrix multiplication method with an appropriate parameter adjustment, and the computational complexity can be greatly reduced.
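The clustering-based approximate matrix multiplication above follows the product-quantization pattern: rows of one operand are encoded, per subspace, as nearest-codeword indices, and the codeword-times-matrix products are precomputed into lookup tables, so the fully connected layer's matmul collapses into table lookups plus adds. A minimal sketch assuming the codebooks are already learned (in practice by k-means; MADDNESS further replaces the nearest-neighbour encoder with learned hash trees):

```python
import numpy as np

def pq_matmul(A, B, codebooks):
    """Approximate A @ B via product quantization.

    The columns of A are split into len(codebooks) contiguous subspaces.
    codebooks[s] has shape (K, d_s); each row of A is encoded, per
    subspace, as the index of its nearest codeword, and the partial
    products codeword @ B_sub are precomputed into lookup tables.
    """
    out = np.zeros((A.shape[0], B.shape[1]))
    start = 0
    for C in codebooks:                      # one codebook per subspace
        d = C.shape[1]
        A_sub, B_sub = A[:, start:start + d], B[start:start + d, :]
        # encode: nearest codeword index for every row of A_sub
        dists = ((A_sub[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        codes = dists.argmin(1)
        table = C @ B_sub                    # (K, out_cols) lookup table
        out += table[codes]                  # lookup + accumulate
        start += d
    return out

rng = np.random.default_rng(0)
C0, C1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
# rows of A built exactly from the codewords, so the approximation is exact
A = np.hstack([C0[rng.integers(0, 4, 8)], C1[rng.integers(0, 4, 8)]])
B = rng.normal(size=(6, 5))
print(np.max(np.abs(pq_matmul(A, B, [C0, C1]) - A @ B)))  # ~0 by construction
```

On real weight matrices the rows only approximately match the codewords, and the error depends on the number of centroids K, which is the complexity/accuracy trade-off the paper tunes.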

    • Performance Analysis of Sensing and Communication Probability Fusion System for Rayleigh Channels

      2025, 40(2):446-455. DOI: 10.16337/j.1004-9037.2025.02.013


      Abstract:This article presents a Rayleigh fading channel model of integrated sensing and communication (ISAC), proposes a method of probability fusion after integrated sensing and communication (PF-ISAC), and derives the PF-ISAC channel model. It is theoretically proved that when the sensing signal-to-noise ratio (SNR) approaches infinity, the ISAC model degenerates into an ideal channel state information (CSI) scenario, and when the sensing SNR approaches zero, the ISAC model degenerates into a scenario where the CSI is unknown. The relationship between mutual information and SNR of the PF-ISAC system is given. As the SNR increases, the mutual information in the CSI-unknown scenario gradually approaches the ideal-CSI channel capacity. This article proposes a probability fusion after maximum a posteriori (PF-MAP) detection method and a probability fusion after maximum likelihood (PF-ML) detection method, and compares them with the minimum mean square error (MMSE) estimation-MMSE detection method (MMSE-MMSE). The results show that PF-MAP performs similarly to MMSE-MMSE at low to medium SNRs, while PF-MAP outperforms MMSE-MMSE at high SNRs. We evaluate the error performance of the PF-ISAC system using entropy error (EE). Results show that MMSE-MMSE, PF-MAP and PF-ML still have significant gaps from the theoretically optimal EE. Finally, a scheme for power allocation in the ISAC system is proposed. When the total power is given, the performance of two-stage equal power allocation in the ISAC system is close to optimal.
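The capacity limits discussed above are ergodic quantities of the Rayleigh fading channel; with perfect CSI the ergodic capacity is E[log2(1 + SNR·|h|²)], which is easy to estimate by Monte Carlo. This is the generic textbook quantity, not the paper's PF-ISAC derivation:

```python
import numpy as np

def rayleigh_capacity(snr_db, n=200_000, seed=0):
    """Monte Carlo estimate of the ergodic capacity E[log2(1 + SNR*|h|^2)]
    of a Rayleigh fading channel with perfect CSI, in bit/(s*Hz)."""
    rng = np.random.default_rng(seed)
    h2 = rng.exponential(size=n)   # |h|^2 is Exp(1) for unit-power Rayleigh h
    snr = 10.0 ** (snr_db / 10.0)
    return np.mean(np.log2(1.0 + snr * h2))

# capacity grows with SNR; at 0 dB it is roughly 0.86 bit/(s*Hz)
print(rayleigh_capacity(0.0), rayleigh_capacity(10.0))
```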

    • Heart Sound Classification Using Bi-LSTM and Self-attention Mechanism

      2025, 40(2):456-468. DOI: 10.16337/j.1004-9037.2025.02.014


      Abstract:Heart sound auscultation is an effective diagnostic method for early screening of heart disease. In order to improve the performance of abnormal heart sound detection, this paper proposes a heart sound classification algorithm based on a bi-directional long short-term memory (Bi-LSTM) network and the self-attention mechanism (SA). Firstly, the heart sound signal is partitioned into frames, and the Mel-frequency cepstral coefficient (MFCC) features are extracted from each frame of the heart sound signal. Next, the MFCC feature sequence is input into the Bi-LSTM network to extract the temporal contextual features of the heart sound signals. Then, the weights of the features output from the Bi-LSTM network at each time step are dynamically adjusted through the self-attention mechanism, and more discriminative heart sound features that are conducive to classification are obtained. Finally, the Softmax classifier is used to classify normal/abnormal heart sounds. The proposed algorithm is evaluated using 10-fold cross-validation on the heart sound dataset provided by the PhysioNet/CinC Challenge 2016, and achieves a sensitivity of 0.942 5, specificity of 0.943 7, precision of 0.836 7, F1 score of 0.886 5, and accuracy of 0.943 4, which are superior to those of typical comparative algorithms. Experimental results show that the proposed algorithm can effectively detect abnormal heart sounds without the need for heart sound segmentation, and has potential clinical application prospects.
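The self-attention step above scores each Bi-LSTM time step, turns the scores into softmax weights, and collapses the sequence into one weighted feature for the classifier. The sketch below uses a simplified single-scoring-vector variant (the paper's attention may use full query-key-value projections; the scoring vector here is a random stand-in):

```python
import numpy as np

def attention_pool(frames, w):
    """Self-attention pooling over per-frame features.

    frames: (T, d) sequence, e.g. Bi-LSTM outputs for T heart-sound frames
    w:      (d,) scoring vector (learnable in a real model)
    Each time step gets a softmax weight from its score, and the sequence
    is collapsed into one weighted-sum feature for the classifier.
    """
    scores = frames @ w                          # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over time
    return weights, weights @ frames             # (T,), (d,)

rng = np.random.default_rng(1)
frames = rng.normal(size=(5, 8))
weights, pooled = attention_pool(frames, rng.normal(size=8))
print(weights.sum(), pooled.shape)  # weights sum to 1; pooled feature is (8,)
```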

    • Multi-scale Crossed Algorithm for Ultrasound Medical Image Segmentation Based on MSC-LSAM

      2025, 40(2):469-484. DOI: 10.16337/j.1004-9037.2025.02.015


      Abstract:Stroke is one of the leading causes of death and disability around the world. Carotid artery stenosis (CAS) and cardiac lesions are important contributing factors to ischemic stroke, and ultrasound imaging has shown great potential in diagnosing ischemic stroke caused by CAS and cardiac lesions. But ultrasound images present significant segmentation challenges due to noise and blurred boundaries. To address this issue, the MSC-LSAM algorithm, a multi-scale crossed dual-encoder network for ultrasound image segmentation, is proposed. It aims to achieve rapid and accurate segmentation of carotid and cardiac cavities, assisting physicians in disease diagnosis. In MSC-LSAM, the encoder part parallels a segment anything model (SAM) vision encoder and a UNet encoder, while the decoder part utilizes a UNet decoder. In the SAM image encoder, we freeze the pretrained SAM image encoder and introduce efficient adapter blocks in the Transformer layers, referred to as learnable SAM (LSAM). LSAM maintains learning capability and high generalization ability while having a low number of parameters. In the global UNet network, we incorporate multi-scale cross-axial attention (MCA) blocks to achieve cross-fusion of multi-scale features between different axes, effectively enhancing edge segmentation capabilities and suppressing model overfitting. Following the parallel encoders, an efficient channel attention (ECA) block is added to enable integration of multi-scale features from the dual encoders, reducing incorrect segmentation caused by feature-level mismatches. MSC-LSAM achieves good performance on both the publicly available cardiac ultrasound dataset CAMUS and the self-constructed carotid artery ultrasound dataset CAUS. Average dice similarity coefficients (DSCs) for the segmentation of the two-chamber (2CH) and four-chamber (4CH) datasets in CAMUS reach 0.927 and 0.934, respectively, while the average DSC for the CAUS dataset reaches 0.917. 
MSC-LSAM achieves good segmentation accuracy in carotid lumen and cardiac chamber ultrasound image segmentation, surpassing mainstream segmentation algorithms, and shows promising application prospects.
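The DSC values above use the Dice similarity coefficient, 2|P∩T|/(|P|+|T|), between the predicted and ground-truth masks. A minimal binary-mask sketch:

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice similarity coefficient between two binary masks:
    2|P ∩ T| / (|P| + |T|); 1.0 means a perfect overlap."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(a, b))  # 2*2 / (3+3) ≈ 0.667
```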

    • Hybrid Neural Network Model Based on Sparrow Search Algorithm and Its Application in Blood Glucose Prediction

      2025, 40(2):485-500. DOI: 10.16337/j.1004-9037.2025.02.016


      Abstract:Diabetes is one of the most common diseases that endanger human health. Effective management and control of blood glucose is very important for patients. Traditional blood glucose prediction models are mostly single deep learning models, which suffer from insufficient accuracy or low efficiency, restricting their effectiveness in practical application. Therefore, a hybrid neural network model based on the sparrow search algorithm is proposed and applied to blood glucose prediction. The proposed model combines a temporal convolutional network (TCN) and a gated recurrent unit (GRU), and it is a sequential neural network trained in an end-to-end manner to predict blood glucose based on a patient’s blood glucose level history. In order to ensure the generalization ability of the model, datasets from two different sources are used for validation. Firstly, the feature sampling frequency of the multi-source timing monitoring data is set at a time interval of 5 min, the data are smoothed and standardized, and the timing pattern and dependency characteristics are captured by the TCN. Then, by constructing a GRU model based on the attention mechanism (GRU-Attention), features are further extracted and modeled. Finally, the sparrow search algorithm is used to optimize the hyperparameters of the TCN and GRU-Attention models to realize the blood glucose prediction model. To prove the validity of the proposed model, its prediction results are compared with those of other models, including LSTM, ARIMA, RNN, etc. The results show that the proposed TCN and GRU-Attention models based on the sparrow search algorithm perform well in the task of predicting blood glucose values. On the two datasets, the root mean square error (RMSE) and mean absolute error (MAE) are 0.552 and 0.402, and 0.531 and 0.388, respectively, which are all better than those of the other models.
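With readings every 5 min, the series must first be cut into supervised windows: a fixed history of past readings as input and a reading some steps ahead as the target (a 6-step horizon corresponds to 30 minutes). A sketch with illustrative window sizes; the paper's exact history and horizon are assumptions here:

```python
import numpy as np

def make_windows(series, history, horizon):
    """Turn a glucose series sampled every 5 min into supervised pairs:
    each input is `history` consecutive readings and the target is the
    reading `horizon` steps ahead (horizon=6 -> 30-minute-ahead)."""
    X, y = [], []
    for i in range(len(series) - history - horizon + 1):
        X.append(series[i:i + history])
        y.append(series[i + history + horizon - 1])
    return np.array(X), np.array(y)

glucose = np.arange(20, dtype=float)        # stand-in readings
X, y = make_windows(glucose, history=12, horizon=6)
print(X.shape, y[0])  # (3, 12) 17.0
```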

    • Multi-granularity Intuitionistic Fuzzy Three-Way Decision Model Based on Regret Theory

      2025, 40(2):501-516. DOI: 10.16337/j.1004-9037.2025.02.017


      Abstract:When solving complex multi-granularity decision-making problems, traditional three-way decision models based on functions or relationships tend to ignore the multi-granularity characteristics of information in reality and the limitations of the cognitive ability of decision makers. To address this, this paper proposes a multi-granularity intuitionistic fuzzy three-way decision model based on regret theory. Firstly, to deal with the complex calculation problems of intuitionistic fuzzy numbers, the θ operator is integrated with intuitionistic fuzzy rough sets, a multi-granularity upper and lower approximation operator for intuitionistic fuzzy rough sets is proposed, and the corresponding three-way decision rules are given. Secondly, to integrate the cognitive characteristics of the decision maker into the decision-making process, a multi-granularity three-way sorting method under optimistic and pessimistic strategies is constructed based on regret theory. Finally, the effectiveness of the proposed model is verified by a group decision example of the competency assessment of “Chinese+vocational” talents in international Chinese education, which provides a new method for uncertain decision-making problems that integrate decision-maker risk preferences in an intuitionistic fuzzy environment.
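Regret theory augments an alternative's intrinsic utility with a regret-rejoice term comparing it against the best forgone alternative; a common functional form is R(Δu) = 1 − exp(−δΔu) with regret-aversion coefficient δ. The sketch below uses that standard form purely as an illustration; the paper's operators over intuitionistic fuzzy numbers are more involved:

```python
import math

def perceived_utility(u, u_best, delta=0.3):
    """Regret-theoretic perceived utility: the intrinsic utility u plus a
    regret/rejoice term R(u - u_best) = 1 - exp(-delta*(u - u_best)).
    Since u <= u_best, the term is <= 0: falling short of the best
    forgone alternative is penalized, and more so for larger delta."""
    return u + (1.0 - math.exp(-delta * (u - u_best)))

best = 0.9
for u in (0.9, 0.7, 0.5):
    print(round(perceived_utility(u, best), 4))
# the further u falls below the best alternative, the larger the penalty
```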

    • Event Detection Method Based on Type-Semantic Prompts

      2025, 40(2):517-529. DOI: 10.16337/j.1004-9037.2025.02.018


      Abstract:Addressing the issue of error propagation in existing research that decomposes the event detection process into the two sequential stages of trigger recognition and trigger classification, this paper proposes an event detection method based on type-semantic prompts. This method uses event types as prompt information to guide the model in extracting triggers corresponding to the event types from event text. It enables the parallel execution of trigger recognition and classification tasks, thereby mitigating the issue of error propagation between tasks. Firstly, the cross-attention mechanism is utilized to process the representation of the event text and the prompt template consisting of event types, obtaining a fused prompt representation that integrates the event text information. Then, the cosine similarity between the prompt representation and the event context representation is computed to obtain the probability distribution of the trigger positions corresponding to the event types in the event text. Finally, the position of the trigger corresponding to the event type is determined based on the probability distribution of positions, thus achieving parallel execution of the trigger recognition and classification tasks. Experimental results on the ACE2005 and MACCROBAT-EE datasets demonstrate that the proposed method improves the F1 score on event detection tasks.
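The scoring step above reduces to cosine similarity between the fused prompt representation and each token's context representation, softmax-normalized into a distribution over trigger positions. A minimal sketch with random stand-in vectors (real representations would come from the cross-attention encoder):

```python
import numpy as np

def trigger_position_probs(prompt_vec, token_vecs):
    """Probability that each token position is the trigger for a given
    event-type prompt: cosine similarity between the fused prompt
    representation and every token representation, softmax-normalized."""
    p = prompt_vec / np.linalg.norm(prompt_vec)
    t = token_vecs / np.linalg.norm(token_vecs, axis=1, keepdims=True)
    sims = t @ p                      # cosine similarity per position
    e = np.exp(sims - sims.max())
    return e / e.sum()                # softmax over positions

rng = np.random.default_rng(0)
probs = trigger_position_probs(rng.normal(size=16), rng.normal(size=(7, 16)))
print(probs.argmax(), probs.sum())  # predicted trigger position; probs sum to 1
```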

    • Prediction Method of EV Charging Demand Power Based on Reinforcement Learning and Variable Weight Combination Model

      2025, 40(2):530-544. DOI: 10.16337/j.1004-9037.2025.02.019


      Abstract:When an electric vehicle (EV) is connected to a charging pile, it is very important to accurately predict the charging demand power of the battery pack of the EV to prevent the battery pack from being overcharged. Due to the complexity of the physical model of the battery pack, it is usually difficult to build a power prediction method based on it, and its real-time performance is not high. In addition, the prediction accuracy of a single prediction model is low. Aiming at the above problems and combining charging data with machine learning, this paper proposes an EV charging demand power prediction method based on reinforcement learning (RL) and a variable weight combination model. Firstly, based on the traditional grey wolf optimization (GWO) algorithm, chaos mapping and an elite reverse learning strategy are combined to improve the quality of the initial population, and the dynamic weight strategy of reinforcement learning is used to update the individual positions of the grey wolves to optimize the parameters of the least squares support vector machine (LSSVM) algorithm. Then, the weights of the extreme learning machine prediction model and the improved LSSVM prediction model are reasonably distributed by the variable weight combination method based on time-varying weight distribution, so as to overcome the shortcomings of the single prediction model method. Finally, the actual charging data of electric vehicles are used to verify the proposed prediction algorithm. Compared with the other three traditional methods, the prediction accuracy of the new method is improved by 4.75%, 3.84% and 0.38%, respectively.
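The variable weight combination step can be sketched as weights inversely proportional to each model's recent error, renormalized at every time step so the lately-more-accurate model dominates. This is an illustration of the general time-varying scheme, not the paper's exact weighting rule:

```python
import numpy as np

def variable_weights(err_a, err_b):
    """Time-varying combination weights for two forecasters, inversely
    proportional to each model's recent absolute error, so the model
    that has been more accurate lately gets more weight."""
    inv = np.array([1.0 / (err_a + 1e-12), 1.0 / (err_b + 1e-12)])
    return inv / inv.sum()

def combine(pred_a, pred_b, err_a, err_b):
    """Weighted combination of the two models' current predictions."""
    w = variable_weights(err_a, err_b)
    return w[0] * pred_a + w[1] * pred_b

# model A was recently twice as accurate -> it gets 2/3 of the weight
print(variable_weights(0.5, 1.0))       # [0.6667 0.3333]
print(combine(6.0, 6.3, 0.5, 1.0))      # 6.1
```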

    • Lightweight Object Detection Algorithm for Electric Meter Calibration Line

      2025, 40(2):545-560. DOI: 10.16337/j.1004-9037.2025.02.020


      Abstract:In the industrial production line scenario, visual target detection technology has become a new focus of intelligent production, providing decision-making information for fault detection and elimination. In response to issues such as target occlusion and the dense arrangement of small targets in the electric energy meter production line inspection scenario, this study proposes a lightweight target detection algorithm based on YOLOv8n. By introducing the O-GELAN module, the algorithm achieves richer feature levels while maintaining low computational complexity. The neck architecture of feature collection, fusion, and distribution, along with the channel position attention mechanism, enables cross-layer feature fusion. Furthermore, a re-parameterized convolutional detection head is employed to enhance detection efficiency. Experiments conducted on field-collected production line data demonstrate that the improved model’s mAP(0.5) and mAP(0.5:0.95) reach 0.994 and 0.828, respectively, with a detection speed of up to 111.5 frames per second. This meets the high-precision and real-time requirements of industrial scenarios and can provide auxiliary decision-making for fault elimination.
