• Issue 2, 2026 Table of Contents
    • Hybrid RF/FSO Transmission Technologies and Development Trends for 6G Space-Air-Ground Integrated Networks

      2026(2):288-302. DOI: 10.16337/j.1004-9037.2026.02.002


      Abstract:To meet the demands for comprehensive three-dimensional coverage and massive connectivity in sixth-generation (6G) communication systems, establishing a space-air-ground integrated network (SAGIN) has become a crucial development direction. However, both standalone radio frequency (RF) and free-space optical (FSO) communication technologies have inherent limitations, making it challenging for either to independently fulfill the future network’s comprehensive requirements for ultra-high speed, ultra-high reliability, and wide-area dynamic access. Against this backdrop, integrating the complementary advantages of RF and FSO communications to build an intelligent and cooperative hybrid RF/FSO transmission network for SAGIN has become a key pathway to overcoming current technological bottlenecks. This paper provides a systematic review of domestic and international research advances in this field, proposing a cognitive software-defined network architecture for SAGIN that integrates optical and RF transmission. It focuses on channel modeling methodologies for RF and FSO links applicable to heterogeneous space-air-ground environments, while conducting an in-depth analysis of core challenges including precision alignment of highly dynamic links, intelligent allocation of heterogeneous resources, and robust transmission under extreme conditions. The paper then elaborates on key enabling technologies such as hybrid RF/FSO beam tracking, adaptive RF/FSO switching, parallel collaborative transmission, and scenario-specific link selection. Future research trends are also outlined, encompassing deep integration of intelligent algorithms, enhancement of cross-domain disturbance-resistant transmission, and holistic system performance optimization. Studies demonstrate that hybrid RF/FSO technology can significantly improve the overall performance of SAGIN. Nevertheless, its path towards large-scale application necessitates further in-depth research on cross-layer coordination mechanisms, dynamic resource management, and system-level performance evaluation.
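      The adaptive RF/FSO switching mentioned in the abstract can be sketched as a simple threshold rule. The toy below is illustrative only, not the scheme from the paper: it assumes an always-available RF backup, and all SNR values and thresholds are hypothetical; hysteresis (two thresholds) is a common way to avoid rapid toggling when the FSO link fluctuates.

```python
# Illustrative hard-switching link selection between an FSO link and an
# RF backup, driven by the estimated FSO SNR. Thresholds and the SNR
# trace are hypothetical values, not taken from the surveyed papers.

def select_link(fso_snr_db, current, up_db=12.0, down_db=8.0):
    """Return 'FSO' or 'RF'. up_db > down_db gives hysteresis, so the
    link does not toggle rapidly around a single threshold."""
    if current == "RF" and fso_snr_db >= up_db:
        return "FSO"          # FSO has recovered: switch back
    if current == "FSO" and fso_snr_db < down_db:
        return "RF"           # FSO degraded (e.g. fog, turbulence): fall back
    return current            # otherwise stay on the current link

# Simulated FSO SNR trace (dB) dipping below and recovering above thresholds
trace = [15, 13, 9, 7, 5, 9, 11, 13]
link = "FSO"
history = []
for snr in trace:
    link = select_link(snr, current=link)
    history.append(link)
print(history)
```

      Note how the SNR of 9 dB leaves the link unchanged in both directions: it is below the switch-up threshold but above the switch-down threshold, which is exactly the stabilizing effect of the hysteresis band.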

    • Human-Centered Trustworthy Visual Intelligence

      2026(2):303-331. DOI: 10.16337/j.1004-9037.2026.02.003


      Abstract:This survey reviews human-centered trustworthy visual intelligence by summarizing its application landscape, key techniques, and emerging trends. As computer vision advances from perception to highly autonomous decision making and physical execution, risks related to privacy, fairness, robustness, transparency, and safety become increasingly salient. When system outputs may affect human safety and rights, performance optimization alone can no longer satisfy the requirements for trustworthiness. From a computer vision perspective, the paper traces the concept and evolution of trustworthy visual intelligence, emphasizing the multiple roles of humans as data subjects, cognitive participants, and ultimate controllers. A unified framework is then presented along three complementary spaces (information, cognitive, and physical), and a progressive paradigm is formulated that focuses on humans, serves humans, and remains under human control. The survey synthesizes human-oriented visual data analysis methods under fairness and privacy constraints, robust and responsible model design strategies, and human-machine collaborative control mechanisms centered on transparency and safety, with discussions across representative scenarios such as image enhancement, video analysis, robotic manipulation, and 3D visual perception. Finally, open challenges and future directions are outlined, including robustness evaluation, cross-scenario generalization, collaborative governance, and sustainable deployment, providing a roadmap for trustworthy visual intelligence in real-world systems.

    • A Survey of Datasets Collection and Processing for Embodied Intelligence

      2026(2):332-346. DOI: 10.16337/j.1004-9037.2026.02.004


      Abstract:In recent years, vision-language-action (VLA) models have attracted significant attention in the field of embodied intelligence. As model scale continues to grow, their ability to generalize across complex tasks has steadily improved. However, such performance improvements rely heavily on the availability of large-scale, high-quality training data. Unlike natural language processing and computer vision, which can directly leverage massive internet data, data collection in embodied intelligence typically involves physical interactions between real robots and their environments, leading to high collection costs and complex acquisition processes. Efficiently obtaining, processing, and organizing such data has therefore become a critical challenge for advancing embodied intelligence. To address this issue, this paper provides a systematic review of data collection and processing methods in embodied intelligence. First, we summarize the major data acquisition paradigms from the perspective of data sources and collection strategies, and analyze their characteristics and limitations in terms of data quality, scalability, and collection cost. Second, we present a standardized processing pipeline for embodied intelligence datasets, focusing on key technical components such as action representation alignment, multimodal temporal synchronization, language semantic normalization, and data quality control. Finally, we discuss the evolving data ecosystem in embodied intelligence, highlighting current challenges and potential future directions. The analysis presented in this paper aims to provide insights for dataset construction and large-scale robot learning research in embodied intelligence.
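      One processing step the abstract names, multimodal temporal synchronization, can be illustrated concretely. The sketch below (with hypothetical timestamps for a 30 Hz camera and a 100 Hz robot-state stream) aligns each camera frame to its nearest state sample, a common nearest-neighbor alignment strategy rather than the specific pipeline of any surveyed work.

```python
# Nearest-neighbor timestamp alignment between two sensor streams.
# Timestamps below are hypothetical; state_ts must be sorted.

import bisect

def align(frame_ts, state_ts):
    """For each frame timestamp, return the index of the closest
    robot-state timestamp."""
    idx = []
    for t in frame_ts:
        i = bisect.bisect_left(state_ts, t)
        # candidate neighbors on both sides of the insertion point
        cands = [j for j in (i - 1, i) if 0 <= j < len(state_ts)]
        idx.append(min(cands, key=lambda j: abs(state_ts[j] - t)))
    return idx

frames = [0.00, 0.033, 0.066, 0.100]                     # 30 Hz camera
states = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05,
          0.06, 0.07, 0.08, 0.09, 0.10]                  # 100 Hz state stream
print(align(frames, states))                             # → [0, 3, 7, 10]
```

      In a real pipeline this step would be followed by interpolation or windowed averaging of the state stream, but the nearest-neighbor index is the basic primitive.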

    • Speech Deepfake Attribution: The State of the Art and Prospects

      2026(2):347-370. DOI: 10.16337/j.1004-9037.2026.02.005


      Abstract:With the rapid evolution of generative artificial intelligence, speech deepfake technologies have achieved unprecedented realism, enabling the synthesis of highly natural and speaker-specific speech from only a few seconds of reference audio. While traditional countermeasures have primarily focused on binary detection, such approaches are insufficient for forensic investigation, legal accountability, and security governance. In real-world adversarial scenarios, it is not enough to determine whether speech is fake; it is equally critical to identify how it was generated, whose voice characteristics were exploited, and which specific model instance may have been involved. This paradigm shift from “detection” to “attribution” marks a fundamental transformation in speech security research. This paper presents a comprehensive survey of speech deepfake attribution, systematically organizing the field into a hierarchical forensic framework that includes three progressive tasks: forgery method attribution, source speaker attribution, and model inversion. Forgery method attribution aims to identify the generative architecture or vocoder family responsible for producing the fake speech by exploiting intrinsic “model fingerprints” embedded in the spectral, temporal, and phase domains. Source speaker tracing focuses on recovering or verifying the identity of the original speaker whose voice was converted, leveraging residual prosodic, behavioral, and physiological cues that survive imperfect disentanglement in voice conversion systems. Model inversion represents a deeper forensic objective, attempting to infer specific model parameters or configurations from generated speech, thereby bridging the gap between class-level attribution and instance-level accountability. From the perspectives of both generative model mechanisms and the physical acoustic characteristics of speech signals, the core principles underlying each subtask are elaborated.
Different dimensions, such as architectural frameworks and training strategies, are distinguished to systematically organize the research status, mainstream methodologies, and technological evolution paths of each subtask. Furthermore, benchmark datasets and evaluation metrics for both closed-set and open-set scenarios are systematically summarized. Finally, the paper discusses emerging challenges such as open-world generalization, robustness under complex channel distortions and neural codecs, adversarial attacks, and ethical constraints related to privacy and legal admissibility. Future directions are outlined toward proactive traceability, model-level reverse engineering, robust feature disentanglement, and the integration of active watermarking with passive forensic techniques. The survey aims to provide a structured roadmap for advancing speech deepfake attribution and fostering a trustworthy digital speech ecosystem.
      Highlights:
      1. A hierarchical framework for speech deepfake attribution is systematically established, unifying forgery method attribution, source speaker tracing, and model inversion into a progressive forensic paradigm beyond binary real/fake detection.
      2. The intrinsic mechanisms of attribution are analyzed from generative model fingerprints and acoustic signal characteristics, revealing how architectural design, training strategies, and inference processes leave distinguishable trace patterns.
      3. Open-world robustness, complex channel conditions, and model instance reverse engineering are identified as key challenges, with future directions proposed toward proactive traceability and a comprehensive speech security defense ecosystem.

    • Sound Source Localization and Tracking Based on Deep Learning: A Survey

      2026(2):371-396. DOI: 10.16337/j.1004-9037.2026.02.006


      Abstract:Sound source localization and tracking constitute an important means for machine hearing to acquire spatial information. With the growing adoption of multi-microphone devices in applications such as speech interaction, conference systems, and acoustic monitoring, the demand for stable estimation of a sound source’s direction and position in complex acoustic environments continues to increase. Accordingly, this paper presents a systematic review of deep-learning-based techniques for sound source localization and tracking. Existing review articles have mainly focused on sound source localization, whereas deep-learning-based sound source tracking has not yet been systematically reviewed. To fill this gap, this paper presents a unified analysis of both sound source localization and tracking. First, the fundamental problem formulation and the framework of traditional approaches are outlined. Then, from the perspectives of input representation, model architecture, and learning objectives, the main lines of deep learning methods are introduced with respect to feature design, network modeling, and training strategies. Next, commonly used datasets, experimental settings, and evaluation metrics are summarized, and key considerations for comparing results under different conditions are discussed. 
Finally, the reviewed techniques are summarized and potential future research directions are outlined.
      Highlights:
      1. This paper systematically reviews research on deep learning-based sound source localization and tracking, with particular emphasis on the technological evolution from instantaneous spatial localization to continuous trajectory estimation.
      2. The development of mainstream methods is summarized from the perspectives of input representation, network architecture, and temporal modeling, covering typical deep learning models such as CNN, RNN/LSTM, CRNN, and Transformer.
      3. The performance advantages of deep learning-based methods in noisy, reverberant, multi-source-overlapping, and dynamic scenarios are summarized, and future directions are identified, including robustness in real-world environments, generalization ability, and lightweight deployment.
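      The traditional framework the survey contrasts deep methods against can be shown concretely. Below is a minimal sketch of GCC-PHAT, the classical generalized cross-correlation estimator of the time difference of arrival (TDOA) between two microphones; the signal and delay are synthetic, and in practice the estimated delay is converted to a direction of arrival via the array geometry.

```python
# GCC-PHAT delay estimation between two microphone signals.
# The test signal and the 23-sample delay are synthetic.

import numpy as np

def gcc_phat(sig, ref):
    """Estimate the delay (in samples) of `sig` relative to `ref` using
    generalized cross-correlation with the phase transform weighting."""
    n = len(sig) + len(ref)
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12                  # PHAT: keep only the phase
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return int(np.argmax(np.abs(cc))) - max_shift

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)            # 1 s of noise at microphone 1
delay = 23                                   # true inter-microphone delay
sig = np.concatenate((np.zeros(delay), ref[:-delay]))  # delayed copy at mic 2
print(gcc_phat(sig, ref))                    # recovers a delay near 23
```

      The PHAT whitening step is what makes this estimator robust to reverberant coloration, which is also why it remains a standard input feature for the deep networks surveyed above.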

    • Research Progress in Target Audio Processing Methods Based on Pre-trained Models

      2026(2):397-415. DOI: 10.16337/j.1004-9037.2026.02.007


      Abstract:Target audio processing aims to recover or identify a specific target sound source from mixed audio signals based on user-provided cues. As an important branch of audio signal processing and machine listening, it plays a vital role in a wide range of applications, including human-computer interaction, smart office environments, assistive technologies, and multimedia forensics. In recent years, the emergence of large-scale pre-trained models has opened up new possibilities for target audio processing by significantly improving representation learning, cross-modal understanding, and adaptation to low-resource conditions. This paper presents an overview of the recent research progress made by our team in this area, with particular emphasis on the integration of pre-trained models into target audio processing frameworks. First, we review the research status of several related tasks, including target speaker automatic speech recognition, speech extraction, target audio extraction, and sound source separation, and introduce representative pre-trained models such as Whisper and contrastive language-audio pretraining (CLAP) together with parameter-efficient fine-tuning strategies. Focusing on the tasks of target audio extraction and target speaker recognition, we then summarize our recent studies, including a contrastive-learning-based multimodal query method for target audio extraction, a language-queried target audio extraction method that removes the reliance on paired training data, a multitask-learning-based method for target speaker speech extraction, and a prompt-tuning-based method for target speaker automatic speech recognition. These studies have achieved substantial advances in multimodal generalization, reduction of labeled-data dependence, preservation of target semantic information, and parameter-efficient model adaptation. 
We further show that the combination of pre-trained models and task-oriented fine-tuning provides an effective pathway toward more robust and flexible target audio processing systems. Finally, we discuss several future research directions, including improving inference efficiency, promoting deeper multimodal fusion, enhancing open-domain generalization, and developing universal foundation models for target audio processing.

    • Medical Imaging-Pathology-Genomic Fusion and Its Applications in Clinical Diagnosis and Treatment

      2026(2):416-438. DOI: 10.16337/j.1004-9037.2026.02.008


      Abstract:Medical imaging, pathology, and genomics respectively provide information on tumor spatial-morphological phenotypes, histopathological architecture, and molecular mechanisms. Single-modal approaches are constrained by scale discrepancies, sampling biases, and cross-center domain shifts, limiting their support for clinical decision-making. For precision oncology, imaging-pathology-genomics integration aims at semantic alignment and consistency validation among macroscopic imaging, microscopic histological, and mechanistic molecular evidence. This review systematically examines the field via fusion methodologies and clinical applications. We discuss the clinical advantages of multimodal integration, summarize key fusion paradigms, and emphasize its clinical necessity. In applications, focusing on completing the clinical evidence chain, we summarize its advantages in differential diagnosis, molecular subtyping, surgical planning, treatment response stratification, and systematic decision output, highlighting how integration turns predictive results into verifiable, actionable clinical decisions via cross-modal validation. Finally, we discuss emerging trends: spatial omics with multi-region sampling, longitudinal tumor evolution modeling, multimodal foundation models, and multi-center collaborative validation. We propose clinical translation recommendations and a utility evaluation system, offering a roadmap for next-generation intelligent multimodal systems in precision oncology.
      Highlights:
      1. Imaging-pathology-genomics integration completes the clinical evidence chain by fusing macroscopic, microscopic, and molecular evidence. It transforms fragmented observations into a cohesive understanding of tumor biology and enables verifiable, actionable decisions, beyond simple feature aggregation.
      2. Fusion paradigms integrate into oncology workflows: differential diagnosis, molecular subtyping, surgical planning, treatment response stratification, and standardized reporting that turns multimodal predictions into decisions.
      3. Emerging advances shape next-generation systems: spatial omics with multi-region sampling, longitudinal modeling, multimodal foundation models, and multi-center validation. Combined with translation guidelines and evaluation systems, they provide a deployment roadmap.

    • Foundation Model-Driven Paradigms in Brain-Computer Interface Encoding and Decoding

      2026(2):439-460. DOI: 10.16337/j.1004-9037.2026.02.009


      Abstract:Brain-computer interface (BCI) establishes a mapping relationship between external stimuli and internal neural activity in the brain, providing an effective means to understand brain information processing mechanisms and achieve human-machine intelligent interaction. In recent years, foundation models have achieved breakthrough progress in various computer vision tasks, which has also propelled BCIs from task-specific models toward a new paradigm of general intelligence. This paper is the first to review the latest research advances of foundation models in neural encoding and decoding for BCIs. It systematically outlines key studies and research trajectories in natural stimulus encoding-decoding, multimodal brain representation learning, and generalization studies. The analysis identifies current challenges in sample size, data heterogeneity, multimodal fusion, and model interpretability. Finally, it highlights future research directions for general-purpose BCIs. This work aims to provide a systematic reference and research insights for building general BCI models capable of handling complex cognitive scenarios.

    • A Survey on Probabilistic Modeling of Data: From Traditional to Modern

      2026(2):461-488. DOI: 10.16337/j.1004-9037.2026.02.010


      Abstract:Probabilistic modeling of data is a core problem in machine learning and modern generative AI. This survey reviews the methodological evolution from traditional statistical formulations to recent deep generative frameworks under a unified view of probability distribution learning. Representative methods are organized into three connected routes: maximum-likelihood-based modeling, score-matching-based modeling, and flow-based modeling. On the traditional side, the survey revisits Gaussian assumptions, Gaussian mixture models, expectation-maximization (EM) algorithms, and variational inference, emphasizing how tractability-flexibility trade-offs shape model design. On the modern side, it discusses variational autoencoders (VAEs), generative adversarial net (GAN)-related generative mechanisms, diffusion probabilistic models, score-based stochastic differential equation (SDE) formulations, normalizing flows, and flow matching, with focus on objective functions, parameterization choices, and sampling dynamics. A structured comparison is provided from the perspectives of explicit likelihood, trajectory modeling, computational efficiency, controllability, and deployment stability. To bridge methodology and practice, the paper summarizes benchmark-oriented observations and application trends in image generation, video and audio synthesis, inverse problems, and science-and-control scenarios. It also identifies practical bottlenecks, including dependence on high-quality large-scale data, limited semantic operability of latent representations, and inference latency caused by multi-step sampling. Finally, future directions are discussed around coordinated advances in path design, training objectives, numerical solvers, and guidance strategies, together with unified evaluation over quality, efficiency, safety, and compliance for trustworthy large-scale deployment.
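      One of the traditional building blocks named above, the EM algorithm for Gaussian mixtures, fits in a few lines. The sketch below uses synthetic 1-D data with two well-separated components; initialization and iteration count are illustrative choices, not a recommended recipe.

```python
# EM for a two-component 1-D Gaussian mixture on synthetic data.

import numpy as np

def em_gmm_1d(x, n_iter=50):
    # Initialize the means far apart; equal weights and variances
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
                  / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(1)
x = np.concatenate((rng.normal(-3, 1, 500), rng.normal(3, 1, 500)))
mu, var, pi = em_gmm_1d(x)
print(np.sort(mu))   # component means recovered near -3 and 3
```

      Each iteration provably does not decrease the data log-likelihood, which is exactly the tractability-for-flexibility trade-off the abstract refers to: the mixture is flexible, yet every update has a closed form.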

    • Research Progress in Information Processing Methods for Computational Optical Systems

      2026(2):489-514. DOI: 10.16337/j.1004-9037.2026.02.011


      Abstract:Aberration is a crucial factor that restricts the imaging performance of optical systems, leading to degraded imaging effects such as blurred details and reduced resolution. Computational optical aberration correction technology breaks through the limitations of traditional hardware-based correction by integrating optical physical modeling with advanced information processing algorithms, achieving accurate and flexible compensation for imaging degradation caused by various aberrations. This paper systematically reviews the research progress of information processing methods for aberration correction based on computational optical systems. First, it expounds the theoretical foundation of aberration correction, including the wavefront aberration characterization method based on Zernike polynomials, which provides a rigorous mathematical basis for quantifying different types of aberrations, and the aberration-dominated light field degradation model that establishes the quantitative correlation between wavefront distortion and point spread function (PSF) degradation. It also introduces classical restoration algorithms such as Wiener filtering and Richardson-Lucy iteration, which lay a technical foundation for solving the imaging degradation inverse problem. 
On this basis, the paper analyzes the principles and practical applications of three mainstream aberration correction technologies from the perspectives of active adjustment, optical coding and pure computational restoration: Adaptive optics (AO) realizes real-time compensation of dynamic wavefront distortion through a closed-loop system of wavefront sensing and deformable mirror adjustment; wavefront coding and coded aperture technology transform complex aberrations into computable degradation forms via artificial phase modulation in the optical front-end, realizing collaborative optimization of optical coding and back-end digital decoding; phase retrieval and blind deconvolution techniques invert wavefront phase information and estimate the unknown PSF purely through algorithmic iteration, without additional hardware intervention. Finally, the paper focuses on deep learning-driven aberration correction methods, including data-driven end-to-end learning frameworks, physical model-embedded hybrid architectures and unsupervised/few-shot learning methods, and discusses their typical applications in biomedical microscopic imaging, lensless imaging, astronomical remote sensing and other frontier fields. This study clarifies the technical characteristics, advantages and limitations of various methods, and provides important theoretical reference and technical path guidance for the collaborative optimization design and practical application of computational optical imaging systems.
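      The classical restoration algorithms cited above can be illustrated concretely. Below is a minimal 1-D Wiener deconvolution sketch with a synthetic blur kernel and an assumed signal-to-noise ratio; it shows the frequency-domain form of the filter, not an implementation from the surveyed systems.

```python
# Wiener deconvolution of a 1-D signal blurred by a known kernel.
# The impulse signal, the 3-tap kernel, and the SNR are synthetic choices.

import numpy as np

def wiener_deconv(blurred, kernel, snr=1e6):
    """Frequency-domain Wiener filter: F = H* / (|H|^2 + 1/SNR) * G."""
    n = len(blurred)
    H = np.fft.fft(kernel, n=n)
    G = np.fft.fft(blurred, n=n)
    F = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr) * G
    return np.real(np.fft.ifft(F))

# Sharp test signal: a few isolated impulses
x = np.zeros(64)
x[[10, 30, 45]] = 1.0
kernel = np.array([0.25, 0.5, 0.25])        # simple symmetric blur
blurred = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(kernel, n=64)))
restored = wiener_deconv(blurred, kernel)
print(np.argsort(restored)[-3:])             # peak locations near 10, 30, 45
```

      The 1/SNR term regularizes frequencies where the kernel response is near zero (this kernel has a null at Nyquist); pure inverse filtering would amplify noise there without bound, which is the degradation inverse problem the deep methods above also address.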

    • Deep Learning-Driven Video Coding: Methods, Progress, and Perspectives

      2026(2):515-542. DOI: 10.16337/j.1004-9037.2026.02.012


      Abstract:With the explosive growth of video data, limited network bandwidth and high computational demands pose significant challenges for video transmission and storage. In this context, the continuous development of efficient video coding methods is of critical theoretical significance and practical value, as it ensures the delivery of high-quality video services under resource-constrained conditions. However, traditional hybrid video coding frameworks have gradually reached performance bottlenecks, making further improvements in coding efficiency increasingly difficult. In recent years, deep learning, with its powerful nonlinear fitting and representation capabilities, has provided new opportunities for optimizing video coding. This paper presents a systematic and detailed analysis of deep learning-driven video coding technologies. First, we briefly introduce video coding techniques under conventional coding frameworks and further explore the optimization of key modules, such as intra- and inter-frame prediction, through deep learning. Then, we focus on the development and key technical routes of end-to-end video coding frameworks based on deep learning, providing a comparative analysis of their performance. Finally, we highlight significant research achievements of deep learning in the field of video coding, examine the challenges and limitations of existing techniques, and offer an outlook on future trends in video coding technologies.

    • A Survey on 3D Face Generation Technology

      2026(2):543-565. DOI: 10.16337/j.1004-9037.2026.02.013


      Abstract:In recent years, benefiting from the rapid development of computer vision and graphics, 3D face generation technology has achieved significant breakthroughs; 3D vision technologies, such as digital avatar creation, have become increasingly popular on the internet, attracting extensive attention from both academia and industry. This generation technology synthesizes realistic multi-view face images by reconstructing geometric structures and texture details from explicit or implicit underlying representations. 3D face generation has sparked many related entertainment and interactive applications, such as using attribute editing to modify facial features via text descriptions, or using talking-head generation to drive a static portrait into a talking video. However, early technologies based on linear parametric models suffered from poor realism and limited detail. The subsequently emerging implicit neural representations, while significantly improving visual quality, face high computational costs and difficulty in achieving real-time interaction, which greatly limit practical deployment. To resolve the trade-off between speed and quality, numerous scholars have conducted in-depth research on novel representations based on explicit Gaussian primitives and on generative models based on probabilistic diffusion, proposing a series of hybrid generation methods from different perspectives. However, owing to problems such as difficulty in generalizing from small-sample data, incomplete modeling of full-head physical structures, and insufficient consistency in dynamic driving, generation technology still has a long way to go on the path to becoming fully photorealistic and capable of real-time interaction.
In fact, research on 3D face generation and driving technology is still at a developmental stage, and its scope and techniques are evolving rapidly. This review provides a systematic summary of the main research works to date, along with a brief analysis of the limitations of current technologies. It also explores potential challenges and future directions for 3D face generation and application technologies, offering guidance for future research.

    • A Review of Deep Learning-Based Change Detection Methods for Bi-temporal Optical Remote Sensing Images

      2026(2):566-591. DOI: 10.16337/j.1004-9037.2026.02.014


      Abstract:Bi-temporal optical remote sensing image change detection constitutes a pivotal domain within the broader field of Earth observation, aimed at systematically quantifying terrestrial surface dynamics. By conducting comparative analyses of co-registered imagery acquired over identical geographical coordinates at distinct temporal intervals, this methodology facilitates critical applications ranging from urban expansion monitoring and resource management to disaster damage assessment. The exponential expansion of remote sensing data, coupled with the rapid maturation of deep learning paradigms, has instigated a transformative era for this discipline. Consequently, the field is witnessing a phase of rapid algorithmic iteration and profound evolutionary growth, significantly enhancing the capability to interpret complex spatiotemporal patterns. Against this backdrop, this manuscript employs a comprehensive chronological framework to systematically present deep learning-based change detection architectures established over the past two decades. Complementing this survey, it rigorously conducts a comparative analysis, explicitly evaluating both the detection accuracy and computational efficiency of these state-of-the-art methodologies across mainstream benchmark datasets. Beyond mere algorithmic review, the paper consolidates widely utilized public datasets and essential evaluation metrics, thereby providing a standardized reference for benchmarking model performance. Furthermore, this study structurally deconstructs the comprehensive change detection pipeline into its fundamental components. Subsequently, the specific technological advancements and methodological innovations driving the evolution of each critical stage are scrutinized in granular detail to illustrate the workflow’s maturation. Ultimately, prospective research frontiers are delineated to forecast the field’s developmental trajectory.
This outlook aims to serve as a roadmap, offering essential reference and guidance to steer subsequent investigations and foster continued innovation within the domain.

    • A Review of Autonomous Localization Technologies for Unmanned Aerial Vehicles in Complex Low-Altitude Environments

      2026(2):592-619. DOI: 10.16337/j.1004-9037.2026.02.015

      Abstract (11) HTML (24) PDF 6.37 M (22) Comment (0) Favorites

      Abstract:Complex low-altitude environments are typically characterized by the superposition of multi-source interference, drastic variations in sensing conditions, and incomplete environmental information, which collectively pose significant challenges to the continuity, reliability, and integrity of autonomous localization for unmanned aerial vehicles (UAVs). In such scenarios, Global Navigation Satellite System (GNSS) signals are prone to blockage and interference; visual perception suffers from weak textures, dynamic disturbances, and abrupt illumination changes; and inertial measurements inevitably accumulate long-term drift. The coupled degradation of these sensing modalities substantially undermines the stability and robustness of localization systems. To address these challenges, this paper systematically reviews representative types of degraded low-altitude environments and analyzes key technical bottlenecks under multi-source hybrid interference, including visual feature loss, inertial error divergence, and satellite positioning performance deterioration. Building upon this analysis, the developmental trajectory of vision-based navigation and localization techniques for UAVs is comprehensively surveyed, covering visual matching methods based on satellite signals or prior maps as well as recent advances in visual simultaneous localization and mapping (SLAM). Furthermore, visual-inertial fusion modeling and perception enhancement strategies are summarized, highlighting their technical advantages in improving localization accuracy and robustness. Subsequently, multi-sensor fusion navigation frameworks and robust fusion strategies tailored for GNSS-denied or degraded environments are discussed, with particular emphasis on collaborative modeling, degradation awareness, and integrity monitoring across heterogeneous modalities, including vision, inertial sensors, LiDAR, and satellite positioning. Finally, the paper outlines future directions for data-driven multimodal adaptive navigation methods, as well as the development trends of lightweight and intelligent high-integrity navigation technologies for UAVs. This survey aims to provide a systematic reference for the research and engineering implementation of highly reliable autonomous localization technologies for UAVs operating in complex low-altitude environments.

    • A Survey on Risks and Governance of Content Generated by Visual Generation Models

      2026(2):620-640. DOI: 10.16337/j.1004-9037.2026.02.016

      Abstract (7) HTML (27) PDF 3.91 M (20) Comment (0) Favorites

      Abstract:With breakthroughs in deep generative technologies such as diffusion models, visual generation models have achieved significant leaps in generation quality and semantic consistency, finding extensive applications in fields such as artistic creation and industrial design. However, this powerful generative capability has also triggered severe content safety risks. Malicious users can induce models to generate pornographic, violent, or copyright-infringing images, creating an urgent need for the safety governance of generative AI. This paper provides a systematic review focusing on two core adversarial tasks of text-to-image (T2I) models: (1) jailbreak attacks, which aim to induce models to breach their safety guardrails; (2) concept erasure, which aims to eliminate internal risk knowledge from the models. First, we establish a taxonomy of jailbreak attacks. By analyzing them across four dimensions, namely technical category, perturbation strategy, query type, and adversary knowledge, we reveal the evolutionary trend of attack methods shifting from feature-space perturbations to semantic-space reasoning. Second, regarding risk governance, this paper delves into concept erasure technologies, comparatively analyzing three mainstream technical routes: model fine-tuning, model editing, and inference guidance. We elucidate the trade-offs among erasure effectiveness, computational efficiency, and the preservation of general generation capabilities. Finally, we summarize the commonly used benchmark datasets in this field and identify the current challenges and future directions regarding adversarial robustness and multi-concept joint governance, aiming to provide theoretical references and technical guidance for building safe and controllable T2I systems.
