• Volume 39, Issue 5, 2024 Table of Contents
    • Research Situation and Prospects of Multi-speaker Separation and Target Speaker Extraction

      2024, 39(5):1044-1061. DOI: 10.16337/j.1004-9037.2024.05.002

      Abstract:As a cutting-edge technology in speech signal processing, speech separation has significant research value and broad application prospects. The signal captured by microphones typically contains speech from multiple speakers together with noise and reverberation, so speech separation is needed to improve the user experience and the performance of backend devices. Speech separation originated from the well-known cocktail party problem and aims to recover the individual speech signals from the mixed signal. In recent years, researchers have proposed a large number of speech separation methods that have significantly improved separation performance. This paper systematically reviews and summarizes these methods. First, speech separation is divided into two categories according to whether auxiliary information about the target speaker is used: multi-speaker separation and target speaker extraction. Second, these methods are introduced in detail, following the progression from conventional approaches to deep learning-based techniques. Finally, the existing challenges in speech separation are discussed and future research directions are highlighted.

    • State of the Art and Prospects of Deep Learning-Based Speaker Verification

      2024, 39(5):1062-1084. DOI: 10.16337/j.1004-9037.2024.05.003

      Abstract:With the development of deep learning, speaker verification has made great progress. Compared with other biometric identification technologies, it offers remote operation, low cost and easy human-computer interaction, and therefore has broad application prospects in public security, criminal investigation, and financial services. This paper provides a systematic overview of the development of deep learning-based speaker verification techniques. First, the development history and research status of deep learning-based speaker representation models are introduced from four aspects: model input and structure, pooling layer, supervised loss function, and self-supervised learning and pre-trained models. Then, the challenges faced by speaker verification are discussed, such as cross-domain mismatch caused by noise interference, channel mismatch and far-field speech, and the corresponding domain adaptation and domain generalization methods are outlined. Finally, directions for further research are presented.

    • A Survey on Sound Acquisition Theories and Application Methods of Distributed Microphone Arrays

      2024, 39(5):1085-1113. DOI: 10.16337/j.1004-9037.2024.05.004

      Abstract:Over the past few decades, microphone array technology has matured and has been applied to various human-machine interaction systems, e.g., video-conferencing, intelligent television, mobile telephony and hearing aids. However, in realistic noisy or distant interaction scenarios, the sound acquisition quality (SAQ) of conventional topology-constrained microphone arrays cannot be guaranteed. With the widespread use of wireless intelligent terminal devices, the distributed microphone array (DMA), also called the wireless acoustic sensor network (WASN), offers more possibilities for improving the SAQ of speech interaction systems in complex and open domains, and shows advantages in array organization, application experience and scene coverage. Recently, DMA has shown good potential in many speech interaction tasks, covering almost all the tasks that conventional microphone arrays can handle. This survey summarizes important existing sound acquisition theories and application methods of DMA, including principles of array organization, utility evaluation of microphone nodes and application methods combined with downstream speech tasks. Finally, we briefly discuss some key challenges and development trends on the road from DMA to practical use.

    • Improved Degenerate Unmixing Estimation Technique Algorithm Based on Two-Step Single-Source Point Screening

      2024, 39(5):1114-1125. DOI: 10.16337/j.1004-9037.2024.05.005

      Abstract:The degenerate unmixing estimation technique (DUET) algorithm is a typical underdetermined blind source separation algorithm. However, as a binary time-frequency mask-based method, DUET erroneously retains some interference signals. This paper proposes an improved DUET algorithm based on two-step single-source point screening. The cosine angle algorithm is used for the first-step single-source point screening, and then a similarity calculation method is employed for the second-step single-source point screening. After more accurate target and interference signals are obtained through two-step single-source point screening, a filter designed to cancel the interference signals replaces the binary time-frequency mask of DUET, achieving interference signal suppression and target signal extraction. Simulation results show that the proposed method performs well in both determined and underdetermined blind source separation.
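
As an illustration of what a cosine-angle screening step can look like, one common form of the test, stated for an instantaneous (real-valued) mixing model, treats a time-frequency point as single-source when the real and imaginary parts of the multichannel mixture vector are nearly collinear. The sketch below assumes that form; the function name `is_single_source_point`, the threshold value, and the example vectors are hypothetical and are not taken from the paper.

```python
import math

def is_single_source_point(x, threshold=0.999):
    """Return True if the real and imaginary parts of the mixture vector x
    (one complex STFT coefficient per microphone) are nearly collinear."""
    re = [c.real for c in x]
    im = [c.imag for c in x]
    norm_re = math.sqrt(sum(v * v for v in re))
    norm_im = math.sqrt(sum(v * v for v in im))
    if norm_re == 0.0 or norm_im == 0.0:
        # Purely real, purely imaginary, or zero vector: no angle to test.
        return True
    cos_angle = abs(sum(a * b for a, b in zip(re, im))) / (norm_re * norm_im)
    return cos_angle >= threshold

# One source with real mixing vector (1.0, 0.5) and complex coefficient 2+3j.
single = [1.0 * (2 + 3j), 0.5 * (2 + 3j)]
# Two sources with different mixing vectors active at the same point.
mixed = [1.0 * (2 + 0j) + 0.2 * (0 + 3j), 0.5 * (2 + 0j) + 1.0 * (0 + 3j)]

print(is_single_source_point(single), is_single_source_point(mixed))
```

Points that pass such a test are dominated by one source, so they give cleaner estimates of the mixing parameters than points where several sources overlap.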

    • Kalman-Filter-Based Acoustic Feedback Cancellation with State Detection Model for Fast Recovery from Abrupt Path Changes

      2024, 39(5):1126-1134. DOI: 10.16337/j.1004-9037.2024.05.006

      Abstract:The partitioned block frequency domain Kalman filter (PBFDKF) has been applied in acoustic feedback cancellation (AFC) due to its fast convergence and low steady-state misalignment. However, a Kalman filter at steady state may deadlock when the feedback path changes abruptly, resulting in poor tracking. In this paper, a Kalman-filter-based AFC with a state detection model (KFSD) is proposed to improve robustness against abrupt path changes. The narrowband energy of the microphone signal, the residual signal and the update of the Kalman filter are used as the input to the state detection model. The state detection results are then merged into the state estimation error covariance matrix of the Kalman filter, achieving better re-convergence after abrupt path changes. Experimental results demonstrate the superior performance of the proposed KFSD algorithm, with a high true positive rate, a low false alarm rate, and a short state detection latency. These advantages lead to faster re-convergence and enhanced acoustic feedback cancellation.
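
The deadlock effect and the benefit of feeding a detection result back into the error covariance can be seen in a scalar toy problem. The sketch below is not the PBFDKF or the KFSD detector: it assumes a single constant gain observed without noise, and the hypothetical `reset_at` argument stands in for the moment a state detector would flag a path change.

```python
def track(observations, reset_at=None, p0=1.0, r=1.0):
    """Scalar Kalman filter estimating a constant gain h from y = h + noise.

    If reset_at is given, the error covariance is re-inflated to p0 at that
    step, mimicking a state detector that flags an abrupt path change.
    """
    h_est, p = 0.0, p0
    for n, y in enumerate(observations):
        if reset_at is not None and n == reset_at:
            p = p0                    # detector fired: trust new data again
        k = p / (p + r)               # Kalman gain
        h_est += k * (y - h_est)      # state update
        p = (1.0 - k) * p             # covariance update (no process noise)
    return h_est

# The true path gain jumps from 1.0 to 2.0 after 100 samples.
obs = [1.0] * 100 + [2.0] * 10
err_plain = abs(2.0 - track(obs))
err_reset = abs(2.0 - track(obs, reset_at=100))
print(err_plain, err_reset)
```

Without the reset, the covariance has shrunk so far that the gain is tiny and the estimate barely moves after the jump; re-inflating the covariance restores a large gain and fast re-convergence.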

    • Multi-channel Linear Prediction for Speech Dereverberation Using Cross-Band Filters and Sparse Priors

      2024, 39(5):1135-1146. DOI: 10.16337/j.1004-9037.2024.05.007

      Abstract:Multi-channel linear prediction (MCLP) is one of the most popular speech dereverberation methods. Most existing studies adopt the band-to-band spectral subtraction model to obtain the desired speech signal in each frequency band, but this model neglects the interaction between different frequencies. This paper proposes an MCLP-based speech dereverberation method that uses a cross-band spectral subtraction model instead of the widely adopted band-to-band model. The proposed model employs cross-band filters to account for the interactions between different frequencies. We model the desired signal using the complex generalized Gaussian (CGG) distribution. Compared with the Gaussian distribution, the CGG distribution can capture the sparse nature of speech signals through a suitable shape parameter. Within the maximum likelihood estimation framework, the speech dereverberation problem is formulated as an optimization problem involving the band-to-band and cross-band filters. An optimization algorithm with guaranteed convergence is derived based on the majorization-minimization method. A series of speech dereverberation experiments under various reverberation times, channel numbers and source-to-microphone distances demonstrates that the proposed method significantly outperforms traditional methods in terms of dereverberation performance.

    • Data-Driven Decision Support System Construction Based on Graph Model for Conflict Resolution

      2024, 39(5):1147-1162. DOI: 10.16337/j.1004-9037.2024.05.008

      Abstract:Conflicts now arise frequently over issues such as the economy, technology, geostrategy, and international order, and their scale is shifting from individual and small-scale group conflicts to complex large-scale group conflicts. Compared with conflicts between individuals, large-scale group conflicts last longer and have a wider scope, with a negative impact on China’s social order and economic development. The graph model for conflict resolution (GMCR) has been widely applied to water resources, environmental management and economic policy as a theoretical tool for solving conflict problems, and has achieved good results. However, the growing number of participants and strategies in a conflict leads to an exponential increase in the number of states, and the uncertainty of the participants’ preference behavior also increases, so the traditional decision support system GMCRⅡ struggles to solve such complex conflicts. Based on the algebraic expression of strength preference conflict analysis theory, this paper designs a web-based conflict analysis system, SP-GMCRDSS, on the .NET platform. It comprises four modules: feasible state generation, state transition setting, strength preference sequence generation and a stability analysis engine. Compared with existing systems, SP-GMCRDSS can more efficiently assist conflict analysts in solving large and complex data-driven conflicts. Text mining is used to extract strategies, which helps analysts determine the input of the decision support system and reduces the subjectivity of model building. Finally, the modeling, solving, and analysis functions of the system are demonstrated through the case “Lanzhou Water Pollution Conflict Event”.

    • Distributed Mining Algorithm for Co-movement Patterns in Spatio-Temporal Trajectory Streams

      2024, 39(5):1163-1181. DOI: 10.16337/j.1004-9037.2024.05.009

      Abstract:Mining co-movement patterns from trajectory streams refers to discovering groups of moving objects that exhibit the same behavior at the same time, which is essential for applications such as transportation logistics and epidemic prevention and control. However, existing methods struggle to respond quickly to large-scale trajectory data streams. First, this paper proposes a novel distributed sliding window algorithm for mining co-movement patterns from spatio-temporal trajectory streams. The algorithm employs a sliding window computing model instead of a snapshot computing model, and uses incremental updates instead of re-computation, making it more suitable for handling unbounded and rapidly arriving trajectory data streams. Second, to address load imbalance in distributed stream processing systems, this paper proposes an adaptive multi-level dynamic data partitioning strategy. This strategy adapts to dynamic changes in the trajectory stream, continuously monitors the system load in real time, and makes adjustments based on the degree of load imbalance. Finally, the above functions are implemented on the Flink distributed big data processing platform and evaluated on real data sets. A comprehensive empirical study demonstrates that the proposed algorithm responds faster and has lower delay than the baseline method.
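
The idea of replacing per-window recomputation with incremental updates can be sketched on a single machine as follows. This is a simplified illustration, not the distributed Flink algorithm from the paper: it assumes each snapshot has already been clustered, tracks only object pairs, and the class name `SlidingPairCounter` and its parameters are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations

class SlidingPairCounter:
    """Counts, per object pair, how many snapshots in the current window
    placed both objects in the same cluster."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()      # pair sets of the snapshots in the window
        self.counts = Counter()    # pair -> number of co-occurrences

    def push(self, clusters):
        """Add one snapshot, given as an iterable of clusters of object ids."""
        pairs = {p for c in clusters for p in combinations(sorted(c), 2)}
        self.window.append(pairs)
        self.counts.update(pairs)              # incremental add
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(expired)      # incremental remove
            for p in expired:
                if self.counts[p] == 0:
                    del self.counts[p]

    def patterns(self, min_duration):
        """Pairs that co-occur in at least min_duration snapshots."""
        return {p for p, n in self.counts.items() if n >= min_duration}

counter = SlidingPairCounter(window_size=3)
for snapshot in ([{"a", "b", "c"}], [{"a", "b"}, {"c"}], [{"a", "b"}], [{"b", "c"}]):
    counter.push(snapshot)
print(counter.patterns(min_duration=2))
```

Each new snapshot costs work proportional to its own pairs plus those of the expired snapshot, rather than to the whole window.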

    • Terrain-Adaptive Motion Imitation Based on Multi-task Reinforcement Learning

      2024, 39(5):1182-1191. DOI: 10.16337/j.1004-9037.2024.05.010

      Abstract:Terrain adaptability is the basis for the stable movement of agents under complex terrain conditions. Because the dynamical systems of agents such as humanoid robots are complex, traditional inverse dynamics methods usually struggle to provide this ability. Recent research has exploited the strengths of reinforcement learning in sequential decision-making to train agents to adapt to terrain. However, these single-task learning methods cannot effectively learn the correlations among different terrains. In fact, complex terrain adaptation can be regarded as a multi-task problem in which the relationship between sub-tasks is measured by different terrain factors, and the problem of incomplete data distribution information can then be addressed through mutual learning among sub-task models. Therefore, this paper proposes a multi-task reinforcement learning method. It contains an execution layer consisting of pre-trained sub-task models and a decision layer based on reinforcement learning, which uses soft constraints to fuse the models of the execution layer. Experiments on the LeggedGym terrain simulator show that an agent trained with the proposed method moves more stably and falls less often on complex terrain, indicating better generalization performance.

    • Unsupervised Video Person Re-identification Based on Multiple Kernel Dilated Convolution

      2024, 39(5):1192-1203. DOI: 10.16337/j.1004-9037.2024.05.011

      Abstract:Person re-identification aims to identify specific individuals across surveillance cameras, overcoming challenges such as pose variations, occlusions, and background noise that often lead to insufficient feature extraction. This paper proposes a novel unsupervised video-based person re-identification method that utilizes multi-kernel dilated convolution to provide a more comprehensive and accurate representation of individual differences and features. Initially, we employ a pre-trained ResNet50 as an encoder. To further enhance the encoder’s feature extraction capability, we introduce a multiple kernel dilated convolution module. Enlarging the receptive field of convolutional kernels allows the network to more effectively capture both local and global feature information, offering a more comprehensive depiction of a person’s appearance features. Subsequently, a decoder is employed to restore high-level semantic information to a more fundamental feature representation, thereby strengthening feature representation and improving system performance under complex imaging conditions. Finally, a multi-scale feature fusion module is introduced in the decoder output to merge features from adjacent layers, reducing semantic gaps between different feature channel layers and generating more robust feature representations. Offline experiments are conducted on three mainstream datasets, and results show that the proposed method achieves significant improvements in both accuracy and robustness.
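
To see why dilation enlarges the receptive field, recall that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions. The sketch below applies that standard formula to a stack of stride-1 layers; the function names and the dilation rates 1, 2 and 4 are illustrative assumptions, not the configuration used in the paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a 1-D kernel whose taps are spaced `dilation` apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field

print(effective_kernel_size(3, 4))                  # 9
print(receptive_field([(3, 1), (3, 2), (3, 4)]))    # 15
```

Three 3-tap layers thus cover 15 positions with dilation, against 7 without it, at the same parameter count.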

    • Dynamic SLAM Based on Background Restoration

      2024, 39(5):1204-1213. DOI: 10.16337/j.1004-9037.2024.05.012

      Abstract:In simultaneous localization and mapping (SLAM), positioning accuracy is significantly degraded by interference from dynamic objects. This paper addresses the challenges of SLAM in dynamic environments by removing dynamic objects and restoring the resulting empty regions. Semantic information is obtained using Mask R-CNN, while an epipolar geometry approach is employed to eliminate dynamic objects. Keyframe pixel weighted mapping enables pixel-by-pixel recovery of the void regions in both the RGB and depth maps. Experimental results on the TUM dataset demonstrate an average improvement of 85.26% in pose estimation accuracy compared to ORB-SLAM2, as well as a 28.54% improvement over DynaSLAM. The proposed method also performs robustly in real-world scenarios.
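
A common way to flag dynamic feature points geometrically is to check the epipolar constraint: a static point matched across two frames should lie on the epipolar line induced by the fundamental matrix. The sketch below shows that check under stated assumptions; the function names, the 1-pixel threshold and the example matrix are hypothetical and do not come from the paper.

```python
import math

def epipolar_distance(F, x1, x2):
    """Distance (in pixels) from x2 to the epipolar line F @ x1.

    F is a 3x3 fundamental matrix (nested lists); x1 and x2 are (u, v) pixel
    coordinates of a matched feature in the previous and current frame.
    """
    p1 = (x1[0], x1[1], 1.0)
    p2 = (x2[0], x2[1], 1.0)
    line = [sum(F[i][j] * p1[j] for j in range(3)) for i in range(3)]
    numerator = abs(sum(p2[i] * line[i] for i in range(3)))
    return numerator / math.hypot(line[0], line[1])

def is_dynamic(F, x1, x2, threshold=1.0):
    """Flag a match whose distance to its epipolar line exceeds threshold."""
    return epipolar_distance(F, x1, x2) > threshold

# Fundamental matrix of a camera translating purely along its x axis
# (identity intrinsics): epipolar lines are horizontal.
F_SIDEWAYS = [[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]]

print(epipolar_distance(F_SIDEWAYS, (10, 20), (15, 20)))  # static point
print(epipolar_distance(F_SIDEWAYS, (10, 20), (15, 23)))  # moved vertically
```

In practice the fundamental matrix would be estimated from matches believed to be static, and the threshold tuned to the feature detector's noise level.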

    • Target Position Detection Based on Bidirectional Fusion of Texture and Depth Information

      2024, 39(5):1214-1227. DOI: 10.16337/j.1004-9037.2024.05.013

      Abstract:To obtain accurate position information of objects in unstructured scenes using depth cameras with limited hardware resources, a target position detection method based on bidirectional fusion of texture and depth information is proposed. In the learning phase, the two networks adopt the full-flow bidirectional fusion (FFB6D) module. The texture information extraction part introduces the lightweight Ghost module to reduce the computation of the network and adds the CBAM attention mechanism to enhance useful features, while the depth information extraction part extends local features and multilevel feature fusion to obtain more comprehensive features. In the output stage, to improve efficiency, the instance semantic segmentation results are used to filter out background points, 3D keypoint detection is then performed, and the position information is finally obtained by a least-squares fitting algorithm. The method is validated on the LINEMOD, Occlusion LINEMOD and YCB-Video public datasets, where the accuracies reach 99.8%, 66.3% and 94%, respectively, and the number of parameters is reduced by 31%, showing that the improved position estimation method can reduce the number of parameters while maintaining accuracy.

    • Detection and Classification of Banded Carbide in Steel Based on Improved Cascade R-CNN

      2024, 39(5):1228-1239. DOI: 10.16337/j.1004-9037.2024.05.014

      Abstract:In the steel industry, carbide is a vital constituent whose distribution in steel materials is an important reference for evaluating steel quality. However, current detection methods for carbide in steel bars rely primarily on manual inspection, which is costly and lacks stability. This study applies deep learning techniques to the problem. It collects and annotates 3 192 high-quality images of banded carbides on steel bars, alongside 11 complete samples, to create a banded carbide dataset on object detection for steel bars (BCDOD). Common deep learning methods for object detection are applied to the dataset and analyzed experimentally. Based on the specific characteristics of the application scenario and data, the Cascade R-CNN model is enhanced with rotation data augmentation, an improved Focal Loss function and negative sample fine-tuning, resulting in improved performance. The average precision reaches 96%, with 100% recognition accuracy on the complete sample data, showing promising results that address the gap in applying artificial intelligence to carbide metallographic detection.

    • Medical Image Segmentation Method with Integrated Self-attention

      2024, 39(5):1240-1250. DOI: 10.16337/j.1004-9037.2024.05.015

      Abstract:To address the limitations of the UNet architecture in capturing local features and preserving edge details in medical image segmentation, this paper presents an improved UNet algorithm that integrates a self-attention mechanism. The proposed algorithm is based on the traditional encoder-decoder structure and incorporates a multi-scale convolution (MSC) block for multi-granularity feature extraction and a convolution mixer attention (CMA) block, which combines the modeling of local features by convolutional layers with global contextual modeling by self-attention layers. On the segmentation tasks of the BUSI and DDTI datasets, extensive experiments against existing classical network architectures verify the segmentation ability of the model. Additionally, statistical data analysis and ablation studies further confirm the effectiveness of the MSC and CMA modules. This research provides an approach for high-precision medical image segmentation, with theoretical and practical implications for enhancing the accuracy and efficiency of medical diagnoses.

    • Robust Optimization Design for Multicast Transmission in IRS-Aided Cognitive Satellite and Terrestrial Network

      2024, 39(5):1251-1259. DOI: 10.16337/j.1004-9037.2024.05.016

      Abstract:To improve spectrum efficiency, this paper proposes a robust multicast transmission algorithm for the intelligent reflecting surface (IRS) aided cognitive satellite and terrestrial network (CSTN). Specifically, the satellite uses multicast technology to serve multiple primary users, while the terrestrial base station (BS), which shares spectrum resources with the satellite network, serves direct users through space division multiple access and blocked users through intelligent reflecting surfaces. A joint optimization problem is then formulated to minimize the BS transmit power while satisfying the outage constraints on both the signal-to-interference-plus-noise ratio of the terrestrial users and the interference power at the primary users. To address this nonconvex problem, the nonconvex outage constraint is first transformed into a deterministic form using the cumulative distribution function of the exponential distribution. Then, a robust beamforming algorithm combining alternating optimization with semidefinite relaxation is proposed to obtain a solution with better performance. Computer simulation results demonstrate the robustness and superiority of the proposed algorithm.

    • Direction-of-Arrival Estimation for Hybrid mMIMO Systems via Sparse Bayesian Learning

      2024, 39(5):1260-1270. DOI: 10.16337/j.1004-9037.2024.05.017

      Abstract:Direction-of-arrival (DOA) estimation is a prerequisite for beamforming in hybrid massive multiple-input multiple-output (mMIMO) systems. Subspace methods based on covariance matrix reconstruction suffer a large performance loss under correlated signals and limited snapshots. To address these challenges, this paper proposes a DOA estimation method for hybrid mMIMO systems via sparse Bayesian learning (SBL). The DOA estimation problem for hybrid mMIMO systems is transformed into a sparse signal recovery problem, bypassing spatial covariance matrix reconstruction and avoiding the performance loss caused by the subspace methods. By using variational Bayesian inference (VBI), unknown parameters are estimated adaptively, which significantly improves robustness to noise and correlated signals and enhances DOA estimation performance with limited snapshots. Numerical simulation results verify the superiority of the proposed method.

    • Trajectory Optimization Scheme Based on Dynamic Interference in UAV Data Collection System

      2024, 39(5):1271-1286. DOI: 10.16337/j.1004-9037.2024.05.018

      Abstract:To address the dynamic interference problem in UAV data collection, this paper proposes a real-time optimization scheme for the UAV flight trajectory. Under a limited collection distance, the UAV flight trajectory is optimized to minimize the energy consumption of the UAV within the limited mission time. To avoid interference, the scheme is divided into two stages: initial trajectory planning and online trajectory optimization. In the initial trajectory planning stage, offline planning is carried out according to trajectory cost and corner energy consumption without considering interference. In the online trajectory optimization stage, dynamic interference is taken into account on the basis of the initial trajectory: an interference localization algorithm based on a Markov prediction model is designed, and an interference potential field is proposed to optimize the initial trajectory. Simulation analysis shows that the proposed scheme can effectively improve the anti-interference performance of UAV communication and the UAV data collection capability.

    • A Transmission Scheme for Cooperative-IRS-Aided CoMP-NOMA Networks

      2024, 39(5):1287-1296. DOI: 10.16337/j.1004-9037.2024.05.019

      Abstract:To address the uplink transmit power minimization problem in multi-cell scenarios, this paper proposes an uplink transmission scheme for the coordinated multipoint non-orthogonal multiple access (CoMP-NOMA) system with the collaboration of multiple intelligent reflecting surfaces (IRSs). Specifically, IRSs are deployed at the cell center and the cell edge, respectively, to improve the transmission quality for both cell-center and cell-edge users, and the inter-IRS reflection between the cell-center and cell-edge IRSs is taken into account. To solve the formulated power minimization problem, the relation between the power allocation coefficients and the phase shifts is derived. The joint optimization of power allocation and phase shifts is then converted into a pure phase shift determination problem, which is transformed into a series of one-dimensional search problems using the sequential rotation method. Simulation results demonstrate that the proposed solution significantly outperforms the benchmark schemes in terms of transmit power consumption under the same simulation setups.

    • Optimal Power Allocation Scheme for Indoor Visible Light Communication Based on NOMA

      2024, 39(5):1297-1308. DOI: 10.16337/j.1004-9037.2024.05.020

      Abstract:For the multi-user downlink indoor visible light communication system based on non-orthogonal multiple access (VLC-NOMA), an iterative power allocation scheme based on weighted sum-rate maximization is proposed to resolve the conflict between sum-rate and user fairness. The objective of the scheme is to maximize the weighted sum-rate, and user fairness can be adjusted by changing the weighting factor. Since the target problem is non-convex, it is transformed into a concave problem using the auxiliary variable method and convex optimization theory and then solved by the Lagrange dual method, and an iterative power allocation algorithm is designed from the solution. The convergence of the proposed algorithm, the system sum-rate and user fairness are evaluated by simulation. Results show that the proposed iterative power allocation algorithm converges well, and that the VLC-NOMA system achieves better sum-rate performance than the VLC-OMA system. By adjusting the weighting factor, a better balance between system sum-rate and user fairness can be obtained than with the existing power allocation scheme, at a smaller expense of system sum-rate.
