Zhang Feifei, Ma Zewei, Zhou Ling, Meng Lingtao
2023, 38(3):479-505. DOI: 10.16337/j.1004-9037.2023.03.001
Abstract: With the rapid development of Internet technology, the volume of data of different types, such as text and images, has grown tremendously. Obtaining valuable information from such heterogeneous but semantically related multimodal data is therefore particularly important. Cross-modal retrieval, which can effectively handle multimodal data, is an essential way to meet users’ needs for different kinds of information on the Internet. In recent years, cross-modal retrieval has become a hot topic in both academia and industry. This paper gives a comprehensive overview of the image-text cross-modal retrieval task, including definitions, challenges, and a detailed discussion of existing methods. Specifically, we first divide existing methods into three main categories: (1) traditional methods, (2) deep learning-based methods, and (3) hash-based representation methods. We then introduce the commonly used cross-modal retrieval benchmarks and discuss in detail how existing methods perform on them. Finally, future directions for the image-text cross-modal retrieval task are outlined.
TIAN Chang, Jia Qian, Chen Runfeng, Wang Haichao, Li Guoxin, Jiao Yutao
2023, 38(3):506-524. DOI: 10.16337/j.1004-9037.2023.03.002
Abstract: Unmanned aerial vehicle (UAV) swarms have become critical equipment for complex tasks because of their flexibility, low cost, and ability to carry various sensors. Their application depends on timely and efficient communication, so UAV swarm communication networks have received widespread research attention in recent years. The inherent characteristics of UAV swarms, such as high mobility, intensive information exchange, and limited energy storage, pose severe challenges for the management of communication resources. This paper summarizes the application scenarios, advantages, and characteristics of UAV swarm communication networks and identifies the challenges facing resource optimization. From the perspectives of strategies and methods, it reviews existing resource optimization schemes and sorts out the technical difficulties, such as improving communication performance in large-scale cluster scenarios, updating decisions in time in highly complex environments, and improving communication satisfaction under multiple heterogeneous requirements. Finally, the technical directions and development prospects of UAV swarm communication networks are discussed in light of the research status, the potential application value, and the advantages of emerging technologies.
Chen Jiaqi, He Yulin, Huang Zhexue, Fournier-Viger Philippe
2023, 38(3):525-538. DOI: 10.16337/j.1004-9037.2023.03.003
Abstract: The Gaussian mixture model (GMM) is a classic probability model that is commonly used in unsupervised learning to determine the class distribution of unlabeled samples. As an important method for estimating GMM parameters, the expectation-maximization (EM) algorithm determines the parameters and component coefficients by seeking the optimum of the GMM likelihood function. Solving a GMM with the EM algorithm has two defects: the algorithm is prone to getting stuck in a local optimum, and the GMM parameters it determines are unstable, especially for high-dimensional data. This paper therefore proposes a GMM solution method based on a statistical-aware (SA) strategy, called SA-GMM. Starting from the estimation of the unknown probability density function of a given data set, the method establishes the correlation between kernel density estimation (KDE) and GMM. To avoid selecting an over-smoothing KDE bandwidth, the objective simultaneously minimizes the empirical risk between KDE and GMM and the structural risk of the KDE bandwidth. Experiments on 11 standard probability distributions confirm the feasibility, rationality, and effectiveness of SA-GMM, and show that it estimates probability density functions better than the EM-based GMM and its variant.
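As a rough illustration of the link between KDE and GMM that this abstract refers to, the following Python sketch treats a Gaussian-kernel KDE as an equally weighted mixture and measures a squared gap between the two densities at the sample points. The function names and the squared-gap measure are assumptions made for illustration only; the abstract does not state the paper's actual empirical or structural risk terms.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def kde(x, samples, bandwidth):
    """Gaussian-kernel density estimate at x."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def mean_squared_gap(samples, bandwidth, weights, means, stds):
    """Hypothetical discrepancy: mean squared KDE-GMM difference at the samples."""
    gaps = [
        (kde(x, samples, bandwidth) - gmm_pdf(x, weights, means, stds)) ** 2
        for x in samples
    ]
    return sum(gaps) / len(gaps)


samples = [-1.0, 0.5, 2.0]
h = 0.7
# A KDE over n samples is itself an n-component GMM with weights 1/n,
# means at the samples, and every standard deviation equal to the bandwidth.
print(mean_squared_gap(samples, h, [1 / 3] * 3, samples, [h] * 3))
```

The last call should print a value that is zero up to floating-point rounding, which is one concrete sense in which the two models are related: a mixture with fewer components can then be fitted by shrinking such a gap.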
YU Ying, ZHANG Zhiqiang, QIAN Jin, WAN Ming
2023, 38(3):539-548. DOI: 10.16337/j.1004-9037.2023.03.004
Abstract: Multi-label feature selection is an important research topic in multi-label learning. Existing multi-label feature selection methods mainly measure the importance of each feature by the dependency between features and labels and the redundancy among features, and then rank features by importance, often ignoring the influence of label relationships on feature importance. To solve this problem, a multi-label feature selection algorithm based on label complementarity (MLLC) is designed, which introduces neighbourhood mutual information. The algorithm takes dependency, redundancy, and label relationships as the elements for evaluating feature importance, and redesigns the feature importance evaluation function around these three elements so as to select features with stronger discriminative power and achieve better classification performance. Finally, the effectiveness and robustness of the algorithm are verified on six classical multi-label datasets.
ZHANG Minghua, WU Xuan, SONG Wei, MEI Haibin, HE Qi, SU Cheng
2023, 38(3):549-564. DOI: 10.16337/j.1004-9037.2023.03.005
Abstract: Hyperspectral images are usually contaminated by Gaussian noise, impulse noise, dead lines, and stripes, so denoising is an essential step. Existing denoising methods based on low-rank characteristics introduce spatial information to improve noise reduction. However, because they often use only local similarity or non-local self-similarity, they remove sparse noise with structural information in the spectral dimension poorly. We therefore propose a hyperspectral image denoising method based on superpixel block clustering and low-rank characteristics. The method partitions and clusters blocks adaptively and makes full use of non-local spatial self-similarity while retaining local details. Experiments show that a same-object block composed of clustered superpixel blocks has good spatial-spectral dual low-rank properties. First, a superpixel segmentation method is applied to the hyperspectral image, and the superpixel blocks are clustered to obtain same-object blocks. Second, a low-rank matrix restoration model is established and solved to obtain the denoised image. We conduct experiments on both simulated and real data and compare the method with other methods based on low-rank characteristics. The results show that the method denoises mixed noise better, especially sparse noise with structural information.
WANG Jinhua, ZHOU Fei, BAI Menglin, SHU Haofeng
2023, 38(3):565-573. DOI: 10.16337/j.1004-9037.2023.03.006
Abstract: Video-based person re-identification (Re-ID) matches a video track with a clipped video frame so as to recognize the same pedestrian under different cameras. However, because real scenes are complex, the collected pedestrian trajectories suffer from serious appearance loss and misalignment, and traditional 3D convolution is no longer suitable for the video person re-identification task. A 3D feature block reconstruction model (3D-FBRM) is therefore proposed, which uses the first feature map to align subsequent feature maps at the level of horizontal blocks. To fully mine the temporal information of the trajectory while preserving feature quality, a 3D convolution kernel is added after the FBRM and combined with existing 3D ConvNets. In addition, a coarse-to-fine feature block reconstruction network (CF-FBRNet) is introduced, which not only enables the model to perform feature reconstruction at two different spatial scales but also further reduces computational overhead. Experiments show that CF-FBRNet achieves state-of-the-art results on the MARS and DukeMTMC-VideoReID datasets.
SHEN Tao, JIN Kai, SI Changkai, ZHENG Jianfeng, LIU Yingli
2023, 38(3):574-585. DOI: 10.16337/j.1004-9037.2023.03.007
Abstract: This paper proposes an improved class attention network (CA-Net) incorporating a class attention block (CAB) to extract the primary silicon regions in microscopic images of Al-Si alloys. The class attention block computes the correlation of each channel with each class from the feature map and fuses the correlation information of the different classes to generate attention weights that tie the weight of each feature channel to its contribution to the classes in the task. This enhances the representation of important features and suppresses interference from irrelevant ones. In experiments on the Al-Si alloy microscopic image dataset, the proposed method obtains 94.82%, 90.16%, 94.54%, 98.80%, and 97.97% for Dice coefficient, Jaccard similarity, sensitivity, specificity, and segmentation accuracy, respectively. Compared with CCNet, SPNet, TA-Net, and other methods, the proposed CA-Net effectively improves the segmentation of the primary silicon region in Al-Si alloy microscopic images.
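The five evaluation metrics named in this abstract have standard definitions in terms of true and false positives and negatives. The Python sketch below computes them for two flattened binary masks; the function name and the convention of returning 0.0 for an empty denominator are assumptions for illustration, not details taken from the paper.

```python
def segmentation_metrics(pred, truth):
    """Overlap metrics for two equal-length binary masks (flattened to 1-D)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1      # foreground predicted, foreground in ground truth
        elif p:
            fp += 1      # foreground predicted, background in ground truth
        elif t:
            fn += 1      # background predicted, foreground in ground truth
        else:
            tn += 1      # background in both

    def ratio(num, den):
        # Assumed convention: report 0.0 when a metric is undefined.
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


print(segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

For any pair of masks the two overlap scores are linked by Dice = 2J / (1 + J), where J is the Jaccard similarity, so Dice is never smaller than Jaccard.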
2023, 38(3):586-597. DOI: 10.16337/j.1004-9037.2023.03.008
Abstract: To solve the problem that regional active contour models cannot effectively segment weak targets, a regional active contour model with local entropy constraints is proposed for image segmentation. First, the image is divided into two feature regions based on local entropy information. Then a local entropy binary fitting energy is constructed from the local entropy feature information. Finally, a level set evolution equation is obtained and combined with the region-scalable fitting (RSF) model. The model considers the clustering characteristics of the gray-level distribution and the statistical information of local image regions, and it is effective in handling intensity inhomogeneity, weak edge segmentation, and flexible contour initialization. Experiments on medical images verify the effectiveness of the proposed model.
Gao Zihang, Liu Zhaoying, Zhang Ting, Li Yujian
2023, 38(3):598-607. DOI: 10.16337/j.1004-9037.2023.03.009
Abstract: To improve the segmentation accuracy of infrared ship targets, we present an adversarial domain adaptation network for infrared ship target segmentation (ISADA), in which labeled visible ship images serve as the source domain and unlabeled infrared ship images as the target domain. To address the style difference between the two domains, we preprocess the visible source-domain images with graying followed by whitening to convert them into the style of the target domain, and we optimize the infrared target-domain images with a denoising network. Furthermore, to overcome the limited receptive field of the discriminative network, we design a discriminative network based on atrous convolution. Finally, to address the low confidence of the target-domain prediction images, their information entropy is added to the adversarial loss. Experimental results on datasets composed of visible and infrared ship images show that the proposed method outperforms state-of-the-art methods, which demonstrates its effectiveness.
2023, 38(3):608-615. DOI: 10.16337/j.1004-9037.2023.03.010
Abstract: Long-term detection and evaluation of electrocardiogram (ECG) signals is crucial for the diagnosis and prevention of cardiovascular disease. However, ECG detection usually requires attaching electrodes to the patient, which can easily cause discomfort and thus limits the scope of application. In contrast, pulse wave signals detected by photoplethysmography (PPG) not only contain rich cardiovascular physiological and pathological information but are also easy to measure. Considering the inherent mapping between PPG and ECG signals, a model for transferring PPG to ECG signals based on a generative adversarial network (GAN) is proposed. The generator is built on the Unet model, with feature map fusion modeled on the structure of Unet++, and the discriminator is a convolutional neural network. During training, a gradient penalty is used to improve the stability of the model. Experiments are conducted on public datasets. In a comparison over a sample of 53 subjects, the root mean square error (RMSE), Pearson correlation coefficient (ρ), and Fréchet distance (FD) of the ECG signals generated by the new model improve by 3.4%, 5.5%, and 0.4%, respectively, showing that the new model transfers PPG to ECG more effectively.
Zhang Kai, Men Changqian, Wang Wenjian
2023, 38(3):616-628. DOI: 10.16337/j.1004-9037.2023.03.011
Abstract: Kernel methods transform a linearly non-separable problem in a low-dimensional space into a linearly separable problem in a high-dimensional space and are widely used in a variety of learning models. However, existing kernel selection methods have low computational efficiency and high time cost on large-scale data. To address these problems, this paper introduces random Fourier features to transform the original kernel feature space into a relatively low-dimensional explicit random feature space. Theoretical analyses are given for the upper bound of the kernel approximation error and for the upper bound of the error of training the learning model in the kernel-approximating random feature space. The convergence consistency of the kernel approximation and the relationship between the error upper bound and the kernel approximation parameters are obtained. Moreover, the optimal model parameters are selected in the random Fourier feature space, which avoids a large-scale search for the optimal parameters of the original Gaussian kernel model and thus greatly reduces the time cost of model selection. Experiments show that the error upper bound proved in this paper is controlled by the kernel approximation parameters. The optimal model selected through kernel approximation performs well compared with the original Gaussian kernel model, and the model selection time is greatly reduced compared with grid search.
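To make the idea of an explicit random feature space concrete, here is a minimal Python sketch of the standard random Fourier feature construction for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2). The names `make_rff` and `gaussian_kernel`, the parameterization by `gamma`, and the chosen feature count are assumptions for illustration; the paper's own model selection procedure and error bounds are not reproduced here.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def make_rff(dim, n_features, gamma, seed=0):
    """Return a map z(.) whose inner products approximate the Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density, N(0, 2*gamma*I).
    std = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    # Phases are uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def transform(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return transform


x = [0.1, -0.3, 0.5]
y = [0.4, 0.2, -0.1]
z = make_rff(dim=3, n_features=5000, gamma=0.5)
approx = sum(a * b for a, b in zip(z(x), z(y)))
print(approx, gaussian_kernel(x, y, 0.5))
```

The approximation error shrinks roughly as one over the square root of the number of random features, so the feature count acts as the knob that trades accuracy against cost when a linear model is trained on `z(x)` instead of the full kernel matrix.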
Chen Huafeng, Dong Yongquan, Yang Haolin, Zhang Guoxi
2023, 38(3):629-642. DOI: 10.16337/j.1004-9037.2023.03.012
Abstract: Truth discovery is a challenging research hotspot in the field of data integration. Traditional methods infer the truth from the interaction between data sources and values and lack sufficient feature information. Deep learning-based methods can extract features effectively, but their performance depends on a large number of manual annotations, and high-quality truth labels are difficult to obtain in large numbers in practical applications. To overcome these problems, this paper proposes an unsupervised truth discovery method based on multi-feature fusion (MFUTD). First, ensemble learning is used to label truths without supervision. Then, a pre-trained BERT model and one-hot encoding are used to obtain the semantic features and interactive features of the values. Finally, an initial training set is constructed by fusing the multiple features of the values with their “truth” labels, and the truth prediction model is trained by self-training. Experimental results on two real data sets show that the proposed method achieves higher truth discovery accuracy than existing methods.
Liu Jing, Qiu Ziying, Gao Maozu, Yu Donghua
2023, 38(3):643-651. DOI: 10.16337/j.1004-9037.2023.03.013
Abstract: The K-means algorithm is sensitive to the selection of initial center points, and its clustering results are easily affected by abnormal points and outliers. To address these shortcomings, this paper proposes an improved K-means algorithm based on Tukey rules and optimized selection of initial center points. The proposed algorithm uses Tukey rules to construct core and non-core subsets and divides the clustering process into two stages. At the same time, a strategy of adding center points one by one is applied on the core subset to optimize the initial center points. Clustering results on 20 real-world datasets from UCI show that the proposed algorithm outperforms the popular K-means++ clustering algorithm and effectively improves clustering performance.
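The abstract does not say how the Tukey rules are applied, so the following Python sketch shows only the generic Tukey fence test (values outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR] are flagged) used to split items into a core and a non-core subset. The function names and the choice of a user-supplied scalar `score` per item (for example, a distance to some reference point) are assumptions for illustration, not the paper's construction.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Lower and upper Tukey fences computed from the quartiles of `values`."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_split(items, score, k=1.5):
    """Split items into (core, non_core) by applying Tukey fences to score(item)."""
    scores = [score(item) for item in items]
    low, high = tukey_fences(scores, k)
    core = [item for item, s in zip(items, scores) if low <= s <= high]
    non_core = [item for item, s in zip(items, scores) if not (low <= s <= high)]
    return core, non_core


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
core, non_core = tukey_split(data, score=float)
print(core, non_core)
```

In a two-stage scheme of the kind the abstract describes, initial centers would be chosen on `core` only, with the `non_core` items assigned afterwards, which keeps extreme points from pulling the initial centers away from the bulk of the data.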
Song Wang, Hu Xiang, Zhang Yuhui, Wei Wenjiang, Zhou Yashi, Kang Ao
2023, 38(3):652-664. DOI: 10.16337/j.1004-9037.2023.03.014
Abstract: This paper proposes an order dispatch algorithm for online ride-hailing platforms based on mean-field multi-agent reinforcement learning with the ability to perceive supply-demand dynamics globally. The algorithm improves collaboration between agents within a local area by integrating multi-agent reinforcement learning with mean-field theory, and enhances the agents’ ability to perceive and optimize the supply-demand gap across the global area by injecting context about global supply-demand dynamics. In addition, we build a data-driven simulator for training and evaluating algorithms. Extensive experiments show that, in both whole-day and rush-hour scenarios, our algorithm significantly outperforms existing order dispatch algorithms in terms of order response rate and accumulated driver income. The experimental results validate the effectiveness of our algorithm.
Tong Guoxiang, Dong Tianrong, HU Hengzhang
2023, 38(3):665-675. DOI: 10.16337/j.1004-9037.2023.03.015
Abstract: Recognizing irregular text in scenes remains a challenging problem. For arbitrarily shaped and low-quality scene text, this paper proposes a multimodal network that combines a visual attention module and a semantic perception module. The visual attention module uses a parallel attention-based approach, combined with positional encoding, to extract visual features from images. The semantic perception module, based on weakly supervised learning, learns linguistic information to compensate for the deficiencies of visual features. It uses a Transformer-based variant that improves the model’s contextual semantic inference by randomly masking one character of a word during training. A visual-semantic fusion module lets information from the different modalities interact through a gating mechanism to generate robust features for character prediction. Extensive experiments demonstrate that the proposed approach is effective in recognizing arbitrarily shaped and low-quality scene text, and competitive results are obtained on several benchmark datasets. In particular, accuracy rates of 93.6% and 86.2% are achieved on the SVT and SVTP datasets, respectively, both of which contain low-quality text. Compared with a method containing only the visual module, accuracy improves by 3.5% and 3.9%, respectively, which demonstrates the importance of semantic information for text recognition.
Zhong Zhaoman, Li Heng, Yang Hong, Guan Yan
2023, 38(3):676-689. DOI: 10.16337/j.1004-9037.2023.03.016
Abstract: After an emergency occurs, accurately analyzing the emotional state of netizens and guiding its evolution is of great practical significance for controlling public opinion on the emergency and maintaining social stability. According to the characteristics of netizens’ comments on emergencies, a complete set of netizens’ emotional states is constructed, and different emotion sets are established from the perspectives of stakeholders and of the emergencies themselves. According to the transmission mode of epidemic model, the evolution models of netizens’ emotional states
CHENG Cuina, FENG Songlyu, MO Liping
2023, 38(3):690-703. DOI: 10.16337/j.1004-9037.2023.03.017
Abstract: To address the slow convergence, tendency to fall into local optima, and low convergence accuracy of the basic harmony search (HS) algorithm, an improved HS (IHS) algorithm is proposed that combines a sine cosine optimization operator, a Levy flight mechanism, and a dynamic parameter adjustment strategy. In the improvisation stage, the algorithm first combines the sine cosine optimization operator with the fine-tuning bandwidth to fine-tune the harmony vectors, making full use of the position information of the optimal individual and the current individual and improving calculation accuracy and convergence speed. The Levy flight mechanism is then used to update the fine-tuning bandwidth, which keeps the algorithm from falling into local optima and improves its global search capability. During iteration, the storage probability, the pitch fine-tuning probability, and the search domain of the harmony memory are adjusted adaptively and dynamically to further improve convergence performance. Comparative performance tests on ten benchmark functions show that the proposed algorithm has stronger global search ability, faster convergence, and better calculation accuracy.
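For orientation, the sketch below implements only the basic harmony search loop that the improved algorithm starts from; it does not include the sine cosine operator, the Levy flight update, or the dynamic parameter adjustment described in the abstract. The parameter names (`hms` for memory size, `hmcr` for the memory consideration probability, `par` for the pitch adjustment probability, `bw` for the fine-tuning bandwidth) and their default values are conventional choices assumed here for illustration.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.1,
                   iterations=5000, seed=0):
    """Minimize `objective` over box `bounds` with basic harmony search."""
    rng = random.Random(seed)
    # Harmony memory: hms random candidate solutions and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for this dimension.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    # Pitch adjustment: small move within the bandwidth.
                    value += bw * rng.uniform(-1.0, 1.0)
            else:
                # Random selection from the whole search domain.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))

        # Replace the worst stored harmony if the new one is better.
        new_score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f = harmony_search(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_f)
```

With fixed `hmcr`, `par`, and `bw`, the search can stall once the memory has collapsed around one point, which is the kind of behaviour that adaptive parameter schedules and heavier-tailed bandwidth updates are meant to counter.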
SUN Linhui, ZHAO Min, WANG Shun
2023, 38(3):704-716. DOI: 10.16337/j.1004-9037.2023.03.018
Abstract: In cross-corpus speech emotion recognition, the mismatch between target-domain and source-domain samples leads to poor recognition performance. To improve performance, this paper proposes a cross-corpus speech emotion recognition method based on deep domain adaptation and a convolutional neural network (CNN) decision tree model. First, a local feature transfer learning network based on jointly constrained deep domain adaptation is constructed. By minimizing the joint difference between the target and source domains in the feature space and the Hilbert space, the correlation between the two corpora is mined and transferable invariant features from the target domain to the source domain are learned. Then, to reduce the classification error on confusable emotions among multiple emotions in the cross-corpus setting, a CNN decision tree multi-level classification model is constructed based on the degree of emotional confusion, so that the emotions are first coarsely and then finely classified. The method is validated on three corpora: CASIA, EMO-DB, and RAVDESS. The results show that the average recognition rate of the proposed method is 19.32% to 31.08% higher than that of the CNN baseline, and system performance is greatly improved.
SHAO Xia, ZHAI Yakun, LUO Wenyu, XU Li
2023, 38(3):717-726. DOI: 10.16337/j.1004-9037.2023.03.019
Abstract: With the development of communication technology, increasingly high communication frequency bands are being adopted. However, the diffraction capability of electromagnetic waves decreases as frequency increases, so new-generation communication systems depend more heavily on line-of-sight propagation. Complex mobile scenarios require frequent beam switching, which incurs excessive system overhead and delay. To address this problem, a position information-assisted grid-based beam switching method is proposed. The method exploits the fact that the optimal beam pair remains constant when a line-of-sight (LOS) path is present. The coverage area is divided into grids in one-to-one correspondence with beams, and a position-beam mapping table is established. All subsequent switching points and switching information are predicted from position information and motion speed. Simulation and analysis results show that the proposed method significantly improves the spectral efficiency of the system compared with the non-grid switching method, that the proposed hexagonal grid switches better than the square grid, and that the beam switching probability is reduced by 50%. This guarantees communication quality and verifies the rationality of the position information-assisted grid-based beam switching method.
2023, 38(3):727-740. DOI: 10.16337/j.1004-9037.2023.03.020
Abstract: With the advent of the data era, the classification of unbalanced data is receiving more and more attention. When data are unbalanced, classification results are often incorrect because of the imbalance between the numbers of minority class and majority class samples. We therefore propose an oversampling algorithm based on the Bootstrap method under the maximum entropy principle. First, the probability distribution of the data sample is obtained through the Bootstrap method and optimized using the principle of maximum entropy. Second, a probability enhancement algorithm based on the minority class sample distribution is proposed, which takes into account the differing abilities of minority class samples to generate new minority class samples. The algorithm allows the randomness of the data to be fully represented and ensures that the probability density of the minority class remains consistent before and after the data set is balanced, thus improving the effectiveness of the classification algorithm. Finally, experiments on eight data sets selected from the UCI and KEEL databases show that the proposed algorithm is more effective than other algorithms.
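As a baseline for comparison with the method in this abstract, the Python sketch below performs plain bootstrap oversampling: minority samples are redrawn uniformly with replacement until the class sizes match. It deliberately omits the maximum entropy optimization and the probability enhancement step, whose details the abstract does not give; the function name and its arguments are assumptions for illustration.

```python
import random


def bootstrap_oversample(minority, majority_size, seed=0):
    """Pad the minority class to `majority_size` by resampling with replacement."""
    rng = random.Random(seed)
    needed = max(0, majority_size - len(minority))
    # Each extra sample is a uniform draw from the original minority samples.
    extra = [rng.choice(minority) for _ in range(needed)]
    return list(minority) + extra


minority = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4)]
balanced = bootstrap_oversample(minority, majority_size=10)
print(len(balanced))
```

Because every added point is a copy of an existing one, the empirical distribution of the minority class is preserved in expectation, but no new regions of feature space are covered; replacing the uniform draw with non-uniform sampling weights is one natural place where a distribution-aware scheme could differ.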
Mailing Address: 29 Yudao Street, Nanjing, China
Post Code: 210016 Fax: 025-84892742
Phone: 025-84892742 E-mail: sjcj@nuaa.edu.cn