A Method of Heterogeneous Network Alignment Based on Multi-scale Feature and Improved Sampling Strategy

doi:10.16337/j.1004-9037.2021.04.016

2021, Issue 4, pp. 779-788.

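Affiliation: School of Big Data, Taiyuan University of Technology, Jinzhong 030600, China
Fund Project: Supported by the National Natural Science Foundation of China (61872260).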

Abstract:

Network alignment is an important way to integrate data from different platforms. Learning node representations through network representation learning and then establishing a node matching strategy is one of the mainstream techniques for aligning heterogeneous networks. In this line of research, the network representation model and computational complexity are the two key issues. This paper proposes an unsupervised network alignment method based on multi-scale feature modeling and an improved sampling strategy. First, a node feature representation at different scales is proposed to extract node features. Then, a network embedding model is used to obtain an initial representation of the network. On this basis, a sampling strategy based on node importance is designed to select landmark nodes, improving on random sampling. A node similarity matrix based on the landmark nodes is then built, and a low-rank matrix approximation is introduced to factorize it and obtain the node representations. Finally, the networks are aligned according to the similarity of the node representations. Experimental results on three datasets show that the proposed model outperforms the other baseline models.
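
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: node importance is taken to be degree, the "multi-scale" features are reduced to a node's own degree plus the mean degree of its neighbours, and the low-rank factorization step is omitted, so each node is represented directly by its row of the node-to-landmark similarity matrix. All function names are hypothetical.

```python
import math


def build_graph(edges):
    """Return an undirected graph as {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_features(adj):
    """Stand-in multi-scale features (assumption): own degree and
    mean degree of the direct neighbours."""
    feats = {}
    for u, nbrs in adj.items():
        mean_nbr_deg = sum(len(adj[v]) for v in nbrs) / len(nbrs)
        feats[u] = (float(len(nbrs)), mean_nbr_deg)
    return feats


def select_landmarks(adj, k):
    """Pick the k most important nodes; importance is assumed to be degree."""
    return sorted(adj, key=lambda u: (-len(adj[u]), u))[:k]


def similarity(f, g):
    """Similarity that decays with squared feature distance."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(f, g)))


def landmark_rows(feats, landmark_feats):
    """One row of the node-to-landmark similarity matrix per node."""
    return {u: [similarity(f, lf) for lf in landmark_feats]
            for u, f in feats.items()}


def align(adj1, adj2, k=2):
    """Map each node of network 1 to its closest node of network 2."""
    f1, f2 = node_features(adj1), node_features(adj2)
    # Landmarks are drawn from both networks so the rows are comparable.
    landmark_feats = ([f1[u] for u in select_landmarks(adj1, k)]
                      + [f2[u] for u in select_landmarks(adj2, k)])
    r1 = landmark_rows(f1, landmark_feats)
    r2 = landmark_rows(f2, landmark_feats)

    def sq_dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))

    return {u: min(r2, key=lambda v: sq_dist(r1[u], r2[v])) for u in r1}


# Toy example: network 2 is network 1 with its node labels permuted.
edges1 = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
perm = {0: 3, 1: 5, 2: 0, 3: 4, 4: 1, 5: 2}
edges2 = [(perm[u], perm[v]) for u, v in edges1]
mapping = align(build_graph(edges1), build_graph(edges2))
print(mapping)
```

On this toy pair the two networks are structurally identical, so matching nodes receive identical landmark rows. Real networks differ in structure and attributes, which is where the choice of features, the importance measure, and the low-rank factorization described in the abstract would matter.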