Multi-shapelet: A Multivariate Time Series Classification Method Based on Shapelets
Affiliation: College of Command and Control Engineering, The Army Engineering University of PLA, Nanjing 210007, China
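Abstract (Chinese version):
A shapelet is the most discriminative subsequence of a time series. Since the concept was introduced, shapelets have been studied extensively by researchers from many fields, and many effective shapelet discovery techniques have been proposed for time series classification. However, the candidate shapelets of a multivariate time series may differ in length and may come from different variables. This makes them hard to compare directly and poses a unique challenge for shapelet-based multivariate time series classification. To address this challenge, this paper proposes Multi-shapelet, a multivariate time series classification method based on unsupervised representation learning and shapelets. First, Multi-shapelet uses a hybrid model, DC-GNN (dilated convolutional neural network and graph neural network), as an encoder. The encoder embeds candidate shapelets of different lengths into a unified shapelet selection space, where they can be compared with one another. Second, a new loss function is proposed to train this encoder in an unsupervised manner. After DC-GNN encodes the shapelets into embeddings, the embeddings of shapelets from the same class form a topology through their relative positions. The loss makes this topology closer to a proportionally scaled-down copy of the topology formed by the relative positions of those shapelets in the original space. This property is essential for the subsequent similarity-based pruning. Finally, K-means clustering and simulated annealing are used to prune and select shapelets. Experimental results on 18 multivariate time series datasets from the UEA archive show that Multi-shapelet achieves significantly higher overall accuracy than the other methods compared.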
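The abstract states the goal of the new loss but not its formula. The goal is that same-class embeddings form a proportionally scaled-down copy of the original-space topology. The Python below is a minimal sketch under that reading only. It measures how far a set of embedding distances is from the best uniformly scaled copy of the original-space distances, restricted to same-class pairs.

The following are illustrative assumptions, not the paper's loss:

- the function names;
- the closed-form optimal scale;
- the [0, 1] normalisation.

How original-space distances between shapelets of different lengths or variables are computed is left to the caller. The toy example uses equal-length vectors. A trainable version would compute the same quantity on encoder outputs inside an autodiff framework.

```python
import math
from itertools import combinations


def same_class_pair_distances(vectors, labels):
    """Euclidean distances for every unordered pair (i, j), i < j, whose
    shapelets share a class label (the labels input is hypothetical)."""
    return [
        math.dist(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
        if labels[i] == labels[j]
    ]


def proportional_topology_loss(orig_dists, emb_dists):
    """Hypothetical measure of how far the embedding distances are from a
    uniformly scaled copy of the original-space distances.

    The scale s minimising sum((e - s * d) ** 2) is <e, d> / <d, d>. The
    remaining residual is divided by <e, e>, so the result lies in [0, 1],
    is 0 exactly when emb_dists is proportional to orig_dists, and does not
    change if every embedding is rescaled by the same factor.
    """
    dd = sum(d * d for d in orig_dists)
    ed = sum(e * d for e, d in zip(emb_dists, orig_dists))
    ee = sum(e * e for e in emb_dists)
    if dd == 0.0 or ee == 0.0:
        return 0.0  # degenerate case: nothing to compare
    s = ed / dd  # best uniform shrink/stretch factor
    residual = sum((e - s * d) ** 2 for e, d in zip(emb_dists, orig_dists))
    return residual / ee


if __name__ == "__main__":
    # Toy shapelets represented as equal-length vectors (assumption).
    original = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
    shrunk = [[0.0, 0.0], [1.5, 0.0], [0.0, 2.0]]     # exact half-size copy
    distorted = [[0.0, 0.0], [3.0, 0.0], [0.0, 1.0]]  # shape not preserved
    labels = [0, 0, 0]
    d_orig = same_class_pair_distances(original, labels)
    print(proportional_topology_loss(d_orig, same_class_pair_distances(shrunk, labels)))  # 0.0
    print(proportional_topology_loss(d_orig, same_class_pair_distances(distorted, labels)))
```

Dividing by the embedding energy is a design choice. Without it, the encoder could drive the measure toward zero simply by collapsing all embeddings to one point.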
Abstract:
Shapelets are the most discriminative subsequences of a time series. Since the concept was introduced, shapelets have been studied extensively by researchers from many fields, and many effective shapelet discovery techniques have been proposed for time series classification. However, the candidate shapelets of a multivariate time series may differ in length and come from different variables. This makes them difficult to compare directly and poses a unique challenge for shapelet-based multivariate time series classification. To address this challenge, we propose Multi-shapelet, a multivariate time series classification method based on unsupervised representation learning and shapelets. First, Multi-shapelet uses a hybrid model, DC-GNN (dilated convolutional neural network and graph neural network), as an encoder. The encoder embeds candidate shapelets of different lengths into a unified shapelet selection space in which they can be compared. Second, a new loss function is proposed to train the encoder in an unsupervised manner. After DC-GNN encodes the shapelets into embeddings, the embeddings of same-class shapelets form a topology through their relative positions. The loss makes this topology more closely approximate a proportionally scaled-down copy of the topology formed by the relative positions of those shapelets in the original space. This property is essential for the subsequent similarity-based pruning. Finally, K-means clustering and simulated annealing are used to prune and select shapelets, yielding a set of shapelets with strong classification ability. Experimental results on 18 multivariate time series datasets from the UEA archive show that the overall accuracy of Multi-shapelet is significantly higher than that of the other methods.
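The abstract names K-means clustering and simulated annealing for pruning and selection but does not say how the two are combined. One plausible arrangement is sketched below as an assumption:

1. Cluster the shapelet embeddings with K-means.
2. Keep the member nearest each centroid, so that near-duplicate shapelets collapse to a single representative.
3. Let simulated annealing search fixed-size subsets of the survivors for a high value of a caller-supplied score.

The following are all hypothetical:

- the helper names;
- the farthest-first seeding, used instead of random or k-means++ seeding so the example is deterministic;
- the single-swap neighbourhood;
- the geometric cooling schedule;
- the score function.

In practice the score might be the validation accuracy of a classifier built on the selected shapelets.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Lloyd's K-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


def prune_by_cluster(embeddings, k):
    """Hypothetical pruning: keep one representative (closest to the
    centroid) per K-means cluster of shapelet embeddings."""
    labels, centroids = kmeans(embeddings, k)
    keep = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            keep.append(min(members, key=lambda i: math.dist(embeddings[i], centroids[c])))
    return sorted(keep)


def anneal_select(candidates, score, size, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Hypothetical selection: simulated annealing over subsets of `size`
    candidates, (approximately) maximising the caller's score(subset)."""
    rng = random.Random(seed)
    current = rng.sample(candidates, size)
    cur_val = score(current)
    best, best_val = list(current), cur_val
    t = t0
    for _ in range(steps):
        outside = [c for c in candidates if c not in current]
        if not outside:
            break
        # Neighbour: swap one selected candidate for one unselected candidate.
        proposal = list(current)
        proposal[rng.randrange(size)] = rng.choice(outside)
        val = score(proposal)
        # Always accept improvements; accept worse moves with prob exp(delta / t).
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / t):
            current, cur_val = proposal, val
            if cur_val > best_val:
                best, best_val = list(current), cur_val
        t *= cooling  # geometric cooling schedule
    return sorted(best), best_val
```

A typical chain would be `kept = prune_by_cluster(embeddings, k)` followed by `anneal_select(kept, score, size)`. The annealing step only requires `score` values to be comparable across subsets, so any accuracy- or margin-based criterion could be plugged in.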