A New Algorithm of Feature Extraction for Signal Peptide Based on Compressed Sensing and Dynamic Time Warping

Affiliation:

1.School of Science, Jiangnan University, Wuxi, 214122, China;2.School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China

Fund Project:

Supported by the National Natural Science Foundation of China Youth Fund (61402202); the China Postdoctoral Science Foundation (2015M581724); the Jiangsu Postdoctoral Science Foundation (1401099C); and the Jiangsu Natural Science Foundation Youth Fund (BK20150124).

    Abstract:

    Accurate identification of signal peptides is important for protein research and localization. Compressed sensing can reduce redundant information while preserving the main information of a biological sequence, projecting high-dimensional information onto a low-dimensional space for feature extraction. This paper therefore combines compressed sensing with the dynamic time warping (DTW) algorithm to extract new feature vectors, and proposes a new, highly discriminative feature extraction method for signal peptides. The extracted features reflect important information such as the amino acid composition, sequence order and structure of the signal peptide, and also nonlinearly warp and align the different regions of the signal peptide along the time dimension, providing an effective feature representation of signal peptides for machine learning algorithms. Experimental results show that the recognition accuracies obtained with the extracted features reach 99.65%, 98.05% and 98.56% on the three datasets Eukaryotes, Gram+ bacteria and Gram- bacteria, respectively. Moreover, the new method can be easily applied to the identification of other biological sequences.
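
    The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the one-hot residue encoding, the Gaussian measurement matrix, the compressed dimension of 5, the function names and the toy sequences are all assumptions made here for illustration. Each residue is projected onto a low-dimensional space with a random matrix (a common compressed sensing measurement scheme), and two projected sequences of different lengths are then compared with the standard DTW recurrence.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(seq):
    """Encode a sequence as a (length, 20) one-hot matrix (assumed encoding)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        mat[pos, index[aa]] = 1.0
    return mat


def make_measurement_matrix(m, seed=0):
    """Random Gaussian measurement matrix of shape (m, 20), with m < 20."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, len(AMINO_ACIDS))) / np.sqrt(m)


def compress(seq, phi):
    """Project every residue vector x to y = phi @ x; returns shape (length, m)."""
    return one_hot_encode(seq) @ phi.T


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of vectors."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local Euclidean cost
            # extend the cheapest of the three allowed warping steps
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


# Toy sequences invented for this example, not taken from the paper's datasets.
phi = make_measurement_matrix(m=5)
seq_a = compress("MKAAALLLS", phi)
seq_b = compress("MKALS", phi)
print(dtw_distance(seq_a, seq_b))
```

    Because DTW lets one element align with several consecutive elements of the other sequence, regions of different length can still be matched. How such distances are turned into the final feature vector is described in the full paper and is not reproduced here.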

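Cite this article

Zhang Yanglijun, Gao Cuifang, Chen Wei, Tian Fengwei. A New Algorithm of Feature Extraction for Signal Peptide Based on Compressed Sensing and Dynamic Time Warping[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2019, 34(2): 303-311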

History
  • Received: 2017-03-11
  • Last revised: 2017-10-30
  • Published online: 2019-04-22