Abstract: Accurate identification of signal peptides is important for protein research and protein localization. This paper presents a new method for extracting highly discriminative features from signal peptide sequences. First, features based on compressed sensing are extracted by projecting a high-dimensional sequence onto a low-dimensional space, which removes redundant data while preserving the important information. The dynamic time warping (DTW) algorithm is then applied to create the new features. The features extracted by the new method reflect important information about amino acid composition, sequence order, and structure in the signal peptide, and they also allow the different regions of the signal peptide to be aligned nonlinearly along the time dimension. The method therefore provides an effective feature representation of signal peptides for machine learning algorithms. Experimental results show that the recognition accuracies obtained with the extracted features are 99.65%, 98.05%, and 98.56% on the three datasets Eukaryotes, Gram+ bacteria, and Gram- bacteria, respectively. Moreover, the new method can be readily applied to the identification of several kinds of biological sequences.
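
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the numeric encoding (Kyte-Doolittle hydropathy values), the fixed window length of 30 residues, the Gaussian measurement matrix, the use of DTW distances to reference templates as features, and all function names are assumptions made for illustration only. The sequences in the usage note are made-up toy strings, not data from the paper.

```python
import math
import random

# Kyte-Doolittle hydropathy values. Using this scale as the numeric
# encoding of residues is an assumption, not taken from the paper.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq, length=30):
    """Map residues to numbers, truncating or zero-padding to `length` (assumed)."""
    vals = [KD.get(r, 0.0) for r in seq[:length]]
    return vals + [0.0] * (length - len(vals))


def measurement_matrix(m, n, seed=0):
    """Random Gaussian m-by-n matrix, a common compressed-sensing choice (assumed)."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]


def compress(x, phi):
    """Project the n-dimensional signal x onto m dimensions: y = phi * x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_features(seq, templates, phi, length=30):
    """Hypothetical feature vector: DTW distances from the compressed
    sequence to each compressed reference template."""
    y = compress(encode(seq, length), phi)
    return [dtw_distance(y, compress(encode(t, length), phi)) for t in templates]
```

As a usage example, `phi = measurement_matrix(8, 30)` followed by `dtw_features("MKKLLLAALLLSAAGA", ["MKKLLLAALLLSAAGA", "MSTNPKPQRK"], phi)` yields a two-element feature vector whose first entry is zero, because a sequence aligns perfectly with itself. Such a vector could then be passed to any standard classifier.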