Linear attention text classification by combining text features and word frequency implicit factors
DOI:
Author:
Affiliation:

University of Shanghai for Science and Technology

Author biography:

Corresponding author:

Fund project:

National Natural Science Foundation of China (General Program, Key Program, Major Program)




    Abstract:

    In text classification tasks, effectively extracting text features while maintaining computational efficiency is a key challenge, and traditional methods struggle to balance feature richness against computational cost. To address this, this paper proposes Linear Attention Text Classification by Combining Text Features and Word Frequency Implicit Factors (LTTW), a model that fuses word frequency implicit factors with textual features and introduces a linear attention mechanism to capture key features in the text. Specifically, the model applies Non-negative Matrix Factorization (NMF) to the term frequency matrix to extract word frequency implicit factors that capture latent semantic information. In parallel, it uses a pre-trained model to extract semantic features of the text, which are then fused with the word frequency implicit factors to build a richer text representation. On top of this representation, a linear attention mechanism captures global dependencies while improving the processing efficiency of long text sequences. Experiments on public datasets show that the proposed model outperforms mainstream methods in both accuracy and computational efficiency, with especially pronounced efficiency gains on long sequences. The results indicate that the word frequency implicit factors compensate for the limitations of pre-trained models in semantic feature extraction, while the linear attention mechanism captures key textual features and accelerates sequence processing; together they substantially improve both the effectiveness and the efficiency of text classification.
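    The pipeline described in the abstract can be sketched as follows. This is a minimal illustration in NumPy, not the paper's actual implementation: the multiplicative-update NMF, the toy matrix sizes, the random "pre-trained" embeddings, and the elu(x)+1 feature map (a common kernel choice for linear attention) are all assumptions made for the example.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def nmf(V, k, iters=200, eps=1e-9):
        """Factor a non-negative matrix V ~ W @ H via multiplicative updates."""
        n, m = V.shape
        W = rng.random((n, k)) + eps
        H = rng.random((k, m)) + eps
        for _ in range(iters):
            H *= (W.T @ V) / (W.T @ W @ H + eps)   # update factor-to-term matrix
            W *= (V @ H.T) / (W @ H @ H.T + eps)   # update doc-to-factor matrix
        return W, H

    def linear_attention(Q, K, V):
        """Kernelized attention with phi(x) = elu(x) + 1: O(n*d^2), not O(n^2*d)."""
        phi = lambda X: np.where(X > 0, X + 1.0, np.exp(X))  # strictly positive map
        Qp, Kp = phi(Q), phi(K)
        KV = Kp.T @ V                  # (d, d_v): keys/values summarized once
        Z = Qp @ Kp.sum(axis=0)        # (n,): per-query normalizer, always > 0
        return (Qp @ KV) / Z[:, None]

    # Toy term-frequency matrix: 6 documents over a 12-word vocabulary.
    tf = rng.integers(0, 5, size=(6, 12)).astype(float)
    W, H = nmf(tf, k=4)                        # word-frequency latent factors per doc

    sem = rng.random((6, 8))                   # stand-in for pre-trained embeddings
    fused = np.concatenate([W, sem], axis=1)   # fused representation, shape (6, 12)
    out = linear_attention(fused, fused, fused)
    ```

    Because the key-value summary `KV` is computed once, cost grows linearly in the number of positions, which is the efficiency advantage the abstract claims for long sequences.
    
    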

History
  • Received: 2024-12-04
  • Revised: 2025-02-18
  • Accepted: 2025-02-26
  • Published online: