Linear Attention Text Classification by Combining Text Features and Word Frequency Implicit Factors

Affiliation:

School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Fund Project:

National Natural Science Foundation of China (No. 61803264).

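    Abstract:

    In text classification, a central problem is extracting informative text features while keeping computation efficient, and traditional methods struggle to achieve both feature richness and computational efficiency. To address this, this paper proposes LTTW (linear attention text classification by combining text features and word frequency implicit factors), a text classification model that uses a linear attention mechanism to capture the key features of a text. The model applies non-negative matrix factorization (NMF) to the term frequency matrix to extract word frequency implicit factors, which capture latent semantic information. In parallel, a pre-trained model extracts semantic features of the text, and these are fused with the word frequency implicit factors to build a richer text representation. On top of this representation, linear attention captures global dependencies while processing long text sequences more efficiently. Experiments on public datasets show that the proposed model outperforms existing mainstream methods in both accuracy and computational efficiency, with a particularly marked efficiency advantage on long sequences. The results indicate that the word frequency implicit factors compensate for gaps in the semantic features extracted by pre-trained models, and that linear attention captures key textual features while improving sequence-processing efficiency; together, these improve both the effectiveness and the efficiency of text classification.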
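The abstract does not specify the NMF solver, the number of implicit factors, or how the two feature sets are fused. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: Lee-Seung multiplicative updates for the Frobenius loss, a hypothetical rank `k=2`, random vectors standing in for pre-trained text embeddings, and L2-normalised concatenation as an assumed fusion step. The helper names `nmf` and `fuse_features` are illustrative. Each row of `W` plays the role of one document's word frequency implicit factors.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (docs x terms) as V ~ W @ H using
    Lee-Seung multiplicative updates for the Frobenius loss (assumed solver)."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_terms)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fuse_features(semantic, latent):
    """Hypothetical fusion: L2-normalise each view, then concatenate."""
    def l2(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return np.concatenate([l2(semantic), l2(latent)], axis=1)

# Toy term-frequency matrix: 4 documents x 6 vocabulary terms (raw counts).
tf = np.array([
    [3, 2, 0, 0, 1, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 2, 3, 0, 1],
    [0, 1, 3, 2, 0, 2],
], dtype=float)

W, H = nmf(tf, k=2)  # W: per-document word frequency implicit factors
# Stand-in for pre-trained semantic embeddings (random, for illustration only).
semantic = np.random.default_rng(1).normal(size=(4, 8))
fused = fuse_features(semantic, W)
print(W.shape, H.shape, fused.shape)  # (4, 2) (2, 6) (4, 10)
```

Normalising each view before concatenation is one simple way to keep the low-dimensional NMF factors from being swamped by the larger embedding; a learned projection would be an equally plausible choice.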

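The abstract does not say which linearisation of attention LTTW uses. The sketch below assumes the kernelised linear attention of Katharopoulos et al. (2020), with feature map φ(x) = elu(x) + 1. It replaces the n × n score matrix with the product φ(Q)(φ(K)ᵀV), normalised per query, so every query still attends to every key (global dependencies) at a cost linear in sequence length n. The function names are hypothetical, and the quadratic version is included only to check that both orderings compute the same result.

```python
import numpy as np

def elu_plus_one(x):
    # phi(x) = elu(x) + 1 > 0: x + 1 for x > 0, exp(x) otherwise (assumed feature map).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Non-causal linear attention: O(n * d * d_v) time instead of O(n^2 * d)."""
    Qp, Kp = elu_plus_one(Q), elu_plus_one(K)  # (n, d) non-negative features
    kv = Kp.T @ V                              # (d, d_v): all keys/values summarised once
    z = Qp @ Kp.sum(axis=0)                    # (n,): per-query normaliser
    return (Qp @ kv) / (z[:, None] + eps)

def quadratic_reference(Q, K, V):
    """Same kernel computed the O(n^2) way, for checking only."""
    A = elu_plus_one(Q) @ elu_plus_one(K).T    # (n, n) non-negative scores
    A = A / A.sum(axis=1, keepdims=True)       # row-normalise like softmax would
    return A @ V

rng = np.random.default_rng(0)
n, d, d_v = 512, 16, 16  # sequence length, key dim, value dim (illustrative sizes)
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d_v))

out = linear_attention(Q, K, V)
print(out.shape)  # (512, 16)
```

Because `kv` has shape (d, d_v) regardless of n, time and memory grow linearly with sequence length, which is the efficiency property the abstract attributes to linear attention on long texts.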

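Cite this article:

SU Zhan, ZHANG Xu, AI Jun, XU Wenguo. Linear attention text classification by combining text features and word frequency implicit factors[J]. Journal of Data Acquisition and Processing, 2026, (3): 795-813.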

History
  • Received: 2024-12-04
  • Last revised: 2025-02-18
  • Published online: 2026-06-10