A Model for Extracting Evaluation Objects of Cased-Involved Microblog Based on Keyword Structured Encoding
Authors: WANG Jingyun, YU Zhengtao, XIANG Yan, CHEN Long
Affiliations:

1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China

Fund Project:

National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); Yunnan Provincial Major Science and Technology Special Program (202002AD080001); Yunnan Provincial Basic Research Program, General Project (202001AT070047, 202001AT070046).


    Abstract:

    Evaluation object extraction for case-involved microblogs aims to identify, in microblog comments, the case-related terms that users are evaluating, which helps to capture public opinion on different aspects of a specific case. Existing methods generally treat evaluation object extraction as a sequence labeling task but ignore a domain characteristic of case-involved microblogs: comments usually revolve around the case keywords that appear in the microblog post text. This paper therefore proposes a sequence labeling model based on case keyword structured encoding to extract the evaluation objects of case-involved microblogs. First, several case keywords are obtained from the microblog post text and converted into a keyword structural representation by a structured encoding mechanism. Next, this representation is fused into the comment sentence representation through a cross-attention mechanism. Finally, a conditional random field (CRF) extracts the evaluation object terms. Experiments on the datasets of two cases show that the proposed method outperforms multiple baseline models, which validates its effectiveness.
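    The fusion and decoding steps named in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation and rests on stated assumptions: the keyword structural representations are taken as ready-made vectors (the paper's structured encoding mechanism is not reproduced), the cross-attention uses each comment token as a query and the keyword vectors as both keys and values with no learned projections, the attended keyword summary is added to the token vector as a residual fusion, and evaluation objects are tagged with a BIO scheme decoded by standard linear-chain CRF Viterbi decoding. The names `cross_attention_fuse`, `viterbi`, `bio_spans` and `TAGS` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
TAGS = ("O", "B", "I")  # hypothetical BIO tag set for evaluation-object terms


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(tokens: List[Vector], keywords: List[Vector]) -> List[Vector]:
    """Fuse keyword vectors into each comment-token vector (assumed scheme).

    Each token vector is a query; keyword vectors act as keys and values
    without learned projections. The attention-weighted keyword summary is
    added to the token vector (residual fusion, also an assumption).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    fused = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keywords]
        weights = softmax(scores)
        context = [sum(w * k[j] for w, k in zip(weights, keywords)) for j in range(d)]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag path of a linear-chain CRF.

    emissions[t][y]: score of tag y at position t.
    transitions[a][b]: score of moving from tag a to tag b.
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for y in range(n_tags):
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][y])
            ptrs.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
        score = new_score
        backpointers.append(ptrs)
    # Follow back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda y: score[y])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


def bio_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[str]:
    """Collect each B-tag followed by its I-tags as one extracted term.

    Use sep="" for character-level Chinese tokens.
    """
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(sep.join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                spans.append(sep.join(current))
            current = []
    if current:
        spans.append(sep.join(current))
    return spans


if __name__ == "__main__":
    # Toy, hand-set scores (not learned); O -> I is penalized as invalid.
    emissions = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0],
                 [2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    tags = [TAGS[i] for i in viterbi(emissions, transitions)]
    print(bio_spans(["the", "court", "ruling", "is", "fair"], tags))
    # prints ['court ruling']
```

    In a trained model, the emission scores would come from a learned projection of the fused token vectors and the transition matrix would be learned CRF parameters; both are supplied by hand here only to keep the sketch self-contained.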

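Cite this article:

WANG Jingyun, YU Zhengtao, XIANG Yan, CHEN Long. A Model for Extracting Evaluation Objects of Cased-Involved Microblog Based on Keyword Structured Encoding[J]. Journal of Data Acquisition and Processing, 2022, 37(5): 1026-1035.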

History
  • Received: 2021-08-30
  • Last revised: 2022-01-27
  • Published online: 2022-09-25