Multi-modal Medical Entity Recognition Based on Multi-scale Attention and Graph Neural Networks

Authors: 韩普 (Han Pu), 刘森嶺 (Liu Senling), 陈文祺 (Chen Wenqi)

Affiliations: 1. School of Management, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 2. Jiangsu Province Key Lab of Data Engineering and Knowledge Service, Nanjing 210023, China

Fund Project: National Social Science Fund of China (22BTQ096); 1311 Talent Plan of Nanjing University of Posts and Telecommunications.


Abstract:

With the rapid development of information technology, multi-modal data such as Chinese text and images in the medical and health field have grown explosively. Multi-modal medical entity recognition (MMER) is a key step in multi-modal information extraction and has recently attracted considerable attention. To address the loss of image detail and the insufficient semantic understanding of text in MMER tasks, this paper proposes an MMER model based on multi-scale attention and dependency-parsing graph convolution (MADPG). On the one hand, the model introduces a multi-scale attention mechanism on top of ResNet to collaboratively extract visual features fused across different spatial scales, reducing the loss of important detail in medical images, strengthening the image feature representation, and supplementing the textual semantics. On the other hand, it builds a graph neural network over the dependency-parse structure to capture the complex grammatical dependencies between words in medical text, enriching the textual semantic representation and promoting deep fusion of image and text features. Experiments show that the proposed model reaches an F1 score of 95.12% on a multi-modal Chinese medical dataset, a clear improvement over mainstream single-modal and multi-modal entity recognition models.
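To make the visual branch described above more concrete, the following is a minimal PyTorch sketch of multi-scale attention over ResNet feature maps: the final feature map is pooled at several spatial scales, and a learned attention weights and fuses the per-scale summaries. This is a hedged illustration, not the authors' released implementation; the ResNet-50 backbone, the pooling scales (1, 2, 4), the 768-dimensional projection, and all class and variable names are assumptions.

```python
# Minimal sketch of a multi-scale attention visual encoder (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class MultiScaleVisualEncoder(nn.Module):
    """Pools a ResNet feature map at several spatial scales, attends over the
    scales, and returns one fused visual representation per image."""

    def __init__(self, out_dim=768, scales=(1, 2, 4)):
        super().__init__()
        backbone = models.resnet50(weights=None)  # pretrained weights would normally be loaded
        # Keep everything up to (and including) the last residual stage.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # -> (B, 2048, H, W)
        self.scales = scales
        self.proj = nn.Linear(2048, out_dim)
        self.scale_attn = nn.Linear(out_dim, 1)  # attention over the different scales

    def forward(self, images):                                     # images: (B, 3, 224, 224)
        fmap = self.backbone(images)                               # (B, 2048, H, W)
        per_scale = []
        for s in self.scales:
            pooled = F.adaptive_avg_pool2d(fmap, s)                # (B, 2048, s, s)
            tokens = pooled.flatten(2).transpose(1, 2)             # (B, s*s, 2048)
            per_scale.append(self.proj(tokens).mean(dim=1))        # (B, out_dim)
        stacked = torch.stack(per_scale, dim=1)                    # (B, n_scales, out_dim)
        weights = torch.softmax(self.scale_attn(stacked), dim=1)   # (B, n_scales, 1)
        return (weights * stacked).sum(dim=1)                      # (B, out_dim)


if __name__ == "__main__":
    enc = MultiScaleVisualEncoder()
    print(enc(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 768])
```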
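Likewise, a minimal sketch of the dependency-parsing graph branch: token features (for example from a BERT-style encoder) are propagated by a single graph-convolution layer over an adjacency matrix built from dependency-parse head indices. In practice the head indices would come from an external Chinese dependency parser; the toy values below, the single-layer design, the symmetric edges, and all names are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a dependency-parse GCN layer over token features (illustrative only).
import torch
import torch.nn as nn


class DependencyGCNLayer(nn.Module):
    """Each token aggregates the features of its syntactic neighbours
    (head and dependents) plus itself, then applies a linear transform."""

    def __init__(self, dim=768):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    @staticmethod
    def heads_to_adj(heads, seq_len):
        """heads[i] is the index of token i's syntactic head (-1 for root)."""
        adj = torch.eye(seq_len)                 # self-loops
        for i, h in enumerate(heads):
            if h >= 0:
                adj[i, h] = adj[h, i] = 1.0      # undirected dependency edge
        return adj

    def forward(self, token_feats, adj):         # (L, dim), (L, L)
        deg = adj.sum(dim=1, keepdim=True)       # simple degree normalisation
        agg = (adj / deg) @ token_feats
        return torch.relu(self.linear(agg))


if __name__ == "__main__":
    feats = torch.randn(5, 768)                  # e.g. encoder outputs for 5 tokens
    heads = [1, -1, 1, 4, 1]                     # toy dependency heads from a parser
    layer = DependencyGCNLayer()
    adj = layer.heads_to_adj(heads, seq_len=5)
    print(layer(feats, adj).shape)               # torch.Size([5, 768])
```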

Cite this article:

韩普, 刘森嶺, 陈文祺. 基于多尺度注意力和图神经网络的多模态医学实体识别研究[J]. 数据采集与处理 (Journal of Data Acquisition and Processing), 2025, 40(4): 922-933.

History:
  • Received: 2025-01-07
  • Revised: 2025-03-03
  • Accepted:
  • Published online: 2025-08-15