A New Shot Boundary Detection Method of Lecture Video for Teaching Evaluation

Affiliation:

1. School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou 215500, China; 2. School of Foreign Languages, Chongqing Three Gorges University, Wanzhou 404100, China

Fund Project:

Ministry of Education Supply-Demand Matching Employment and Education Project (20220102204); Jiangsu Province Education Science "14th Five-Year Plan" Project (D/2021/01/110); Philosophy and Social Science Research Project of Jiangsu Universities (2020SJA1425); Key Project of the Suzhou Library Society (21-A-02); Higher Education Research Project of Changshu Institute of Technology (GJ1905).

Abstract:

Shot boundary detection (SBD) of lecture video is of great significance to teaching evaluation (TE). Lecture videos pose three problems for existing methods: their visual content changes only subtly, shot boundary information alone is insufficient, and the detection results are not well suited to TE. To address these problems, this paper introduces an attention mechanism and proposes an SBD method for lecture video that learns both visual and textual feature representations. Firstly, a hierarchical vision transformer (HViT) model is proposed to learn visual features from the regions of interest (ROI) that matter for TE, such as the screen projection, the teacher, and the students. Secondly, a hierarchical text transformer (HTT) model is proposed to learn the textual features that matter for TE from the screen text and the speech text. Finally, a loss function is constructed that jointly combines the binary cross-entropies of shot classification and boundary detection. Experimental results on the CLShots dataset show that the average precision, recall, F1-score, and mean intersection over union of the proposed method are higher by 23.3%, 22.4%, 22%, and 35.7%, respectively, than those of SBLV, a state-of-the-art lecture video shot detection method, and higher by 13.8%, 14.5%, 14.3%, and 21.3%, respectively, than those of TransNet V2, a general-domain deep learning method.
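The abstract states only that the loss combines the binary cross-entropies of shot classification and boundary detection; it does not give the exact form. The following is a minimal sketch of one plausible reading, assuming a simple weighted sum. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def joint_loss(shot_probs, shot_labels, boundary_probs, boundary_labels, weight=1.0):
    """Assumed joint objective: shot-classification BCE plus weighted boundary BCE.

    `weight` is an illustrative balancing factor, not a value from the paper.
    """
    shot_term = binary_cross_entropy(shot_probs, shot_labels)
    boundary_term = binary_cross_entropy(boundary_probs, boundary_labels)
    return shot_term + weight * boundary_term
```

With this sketch, confident and correct predictions on both tasks drive the loss toward zero, while an error on either task raises it, so the two objectives are optimized together.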

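Cite this article

Xie Conghua, Luo Defeng, Fang Yujie. A New Shot Boundary Detection Method of Lecture Video for Teaching Evaluation[J]. Journal of Data Acquisition and Processing, 2023, 38(1): 174-185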

History
  • Received: 2022-04-10
  • Last revised: 2022-05-18
  • Accepted:
  • Published online: 2023-01-25