Recent Advances in Visual Question Answering and Reasoning
Affiliation:

1. School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China; 2. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China

    Abstract:

    With the rapid development of social media and human-computer interaction, the volume of multimedia data such as video, images, and text has grown tremendously, and researchers have accordingly turned their attention to multi-modal intelligence. Visual question answering and reasoning is an essential and fundamental research topic in multi-modal and artificial intelligence, and results from this line of work have been applied in human-computer interaction, intelligent medical care, and autonomous driving. This paper gives a comprehensive overview of algorithms for visual question answering and reasoning, and classifies and analyzes the existing methods. First, we define the visual question answering and reasoning task and briefly describe its main challenges. Then, we summarize existing methods that focus on attention mechanisms, graph networks, model pretraining, external knowledge, and explainable reasoning mechanisms. After that, we introduce the common benchmarks for visual question answering and reasoning and discuss in detail the existing methods on these benchmarks. Finally, we outline future directions for the task.

Get Citation

ZHANG Feifei, ZHANG Jianqing, QU Sijia, ZHOU Wanting. Recent Advances in Visual Question Answering and Reasoning[J].,2023,38(1):1-20.

History
  • Received: October 28, 2022
  • Revised: December 9, 2022
  • Online: January 25, 2023