Abstract: Multimodal Continual Learning (MMCL), a significant research direction in machine learning and artificial intelligence, aims to achieve continuous knowledge accumulation and task adaptation by integrating data from multiple modalities (such as images, text, and audio). Compared with traditional single-modal learning methods, MMCL not only processes multi-source heterogeneous data in parallel but also effectively retains existing knowledge while adapting to new tasks, demonstrating immense application potential in intelligent systems. This paper provides a systematic review of multimodal continual learning. First, the fundamental theoretical framework of MMCL is elaborated along three dimensions: basic concepts, evaluation systems, and classical single-modal continual learning (CL) methods. Second, the advantages and challenges of MMCL in practical applications are analyzed in depth: despite its significant advantages in multimodal information fusion, MMCL still faces critical challenges such as modal imbalance and heterogeneous fusion, which both constrain the performance of current methods and point toward future research directions. On this basis, the paper comprehensively reviews the research status and latest advances in MMCL methods from four main aspects: replay-based, regularization-based, parameter-isolation-based, and large-model-based approaches. Finally, a forward-looking perspective on future development trends in MMCL is presented.