Deep-learning-based hyperspectral image fusion algorithms typically stack multiple convolutional layers to learn the input-to-output mapping. Such designs do not fully exploit the characteristics of the fusion task and lack interpretability. To address these problems, this paper proposes a deep network that combines deep unfolding with a dual-stream network. First, an image fusion model is established using convolutional sparse coding, which maps the low-resolution hyperspectral image (LR-HSI) and the high-resolution multispectral image (HR-MSI) into a low-dimensional subspace. The fusion model accounts for the information common to the LR-HSI and HR-MSI as well as the information unique to the LR-HSI, and incorporates the HR-MSI as auxiliary information. Next, the fusion model is unfolded into a learnable, interpretable deep network. Finally, the dual-stream network is used to obtain a more accurate high-resolution hyperspectral image (HR-HSI). Experiments show that the network achieves excellent results on the hyperspectral image fusion task.
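To illustrate the general idea of deep unfolding referred to above, the following is a minimal sketch and not the network proposed in this paper. It assumes a plain (non-convolutional) sparse coding problem with a dense dictionary `D`, and unrolls a fixed number of ISTA iterations so that each iteration plays the role of one network stage. The names `soft_threshold`, `unfolded_ista`, and `objective`, the problem sizes, and the hand-set step sizes and thresholds are all illustrative assumptions; in an actual unfolded network these per-stage parameters (and the dictionary) would be learned from data.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def unfolded_ista(y, D, steps, thresholds):
    """Run one ISTA stage per (step, threshold) pair.

    Each stage has its own step size and threshold. In a deep unfolding
    network these values would be trainable; here they are fixed by hand
    (an assumption made for illustration only).
    """
    z = np.zeros(D.shape[1])
    for eta, tau in zip(steps, thresholds):
        grad = D.T @ (D @ z - y)  # gradient of 0.5 * ||y - D z||^2
        z = soft_threshold(z - eta * grad, tau)
    return z

def objective(y, D, z, lam):
    """Sparse coding objective: 0.5 * ||y - D z||^2 + lam * ||z||_1."""
    return 0.5 * np.sum((y - D @ z) ** 2) + lam * np.sum(np.abs(z))

# Synthetic toy problem (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
z_true = np.zeros(50)
z_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = D @ z_true

lam = 0.1
eta = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
K = 10                                 # number of unfolded stages
z_hat = unfolded_ista(y, D, [eta] * K, [eta * lam] * K)

print("objective at zero :", objective(y, D, np.zeros(50), lam))
print("objective after K :", objective(y, D, z_hat, lam))
```

Each stage consists of a gradient step on the data-fidelity term followed by a shrinkage step, so every operation in the resulting network corresponds to a term of the underlying optimization model. This correspondence is the sense in which unfolded networks are usually called interpretable.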