Structural brain networks (SC) and functional brain networks (FC) reflect, from different perspectives, the changes in brain structure information caused by epilepsy. Fusing the information from these two types of brain network for the auxiliary diagnosis of epilepsy has become an important research direction in the field. However, common fusion models fuse the information of the two types of brain network at only a single granularity and ignore the multi-grained nature of brain networks. This paper proposes an epilepsy identification method based on a multi-modal multi-grained fusion network (MMFN), which integrates the features of the multi-modal brain network at both global and local granularities in order to take full advantage of multi-modal brain network information. Specifically, at the local granularity, two modules (an edge features fusion module and a node features fusion module) are designed to reconstruct the feature maps of the edge layer and the node layer of the two types of brain network, so that the two modalities can learn features interactively. At the global granularity, a multimodal decomposition bilinear pooling module is designed to learn the joint representation of the two types of brain network. Experimental results show that, compared with existing methods, the proposed method significantly improves the accuracy of epilepsy recognition and can assist doctors in the diagnosis of epilepsy.
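
As an illustration of the global-granularity step, the following is a minimal sketch of a generic low-rank (factorized) bilinear pooling operation that fuses two feature vectors into one joint vector. The abstract does not give the module's exact formulation, so this should be read as an assumption about the general technique rather than the MMFN implementation. The names `low_rank_bilinear_pool`, `sc_feat`, `fc_feat`, `U`, `V`, and the factor size `k` are all hypothetical, and the projection matrices are random here instead of learned.

```python
import random


def matvec_t(W, x):
    """Return W^T x, where W is a list of len(x) rows with d columns each."""
    d = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(d)]


def low_rank_bilinear_pool(x, y, U, V, k):
    """Fuse x and y into a joint vector of length d // k.

    Both inputs are projected into a shared d-dimensional space, multiplied
    element-wise, and then sum-pooled over non-overlapping windows of size k.
    Each output element is therefore a rank-k bilinear form in x and y.
    """
    px = matvec_t(U, x)  # projection of the first modality
    py = matvec_t(V, y)  # projection of the second modality
    if len(px) != len(py) or len(px) % k != 0:
        raise ValueError("projected sizes must match and be divisible by k")
    prod = [a * b for a, b in zip(px, py)]
    return [sum(prod[i:i + k]) for i in range(0, len(prod), k)]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
sc_feat = [rng.uniform(-1, 1) for _ in range(6)]  # assumed SC feature vector
fc_feat = [rng.uniform(-1, 1) for _ in range(8)]  # assumed FC feature vector
k, out_dim = 3, 4                                 # assumed factor and output sizes
U = random_matrix(len(sc_feat), k * out_dim, rng)
V = random_matrix(len(fc_feat), k * out_dim, rng)

joint = low_rank_bilinear_pool(sc_feat, fc_feat, U, V, k)
print(len(joint))  # length of the joint representation
```

In a trained model the two projections would be learned jointly with the classifier, and the pooled vector is often normalized before classification; neither detail is stated in the abstract, so both are left out of the sketch.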