Text in natural scenes varies widely in shape, orientation, and category, so scene text detection remains challenging. To better separate text from non-text and to accurately locate text regions in natural scene images, this paper proposes a text detection network that fuses local and global features. Multi-scale global feature fusion is realized through skip connections, and the identity residual block is improved to realize local fine-grained feature fusion, which reduces the loss of feature information and strengthens feature extraction in text regions. A polygon offset text field is combined with text edge information to localize text regions accurately. To evaluate the effectiveness of the proposed method, multiple sets of comparative experiments are conducted on the classic datasets ICDAR2015 and CTW1500. The experimental results show that the method achieves better text detection performance in complex scenes.
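
To illustrate what multi-scale fusion through a skip connection can look like, the following is a minimal, dependency-free sketch. It is not the network proposed here: the abstract does not state the fusion operator, so nearest-neighbor upsampling followed by channel concatenation is assumed, and the names `upsample_nearest` and `fuse_skip` are hypothetical.

```python
from typing import List

# A feature map is stored as [channel][row][col]; this layout is an assumption
# made for illustration only.
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int = 2) -> FeatureMap:
    """Enlarge each channel by repeating every value `factor` times on both axes."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def fuse_skip(fine: FeatureMap, coarse: FeatureMap, factor: int = 2) -> FeatureMap:
    """Upsample the coarse (deep) map and concatenate it with the fine (shallow) map.

    Assumed fusion rule: channel concatenation. Other choices (for example
    element-wise addition) are equally plausible.
    """
    up = upsample_nearest(coarse, factor)
    if len(up[0]) != len(fine[0]) or len(up[0][0]) != len(fine[0][0]):
        raise ValueError("spatial sizes do not match after upsampling")
    return fine + up  # list concatenation along the channel axis


# One shallow 4x4 channel and one deep 2x2 channel.
fine = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
coarse = [[[10.0, 20.0], [30.0, 40.0]]]
fused = fuse_skip(fine, coarse)
```

Here `fused` holds two 4x4 channels: the original shallow channel, which keeps fine spatial detail, and the upsampled deep channel, which carries coarser context. Concatenation leaves both sources intact for later layers to weight.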