Multimedia visual representation and transmission are undergoing a profound transformation, driven by end-to-end optimized intelligent video coding. The need to compress emerging content such as unmanned aerial vehicle (UAV) video has further stimulated both the development of core technologies and innovation in application scenarios. This study focuses on end-to-end video coding and its initial application to UAV video, and proposes a video coding method based on a hierarchical bi-directional reference structure that addresses the limited motion representation efficiency and predictive coding accuracy of existing models. The design introduces a parameter-shared motion codec, a bi-directional scaled motion representation method, and credible motion modeling. Together, these significantly improve the rate-distortion performance of UAV video compression and outperform traditional video coding standards such as H.266/VVC. The work offers new insights for advancing key intelligent video coding technologies and their practical applications, and shows promise for future deployment in UAV visual perception and related domains.