Abstract: In a cognitive radio network, the base station must apply an effective spectrum management policy that protects the licensed user's communication while improving the quality of service of the cognitive radio users. When allocating spectrum holes to cognitive radio users, the base station faces frequent passive channel switching because the licensed user's activity is unpredictable, and this switching degrades the cognitive radio users' throughput. To solve this problem, this paper proposes a novel base station, the cognitive base station, which contains a reinforcement learning model with novel state and action sets. The cognitive base station makes a two-step channel allocation decision: first, whether to switch the channel for a cognitive radio user, and second, which channel to select if it decides to switch. This avoids excessive channel switching and improves the throughput of the cognitive radio users. The performance of a reinforcement learning spectrum management policy also depends heavily on how the environment is explored. In this paper, the epsilon-greedy exploration method is used to balance the cognitive base station's exploration of the unknown environment against its exploitation of existing knowledge. Simulation results show that applying epsilon-greedy in each decision step has a remarkable effect on system performance. We also determine the best-performing combination of epsilon values for the two steps, under which the proposed method outperforms the traditional reinforcement learning spectrum allocation scheme in improving cognitive radio users' throughput and reducing channel switching.
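The two-step decision with a separate epsilon per step can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the state and action sets, so the value lists `q_switch` and `q_channel`, the function names, and the choice to exclude the current channel from the second step are all assumptions made for illustration.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Pick a random index with probability epsilon, else the highest-valued index."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])


def two_step_decision(q_switch, q_channel, current_channel,
                      eps_switch, eps_channel, rng):
    """Return the channel to use next.

    q_switch    -- hypothetical values for [stay, switch] in the current state
    q_channel   -- hypothetical value of each channel in the current state
    eps_switch  -- exploration rate for step 1 (switch or not)
    eps_channel -- exploration rate for step 2 (which channel)
    """
    # Step 1: decide whether to switch at all (0 = stay, 1 = switch).
    if epsilon_greedy(q_switch, eps_switch, rng) == 0:
        return current_channel

    # Step 2: choose among the other channels (assumption: the current
    # channel is not a valid switching target).
    candidates = [c for c in range(len(q_channel)) if c != current_channel]
    pick = epsilon_greedy([q_channel[c] for c in candidates], eps_channel, rng)
    return candidates[pick]


rng = random.Random(0)
# Purely greedy (both epsilons 0): staying has the higher value, so no switch.
stay = two_step_decision([1.0, 0.0], [0.9, 0.2, 0.5], 0, 0.0, 0.0, rng)
# Purely greedy: switching has the higher value, best other channel is chosen.
move = two_step_decision([0.0, 1.0], [0.9, 0.2, 0.5], 0, 0.0, 0.0, rng)
```

Keeping the two exploration rates as separate parameters mirrors the abstract's point that the epsilon used in each decision step affects performance and that the two values are tuned as a combination.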