Reinforcement Learning Spectrum Allocation Algorithm Based on Two-Step Decision and ε-greedy Exploration
Authors:
Affiliations:

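Author biographies:

Yin Zhijie (1992-), male, master's student; research interests: cognitive radio, machine learning. Wang Yiming (1956-), female, professor and doctoral supervisor, senior member of the Chinese Institute of Electronics, IEEE member, one of the academic leaders of the Communication and Information Systems discipline at Soochow University; research interests: wireless communication networks, cognitive radio, ultra-wideband communication, etc. Wu Cheng (1978-), male, associate professor and master's supervisor; research interests: cognitive radio, image processing; E-mail: cwn@suda.edu.cn

Corresponding author:

Funding:

Supported by the National Natural Science Foundation of China (61471252).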


Double-Step Decision Reinforcement Learning Spectrum Management Using ε-greedy Exploration
    Abstract (Chinese version):

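    In cognitive radio networks, the cognitive base station must perform spectrum management to improve the quality of service of unlicensed users. When searching for spectrum holes to allocate to unlicensed users, the base station needs to make the best choice, but that choice is very likely to be only a local optimum, which causes frequent spectrum handoffs and reduced throughput for unlicensed users. To address this problem, this paper proposes a centralized reinforcement learning spectrum allocation algorithm based on two-step decision making and exploration. By designing a new state-action set, the cognitive base station makes a two-step channel-allocation decision, and an exploration mode is applied to balance exploring the environment against exploiting experience during reinforcement learning. This prevents locally optimal decisions and improves spectrum management performance. Simulation results show that the algorithm clearly outperforms some existing spectrum allocation strategies in raising unlicensed-user throughput and reducing spectrum handoffs.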

    Abstract:

    In a cognitive radio network, the base station must apply an effective spectrum management policy that protects the licensed users' communication while improving the quality of service of cognitive radio users. When allocating spectrum holes to cognitive radio users, the base station faces massive passive channel switching caused by the unpredictability of licensed users, which degrades the throughput of cognitive radio users. To solve this problem, this paper proposes a novel base station, the cognitive base station, which contains a reinforcement learning model with novel state and action sets. The cognitive base station makes a two-step channel-allocation decision: first whether to switch the channel for a cognitive radio user, and then, if it decides to switch, which channel is best to select. This avoids excessive channel switching and improves the throughput of cognitive radio users. The performance of a reinforcement learning spectrum management policy also depends heavily on how the environment is explored. In this paper, the epsilon-greedy exploration method is used to balance the cognitive base station's exploration of the unknown environment against its exploitation of existing knowledge. Simulation results show that applying epsilon-greedy in each decision step has a remarkable effect on system performance. We also identify the best-performing combination of the two per-step epsilon values, with which the proposed method outperforms the traditional reinforcement learning spectrum allocation scheme in improving cognitive radio users' throughput and reducing channel switching.
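
The two-step decision with a separate epsilon per step can be illustrated with a minimal Python sketch. The abstract does not give the paper's state and action sets, reward, or update rule, so everything here is an assumption made for illustration: the value lists `q_switch` and `q_channel`, the parameters `eps1` and `eps2`, and the choice to exclude the current channel when switching are hypothetical and are not taken from the paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random index, otherwise the argmax index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def two_step_decision(q_switch, q_channel, current, eps1, eps2, rng):
    """Hypothetical two-step choice for one cognitive radio user.

    q_switch  : [value of staying, value of switching] (assumed layout)
    q_channel : one estimated value per channel (assumed layout)
    current   : index of the channel the user occupies now
    eps1/eps2 : exploration rates for step 1 and step 2
    """
    # Step 1: decide whether to switch at all (0 = stay, 1 = switch).
    if epsilon_greedy(q_switch, eps1, rng) == 0:
        return current
    # Step 2: choose among the other channels (assumption: a switch
    # always moves the user to a different channel).
    candidates = [c for c in range(len(q_channel)) if c != current]
    pick = epsilon_greedy([q_channel[c] for c in candidates], eps2, rng)
    return candidates[pick]


if __name__ == "__main__":
    rng = random.Random(0)
    channel = two_step_decision([0.1, 0.6], [0.2, 0.9, 0.5], 1, 0.1, 0.2, rng)
    print("selected channel:", channel)
```

Setting both epsilons to 0 makes the sketch purely greedy, which corresponds to the locally optimal behaviour the abstract aims to avoid; raising either value adds exploration to that step only.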

Cite this article

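Yin Zhijie, Wang Yiming, Wu Cheng. Double-Step Decision Reinforcement Learning Spectrum Management Using ε-greedy Exploration [J]. Journal of Data Acquisition and Processing (数据采集与处理), 2018, 33(6): 1003-1012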

History
  • Received: 2017-03-26
  • Last revised: 2017-09-25
  • Accepted:
  • Published online: 2018-12-06