Deep Reinforcement Learning Model for Job Shop Scheduling Problems with Uncertainty
Author:
Affiliation:

1. College of Computer Science and Technology, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China; 2. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, China

Fund Project:

"14th Five-Year Plan" equipment pre-research project (JZX7Y20210401001801).




    Abstract:

    The job shop scheduling problem (JSSP) is a classical non-deterministic polynomial (NP)-hard combinatorial optimization problem. In JSSP, the scheduling environment is usually assumed to be known and to remain unchanged during scheduling. However, the actual scheduling process is often affected by many uncertain factors, such as machine failures and process changes. A proximal policy optimization with hybrid prioritized experience replay (HPER-PPO) scheduling algorithm is proposed for solving JSSPs with uncertainties. The JSSP is modeled as a Markov decision process, for which the state features, reward function, action space, and scheduling policy networks are designed. To improve the convergence of the proposed deep reinforcement learning model, a new hybrid prioritized experience replay training method is proposed. The proposed scheduling method is evaluated on standard datasets and on datasets generated from them. The results show that in static scheduling experiments, the proposed scheduling model achieves more accurate results than existing deep reinforcement learning methods and priority dispatching rules. In dynamic scheduling experiments, the proposed model obtains more accurate scheduling results in a reasonable time for JSSPs with process-order uncertainty.
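The abstract names hybrid prioritized experience replay as the key training-stability ingredient but does not detail the hybrid scheme itself. As background, the sketch below shows the standard proportional prioritized-replay mechanism that such variants build on: transitions are sampled with probability proportional to their TD-error-based priority, and importance-sampling weights correct the resulting bias. All class and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

class ProportionalReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative only).

    The paper's hybrid prioritization is not specified in the abstract;
    this sketch shows only the common proportional variant it extends.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling (0 = uniform)
        self.data = []            # stored transitions
        self.priorities = []      # one priority per transition

    def add(self, transition, td_error):
        # New transitions get priority proportional to |TD error|;
        # the small epsilon keeps zero-error transitions sampleable.
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias;
        # normalizing by the max keeps weights in (0, 1].
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        return [self.data[i] for i in idx], idx, weights
```

In a PPO-style loop, sampled transitions would be weighted by `weights` in the loss, and their priorities refreshed after each update from the new TD errors.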

Cite this article

WU Xinquan, YAN Xuefeng, WEI Mingqiang, GUAN Donghai. Deep reinforcement learning model for job shop scheduling problems with uncertainty[J]. Journal of Data Acquisition and Processing, 2024, 39(6): 1517-1531.

History
  • Received: 2023-06-06
  • Revised: 2023-09-28
  • Accepted:
  • Published online: 2024-12-12