Industrial Benchmark for OpenAI Gym


gym-industrial is a standalone Python re-implementation of the Industrial Benchmark (IB) for OpenAI Gym. The IB was introduced in:

D. Hein et al. (2017). A benchmark environment motivated by industrial control problems. In IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-8).

Installation

pip install gym-industrial

Environments

To register the environments with Gym, simply import the package at any point before calling gym.make.


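The main environment is registered in Gym as IndustrialBenchmark-v0. The sub-dynamics of the IB are also implemented as Gym environments; each one poses a different challenge of the overall task.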

| System | Environment ID | Features |
|---|---|---|
| Industrial Benchmark | IndustrialBenchmark-v0 | All of the following |
| Operational Cost | IBOperationalCost-v0 | Delayed, blurred, nonlinear rewards |
| Mis-calibration | IBMisCalibration-v0 | Partial observability, non-stationary dynamics |
| Fatigue | IBFatigue-v0 | Heteroscedastic noise, self-amplifying processes |

Dynamics as Stochastic Computation Graphs

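Below, the sub-dynamics of the Industrial Benchmark, together with the reward function, are presented as stochastic computation graphs (SCGs).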

The graph notation and the SCG definition are taken from Gradient Estimation Using Stochastic Computation Graphs.

Definition 1 (Stochastic Computation Graph). A directed, acyclic graph, with three types of nodes:

  1. Input nodes, which are set externally, including the parameters we differentiate with respect to.
  2. Deterministic nodes, which are functions of their parents.
  3. Stochastic nodes, which are distributed conditionally on their parents.

Each parent v of a non-input node w is connected to it by a directed edge (v, w).
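To make the definition concrete, here is a minimal stdlib-only sketch of an SCG with the three node kinds and a forward pass in topological order. The `Node`, `edges`, and `evaluate` names and the toy graph are hypothetical illustrations, not part of gym-industrial; the final `cost` node is deterministic and would be drawn as a diamond in the notation used here.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a stochastic computation graph (hypothetical helper)."""
    name: str
    kind: str  # "input", "deterministic", or "stochastic"
    parents: List[str] = field(default_factory=list)
    # Deterministic nodes: fn(*parent_values) -> value.
    # Stochastic nodes: fn(rng, *parent_values) -> sampled value.
    fn: Optional[Callable[..., float]] = None


def edges(nodes: List[Node]) -> List[Tuple[str, str]]:
    """Every parent v of a node w contributes the directed edge (v, w)."""
    return [(p, n.name) for n in nodes for p in n.parents]


def evaluate(nodes: List[Node], inputs: Dict[str, float],
             rng: random.Random) -> Dict[str, float]:
    """Forward pass; `nodes` must already be in topological order."""
    values: Dict[str, float] = {}
    for node in nodes:
        args = [values[p] for p in node.parents]
        if node.kind == "input":
            values[node.name] = inputs[node.name]  # set externally
        elif node.kind == "deterministic":
            values[node.name] = node.fn(*args)  # function of its parents
        elif node.kind == "stochastic":
            values[node.name] = node.fn(rng, *args)  # sampled given parents
        else:
            raise ValueError(f"unknown node kind: {node.kind!r}")
    return values


# Toy graph: input x -> deterministic y = 2x -> stochastic z ~ N(y, 1) -> cost z^2.
graph = [
    Node("x", "input"),
    Node("y", "deterministic", ["x"], lambda x: 2.0 * x),
    Node("z", "stochastic", ["y"], lambda rng, y: rng.gauss(y, 1.0)),
    Node("cost", "deterministic", ["z"], lambda z: z * z),
]
values = evaluate(graph, {"x": 1.5}, random.Random(0))
print(values["y"], edges(graph))
```

Keeping the graph as an ordered list avoids a separate topological sort; a real SCG library would compute that order from the edges.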

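Squares denote deterministic nodes and circles denote stochastic nodes. A special type of deterministic node, drawn as a diamond, indicates that the variable is a cost/reward and is therefore not part of the observation/state.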

Node labels use the notation of the Industrial Benchmark paper and correspond to the variables in its equations.

Operational Cost

The sub-dynamics of operational cost are influenced by the external driver setpoint p and two of the three steerings, velocity v and gain g.

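The observed operational cost is delayed and blurred by a convolution over past operational costs. In the graph below, \overrightarrow{\theta} denotes the vector of the past 10 values of the hidden operational cost \theta.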
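The delay-and-blur mechanism can be sketched as a weighted sum over a sliding window of the 10 most recent hidden costs. The kernel below is a hypothetical triangular one chosen for illustration (zeros on the newest entries produce the delay, spreading weight over several older entries produces the blur); the IB's actual coefficients are defined in the paper and the implementation, and `DelayedCostObserver` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Sequence

# Hypothetical blur kernel, index 0 = newest hidden cost theta_t.
# Not the IB's coefficients: zero weights on recent steps delay the signal,
# the triangular bump over older steps blurs it. Weights sum to 1.
KERNEL = (0.0, 0.0, 0.0, 0.0, 1 / 9, 2 / 9, 3 / 9, 2 / 9, 1 / 9, 0.0)


class DelayedCostObserver:
    """Keeps the last len(kernel) hidden costs and returns their convolution."""

    def __init__(self, kernel: Sequence[float] = KERNEL, initial: float = 0.0):
        self.kernel = tuple(kernel)
        # history[0] is theta_t, history[1] is theta_{t-1}, and so on.
        self.history: Deque[float] = deque(
            [initial] * len(self.kernel), maxlen=len(self.kernel))

    def step(self, theta: float) -> float:
        """Push the newest hidden cost and return the observed (blurred) cost."""
        self.history.appendleft(theta)  # oldest value falls off the right end
        return sum(w * th for w, th in zip(self.kernel, self.history))


# Impulse response: a single unit of hidden cost at t = 0 shows up
# in the observation only several steps later, spread over time.
obs = DelayedCostObserver()
impulse = [obs.step(1.0 if t == 0 else 0.0) for t in range(10)]
print([round(x, 3) for x in impulse])
```

Because the impulse response reproduces the kernel, an agent observing only the blurred cost cannot attribute a cost spike to the action that caused it without remembering several past steps.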

The motivation for this dynamical behavior is that it is non-linear, it depends on more than one influence, and it is delayed and blurred. All those effects have been observed in industrial applications, like the heating process observable during combustion.

Mis-calibration Dynamics

The sub-dynamics of mis-calibration are influenced by the external driver setpoint p and the steering shift h. The goal is to reward the agent for oscillating in h at a pre-defined frequency around a specific operation point determined by setpoint p. The reward topology is inspired by an example from quantum physics, namely Goldstone's "Mexican hat" potential.

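For ease of presentation, the reward inspired by the Goldstone potential is represented below by the m_{t+1} node. Details of the function can be found in the implementation or in Appendix B of the paper.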
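For intuition about the shape only, the textbook "Mexican hat" potential V(r) = -a r^2 + b r^4 has a local maximum at the center and a ring of minima at r* = sqrt(a / (2b)). This sketch uses that generic form with hypothetical constants a = 2, b = 1; it is not the IB's mis-calibration reward, whose exact definition is in Appendix B of the paper.

```python
import math


def mexican_hat(r: float, a: float = 2.0, b: float = 1.0) -> float:
    """Generic Goldstone-style potential V(r) = -a r^2 + b r^4 (not the IB formula)."""
    return -a * r ** 2 + b * r ** 4


def ring_radius(a: float = 2.0, b: float = 1.0) -> float:
    """Radius of the ring of minima, from dV/dr = -2 a r + 4 b r^3 = 0."""
    return math.sqrt(a / (2.0 * b))


# The center r = 0 is a hill (V = 0); the lowest values lie on the ring r = 1.
for r in (0.0, 0.5, 1.0, 1.5):
    print(r, mexican_hat(r))
```

With this shape, sitting still at the center is not optimal, which gives an intuition for why the IB's variant rewards moving around the operation point rather than settling on it.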

Below is an intuitive illustration of the penalty landscape and the oscillation dynamics, taken from the paper.

Fatigue Dynamics

The sub-dynamics of fatigue are influenced by the same variables as the sub-dynamics of operational cost, i.e., setpoint p, velocity v, and gain g. The IB is designed so that changing the steerings velocity v and gain g so as to reduce the operational cost increases fatigue. This yields the desired multi-criteria task, with two reward components showing opposite dependencies on the actions.

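The following SCG highlights the complex stochasticity of the fatigue dynamics. The random variables have no dedicated equations in the paper, but are sampled as follows (\exp denotes the exponential distribution and \sigma the logistic function).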
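The two primitives named above can be sketched with the standard library: an exponentially distributed noise term and a Bernoulli switch whose probability is the logistic function of some logit. How the IB wires these into fatigue is given in its implementation; the function `sample_fatigue_noise` and its `rate` and `logit` parameters are hypothetical placeholders for illustration.

```python
import math
import random
from typing import Tuple


def logistic(x: float) -> float:
    """sigma(x) = 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_fatigue_noise(rng: random.Random, rate: float,
                         logit: float) -> Tuple[float, bool]:
    """Hypothetical sampling step: exponential noise plus a logistic-gated switch."""
    eta = rng.expovariate(rate)               # eta ~ Exp(rate), mean 1 / rate, eta >= 0
    amplify = rng.random() < logistic(logit)  # Bernoulli(sigma(logit))
    return eta, amplify


rng = random.Random(42)
print([sample_fatigue_noise(rng, rate=2.0, logit=0.0) for _ in range(3)])
```

Exponential noise is non-negative and heavy on the right, and a logistic-gated switch can turn on rarely but persistently, which is one way such a model produces heteroscedastic, self-amplifying behavior.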

Reward Function

In the real-world tasks that motivated the IB, the reward function has always been known explicitly. In some cases it itself was subject to optimization and had to be adjusted to properly express the optimization goal. For the IB we therefore assume that the reward function is known and all variables influencing it are observable.
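Since the reward is known and depends only on observable quantities, it can be written as a plain function of the observed operational cost and fatigue. This sketch assumes the weighted form r_t = -c_t - 3 f_t from the IB paper; treat the weight of 3 as an assumption and check it against the implementation. The function name and the candidate outcomes are hypothetical.

```python
from typing import Dict, Tuple


def reward(operational_cost: float, fatigue: float,
           fatigue_weight: float = 3.0) -> float:
    """Assumed IB-style reward: negative weighted sum of the two cost components."""
    return -operational_cost - fatigue_weight * fatigue


# Hypothetical outcomes (operational cost, fatigue) of two steering choices:
# one lowers operational cost at the price of higher fatigue, the other the reverse.
candidates: Dict[str, Tuple[float, float]] = {
    "low_cost": (10.0, 4.0),
    "low_fatigue": (14.0, 1.0),
}
best = max(candidates, key=lambda k: reward(*candidates[k]))
print(best, {k: reward(*v) for k, v in candidates.items()})
```

The comparison shows the multi-criteria trade-off described for the fatigue dynamics: minimizing either component alone is not enough, and the fixed weighting decides which compromise scores higher.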
