I am using rllib to build a speaker-listener training environment from this article, using pettingzoo and supersuit.
When I try to run my code I get the following error:
NotImplementedError: Cannot convert a symbolic Tensor (default_policy/cond/strided_slice:0) to a numpy array
Since I have little experience with these packages, I don't know whether the problem is in my code or in how I am using the packages, which are supposedly fine to use with rllib.
My code is attached below; here is the line that raises the error:
agent = a2c.A2CTrainer(env="simple_speaker_listener", config=config)
I believe I am close to getting it working; here is the rest of the code:
import numpy as np
import supersuit
from copy import deepcopy
from ray.rllib.env import PettingZooEnv
import ray.rllib.agents.a3c.a2c as a2c
import ray
from ray.tune.registry import register_env
from ray.rllib.env import BaseEnv
from pettingzoo.mpe import simple_speaker_listener_v3
alg_name = "PPO"
config = deepcopy(a2c.A2C_DEFAULT_CONFIG)
config["env_config"] = None
config["rollout_fragment_length"] = 20
config["num_workers"] = 5
config["num_envs_per_worker"] = 1
config["lr_schedule"] = [[0, 0.007], [20000000, 0.0000000001]]
config["clip_rewards"] = True
s = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f}"
multiagent_dict = dict()
multiagent_policies = dict()
env = simple_speaker_listener_v3.env()
agents_name = deepcopy(env.possible_agents)
config = {
    "num_gpus": 0,
    "num_workers": 1,
}
env = simple_speaker_listener_v3.env()
mod_env = supersuit.aec_wrappers.pad_action_space(env)
mod_env = supersuit.aec_wrappers.pad_observations(mod_env)
mod_env = PettingZooEnv(mod_env)
register_env("simple_speaker_listener", lambda stam: mod_env)
ray.init(num_gpus=0, local_mode=True)
agent = a2c.A2CTrainer(env="simple_speaker_listener", config=config)
for it in range(5):
    result = agent.train()
    print(s.format(
        it + 1,
        result["episode_reward_min"],
        result["episode_reward_mean"],
        result["episode_reward_max"],
        result["episode_len_mean"]
    ))
mod_env.reset()
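One thing I noticed while writing this that may or may not be related: register_env expects a creator function, but my lambda ignores its argument and always returns the same pre-built mod_env instance, so all workers would share one env object. A minimal stdlib-only sketch of the difference (EnvStub is a hypothetical stand-in for the wrapped PettingZoo env, no rllib needed):

```python
class EnvStub:
    """Hypothetical stand-in for a wrapped PettingZoo env."""
    pass

# Pattern in my code: the creator returns one pre-built instance,
# so every call hands back the very same object.
shared = EnvStub()
creator_shared = lambda env_config: shared

# Alternative pattern: build the env inside the creator,
# so every call produces a fresh, independent object.
creator_fresh = lambda env_config: EnvStub()

a, b = creator_shared(None), creator_shared(None)
c, d = creator_fresh(None), creator_fresh(None)
print(a is b)  # True  - workers would share one env
print(c is d)  # False - each worker gets its own env
```

I don't know whether sharing one env instance across workers is what triggers the symbolic-tensor error, but it seemed worth pointing out in case it matters.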