Pybrain：训练ActionValueNetwork不合适

2024-05-12 18:42:25 发布

男 | 程序猿一只，喜欢编程写python代码。

我试图在一个简单的异或函数上训练一个ActionValueNetwork，但是结果看起来是随机的。在

""" Reinforcement Learning to learn xor function
"""
# generic import
import numpy as np
import random

# pybrain import 
from pybrain.rl.explorers import EpsilonGreedyExplorer
from pybrain.rl.agents import LearningAgent
from pybrain.rl.learners.valuebased import ActionValueNetwork, NFQ

# The parameters of your algorithm
av_network = ActionValueNetwork(2, 2) # 2 dimensions in input, 2 actions possible (1 or 0)
learner = NFQ()
learner._setExplorer(EpsilonGreedyExplorer(0.0)) # No exploration
agent = LearningAgent(av_network, learner)

# The training 
for _ in xrange(1,25): # we iterate 25 times
    for x in xrange(1,4): # batch of 4 questions.

        listxor = random.choice([[0, 0],[0, 1], [1, 0], [1, 1]])
        resultxor = listxor[0]^listxor[1] # xor operation


        agent.integrateObservation(listxor)
        action = agent.getAction()
        reward =  1 - 2*abs(resultxor - float(action[0])) # 1 if correct, -1 otherwise

        print "xor(",listxor,") = ", resultxor, " || action = " , action[0], "reward = ", reward

        agent.giveReward(reward) 

    agent.learn()

# Test
print "test : "
print "[0, 0] ",  learner.module.getMaxAction([0, 0])
print "[0, 1] ",  learner.module.getMaxAction([0, 1])
print "[1, 0] ",  learner.module.getMaxAction([1, 0])
print "[1, 1] ",  learner.module.getMaxAction([1, 1])

我知道，这不是Pybrain的定向方式（tast、env等），但我必须这样做。我对ActionValueTable和Q有很好的结果，但是我想使用每个维度的权重。在

有人能解释一下我哪里错了吗？网络好像什么也学不到。在

谢谢！在

Tags： in from import action agent rl module print

0条回答

目前没有回答

Pybrain：训练ActionValueNetwork不合适

相关问题更多 >

编程相关推荐

热门问题

热门文章

Pybrain：训练ActionValueNetwork不合适

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >