Markov decision process Python library
Detailed description of the markov-rlzoo Python project
![mdp image](https://cdn-images-1.medium.com/max/1200/1*qboz2yq5fy6ynzyvspxzw.png)
markov: a simple Python library for Markov decision processes, which can be run to explore the performance benefits of distributed systems.
State
Policy Probabilities.
#### Policies:
- Greedy Policy
- e-Greedy Policy
- More to come...
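
As a rough illustration of how the two listed policies relate, the sketch below turns a list of action values into per-action policy probabilities. The function names `epsilon_greedy_probs` and `greedy_probs` are hypothetical and are not taken from the library; the greedy policy is treated here as the special case `epsilon=0`.

```python
def epsilon_greedy_probs(action_values, epsilon=0.1):
    """Return one probability per action for an epsilon-greedy policy.

    Hypothetical helper for illustration; not part of the markov API.
    """
    n = len(action_values)
    # Index of the highest-valued action (the first one wins ties).
    best = max(range(n), key=lambda a: action_values[a])
    # Spread the exploration mass uniformly over all actions.
    probs = [epsilon / n] * n
    # Give the remaining mass to the greedy action.
    probs[best] += 1.0 - epsilon
    return probs


def greedy_probs(action_values):
    """Greedy policy: all probability on the best action."""
    return epsilon_greedy_probs(action_values, epsilon=0.0)


print(greedy_probs([1.0, 3.0, 2.0]))
print(epsilon_greedy_probs([1.0, 3.0, 2.0], epsilon=0.3))
```

With `epsilon=0.3` and three actions, each action receives about 0.1 from exploration and the best action receives the remaining 0.7 on top.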
#### Algorithms:
- Dynamic Programming
- Linear coming soon
#### Optimizers:
- Value/Policy Iteration
- More to come...
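
For readers unfamiliar with value iteration, here is a minimal, library-independent sketch on a tiny deterministic MDP. The `transitions` dictionary layout, the `value_iteration` function and the three-state `chain` are assumptions made for this example and do not reflect how markov represents environments.

```python
def value_iteration(transitions, discount_factor=0.9, tol=1e-8, max_sweeps=1000):
    """Value iteration for a deterministic MDP (illustrative sketch).

    transitions[state][action] = (next_state, reward).
    States with no actions are treated as terminal and keep value 0.
    """
    values = {s: 0.0 for s in transitions}
    for _ in range(max_sweeps):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            # Bellman optimality backup: best one-step return from s.
            best = max(r + discount_factor * values[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Extract the greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: actions[a][1] + discount_factor * values[actions[a][0]])
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical three-state chain: A -> B -> GOAL, reward 1 for reaching GOAL.
chain = {
    "A": {"stay": ("A", 0.0), "right": ("B", 0.0)},
    "B": {"left": ("A", 0.0), "right": ("GOAL", 1.0)},
    "GOAL": {},
}
values, policy = value_iteration(chain)
print(values, policy)
```

Policy iteration differs in that it alternates a full policy evaluation with a greedy improvement step, rather than folding the maximisation into every backup.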
#### Environments:
- Gridworld (ASCII, pygame coming soon)
- Gym coming soon
- More to come...
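        discount_factor=1.):
    for state in env.states:
        v = 0
        # Expected one-step return under the state's current policy
        for i, action in enumerate(state.actions):
            policy = state.policy[i]
            next_state = action(env, state.action_args)
            r = next_state.reward
            v += policy * (r + discount_factor * next_state.value)
        values[state.index] = v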
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--k", help="number of iterations (k)",
                    type=int, default=1)
args = parser.parse_args()
k = args.k
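
The fragments above can be assembled into a self-contained sketch of iterative policy evaluation run for `k` sweeps. Everything here is an assumption made for illustration: the `State` dataclass, the `ChainEnv` corridor, the `move_left` / `move_right` actions, and the `policy_evaluation` name are hypothetical, and actions are called as `action(env, state)` rather than with `state.action_args` to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    # Hypothetical stand-in for the library's state object.
    index: int
    reward: float = 0.0
    value: float = 0.0
    actions: list = field(default_factory=list)  # callables: (env, state) -> State
    policy: list = field(default_factory=list)   # one probability per action


def move_left(env, state):
    return env.states[max(state.index - 1, 0)]


def move_right(env, state):
    return env.states[min(state.index + 1, len(env.states) - 1)]


class ChainEnv:
    """Hypothetical 1-D corridor; the last state is terminal with reward 0."""

    def __init__(self, n=3):
        self.states = [State(index=i, reward=-1.0) for i in range(n)]
        self.states[-1].reward = 0.0
        for s in self.states[:-1]:
            s.actions = [move_left, move_right]
            s.policy = [0.5, 0.5]  # uniform random policy


def policy_evaluation(env, k=1, discount_factor=1.0):
    """Run k synchronous sweeps of policy evaluation and return the values."""
    for _ in range(k):
        values = [0.0] * len(env.states)
        for state in env.states:
            v = 0.0
            for i, action in enumerate(state.actions):
                next_state = action(env, state)
                v += state.policy[i] * (next_state.reward + discount_factor * next_state.value)
            values[state.index] = v
        # Copy the new values back only after the full sweep.
        for state in env.states:
            state.value = values[state.index]
    return [s.value for s in env.states]


print(policy_evaluation(ChainEnv(3), k=2))  # -> [-1.75, -1.0, 0.0]
```

In a script, `k` could be supplied from the `--k` command-line flag parsed above instead of being passed literally.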