Performances of Reinforcement Learning Agents
RLDB
Database of reinforcement learning algorithms
*(Score plots: Atari Space Invaders | MuJoCo Walker2d)*
Examples
You can retrieve all existing entries in `rldb` with `rldb.find_all({})`:
```python
import rldb

all_entries = rldb.find_all({})
```
You can also filter entries by specifying key-value pairs that the entries must match:
```python
import rldb

dqn_entries = rldb.find_all({'algo-nickname': 'DQN'})
breakout_noop_entries = rldb.find_all({
    'env-title': 'atari-breakout',
    'env-variant': 'No-op start',
})
```
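Since `find_all` returns a plain list of dictionaries, ordinary Python is enough for post-processing. As a small sketch (assuming, per the entry format documented below, that every matching entry carries a `score` field), you could pick out the best-scoring Breakout entry:

```python
import rldb

# Rank Breakout (No-op start) entries by their reported score;
# every entry is expected to carry a 'score' field.
breakout_entries = rldb.find_all({
    'env-title': 'atari-breakout',
    'env-variant': 'No-op start',
})
best = max(breakout_entries, key=lambda entry: entry['score'])
print(best['algo-nickname'], best['score'])
```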
You can also use `rldb.find_one(filter_dict)` to find one entry that matches the key-value pairs specified in `filter_dict`:
```python
import rldb
import pprint

entry = rldb.find_one({
    'env-title': 'atari-pong',
    'algo-title': 'Human',
})
pprint.pprint(entry)
```
Output:
```python
{'algo-nickname': 'Human',
 'algo-title': 'Human',
 'env-title': 'atari-pong',
 'env-variant': 'No-op start',
 'score': 14.6,
 'source-arxiv-id': '1511.06581',
 'source-arxiv-version': 3,
 'source-authors': ['Ziyu Wang',
                    'Tom Schaul',
                    'Matteo Hessel',
                    'Hado van Hasselt',
                    'Marc Lanctot',
                    'Nando de Freitas'],
 'source-bibtex': '@article{DBLP:journals/corr/WangFL15,\n'
                  '  author    = {Ziyu Wang and\n'
                  '               Nando de Freitas and\n'
                  '               Marc Lanctot},\n'
                  '  title     = {Dueling Network Architectures for Deep '
                  'Reinforcement Learning},\n'
                  '  journal   = {CoRR},\n'
                  '  volume    = {abs/1511.06581},\n'
                  '  year      = {2015},\n'
                  '  url       = {http://arxiv.org/abs/1511.06581},\n'
                  '  archivePrefix = {arXiv},\n'
                  '  eprint    = {1511.06581},\n'
                  '  timestamp = {Mon, 13 Aug 2018 16:48:17 +0200},\n'
                  '  biburl    = {https://dblp.org/rec/bib/journals/corr/WangFL15},\n'
                  '  bibsource = {dblp computer science bibliography, '
                  'https://dblp.org}\n'
                  '}',
 'source-nickname': 'DuDQN',
 'source-title': 'Dueling Network Architectures for Deep Reinforcement '
                 'Learning'}
```
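The returned entry is a plain Python `dict`, so individual fields can be read directly:

```python
# Read individual fields of the entry returned by find_one above.
print(entry['score'])         # 14.6
print(entry['source-title'])  # 'Dueling Network Architectures for Deep Reinforcement Learning'
```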
Entry Structure
Each entry has the following format:
```python
{
    # BASICS
    "source-title": "",
    "source-nickname": "",
    "source-authors": [],

    # MISC.
    "source-bibtex": "",

    # ALGORITHM
    "algo-title": "",
    "algo-nickname": "",
    "algo-source-title": "",

    # SCORE
    "env-title": "",
    "score": 0,
}
```
- `source-title` is the full title of the source of the score: it can be the title of a paper or of a GitHub repository.
- `source-nickname` is a popular nickname or acronym for that title if one exists; otherwise it is the same as `source-title`.
- `source-authors` is the list of authors or contributors.
- `source-bibtex` is a citation in BibTeX format.
- `algo-title` is the full title of the algorithm used.
- `algo-nickname` is a nickname or acronym for that algorithm if one exists; otherwise it is the same as `algo-title`.
- `algo-source-title` is the title of the source of the algorithm. It can be, and often is, different from `source-title`.
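Together, these fields let you slice the database by algorithm or by source. A minimal sketch, using only the documented `find_all` API and the fields above, that lists every source reporting a DQN score:

```python
import rldb

# Collect the distinct sources (papers or repositories) that
# report a score for the algorithm nicknamed 'DQN'.
dqn_entries = rldb.find_all({'algo-nickname': 'DQN'})
for source_title in sorted({entry['source-title'] for entry in dqn_entries}):
    print(source_title)
```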
For example, the Space Invaders score of the Asynchronous Advantage Actor Critic (A3C) algorithm reported in "Noisy Networks for Exploration" is represented by the following entry:
```python
{
    # BASICS
    "source-title": "Noisy Networks for Exploration",
    "source-nickname": "NoisyNet",
    "source-authors": [
        "Meire Fortunato",
        "Mohammad Gheshlaghi Azar",
        "Bilal Piot",
        "Jacob Menick",
        "Ian Osband",
        "Alex Graves",
        "Vlad Mnih",
        "Remi Munos",
        "Demis Hassabis",
        "Olivier Pietquin",
        "Charles Blundell",
        "Shane Legg",
    ],

    # ARXIV
    "source-arxiv-id": "1706.10295",
    "source-arxiv-version": 2,

    # MISC.
    "source-bibtex": """
@article{DBLP:journals/corr/FortunatoAPMOGM17,
  author    = {Meire Fortunato and
               Mohammad Gheshlaghi Azar and
               Bilal Piot and
               Jacob Menick and
               Ian Osband and
               Alex Graves and
               Vlad Mnih and
               R{\'{e}}mi Munos and
               Demis Hassabis and
               Olivier Pietquin and
               Charles Blundell and
               Shane Legg},
  title     = {Noisy Networks for Exploration},
  journal   = {CoRR},
  volume    = {abs/1706.10295},
  year      = {2017},
  url       = {http://arxiv.org/abs/1706.10295},
  archivePrefix = {arXiv},
  eprint    = {1706.10295},
  timestamp = {Mon, 13 Aug 2018 16:46:11 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/FortunatoAPMOGM17},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
""",

    # ALGORITHM
    "algo-title": "Asynchronous Advantage Actor Critic",
    "algo-nickname": "A3C",
    "algo-source-title": "Asynchronous Methods for Deep Reinforcement Learning",

    # HYPERPARAMETERS
    "algo-frames": 320 * 1000 * 1000,  # Number of frames

    # SCORE
    "env-title": "atari-space-invaders",
    "env-variant": "No-op start",
    "score": 1034,
    "stddev": 49,
}
```
Note that, as shown above, entries can contain additional fields.
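Because such extra fields (for example `stddev` or `algo-frames` above) appear only on some entries, `dict.get` is a safe way to read them. A small sketch, assuming nothing beyond the fields documented above:

```python
import rldb

# Optional fields such as 'stddev' and 'algo-frames' are not guaranteed
# to exist on every entry, so fall back to None via dict.get.
for entry in rldb.find_all({'env-title': 'atari-space-invaders'}):
    stddev = entry.get('stddev')
    frames = entry.get('algo-frames')
    print(entry['algo-nickname'], entry['score'], stddev, frames)
```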
Sources
Papers
Deep Q-Networks
- [x] Playing Atari with Deep Reinforcement Learning (Mnih et al., 2013)
- [x] Human-level control through deep reinforcement learning (Mnih et al., 2015)
- [x] Deep Recurrent Q-Learning for Partially Observable MDPs (Hausknecht and Stone, 2015)
- [x] Massively Parallel Methods for Deep Reinforcement Learning (Nair et al., 2015)
- [x] Deep Reinforcement Learning with Double Q-learning (Hasselt et al., 2015)
- [x] Prioritized Experience Replay (Schaul et al., 2015)
- [x] Dueling Network Architectures for Deep Reinforcement Learning (Wang et al., 2015)
- [x] Noisy Networks for Exploration (Fortunato et al., 2017)
- [x] A Distributional Perspective on Reinforcement Learning (Bellemare et al., 2017)
- [x] Rainbow: Combining Improvements in Deep Reinforcement Learning (Hessel et al., 2017)
- [x] Distributional Reinforcement Learning with Quantile Regression (Dabney et al., 2017)
- [x] Implicit Quantile Networks for Distributional Reinforcement Learning (Dabney et al., 2018)
Policy Gradients
- [x] Asynchronous Methods for Deep Reinforcement Learning (Mnih et al., 2016)
- [x] Trust Region Policy Optimization (Schulman et al., 2015)
- [x] Proximal Policy Optimization Algorithms (Schulman et al., 2017)
- [x] Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (Wu et al., 2017)
- [x] Addressing Function Approximation Error in Actor-Critic Methods (Fujimoto et al., 2018)
- [x] IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (Espeholt et al., 2018)
- [x] The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning (Gruslys et al., 2017)
Exploration
Others
Repositories