Performance of Reinforcement Learning Agents

Detailed description of the rldb Python project


RLDB


Database of RL algorithms

[Figures: Atari Space Invaders Scores; MuJoCo Walker2d Scores]

Examples

You can use rldb.find_all({}) to retrieve all existing entries in rldb.

    import rldb

    all_entries = rldb.find_all({})

You can also filter entries by specifying key-value pairs that each entry must match:

    import rldb

    dqn_entries = rldb.find_all({'algo-nickname': 'DQN'})
    breakout_noop_entries = rldb.find_all({
        'env-title': 'atari-breakout',
        'env-variant': 'No-op start',
    })

You can also use rldb.find_one(filter_dict) to find a single entry that matches the key-value pairs specified in filter_dict:

    import pprint

    import rldb

    entry = rldb.find_one({
        'env-title': 'atari-pong',
        'algo-title': 'Human',
    })
    pprint.pprint(entry)

Output

    {'algo-nickname': 'Human',
     'algo-title': 'Human',
     'env-title': 'atari-pong',
     'env-variant': 'No-op start',
     'score': 14.6,
     'source-arxiv-id': '1511.06581',
     'source-arxiv-version': 3,
     'source-authors': ['Ziyu Wang',
                        'Tom Schaul',
                        'Matteo Hessel',
                        'Hado van Hasselt',
                        'Marc Lanctot',
                        'Nando de Freitas'],
     'source-bibtex': '@article{DBLP:journals/corr/WangFL15,\n'
                      '    author    = {Ziyu Wang and\n'
                      '                 Nando de Freitas and\n'
                      '                 Marc Lanctot},\n'
                      '    title     = {Dueling Network Architectures for Deep '
                      'Reinforcement Learning},\n'
                      '    journal   = {CoRR},\n'
                      '    volume    = {abs/1511.06581},\n'
                      '    year      = {2015},\n'
                      '    url       = {http://arxiv.org/abs/1511.06581},\n'
                      '    archivePrefix = {arXiv},\n'
                      '    eprint    = {1511.06581},\n'
                      '    timestamp = {Mon, 13 Aug 2018 16:48:17 +0200},\n'
                      '    biburl    = '
                      '{https://dblp.org/rec/bib/journals/corr/WangFL15},\n'
                      '    bibsource = {dblp computer science bibliography, '
                      'https://dblp.org}\n'
                      '}',
     'source-nickname': 'DuDQN',
     'source-title': 'Dueling Network Architectures for Deep Reinforcement '
                     'Learning'}
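The filtering behaviour described above can be illustrated with a small pure-Python sketch. This is not rldb's implementation: the helper names matches, find_all_sketch and find_one_sketch are hypothetical, the sample list contains only two hand-written entries built from values quoted in this README, and exact equality on every key is an assumption about the matching rule. Returning None when nothing matches is likewise an assumption.

```python
# Hypothetical sample data: two abbreviated entries using values quoted in this README.
SAMPLE_ENTRIES = [
    {
        "algo-title": "Human",
        "algo-nickname": "Human",
        "env-title": "atari-pong",
        "env-variant": "No-op start",
        "score": 14.6,
    },
    {
        "algo-title": "Asynchronous Advantage Actor Critic",
        "algo-nickname": "A3C",
        "env-title": "atari-space-invaders",
        "env-variant": "No-op start",
        "score": 1034,
    },
]


def matches(entry, filter_dict):
    """Return True if the entry has every key in filter_dict with an equal value."""
    return all(
        key in entry and entry[key] == value
        for key, value in filter_dict.items()
    )


def find_all_sketch(entries, filter_dict):
    """Return every entry that matches; an empty filter matches all entries."""
    return [entry for entry in entries if matches(entry, filter_dict)]


def find_one_sketch(entries, filter_dict):
    """Return the first matching entry, or None (assumed behaviour) if none match."""
    for entry in entries:
        if matches(entry, filter_dict):
            return entry
    return None


pong_human = find_one_sketch(
    SAMPLE_ENTRIES, {"env-title": "atari-pong", "algo-title": "Human"}
)
print(pong_human["score"])  # prints 14.6
```

An empty filter such as {} matches every entry because all() over an empty sequence is True, which mirrors how rldb.find_all({}) is used above to fetch everything.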

Entry Structure

Here is the format of every entry:

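    {
        # BASICS
        "source-title": "",
        "source-nickname": "",
        "source-authors": [],

        # MISC.
        "source-bibtex": "",

        # ALGORITHM
        "algo-title": "",
        "algo-nickname": "",
        "algo-source-title": "",

        # SCORE
        "env-title": "",
        "score": 0,
    }

  • source-title is the full title of the source of the score: it can be the title of a paper or of a GitHub repository. source-nickname is a popular nickname or acronym for that title if one exists; otherwise it is the same as source-title.
  • source-authors is a list of authors or contributors.
  • source-bibtex is a citation in BibTeX format.
  • algo-title is the full title of the algorithm used. algo-nickname is the nickname or acronym of that algorithm if one exists; otherwise it is the same as algo-title.
  • algo-source-title is the title of the source of the algorithm. It can be, and often is, different from source-title.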
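The field rules above can be checked mechanically. The following is a minimal sketch, not part of rldb: TEMPLATE_KEYS, missing_keys and fill_nicknames are hypothetical names, and treating every key in the template as required is an assumption. The draft entry reuses values from the atari-pong output shown earlier.

```python
# Assumption: every key in the entry template is treated as required here.
TEMPLATE_KEYS = (
    "source-title",
    "source-nickname",
    "source-authors",
    "source-bibtex",
    "algo-title",
    "algo-nickname",
    "algo-source-title",
    "env-title",
    "score",
)


def missing_keys(entry):
    """List the template keys that the entry does not define, in template order."""
    return [key for key in TEMPLATE_KEYS if key not in entry]


def fill_nicknames(entry):
    """Return a copy where an absent or empty nickname falls back to the title."""
    filled = dict(entry)
    for prefix in ("source", "algo"):
        if not filled.get(f"{prefix}-nickname"):
            filled[f"{prefix}-nickname"] = filled.get(f"{prefix}-title", "")
    return filled


# Hypothetical, deliberately incomplete draft entry.
draft = {
    "source-title": "Dueling Network Architectures for Deep Reinforcement Learning",
    "source-nickname": "DuDQN",
    "algo-title": "Human",
    "env-title": "atari-pong",
    "score": 14.6,
}

print(missing_keys(draft))
print(fill_nicknames(draft)["algo-nickname"])  # prints Human
```

The fallback copies the title into the nickname only when no nickname is given, which follows the "otherwise it is the same as the title" rule for both source-nickname and algo-nickname.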

For example, a score of the Asynchronous Advantage Actor Critic (A3C) algorithm reported in the Noisy Networks for Exploration paper is represented by the following entry:

    {
        # BASICS
        "source-title": "Noisy Networks for Exploration",
        "source-nickname": "NoisyNet",
        "source-authors": [
            "Meire Fortunato",
            "Mohammad Gheshlaghi Azar",
            "Bilal Piot",
            "Jacob Menick",
            "Ian Osband",
            "Alex Graves",
            "Vlad Mnih",
            "Remi Munos",
            "Demis Hassabis",
            "Olivier Pietquin",
            "Charles Blundell",
            "Shane Legg",
        ],

        # ARXIV
        "source-arxiv-id": "1706.10295",
        "source-arxiv-version": 2,

        # MISC.
        "source-bibtex": """
    @article{DBLP:journals/corr/FortunatoAPMOGM17,
        author    = {Meire Fortunato and
                     Mohammad Gheshlaghi Azar and
                     Bilal Piot and
                     Jacob Menick and
                     Ian Osband and
                     Alex Graves and
                     Vlad Mnih and
                     R{\'{e}}mi Munos and
                     Demis Hassabis and
                     Olivier Pietquin and
                     Charles Blundell and
                     Shane Legg},
        title     = {Noisy Networks for Exploration},
        journal   = {CoRR},
        volume    = {abs/1706.10295},
        year      = {2017},
        url       = {http://arxiv.org/abs/1706.10295},
        archivePrefix = {arXiv},
        eprint    = {1706.10295},
        timestamp = {Mon, 13 Aug 2018 16:46:11 +0200},
        biburl    = {https://dblp.org/rec/bib/journals/corr/FortunatoAPMOGM17},
        bibsource = {dblp computer science bibliography, https://dblp.org}
    }""",

        # ALGORITHM
        "algo-title": "Asynchronous Advantage Actor Critic",
        "algo-nickname": "A3C",
        "algo-source-title": "Asynchronous Methods for Deep Reinforcement Learning",

        # HYPERPARAMETERS
        "algo-frames": 320 * 1000 * 1000,  # Number of frames

        # SCORE
        "env-title": "atari-space-invaders",
        "env-variant": "No-op start",
        "score": 1034,
        "stddev": 49,
    }

Note that, as shown, an entry can contain additional information beyond the basic format.
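Because fields such as stddev, env-variant and algo-frames appear only in some entries, code that consumes entries may want to read them defensively. The sketch below assumes entries are plain dictionaries, as in the examples above; format_score is a hypothetical helper and not part of rldb.

```python
def format_score(entry):
    """Format the score, appending the standard deviation only when it is present."""
    score = entry["score"]          # present in every entry per the format above
    stddev = entry.get("stddev")    # optional extra field, may be absent
    if stddev is None:
        return f"{score}"
    return f"{score} +/- {stddev}"


# Abbreviated entries using values quoted in this README.
a3c_entry = {"algo-nickname": "A3C", "score": 1034, "stddev": 49}
human_entry = {"algo-nickname": "Human", "score": 14.6}

print(format_score(a3c_entry))    # prints 1034 +/- 49
print(format_score(human_entry))  # prints 14.6
```

Using dict.get for the optional key avoids a KeyError on entries that only carry the basic fields.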

Sources

Papers

Deep Q-Networks

Policy Gradients

Exploration

Miscellaneous

Repositories
