[Resource] TensorLayer Reinforcement Learning Model Zoo

Author: TensorLayer Community
Project: RLzoo


This article presents RLzoo, the TensorLayer reinforcement learning model zoo. TensorLayer is a deep learning and reinforcement learning library built on Google TensorFlow, designed for researchers and engineers.

Reinforcement Learning Algorithms Zoo

RLzoo is a collection of the most practical reinforcement learning algorithms, frameworks, and applications. It is implemented with TensorFlow 2.0
and the neural network layer APIs of TensorLayer 2, providing a hands-on, fast-development approach to reinforcement learning practice. It supports
basic toy tests such as OpenAI Gym and the DeepMind Control Suite with very simple configuration.
Moreover, RLzoo supports a large-scale distributed training framework for more realistic scenarios with Unity 3D,
MuJoCo, Bullet Physics, and robotic learning tasks with V-REP/PyRep, etc.

Please note that this repository provides RL algorithms behind a high-level API. If you want to get familiar with each algorithm more quickly, please look at our RL tutorials, where each algorithm is implemented individually in a more straightforward manner. A sketch of the high-level style follows.
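For a sense of what "RL algorithms behind a high-level API" means in code, here is a sketch in the style of the RLzoo project documentation. Names such as build_env, call_default_params, and the TD3 class are taken from later RLzoo releases and may not match the snapshot described here, so treat them as assumptions rather than this version's exact interface:

    # Sketch in the style of later RLzoo documentation; the module/function
    # names (build_env, call_default_params, TD3) are assumptions here.
    from rlzoo.common.env_wrappers import build_env
    from rlzoo.common.utils import call_default_params
    from rlzoo.algorithms import TD3

    env = build_env('Pendulum-v0', 'classic_control')
    alg_params, learn_params = call_default_params(env, 'classic_control', 'TD3')
    alg = TD3(**alg_params)                                  # build the agent
    alg.learn(env=env, mode='train', **learn_params)         # run training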

Status: Work in Progress

The repository is still under development, and some environments may be incompatible with our algorithms. If you find any problems or have any suggestions, feel free to contact us!

Contents:

Algorithms:

| Algorithms | Action Space | Tutorial Env | Papers |
| --- | --- | --- | --- |
| **Value-based** | | | |
| Q-learning | Discrete | FrozenLake | Technical note: Q-learning. Watkins et al. 1992 |
| Deep Q-Network (DQN) | Discrete | FrozenLake | Human-level control through deep reinforcement learning. Mnih et al. 2015 |
| Prioritized Experience Replay | Discrete | Pong, CartPole | Prioritized experience replay. Schaul et al. 2015 |
| Dueling DQN | Discrete | Pong, CartPole | Dueling network architectures for deep reinforcement learning. Wang et al. 2015 |
| Double DQN | Discrete | Pong, CartPole | Deep reinforcement learning with double Q-learning. Van Hasselt et al. 2016 |
| Retrace | Discrete | Pong, CartPole | Safe and efficient off-policy reinforcement learning. Munos et al. 2016 |
| Noisy DQN | Discrete | Pong, CartPole | Noisy networks for exploration. Fortunato et al. 2017 |
| Distributional DQN (C51) | Discrete | Pong, CartPole | A distributional perspective on reinforcement learning. Bellemare et al. 2017 |
| **Policy-based** | | | |
| REINFORCE (PG) | Discrete/Continuous | CartPole | Reinforcement learning: An introduction. Sutton & Barto 2011 |
| Trust Region Policy Optimization (TRPO) | Discrete/Continuous | Pendulum | Trust region policy optimization. Schulman et al. 2015 |
| Proximal Policy Optimization (PPO) | Discrete/Continuous | Pendulum | Proximal policy optimization algorithms. Schulman et al. 2017 |
| Distributed Proximal Policy Optimization (DPPO) | Discrete/Continuous | Pendulum | Emergence of locomotion behaviours in rich environments. Heess et al. 2017 |
| **Actor-critic** | | | |
| Actor-Critic (AC) | Discrete/Continuous | CartPole | Actor-critic algorithms. Konda et al. 2000 |
| Asynchronous Advantage Actor-Critic (A3C) | Discrete/Continuous | BipedalWalker | Asynchronous methods for deep reinforcement learning. Mnih et al. 2016 |
| Deep Deterministic Policy Gradient (DDPG) | Discrete/Continuous | Pendulum | Continuous control with deep reinforcement learning. Lillicrap et al. 2016 |
| Twin Delayed DDPG (TD3) | Discrete/Continuous | Pendulum | Addressing function approximation error in actor-critic methods. Fujimoto et al. 2018 |
| Soft Actor-Critic (SAC) | Discrete/Continuous | Pendulum | Soft actor-critic algorithms and applications. Haarnoja et al. 2018 |
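To make the simplest entry in the table concrete, below is a minimal tabular Q-learning sketch on FrozenLake, written against the classic gym API of the time (env.step returning a 4-tuple). It is an independent illustration in the spirit of the individual tutorials, not RLzoo's implementation, and the hyperparameters (alpha, gamma, epsilon) are arbitrary choices:

    import gym
    import numpy as np

    # Independent tabular Q-learning sketch on FrozenLake (not RLzoo code).
    env = gym.make('FrozenLake-v0')
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.8, 0.95, 0.1  # assumed hyperparameters

    for episode in range(6000):
        s = env.reset()            # classic gym API: reset() returns the state
        done = False
        while not done:
            # epsilon-greedy exploration over the Q-table row for state s
            if np.random.rand() < epsilon:
                a = env.action_space.sample()
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done, _ = env.step(a)  # classic 4-tuple step API
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next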

Applications:

Prerequisites:

  • python 3.5
  • tensorflow >= 2.0.0 or tensorflow-gpu >= 2.0.0a0
  • tensorlayer >= 2.0.1
  • tensorflow-probability
  • tf-nightly-2.0-preview

pip install -r requirements.txt
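For reference, given the prerequisites above, requirements.txt plausibly contains pins along these lines; the exact file in the repository may differ:

    tensorflow>=2.0.0a0
    tensorlayer>=2.0.1
    tensorflow-probability
    tf-nightly-2.0-preview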

Usage:

python3 main.py --env=Pendulum-v0 --algorithm=td3 --train_episodes=600 --mode=train
python3 main.py --env=BipedalWalker-v2 --algorithm=a3c --train_episodes=600 --mode=train --number_workers=2
python3 main.py --env=CartPole-v0 --algorithm=ac --train_episodes=600 --mode=train
python3 main.py --env=FrozenLake-v0 --algorithm=dqn --train_episodes=6000 --mode=train

Troubleshooting:

  • If you meet the error AttributeError: module 'tensorflow' has no attribute 'contrib' when running the code after installing tensorflow-probability, try:
    pip install --upgrade tf-nightly-2.0-preview tfp-nightly

