[Resource] TensorLayer Reinforcement Learning Model Zoo

Author: TensorLayer Community
Project: RLzoo


This article presents the TensorLayer reinforcement learning model zoo. TensorLayer is a deep learning and reinforcement learning library built on Google TensorFlow and designed for researchers and engineers.

Reinforcement Learning Algorithms Zoo

RLzoo is a collection of the most practical reinforcement learning algorithms, frameworks, and applications. It is implemented with TensorFlow 2.0
and the neural network layer API of TensorLayer 2, providing a hands-on, fast-development approach to reinforcement learning practice. It supports
basic toy tests such as OpenAI Gym and the DeepMind Control Suite with very simple configurations.
Moreover, RLzoo supports a large-scale distributed training framework for more realistic scenarios with Unity 3D,
MuJoCo, Bullet Physics, and robot learning tasks with V-REP/PyRep, etc.
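All of these environment suites expose (or are wrapped to expose) the OpenAI Gym-style reset()/step() interface that RL algorithms are written against. As a minimal sketch of that contract only (the `CoinFlipEnv` class below is a hypothetical toy, not part of RLzoo or Gym):

```python
import random

class CoinFlipEnv:
    """Hypothetical toy environment following the Gym-style interface:
    reset() returns an initial observation; step(action) returns a
    tuple (observation, reward, done, info)."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.state = 0

    def reset(self):
        self.t = 0
        self.state = self.rng.randint(0, 1)
        return self.state

    def step(self, action):
        # Reward 1 for guessing the hidden coin, 0 otherwise.
        reward = 1.0 if action == self.state else 0.0
        self.t += 1
        self.state = self.rng.randint(0, 1)  # flip a new coin
        done = self.t >= self.episode_length
        return self.state, reward, done, {}

# A random-agent rollout: the same loop every RL algorithm builds on.
env = CoinFlipEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done, _ = env.step(random.choice([0, 1]))
    total += reward
print(total)
```

Any environment that implements these two methods can be plugged into the same training loop, which is what lets one code base cover Gym, MuJoCo, Unity 3D, and robotics simulators alike.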

Please note that this repository provides RL algorithms through a high-level API. If you want to get familiar with each algorithm more quickly, please look at our RL tutorials, where each algorithm is implemented individually in a more straightforward manner.

Status: Work in Progress

The repository is still under development, and some environments may be incompatible with our algorithms. If you find any problems or have any suggestions, feel free to contact us!

Contents:

Algorithms:

Algorithm | Action Space | Tutorial Env | Paper

Value-based:
  • Q-learning | Discrete | FrozenLake | "Technical note: Q-learning." Watkins & Dayan, 1992.
  • Deep Q-Network (DQN) | Discrete | FrozenLake | "Human-level control through deep reinforcement learning." Mnih et al., 2015.
  • Prioritized Experience Replay | Discrete | Pong, CartPole | "Prioritized experience replay." Schaul et al., 2015.
  • Dueling DQN | Discrete | Pong, CartPole | "Dueling network architectures for deep reinforcement learning." Wang et al., 2015.
  • Double DQN | Discrete | Pong, CartPole | "Deep reinforcement learning with double Q-learning." van Hasselt et al., 2016.
  • Retrace | Discrete | Pong, CartPole | "Safe and efficient off-policy reinforcement learning." Munos et al., 2016.
  • Noisy DQN | Discrete | Pong, CartPole | "Noisy networks for exploration." Fortunato et al., 2017.
  • Distributional DQN (C51) | Discrete | Pong, CartPole | "A distributional perspective on reinforcement learning." Bellemare et al., 2017.

Policy-based:
  • REINFORCE (PG) | Discrete/Continuous | CartPole | "Reinforcement learning: An introduction." Sutton & Barto, 2018.
  • Trust Region Policy Optimization (TRPO) | Discrete/Continuous | Pendulum | "Trust region policy optimization." Schulman et al., 2015.
  • Proximal Policy Optimization (PPO) | Discrete/Continuous | Pendulum | "Proximal policy optimization algorithms." Schulman et al., 2017.
  • Distributed Proximal Policy Optimization (DPPO) | Discrete/Continuous | Pendulum | "Emergence of locomotion behaviours in rich environments." Heess et al., 2017.

Actor-critic:
  • Actor-Critic (AC) | Discrete/Continuous | CartPole | "Actor-critic algorithms." Konda & Tsitsiklis, 2000.
  • Asynchronous Advantage Actor-Critic (A3C) | Discrete/Continuous | BipedalWalker | "Asynchronous methods for deep reinforcement learning." Mnih et al., 2016.
  • Deep Deterministic Policy Gradient (DDPG) | Discrete/Continuous | Pendulum | "Continuous control with deep reinforcement learning." Lillicrap et al., 2016.
  • Twin Delayed DDPG (TD3) | Discrete/Continuous | Pendulum | "Addressing function approximation error in actor-critic methods." Fujimoto et al., 2018.
  • Soft Actor-Critic (SAC) | Discrete/Continuous | Pendulum | "Soft actor-critic algorithms and applications." Haarnoja et al., 2018.
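To give a flavor of the value-based family above, here is a minimal tabular Q-learning sketch (not RLzoo's implementation) using the update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The 5-state corridor environment and the hyperparameters are illustrative assumptions:

```python
import random

random.seed(0)

N_STATES = 5          # corridor cells 0..4; the goal is the rightmost cell
ACTIONS = [0, 1]      # 0 = move left, 1 = move right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def step(state, action):
    """Deterministic corridor dynamics; reaching the goal ends the episode."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular Q(s, a)

for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy exploration.
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update toward the bootstrapped target.
        target = r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# Greedy policy for the non-terminal states; it should point right.
greedy = [max(ACTIONS, key=lambda x: Q[s][x]) for s in range(N_STATES - 1)]
print(greedy)
```

The deep variants in the table (DQN and its extensions) replace the Q table with a neural network and stabilize the same update with replay buffers and target networks.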

Applications:

Prerequisites:

  • python 3.5
  • tensorflow >= 2.0.0 or tensorflow-gpu >= 2.0.0a0
  • tensorlayer >= 2.0.1
  • tensorflow-probability
  • tf-nightly-2.0-preview

pip install -r requirements.txt

Usage:

python3 main.py --env=Pendulum-v0 --algorithm=td3 --train_episodes=600 --mode=train
python3 main.py --env=BipedalWalker-v2 --algorithm=a3c --train_episodes=600 --mode=train --number_workers=2
python3 main.py --env=CartPole-v0 --algorithm=ac --train_episodes=600 --mode=train
python3 main.py --env=FrozenLake-v0 --algorithm=dqn --train_episodes=6000 --mode=train

Troubleshooting:

  • If you meet the error AttributeError: module 'tensorflow' has no attribute 'contrib' when running the code after installing tensorflow-probability, try:
    pip install --upgrade tf-nightly-2.0-preview tfp-nightly




