Reinforcement Learning Basics IV: A Roundup of Classic and State-of-the-Art RL Algorithms

The cover image is the taxonomy from OpenAI's Spinning Up, but it no longer covers all of today's SOTA algorithms, which once again shows how fast the AI field publishes papers. (That said, actual progress on intelligence seems modest; still, a journey of a thousand miles begins with a single step.)

To give everyone an intuitive picture of the SOTA algorithms in RL, I have reorganized this catalog. Some of these I have already implemented myself; for others I have written paper-reading notes.

Model-free

Value-based

  1. Q-learning
    [ Paper | Code | Blog | 1992 ]
  2. Sarsa, Sarsa(λ)
    [ Paper | Code | Blog | 1994 ]
  3. Deep Q Network (DQN) 
    [ Paper | Code | Blog | 2015 ]
  4. Double Deep Q Network 
    [ Paper | Code | Blog | 2015 ]
  5. Dueling Deep Q Network
    [ Paper | Code | Blog | 2015 ]
  6. Double Dueling Deep Q Network (D3QN) 
    [ No Paper | Code | Blog | 2015 ]
  7. Rainbow 
    [ Paper | Code | Blog | 2017 ]
  8. Hindsight Experience Replay (HER) (also applicable to DDPG)
    [ Paper | Code | Blog | 2017 ]
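To make the value-based family above concrete, here is a minimal tabular Q-learning sketch. The five-state chain environment and all hyperparameters are illustrative assumptions of mine, not from any of the linked papers:

```python
import random

random.seed(0)

# Toy deterministic 5-state chain: states 0..4, actions 0 (left) / 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2, r, done = step(s, a)
        # Q-learning update: off-policy TD target takes a max over next actions
        target = r + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# After training, the greedy policy moves right in every non-terminal state.
print([greedy(s) for s in range(N_STATES - 1)])  # → [1, 1, 1, 1]
```

DQN and its descendants (Double, Dueling, Rainbow) keep exactly this TD target structure but replace the table with a neural network plus replay buffer and target network.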

Policy-based

  1. Vanilla Policy Gradient / REINFORCE
    [ Paper | Code | Blog | 2000 ]
  2. Trust Region Policy Optimization (TRPO)
    [ Paper | Code | Blog | 2015 ]
  3. Proximal Policy Optimization (PPO) 
    [ Paper | Code | Blog | 2017 ]
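The policy-based entries above all build on the REINFORCE gradient estimator. Here is a minimal sketch on a two-armed bandit with a softmax policy and a running-average baseline; the bandit payoffs and hyperparameters are illustrative assumptions:

```python
import math, random

random.seed(1)

# Two-armed bandit: arm 1 pays 1.0, arm 0 pays 0.2 (made-up numbers).
REWARDS = (0.2, 1.0)
LR = 0.1
h = [0.0, 0.0]          # action preferences (the policy parameters)

def pi():
    z = [math.exp(x) for x in h]
    s = sum(z)
    return [x / s for x in z]

baseline = 0.0          # running average reward, a variance-reducing baseline
for _ in range(2000):
    p = pi()
    a = 0 if random.random() < p[0] else 1
    r = REWARDS[a]
    baseline += 0.05 * (r - baseline)
    # REINFORCE: theta += lr * (r - b) * grad log pi(a); for a softmax,
    # d log pi(a) / d h[b] = 1[a == b] - pi(b)
    for b in range(2):
        grad_log = (1.0 if b == a else 0.0) - p[b]
        h[b] += LR * (r - baseline) * grad_log

print(pi())  # probability mass concentrates on the better arm 1
```

TRPO and PPO start from this same gradient but constrain (trust region) or clip (surrogate objective) how far each update may move the policy.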

Actor-Critic

  1. Actor-Critic
    [ Paper | Code pytorch | Blog | 2000 ]
  2. Advantage Actor-Critic (A2C)
    [ No Paper | Code | Blog | unknown ]
  3. Deep Deterministic Policy Gradient (DDPG)
    [ Paper | Code1 OpenAI | Code2 | Blog | 2015 ]
  4. Twin Delayed DDPG (TD3) 
    [ Paper | Code | Blog | 2018 ]
  5. Soft Actor-Critic (SAC) 
    [ Paper | Code tf | Blog | 2018 ]
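The common core of the actor-critic methods above is that a learned critic supplies the advantage signal for the policy update. A minimal one-step (TD) actor-critic sketch, on the same kind of toy chain MDP I used earlier (environment and hyperparameters are my own illustrative choices):

```python
import math, random

random.seed(2)

# Chain MDP: states 0..4, actions left/right, reward 1 for reaching state 4.
N, GAMMA = 5, 0.9
ALPHA_V, ALPHA_PI = 0.2, 0.2
V = [0.0] * N                       # critic: tabular state values
H = [[0.0, 0.0] for _ in range(N)]  # actor: per-state action preferences

def policy(s):
    z = [math.exp(x) for x in H[s]]
    t = sum(z)
    return [x / t for x in z]

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

for _ in range(1000):
    s, done = 0, False
    while not done:
        p = policy(s)
        a = 0 if random.random() < p[0] else 1
        s2, r, done = step(s, a)
        # The TD error doubles as the advantage estimate for the actor
        delta = r + (0.0 if done else GAMMA * V[s2]) - V[s]
        V[s] += ALPHA_V * delta                      # critic update
        for b in range(2):                           # actor update
            H[s][b] += ALPHA_PI * delta * ((1.0 if b == a else 0.0) - p[b])
        s = s2

print([policy(s)[1] for s in range(N - 1)])  # should approach 1 (move right)
```

DDPG/TD3 swap the softmax actor for a deterministic continuous-action policy and learn Q instead of V; SAC adds an entropy bonus to the objective.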

Model-based

  1. Dyna 
    [ Paper | Code | Blog | 1991 ]
  2. PILCO
    [ Paper | Code | Blog | 2011 ]
  3. Value Prediction Network (VPN)
    [ Paper | Code | Blog | 2017 ]
  4. Guided Policy Search (GPS)
    [ Paper | Code | Blog | 2017 ]
  5. Model-Based Value Expansion (MVE)
    [ Paper | Code | Blog | 2018 ]
  6. Stochastic Ensemble Value Expansion (STEVE)
    [ Paper | Code | Blog | 2018 ]
  7. Model-Based Policy Optimization (MBPO)
    [ Paper | Code | Blog | 2019 ]
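Dyna, the oldest entry above, already captures the core model-based idea: learn a model from real transitions, then use simulated transitions from that model for extra "planning" updates. A tabular Dyna-Q sketch on the same toy chain (environment, memorized deterministic model, and hyperparameters are illustrative assumptions):

```python
import random

random.seed(3)

# Dyna-Q: each real Q-learning update is followed by PLAN_STEPS planning
# updates drawn from a learned (here: deterministic, memorized) model.
N, ACTIONS = 5, (0, 1)
GAMMA, ALPHA, EPS, PLAN_STEPS = 0.9, 0.5, 0.1, 10

def env_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
model = {}  # (s, a) -> (s2, r, done): the learned transition model

def q_update(s, a, r, s2, done):
    target = r + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for _ in range(50):  # far fewer real episodes than plain Q-learning needs
    s, done = 0, False
    while not done:
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(ACTIONS, key=lambda b: Q[(s, b)]))
        s2, r, done = env_step(s, a)
        q_update(s, a, r, s2, done)        # learn from the real transition
        model[(s, a)] = (s2, r, done)      # update the model
        for _ in range(PLAN_STEPS):        # planning on simulated experience
            ps, pa = random.choice(list(model))
            ns, nr, nd = model[(ps, pa)]
            q_update(ps, pa, nr, ns, nd)
        s = s2

print([max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N - 1)])
```

MVE, STEVE, and MBPO are modern descendants of this pattern: they roll a learned model forward for a few steps to improve value targets or to generate short synthetic rollouts for the policy optimizer.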

Hierarchical RL

  1. Hierarchical DQN (h-DQN)
    [ Paper | Code Keras | Code pytorch | Blog | 2016 ]
  2. Hierarchical DDPG (h-DDPG)
    [ Paper | Code | Blog | 2017 ]
  3. Hierarchical-Actor-Critic (HAC)
    [ Paper | Code pytorch | Code TF | Blog_CN | Blog_EG | 2019 ]

Distributed Architecture

  1. Asynchronous Advantage Actor-Critic (A3C) 
    [ Paper | Code pytorch | Blog | 2016 ]
  2. Distributed PPO (DPPO) 
    [ Paper | Code pytorch | Blog | 2017 ]
  3. IMPALA 
    [ Paper | Code | Blog | 2018 ]
  4. APE-X
    [ Paper | Code | Blog | 2018 ]
  5. Divergence-augmented Policy Optimization (DAPO)
    [ Paper | Code | Blog | 2019 ]

Multi-Agent

  1. Value-Decomposition Networks (VDN)
    [ Paper | Code | Blog | 2017 ]
  2. MADDPG
    [ Paper | Code OpenAI | Blog | 2017 ]
  3. Mean Field Multi-Agent RL 
    [ Paper | Code | Blog | 2018 ]
  4. QMIX
    [ Paper | Code | Blog | 2018 ]
  5. Actor-Attention-Critic for Multi-Agent (MAAC)
    [ Paper | Code | Blog | 2018 ]
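VDN and QMIX both tackle the same question: how to train per-agent utilities from a single shared team reward. VDN's answer is the simplest, factoring the joint value as a sum, Q_tot(a1, a2) = Q1(a1) + Q2(a2). A VDN-style sketch on a one-step cooperative matrix game (the game and hyperparameters are made up for illustration):

```python
import random

random.seed(4)

# One-step cooperative game: the team is paid only if both agents pick action 1.
def team_reward(a1, a2):
    return 1.0 if (a1, a2) == (1, 1) else 0.0

Q1, Q2 = [0.0, 0.0], [0.0, 0.0]   # per-agent utility tables
ALPHA = 0.05
for _ in range(5000):
    a1, a2 = random.randint(0, 1), random.randint(0, 1)  # uniform exploration
    # TD error is computed on the SUM of the agents' utilities (the VDN idea),
    # and the same error trains both factors.
    delta = team_reward(a1, a2) - (Q1[a1] + Q2[a2])
    Q1[a1] += ALPHA * delta
    Q2[a2] += ALPHA * delta

# Each agent acts greedily on its own factor; jointly they coordinate on (1, 1).
print(max((0, 1), key=lambda a: Q1[a]), max((0, 1), key=lambda a: Q2[a]))
```

QMIX generalizes the sum to any monotonic mixing network, and MADDPG/MAAC instead use a centralized critic that sees all agents' observations and actions during training.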

If any link is wrong or broken, please let me know; much appreciated.

More algorithm implementations can be found in this column's companion GitHub repo.

Feel free to Watch & Star!