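The cover image is the taxonomy OpenAI gives in Spinning Up, but it no longer covers the current SOTA algorithms, which again shows how fast papers get published in AI. (Progress on actual intelligence seems to have been modest, but a journey of a thousand miles is built from small steps.)
To give readers an intuitive picture of SOTA algorithms in RL, I have reorganized the list below. Some of these I am already implementing myself, and for others I have written paper-reading notes.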
Model-free
Value-based
- Q-learning [ Paper | Code | Blog | 1992 ]
- Sarsa, Sarsa(λ) [ Paper | Code | Blog | 1994 ]
- Deep Q Network (DQN) [ Paper | Code | Blog | 2015 ]
- Double Deep Q Network [ Paper | Code | Blog | 2015 ]
- Dueling Deep Q Network [ Paper | Code | Blog | 2015 ]
- Double Dueling Deep Q Network (D3QN) [ No Paper | Code | Blog | 2015 ]
- Rainbow [ Paper | Code | Blog | 2017 ]
- Hindsight Experience Replay (HER) (also applicable to DDPG) [ Paper | Code | Blog | 2017 ]
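For readers new to the value-based family, the difference between the first two entries (Q-learning and Sarsa) fits in a few lines. The sketch below is a minimal tabular illustration and is not taken from any of the linked implementations. The function names, the dict-based Q-table, and the default `alpha` and `gamma` values are assumptions made for this example.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy TD update: bootstrap from the greedy action in s_next.

    Q is a hypothetical dict mapping (state, action) -> value; missing
    entries are treated as 0.0.
    """
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_target = r + gamma * best_next
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (td_target - old)
    return Q[(s, a)]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy TD update: bootstrap from the action actually taken next."""
    td_target = r + gamma * Q.get((s_next, a_next), 0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (td_target - old)
    return Q[(s, a)]
```

The only difference is the bootstrap term: Q-learning uses the maximum over next actions, while Sarsa uses the value of the next action chosen by the behavior policy.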
Policy-based
- Vanilla Policy Gradient / REINFORCE [ Paper | Code | Blog | 2000 ]
- Trust Region Policy Optimization (TRPO) [ Paper | Code | Blog | 2015 ]
- Proximal Policy Optimization (PPO) [ Paper | Code | Blog | 2017 ]
Actor-Critic
- Actor-Critic [ Paper | Code pytorch | Blog | 2000 ]
- Advantage Actor-Critic (A2C) [ No Paper | Code | Blog | unknown ]
- Deep Deterministic Policy Gradient (DDPG) [ Paper | Code1 OpenAI | Code2 | Blog | 2015 ]
- Twin Delayed DDPG (TD3) [ Paper | Code | Blog | 2018 ]
- Soft Actor-Critic (SAC) [ Paper | Code tf | Blog | 2018 ]
Model-based
- Dyna [ Paper | Code | Blog | 1991 ]
- PILCO [ Paper | Code | Blog | 2011 ]
- Value Prediction Network (VPN) [ Paper | Code | Blog | 2018 ]
- Guided Policy Search (GPS) [ Paper | Code | Blog | 2017 ]
- Model-Based Value Expansion (MVE) [ Paper | Code | Blog | 2018 ]
- Stochastic Ensemble Value Expansion (STEVE) [ Paper | Code | Blog | 2018 ]
- Model-Based Policy Optimization (MBPO) [ Paper | Code | Blog | 2019 ]
Hierarchical RL
- Hierarchical DQN (h-DQN) [ Paper | Code Keras | Code pytorch | Blog | 2016 ]
- Hierarchical DDPG (h-DDPG) [ Paper | Code | Blog | 2017 ]
- Hierarchical-Actor-Critic (HAC) [ Paper | Code pytorch | Code TF | Blog_CN | Blog_EG | 2019 ]
Distributed Architecture
- Asynchronous Advantage Actor-Critic (A3C) [ Paper | Code pytorch | Blog | 2016 ]
- Distributed PPO (DPPO) [ Paper | Code pytorch | Blog | 2017 ]
- IMPALA [ Paper | Code | Blog | 2018 ]
- APE-X [ Paper | Code | Blog | 2018 ]
- Divergence-augmented Policy Optimization (DAPO) [ Paper | Code | Blog | 2019 ]
Multi-Agent
- Value-Decomposition Networks (VDN) [ Paper | Code | Blog | 2017 ]
- MADDPG [ Paper | Code OpenAI | Blog | 2017 ]
- Mean Field Multi-Agent RL [ Paper | Code | Blog | 2018 ]
- QMIX [ Paper | Code | Blog | 2018 ]
- Actor-Attention-Critic for Multi-Agent (MAAC) [ Paper | Code | Blog | 2018 ]
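If any link is wrong, please let me know; I would be grateful.
More algorithm implementations are available in the GitHub repository associated with this column.
Watch & Star are welcome!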