
Reinforcement Learning_Code_Simplest Actor-Critic

2023-04-12 21:59 By 别叫我小红

The following results and code implement the simplest actor-critic algorithm in Gymnasium's Cart Pole environment. More actor-critic algorithms will be added while working through the OpenAI Spinning Up tutorial.


RESULTS:

The simplest actor-critic algorithm takes too many steps to converge, which may be caused by the large variance of the sampled policy gradient. Subtracting a baseline when updating the policy, the trick used in A2C, may alleviate this.
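The baseline trick mentioned above can be illustrated with a toy calculation. All numbers below are illustrative, not from the post: the point is that weighting the log-probability by the advantage Q(s, a) − V(s) instead of the raw Q value shrinks the magnitude of the update without changing its expectation.

```python
import torch

# illustrative numbers (not from the post)
log_prob = torch.tensor(-0.69, requires_grad=True)  # log pi(a|s)
q_value = torch.tensor(10.0)    # critic's estimate of Q(s, a)
baseline = torch.tensor(9.2)    # state value V(s) used as baseline

advantage = q_value - baseline                  # A(s, a) = Q(s, a) - V(s)
loss_no_baseline = -log_prob * q_value          # gradient scaled by 10.0
loss_with_baseline = -log_prob * advantage      # gradient scaled by ~0.8
```

Because the baseline does not depend on the action, subtracting it leaves the expected policy gradient unchanged while reducing its variance.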

Visualizations of (i) changes in score and value approximation loss, and (ii) animation results.

Fig. 1. Changes in score and value approximation loss.
Fig. 2. Animation result that achieved a score of 357 points.


CODE:

NetWork.py
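The post does not include the code inline. Below is a minimal sketch of what NetWork.py might contain, assuming a PyTorch implementation in the style of Hands-on Reinforcement Learning [3]; the class names PolicyNet and ValueNet are assumptions, not the post's own code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    """Actor: maps a state to a probability distribution over actions."""

    def __init__(self, state_dim, hidden_dim, action_dim):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, action_dim)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return F.softmax(self.fc2(x), dim=-1)


class ValueNet(nn.Module):
    """Critic: maps a state to a scalar state-value estimate V(s)."""

    def __init__(self, state_dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```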


QACAgent.py
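A sketch of what QACAgent.py might look like, assuming a one-step TD actor-critic: the critic learns V(s) by TD(0), and the actor weights the log-probability of the taken action by the TD error (with no baseline trick, matching the variance issue noted above). The internals, hyperparameters, and method names are guesses; only the class name comes from the post.

```python
import torch
import torch.nn as nn


class QACAgent:
    """Simplest actor-critic agent (sketch; details are assumptions)."""

    def __init__(self, state_dim, hidden_dim, action_dim,
                 actor_lr=1e-3, critic_lr=1e-2, gamma=0.98):
        self.actor = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.Softmax(dim=-1))
        self.critic = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=actor_lr)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=critic_lr)
        self.gamma = gamma

    def take_action(self, state):
        state = torch.tensor(state, dtype=torch.float32).unsqueeze(0)
        probs = self.actor(state)
        return torch.distributions.Categorical(probs).sample().item()

    def update(self, state, action, reward, next_state, done):
        state = torch.tensor(state, dtype=torch.float32).unsqueeze(0)
        next_state = torch.tensor(next_state, dtype=torch.float32).unsqueeze(0)
        # one-step TD target and TD error for the critic
        td_target = reward + self.gamma * self.critic(next_state) * (1 - done)
        td_error = td_target.detach() - self.critic(state)
        critic_loss = td_error.pow(2).mean()
        self.critic_opt.zero_grad()
        critic_loss.backward()
        self.critic_opt.step()
        # policy gradient step weighted by the TD error
        log_prob = torch.log(self.actor(state)[0, action])
        actor_loss = -(log_prob * td_error.detach()).mean()
        self.actor_opt.zero_grad()
        actor_loss.backward()
        self.actor_opt.step()
        return critic_loss.item()
```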


train_and_test.py
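A sketch of the training loop in train_and_test.py, assuming Gymnasium's five-value `step()` API and an agent exposing `take_action(state)` and `update(state, action, reward, next_state, done)`. The helper name `train` and the episode count are assumptions.

```python
def train(env, agent, num_episodes=1000):
    """Run episodes, updating the agent after every step; return the scores."""
    scores = []
    for episode in range(num_episodes):
        state, _ = env.reset()
        score, done = 0.0, False
        while not done:
            action = agent.take_action(state)
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # bootstrap through time-limit truncation; stop only at true termination
            agent.update(state, action, reward, next_state, float(terminated))
            state = next_state
            score += reward
        scores.append(score)
    env.close()
    return scores
```

Under these assumptions the loop would be driven with something like `train(gym.make("CartPole-v1"), QACAgent(4, 128, 2))`, plotting the returned scores afterwards.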


The above code is mainly based on Lesson 7 of David Silver's lectures [1], Chapter 10 of Shiyu Zhao's Mathematical Foundations of Reinforcement Learning [2], and Chapter 10 of Hands-on Reinforcement Learning [3].


Reference

[1] https://www.davidsilver.uk/teaching/

[2] https://github.com/MathFoundationRL/Book-Mathmatical-Foundation-of-Reinforcement-Learning

[3] https://hrl.boyuai.com/

