
Reinforcement Learning_Code_Policy Gradient

2023-04-10 23:35 Author: 别叫我小红

The results and code below implement policy gradient, including REINFORCE, in Gymnasium's CartPole environment.

RESULTS:

Visualizations of (i) changes in scores and losses, and (ii) animation results.

Since REINFORCE relies on Monte Carlo estimation, it converges slowly and has not converged after 10,000 steps.

However, the result is reasonably good, and the agent will hopefully score more than 200 points if given more steps.
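To make the Monte Carlo part concrete, here is a minimal sketch of how REINFORCE typically turns one finished episode into a loss. It is not the code from the files listed below. The function names and the discount factor gamma=0.98 are assumptions chosen for illustration. The return G_t can only be computed once the episode has ended, and it varies a lot from episode to episode, which is one reason learning is slow.

```python
import math


def discounted_returns(rewards, gamma=0.98):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one finished episode.

    gamma=0.98 is an assumed default, not a value taken from the article.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.98):
    """Surrogate loss -sum_t log pi(a_t | s_t) * G_t.

    log_probs[t] is the log-probability the policy assigned to the action
    actually taken at step t. Minimizing this loss performs gradient ascent
    on the expected return.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    # Toy episode: three steps, reward 1 each, every action taken with probability 0.5.
    rewards = [1.0, 1.0, 1.0]
    log_probs = [math.log(0.5)] * 3
    print(discounted_returns(rewards, gamma=0.5))
    print(reinforce_loss(log_probs, rewards, gamma=0.5))
```

In a real agent the log-probabilities would come from the policy network and the loss would be differentiated by an autograd library. Plain floats are used here only to keep the sketch self-contained.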

Fig. 1. Changes in scores and losses.

Fig. 2. Animation results.


CODE:

NetWork.py


REINFORCEAgent.py


train_and_test.py


The code above is mainly based on Chapter 9 of Hands-on Reinforcement Learning [1] and my previous implementation of value function approximation with Monte Carlo [2].


References

[1] https://hrl.boyuai.com/

[2] https://www.bilibili.com/read/cv22924612


