
Reinforcement Learning_Code_Value Function Approximation

2023-04-08 11:30  Author: 别叫我小红

The following results and code implement value function approximation, including Monte Carlo, Sarsa, and deep Q-learning, in Gymnasium's CartPole environment.


RESULTS:

Visualizations of (i) changes in scores, losses and epsilons, and (ii) animation results.

1. Monte Carlo

Fig. 1.1. Changes in scores, losses and epsilons.
Fig. 1.2. Animation results.

2. Sarsa

Original Sarsa, which is exactly what is used here, may have the same need as Q-learning: a replay buffer.

In the original implementations of Sarsa and Q-learning, the Q-value is updated every time an action is taken, which makes the algorithm extremely unstable.

So, to get better results, we update the Q-values only after a number of steps have been collected, which means introducing experience replay, as sketched below.
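As an illustration of the batched update (not necessarily the exact code used in this post), a Sarsa TD loss computed on a mini-batch sampled from a replay buffer might look like the following; the batch keys and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def sarsa_loss(network, batch, gamma=0.99):
    """Batched Sarsa TD loss (sketch; batch keys are assumed names).

    batch is assumed to hold float/long tensors: obs, acts (B, 1),
    rews (B, 1), next_obs, next_acts (B, 1) and done flags (B, 1),
    sampled from a replay buffer that also stores the next action.
    """
    q = network(batch["obs"]).gather(1, batch["acts"])                      # Q(s, a)
    with torch.no_grad():
        next_q = network(batch["next_obs"]).gather(1, batch["next_acts"])   # Q(s', a')
        target = batch["rews"] + gamma * next_q * (1 - batch["done"])
    return F.smooth_l1_loss(q, target)
```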


Fig. 2.1. Changes in scores, losses and epsilons.
Fig. 2.2. Animation results.

3. Deep Q-learning

Here we use experience replay and fixed Q-targets.
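As a hedged illustration, with fixed Q-targets the bootstrap term comes from a separate, periodically synchronized copy of the network; the names `dqn`, `dqn_target` and the batch keys below are assumptions, not the original listing.

```python
import torch
import torch.nn.functional as F

def dqn_loss(dqn, dqn_target, batch, gamma=0.99):
    """DQN loss with fixed Q-targets: bootstrap from a frozen copy (sketch)."""
    q = dqn(batch["obs"]).gather(1, batch["acts"])                           # Q(s, a)
    with torch.no_grad():
        # max over actions, evaluated by the *target* network
        next_q = dqn_target(batch["next_obs"]).max(dim=1, keepdim=True)[0]
        target = batch["rews"] + gamma * next_q * (1 - batch["done"])
    return F.smooth_l1_loss(q, target)

# every target_update steps the frozen copy is synchronized:
# dqn_target.load_state_dict(dqn.state_dict())
```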

Fig. 3.1. Changes in scores, losses and epsilons.
Fig. 3.2. Animation results.


CODE:

NetWork.py
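A minimal sketch of the kind of Q-network such a file typically defines, in the MLP style of rainbow-is-all-you-need; the hidden size and layer layout are assumptions, not necessarily the original listing.

```python
import torch
import torch.nn as nn

class Network(nn.Module):
    """Small MLP mapping a state to one Q-value per action (sketch)."""

    def __init__(self, in_dim: int, out_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
```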


MCAgent.py
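A hedged sketch of a Monte Carlo agent for this setting: the Q-network is regressed toward the observed discounted return of each episode. The class and method names, hyperparameters, and the epsilon-greedy policy are assumptions rather than the original listing.

```python
import torch
import torch.nn.functional as F

class MCAgent:
    """Monte Carlo control with a Q-network: fit Q(s, a) to episode returns (sketch)."""

    def __init__(self, network, optimizer, action_dim, gamma=0.99, epsilon=0.1):
        self.network, self.optimizer = network, optimizer
        self.action_dim, self.gamma, self.epsilon = action_dim, gamma, epsilon

    def select_action(self, state: torch.Tensor) -> int:
        if torch.rand(1).item() < self.epsilon:              # epsilon-greedy exploration
            return torch.randint(self.action_dim, (1,)).item()
        return self.network(state).argmax().item()

    def update(self, states, actions, rewards):
        """Update once per episode; states are float tensors, actions ints, rewards floats."""
        returns, g = [], 0.0
        for r in reversed(rewards):                          # compute returns G_t backwards
            g = r + self.gamma * g
            returns.append(g)
        returns.reverse()
        targets = torch.tensor(returns).unsqueeze(1)
        q = self.network(torch.stack(states)).gather(1, torch.tensor(actions).unsqueeze(1))
        loss = F.smooth_l1_loss(q, targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```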


SarsaAgent.py
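A hedged skeleton of the Sarsa agent described in section 2: transitions (including the next action) are stored in a replay buffer and the network is updated from sampled mini-batches, reusing the `sarsa_loss` sketch above. Names and hyperparameters are assumptions.

```python
import torch

class SarsaAgent:
    """Sarsa with a Q-network and an experience replay buffer (sketch)."""

    def __init__(self, network, optimizer, buffer, action_dim,
                 gamma=0.99, epsilon=0.1, batch_size=32):
        self.network, self.optimizer, self.buffer = network, optimizer, buffer
        self.action_dim, self.gamma = action_dim, gamma
        self.epsilon, self.batch_size = epsilon, batch_size

    def select_action(self, state: torch.Tensor) -> int:
        if torch.rand(1).item() < self.epsilon:              # epsilon-greedy exploration
            return torch.randint(self.action_dim, (1,)).item()
        return self.network(state).argmax().item()

    def update_model(self) -> float:
        """Sample a batch (which, for Sarsa, also holds next actions) and take one step."""
        batch = {k: torch.as_tensor(v) for k, v in
                 self.buffer.sample_batch(self.batch_size).items()}
        loss = sarsa_loss(self.network, batch, self.gamma)   # see the snippet in section 2
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```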


ReplayBuffer.py
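A minimal ring-buffer sketch in the spirit of rainbow-is-all-you-need; the Sarsa variant would additionally store the next action. Field names are assumptions, and callers convert the sampled arrays to tensors before the update.

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size ring buffer of transitions (sketch; field names assumed)."""

    def __init__(self, obs_dim: int, size: int):
        self.obs_buf = np.zeros((size, obs_dim), dtype=np.float32)
        self.next_obs_buf = np.zeros((size, obs_dim), dtype=np.float32)
        self.acts_buf = np.zeros((size, 1), dtype=np.int64)
        self.rews_buf = np.zeros((size, 1), dtype=np.float32)
        self.done_buf = np.zeros((size, 1), dtype=np.float32)
        self.max_size, self.ptr, self.size = size, 0, 0

    def store(self, obs, act, rew, next_obs, done):
        self.obs_buf[self.ptr] = obs
        self.acts_buf[self.ptr] = act
        self.rews_buf[self.ptr] = rew
        self.next_obs_buf[self.ptr] = next_obs
        self.done_buf[self.ptr] = done
        self.ptr = (self.ptr + 1) % self.max_size            # overwrite oldest entries
        self.size = min(self.size + 1, self.max_size)

    def sample_batch(self, batch_size: int = 32) -> dict:
        idxs = np.random.randint(0, self.size, size=batch_size)
        return dict(obs=self.obs_buf[idxs], acts=self.acts_buf[idxs],
                    rews=self.rews_buf[idxs], next_obs=self.next_obs_buf[idxs],
                    done=self.done_buf[idxs])
```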


DQNAgent.py
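A hedged skeleton of the deep Q-learning agent, tying together the network, the replay buffer, and the fixed-target loss sketched in section 3; names, hyperparameters, and the target-sync schedule are assumptions.

```python
import copy
import torch

class DQNAgent:
    """Deep Q-learning with experience replay and fixed Q-targets (sketch)."""

    def __init__(self, network, optimizer, buffer, action_dim,
                 gamma=0.99, epsilon=1.0, batch_size=32, target_update=100):
        self.dqn, self.optimizer, self.buffer = network, optimizer, buffer
        self.dqn_target = copy.deepcopy(network)             # frozen copy for targets
        self.dqn_target.eval()
        self.action_dim, self.gamma = action_dim, gamma
        self.epsilon, self.batch_size = epsilon, batch_size
        self.target_update, self.update_count = target_update, 0

    def select_action(self, state: torch.Tensor) -> int:
        if torch.rand(1).item() < self.epsilon:              # epsilon-greedy exploration
            return torch.randint(self.action_dim, (1,)).item()
        return self.dqn(state).argmax().item()

    def update_model(self) -> float:
        batch = {k: torch.as_tensor(v) for k, v in
                 self.buffer.sample_batch(self.batch_size).items()}
        loss = dqn_loss(self.dqn, self.dqn_target, batch, self.gamma)  # see section 3
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.update_count += 1
        if self.update_count % self.target_update == 0:      # fixed Q-targets: periodic sync
            self.dqn_target.load_state_dict(self.dqn.state_dict())
        return loss.item()
```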


train_and_test.py
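A hedged outline of a CartPole training loop with Gymnasium; the epsilon-decay schedule and warm-up condition are assumptions. For the Monte Carlo agent the update would instead happen once per episode from the collected trajectory, and the plots of scores, losses, and epsilons would be drawn from the values returned here.

```python
import gymnasium as gym
import torch

def train(agent, num_episodes=300, min_epsilon=0.01, epsilon_decay=0.995):
    """Run episodes, store transitions, update the agent, and return the scores."""
    env = gym.make("CartPole-v1")
    scores = []
    for _ in range(num_episodes):
        state, _ = env.reset()
        score, done = 0.0, False
        while not done:
            action = agent.select_action(torch.as_tensor(state, dtype=torch.float32))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            agent.buffer.store(state, action, reward, next_state, done)
            if agent.buffer.size >= agent.batch_size:        # wait until the buffer is warm
                agent.update_model()                         # one gradient step per env step
            state, score = next_state, score + reward
        agent.epsilon = max(min_epsilon, agent.epsilon * epsilon_decay)  # decay exploration
        scores.append(score)
    env.close()
    return scores
```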


The above code is mainly based on rainbow-is-all-you-need [1], extended with the Monte Carlo and Sarsa solutions.


Reference

[1] https://github.com/Curt-Park/rainbow-is-all-you-need

