Reinforcement Learning_Code_Value Function Approximation
The following results and code implement value function approximation, including Monte Carlo, Sarsa, and deep Q-learning, in Gymnasium's CartPole environment.
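As a rough orientation, all three agents approximate the action-value function with a small fully connected network over CartPole's 4-dimensional observation. The sketch below is an assumption about that shape; it is not necessarily the exact architecture in NetWork.py.

```python
# Hypothetical sketch of a Q-network for CartPole; the actual NetWork.py may differ.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a CartPole observation (4 values) to one Q-value per action (2 actions)."""

    def __init__(self, obs_dim: int = 4, action_dim: int = 2, hidden_dim: int = 128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.layers(obs)
```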
RESULTS:
Visualizations of (i) the changes in scores, losses, and epsilon values during training, and (ii) the resulting animations.
1. Monte Carlo
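For reference, a minimal sketch of the idea behind the Monte Carlo agent (the function and variable names are assumptions, not the exact MCAgent.py code): after an episode terminates, every visited state-action pair is updated toward the discounted return actually observed from that step onward.

```python
# Hypothetical sketch of a Monte Carlo update for one finished episode; MCAgent.py may differ.
import torch
import torch.nn.functional as F

def mc_update(q_net, optimizer, states, actions, rewards, gamma=0.99):
    """Update Q(s_t, a_t) toward the discounted return G_t for every step of the episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # G_t = r_t + gamma * G_{t+1}, computed backwards
        g = r + gamma * g
        returns.append(g)
    returns.reverse()

    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    targets = torch.as_tensor(returns, dtype=torch.float32).unsqueeze(1)

    q_values = q_net(states).gather(1, actions)   # Q(s_t, a_t) for the actions actually taken
    loss = F.mse_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```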


2. Sarsa
Original Sarsa, which is exactly what is used here, may have the same need for a replay buffer as Q-learning does.
In the original formulations of Sarsa and Q-learning, the Q-value is updated as soon as an action is taken, which makes the algorithm extremely unstable.
So, to get better results, we should update the Q-value over a number of stored steps instead, which means introducing experience replay.
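To make that per-step update concrete, here is a minimal sketch of a one-step Sarsa update (the function and tensor handling are assumptions, not the exact SarsaAgent.py code): the Q-value is nudged toward r + gamma * Q(s', a') after every single action, which is exactly the update whose instability is discussed above.

```python
# Hypothetical sketch of a single one-step Sarsa update; SarsaAgent.py may differ.
import torch
import torch.nn.functional as F

def sarsa_update(q_net, optimizer, s, a, r, s_next, a_next, done, gamma=0.99):
    """One TD update toward r + gamma * Q(s', a'), performed after every environment step."""
    s = torch.as_tensor(s, dtype=torch.float32).unsqueeze(0)
    s_next = torch.as_tensor(s_next, dtype=torch.float32).unsqueeze(0)

    q_sa = q_net(s)[0, a]                                    # current estimate Q(s, a)
    with torch.no_grad():
        q_next = 0.0 if done else q_net(s_next)[0, a_next]   # on-policy bootstrap Q(s', a')
        target = torch.as_tensor(r + gamma * q_next, dtype=torch.float32)

    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```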


3. Deep Q-learning
Here we use experience replay and fixed Q-targets.
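A minimal sketch of how these two pieces typically fit together is given below; the batch layout and names are assumptions, and the actual interfaces live in ReplayBuffer.py and DQNAgent.py. Transitions are sampled from a replay buffer, and the bootstrap target comes from a separate target network that is only synchronized with the online network periodically.

```python
# Hypothetical sketch of a DQN update with experience replay and fixed Q-targets;
# the real DQNAgent.py / ReplayBuffer.py interfaces may differ.
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a sampled batch; the bootstrap target uses the frozen target_net."""
    # batch: tensors sampled from the replay buffer; dones is a float 0/1 tensor.
    states, actions, rewards, next_states, dones = batch

    q_values = q_net(states).gather(1, actions)              # Q(s, a) for the stored actions
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1, keepdim=True)[0]   # max_a' Q_target(s', a')
        targets = rewards + gamma * next_q * (1 - dones)                # fixed Q-target

    loss = F.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Every N updates the target network is synchronized with the online network, e.g.:
#     target_net.load_state_dict(q_net.state_dict())
```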


CODE:
NetWork.py
MCAgent.py
SarsaAgent.py
ReplayBuffer.py
DQNAgent.py
train_and_test.py
The above code is mainly based on rainbow-is-all-you-need [1] and extends its solutions to Monte Carlo and Sarsa.
Reference
[1] https://github.com/Curt-Park/rainbow-is-all-you-need