Reinforcement Learning_Code_Value Function Approximation
The following results and code implement value function approximation, including Monte Carlo, Sarsa, and deep Q-learning, in Gymnasium's CartPole environment.
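As a rough orientation, all three agents approximate the action-value function with a small fully connected network over CartPole's 4-dimensional observation. The sketch below is an assumption about that shape; it is not necessarily the exact architecture in NetWork.py.

```python
# Hypothetical sketch of a Q-network for CartPole; the actual NetWork.py may differ.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a CartPole observation (4 values) to one Q-value per action (2 actions)."""

    def __init__(self, obs_dim: int = 4, action_dim: int = 2, hidden_dim: int = 128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.layers(obs)
```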
RESULTS:
Visualizations of (i) the changes in scores, losses, and epsilon values during training, and (ii) the resulting animations.
1. Monte Carlo
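For reference, a minimal sketch of the idea behind the Monte Carlo agent (the function and variable names are assumptions, not the exact MCAgent.py code): after an episode terminates, every visited state-action pair is updated toward the discounted return actually observed from that step onward.

```python
# Hypothetical sketch of a Monte Carlo update for one finished episode; MCAgent.py may differ.
import torch
import torch.nn.functional as F

def mc_update(q_net, optimizer, states, actions, rewards, gamma=0.99):
    """Update Q(s_t, a_t) toward the discounted return G_t for every step of the episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # G_t = r_t + gamma * G_{t+1}, computed backwards
        g = r + gamma * g
        returns.append(g)
    returns.reverse()

    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    targets = torch.as_tensor(returns, dtype=torch.float32).unsqueeze(1)

    q_values = q_net(states).gather(1, actions)   # Q(s_t, a_t) for the actions actually taken
    loss = F.mse_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```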


2. Sarsa
Original Sarsa, which is exactly what is used here, may have the same need for a replay buffer as Q-learning does.
In the original formulations of Sarsa and Q-learning, the Q-value is updated as soon as an action is taken, which makes the algorithm extremely unstable.
So, to get better results, we should update the Q-value over a number of stored steps instead, which means introducing experience replay.
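To make that per-step update concrete, here is a minimal sketch of a one-step Sarsa update (the function and tensor handling are assumptions, not the exact SarsaAgent.py code): the Q-value is nudged toward r + gamma * Q(s', a') after every single action, which is exactly the update whose instability is discussed above.

```python
# Hypothetical sketch of a single one-step Sarsa update; SarsaAgent.py may differ.
import torch
import torch.nn.functional as F

def sarsa_update(q_net, optimizer, s, a, r, s_next, a_next, done, gamma=0.99):
    """One TD update toward r + gamma * Q(s', a'), performed after every environment step."""
    s = torch.as_tensor(s, dtype=torch.float32).unsqueeze(0)
    s_next = torch.as_tensor(s_next, dtype=torch.float32).unsqueeze(0)

    q_sa = q_net(s)[0, a]                                    # current estimate Q(s, a)
    with torch.no_grad():
        q_next = 0.0 if done else q_net(s_next)[0, a_next]   # on-policy bootstrap Q(s', a')
        target = torch.as_tensor(r + gamma * q_next, dtype=torch.float32)

    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```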


3. Deep Q-learning
Here we use experience replay and fixed Q-targets.
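A minimal sketch of how these two pieces typically fit together is given below; the batch layout and names are assumptions, and the actual interfaces live in ReplayBuffer.py and DQNAgent.py. Transitions are sampled from a replay buffer, and the bootstrap target comes from a separate target network that is only synchronized with the online network periodically.

```python
# Hypothetical sketch of a DQN update with experience replay and fixed Q-targets;
# the real DQNAgent.py / ReplayBuffer.py interfaces may differ.
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a sampled batch; the bootstrap target uses the frozen target_net."""
    # batch: tensors sampled from the replay buffer; dones is a float 0/1 tensor.
    states, actions, rewards, next_states, dones = batch

    q_values = q_net(states).gather(1, actions)              # Q(s, a) for the stored actions
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1, keepdim=True)[0]   # max_a' Q_target(s', a')
        targets = rewards + gamma * next_q * (1 - dones)                # fixed Q-target

    loss = F.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Every N updates the target network is synchronized with the online network, e.g.:
#     target_net.load_state_dict(q_net.state_dict())
```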


CODE:
NetWork.py
MCAgent.py
SarsaAgent.py
ReplayBuffer.py
DQNAgent.py
train_and_test.py
The above code is mainly based on rainbow-is-all-you-need [1] and extends its solutions to Monte Carlo and Sarsa.
Reference
[1] https://github.com/Curt-Park/rainbow-is-all-you-need