Programming Assignments
- Implemented several multi-armed bandit algorithms (Epsilon-Greedy, Round Robin, UCB, KL-UCB, and Thompson Sampling) and compared their regret over different horizons.
- Implemented a Linear Programming solver and Howard's Policy Iteration to find the optimal policy of an MDP and the corresponding value function.
- Estimated the value function of different states using a model-based method and TD(λ).
- Used SARSA, an on-policy TD control method, to train an agent to reach the goal block of a windy gridworld (Sutton and Barto, Example 6.5, Exercise 6.9, and Exercise 6.10).
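
As an illustration of the last item, here is a minimal sketch of tabular SARSA on the windy gridworld. It is not the assignment code. The grid size, wind strengths, start and goal cells, and the step size and exploration rate follow Example 6.5 in Sutton and Barto. All function and variable names (`step`, `epsilon_greedy`, `sarsa`) are hypothetical. Only the four standard moves are modeled; the King's moves and stochastic wind variants from Exercises 6.9 and 6.10 are left out.

```python
import numpy as np

# Layout from Sutton and Barto, Example 6.5 (assumed here, not taken from the assignment code).
ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]          # upward push per column
START, GOAL = (3, 0), (3, 7)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Apply the action plus the wind of the current column; clip to the grid."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row - WIND[col], 0), ROWS - 1)
    new_col = min(max(col + d_col, 0), COLS - 1)
    next_state = (new_row, new_col)
    # Reward is -1 on every step until the goal is reached.
    return next_state, -1.0, next_state == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q[state[0], state[1]]))


def sarsa(episodes=200, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular SARSA; returns the learned action-value table."""
    rng = np.random.default_rng(seed)
    q = np.zeros((ROWS, COLS, len(ACTIONS)))
    for _ in range(episodes):
        state = START
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            # On-policy target: uses the action actually selected in next_state.
            bootstrap = 0.0 if done else gamma * q[next_state[0], next_state[1], next_action]
            td_error = reward + bootstrap - q[state[0], state[1], action]
            q[state[0], state[1], action] += alpha * td_error
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    q_table = sarsa()
    # Greedy action index (0=up, 1=down, 2=left, 3=right) for each cell.
    print(np.argmax(q_table, axis=2))
```

The update bootstraps from the action the behaviour policy actually chose in the next state, which is what makes SARSA on-policy. Replacing that term with a maximum over next actions would turn the sketch into Q-learning.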