This is the final project for the final-semester robotics course, INT 3409 1.
We use A3C (Asynchronous Advantage Actor-Critic) to train an agent to navigate inside the simulated ai2thor environment.
This project includes an implementation of A3C in ./A3C/.
To train the model, run:
python --is_ai2thor
To visualize the result, run:
python --is_ai2thor --critic_path */A3C/model/critic-model* --actor_path */A3C/model/actor-model*
By default, training runs for 5000 episodes with 5 threads.
Clone this repository:
Install Python dependencies:
pip install -r requirements.txt
Installing TensorFlow with conda is highly recommended:
conda install tensorflow-gpu
- This project uses the gym-style interface of the ai2thor environment.
- The objective is simply to pick up an apple in the kitchen environment FloorPlan28.
- The observation space is a first-person 128x128 RGB image from the agent's camera.
- The maximum number of steps per episode is 500.
- Reward function:
  - -0.01 at each time step
  - 1 if the agent picks up an apple, after which the environment terminates
  - 0.01 if the agent sees an apple (removed in the latest code)
- A MobileNetV2 model pre-trained on ImageNet is used as the feature extractor for the dense layers of both the actor and the critic models.
- The actor optimizer uses the advantage plus an entropy term to encourage exploration.
- This project was trained on a Xeon E5-2667 v2 + GTX 1070, with 1,444,234 parameters for the actor model and 1,443,073 parameters for the critic model.
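The reward scheme and the entropy-regularized actor objective from the list above can be sketched in plain Python as follows. This is a minimal illustration, not the code in ./A3C/: the helper names `step_reward` and `actor_loss`, the `entropy_beta` value, and the choice that the pickup reward replaces (rather than adds to) the step penalty on the final step are all assumptions made for the example.

```python
import math

STEP_PENALTY = -0.01   # from the list above: applied on every time step
PICKUP_REWARD = 1.0    # from the list above: apple picked up, episode ends
MAX_STEPS = 500        # from the list above: step limit


def step_reward(picked_apple, step_count):
    """Return (reward, done) for one time step.

    Hypothetical helper. Assumption: on a successful pickup the reward
    is PICKUP_REWARD alone, without the step penalty added.
    step_count is the number of steps taken so far, including this one.
    """
    if picked_apple:
        return PICKUP_REWARD, True
    return STEP_PENALTY, step_count >= MAX_STEPS


def actor_loss(action_probs, action, advantage, entropy_beta=0.01):
    """Policy-gradient loss with an entropy bonus for a single step.

    loss = -log pi(a|s) * advantage - entropy_beta * H(pi)
    entropy_beta is an assumed coefficient, not taken from the project.
    """
    log_prob = math.log(action_probs[action])
    # Shannon entropy of the policy; higher entropy lowers the loss,
    # which rewards keeping the action distribution spread out.
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0.0)
    return -log_prob * advantage - entropy_beta * entropy
```

Minimizing this loss raises the probability of actions with positive advantage, while the entropy term penalizes a policy that collapses onto a single action too early.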
The objective is simple, so the model converges very quickly. Detailed logs and the trained model are in
This project owes great thanks to the following material: