- The agent is trained using Proximal Policy Optimization (PPO).
- The following videos show the result after 500,000 training steps:
Trial1.mp4
t2.mp4
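The exact PPO implementation isn't shown here, but as a reference for the objective being optimized, below is a minimal NumPy sketch of PPO's clipped surrogate loss (the clipping coefficient of 0.2 is a common default, not a value stated in this document):

```python
import numpy as np

def ppo_clip_loss(old_log_probs, new_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss from PPO (returned as a value to minimize)."""
    # Probability ratio between the new and old policies
    ratio = np.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate terms
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two; negate to get a loss
    return -np.mean(np.minimum(unclipped, clipped))
```

For identical policies the ratio is 1 and the loss reduces to the negative mean advantage; when the new policy moves too far, the clip term bounds the update.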
- Implement Soft Actor-Critic (SAC).
- Empirically optimize neural network parameters: depth, input layer size.
- Hyperparameter tuning.
The reward function is:
timeSurvived + ( 3 * rocksDestroyed ) + ( 5 * enemyShipsDestroyed )
Rationale: the agent is incentivized to shoot down enemy bullets and rocks and to maximize its survival time.
See this script for more details.
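As a sketch, the reward above can be computed per step as follows (the function and argument names are assumed, not taken from the project's script):

```python
def compute_reward(time_survived, rocks_destroyed, enemy_ships_destroyed):
    """Reward = timeSurvived + 3 * rocksDestroyed + 5 * enemyShipsDestroyed."""
    return time_survived + 3 * rocks_destroyed + 5 * enemy_ships_destroyed
```

The higher weight on enemy ships (5) relative to rocks (3) biases the agent toward the more dangerous targets, while the survival term rewards simply staying alive.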
Input observations are the positions of objects currently in the scene, fed sequentially with a label after each position to differentiate between object types.
- Input layer neurons: 30
- Depth: 2
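One way the encoding above could be laid out, consistent with a 30-neuron input layer (10 objects times 3 values each), is sketched below; the specific label IDs and the padding scheme are assumptions, not taken from the source:

```python
import numpy as np

# Assumed label encoding; the actual labels used by the project are not specified
OBJECT_LABELS = {"rock": 0.0, "enemy_ship": 1.0, "bullet": 2.0}

def build_observation(objects, max_objects=10):
    """Flatten (label, x, y) tuples into a fixed-size vector: x, y, label_id per object."""
    obs = []
    for label, x, y in objects[:max_objects]:
        obs.extend([x, y, OBJECT_LABELS[label]])
    # Zero-pad so the network always sees 3 * max_objects = 30 inputs
    obs.extend([0.0] * (3 * max_objects - len(obs)))
    return np.asarray(obs, dtype=np.float32)
```

Fixed-size padding keeps the input layer compatible with a standard feed-forward network even when fewer than 10 objects are on screen.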
- Limited Computational Power