Row stacking experiment - EfficientNet-B0 - trial reward
Pre-release
cpaxton released this 15 Sep 19:05 · 20 commits to grasp_pytorch0.4+ since this release
Note that this is a brief run (7k iterations) compared to most others (20-30k iterations), so it is too early to tell whether it will become the best stacking neural network model. However, the progress so far is promising.
Training iteration: 7019
WARNING variable mismatch num_trials + 1: 1011 nonlocal_variables[stack].trial: 1520
Change detected: True (value: 2302)
Primitive confidence scores: 4.232723 (push), 2.963957 (grasp), 3.558189 (place)
Strategy: exploit (exploration probability: 0.122838)
Action: push at (15, 157, 66)
Executing: push at (-0.592000, 0.090000, 0.000997)
Trainer.get_label_value(): Current reward: 1.500000 Current reward multiplier: 2.000000 Predicted Future reward: 4.407584 Expected reward: 1.500000 + 0.500000 x 4.407584 = 3.703792
Training loss: 0.014585
check_row: True | row_size: 2 | blocks: ['blue' 'red']
check_stack() stack_height: 2 stack matches current goal: True partial_stack_success: True Does the code think a reset is needed: False
check_row: True | row_size: 2 | blocks: ['blue' 'red']
check_stack() stack_height: 2 stack matches current goal: True partial_stack_success: True Does the code think a reset is needed: False
Push motion successful (no crash, need not move blocks): True
STACK: trial: 1520 actions/partial: 4.505776636713736 actions/full stack: 74.68085106382979 (lower is better) Grasp Count: 1734, grasp success rate: 0.486159169550173 place_on_stack_rate: 1.9402241594022416 place_attempts: 803 partial_stack_successes: 1558 stack_successes: 94 trial_success_rate: 0.06184210526315789 stack goal: [3 2 0] current_height: 2
Experience replay 13416: history timestep index 114, action: push, surprise value: 6.176722
Training loss: 0.044944
Time elapsed: 6.129107
Trainer iteration: 7020.000000
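The `Trainer.get_label_value()` line in the log above spells out how the expected reward is formed: the current reward plus 0.5 times the predicted future reward. The following is a minimal sketch of that arithmetic only, not the repository's implementation. The function name `expected_reward` and the parameter name `future_reward_discount` are hypothetical, and how the logged "Current reward multiplier" enters the current reward is not shown in the log, so it is left out here.

```python
def expected_reward(current_reward: float,
                    predicted_future_reward: float,
                    future_reward_discount: float = 0.5) -> float:
    """Combine the immediate reward with a discounted future estimate.

    The 0.5 default is the factor that appears in the log line;
    the parameter name itself is an assumption made for illustration.
    """
    return current_reward + future_reward_discount * predicted_future_reward


# Values copied from the log line at training iteration 7019.
label = expected_reward(current_reward=1.5, predicted_future_reward=4.407584)
print(f"Expected reward: {label:.6f}")
```

With the logged inputs this reproduces the value 3.703792 reported in the log.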
Ran with the --trial_reward and --check_row flags:
export CUDA_VISIBLE_DEVICES="1" && python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4 --push_rewards --experience_replay --explore_rate_decay --trial_reward --place --check_row
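The ratios on the `STACK:` line of the log above appear to be simple quotients of the raw counts on the same line. The sketch below re-derives them as a sanity check. It assumes that the action count equals the trainer iteration (7020) and that each rate is the plain ratio named in the comments; the function name `stack_metrics` and the dictionary keys are chosen for illustration and are not taken from the repository.

```python
def stack_metrics(actions: int, trials: int, place_attempts: int,
                  partial_stack_successes: int, stack_successes: int) -> dict:
    """Recompute the summary ratios from the raw counts in the STACK line."""
    return {
        # Actions needed per partial stack success (lower is better).
        "actions_per_partial": actions / partial_stack_successes,
        # Actions needed per completed stack (lower is better).
        "actions_per_full_stack": actions / stack_successes,
        # Partial successes per place attempt; can exceed 1 when successes
        # are counted more often than places are attempted.
        "place_on_stack_rate": partial_stack_successes / place_attempts,
        # Fraction of trials that ended in a full stack.
        "trial_success_rate": stack_successes / trials,
    }


# Counts copied from the log; actions=7020 is an assumption (trainer iteration).
metrics = stack_metrics(actions=7020, trials=1520, place_attempts=803,
                        partial_stack_successes=1558, stack_successes=94)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Under these assumptions the recomputed values agree with the logged ones, for example a trial success rate of about 6.2% (94 full stacks in 1520 trials).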