
Test + Training to 16k -- Arrange in Rows -- blocks -- trial_reward -- EfficientNet -- B0 --V0.8.0

Pre-release
@benjamindkilleen benjamindkilleen released this 15 Sep 19:01
· 20 commits to grasp_pytorch0.4+ since this release
8c3e0ba

This release contains training progress for arranging blocks in rows using the trial reward, along with testing results. The run contains several bugs that affected the logged success rates (but not the recorded values) by improperly delaying the trial reset. Recovering the true rates requires detecting the first stack (row) height of 4 and discarding subsequent 4s until the stack height returns to 1, which marks a new trial.
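The workaround above can be sketched as a small post-processing pass over the logged stack heights. This is a hypothetical helper for illustration (not a function from the repository), assuming a goal height of 4 and that a return to height 1 marks a reset:

```python
def count_trial_successes(stack_heights, goal_height=4):
    """Count trial successes from a logged sequence of stack (row) heights.

    Hypothetical post-processing sketch: count only the first time the goal
    height is reached, ignore the duplicate goal-height entries caused by the
    delayed reset, and re-arm once the height drops back to 1 (a new trial).
    """
    successes = 0
    reached_goal = False
    for h in stack_heights:
        if h == goal_height and not reached_goal:
            # First 4 seen in this trial: count it once.
            successes += 1
            reached_goal = True
        elif h == 1 and reached_goal:
            # Height back to 1 means the workspace was reset: new trial.
            reached_goal = False
    return successes
```

Duplicate 4s logged between the first success and the reset are skipped, so the delayed-reset bug no longer inflates the success count.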

Grasping an offset block when trying to create a row:

https://youtu.be/-QaxLmAE-wg

Status printout from training:

```
Training iteration: 16573
WARNING variable mismatch num_trials + 1: 2772 nonlocal_variables[stack].trial: 3141
Change detected: True (value: 609)
Primitive confidence scores: 3.177647 (push), 4.219032 (grasp), 3.461217 (place)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (6, 66, 165)
Executing: grasp at (-0.394000, -0.092000, 0.051009)
Trainer.get_label_value(): Current reward: 0.000000 Current reward multiplier: 2.000000 Predicted Future reward: 4.146331 Expected reward: 0.000000 + 0.650000 x 4.146331 = 2.695115
gripper position: 0.030596047639846802
gripper position: 0.025395959615707397
gripper position: 0.004252210259437561
Training loss: 0.590193
Experience replay 31550: history timestep index 12357, action: grasp, surprise value: 0.554012
Training loss: 0.243598
gripper position: 0.0034487545490264893
Grasp successful: True
check_row: True | row_size: 2 | blocks: ['yellow' 'red']
check_stack() stack_height: 2 stack matches current goal: True partial_stack_success: True Does the code think a reset is needed: False
STACK:  trial: 3141 actions/partial: 7.379341050756901  actions/full stack: 78.54976303317535 (lower is better)  Grasp Count: 9240, grasp success rate: 0.6172077922077922 place_on_stack_rate: 0.40035650623885916 place_attempts: 5610  partial_stack_successes: 2246  stack_successes: 211 trial_success_rate: 0.06717605858007004 stack goal: [3 0 1] current_height: 2
Experience replay 31551: history timestep index 7990, action: grasp, surprise value: 0.665486
Training loss: 1.295654
Time elapsed: 9.935521
Trainer iteration: 16574.000000
```
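The Trainer.get_label_value() line above follows a standard one-step temporal-difference target: expected reward = current reward + discount x predicted future reward, with the discount set to 0.65 by --future_reward_discount. A minimal sketch of the arithmetic (the function name here is illustrative, not the repository's API):

```python
def expected_reward(current_reward, predicted_future_reward, discount=0.65):
    """One-step TD target as printed in the log:
    Expected reward = current + discount * predicted future reward."""
    return current_reward + discount * predicted_future_reward
```

Plugging in the values from the printout above, 0.0 + 0.65 x 4.146331 gives the logged 2.695115.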

Status printout from testing:

```
Testing iteration: 1647
Change detected: True (value: 150)
Primitive confidence scores: 3.304538 (push), 3.193036 (grasp), 3.259273 (place)
Strategy: exploit (exploration probability: 0.000000)
Action: push at (4, 149, 181)
Executing: push at (-0.362000, 0.074000, 0.001005)
Trainer.get_label_value(): Current reward: 0.000000 Current reward multiplier: 3.000000 Predicted Future reward: 3.470774 Expected reward: 0.000000 + 0.650000 x 3.470774 = 2.256003
Training loss: 0.307084
gripper position: 0.033294931054115295
gripper position: 0.026271313428878784
gripper position: 0.00115203857421875
gripper position: -0.02354462444782257
gripper position: -0.04312487691640854
check_row: True | row_size: 2 | blocks: ['blue' 'red']
check_stack() stack_height: 2 stack matches current goal: False partial_stack_success: False Does the code think a reset is needed: True
main.py check_stack() DETECTED A MISMATCH between the goal height: 3 and current workspace stack height: 2
check_row: True | row_size: 2 | blocks: ['yellow' 'red']
check_stack() stack_height: 2 stack matches current goal: False partial_stack_success: False Does the code think a reset is needed: True
main.py check_stack() DETECTED A MISMATCH between the goal height: 3 and current workspace stack height: 2
STACK:  trial: 100 actions/partial: 6.699186991869919  actions/full stack: 32.96 (lower is better)  Grasp Count: 909, grasp success rate: 0.8360836083608361 place_on_stack_rate: 0.34405594405594403 place_attempts: 715  partial_stack_successes: 246  stack_successes: 50 trial_success_rate: 0.5 stack goal: [1 3 0 2] current_height: 2
Time elapsed: 5.650612
Trainer iteration: 1648.000000
```

```
Testing iteration: 1648
Change detected: True (value: 2058)
Primitive confidence scores: 2.721774 (push), 3.686786 (grasp), 3.710173 (place)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (7, 206, 156)
Executing: grasp at (-0.412000, 0.188000, 0.050982)
Trainer.get_label_value(): Current reward: 1.500000 Current reward multiplier: 2.000000 Predicted Future reward: 3.740939 Expected reward: 1.500000 + 0.650000 x 3.740939 = 3.931610
Training loss: 1.153573
gripper position: 0.029423266649246216
gripper position: 0.024983912706375122
gripper position: 0.0034950077533721924
gripper position: 0.004140764474868774
gripper position: 0.003972411155700684
Grasp successful: True
check_row: True | row_size: 2 | blocks: ['yellow' 'red']
check_stack() stack_height: 2 stack matches current goal: False partial_stack_success: False Does the code think a reset is needed: True
main.py check_stack() DETECTED A MISMATCH between the goal height: 3 and current workspace stack height: 2
STACK:  trial: 100 actions/partial: 6.703252032520325  actions/full stack: 32.98 (lower is better)  Grasp Count: 910, grasp success rate: 0.8362637362637363 place_on_stack_rate: 0.34405594405594403 place_attempts: 715  partial_stack_successes: 246  stack_successes: 50 trial_success_rate: 0.5 stack goal: [1 3 0 2] current_height: 2
Time elapsed: 8.114762
Trainer iteration: 1649.000000
```
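The ratios in the STACK summary lines appear to be simple quotients of the raw counters: at testing iteration 1649 the logged values are consistent with actions/partial = 1649 / 246 and actions/full stack = 1649 / 50. A sketch of that derivation (the helper name and signature are assumptions for illustration, not code from the repository):

```python
def stack_summary(actions, partial_successes, full_successes,
                  grasp_count, grasp_successes, place_attempts, trials):
    """Derive the STACK printout ratios from the raw counters.

    Assumed relationships, inferred from the logged values:
    lower is better for the two actions/* efficiency metrics.
    """
    return {
        "actions/partial": actions / partial_successes,
        "actions/full stack": actions / full_successes,
        "grasp success rate": grasp_successes / grasp_count,
        "place_on_stack_rate": partial_successes / place_attempts,
        "trial_success_rate": full_successes / trials,
    }
```

With the counters from the printout above (1649 actions, 246 partial successes, 50 full stacks, 910 grasps of which 761 succeeded, 715 place attempts, 100 trials), this reproduces the logged 6.7033, 32.98, 0.8363, 0.3441, and 0.5.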

```
Testing iteration: 1649
There have not been changes to the objects for a long time [push, grasp]: [0, 0], or there are not enough objects in view (value: 699)! Repositioning objects.

Testing iteration: 1649
Change detected: True (value: 3746)
Trainer.get_label_value(): Current reward: 1.562500 Current reward multiplier: 2.000000 Predicted Future reward: 4.563211 Expected reward: 1.562500 + 0.650000 x 4.563211 = 4.528587
Trial logging complete: 100 --------------------------------------------------------------
Training loss: 0.256596
```

Command to run training:

```
export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4  --push_rewards --experience_replay --explore_rate_decay --place --future_reward_discount 0.65 --trial_reward --check_row
```

Command to run testing:

```
export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4  --push_rewards --experience_replay --explore_rate_decay --trial_reward --future_reward_discount 0.65 --place --check_row --is_testing  --tcp_port 19996 --load_snapshot --snapshot_file '/home/costar/Downloads/snapshot-backup.reinforcement-best-stack-rate.pth' --random_seed 1238 --disable_situation_removal --save_visualizations
```