feat: add config_cli.py and refactor configs #1024
Draft
Recent merge friction has sometimes come down to config conflicts, so this PR introduces a tool that merges configs, letting you commit only the minimal config a recipe needs. This brings a new risk: changes to the base config now propagate to the recipes, especially when a recipe never set a value and the new default changes behavior. The positive side effect is that bisecting becomes more helpful, whereas before we were shielded from regressions until the recipe config was updated explicitly.
We now need to keep in mind that if a flag like "enable_eager" is false in the base config and no other config overrides it, turning it on in the base turns it on for all recipes.
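To make the trade-off concrete, here is a minimal sketch of the idea in plain Python. It is not the implementation in config_cli.py: the helper names `deep_merge` and `minimize`, and the keys `policy`, `precision`, and `seed`, are hypothetical and exist only for illustration. Only `enable_eager` comes from the description above.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied recursively (override wins)."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def minimize(base: dict, full: dict) -> dict:
    """Return only the entries of full that differ from base."""
    minimal = {}
    for key, value in full.items():
        if key not in base:
            minimal[key] = deepcopy(value)
        elif isinstance(value, dict) and isinstance(base[key], dict):
            nested = minimize(base[key], value)
            if nested:  # keep the sub-dict only if something differs
                minimal[key] = nested
        elif value != base[key]:
            minimal[key] = deepcopy(value)
    return minimal


# Hypothetical base config and fully expanded recipe config.
base = {"policy": {"enable_eager": False, "precision": "bf16"}, "seed": 42}
recipe_full = {"policy": {"enable_eager": False, "precision": "fp32"}, "seed": 42}

# The recipe only needs to commit what differs from the base.
recipe_minimal = minimize(base, recipe_full)

# Merging the minimal recipe back onto the base reproduces the full recipe.
assert deep_merge(base, recipe_minimal) == recipe_full

# The new risk: flipping a default in the base now reaches the recipe,
# because the minimal recipe never pinned enable_eager.
new_base = deep_merge(base, {"policy": {"enable_eager": True}})
recipe_after_base_change = deep_merge(new_base, recipe_minimal)
```

In this sketch the minimal recipe carries only the `precision` override, so the recipe inherits `enable_eager` from whatever the base says. A recipe that must stay on the old behavior would have to pin the flag explicitly.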
Example of how to use the tool:
Related to #927
Notable issues that arose from merging configs:
1. All dtensor recipes now default to v2, which uncovered some perf regressions
grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long
This recipe performs much worse on v2 than on v1 (cand=v1, base=v2), so I kept it on v1.

Issue tracking gemma perf regression #1097
grpo-deepscaler-1.5b-8K.yaml
/grpo-gspo-deepscaler-1.5b-8K.yaml
These recipes have slightly worse perf on v2 than on v1 (cand=v1, base=v2), so I kept them on v1.


2. SFT recipes defaulted to the default chat template recipe