feat: add config_cli.py and refactor configs #1024

terrykong · 2025-08-29T07:47:08Z

Recent merge friction has often come down to config conflicts, so this PR introduces a tool to merge configs, which lets you commit only the minimal config a recipe needs. This adds a new risk: changes to the base config now propagate to the recipes, especially when a recipe did not define a key and the new default changes behavior. The positive side effect is that bisecting becomes more helpful, whereas before we were shielded from regressions until the recipe config was updated explicitly.

We now need to keep in mind that if a flag like "enable_eager" is false in the base config and no other config overrides it, turning it on in the base config turns it on for all recipes.
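
To make the propagation risk concrete, here is a minimal Python sketch. It assumes a recipe is layered over its base with a recursive dictionary merge; the `deep_merge` helper, the `policy` section, and the recipe names are hypothetical and are not taken from the tool in this PR, whose actual merge rules may differ.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` layered on top.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value from `base`.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base config and minimal recipes (illustrative key names).
base = {"policy": {"enable_eager": False, "lr": 1e-5}}
recipes = {
    "inherits": {"policy": {"lr": 2e-5}},         # never mentions enable_eager
    "pins": {"policy": {"enable_eager": False}},  # pins the flag explicitly
}


def effective_flags(base_config: dict) -> dict:
    """Resolve every recipe against `base_config` and read back the flag."""
    return {
        name: deep_merge(base_config, recipe)["policy"]["enable_eager"]
        for name, recipe in recipes.items()
    }


before = effective_flags(base)
# Flip the default in the base config only; no recipe file is touched.
flipped_base = deep_merge(base, {"policy": {"enable_eager": True}})
after = effective_flags(flipped_base)

print("before:", before)
print("after: ", after)
```

In this sketch only the recipe that pins the flag is insulated from the base change, which is the trade-off described above: minimal recipes pick up new defaults automatically, for better or worse.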

Example of how to use the tool:

  # Expand a config with a root level "defaults" key to see the full config; print to stdout
  uv run tools/config_tools.py expand examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml
  # Expand a config with a root level "defaults" key to see the full config; edit the config in place
  uv run tools/config_tools.py expand examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml --in-place
  # Minimize a config and remove all keys that are present in the base config; print to stdout
  # uv run tools/config_tools.py minimize <config> <base_config>
  uv run tools/config_tools.py minimize examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml examples/configs/dpo.yaml
  # Minimize a config and remove all keys that are present in the base config; edit the config in place
  # uv run tools/config_tools.py minimize <config> <base_config> --in-place
  uv run tools/config_tools.py minimize examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml examples/configs/dpo.yaml --in-place
  # Minimize all the llm configs:
  for algo in grpo dpo sft; do
    base_config=examples/configs/${algo}.yaml
    if [[ ${algo} == grpo ]]; then
      base_config=examples/configs/grpo_math_1B.yaml
    fi
    for recipe in examples/configs/recipes/llm/${algo}-*.yaml; do
      uv run tools/config_tools.py minimize "$recipe" "$base_config" --in-place
    done
  done
  # Compare two configs
  uv run tools/config_tools.py compare examples/configs/grpo_math_1B.yaml examples/configs/grpo_math_8B.yaml
  # Minimize a config and compare it to not minimizing (you may see config differences since you are effectively re-parenting the config)
  uv run tools/config_tools.py minimize examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml examples/configs/dpo.yaml >examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml.minimized
  uv run tools/config_tools.py compare \
    examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml \
    examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml.minimized
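
The relationship between the three subcommands, and the re-parenting caveat in the last example, can be sketched in Python. This is an illustration under assumed semantics (expand is a recursive merge over the base, minimize drops keys whose values equal the base), not the implementation in this PR; the function names, the `MISSING` marker, and the sample keys are all hypothetical.

```python
from copy import deepcopy

MISSING = "<missing>"  # hypothetical marker for a key absent on one side


def expand(config: dict, base: dict) -> dict:
    """Layer `config` over `base`, merging nested dicts key by key."""
    merged = deepcopy(base)
    for key, value in config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = expand(value, merged[key])
        else:
            merged[key] = deepcopy(value)
    return merged


def minimize(config: dict, base: dict) -> dict:
    """Keep only the keys of `config` that are new or differ from `base`."""
    minimal = {}
    for key, value in config.items():
        if key not in base:
            minimal[key] = deepcopy(value)
        elif isinstance(value, dict) and isinstance(base[key], dict):
            nested = minimize(value, base[key])
            if nested:  # drop sections that became empty
                minimal[key] = nested
        elif value != base[key]:
            minimal[key] = deepcopy(value)
    return minimal


def compare(left: dict, right: dict, prefix: str = "") -> dict:
    """Map each differing dotted key path to its (left, right) values."""
    diffs = {}
    for key in sorted(set(left) | set(right)):
        lhs, rhs = left.get(key, MISSING), right.get(key, MISSING)
        if isinstance(lhs, dict) and isinstance(rhs, dict):
            diffs.update(compare(lhs, rhs, f"{prefix}{key}."))
        elif lhs != rhs:
            diffs[f"{prefix}{key}"] = (lhs, rhs)
    return diffs


# Hypothetical configs: the standalone recipe never defined `seed`.
base = {"policy": {"precision": "bfloat16", "max_len": 1024}, "seed": 42}
recipe = {"policy": {"precision": "bfloat16", "max_len": 4096}}

minimal = minimize(recipe, base)
reparented = expand(minimal, base)
print(minimal)
print(compare(recipe, reparented))
```

Here the minimized recipe keeps only `policy.max_len`, and comparing the original against the re-expanded config reports `seed` as a difference: the key was absent from the standalone recipe but is inherited once the recipe is parented to the base.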

Related to #927

Notable issues that arose from merging configs:

1. all dtensor recipes defaulted to v2 and that uncovered some perf regressions

`grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long`

This recipe is much worse on v2 than on v1 (cand=v1, base=v2), so I kept it on v1.

Issue tracking gemma perf regression #1097

`grpo-deepscaler-1.5b-8K.yaml`/ `grpo-gspo-deepscaler-1.5b-8K.yaml`

These recipes have slightly worse perf on v2 than on v1 (cand=v1, base=v2), so I kept them on v1.

2. SFT recipes defaulted to the default chat template recipe

coderabbitai · 2025-09-10T06:28:09Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Signed-off-by: Terry Kong <[email protected]>

compare command
Signed-off-by: Terry Kong <[email protected]>

config changes
Signed-off-by: Terry Kong <[email protected]>

Revert "config changes"
This reverts commit 25b87e2.
Signed-off-by: Terry Kong <[email protected]>

cleanup
Signed-off-by: Terry Kong <[email protected]>

vlm example
Signed-off-by: Terry Kong <[email protected]>

minimize configs
Signed-off-by: Terry Kong <[email protected]>

Revert "minimize configs"
This reverts commit 1375480.
Signed-off-by: Terry Kong <[email protected]>

minimize configs
Signed-off-by: Terry Kong <[email protected]>

Revert "minimize configs"
This reverts commit a4cd8a4.
Signed-off-by: Terry Kong <[email protected]>

terrykong force-pushed the tk/config-fold branch 4 times, most recently from 4534008 to 1c0290c on September 6, 2025 06:41

github-actions bot added the CI Relating to CI label Sep 6, 2025

terrykong force-pushed the tk/config-fold branch from 2254f82 to 260f022 on September 10, 2025 06:29

terrykong force-pushed the tk/config-fold branch from 260f022 to 1cd4fd9 on September 10, 2025 06:30

github-actions bot removed the CI Relating to CI label Sep 10, 2025

terrykong added 3 commits September 9, 2025 23:35

minimize configs

e54f144

Signed-off-by: Terry Kong <[email protected]>

force sft configs to use default chat template to match last releases behavior

be01df7

Signed-off-by: Terry Kong <[email protected]>

reverting select configs to v1 to address

d81f806

Signed-off-by: Terry Kong <[email protected]>

terrykong changed the title ~~feat: add config_tools.py and refactor configs~~ feat: add config_cli.py and refactor configs Sep 16, 2025
