
['QEff.finetuning'] Changing some params from training config to model config #747

Merged
quic-akuruvil merged 28 commits into quic:ft_experimental from tchawada:ft_config on Feb 5, 2026

Conversation

@tchawada (Contributor) commented Jan 21, 2026

This PR contains:
1. Documentation for the new finetune experimental stack.
2. Updates in config_manager.py.

@quic-swatia (Contributor):
Please correct the title of the PR. It seems to have been renamed to match one of the commits.

* **completion_template**: string pattern that tells the fine-tuning pipeline which part of the dataset should be treated as the target output (completion) for the model to learn.

**Note**: completion_func and completion_template cannot be used together. Please specify only one of these options at a time.
* **dataset_subset**: `default = "default"` → The subset of the dataset to use (useful for multi-configuration datasets).
Contributor:

Give more description for this: how to use it, and provide a sample value as an example.
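
For illustration, a minimal sketch of what `dataset_subset` corresponds to when loading a dataset (the `wikitext` dataset and its configuration name are assumptions for illustration, not part of this PR):

```python
# Hypothetical illustration: dataset_subset selects a named configuration of a
# multi-configuration Hugging Face dataset; single-config datasets use "default".
from datasets import load_dataset

# Config-less dataset -> dataset_subset: "default"
alpaca = load_dataset("yahma/alpaca-cleaned", split="train")

# Multi-configuration dataset -> an explicit subset, e.g. "wikitext-2-raw-v1"
wiki = load_dataset("wikitext", name="wikitext-2-raw-v1", split="train")
```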

dataset_name: "yahma/alpaca-cleaned"
train_split: "train"
test_split: "test"
max_seq_length: 512
Contributor:

Why does only the Alpaca dataset have max_seq_len?

Contributor (Author):

Since max_seq_len has a default value, it is not necessary to provide it in every config. I can still add it.

Contributor:

Okay, but what if the Alpaca dataset has some samples with seq_len > 512? On what basis is it set to 512 here?

Contributor (Author):

This is just an example; the user can modify it.

Contributor:

I think any default we set (for each dataset) should not be a random value; it should be close to the best value, dependent on the length of samples in the dataset or on the maximum the hardware can support, etc. This might need an analysis of the samples in the dataset. We might lose some samples if their length exceeds 512.

Contributor:

Anyway, we can work on setting defaults after further analysis later. For now this is okay to complete the end-to-end testing, but please note this down for future resolution.
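
A possible sketch of such an analysis (the tokenizer choice and field names are assumptions for illustration; the numbers it prints are not a recommendation):

```python
# Sketch: inspect the token-length distribution of a dataset to pick a
# max_seq_length that loses few samples. Tokenizer/model name is an assumption.
import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer

ds = load_dataset("yahma/alpaca-cleaned", split="train")
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

def token_length(row):
    text = row["instruction"] + "\n" + row.get("input", "") + "\n" + row["output"]
    return len(tok(text).input_ids)

lengths = np.array([token_length(r) for r in ds])
print("p50/p95/p99:", np.percentile(lengths, [50, 95, 99]), "max:", lengths.max())
print("samples longer than 512 tokens:", int((lengths > 512).sum()))
```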

@quic-akuruvil (Contributor) left a comment:

Add a sample working config in the configs/ folder.

@quic-meetkuma (Contributor) left a comment:

Looks good; further polishing is needed. Let us close this at the earliest.

PS: add a description to the PR.

errors,
n_epochs <= 0 and max_steps <= 0,
n_epochs <= 0,
"Either training.num_train_epochs > 0 or training.max_steps > 0 must be set.",
Contributor:

Why is max_steps removed? If it is not needed, update the comment as well.

Contributor (Author):

Its default value is -1 (i.e., run all steps); that's why I removed it.
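
A minimal sketch of the check behind the quoted error message (names taken from the quoted diff; the meaning of -1 is assumed from the reply above, and the combined form is shown only for reference):

```python
# Sketch: with max_steps defaulting to -1 ("no step cap, run full epochs"),
# the PR keeps only the num_train_epochs clause; the combined check looked
# roughly like this.
def validate_schedule(num_train_epochs: int, max_steps: int = -1) -> None:
    if num_train_epochs <= 0 and max_steps <= 0:
        raise ValueError(
            "Either training.num_train_epochs > 0 or training.max_steps > 0 must be set."
        )
```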

def test_config(config_path):
master_config = parse_arguments(args=[])
config_manager = ConfigManager(master_config)
master_config = parse_arguments()
Contributor:

As per the proposed flow, just pass config_path and pass None in place of master_config.
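
A sketch of what the test might look like under that flow (the attribute used to read the parsed config back is an assumption):

```python
# Sketch of the proposed flow: pass only the YAML path and let ConfigManager
# handle parsing/validation; no separate parse_arguments() call.
def test_config(config_path):
    config_manager = ConfigManager(config=None, config_path=config_path)
    master_config = config_manager.config  # attribute name is an assumption
    assert master_config is not None
```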

early_stopping:
early_stopping_patience: 3
early_stopping_threshold: 0.001
tensorboard:
Contributor:

Why is it removed?

Contributor (Author):

I think it's a mistake; I will add it back.
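
For reference, a sketch of what these fields would map to if the stack forwards them to a Hugging Face `Trainer`-style callback (the mapping is an assumption about the implementation, not taken from this PR):

```python
# Sketch: the early_stopping fields above as a transformers EarlyStoppingCallback.
from transformers import EarlyStoppingCallback

early_stopping_cb = EarlyStoppingCallback(
    early_stopping_patience=3,       # stop after 3 evaluations without improvement
    early_stopping_threshold=0.001,  # minimum improvement to count as progress
)
```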



def parse_arguments(config_path: Optional[str] = None, args: Optional[List[str]] = None) -> MasterConfig:
def parse_arguments() -> MasterConfig:
Contributor:

There is no need for this function, as it is not doing anything; argument parsing happens inside ConfigManager.

"""Manages configuration loading, validation, and updates."""

def __init__(self, config: MasterConfig):
def __init__(self, config: MasterConfig, config_path: Optional[str] = None):
Contributor:

The init should take config only if the user wants to override its values and pass it to ConfigManager. Similarly, for use cases where the user wants to use a config stored at config_path, the config_path argument is used.

For our use case, where we invoke finetuning from the CLI, ConfigManager should not be given anything because it parses CLI arguments within its init.

Accordingly, changes should be made in #731.

CC: @quic-swatia
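
A minimal sketch of the behaviour described above (the helper methods and overall structure are assumptions, not the actual implementation):

```python
# Sketch: ConfigManager chooses its source of truth based on what it is given.
from typing import Optional

class ConfigManager:
    """Sketch of the flow described above; helper bodies are placeholders."""

    def __init__(self, config: Optional["MasterConfig"] = None,
                 config_path: Optional[str] = None):
        if config is not None:
            self.config = config                         # programmatic override
        elif config_path is not None:
            self.config = self._load_yaml(config_path)   # config stored on disk
        else:
            self.config = self._parse_cli_args()         # CLI invocation

    def _load_yaml(self, path: str) -> "MasterConfig":
        raise NotImplementedError  # placeholder; real loader lives in the stack

    def _parse_cli_args(self) -> "MasterConfig":
        raise NotImplementedError  # placeholder; real argparse lives in the stack
```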


**Single device using CLI flags**
```bash
python finetune_experimental.py --device qaic --lora_r 16 --target_modules q_proj, v_proj --gradient_checkpointing True
```
Contributor:

The --enable-pp and --enable-ddp arguments should also be included and tested for functionality.


**Single device using YAML file**
```bash
python finetune_experimental.py --config configs/sample_config.yaml
```
Contributor:

`python -m QEfficient.cloud.finetune_experimental --config configs/sample_config.yaml`. I tried the above command and it fails due to a path issue when executed from the QEfficient base directory. Use this command with an absolute path instead.

Contributor:

Please test and verify all the commands specified here for any potential breakages.

ochougul and others added 17 commits February 5, 2026 09:25
carry over patch   quic#693

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Added step wise instructions for MULTI NODE Finetuning.

---------

Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Add support for multi-node Distributed Data Parallel (DDP) training to
the QEfficient finetuning pipeline. This enables scaling training across
multiple nodes while keeping the existing single-node behavior
unchanged.

Commands for DDP across 2 servers:
For the Master Addr or the Primary Machine, use node-rank as 0:
QAIC_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=2 --nproc-per-node=4
--seed 0 --node-rank=0 --master_addr=<MASTER_NODE_IP> --master_port=8000
-m QEfficient.cloud.finetune --device qaic --enable_ddp --model_name
"meta-llama/Llama-3.2-1B" --dataset alpaca_dataset --train_batch_size 1
--val_batch_size 1 --num_epochs 1 --max_train_step 200 --max_eval_step
50

For Node 1, use node-rank as 1:
QAIC_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=2 --nproc-per-node=4
--seed 0 --node-rank=1 --master_addr=<MASTER_NODE_IP> --master_port=8000
-m QEfficient.cloud.finetune --device qaic --enable_ddp --model_name
"meta-llama/Llama-3.2-1B" --dataset alpaca_dataset --train_batch_size 1
--val_batch_size 1 --num_epochs 1 --max_train_step 200 --max_eval_step
50

---------

Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Handled the edge case where num samples in a dataset are less than 20.
Corrected the dataset link in grammar_dataset.py

Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Added default NPI file for Gemma3.

1. Eliminates the need to provide NPI file as an extra argument by user.
NPI file added as default, no need to provide it explicitly in the
example script

---------

Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Removed OpenGVLab/InternVL2_5-1B and OpenGVLab/InternVL3_5-1B test due
to a compiler issue to unblock the CI

---------

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Updated Qeff version to mainline

---------

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
The SW issue came with prompt + generation length > SW.

Fix
1. Cache updated with HybridSlidingWindowCache in cache utils

---------

Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
tchawada and others added 9 commits February 5, 2026 09:54
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Added Readme file for the parameters used in sample config.

---------

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Signed-off-by: Amit Raj <amitraj@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: asmigosw <asmigosw@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: meetkuma <meetkuma@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Onkar Chougule <168134249+ochougul@users.noreply.github.com>
Co-authored-by: Mohit Soni <quic_mohisoni@quicinc.com>
Co-authored-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: vjanfaza <vjanfaza@qti.qualcomm.com>
Co-authored-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Co-authored-by: smedhe <smedhe@qti.qualcomm.com>
Co-authored-by: asmigosw <asmigosw@qti.qualcomm.com>
Co-authored-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Co-authored-by: Amit Raj <amitraj@qti.qualcomm.com>
Co-authored-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Co-authored-by: Meet Patel <meetkuma@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <quic_sallabad@quicinc.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
quic-akuruvil merged commit b78efe6 into quic:ft_experimental on Feb 5, 2026
2 of 3 checks passed
quic-akuruvil added a commit to quic-akuruvil/efficient_transformers that referenced this pull request Feb 9, 2026
…l config (quic#747)

This PR contains:
1. Documentation for the new finetune experimental stack.
2. Updates in config_manager.py.

---------

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Signed-off-by: Amit Raj <amitraj@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: asmigosw <asmigosw@qti.qualcomm.com>
Signed-off-by: meetkuma <meetkuma@qti.qualcomm.com>
Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Onkar Chougule <168134249+ochougul@users.noreply.github.com>
Co-authored-by: vjanfaza <vjanfaza@qti.qualcomm.com>
Co-authored-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Co-authored-by: smedhe <smedhe@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Co-authored-by: Mohit Soni <quic_mohisoni@quicinc.com>
Co-authored-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: asmigosw <asmigosw@qti.qualcomm.com>
Co-authored-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Co-authored-by: Amit Raj <amitraj@qti.qualcomm.com>
Co-authored-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Co-authored-by: Meet Patel <meetkuma@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <quic_sallabad@quicinc.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
quic-akuruvil added a commit to quic-akuruvil/efficient_transformers that referenced this pull request Feb 16, 2026
…l config (quic#747)

```yaml
ddp_broadcast_buffers: null
ddp_timeout: 1800
```
- **FSDP**: Fully Sharded Data Parallelism (FSDP) is supported for model sharding.
Contributor:

Please remove this from here. We have not done any experiments or added any support for FSDP in the pipeline yet.
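
For context, a sketch of what the ddp_* fields quoted above would map to if they are forwarded to Hugging Face TrainingArguments parameters of the same names (this mapping is an assumption, not confirmed by the PR):

```python
# Sketch: ddp_timeout / ddp_broadcast_buffers as TrainingArguments parameters.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",            # placeholder output directory
    ddp_timeout=1800,            # process-group timeout in seconds
    ddp_broadcast_buffers=None,  # null in YAML -> keep the DDP default
)
```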
