
feat(QEff: Attn): add KV & Q blocking strategies for causal LMs#774

Draft
vbaddi wants to merge 9 commits into quic:main from vbaddi:enable_blocking_causal_lm_models

Contributor

@vbaddi vbaddi commented Feb 3, 2026

Summary

  • This PR introduces QEfficient attention blocking for causal language models, adding KV and Q blocking strategies that preserve numerical outputs while improving scalability at long sequence lengths.

Key Features

  • KV Blocking: attention is computed block by block over the KV cache along the sequence dimension.
  • Q Blocking: attention is computed block by block over the query sequence.
  • Currently configurable via qaic_config (explicit or auto).
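To illustrate why blocking can leave numerical outputs unchanged, here is a minimal pure-Python sketch for a single attention head. It assumes an online-softmax accumulation (running max, running denominator) across KV blocks and a prefill setting where query and key positions line up; the PR does not state which formulation its kernels use, so the function names and the algorithm choice below are illustrative assumptions, not the PR's implementation.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_causal_attention(q, k, v):
    """Reference: unblocked causal attention; q, k, v are lists of row vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Causal mask: query i attends to keys 0..i only.
        scores = [scale * _dot(qi, k[j]) for j in range(i + 1)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        denom = sum(w)
        out.append([sum(w[j] * v[j][d] for j in range(i + 1)) / denom
                    for d in range(len(v[0]))])
    return out


def blocked_causal_attention(q, k, v, num_kv_blocks=2, num_q_blocks=2):
    """Hypothetical sketch: same result, computed per Q block and per KV block."""
    n_q, n_kv, d_v = len(q), len(k), len(v[0])
    scale = 1.0 / math.sqrt(len(q[0]))
    q_step = math.ceil(n_q / num_q_blocks)
    kv_step = math.ceil(n_kv / num_kv_blocks)
    out = []
    for q_start in range(0, n_q, q_step):  # Q blocking: blocks are independent
        for i in range(q_start, min(q_start + q_step, n_q)):
            m, denom, acc = -math.inf, 0.0, [0.0] * d_v
            for kv_start in range(0, n_kv, kv_step):  # KV blocking
                js = [j for j in range(kv_start, min(kv_start + kv_step, n_kv))
                      if j <= i]
                if not js:
                    continue  # whole block is masked out for this query
                scores = [scale * _dot(q[i], k[j]) for j in js]
                new_m = max(m, max(scores))
                corr = math.exp(m - new_m)  # rescale earlier partial sums
                w = [math.exp(s - new_m) for s in scores]
                denom = denom * corr + sum(w)
                acc = [acc[d] * corr + sum(wj * v[j][d] for wj, j in zip(w, js))
                       for d in range(d_v)]
                m = new_m
            out.append([a / denom for a in acc])
    return out
```

Under these assumptions the blocked result matches the unblocked reference up to floating-point rounding, while each step only touches one block of keys and values at a time.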

WIP

  • Implement auto blocking
  • Move the transforms structure from .from_pretrained() to .compile() and keep a deprecation warning at the .from_pretrained() level.
  • Unit tests
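The planned move from .from_pretrained() to .compile() could look roughly like the sketch below. The class name, attribute, and warning text are hypothetical stand-ins rather than QEfficient code; only the keys num_kv_blocks and attn_blocking_auto come from this PR.

```python
import warnings


class HypotheticalCausalLM:
    """Stand-in class (an assumption), not the real QEfficient model wrapper."""

    def __init__(self):
        self.blocking_config = None

    @classmethod
    def from_pretrained(cls, name, qaic_config=None):
        model = cls()
        cfg = qaic_config or {}
        if "num_kv_blocks" in cfg or cfg.get("attn_blocking_auto"):
            # Old entry point still works, but tells the caller where to move.
            warnings.warn(
                "Passing blocking options to from_pretrained() is deprecated; "
                "pass them to compile() instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            model.blocking_config = dict(cfg)
        return model

    def compile(self, qaic_config=None):
        # New entry point: options given here take precedence (an assumption).
        if qaic_config is not None:
            self.blocking_config = dict(qaic_config)
        return self.blocking_config
```

Emitting the warning at the old call site while still honoring the option keeps existing scripts working during the transition.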

++ @kdulla

vbaddi and others added 6 commits January 18, 2026 12:40
- Add strategy registry and AttentionBlockingConfig for extensible blocking
- Implement BlockedKVAttentionTransform for supported attention modules
- Add auto-blocking policy with device- and model-specific params
- Integrate KV blocking into Llama-like models using QEffDynamicCache

Key components:
* attention_blocking.py: Strategy registry and config
* attention_blocking_policy.py: Auto-derive policy
* blocked_attention_utils.py: KV blocked attention kernels
* pytorch_transforms.py: Module-level blocking application

Usage:
  qaic_config = {"num_kv_blocks": 2} # explicit
  qaic_config = {"attn_blocking_auto": True}  # automatic
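As a hedged illustration of the two usage lines above, the sketch below shows how an explicit num_kv_blocks value might be turned into half-open block ranges over a context length, and how the two config styles could be told apart. Both helper names are hypothetical, and the rule that an explicit value wins over the auto flag is an assumption, not something the commit message states.

```python
import math


def resolve_blocking(qaic_config):
    """Hypothetical helper: classify a qaic_config dict as explicit, auto, or none."""
    cfg = qaic_config or {}
    if "num_kv_blocks" in cfg:
        n = int(cfg["num_kv_blocks"])
        if n < 1:
            raise ValueError("num_kv_blocks must be >= 1")
        return ("explicit", n)  # assumption: explicit value overrides auto
    if cfg.get("attn_blocking_auto"):
        return ("auto", None)   # block count would come from the auto policy
    return ("none", None)


def kv_block_ranges(ctx_len, num_kv_blocks):
    """Hypothetical helper: split [0, ctx_len) into near-equal half-open ranges."""
    step = math.ceil(ctx_len / num_kv_blocks)
    return [(s, min(s + step, ctx_len)) for s in range(0, ctx_len, step)]


mode, n = resolve_blocking({"num_kv_blocks": 2})
print(mode, kv_block_ranges(4096, n))
```

The last block is allowed to be shorter when the context length is not divisible by the block count.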

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…ic_config

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi vbaddi added the enhancement and qeff.blocking labels Feb 3, 2026
@vbaddi vbaddi marked this pull request as draft February 3, 2026 10:35
Signed-off-by: Kushal Dulla <quic_kdulla@quicinc.com>
…ferent kinds of blocking

Signed-off-by: Kushal Dulla <quic_kdulla@quicinc.com>
Signed-off-by: Kushal Dulla <quic_kdulla@quicinc.com>

2 participants
