[PyTorch] Add caching for attention backend selection results #1381

cyanguwa · 2024-12-19T11:15:18Z

Description

This PR adds caching for multiple attention configurations in the same run.

Currently we cache only one attention_params, together with its corresponding available_backends and selected_backend. If a training/inference job uses multiple configurations that alternate, for example config 1, config 2, ..., config 1, config 2, ..., then the second occurrence of config 1 or config 2 still has to go through the get_attention_backend() call, which is CPU intensive. With this PR, we cache the backend analysis results for multiple configs, so when a config runs again, the analysis is not repeated. The maximum number of cached configs is currently set to 10.
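The idea can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: AttentionConfig, BackendCache, fake_analysis, the backend labels, and the least-recently-used eviction policy are all hypothetical and chosen only for illustration. The only details taken from the description are that results are stored per configuration and that at most 10 configurations are kept.

```python
from collections import OrderedDict
from dataclasses import dataclass

MAX_CACHED_CONFIGS = 10  # limit stated in the PR description


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical stand-in for attention_params; frozen so it is hashable."""
    batch_size: int
    num_heads: int
    seq_len: int
    dtype: str


class BackendCache:
    """Bounded cache of backend analysis results, keyed by configuration.

    Eviction is least-recently-used here; that policy is an assumption,
    the PR description does not say how entries are evicted.
    """

    def __init__(self, analyze, max_size=MAX_CACHED_CONFIGS):
        self._analyze = analyze        # the expensive analysis function
        self._max_size = max_size
        self._entries = OrderedDict()  # config -> (available, selected)
        self.misses = 0                # how often the analysis actually ran

    def get(self, config):
        if config in self._entries:
            # Cache hit: mark as most recently used and skip the analysis.
            self._entries.move_to_end(config)
            return self._entries[config]
        # Cache miss: run the analysis once and remember the result.
        self.misses += 1
        result = self._analyze(config)
        self._entries[config] = result
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # drop the oldest entry
        return result


def fake_analysis(config):
    """Placeholder for the CPU-intensive backend selection (made-up rule)."""
    available = ("flash", "fused", "unfused")  # placeholder labels
    selected = "flash" if config.seq_len >= 512 else "unfused"
    return available, selected


# Alternate between two configs, as in the scenario described above.
cache = BackendCache(fake_analysis)
config_1 = AttentionConfig(batch_size=2, num_heads=16, seq_len=512, dtype="bf16")
config_2 = AttentionConfig(batch_size=2, num_heads=16, seq_len=128, dtype="bf16")
for _ in range(3):
    cache.get(config_1)
    cache.get(config_2)
print(cache.misses)  # the analysis ran once per distinct config
```

With a single-entry cache (max_size=1) the same alternating pattern would miss on every call, which is the behavior this PR is meant to avoid.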

Fixes #1349

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactor

Changes

Please list the changes introduced in this PR:

Add caching for attention backend selection results

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Charlene Yang <[email protected]>

for more information, see https://pre-commit.ci

cyanguwa and others added 4 commits December 19, 2024 02:57

WIP: add caching for backend selection results

c8141cb

Signed-off-by: Charlene Yang <[email protected]>

remove debug info

92f3f6e

Signed-off-by: Charlene Yang <[email protected]>

remove one more comment

e975247

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

6b9eb95

for more information, see https://pre-commit.ci
