@draganmladjenovic
Contributor

Motivation

Ensures that users do not have to distribute kernels or set up AITER_ASM_DIR.

Technical Details

Embed code objects into the binary. Use hipRegisterFatBinary so that it works seamlessly on multiple GPUs. Make the CFG tables read-only and AiterAsmKernels statically allocated.
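
As a rough illustration of the read-only table idea, here is a minimal sketch in plain C++17. It is not the layout used in this PR: the struct, the table, the kernel name, and the placeholder bytes are all hypothetical, and the sketch does not call any HIP API. It only shows how code objects can live in constant data and be found by name and architecture without any file path.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical placeholder bytes standing in for embedded code objects.
// In a real build these arrays would be generated from compiled kernels.
inline constexpr std::uint8_t kExampleGfx942[] = {0x7f, 0x45, 0x4c, 0x46};
inline constexpr std::uint8_t kExampleGfx950[] = {0x7f, 0x45, 0x4c, 0x46, 0x02};

// One table entry: which kernel, for which GPU architecture, and where its
// bytes are. All fields are compile-time constants, so the table can be
// placed in read-only data.
struct EmbeddedKernel {
    std::string_view name;
    std::string_view arch;
    const std::uint8_t* data;
    std::size_t size;
};

// Hypothetical table; names and architectures are illustrative only.
inline constexpr std::array<EmbeddedKernel, 2> kKernelTable{{
    {"fmoe_example", "gfx942", kExampleGfx942, sizeof(kExampleGfx942)},
    {"fmoe_example", "gfx950", kExampleGfx950, sizeof(kExampleGfx950)},
}};

// Linear lookup by kernel name and architecture; returns nullptr if absent.
constexpr const EmbeddedKernel* find_kernel(std::string_view name,
                                            std::string_view arch) {
    for (const auto& k : kKernelTable) {
        if (k.name == name && k.arch == arch) {
            return &k;
        }
    }
    return nullptr;
}
```

A caller would pass the name and the architecture of the current device to `find_kernel` and treat a `nullptr` result as "kernel not present".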

Test Plan

Selected tests from op_tests on gfx942

draganmladjenovic requested review from a team and valarLip on January 23, 2026 23:21
draganmladjenovic force-pushed the draganm/embed_hsaco_take_two branch 6 times, most recently from 458e295 to 83c8f2d, on January 25, 2026 18:04
@draganmladjenovic
Contributor Author

draganmladjenovic commented Jan 26, 2026

@valarLip How much memory does your MI350 node have? It seems that op_tests/test_mla_persistent.py was already broken on MI300 before my change, and I cannot test it on MI350 because 36 GiB is not enough.

@wangye805

@yuguo68 This PR from our xla team should completely remove the AITER_ASM_DIR env

@yuguo68
Contributor

yuguo68 commented Jan 26, 2026

> @yuguo68 This PR from our xla team should completely remove the AITER_ASM_DIR env

Thanks, #1862 got merged over the weekend, and I am going to use it for the OSS PyTorch aiter update. @draganmladjenovic, could this PR build on top of #1862?

Comment on lines 4 to +5
#include "asm_fmoe_configs.hpp"
#include "asm_fmoe_code_objects.hpp"
Contributor

Wondering: when do we need both codegen files?

Contributor Author

There are still a few standalone kernels that are not in the tables and are handled manually.

@draganmladjenovic
Contributor Author

> > @yuguo68 This PR from our xla team should completely remove the AITER_ASM_DIR env
>
> thanks, #1862 gets merged over the weekend and I am going to use it for OSS PyTorch aiter update. @draganmladjenovic could this PR build on top of #1862?

It should supersede it. It has a completely different table format that lives almost entirely in rodata and does not rely on file paths at all.

draganmladjenovic force-pushed the draganm/embed_hsaco_take_two branch 7 times, most recently from 5b96bd0 to cf385ec, on January 27, 2026 22:17
Embed code objects into binary. Use hipRegisterFatBinary to
make it seamlessly work on multiple gpus. Make CFG tables
read-only and AiterAsmKernels statically allocated.
draganmladjenovic force-pushed the draganm/embed_hsaco_take_two branch from cf385ec to 424e9fa on January 28, 2026 01:05
@draganmladjenovic
Contributor Author

@valarLip Your CI is broken. It is just that the code calls exit(0) when the kernel is not present (https://github.com/ROCm/aiter/blame/main/csrc/include/aiter_hip_common.h#L39), so the failure is reported as success. I changed that to an abort during refactoring, and now I cannot pass CI.
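
To make the exit(0) versus abort difference concrete, here is a minimal POSIX-only sketch of what a parent process (for example a CI runner) observes in each case. This is not AITER code: the handler names and the `ChildOutcome` struct are hypothetical, and the sketch assumes `fork` and `waitpid` are available.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// How a child process terminated, as seen by its parent.
struct ChildOutcome {
    bool exited_normally;  // true if the child ended via exit()
    int exit_code;         // meaningful only when exited_normally is true
    int signal_number;     // meaningful only when exited_normally is false
};

// Runs `handler` in a forked child and reports how the child terminated.
template <typename Fn>
ChildOutcome run_in_child(Fn handler) {
    pid_t pid = fork();
    if (pid < 0) {
        return {false, 0, 0};  // fork failed; nothing was run
    }
    if (pid == 0) {
        handler();
        _exit(1);  // only reached if the handler returns
    }
    int status = 0;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status)) {
        return {true, WEXITSTATUS(status), 0};
    }
    return {false, 0, WIFSIGNALED(status) ? WTERMSIG(status) : 0};
}

// Hypothetical "kernel not present" handlers illustrating the two policies.
inline void missing_kernel_exit0() { std::exit(0); }  // looks like success
inline void missing_kernel_abort() { std::abort(); }  // raises SIGABRT
```

With the first handler the parent sees a normal exit with status 0, which a test harness typically counts as a pass. With the second the child is terminated by SIGABRT, which a harness typically counts as a failure.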


4 participants