Enable shm_comm support for arm #7800

Merged
tohtana merged 8 commits into deepspeedai:master from phalani-paladugu:arm_shm
Feb 15, 2026

@phalani-paladugu
Contributor

@phalani-paladugu phalani-paladugu commented Jan 20, 2026

This PR enables single-node shared-memory communication on Arm hosts (#7625).

Signed-off-by: Phalani Paladugu <mailofphalani@gmail.com>
@sfc-gh-truwase sfc-gh-truwase requested review from delock and removed request for GuanhuaWang, delock and hwchen2017 January 29, 2026 21:25
@delock
Collaborator

delock commented Jan 30, 2026

Overall looks good to me. I'm also glad that the refactoring from @heyujiao99 in #7519 really helps a lot! Around 100 lines of code to support a new HW architecture, wow! Kudos to @phalani-paladugu and @heyujiao99!

@phalani-paladugu
Contributor Author

[Screenshot, 2026-02-02 at 3:24:33 PM]

Signed-off-by: Phalani Paladugu <mailofphalani@gmail.com>
@delock delock enabled auto-merge (squash) February 4, 2026 03:30
@phalani-paladugu
Contributor Author

Hi, is there a known issue with the cpu-torch-latest CI job?
In the last two runs, the unit tests appear to complete successfully, but the job never finishes and eventually fails.
I don’t see any test failures in the logs—only that the workflow seems to hang after the tests have passed.

@sfc-gh-truwase
Collaborator

@phalani-paladugu, yes, something seems to be wrong with cpu-torch-latest. Right now I see that it has been running for over 2 hours, which is unexpected.

@loadams do you know what could be wrong?

@sfc-gh-truwase
Collaborator

@phalani-paladugu FYI, @tohtana is investigating in the linked PR.

@tohtana
Collaborator

tohtana commented Feb 14, 2026

Hi @phalani-paladugu,
Based on the investigation in #7851, I think this change will fix the hang in the cpu test. It's not a fully clean fix, but at least it unblocks merging your PR.
As I cannot push a commit to this PR (it is not editable by maintainers), can you merge the commit?

auto-merge was automatically disabled February 15, 2026 17:24

Head branch was pushed to by a user without write access

Signed-off-by: Phalani Paladugu <mailofphalani@gmail.com>
@phalani-paladugu
Contributor Author

Hi @tohtana ,
Thanks for the investigation and for proposing the fix. I have added the change you suggested. It looks like a few checks are currently awaiting approval. Should be good to go once they have cleared.

@tohtana tohtana merged commit 7f49367 into deepspeedai:master Feb 15, 2026
9 checks passed
@tohtana
Collaborator

tohtana commented Feb 15, 2026

All tests passed and the PR is merged now. This is a significant update. Thank you @phalani-paladugu for your contribution!
