[Tracking] Upstream sharding annotation and sharding propagation #109

Open
comaniac opened this issue Jul 25, 2022 · 6 comments

@comaniac
Contributor

comaniac commented Jul 25, 2022

This issue tracks the progress of upstreaming sharding by @Tonny-Gu.

| Time | Milestone | Status |
| --- | --- | --- |
| 7/31/22 | Test with BERT and refactor a part of resharding rules. | Done |
| 8/7/22 | Submit a PR of ShardSpec. | #115 |
| 8/14/22 | Submit a PR of InferHint and Expansion Rule | #123 |
| 8/25/22 | Documents and unit tests | |

@hgt312 @Tonny-Gu please keep this issue up to date. Thanks.

@comaniac
Contributor Author

comaniac commented Aug 2, 2022

@hgt312 @Tonny-Gu We have passed the first milestone. Please update the progress here.

@Tonny-Gu
Contributor

Tonny-Gu commented Aug 3, 2022

@comaniac In the past week, I:

  • Made ShardSpec's mutable flag work. The InferHint pass no longer returns partition solutions that conflict with this flag: when a spec is set to immutable, InferHint will not try to reshard the corresponding tensor.
  • Finished refactoring the Replicated-to-Sharded resharding rule.
  • Partially completed the Sharded-to-Replicated resharding rule (it only supports resharding one dimension for now).
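
For readers unfamiliar with the terms, the behavior in the list above can be illustrated with a toy sketch. This is not RAF's implementation: `ToySpec`, `filter_candidates`, `replicated_to_sharded`, and `sharded_to_replicated` are hypothetical names invented here, tensors are plain nested lists, and "resharding" is simulated in a single process along dimension 0 only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Shards = Tuple[int, ...]  # shard count per tensor dimension; all ones = replicated


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for a sharding spec (not RAF's ShardSpec)."""
    shards: Shards
    mutable: bool = True  # False: this tensor must not be resharded


def filter_candidates(
    current: Sequence[ToySpec],
    candidates: Sequence[Sequence[Shards]],
) -> List[Sequence[Shards]]:
    """Drop candidate solutions that would change an immutable spec."""
    kept = []
    for cand in candidates:
        if all(spec.mutable or spec.shards == new
               for spec, new in zip(current, cand)):
            kept.append(cand)
    return kept


def replicated_to_sharded(rows: list, rank: int, num_shards: int) -> list:
    """Replicated -> Sharded: each rank keeps its contiguous slice of dim 0."""
    if len(rows) % num_shards != 0:
        raise ValueError("dim 0 must be divisible by num_shards")
    step = len(rows) // num_shards
    return rows[rank * step:(rank + 1) * step]


def sharded_to_replicated(shards: Sequence[list]) -> list:
    """Sharded -> Replicated along dim 0: concatenate every rank's shard,
    which is the result an all-gather would give each rank."""
    out = []
    for shard in shards:
        out.extend(shard)
    return out


if __name__ == "__main__":
    # Tensor 0 is immutable and replicated; tensor 1 may be resharded.
    current = [ToySpec((1, 1), mutable=False), ToySpec((1, 1))]
    candidates = [((1, 1), (2, 1)), ((2, 1), (1, 1))]
    print(filter_candidates(current, candidates))

    full = [[0, 1], [2, 3], [4, 5], [6, 7]]
    parts = [replicated_to_sharded(full, r, 2) for r in range(2)]
    print(parts, sharded_to_replicated(parts) == full)
```

In this sketch the second candidate is rejected because it would shard the immutable tensor, and gathering the two slices restores the original replicated tensor. A real pass would operate on IR tensors and collective ops rather than Python lists.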

However, there appears to be a bug in the ExpandOpCall pass (C++ side), which I am currently fixing. This won't delay the plan to submit the first PR.

@Tonny-Gu
Contributor

Tonny-Gu commented Aug 8, 2022

The first PR has been submitted. Link: #115

@comaniac
Contributor Author

@Tonny-Gu please update the progress for the InferHint and Expansion Rule.

@Tonny-Gu
Contributor

@comaniac

The second PR has been submitted. Link: #123.

For reasons that are still unknown, an issue occurs when importing the RAF Python package. I am discussing it with @hgt312 to find a solution. As a result, the code proposed in the PR remains untested.

Besides, some infer hints and expansion rules needed for partitioning BERT models are not ready yet. I plan to prepare an additional PR to upstream them.

@comaniac
Contributor Author

Thanks for checking in. Please tag me in the PR when it's ready for review.
