`CUDAScanCompactionConfig` instances are per-stream, per-data-type bits of data used during compaction and other per-stream operations, e.g. during PBM construction.
These are currently stored in a fixed-size array of 128 elements (as only 128 streams can run concurrently on a single device). This fixed size is an issue.
In practice it is perfectly fine to create and use more than 128 streams; they will just not all run concurrently on the same device.
Instead, as we know the width of the widest layer of any model, we know how many elements we need before the CUDA scan compaction data is used, so we can allocate a correctly sized array of these configs at runtime.
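As a rough illustration of the runtime-sized approach, here is a minimal C++ sketch. Every name in it (`ScanCompactionConfigSketch`, `ScanCompactionStore`, `widestLayer`) is hypothetical and stands in for the real types, and the config fields are placeholders. It assumes the number of streams required equals the width of the widest layer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real per-stream config; fields are illustrative only.
struct ScanCompactionConfigSketch {
    unsigned int *d_scan_flag = nullptr;  // placeholder for a device buffer
    unsigned int *d_position = nullptr;   // placeholder for a device buffer
    std::size_t capacity = 0;             // number of elements the buffers can hold
};

// Hypothetical helper: the widest layer determines how many streams are needed.
inline std::size_t widestLayer(const std::vector<std::size_t> &layerWidths) {
    if (layerWidths.empty()) return 0;
    return *std::max_element(layerWidths.begin(), layerWidths.end());
}

// Hypothetical holder: one config per stream index, sized at runtime
// instead of using a fixed array of 128 elements.
class ScanCompactionStore {
 public:
    // Grow (never shrink) so that stream indices [0, streamCount) are valid.
    void resize(std::size_t streamCount) {
        if (streamCount > configs_.size()) {
            configs_.resize(streamCount);
        }
    }
    // Access the config for a given stream index.
    ScanCompactionConfigSketch &at(std::size_t streamId) {
        assert(streamId < configs_.size());
        return configs_[streamId];
    }
    std::size_t size() const { return configs_.size(); }

 private:
    std::vector<ScanCompactionConfigSketch> configs_;
};
```

In this sketch, `resize(widestLayer(widths))` would be called once the model's layers are known and before any compaction takes place. The store only ever grows, so existing entries are preserved across repeated calls.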
A better fix would be to refactor this, and the general use of per-stream bits of data, into an alternative abstraction, but that would be a more significant investment of time, so it can wait as part of #449.
This issue presented itself in the concurrency benchmark when using more than 128 species for spatial messaging.