Memory policy #983

nksauter · 2024-04-09T23:14:40Z

I've created a failing unit test for the new Kokkos code in the CUDA context.

If the test "libtbx.python cctbx_project/simtbx/tests/tst_memory_policy.py" can be made to work, the bug has been fixed.

Background information: I'm trying to extend the Kokkos exascale_api with new behavior. In the old behavior, it allocates large arrays on the GPU corresponding to the full detector size, even if only a few pixels are calculated because of the whitelist (a list of pixels of interest). In the new behavior, only enough memory is allocated for the whitelist pixels. I am using C++ templates, with a template specialization for each of these two memory-allocation cases. I need this to work so Daniel can move ahead with the SPREAD project, and it is just this last detail that I cannot seem to fix.
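For readers unfamiliar with the pattern, here is a minimal sketch of template specialization on a memory policy. It is plain host-side C++ and not the code in this PR: the names `large_array_policy`, `small_whitelist_policy` and `image_buffer` are hypothetical, and `std::vector` stands in for a device allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag types naming the two allocation cases (assumed names).
struct large_array_policy {};
struct small_whitelist_policy {};

// Primary template is declared but not defined, so using an
// unsupported policy is a compile-time error.
template <typename MemoryPolicy>
struct image_buffer;

// Large-array case: one slot per detector pixel; the whitelist
// does not affect the allocation size.
template <>
struct image_buffer<large_array_policy> {
  std::vector<double> data;
  image_buffer(std::size_t n_detector_pixels,
               const std::vector<std::size_t>& /*whitelist*/)
      : data(n_detector_pixels, 0.0) {}
};

// Small-whitelist case: one slot per whitelisted pixel only.
template <>
struct image_buffer<small_whitelist_policy> {
  std::vector<double> data;
  image_buffer(std::size_t n_detector_pixels,
               const std::vector<std::size_t>& whitelist)
      : data(whitelist.size(), 0.0) {
    // A whitelist cannot name more pixels than the detector has.
    assert(whitelist.size() <= n_detector_pixels);
  }
};
```

With a 1000-pixel detector and a 3-pixel whitelist, `image_buffer<large_array_policy>` would hold 1000 values and `image_buffer<small_whitelist_policy>` would hold 3; the caller picks the case at compile time through the template argument.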

dermen · 2024-05-09T15:53:04Z

@nksauter, just making sure I see where we are: in the latest commit, the test works, but there is duplicated code that we want to avoid?

nksauter · 2024-05-09T16:36:52Z

@dermen, please stand by with regard to the working tests. I'm going to apply the test to additional cases today to double-check. The duplicated code is a serious problem: my original intent was to implement polymorphism with minimal lines of code, but instead I had to duplicate an entire kernel. Further, I was unable to shrink the m_accumulate_floatimage array, as was the original intent.
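As an illustration of the kind of polymorphism meant here, one general way to share a loop body between two memory policies is to isolate only the part that differs (where a result is stored) in a small specialized trait. The sketch below is serial host-side C++ under stated assumptions, not Kokkos code and not this PR's kernel: `storage_index`, `accumulate` and the placeholder `simulate_pixel` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag types for the two memory-allocation cases.
struct large_array_policy {};
struct small_whitelist_policy {};

// Only the storage index differs between the two cases.
template <typename MemoryPolicy>
struct storage_index;

template <>
struct storage_index<large_array_policy> {
  // Store each result at the detector pixel's own index.
  static std::size_t get(std::size_t /*whitelist_position*/,
                         std::size_t pixel_index) {
    return pixel_index;
  }
};

template <>
struct storage_index<small_whitelist_policy> {
  // Store each result at its position within the whitelist.
  static std::size_t get(std::size_t whitelist_position,
                         std::size_t /*pixel_index*/) {
    return whitelist_position;
  }
};

// Placeholder for the per-pixel physics (an assumption for illustration).
inline double simulate_pixel(std::size_t pixel_index) {
  return 2.0 * static_cast<double>(pixel_index);
}

// One loop body shared by both policies.
template <typename MemoryPolicy>
void accumulate(const std::vector<std::size_t>& whitelist,
                std::vector<double>& image) {
  for (std::size_t i = 0; i < whitelist.size(); ++i) {
    const std::size_t slot =
        storage_index<MemoryPolicy>::get(i, whitelist[i]);
    assert(slot < image.size());  // guards against an undersized buffer
    image[slot] += simulate_pixel(whitelist[i]);
  }
}
```

In this sketch the large-array call needs an `image` sized to the detector, while the small-whitelist call needs one sized to the whitelist. Whether the same factoring is practical inside a device kernel is not something this sketch establishes.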

Baharis

After 3 weeks of testing via psii_spread's annulus worker, this PR was indeed found to significantly lower the memory footprint when simulating images on GPU. I don't have particular issues with the functionality of this code.

In the exascale_api, allow pixel values to be calculated either on a large array (all pixels) or, with low memory, on just the whitelist consisting of shoebox pixels. This commit only provides the polymorphism framework; both implementations are currently identical, giving the large-array behavior.

The script tests/tst_memory_policy.py fails with a cuda illegal access. The intention is to get help from NESAP to get a functional test.

Memory savings are achieved through code specialization for the case where pixel values are simulated on a small whitelist. The specializations are not yet optimal, as there is still a lot of code duplication. The changes give a ~4.5x reduction in memory footprint, but there is no success yet in resizing the array m_accumulate_floatimage; attempts so far lead to a CUDA memory allocation error.

nksauter requested review from Baharis and JBlaschke April 9, 2024 23:14

phyy-nx force-pushed the memory_policy branch from 98d8720 to 510bc4d Compare April 10, 2024 16:22

nksauter requested a review from dermen April 24, 2024 18:49

Baharis force-pushed the memory_policy branch from dcb444f to 6d7a2d7 Compare May 20, 2024 22:24

Baharis approved these changes May 31, 2024

nksauter force-pushed the memory_policy branch 2 times, most recently from edb7ebb to 7e5ae72 Compare June 15, 2024 00:37

nksauter added 4 commits November 16, 2024 08:33

Debug case for non-working test.

79de5b7

Remove debugging output to conserve stdout size.

fa0295b

nksauter force-pushed the memory_policy branch from 7e5ae72 to fa0295b Compare November 16, 2024 16:35
