@nksauter
Contributor

@nksauter nksauter commented Apr 9, 2024

I've created a failing unit test for the new Kokkos code in the CUDA context.

If the test "libtbx.python cctbx_project/simtbx/tests/tst_memory_policy.py" can be made to work, the bug has been fixed.

Background information: I'm trying to extend the Kokkos exascale_api with new behavior. Previously, it allocated large arrays on the GPU sized to the full detector, even when only a few pixels are calculated because of the whitelist (a list of pixels of interest). With the new behavior, only enough memory is allocated for the whitelist pixels. I am using C++ templates, with template specializations for these two memory-allocation cases. I need this to work so Daniel can move ahead with the SPREAD project, and it is just this last detail that I cannot seem to fix.
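For readers unfamiliar with the pattern, here is a minimal host-only sketch of template specialization on a memory policy. It is not the code in this PR: the names `large_array_policy`, `small_whitelist_policy`, and `pixel_buffer` are hypothetical, and `std::vector` stands in for a device array so the example compiles without Kokkos.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tag types naming the two allocation cases.
struct large_array_policy {};
struct small_whitelist_policy {};

// Primary template is declared but not defined, so using an
// unsupported policy is a compile-time error.
template <typename MemoryPolicy>
struct pixel_buffer;

// Large-array case: one slot per detector pixel, whitelist ignored.
template <>
struct pixel_buffer<large_array_policy> {
  std::vector<double> data;
  pixel_buffer(std::size_t n_detector_pixels,
               const std::vector<std::size_t>& /*whitelist*/)
      : data(n_detector_pixels, 0.0) {}
};

// Low-memory case: one slot per whitelisted pixel only.
template <>
struct pixel_buffer<small_whitelist_policy> {
  std::vector<double> data;
  pixel_buffer(std::size_t /*n_detector_pixels*/,
               const std::vector<std::size_t>& whitelist)
      : data(whitelist.size(), 0.0) {}
};
```

Both specializations take the same constructor arguments, so calling code can be written once as a template over the policy and the allocation size is chosen at compile time.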

@nksauter nksauter requested review from Baharis and JBlaschke April 9, 2024 23:14
@nksauter nksauter requested a review from dermen April 24, 2024 18:49
@dermen
Contributor

dermen commented May 9, 2024

@nksauter, just making sure I see where we are: in the latest commit, the test works, but there is duplicated code which we want to avoid?

@nksauter
Contributor Author

nksauter commented May 9, 2024 via email

Contributor

@Baharis Baharis left a comment

After 3 weeks of testing via psii_spread's annulus worker, this PR was indeed found to significantly lower the memory footprint when simulating images on GPU. I don't have particular issues with the functionality of this code.

@nksauter nksauter force-pushed the memory_policy branch 2 times, most recently from edb7ebb to 7e5ae72 Compare June 15, 2024 00:37

In the exascale_api, allow pixel values to be calculated either on a
large array (all pixels), or with low memory on just the whitelist
consisting of shoebox pixels.  This commit only gives the polymorphism
framework; both implementations are currently identical, giving the
large-array behavior.

The script tests/tst_memory_policy.py fails with a CUDA illegal access.
The intention is to get help from NESAP to get a functional test.

Memory savings achieved through code specialization, for the case where
pixel values are simulated on a small whitelist. Specializations are not
yet optimal, as there is still a lot of code duplication.

Changes give a ~4.5x reduction in memory footprint, but no success yet in
resizing the array m_accumulate_floatimage.  Attempts so far lead to a
CUDA memory allocation error.