Heap size limit, device memory management in general #183
-
A few follow-up questions on the Tornado API:
- How do I use TaskSchedule.syncObjects, updateReference (no javadoc), and other interesting-looking parts of the TornadoAPI?
- Why is streamIn sometimes called and other times not, when in both cases the data is used? What is forceCopyIn?
- Is execute synchronous?
- How do I use KernelContext, allocate*, and the *Barrier methods in general? (I see some clues in examples.kernelcontext.reductions.)
-
Hi @ian-p-johnson, thank you for your feedback; answers below.

Question: Why is the device heap effectively capped at 2 GB, and can I use the full 8 GB of the GTX 1070?

Answer: The OpenCL spec defines CL_DEVICE_MAX_MEM_ALLOC_SIZE, the maximum size of a single buffer allocation, and NVIDIA's OpenCL driver reports it as only 25% of CL_DEVICE_GLOBAL_MEM_SIZE. Full discussion here: https://forums.developer.nvidia.com/t/why-is-cl-device-max-mem-alloc-size-never-larger-than-25-of-cl-device-global-mem-size-only-on-nvidia/47745. This limitation does not exist in the PTX backend, and Level Zero can exceed the 25% cap, so you can try those backends. In TornadoVM we also enabled batch processing, which splits large buffers into chunks that fit within the allocation limit. Having said that, we are currently improving device memory management in TornadoVM to make better use of the device's memory, taking as a reference the memory management implemented in the Marawacc runtime.
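As a minimal sketch only, this is roughly how batch processing looks with the TaskSchedule API, assuming the batch(String) chunking method is available; the saxpy method, the 256 MB chunk size, and the array sizes are illustrative, not taken from this thread:

```java
import uk.ac.manchester.tornado.api.TaskSchedule;
import uk.ac.manchester.tornado.api.annotations.Parallel;

public class BatchSketch {
    // Illustrative kernel: y[i] = alpha * x[i] + y[i]
    public static void saxpy(float alpha, float[] x, float[] y) {
        for (@Parallel int i = 0; i < y.length; i++) {
            y[i] = alpha * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = new float[1 << 27];   // ~512 MB of floats, larger than the single-allocation limit
        float[] y = new float[1 << 27];

        new TaskSchedule("s0")
            .batch("256MB")               // split the buffers into 256 MB chunks on the device
            .streamIn(x)
            .task("t0", BatchSketch::saxpy, 2.0f, x, y)
            .streamOut(y)
            .execute();
    }
}
```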
Question: When is data actually copied in and out, and what do streamIn/streamOut and syncObject do?

Answer: For instance:

```java
ts.streamIn(vars).task(....);
ts.execute();
// Force copy-out
ts.syncObject(output);
```

In essence, TornadoVM only copies data when the task schedule is executed; in the snippet above, the output is copied back to the host when syncObject is invoked. At the user level, invoking the streamIn/streamOut methods does not trigger any data transfer at that point; it only tells the TornadoVM runtime which buffers should be copied in and out when the task schedule runs.
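To make the copy points concrete, here is a minimal self-contained sketch under the same API; the vadd method, array sizes, and class name are illustrative assumptions, and syncObject is the call from the snippet above:

```java
import uk.ac.manchester.tornado.api.TaskSchedule;
import uk.ac.manchester.tornado.api.annotations.Parallel;

public class CopySketch {
    public static void vadd(float[] a, float[] b, float[] c) {
        for (@Parallel int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[1024];
        float[] b = new float[1024];
        float[] c = new float[1024];

        TaskSchedule ts = new TaskSchedule("s0")
            .streamIn(a, b)   // only marks a and b as copy-in; no transfer happens here
            .task("t0", CopySketch::vadd, a, b, c);

        ts.execute();         // host-to-device copies and the kernel run happen here
        ts.syncObject(c);     // forces the device-to-host copy of c
    }
}
```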
Question: How do I close a "session" and release all GPU resources without closing the client JVM?

Answer: You can reset every device through the Tornado runtime:

```java
for (int i = 0; i < TornadoRuntime.getTornadoRuntime().getNumDrivers(); i++) {
final TornadoDriver driver = TornadoRuntime.getTornadoRuntime().getDriver(i);
for (int j = 0; j < driver.getDeviceCount(); j++) {
driver.getDevice(j).reset();
}
}
```

With the new model we are working on, this will be automatic within TornadoVM.
Hope this helps. Please let us know what needs further clarification.
-
PTX worked perfectly. I just backed off a little from the maximum to leave space for code etc., and I am now managing to use 1_044_000_000 x 2 floats. I'll take a look through the rest of the material you sent now, thanks. Great project, BTW.
-
Maybe I missed something in the documentation, but I am having problems accessing more than 2 GB of on-device heap (on my GTX 1070 8 GB). By default it seems to be 1 GB, and I can increase it to 2 GB using -Dtornado.heap.allocation=2GB, but all values above 2GB are effectively capped at 2 GB. I have confirmed the amount of data I am using, and I get an error if I use more than 2 GB.
Is there any way I can use the full 8 GB on the GTX 1070?
Ubuntu 21.10 (Impish)
Driver: 510.54
CUDA: 11.6
OpenJdk-11
I am looking to use it for trading strategy optimisation. I hoped to upload chunks of source artefacts (ticks, bars, indicators, etc.) and then strategy parameter sets, outputting trade histories or, at a minimum, summaries. I was hoping to fill the available GPU memory with an optimal data set and then call tasks to submit strategy parameter sets and pull back responses, effectively streaming until I ran out of source data (and then perhaps attempt to overwrite old, unused data, if I can't free it on the device, before submitting more parameter sets).
Can I interleave a number of TaskSchedule.execute calls, invoking each in turn (say: upload data, upload work, download available results, upload more data, submit more work, etc.)? I don't see any threading primitives, so is there an advised way to manage queuing on the GPU (barriers etc.), or to ensure that blocks of results have been consistently written to GPU memory (so I can use a flag to indicate completion)? At the moment I can only imagine managing this on the client and reusing GPU heap space when it is no longer needed.
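Roughly the shape I have in mind, as a sketch only (the evaluate method, the data sizes, and the batch loop are placeholders I invented to illustrate the question, not working code I have tried):

```java
import uk.ac.manchester.tornado.api.TaskSchedule;
import uk.ac.manchester.tornado.api.annotations.Parallel;

public class StrategySketch {
    // Placeholder "strategy evaluation": one result per parameter set.
    public static void evaluate(float[] ticks, float[] params, float[] results) {
        for (@Parallel int i = 0; i < results.length; i++) {
            float acc = 0f;
            for (int j = 0; j < ticks.length; j++) {
                acc += ticks[j] * params[i];
            }
            results[i] = acc;
        }
    }

    public static void main(String[] args) {
        float[] ticks = new float[1 << 20];   // one chunk of source data
        float[] params = new float[256];      // one batch of strategy parameters
        float[] results = new float[256];

        TaskSchedule ts = new TaskSchedule("s0")
            .streamIn(ticks, params)
            .task("t0", StrategySketch::evaluate, ticks, params, results)
            .streamOut(results);

        // The intent: each iteration uploads the current parameter batch,
        // runs the kernel, and pulls the results back before the host
        // prepares the next batch. Is looping over execute() like this the
        // advised pattern, and is execute() synchronous?
        for (int batch = 0; batch < 4; batch++) {
            // ... refill params on the host for this batch ...
            ts.execute();
            // ... consume results on the host ...
        }
    }
}
```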
When I stream a value out, is the GPU memory recovered?
How do I close a "session", releasing all GPU resources (without closing my client VM), so I can start again with a new strategy/data set?
(I see some things in TornadoDevice, e.g. ensureAllocated/Present, enqueueBarrier, etc., but with no guide on how to use them safely, and also some goodies in examples/memory and examples.MultipleTasks. I can guess, but I'd rather not.)
There is a general lack of documentation for memory management on the GPU.