Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow separate queue for fft and ifft #17

Merged
merged 9 commits into from
Jan 6, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 87 additions & 7 deletions pyvkfft/opencl.py
Original file line number Diff line number Diff line change
Expand Up @@ -155,20 +155,60 @@ def _make_config(self):
int(self.use_lut), int(self.keepShaderCode),
n_batch, skipx, skipy, skipz)

def fft(self, src: cla.Array, dest: cla.Array = None):
def fft(self, src: cla.Array, dest: cla.Array = None, queue: cl.CommandQueue = None):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the array also carries a queue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which can be None right? Any suggestions on what to do?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can be None, but I think it's reasonable to expect an array passed in here to have a valid queue.

If you were designing this API from scratch, I think the following fallback order would make sense:

  • queue (if passed)
  • src.queue and dest.queue (if given, must be the same)
  • I would not consider self.queue.

For backward compatibility, I would warn if self.queue disagrees with queue resulting from the above and say that self.queue is being used for now, but that that behavior is deprecated, with a certain fuse.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the 'simple' fft API (pyvkfft.fft.fftn, pyvkfft.fft.ifftn...) already allows to transform just by passing an array - its queue will be used as default, and a different one can be supplied. This already allows to use different queues for different transforms.

If you were designing this API from scratch, I think the following fallback order would make sense:

  • queue (if passed)
  • src.queue and dest.queue (if given, must be the same)
  • I would not consider self.queue.

For backward compatibility, I would warn if self.queue disagrees with queue resulting from the above and say that self.queue is being used for now, but that that behavior is deprecated, with a certain fuse.

There can be some strange results when different queues are used, e.g. when using a different queue from the array's, the get() method will not wait for the fft to be finished.

I would still keep VkFFTApp's self.queue as the default (assuming it's a deliberate choice from the developer), but a warning about when different queues are used will be useful.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queue choice is indeed debatable.. Preferring the array queues could also make sense.. It really depends on the use case.

The main use case is probably not a single array which is transformed with different queues at different times, but rather different arrays which are transformed by the same VkFFTApp. In which case using the array queues would indeed be easier...

"""
Compute the forward FFT
:param src: the source pyopencl Array
:param dest: the destination pyopencl Array. Should be None for an inplace transform
:param queue: the pyopencl CommandQueue to use for the transform. If not given,
the queue given when constructing the application is used.
:raises RuntimeError: in case of a GPU kernel launch error
:return: the transformed array. For a R2C inplace transform, the complex view of the
array is returned.
"""
if not queue:
if dest is None:
if src.queue is None:
warnings.warn("queue is not given and the source array does not have a queue
attached to it. Falling back to the queue given to the constructor
of VkFFTApp. This is deprecated and will stop working in the future.
Use src_array.with_queue(queue) to attach a queue to the array or
pass a queue to this method", DeprecationWarning)
queue = self.queue
else:
queue = src.queue
elif dest.queue is None and src.queue is None:
warnings.warn("queue is not given and the source/dest arrays do not have a queue
attached to it. Falling back to the queue given to the constructor
of VkFFTApp. This is deprecated and will stop working in the future.
Use src_array.with_queue(queue) to attach a queue to the array or
pass a queue to this method", DeprecationWarning)
queue = self.queue
elif dest.queue != src.queue:
if dest.queue is None:
queue = src.queue
elif src.queue is None:
queue = dest.queue
else:
warnings.warn("queue is not given and the source/dest arrays are not equal.
Falling back to the queue given to the constructor. This is deprecated
and will stop working in the future.
Use src_array.with_queue(queue) to attach a queue to the array or
pass a queue to this method", DeprecationWarning)
queue = self.queue
else:
queue = src.queue

if queue != self.queue:
# TODO: warn here if self.queue has not been finished yet.
# There is no way to check that a queue is finished in pyopencl.
pass
isuruf marked this conversation as resolved.
Show resolved Hide resolved

if self.inplace:
if dest is not None:
if src.data.int_ptr != dest.data.int_ptr:
raise RuntimeError("VkFFTApp.fft: dest is not None but this is an inplace transform")
res = _vkfft_opencl.fft(self.app, int(src.data.int_ptr), int(src.data.int_ptr), int(self.queue.int_ptr))
res = _vkfft_opencl.fft(self.app, int(src.data.int_ptr), int(src.data.int_ptr), int(queue.int_ptr))
check_vkfft_result(res, src.shape, src.dtype, self.ndim, self.inplace, self.norm, self.r2c,
self.dct, backend="opencl")
if self.norm == "ortho":
Expand All @@ -186,27 +226,67 @@ def fft(self, src: cla.Array, dest: cla.Array = None):
raise RuntimeError("VkFFTApp.fft: dest and src are identical but this is an out-of-place transform")
if self.r2c:
assert (dest.size == src.size // src.shape[-1] * (src.shape[-1] // 2 + 1))
res = _vkfft_opencl.fft(self.app, int(src.data.int_ptr), int(dest.data.int_ptr), int(self.queue.int_ptr))
res = _vkfft_opencl.fft(self.app, int(src.data.int_ptr), int(dest.data.int_ptr), int(queue.int_ptr))
check_vkfft_result(res, src.shape, src.dtype, self.ndim, self.inplace, self.norm, self.r2c,
self.dct, backend="opencl")
if self.norm == "ortho":
dest *= self._get_fft_scale(norm=0)
return dest

def ifft(self, src: cla.Array, dest: cla.Array = None):
def ifft(self, src: cla.Array, dest: cla.Array = None, queue: cl.CommandQueue = None):
"""
Compute the backward FFT
:param src: the source pyopencl.Array
:param dest: the destination pyopencl.Array. Can be None for an inplace transform
:param queue: the pyopencl CommandQueue to use for the transform. If not given,
the queue given when constructing the application is used.
:raises RuntimeError: in case of a GPU kernel launch error
:return: the transformed array. For a C2R inplace transform, the float view of the
array is returned.
"""
if not queue:
if dest is None:
if src.queue is None:
warnings.warn("queue is not given and the source array does not have a queue
attached to it. Falling back to the queue given to the constructor
of VkFFTApp. This is deprecated and will stop working in the future.
Use src_array.with_queue(queue) to attach a queue to the array or
pass a queue to this method", DeprecationWarning)
queue = self.queue
else:
queue = src.queue
elif dest.queue is None and src.queue is None:
warnings.warn("queue is not given and the source/dest arrays do not have a queue
attached to it. Falling back to the queue given to the constructor
of VkFFTApp. This is deprecated and will stop working in the future.
Use src_array.with_queue(queue) to attach a queue to the array or
pass a queue to this method", DeprecationWarning)
queue = self.queue
elif dest.queue != src.queue:
if dest.queue is None:
queue = src.queue
elif src.queue is None:
queue = dest.queue
else:
warnings.warn("queue is not given and the source/dest arrays are not equal.
Falling back to the queue given to the constructor. This is deprecated
and will stop working in the future.
Use src_array.with_queue(queue) to attach a queue to the array or
pass a queue to this method", DeprecationWarning)
queue = self.queue
else:
queue = src.queue

if queue != self.queue:
# TODO: warn here if self.queue has not been finished yet.
# There is no way to check that a queue is finished in pyopencl.
pass

if self.inplace:
if dest is not None:
if src.data.int_ptr != dest.data.int_ptr:
raise RuntimeError("VkFFTApp.fft: dest!=src but this is an inplace transform")
res = _vkfft_opencl.ifft(self.app, int(src.data.int_ptr), int(src.data.int_ptr), int(self.queue.int_ptr))
res = _vkfft_opencl.ifft(self.app, int(src.data.int_ptr), int(src.data.int_ptr), int(queue.int_ptr))
check_vkfft_result(res, src.shape, src.dtype, self.ndim, self.inplace, self.norm, self.r2c,
self.dct, backend="opencl")
if self.norm == "ortho":
Expand All @@ -227,10 +307,10 @@ def ifft(self, src: cla.Array, dest: cla.Array = None):
# Special case, src and dest buffer sizes are different,
# VkFFT is configured to go back to the source buffer
res = _vkfft_opencl.ifft(self.app, int(dest.data.int_ptr), int(src.data.int_ptr),
int(self.queue.int_ptr))
int(queue.int_ptr))
else:
res = _vkfft_opencl.ifft(self.app, int(src.data.int_ptr), int(dest.data.int_ptr),
int(self.queue.int_ptr))
int(queue.int_ptr))
check_vkfft_result(res, src.shape, src.dtype, self.ndim, self.inplace, self.norm, self.r2c,
self.dct, backend="opencl")
if self.norm == "ortho":
Expand Down
39 changes: 39 additions & 0 deletions pyvkfft/test/test_fft.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ def ascent():
try:
import pyopencl as cl
import pyopencl.array as cla
from pyvkfft.opencl import VkFFTApp as clVkFFTApp

has_pyopencl = True
except ImportError:
Expand Down Expand Up @@ -412,6 +413,44 @@ def test_pycuda_streams(self):
dn = np.roll(d, i * 7, axis=1)
self.assertTrue(np.allclose(dn, vd[i].get(), rtol=rtol, atol=abs(dn).max() * rtol))

@unittest.skipIf(not has_pyopencl, "pyopencl is not available")
def test_pyopencl_queues(self):
"""
Test multiple FFT with queues different from the queue used
in creating the VkFFTApp
"""
for dtype in (np.complex64, np.complex128):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check for complex128/fp64 support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. I'm already checking for complex128.

with self.subTest(dtype=np.dtype(dtype)):
init_ctx("pyopencl", gpu_name=self.gpu, verbose=False)
ctx = gpu_ctx_dic["pyopencl"][2].context
if dtype == np.complex64:
rtol = 1e-6
else:
rtol = 1e-12
sh = (256, 256)
d = (np.random.uniform(-0.5, 0.5, sh) + 1j * np.random.uniform(-0.5, 0.5, sh)).astype(dtype)
n_queues = 5
vd = []
vapp = []

with cl.CommandQueue(ctx) as cq:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the queue using with may be dangerous here. This means the queue should not be used anymore outside of the withblock, but a reference to that queue will still persist in the VkFFTApp object. It is not used later in the code so it is fine here, but I think it's giving a dangerous example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that it's dangerous to keep persisting the queue in VkFFTApp. Latest pyopencl will give a warning if the queue is used outside of the with block though. inducer/pyopencl#561

for i in range(n_queues):
vd.append(cla.to_device(cq, np.roll(d, i * 7, axis=1)))
vapp.append(clVkFFTApp(d.shape, d.dtype, ndim=2, norm=1, queue=cq))

queues = [cl.CommandQueue(ctx) for _ in range(n_queues)]
for i in range(n_queues):
vapp[i].fft(vd[i], queue=queues[i])
for i in range(n_queues):
dn = fftn(np.roll(d, i * 7, axis=1))
self.assertTrue(np.allclose(dn, vd[i].get(), rtol=rtol, atol=abs(dn).max() * rtol))

for i in range(n_queues):
vapp[i].ifft(vd[i], queue=queues[i])
for i in range(n_queues):
dn = np.roll(d, i * 7, axis=1)
self.assertTrue(np.allclose(dn, vd[i].get(), rtol=rtol, atol=abs(dn).max() * rtol))


# The class parameters are written in pyvkfft_test.main()
class TestFFTSystematic(unittest.TestCase):
Expand Down