Python Convolution Comparisons

Convolutions, or sliding dot products, are fundamental operations in scientific computing. The goal here is to explore the possible approaches in python. The graphs shown are number of elements vs runtimes.

Ideal implementation would have the inputs (x,K) where x is 1d or 3d K is [0th order, 1st order, 2nd, order, ... nth order].

Example usages:

x1 = np.random.uniform(-1,1,100)
x3 = np.random.uniform(-1,1,(100,10,10))
k1 = np.random.uniform(-1,1,10))
k2 = np.random.uniform(-1,1,(10,10))
k3 = np.random.uniform(-1,1,(10,10,10))
conv( x3, k3 )
conv( x1, k1 )
conv( x1, k2 )
conv( x1, k3 )
conv( x1, [ k1 , k2 , k3 ] )

1D First Order

An example 1D signal could be audio amplitudes. An example filter would be an impulse response. The filter slides over a window of the larger audio signal. Corresponding terms are multiplied and then summed together to construct a filtered version of the signal.

It was found that numpy.convolve is the fastest for a large variety of inputs, but after a sufficiently large size it makes sense to use cupy. The gpu implementations started to become worth using at around 10^7.

Best to worst for larger arrays:

cpste - cupy strided @Alexer
nbc_gm, nbc - numba cuda improvement from @gmarkall via numba community support
cpv, spfcv, spvc, npc
- cpv - cupyx.scipy.ndimage,
- spvc - scipy.signal.convolve
- spfvc - scipy.signal.fftconvolve
- npc - numpy.convolve
npste, nbpd_np, nbp, nbnpc, nbnpc_nj
- npste - numpy strided @Alexer
- nbpd_np - numba parallel dot, no gil
- nbp - numba parallel
- nbnpc_nj - @njit numba numpy convolve
npd - numpy dot
npfe - numpy for einsum
naive - (not shown) two for loops, significantly worse

1D Second Order

Best to worst for larger arrays:

nbc - numba cuda (naive implementation with @cuda.njit)
nbconv - numba njit (naive implementation with njit)
npfe - numpy for loop einsum (generate the outer product on demand)
npe - numpy einsum (generate the outer product all at once first)

1D Third Order

Best to worst for larger arrays:

nbcg, nbcg1024, nbc
nbcg - numba cuda (naive implementation with @cuda.njit and grid)
nbcg1024 - numba cuda (naive implementation with @cuda.njit and grid, 1024 threads/block)
nbc - numba cuda (naive implementation with @cuda.njit)
nbconv - numba njit (naive implementation with njit)

3D First Order

A video would be a good example of a 3D signal.

Best to worst for larger arrays:

cpv - cupyx.scipy.ndimage
cpste - cupy stride tricks @Alexer
nbcg - numba cuda (naive implementation with @cuda.njit and grid)
npste - numpy stride tricks @Alexer
nbconv - numba njit (naive implementation with njit)
spv - scipy.signal.convolve valid

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Python Convolution Comparisons

1D First Order

1D Second Order

1D Third Order

3D First Order

Files

README.md

Latest commit

History

README.md

File metadata and controls

Python Convolution Comparisons

1D First Order

1D Second Order

1D Third Order

3D First Order