Inefficient memory access patterns in CUDA kernels #12

Open
dimitrivlachos opened this issue Oct 24, 2024 · 0 comments · May be fixed by #19
Labels
enhancement New feature or request

dimitrivlachos commented Oct 24, 2024

Currently, all CUDA kernels (including the new kernels from #1) read neighbouring pixels directly from global memory in their convolution operations. Because the convolution windows of neighbouring threads overlap, each pixel is fetched from global memory many times, which is inefficient and slows performance.

We should modify these kernels to read neighbouring pixels from memory better suited to this spatially local access pattern, thereby reducing global memory traffic and latency and improving overall performance.

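Options include shared memory, texture memory and surface memory, to name a few. (Local memory is less promising: despite its name, it resides in device memory and is private to each thread, so it cannot share loaded pixels between neighbouring threads.)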
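As a rough sketch of the shared-memory option, the CUDA code below computes a box sum over a square window, similar to the local sums a dispersion filter needs. Each block loads its 16x16 tile plus a halo of width `RADIUS` into shared memory once, synchronises, and then every thread reads its neighbours from shared memory instead of global memory. The tile size, window radius, zero padding at the image edges, and the names `box_sum_shared`, `box_sum_gpu` and `box_sum_cpu` are all assumptions for illustration, not the project's existing kernels; CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>
#include <vector>

constexpr int TILE = 16;                   // block is TILE x TILE threads (assumed)
constexpr int RADIUS = 3;                  // window half-width, i.e. a 7x7 window (assumed)
constexpr int SMEM_W = TILE + 2 * RADIUS;  // tile width including the halo on both sides

// Box sum over a (2*RADIUS+1)^2 window, reading neighbours from shared memory.
__global__ void box_sum_shared(const float* __restrict__ image, float* __restrict__ out,
                               int width, int height) {
    __shared__ float tile[SMEM_W][SMEM_W];
    const int tx = threadIdx.x, ty = threadIdx.y;
    const int bx = blockIdx.x, by = blockIdx.y;

    // Global coordinates of tile[0][0] (top-left corner of the halo).
    const int x0 = bx * TILE - RADIUS;
    const int y0 = by * TILE - RADIUS;

    // Cooperative load: tile plus halo is larger than the block, so each thread
    // may load several elements. Adjacent tx values read adjacent addresses,
    // so the global reads are coalesced.
    for (int sy = ty; sy < SMEM_W; sy += TILE) {
        for (int sx = tx; sx < SMEM_W; sx += TILE) {
            const int gx = x0 + sx;
            const int gy = y0 + sy;
            const bool inside = gx >= 0 && gx < width && gy >= 0 && gy < height;
            tile[sy][sx] = inside ? image[gy * width + gx] : 0.0f;  // zero padding (assumed)
        }
    }
    __syncthreads();  // all elements must be loaded before any thread reads neighbours

    const int x = bx * TILE + tx;
    const int y = by * TILE + ty;
    if (x >= width || y >= height) return;  // safe: no __syncthreads() after this point

    float sum = 0.0f;
    for (int dy = -RADIUS; dy <= RADIUS; ++dy)
        for (int dx = -RADIUS; dx <= RADIUS; ++dx)
            sum += tile[ty + RADIUS + dy][tx + RADIUS + dx];
    out[y * width + x] = sum;
}

// Host wrapper: copy the image to the device, launch one thread per pixel, copy back.
std::vector<float> box_sum_gpu(const std::vector<float>& img, int width, int height) {
    const size_t bytes = img.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    box_sum_shared<<<grid, block>>>(d_in, d_out, width, height);

    std::vector<float> result(img.size());
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

// CPU reference with the same zero-padding rule, for checking the kernel.
std::vector<float> box_sum_cpu(const std::vector<float>& img, int width, int height) {
    std::vector<float> out(img.size(), 0.0f);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float s = 0.0f;
            for (int dy = -RADIUS; dy <= RADIUS; ++dy)
                for (int dx = -RADIUS; dx <= RADIUS; ++dx) {
                    const int gx = x + dx, gy = y + dy;
                    if (gx >= 0 && gx < width && gy >= 0 && gy < height)
                        s += img[gy * width + gx];
                }
            out[y * width + x] = s;
        }
    return out;
}
```

Each pixel is now read from global memory roughly once per block that needs it, instead of once per window that covers it. A real dispersion kernel would likely accumulate the sum of squares and a valid-pixel count in the same inner loop; the loading pattern would stay the same.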

dimitrivlachos added the enhancement (New feature or request) label Oct 24, 2024
dimitrivlachos added a commit that referenced this issue Nov 5, 2024
Implement extended dispersion spotfinding

Implement a GPU-based version of the extended dispersion spotfinding
algorithm. This builds on regular dispersion by making two passes.
This allows for the detection of fainter spots by using the first pass
to detect candidate spots and exclude them from the background
calculation in the second pass.

Extended dispersion spotfinding is unavoidably slower than regular
dispersion because it requires two passes. However, the performance
gained through massively parallel processing on the GPU should make
this a viable option, when needed, even for fast feedback.

Create several CUDA kernels to perform the extended dispersion
spotfinding algorithm (`threshold.cu`, `erosion.cu`).

Refactor the dispersion kernel to share code with extended dispersion.

Move common code to `cuda_common.hpp`.

Create basic test script for extended dispersion spotfinding.

Add an `--algorithm` argument to `spotfinder.cc` along with the
necessary code to parse it, allowing for algorithm selection at runtime.

Add the new files to CMakeLists.txt so that they are included in the build.

See also: #12, #13, #14
dimitrivlachos linked a pull request Dec 2, 2024 that will close this issue