CUDA accelerated PSNR #1175
Oh, this will also contribute to FFmpeg: a colleague of mine has been experimenting with 8K footage and found that FFmpeg currently has no GPU-accelerated PSNR (at least not to our knowledge).
Based on #1174.
This looks interesting, but it doesn't have a lot of value considering it's still PSNR at the end of the day. Instead, I believe some focus should go to GPU-accelerating much more powerful metrics like butteraugli and ssimulacra2.
The motivation behind this is to not hold CUDA VMAF back because of PSNR. If the video is decoded with hardware acceleration, it is already in GPU memory and would have to be downloaded to the CPU just to calculate PSNR.
@kylophone could you review and test this?
I tested this and there was a speed regression for VMAF, only with raw inputs, likely due to the chroma copy.
Yes, that can be true; in FFmpeg that should not happen. Can you put any numbers behind that speed regression?
@kylophone any update on this? As said, the big benefit comes from using this with FFmpeg: GPU decode + GPU filter. If PSNR has to be calculated on the CPU, the GPU data has to be downloaded first, which blocks processing a lot.
@kylophone Do you see the speed regression in the standalone tool as a blocker? In FFmpeg this would not lead to a regression, because FFmpeg either uses hardware decode or overlaps the copy with the kernels, which the standalone tool cannot do (blocking fread in the main thread).
The speedup we see on the GPU compared to the CPU is very significant, and it scales well to higher resolutions.
When used with FFmpeg this is especially important, as it also avoids an otherwise necessary PCIe copy when using the hardware decoders. When I find more time I will do the same for SSIM, but that is a little more work.