Performance improvements #62
base: main
When profiling a benchmark, this takes processing overhead from ~14% down to ~1.4% and ~0.06% respectively.
…ller than the whole screen. The idea is to ask ctypes.string_at for as little memory as possible. Because images are stored in memory with width as the fast index, grabbing a 480x640 region from a 1440x2560 screen only requires asking ctypes.string_at() for a 480x2560 region. This reduces memory allocation and memcpy overhead in ctypes.string_at(). For a 480x640 region of a 1440x2560 screen, the profiled time spent went from ~24% to ~8%.
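As a rough illustration of the row-band idea in this commit message, the sketch below copies only the rows that contain the region and crops the columns afterwards. The function name `grab_row_band`, its parameters, and the NumPy array standing in for the mapped screen buffer are assumptions for illustration, not dxcam's code. It also assumes rows are tightly packed (pitch equal to width x 4 bytes), which a real mapped surface may not guarantee.

```python
import ctypes

import numpy as np


def grab_row_band(address, screen_width, top, left, bottom, right, bpp=4):
    """Copy only rows top..bottom of the screen, then crop the columns.

    Hypothetical helper: assumes tightly packed rows (no padding).
    """
    pitch = screen_width * bpp            # bytes per screen row (assumed)
    rows = bottom - top
    # Ask string_at for just the band of rows that contains the region.
    raw = ctypes.string_at(address + top * pitch, rows * pitch)
    band = np.frombuffer(raw, dtype=np.uint8).reshape(rows, screen_width, bpp)
    # Column crop is a view on the band; no further copy happens here.
    return band[:, left:right]


# Small stand-in for a screen buffer: 6 rows x 8 columns x 4 bytes (BGRA).
screen = np.arange(6 * 8 * 4, dtype=np.uint8).reshape(6, 8, 4)
region = grab_row_band(screen.ctypes.data, 8, top=2, left=3, bottom=5, right=7)
```

Note that the column crop returns a non-contiguous view of the copied band, which matters if downstream code expects contiguous arrays.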
…tions of (90,180,270)
…ere its width region matches the screen's width.
…rom_address API instead. When profiling a 1440x2560 grab, total time spent went from 20% in string_at() to almost 0% in from_address. My understanding is that string_at uses memmove, which is slower than the memcpy I suspect from_address uses.
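A minimal sketch of the difference between the two calls, using a small ctypes array as a stand-in for the mapped frame (the buffer, sizes, and variable names are assumptions, not dxcam's code). Per the ctypes documentation, `string_at` returns a new bytes object, while `from_address` returns an instance backed by the memory at the given address, so the second path involves no bulk copy at all.

```python
import ctypes

import numpy as np

height, width, bpp = 4, 6, 4
size = height * width * bpp

# Stand-in for the mapped frame memory (hypothetical; values 0..95).
source = (ctypes.c_ubyte * size)(*range(size))
address = ctypes.addressof(source)

# string_at: allocates a bytes object and copies `size` bytes into it.
copied = np.frombuffer(ctypes.string_at(address, size), dtype=np.uint8)
copied = copied.reshape(height, width, bpp)

# from_address: wraps the existing memory; nothing is copied.
wrapped = (ctypes.c_ubyte * size).from_address(address)
viewed = np.frombuffer(wrapped, dtype=np.uint8).reshape(height, width, bpp)

# Mutating the source shows which array shares memory with it.
source[0] = 255
copied_first = int(copied[0, 0, 0])   # unchanged snapshot
viewed_first = int(viewed[0, 0, 0])   # reflects the write
```

The view is only valid while the underlying memory stays mapped, so code that releases the frame would still need to copy whatever it wants to keep before doing so.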
…that case self.region is used, and self.region was already validated when it was defined. When profiling a max-FPS benchmark with no region defined, this saves 3% of total execution time.
… if statement bypassed the call to self.process_cvtcolor(). Simplify code in process_cvtcolor since it no longer needs to handle 'BGRA'. In profiling, this saves 0.4% of total execution time in 'BGRA' mode.
…but it was producing a non-contiguous array, which changes the behavior of the library
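What produced the non-contiguous array is elided above. As a general illustration of the concern, the sketch below shows that a column crop of a frame is a non-contiguous NumPy view, and that `np.ascontiguousarray` restores a C-contiguous layout at the cost of a copy. The array shapes are arbitrary assumptions.

```python
import numpy as np

# Arbitrary stand-in for a BGRA frame: 6 rows x 8 columns x 4 channels.
frame = np.arange(6 * 8 * 4, dtype=np.uint8).reshape(6, 8, 4)

# Cropping columns yields a view whose rows are not adjacent in memory.
cropped = frame[2:5, 3:7]
is_view_contiguous = bool(cropped.flags["C_CONTIGUOUS"])

# ascontiguousarray copies the data into a tightly packed buffer.
packed = np.ascontiguousarray(cropped)
is_packed_contiguous = bool(packed.flags["C_CONTIGUOUS"])
```

Some consumers of frames require contiguous input, which is one way a non-contiguous result can change a library's observable behavior.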
Crazy optimisations, where did you get the experience to know how to improve this code?
Thanks for the commit! Interesting optimization. Let me take a look and do some benchmarks & merge
@ninjatall12 Is this with dxcam in pure Python or with AI-M-BOT's .pyd? Because I also have issues with the py310 binary he made. PS: Thanks, I did a lot of competitions on Codingame and just got experience over the years at work and in personal projects. I was profiling a project of mine with pprofile and noticed that dxcam was spending a lot of time in ctypes.string_at(), which according to its description just copies memory; I found this very suspicious and investigated.
Would you share the part of your script that takes screenshots? Was that area using windll.user32.SetWindowDisplayAffinity?
Reproduced, my bad. Also, I would recommend using the grab function instead of start(); start() creates a new thread, which doesn't benefit performance (in Python).
That's not a nanosleep. It gives you whole milliseconds at best, and it affects the kernel. https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
I know, I just need to sleep as precisely as possible. My function name means nothing, so why care about it?
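The sleep function under discussion is not shown in the thread. As one possible alternative that does not change the system timer resolution, the sketch below sleeps coarsely for most of the interval and then spins on `time.perf_counter()` until the deadline. The name `precise_sleep` and the 2 ms spin margin are assumptions for illustration.

```python
import time


def precise_sleep(duration_s, spin_margin_s=0.002):
    """Sleep coarsely, then busy-wait the remainder (hypothetical helper)."""
    deadline = time.perf_counter() + duration_s
    coarse = duration_s - spin_margin_s
    if coarse > 0:
        # The OS sleep may overshoot; the margin leaves room for that.
        time.sleep(coarse)
    # Spin until the deadline; this never returns early but burns CPU.
    while time.perf_counter() < deadline:
        pass


start = time.perf_counter()
precise_sleep(0.005)
elapsed = time.perf_counter() - start
```

The trade-off is CPU time: the spin phase keeps one core busy, and if the coarse sleep overshoots by more than the margin the function still returns late.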
please use
@AI-M-BOT Could you detail how you compile these .pyd? With Nuitka, I can't get binaries as fast as yours. My binaries also don't reproduce the bug ninjatall12 and I have been seeing.
Should be fine now, just replace it with the new file.
Create a file named dxshot.py or whatever you prefer, copy the content of dxcam/__init__.py into dxshot.py, run cmd
Are you using Python 3.9?
I only got this, and the script still works.
Moved on to using C++ instead of Python since it better fits my use case. Massive performance bump and a lot less resource intensive. It was quite a pain to get OpenCV to work with DXGI, but I got there in the end.
Hi, could you upload your version to PyPI or provide a way that requirements.txt can use? PyCharm doesn't like
Is it possible for Python code to call your C++ code? Or to use it Cython style?
Also, can you show me CPU usage, memory usage and FPS?
When calling grab() on a 1440x2560 screen, the profiled total time spent in ctypes.string_at() went from ~20% to ~0%.
A further optimization is made when grabbing a subset of the screen: if we want a 480x640 region of a 1440x2560 screen, only 480x2560x4 contiguous bytes need to be copied.
Overall FPS improvement on my machine when grabbing 1440x2560 in BGRA: ~271 FPS -> ~685 FPS.
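To put the region case in numbers, the short calculation below compares the bytes involved for the sizes quoted above, assuming 4 bytes per pixel (BGRA) and tightly packed rows. The variable names are illustrative only.

```python
BYTES_PER_PIXEL = 4  # BGRA, assumed

screen_rows, screen_cols = 1440, 2560
region_rows, region_cols = 480, 640

# Copying the whole screen.
full_bytes = screen_rows * screen_cols * BYTES_PER_PIXEL

# Copying only the band of full-width rows that contains the region.
band_bytes = region_rows * screen_cols * BYTES_PER_PIXEL

# Bytes actually belonging to the requested region.
region_bytes = region_rows * region_cols * BYTES_PER_PIXEL

band_fraction = band_bytes / full_bytes
```

With these sizes the band is 4,915,200 bytes against 14,745,600 for the full screen, so the contiguous copy shrinks to one third while still covering the 1,228,800 bytes of the region itself.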