Performance improvements #62

Open
wants to merge 8 commits into
base: main
Conversation

Agade09

@Agade09 Agade09 commented Apr 15, 2023

To grab() a 1440x2560 screen, profiling total time spent in ctypes.string_at() went from ~20% to ~0%.
A further optimization applies when grabbing a subset of the screen: for a 480x640 region of a 1440x2560 screen, only 480x2560x4 contiguous bytes need to be copied.
Overall FPS improvements on my machine, grabbing 1440x2560 in BGRA ~271FPS -> ~685FPS
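For readers who want to compare numbers on their own machine, here is a minimal FPS-measurement sketch. It is not the benchmark used in this PR; `measure_fps` and its `frames` parameter are hypothetical names, and `grab` stands for any zero-argument callable such as a bound `camera.grab`.

```python
import time

def measure_fps(grab, frames=200):
    """Call `grab` `frames` times and return captured frames per second."""
    start = time.perf_counter()
    captured = 0
    for _ in range(frames):
        # dxcam's grab() returns None when there is no new frame,
        # so only non-None results are counted as captured frames.
        if grab() is not None:
            captured += 1
    elapsed = time.perf_counter() - start
    return captured / elapsed
```

Absolute numbers depend heavily on the machine, resolution, and output color mode, so only before/after ratios measured on the same setup are meaningful.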

Agade09 added 2 commits April 15, 2023 10:04
In profiling a benchmark code this takes processing overhead from ~14% to ~1.4% and ~0.06% respectively
…ller than the whole screen.

The idea is to ask ctypes.string_at for as little memory as possible. Since images are stored in memory with width being the fast index, grabbing a 480x640 region from a 1440x2560 screen only requires asking ctypes.string_at() for a 480x2560 region. This reduces memory allocation and memcpy overhead in ctypes.string_at().
To grab a 480x640 region out of a 1440x2560 screen the profiler time spent went from ~24% to ~8%.
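A minimal sketch of the row-band idea, assuming a row-major BGRA buffer. A plain bytes object stands in for the mapped screen memory, and `grab_region` with its `(left, top, right, bottom)` region order is a hypothetical helper for illustration, not the PR's actual code.

```python
import numpy as np

def grab_region(screen_bytes, width, height, region):
    """Read only the rows covering `region`, then slice the columns.

    screen_bytes: bytes-like BGRA buffer of size height * width * 4
    region: (left, top, right, bottom) in pixels (assumed order)
    """
    left, top, right, bottom = region
    row_bytes = width * 4  # 4 bytes per BGRA pixel
    # Rows are stored one after another, so rows top..bottom form one
    # contiguous block. That block is all that has to be fetched; in this
    # sketch np.frombuffer simply views it instead of copying.
    band = np.frombuffer(
        screen_bytes,
        dtype=np.uint8,
        count=(bottom - top) * row_bytes,
        offset=top * row_bytes,
    ).reshape(bottom - top, width, 4)
    # Column selection is a strided view of the band.
    return band[:, left:right, :]
```

For a 480-row region of a 1440-row screen, the band covers a third of the buffer, which matches the reduced copy described above.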
@Agade09 Agade09 changed the title Performance improvement of numpy processor for BGRA->BGR and BGRA->RGB conversion Performance improvement of numpy processor Apr 15, 2023
Agade09 added 4 commits April 16, 2023 12:47
…ere its width region matches the screen's width.
…rom_address API instead. In profiling a 1440x2560 grab, total time spent went from 20% in string_at() to almost 0% in from_address. My understanding is that string_at uses memmove which is slower than the memcpy I suspect from_address uses.
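As a generic illustration of the two ctypes calls (not the code from this PR; both function names below are hypothetical): `ctypes.string_at` copies the bytes into a new Python object, while `from_address` wraps existing memory in a ctypes array without copying, so NumPy can view it directly.

```python
import ctypes
import numpy as np

def copy_with_string_at(address, size):
    # string_at allocates a new bytes object and copies `size` bytes into it.
    return np.frombuffer(ctypes.string_at(address, size), dtype=np.uint8)

def view_with_from_address(address, size):
    # from_address creates a ctypes array on top of existing memory;
    # nothing is copied here or in np.frombuffer.
    buf = (ctypes.c_ubyte * size).from_address(address)
    return np.frombuffer(buf, dtype=np.uint8)
```

A view like this is only valid while the underlying memory stays mapped, so any data that must outlive the mapping still has to be copied once before release.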
…that case self.region is used, and self.region was already validated when it was defined.

In profiling a max FPS benchmark with no region defined, this spares 3% of total execution time.
@Agade09 Agade09 changed the title Performance improvement of numpy processor Performance improvements Apr 17, 2023
Agade09 added 2 commits April 17, 2023 13:34
… if statement bypassed the call to self.process_cvtcolor(). Simplify code in process_cvtcolor since it no longer needs to handle 'BGRA'.

In profiling this spares 0.4% of total execution time in 'BGRA' mode.
…but it was producing a non-contiguous array, which changes the behavior of the library
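The truncated commit message does not say which operation produced the non-contiguous array. As a general illustration of the issue, dropping the alpha channel by slicing yields a strided view, and an explicit copy is needed to get a C-contiguous result:

```python
import numpy as np

# Small stand-in frame: height x width x BGRA
bgra = np.zeros((4, 6, 4), dtype=np.uint8)

# Slicing off alpha is free, but the result skips one byte per pixel,
# so it is not C-contiguous.
bgr_view = bgra[..., :3]

# One explicit copy restores a densely packed, C-contiguous array.
bgr_copy = np.ascontiguousarray(bgr_view)

print(bgr_view.flags["C_CONTIGUOUS"])  # False
print(bgr_copy.flags["C_CONTIGUOUS"])  # True
```

Code that passes frames to libraries expecting densely packed buffers may behave differently when handed the strided view, which is why contiguity counts as part of the library's behavior.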
@ninjatall12

Crazy optimisations, where did you get the experience to know how to improve this code?

@ra1nty ra1nty self-assigned this Apr 22, 2023
@ra1nty
Owner

ra1nty commented Apr 22, 2023

Thanks for the commit! Interesting optimization. Let me take a look and do some benchmarks & merge

@Agade09
Author

Agade09 commented Apr 29, 2023

@ninjatall12 Is this with dxcam in pure python or with AI-M-BOT's pyd ? Because I also have issues with the py310 binary he made.

PS: Thanks, I did a lot of competitions on Codingame and just got experience over the years at work and in personal projects. I was profiling a project of mine with pprofile and noticed that dxcam was spending a lot of time in ctypes.string_at() which just copies memory according to its description; I found this very suspicious and investigated.

@AI-M-BOT

@Agade09 with AI-M-BOT's pyd, i am using his python 3.11 binary. thanks for the quick reply. will let him know about this

would you share the part of your script taking screenshots? was that area using windll.user32.SetWindowDisplayAffinity?

@AI-M-BOT

image

reproduced, my bad

Also, I recommend using the grab function instead of start(): start() creates a new thread, which brings no performance benefit in Python.
Taking screenshots with DXGI repeatedly without sleeping will dramatically lower the game's FPS. Also, the sleep function the author used in this project is not precise enough (it sleeps for around 15 ms at least), so I recommend the following function:

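from ctypes import windll

def nanosleep(num: float) -> None:
    # num is in milliseconds; Sleep() only accepts whole milliseconds.
    windll.winmm.timeBeginPeriod(1)  # request 1 ms timer resolution
    try:
        windll.kernel32.Sleep(int(num))
    finally:
        # Always restore the previous timer resolution.
        windll.winmm.timeEndPeriod(1)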

@crackwitz

nanosleep

that's not a nanosleep. that gives you whole milliseconds at best. and it affects the kernel.

https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod

@AI-M-BOT

AI-M-BOT commented Apr 29, 2023

nanosleep

that's not a nanosleep. that gives you whole milliseconds at best. and it affects the kernel.

https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod

I know, I just need to sleep as precisely as possible. The function name means nothing, so why care about it?
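For comparison, a common portable alternative that nobody in this thread proposed is a hybrid sleep: sleep coarsely for most of the interval, then busy-wait on a high-resolution clock for the remainder. The sketch below is an assumption-laden illustration; `precise_sleep` and `spin_margin` are hypothetical names, and the margin has to be at least as large as the OS sleep overshoot for the deadline to be met.

```python
import time

def precise_sleep(seconds, spin_margin=0.002):
    """Sleep coarsely, then busy-wait the final `spin_margin` seconds."""
    deadline = time.perf_counter() + seconds
    coarse = seconds - spin_margin
    if coarse > 0:
        time.sleep(coarse)  # coarse wait; may overshoot on some systems
    while time.perf_counter() < deadline:
        pass  # spin on the high-resolution clock until the deadline
```

This trades CPU time during the spin for accuracy and does not change the system-wide timer resolution.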

@AI-M-BOT

@AI-M-BOT Thanks for the tip about sleeping but I still face this issue image after using camera.grab()

please use grab(region=region)

@Agade09
Author

Agade09 commented Apr 29, 2023

@AI-M-BOT Could you detail how you compile these .pyd? With Nuitka, I can't get binaries as fast as yours. My binaries also don't reproduce the bug ninjatall12 and I have been seeing.
I have been using python -m nuitka --lto=yes --module dxcam --include-package=dxcam.

@AI-M-BOT

AI-M-BOT commented Apr 29, 2023

should be fine now, just replace with new file
https://github.com/AI-M-BOT/DXcam/releases/tag/1.1

@AI-M-BOT

Create a file named dxshot.py (or whatever you prefer), copy the content of dxcam/__init__.py into dxshot.py, then run this command:
python -m nuitka --mingw64 --module --show-progress --no-pyi-file --remove-output --follow-import-to=dxcam dxshot.py

@AI-M-BOT

should be fine now, just replace with new file https://github.com/AI-M-BOT/DXcam/releases/tag/1.1

still the same image I get this error either using cam.grab(region=region) or cam.get_latest_frame()

are you using Python 3.9?
Is your testing script open source on github?

Exception ignored in: <function _compointer_base.__del__ at 0x000002A710E128B0>
Traceback (most recent call last):
  File "E:\embed_python\python39\Lib\site-packages\comtypes\__init__.py", line 956, in __del__
    self.Release()
  File "E:\embed_python\python39\Lib\site-packages\comtypes\__init__.py", line 1211, in Release
    return self.__com_Release()
OSError: exception: access violation writing 0xFFFFFFFFFFFFFFFF

I only got this, and the script still works.
I just tested all versions with pure grab() and get_latest_frame(); no issue on my laptop.

@ninjatall12

ninjatall12 commented May 2, 2023

Moved on to using C++ instead of Python since it better fits my use case. Massive performance bump and a lot less resource intensive. It was quite a pain to get OpenCV to work with DXGI, but I got there in the end.

@ParticleG

should be fine now, just replace with new file https://github.com/AI-M-BOT/DXcam/releases/tag/1.1

still the same image I get this error either using cam.grab(region=region) or cam.get_latest_frame()

are you using Python 3.9? Is your testing script open source on github?

Exception ignored in: <function _compointer_base.__del__ at 0x000002A710E128B0>
Traceback (most recent call last):
  File "E:\embed_python\python39\Lib\site-packages\comtypes\__init__.py", line 956, in __del__
    self.Release()
  File "E:\embed_python\python39\Lib\site-packages\comtypes\__init__.py", line 1211, in Release
    return self.__com_Release()
OSError: exception: access violation writing 0xFFFFFFFFFFFFFFFF

I only got this, and the script still works. I just tested all versions with pure grab() and get_latest_frame(); no issue on my laptop.

Hi, could you upload your version to PyPI or provide a way to reference it from requirements.txt? PyCharm doesn't like .pyd files and keeps complaining about module-not-found errors.

@lucasmonstrox

Moved on to using C++ instead of Python since it better fits my use case. Massive performance bump and a lot less resource intensive. It was quite a pain to get OpenCV to work with DXGI, but I got there in the end.

Is it possible for Python code to call your C++ code? Or to use it Cython-style?

@lucasmonstrox

Moved on to using C++ instead of Python since it better fits my use case. Massive performance bump and a lot less resource intensive. It was quite a pain to get OpenCV to work with DXGI, but I got there in the end.

Also, can you show me the CPU usage, memory usage and FPS?

Fidelxyz added a commit to Fidelxyz/DXCam-CPP that referenced this pull request Nov 8, 2023
Fidelxyz added a commit to Fidelxyz/DXCam-CPP that referenced this pull request Nov 8, 2023