Performance improvements #62

Agade09 · 2023-04-15T10:12:34Z

When profiling grab() on a 1440x2560 screen, the share of total time spent in ctypes.string_at() went from ~20% to ~0%.
A further optimization applies when grabbing a subset of the screen: for a 480x640 region of a 1440x2560 screen, only a contiguous band of 480x2560 pixels (4 bytes each) needs to be copied.
Overall FPS improvement on my machine when grabbing 1440x2560 in BGRA: ~271 FPS -> ~685 FPS.

In profiling a benchmark code this takes processing overhead from ~14% to ~1.4% and ~0.06% respectively

…ller than the whole screen. The idea is to ask ctypes.string_at for as little memory as possible. Since images are stored in memory with width as the fast index, if we want to grab a 480x640 region from a 1440x2560 screen we can ask ctypes.string_at() for a 480x2560 band of full-width rows. This reduces memory allocation and memcpy overhead in ctypes.string_at(). When grabbing a 480x640 region out of a 1440x2560 screen, the profiled time spent there went from ~24% to ~8%.
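A minimal sketch of the row-band idea described above, assuming a (left, top, right, bottom) region and rows stored without padding. The helper name `grab_band_then_crop` and the fake NumPy "screen" are hypothetical stand-ins, not dxcam's actual code.

```python
import ctypes
import numpy as np

BYTES_PER_PIXEL = 4  # BGRA


def grab_band_then_crop(address, screen_width, region):
    """Copy only the rows that intersect the region, then crop the columns.

    `region` is (left, top, right, bottom) in pixels. This helper and its
    signature are hypothetical and only illustrate the technique.
    """
    left, top, right, bottom = region
    pitch = screen_width * BYTES_PER_PIXEL  # bytes per row (assumes no row padding)
    rows = bottom - top
    # One contiguous copy covering the full-width rows top..bottom.
    band = ctypes.string_at(address + top * pitch, rows * pitch)
    band = np.frombuffer(band, dtype=np.uint8).reshape(rows, screen_width, BYTES_PER_PIXEL)
    # The column crop is a cheap view into the already copied band.
    return band[:, left:right, :]


# Stand-in for the screen memory: a small fake frame held in a NumPy array.
screen_h, screen_w = 8, 10
n = screen_h * screen_w * BYTES_PER_PIXEL
screen = (np.arange(n) % 256).astype(np.uint8).reshape(screen_h, screen_w, BYTES_PER_PIXEL)

region = (2, 3, 7, 6)
crop = grab_band_then_crop(screen.ctypes.data, screen_w, region)
```

Only `rows * pitch` bytes are copied instead of the whole frame, so the saving grows as the region gets shorter relative to the screen height.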

…tions of (90,180,270)

…ere its width region matches the screen's width.

…rom_address API instead. In profiling a 1440x2560 grab, total time spent went from 20% in string_at() to almost 0% in from_address. My understanding is that string_at uses memmove, which is slower than the memcpy I suspect from_address uses.
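To illustrate the difference between the two ctypes calls, here is a small sketch on a tiny fake buffer (the sizes and variable names are made up for the example). As far as the ctypes documentation goes, `from_address` builds an object over the existing memory without copying it, while `string_at` returns a new bytes object, which may explain most of the profiling gap.

```python
import ctypes
import numpy as np

HEIGHT, WIDTH, CHANNELS = 4, 6, 4  # tiny stand-in for a BGRA frame
size = HEIGHT * WIDTH * CHANNELS

# Stand-in for the mapped surface memory; in a real capture the pointer
# would come from the graphics API rather than from a ctypes array.
source = (ctypes.c_ubyte * size)(*range(size))
address = ctypes.addressof(source)

# Approach 1: string_at copies the bytes into a new Python bytes object.
copied = np.frombuffer(ctypes.string_at(address, size), dtype=np.uint8)
copied = copied.reshape(HEIGHT, WIDTH, CHANNELS)

# Approach 2: from_address wraps the existing memory without copying it.
view_buf = (ctypes.c_ubyte * size).from_address(address)
viewed = np.frombuffer(view_buf, dtype=np.uint8).reshape(HEIGHT, WIDTH, CHANNELS)

# Changing the source is visible through the view but not through the copy.
source[0] = 255
```

The trade-off is lifetime: a view made this way is only valid while the underlying memory stays mapped, so a copy is still needed if the frame has to outlive it.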

…that case self.region is used, and self.region was already validated when it was defined. In profiling a max FPS benchmark with no region defined, this spares 3% of total execution time.

… if statement bypassed the call to self.process_cvtcolor(). Simplify code in process_cvtcolor since it no longer needs to handle 'BGRA'. In profiling this spares 0.4% of total execution time in 'BGRA' mode.

…but it was producing a non-contiguous array, which changes the behavior of the library
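The contiguity issue mentioned in the revert can be reproduced with plain NumPy slicing. This is only a sketch on a dummy frame, assuming the conversion was done with channel slices like the ones below; the exact expressions used in the reverted commit are not shown in this thread.

```python
import numpy as np

# Dummy BGRA frame: 4 rows, 6 columns, 4 channels.
bgra = np.zeros((4, 6, 4), dtype=np.uint8)
bgra[..., 0] = 10   # B
bgra[..., 1] = 20   # G
bgra[..., 2] = 30   # R
bgra[..., 3] = 255  # A

# Slicing is nearly free because it only creates views...
bgr_view = bgra[..., :3]     # drop alpha
rgb_view = bgra[..., 2::-1]  # drop alpha and reverse channel order

# ...but those views are not C-contiguous.
bgr_contiguous = bgr_view.flags["C_CONTIGUOUS"]
rgb_contiguous = rgb_view.flags["C_CONTIGUOUS"]

# Forcing contiguity restores the old behavior at the cost of a copy.
rgb_fixed = np.ascontiguousarray(rgb_view)
```

Code that passes frames to libraries expecting contiguous memory can break or silently copy when handed such views, which is consistent with the decision to revert.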

ninjatall12 · 2023-04-22T19:51:50Z

Crazy optimisations, where did you get the experience to know how to improve this code?

ra1nty · 2023-04-22T22:00:55Z

Thanks for the commit! Interesting optimization. Let me take a look and do some benchmarks & merge

Agade09 · 2023-04-29T12:39:51Z

@ninjatall12 Is this with dxcam in pure Python or with AI-M-BOT's pyd? I ask because I also have issues with the py310 binary he made.

PS: Thanks, I did a lot of competitions on Codingame and just got experience over the years at work and in personal projects. I was profiling a project of mine with pprofile and noticed that dxcam was spending a lot of time in ctypes.string_at() which just copies memory according to its description; I found this very suspicious and investigated.

AI-M-BOT · 2023-04-29T16:16:09Z

> @Agade09 with AI-M-BOT's pyd, i am using his python 3.11 binary. thanks for the quick reply. will let him know about this

would you share the part of your script taking screenshots? was that area using windll.user32.SetWindowDisplayAffinity?

AI-M-BOT · 2023-04-29T16:30:30Z

reproduced, my bad

Also, I recommend using grab() instead of start(): start() creates a new thread, which brings no performance benefit (in Python).
Taking screenshots through DXGI repeatedly without sleeping will dramatically lower the game's FPS. Also, the sleep function the author used in this project is not precise enough (it sleeps for at least around 15 ms), so I recommend the following function:

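from ctypes import windll

def nanosleep(num: float) -> None:
    # num is in milliseconds; Sleep() only accepts a whole number of ms
    windll.winmm.timeBeginPeriod(1)  # request 1 ms timer resolution
    try:
        windll.kernel32.Sleep(int(num))
    finally:
        windll.winmm.timeEndPeriod(1)  # always restore the previous resolution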
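To check how precise a sleep actually is on a given machine, one option is to time it. The sketch below is portable and measures the standard `time.sleep()`; the helper name `measure_sleep_overshoot` is made up for this example, and results will vary by OS and Python version.

```python
import time


def measure_sleep_overshoot(requested_ms: float, repeats: int = 20) -> float:
    """Return the mean extra time (in ms) a sleep takes beyond the request."""
    total_ms = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        time.sleep(requested_ms / 1000.0)
        total_ms += (time.perf_counter() - start) * 1000.0
    return total_ms / repeats - requested_ms


if __name__ == "__main__":
    for ms in (1, 5, 15):
        overshoot = measure_sleep_overshoot(ms, repeats=5)
        print(f"requested {ms} ms, mean overshoot {overshoot:.3f} ms")
```

Swapping the `time.sleep()` call for the function above (on Windows) would show whether the 1 ms timer resolution makes a measurable difference.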

crackwitz · 2023-04-29T16:40:46Z

> nanosleep

that's not a nanosleep. that gives you whole milliseconds at best. and it affects the kernel.

https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod

AI-M-BOT · 2023-04-29T17:05:59Z

> > nanosleep
>
> that's not a nanosleep. that gives you whole milliseconds at best. and it affects the kernel.
>
> https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod

i know, i just need to sleep as precisely as possible. my function name means nothing, why care about it

AI-M-BOT · 2023-04-29T17:08:18Z

> @AI-M-BOT Thanks for the tip about sleeping but i still face this issue after using camera.grab()

please use grab(region=region)

Agade09 · 2023-04-29T17:20:38Z

@AI-M-BOT Could you detail how you compile these .pyd? With Nuitka, I can't get binaries as fast as yours. My binaries also don't reproduce the bug ninjatall12 and I have been seeing.
I have been using python -m nuitka --lto=yes --module dxcam --include-package=dxcam.

AI-M-BOT · 2023-04-29T18:25:45Z

should be fine now, just replace with new file
https://github.com/AI-M-BOT/DXcam/releases/tag/1.1

AI-M-BOT · 2023-04-29T20:54:15Z

create a file named dxshot.py or whatever you prefer, copy the content of dxcam/__init__.py into dxshot.py, then run this command:
python -m nuitka --mingw64 --module --show-progress --no-pyi-file --remove-output --follow-import-to=dxcam dxshot.py

AI-M-BOT · 2023-04-29T20:56:59Z

> > should be fine now, just replace with new file https://github.com/AI-M-BOT/DXcam/releases/tag/1.1
>
> still the same I get this error either using cam.grab(region=region) or cam.get_latest_frame()

are you using Python 3.9?
Is your testing script open source on github?

Exception ignored in: <function _compointer_base.__del__ at 0x000002A710E128B0>
Traceback (most recent call last):
  File "E:\embed_python\python39\Lib\site-packages\comtypes\__init__.py", line 956, in __del__
    self.Release()
  File "E:\embed_python\python39\Lib\site-packages\comtypes\__init__.py", line 1211, in Release
    return self.__com_Release()
OSError: exception: access violation writing 0xFFFFFFFFFFFFFFFF

I only got this, and the script still works.
I just tested all versions with pure grab() and get_latest_frame(), no issue on my laptop

ninjatall12 · 2023-05-02T09:06:07Z

Moved on to using C++ instead of Python since it better fits my use case. Massive performance bump and a lot less resource intensive. It was quite a pain to get OpenCV to work with DXGI, but I got there in the end.

ParticleG · 2023-06-09T10:06:28Z

> > > should be fine now, just replace with new file https://github.com/AI-M-BOT/DXcam/releases/tag/1.1
> >
> > still the same I get this error either using cam.grab(region=region) or cam.get_latest_frame()
>
> are you using Python 3.9? Is your testing script open source on github?
> Exception ignored in: <function _compointer_base.__del__ at 0x000002A710E128B0>
> Traceback (most recent call last):
>   File "E:\embed_python\python39\Lib\site-packages\comtypes\__init__.py", line 956, in __del__
>     self.Release()
>   File "E:\embed_python\python39\Lib\site-packages\comtypes\__init__.py", line 1211, in Release
>     return self.__com_Release()
> OSError: exception: access violation writing 0xFFFFFFFFFFFFFFFF
> I only got this and script still works. I just tested all versions with pure grab() and get_latest_frame(), no issue on my laptop

Hi, could you upload your version to PyPI or provide a way to reference it from requirements.txt? PyCharm doesn't like .pyd files and keeps complaining about module-not-found errors.

lucasmonstrox · 2023-06-18T10:18:20Z

> Moved on to using c++ instead of python since it better fits my usage case. massive performance bump and a lot less resource intensive was quite a pain to get opencv to work with dxgi but got there in the end.

Is it possible to call your C++ code from Python? Or to use it Cython-style?

lucasmonstrox · 2023-06-18T10:21:21Z

> Moved on to using c++ instead of python since it better fits my usage case. massive performance bump and a lot less resource intensive was quite a pain to get opencv to work with dxgi but got there in the end.

Also, can you show me the CPU usage, memory usage and FPS?

Agade09 added 2 commits April 15, 2023 10:04

Use numpy directly for BGRA->RGB and BGRA->BGR conversion.

a70ab3a

In profiling a benchmark code this takes processing overhead from ~14% to ~1.4% and ~0.06% respectively

Agade09 changed the title ~~Performance improvement of numpy processor for BGRA->BGR and BGRA->RGB conversion~~ Performance improvement of numpy processor Apr 15, 2023

Agade09 added 4 commits April 16, 2023 12:47

Fixed regression where numpy_processor was no longer correct for rota…

2833caf

…tions of (90,180,270)

Fixed not applying performance optimization if a region is defined wh…

94bec3f

…ere its width region matches the screen's width.

Don't call _validate_region in grab if the region is None because in …

87a8117

…that case self.region is used, and self.region was already validated when it was defined. In profiling a max FPS benchmark with no region defined, this spares 3% of total execution time.

Agade09 changed the title ~~Performance improvement of numpy processor~~ Performance improvements Apr 17, 2023

Agade09 added 2 commits April 17, 2023 13:34

Translate 'BGRA' color_mode to None in NumpyProcessor so the existing…

bf02b69

… if statement bypassed the call to self.process_cvtcolor(). Simplify code in process_cvtcolor since it no longer needs to handle 'BGRA'. In profiling this spares 0.4% of total execution time in 'BGRA' mode.

Revert using numpy for conversion of BGRA->BGR/RGB. Numpy was faster …

204a558

…but it was producing a non-contiguous array, which changes the behavior of the library

ra1nty self-assigned this Apr 22, 2023

ninjatall12 mentioned this pull request Apr 23, 2023

Performance improvements AI-M-BOT/DXcam#1

Merged

Fidelxyz added a commit to Fidelxyz/DXCam-CPP that referenced this pull request Nov 8, 2023

Apply performance improve proposed in ra1nty/DXcam#62

9e9a953

Fidelxyz added a commit to Fidelxyz/DXCam-CPP that referenced this pull request Nov 8, 2023

Apply optimization proposed in ra1nty/DXcam#62

6f7a97d
