feat: add Python bindings #108
base: main
Conversation
Should we consider pairing each function with a GIL-less counterpart, like
But more importantly, is
Perhaps we should also consider moving the
Hello @winstxnhdw, I tried installing py-gxhash via pip using your command, but I get this error:
Am I missing something? I am using Python 3.13.
Gxhash has no shared or global state. Any state is internal to the method scope (thus thread-local). The input data is passed by reference, so it must not be mutated while gxhash is reading from it, or the produced hash will be undefined, but it won't crash or hang. You can indeed drop the GIL if you wish.
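Concretely, that property means a shared, read-only buffer can be hashed from several Python threads at once. Below is a minimal sketch, assuming the binding releases the GIL and exposes a `gxhash64(data, seed)` signature (both are assumptions about this PR's API):

```python
import threading

import gxhash  # assumed module name from this PR

data = open("model.bin", "rb").read()  # shared buffer, never mutated
results = [None] * 4

def work(index: int) -> None:
    # Safe: gxhash keeps all state local to the call and only reads `data`.
    results[index] = gxhash.gxhash64(data, 0)  # assumed (data, seed) signature

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

assert len(set(results)) == 1  # same input, same seed, same hash
```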
Looks like your shell is not escaping the URL correctly. I have updated the installation instructions. This one should work:

```sh
pip install "gxhash @ git+https://github.com/winstxnhdw/gxhash.git#subdirectory=py-gxhash"
```
force-pushed from d661b7d to f9beeef
force-pushed from 1a06c13 to 4e66ccd
I noticed that the docs use the specific phrase
The current async function is more expensive than running it synchronously, even for 5 GB files. Using nogil + Python threads is more than 20x faster. Perhaps instead of spawning a tokio thread, we should pass a Python thread to Rust instead. Also, it's really interesting that repeated calls are orders of magnitude faster than the first call. I guess this is the cache being populated and then hit. With so much going on in Python, I always assumed that the cache would be evicted quickly.
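As a side note on the warm-up effect, the cache hypothesis is easy to eyeball by timing the same call back to back. This is only a sketch; `gxhash.gxhash64(data, seed)` is an assumed signature:

```python
import time

import gxhash  # assumed module name

data = open("large.bin", "rb").read()

for attempt in range(3):
    start = time.perf_counter()
    gxhash.gxhash64(data, 0)  # assumed (data, seed) signature
    print(f"call {attempt}: {time.perf_counter() - start:.4f}s")

# The first call is typically the slowest; later calls benefit from the
# buffer already sitting in CPU and page caches.
```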
Indeed, it almost did! Thank you :).
I don't think it makes a lot of sense to expose an async version of gxhash. Async / await is usually reserved for network-bound, or sometimes even disk-bound, operations so that a thread is not frozen while waiting for the response. Here, gxhash is purely CPU-bound. Suppose in a given context the hashing time is substantial enough that you think one may want to execute it asynchronously. In that case, it also means there is a substantial amount of CPU work delegated to a threadpool thread (from tokio or Python), which is not something you want (it may cause threadpool starvation). In such a case, what you want is to delegate the work to a dedicated background thread, and you wouldn't use async / await (instead, invoke a thread and send / consume work via a queue, for instance).
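A minimal standard-library sketch of that dedicated-background-thread pattern (work sent and consumed through queues, no async / await), with the `gxhash.gxhash64(data, seed)` call again being an assumption:

```python
import queue
import threading

import gxhash  # assumed module name

jobs = queue.Queue()     # producer -> worker
results = queue.Queue()  # worker -> consumer

def worker() -> None:
    while True:
        data = jobs.get()
        if data is None:  # sentinel value shuts the worker down
            break
        results.put(gxhash.gxhash64(data, 0))  # assumed (data, seed) signature

threading.Thread(target=worker, daemon=True).start()

jobs.put(b"some payload")  # enqueue work from anywhere, no await needed
print(results.get())       # block only where the result is actually needed
jobs.put(None)
```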
It is correct; however, this documentation is on the
It's really awesome! I was surprised to see how little it would take locally to make this work, congrats! At this point I am wondering whether the future of this belongs in a subfolder (as you did) or in a dedicated repository, the reasons being:
If you like this second option, please let me know. That would imply transferring this repo to a "gxhash" organization and naming it
Ah, right. I'll be sure to handle that.
The reason for exposing an async version is that I'd imagine
I am not exactly a software design expert on this topic, so if you still think it doesn't make sense, let's discuss this further. There are very few resources on this topic, especially in the context of Python, and I am always looking to broaden my understanding.
You also used the same description here. Also, do you have plans to implement larger-than-memory hashing? After confirming that
I don't mind this, but personally, I think there's some beauty and credibility in having a monolith of a library with all the bindings in one place, similar to
Let's take your example:
As a staff backend engineer I do have some experience with this kind of topic, but don't just take my word for it; there are some resources online on this subject. Here are some:
Now if you read from the disk, or compute a hash block by block, then it's another story. I'd likely use async / await to read from the disk or to wait for incoming request bytes, and thus hash on the go. In that situation, each chunk's hash operation isn't going to block as much as in our first example, and thus it might be acceptable to do it synchronously.
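As a sketch of that hash-on-the-go scenario, assuming the same hypothetical `gxhash.gxhash64(data, seed)` binding: the coroutine awaits the network bytes (the genuinely slow part) and hashes each chunk synchronously as it arrives.

```python
import asyncio

import gxhash  # assumed module name

async def hash_stream(host: str, port: int) -> list[int]:
    reader, writer = await asyncio.open_connection(host, port)
    chunk_hashes = []
    try:
        while chunk := await reader.read(64 * 1024):  # awaiting I/O: non-blocking
            chunk_hashes.append(gxhash.gxhash64(chunk, 0))  # CPU-bound but brief
    finally:
        writer.close()
        await writer.wait_closed()
    return chunk_hashes
```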
Oh indeed! We must change that 👍
In Rust, that's one of the purposes of
Thank you for taking the time to write something up. I am still confused: can only CPU-bound functions cause thread pool starvation? In Python, not all IO-bound functions have an async variant. For example:

```python
with open("model.bin", "rb") as f:
    file_bytes = f.read()
```

Performing such an operation would block the event loop, and one way to avoid blocking the event loop would be to run
In any case, it would be best to remove the
Also,
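One standard way to keep the loop responsive in this situation is `asyncio.to_thread` (available since Python 3.9), which moves both the blocking read and the CPU-bound hash off the event loop. A minimal sketch, with `gxhash.gxhash64(data, seed)` again an assumed signature:

```python
import asyncio

import gxhash  # assumed module name

def read_and_hash(path: str) -> int:
    # Both the blocking read and the CPU-bound hash run off the event loop.
    with open(path, "rb") as f:
        return gxhash.gxhash64(f.read(), 0)  # assumed (data, seed) signature

async def main() -> None:
    digest = await asyncio.to_thread(read_and_hash, "model.bin")
    print(digest)

asyncio.run(main())
```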
I will wait for you to change the description and then align my docstrings with yours. Also, I am wondering if you plan to push this line to
I think Python generators would work well here.
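For illustration, a generator can lazily feed fixed-size chunks without ever holding the whole file in memory. This only sketches the chunking side: the bindings in this PR hash a complete buffer, so true larger-than-memory hashing would still need streaming support in gxhash itself. The `gxhash.gxhash64` call is an assumption.

```python
from typing import Iterator

import gxhash  # assumed module name

def file_chunks(path: str, size: int = 8 * 1024 * 1024) -> Iterator[bytes]:
    # Lazily yields fixed-size chunks so the file never has to fit in memory.
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk

# One hash per chunk; combining them into a single digest is left open.
hashes = [gxhash.gxhash64(chunk, 0) for chunk in file_chunks("model.bin")]
```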
I would humbly disagree, or at least say that it would only be acceptable with some caveats; this approach would improve p9x latency by blocking the event loop for a shorter amount of time per call, but the total time the event loop remains "unavailable" to handle events would remain the same (well, actually it would increase because of the function call and FFI overhead combined with the decreased gxhash-side performance of hashing smaller chunks at a time), thereby reducing total throughput.

In my opinion though, this isn't something that the library itself should necessarily manage (though I am not a Pythonista, and I am talking from the general perspective such as that taken in async Rust or async C#, both of which I am considerably more familiar with), only because of the number of factors and permutations that have to be taken into account, all of which the downstream caller integrating gxhash into their project would know more about. The subtleties here lie greatly in the specific application.

If you are hashing content that fits comfortably in memory without causing GC issues, or if the final content is going to be stored in memory regardless (i.e. is not being streamed to disk or network), then it might make more sense (given how fast gxhash itself is) to buffer the entirety of the content, then make one threadpool-backed call to gxhash that doesn't block the event loop, runs at max gxhash performance, and minimizes the function call / FFI overhead.

But if you are working on streaming content that won't be stored (contiguously!) in memory thereafter, then you're obviously going to have to call gxhash per some smaller-sized slice at a time. In that case, the question of whether this should be done on a threadpool thread or directly from the async routine is a much more complicated one that would involve knowledge of the size of the input (i.e. if you are hashing 16 bytes at a time, the threadpool overhead, context switches, cache misses, etc. are going to exceed the benefits of not blocking the event loop), while if you're still processing decently-sized chunks at a time, it might make sense to spin up a threadpool task to handle the CPU-intensive work.

But I don't know what the accepted level of "handholding" here is in the Python modules community (or even if Python devs are interested in optimizations at this level to begin with, though what I wrote above assumes at least some are).
I have always believed that good API design should include sane defaults while also giving the user the option to build their own specialised solution from scratch, and FWIW, I care deeply about such optimisations. I also think providing the user with an
I am not sure what could be much worse than blocking the event loop. When the event loop is blocked, no one can load your web page, your Prometheus service will no longer be able to scrape for metrics, and your monitoring service sends an alert to everyone that your service is down. Of course, the other option is for users to push it to a task queue, but on a sufficiently large cluster with a known number of users, is there really a need to force the user to pay the development complexity and overhead cost of using a task queue?
Also, in Python, all synchronous functions are blocking. This means that hashing every chunk synchronously would still block the event loop until all chunks have been completely hashed. Of course, this is not the case if you are streaming the bytes in from the web or a socket.
The implication is that you are still blocking the event loop while it does the tasks I mentioned, e.g. one heavily optimized FFI call to gxhash to hash a 4-byte input might actually block the event loop less than the bookkeeping that same event loop thread has to do (synchronously) to send and retrieve the result to an off-thread worker for such a low-latency operation. And if you expand the scope past microbenchmarks, you need to take into account whether or not the "other thread" will cohabit a core that's already servicing another event loop for your application, the thrashing of multiple threads' L1 cache lines, etc. But again, this is only generally speaking for typical async event loops, without possessing arcane knowledge of the minutiae of Python scheduler overhead, CPU core selection, etc., as that is quite outside my wheelhouse.
In the context of Python, once you cross the FFI boundary and drop the GIL, everything from then on is effectively non-blocking, unless you mean that the hashing operation saturates the CPU.
Summary

This PR adds Python bindings with type hints for `gxhash32`, `gxhash64`, and `gxhash128`. Only `cargo` is required. Currently, it's able to hash a 5 GB file in 0.7s. Closes #97.
Demo

```sh
pip install "gxhash @ git+https://github.com/winstxnhdw/gxhash.git#subdirectory=py-gxhash"
```

Todo

- pypi