reproducing benchmark results #32
I'm seeing similar performance to yours:

It could be that the …
I finally came around to implementing the optimal string alignment distance in RapidFuzz and comparing the performance. I expected it to be faster due to the better time complexity (…)
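For reference, the optimal string alignment (OSA) distance mentioned here is the "restricted" variant of Damerau-Levenshtein. A minimal pure-Python sketch of the textbook `O(m*n)` dynamic program might look like the following; this is only an illustration, not RapidFuzz's actual (presumably much faster) implementation, and the function name `osa_distance` is made up:

```python
def osa_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    # d[i][j] holds the OSA distance between a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent characters. Unlike the full
            # Damerau-Levenshtein distance, no substring may be edited
            # more than once, hence "restricted".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]

# OSA and true Damerau-Levenshtein disagree on e.g. "ca" vs. "abc":
assert osa_distance("ca", "abc") == 3  # the unrestricted DL distance is 2
```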
I honestly don't really have the time to maintain this library because I'm no longer using it in my research, so I'm admittedly not quite sure why performance is dropping off like this. At the time I wrote it, it was by far the most performant DL distance library for Python (and the only one that supported Unicode), but this is clearly not the case anymore.
I tried to reproduce the benchmarks in your readme. However, my results from running the same benchmark differ greatly from the results you achieved. Note that I scaled the benchmarks down from `500000` to `5000` iterations, since this is enough to get a good idea of the performance difference without spending all day running the benchmark.

On Python 3.9 I get:

and on Python 2.7:
So the performance ratios in your benchmarks appear to be off by about one order of magnitude for both `pyxdameraulevenshtein <-> Michael Homer's implementation` and `pyxdameraulevenshtein <-> difflib`. My best guess would be that the Python version you created them on (they have existed since the start, when the library still supported Python 2.4+) was significantly slower than more recent versions of Python. It would probably make sense to redo these benchmarks on a more recent version of Python; a rough sketch of such a scaled-down harness follows below.
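For anyone re-running the numbers, a scaled-down, version-tagged harness along these lines could be used. This is a hypothetical sketch, not the README's actual benchmark: the test strings and the `report` helper are made up, and it assumes the package's documented `damerau_levenshtein_distance` entry point. Note that `difflib.SequenceMatcher` computes a similarity ratio rather than an edit distance, so it is only a relative speed comparison:

```python
# Hypothetical scaled-down benchmark (5000 iterations instead of 500000).
from __future__ import print_function  # so the same script runs on Python 2.7

import difflib
import sys
import timeit

from pyxdameraulevenshtein import damerau_levenshtein_distance

N = 5000
A, B = "pyxdameraulevenshtein", "pyxdamerau-levenshtein"  # made-up inputs


def report(label, func):
    # Tag each timing with the interpreter version so runs on different
    # Pythons can be compared side by side.
    elapsed = timeit.timeit(func, number=N)
    print("Python {0}  {1}: {2:.3f}s / {3} calls".format(
        sys.version.split()[0], label, elapsed, N))


report("pyxdameraulevenshtein", lambda: damerau_levenshtein_distance(A, B))
report("difflib.SequenceMatcher",
       lambda: difflib.SequenceMatcher(None, A, B).ratio())
```

Running the same script under each interpreter (e.g. `python2.7 bench.py` and `python3.9 bench.py`) would make the version-to-version comparison suggested above directly visible in the output.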