Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ported implementation #10

Open
MaxGraey opened this issue Jun 4, 2019 · 11 comments
Open

Improve ported implementation #10

MaxGraey opened this issue Jun 4, 2019 · 11 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@MaxGraey
Copy link

MaxGraey commented Jun 4, 2019

I guess this wasm version is port of levenshtein-js. Wdyt about port fast-levenshtein instead? It looks like much simpler and faster and don't require store full matrix.

@kyranet
Copy link
Owner

kyranet commented Jun 4, 2019

It still uses arrays, though fast-levenshtein re-uses them:

https://github.com/hiddentao/fast-levenshtein/blob/9c3c4aa3e19f8faf36b2d5df9884532563332094/levenshtein.js#L11-L12

Also the benchmarks show it's slower:

leven benchmarks:
         165,926 op/s » leven
         164,398 op/s » talisman
           1,044 op/s » levenshtein-edit-distance
             628 op/s » fast-levenshtein
             497 op/s » levenshtein-component
             195 op/s » ld
             190 op/s » levenshtein
             168 op/s » levdist
              10 op/s » natural
fast-levenshtein benchmarks:
Running suite Implementation comparison [benchmark/speed.js]...
>> levenshtein-edit-distance x 234 ops/sec ±3.02% (73 runs sampled)
>> levenshtein-component x 422 ops/sec ±4.38% (83 runs sampled)
>> levenshtein-deltas x 283 ops/sec ±3.83% (78 runs sampled)
>> natural x 255 ops/sec ±0.76% (88 runs sampled)
>> levenshtein x 180 ops/sec ±3.55% (86 runs sampled)
>> fast-levenshtein x 1,792 ops/sec ±2.72% (95 runs sampled)
Benchmark done.
Fastest test is fast-levenshtein at 4.2x faster than levenshtein-component
js-levenshtein benchmarks:
                      50 paragraphs, length max=500 min=240 avr=372.5
             162 op/s » js-levenshtein
              98 op/s » talisman
              94 op/s » levenshtein-edit-distance
              85 op/s » leven
              39 op/s » fast-levenshtein
                      100 sentences, length max=170 min=6 avr=57.5
           3,076 op/s » js-levenshtein
           2,024 op/s » talisman
           1,817 op/s » levenshtein-edit-distance
           1,633 op/s » leven
             800 op/s » fast-levenshtein
                      2000 words, length max=20 min=3 avr=9.5
           3,119 op/s » js-levenshtein
           2,416 op/s » talisman
           2,141 op/s » levenshtein-edit-distance
           1,855 op/s » leven
           1,260 op/s » fast-levenshtein

I'm not sure why each benchmark's results differ, and why some do so much, I have yet to benchmark all of them together. @vladfrangu has been working on benchmarks for this module, those are the initial results (before rewriting to Leven's implementation):

levenshtein-wasm benchmarks:
levenshtein-wasm x 710 ops/sec ±6.04% (70 runs sampled)
js-levenshtein x 1,498 ops/sec ±0.38% (89 runs sampled)
talisman x 1,484 ops/sec ±0.85% (92 runs sampled)
levenshtein-edit-distance x 1,281 ops/sec ±0.16% (93 runs sampled)
leven x 1,360 ops/sec ±0.14% (93 runs sampled)
fast-levenshtein x 1,008 ops/sec ±0.44% (91 runs sampled)
Fastest is js-levenshtein
Running paragraph
levenshtein-wasm x 49.98 ops/sec ±0.06% (63 runs sampled)
js-levenshtein x 80.81 ops/sec ±3.30% (68 runs sampled)
talisman x 52.76 ops/sec ±5.84% (66 runs sampled)
levenshtein-edit-distance x 46.37 ops/sec ±3.16% (59 runs sampled)
leven x 51.51 ops/sec ±3.20% (63 runs sampled)
fast-levenshtein x 29.58 ops/sec ±2.66% (51 runs sampled)
Fastest is js-levenshtein
Running Sentence
levenshtein-wasm x 865 ops/sec ±1.96% (82 runs sampled)
js-levenshtein x 1,446 ops/sec ±2.64% (87 runs sampled)
talisman x 1,027 ops/sec ±4.12% (82 runs sampled)
levenshtein-edit-distance x 906 ops/sec ±3.03% (88 runs sampled)
leven x 1,042 ops/sec ±2.22% (89 runs sampled)
fast-levenshtein x 632 ops/sec ±1.19% (89 runs sampled)
Fastest is js-levenshtein
levenshtein-wasm in leven's benchmark suite benchmarks:

image

Consistently, js-levenshtein ends up being the fastest in three benchmark suites, I don't know why the benchmarks mark the library they're on as the fastest, I guess I'll put a graph like deno does, while warming up the code before actually benchmarking this, and running them in separate threads (Web Workers? Child processes?) so the HEAP of one benchmark doesn't affect the others (I noticed in Node.js 6 that in any benchmark, the first function runs faster than if it were the second or the third), thus unmasking the true performance behind them. Thoughts?


I think I will wait for #2 to be finished and ready before we can make assumptions about what's faster, specially since one of Wasm's strongest points is predictable performance, and some benchmark suites don't warm V8 to fire up TurboFan, optimize the code, and make a correct performance comparison.

@kyranet kyranet added good first issue Good for newcomers help wanted Extra attention is needed labels Jun 4, 2019
@vladfrangu
Copy link
Contributor

I have finished my benchmarking PR, #2, which adds the benchmarks to the Azure Pipelines, and while everything does work, we are missing the function that allows us to remove the allocated strings from memory.. Due to that, we cannot benchmark this "properly" yet.


The stacktrace for the error is as follows:

yarn run v1.13.0
$ node benchmark/benchmark
Running Word
wasm-10b093f2:1



RuntimeError: memory access out of bounds
    at wasm-function[3]:173
    at Object.wrap [as levenshtein] (C:\Users\Vlad\Desktop\Development\levenshtein-wasm\dist\loader.js:157:16)
    at levenshtein (C:\Users\Vlad\Desktop\Development\levenshtein-wasm\dist\index.js:12:23)
    at wordBench (C:\Users\Vlad\Desktop\Development\levenshtein-wasm\benchmark\benchmark.js:1172:9)
    at Benchmark.<anonymous> (C:\Users\Vlad\Desktop\Development\levenshtein-wasm\benchmark\benchmark.js:1188:36)
    at Benchmark.eval [as compiled] (eval at createCompiled (C:\Users\Vlad\Desktop\Development\levenshtein-wasm\benchmark\node_modules\benchmark\benchmark.js:1725:16), <anonymous>:5:124)
    at clock (C:\Users\Vlad\Desktop\Development\levenshtein-wasm\benchmark\node_modules\benchmark\benchmark.js:1658:29)
    at cycle (C:\Users\Vlad\Desktop\Development\levenshtein-wasm\benchmark\node_modules\benchmark\benchmark.js:2007:49)
    at C:\Users\Vlad\Desktop\Development\levenshtein-wasm\benchmark\node_modules\benchmark\benchmark.js:2061:37
    at Timeout._onTimeout (C:\Users\Vlad\Desktop\Development\levenshtein-wasm\benchmark\node_modules\lodash\lodash.js:2756:43)

If it's an error in the current code, please let us know or leave a PR, I'm sure @kyranet wouldn't mind! After this is fixed (and no other errors creep up), we can give this a proper benchmark. 👍

@kyranet
Copy link
Owner

kyranet commented Jun 5, 2019

CI on benchmark (#2).

@MaxGraey
Copy link
Author

MaxGraey commented Jun 5, 2019

Yeah, it seems leven and js-levenshtein should be faster. Ok, did you think about approximations like ukkonen?

@vladfrangu This strange. But I recommend use buddy allocator currently. tlsf in master have some bugs which already fixed in dev branch

@kyranet
Copy link
Owner

kyranet commented Jun 5, 2019

Uhm, not a fan of multiple memory allocations, that approach is very complex and could be documented more. But I'm up to porting it if it's the fastest approach, I didn't know about that one!

Also I think the memory issue @vladfrangu is reporting is related to this:

return wasmModule.levenshtein(wasmModule.newString(a), wasmModule.newString(b));

As it's creating new strings into the Wasm memory, but never freeing them, unless I'm missing something 😅

@kyranet
Copy link
Owner

kyranet commented Jun 5, 2019

By the way, does AssemblyScript give any mechanism to create a value array (without pointers to memory)? I think dropping memory allocation, deallocation, and accesses, would allow further optimizations, speeding up operations substantially.

@MaxGraey
Copy link
Author

MaxGraey commented Jun 5, 2019

As it's creating new strings into the Wasm memory, but never freeing them, unless I'm missing something

Yes, if you alloc new string on js side you should free it.

@kyranet You could work with raw linear memory directly. Like in game-of-life example

@kyranet
Copy link
Owner

kyranet commented Jun 5, 2019

Yes, if you alloc new string on js side you should free it.

That's what I want to do, the new docs show uses of module.__retain, module.__allocString, and module.__release in the loader basics:

var str = myModule.__retain(myModule.__allocString("my string"));
var foo = new myModule.Foo(str);
// do something with foo
myModule.__release(foo);
myModule.__release(str);

But those are for the dev branch, __allocString is defined and maybe __retain too, but I don't see any occurrence of __release. And I don't know about the master branch, which this module uses.

You could work with raw linear memory directly. Like in game-of-life example

Checked it, it uses load/store, which I already use, is this the right approach? For somebody who has worked with ARM a little, those names sound like they operate with the RAM memory/HEAP, which I want to avoid.

@MaxGraey
Copy link
Author

MaxGraey commented Jun 5, 2019

Checked it, it uses load/store, which I already use, is this the right approach? For somebody who has worked with ARM a little, those names sound like they operate with the RAM memory/HEAP, which I want to avoid.

You can't avoid this especially when working with strings. Wasm hasn't shadow stack, only locals & globals and linear memory (heap)

@kyranet
Copy link
Owner

kyranet commented Jun 6, 2019

@vladfrangu's benchmark PR got merged! The reports from Azure Pipelines:

Benchmark Results
$ node benchmark/benchmark
Running Word
levenshtein-wasm x 473 ops/sec ±1.01% (86 runs sampled)
js-levenshtein x 2,176 ops/sec ±0.92% (89 runs sampled)
talisman x 1,846 ops/sec ±1.00% (91 runs sampled)
levenshtein-edit-distance x 1,567 ops/sec ±1.23% (89 runs sampled)
leven x 1,532 ops/sec ±1.10% (89 runs sampled)
fast-levenshtein x 1,307 ops/sec ±1.02% (89 runs sampled)
Fastest is js-levenshtein
Running paragraph
levenshtein-wasm x 21.20 ops/sec ±1.56% (39 runs sampled)
js-levenshtein x 104 ops/sec ±0.96% (74 runs sampled)
talisman x 68.87 ops/sec ±1.07% (69 runs sampled)
levenshtein-edit-distance x 59.59 ops/sec ±1.24% (61 runs sampled)
leven x 60.67 ops/sec ±0.86% (62 runs sampled)
fast-levenshtein x 41.88 ops/sec ±1.67% (55 runs sampled)
Fastest is js-levenshtein
Running Sentence
levenshtein-wasm x 435 ops/sec ±0.85% (89 runs sampled)
js-levenshtein x 1,932 ops/sec ±1.01% (91 runs sampled)
talisman x 1,445 ops/sec ±0.79% (92 runs sampled)
levenshtein-edit-distance x 1,139 ops/sec ±1.20% (88 runs sampled)
leven x 1,157 ops/sec ±1.07% (90 runs sampled)
fast-levenshtein x 856 ops/sec ±1.12% (87 runs sampled)
Fastest is js-levenshtein
Done in 116.09s.

The newest master has the latest code from AssemblyScript, I also noticed the output code grew up a lot (from around ~450 lines to ~2600 lines), however, this __retains the string and __releases it, so there are no memory issues anymore:

const ptrB = wasmModule.__retain(wasmModule.__allocString(b));
const ptrA = wasmModule.__retain(wasmModule.__allocString(a));
try {
return wasmModule.levenshtein(ptrA, ptrB);
} finally {
wasmModule.__release(ptrA);
wasmModule.__release(ptrB);
}

I will port js-levenshtein's algorithm (since it's the fastest), I have the feeling the JavaScript side is being a huge bottleneck, any thoughts?

@kyranet
Copy link
Owner

kyranet commented Jun 7, 2019

Update: I have ported js-levenshtein's algorithm (since it's the fastest) in 7c86572, ukkonen measures Edit Distance and levenshtein... well... Levenshtein Distance. They are different algorithms with different usecases, the former is faster for long strings whereas the latter is faster for short strings, so I decided to stick with Levenshtein, maybe in the future, if this gets performant enough, I'll port Ukkonen as a separate library 👍

Benchmark Results
Running Word
levenshtein-wasm x 623 ops/sec ±0.73% (92 runs sampled)
js-levenshtein x 1,815 ops/sec ±0.41% (93 runs sampled)
talisman x 1,666 ops/sec ±1.66% (90 runs sampled)
levenshtein-edit-distance x 1,441 ops/sec ±0.67% (88 runs sampled)
leven x 1,483 ops/sec ±0.63% (91 runs sampled)
fast-levenshtein x 1,055 ops/sec ±2.23% (83 runs sampled)
Fastest is js-levenshtein
Running paragraph
levenshtein-wasm x 34.44 ops/sec ±1.26% (59 runs sampled)
js-levenshtein x 72.18 ops/sec ±0.41% (72 runs sampled)
talisman x 56.72 ops/sec ±1.41% (60 runs sampled)
levenshtein-edit-distance x 55.15 ops/sec ±1.32% (70 runs sampled)
leven x 57.97 ops/sec ±0.33% (73 runs sampled)
fast-levenshtein x 35.40 ops/sec ±0.32% (61 runs sampled)
Fastest is js-levenshtein
Running Sentence
levenshtein-wasm x 779 ops/sec ±0.34% (93 runs sampled)
js-levenshtein x 1,630 ops/sec ±0.18% (95 runs sampled)
talisman x 1,274 ops/sec ±0.27% (95 runs sampled)
levenshtein-edit-distance x 1,141 ops/sec ±0.54% (94 runs sampled)
leven x 1,175 ops/sec ±0.27% (95 runs sampled)
fast-levenshtein x 726 ops/sec ±0.22% (93 runs sampled)
Fastest is js-levenshtein
Done in 120.07s.

I have just released [email protected] and the bundle size has been shot with AssemblyScript's latest changes, I have uploaded the WAT files (beware different implementations, since one is a port of leven and the other is a port of js-levenshtein, plus I also moved store/load to Int32Array) to a Gist.

In PackagePhobia, the bundle size has grown from 77.0kB to 514kB, and the code performs greatly worse (from being almost as fast as fast-levenshtein, to be the slowest of all).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants