Improve ported implementation #10
It still uses arrays, though fast-levenshtein re-uses them. Also, the benchmarks show it's slower:
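For context on the re-use point: fast-levenshtein keeps one working buffer alive between calls and only grows it when a longer input appears, instead of allocating fresh arrays on every call. A minimal TypeScript sketch of that caching idea (illustrative only, not the library's actual source):

```ts
// A working buffer kept at module scope so repeated distance calls
// re-use the same allocation instead of creating a new array every time.
let cachedRow: Int32Array = new Int32Array(0);

function getRow(length: number): Int32Array {
  // Grow only when a longer input shows up; otherwise hand back the old buffer.
  if (cachedRow.length < length) cachedRow = new Int32Array(length);
  return cachedRow;
}

// Example: two calls share the same underlying allocation.
const first = getRow(16);
const second = getRow(10);
console.log(first === second); // true: no new array was allocated
```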
I have finished my benchmarking PR, #2, which adds the benchmarks to the Azure Pipelines, and while everything does work, we are missing the function that allows us to remove the allocated strings from memory. Due to that, we cannot benchmark this "properly" yet. The stacktrace for the error is as follows:
If it's an error in the current code, please let us know or open a PR; I'm sure @kyranet wouldn't mind! After this is fixed (and no other errors creep up), we can give this a proper benchmark. 👍
Yeah, it seems so, @vladfrangu. This is strange. But I recommend using
Uhm, I'm not a fan of multiple memory allocations; that approach is very complex and could be documented better. But I'm up for porting it if it's the fastest approach, I didn't know about that one! Also, I think the memory issue @vladfrangu is reporting is related to this: Line 13 in 90d54df
As it's creating new strings in the Wasm memory but never freeing them, unless I'm missing something 😅
By the way, does AssemblyScript give any mechanism to create a value array (without pointers to memory)? I think dropping memory allocation, deallocation, and accesses would allow further optimizations, speeding up operations substantially.
Yes, if you allocate a new string on the JS side you should free it. @kyranet You could work with raw linear memory directly, like in the game-of-life example.
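To illustrate the raw-linear-memory idea from the game-of-life example: the host would copy UTF-16 code units into the module's memory at agreed offsets, and the AssemblyScript side would read them with the `load<u16>` builtin instead of receiving String objects. A hedged, AssemblyScript-flavoured sketch of what such a read path could look like (the function name and calling convention are made up for illustration, not this repo's API):

```ts
// Sketch: compare two strings that the host has laid out in linear memory
// as raw UTF-16 code units, without allocating any String objects.
// aPtr/bPtr are byte offsets into the module's memory, aLen/bLen are code-unit counts.
export function equalsRaw(aPtr: usize, aLen: i32, bPtr: usize, bLen: i32): bool {
  if (aLen != bLen) return false;
  for (let i: i32 = 0; i < aLen; i++) {
    const offset = (<usize>i) << 1; // byte offset of the i-th UTF-16 code unit
    // load<u16> reads one 16-bit code unit straight from linear memory.
    if (load<u16>(aPtr + offset) != load<u16>(bPtr + offset)) return false;
  }
  return true;
}
```

On the JS side this would pair with copying `str.charCodeAt(i)` values into a `Uint16Array` view over the exported memory before the call, which is essentially what the game-of-life example does with its cell buffer.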
That's what I want to do. The new docs show uses of:

```js
var str = myModule.__retain(myModule.__allocString("my string"));
var foo = new myModule.Foo(str);
// do something with foo
myModule.__release(foo);
myModule.__release(str);
```

But those are for the
Checked it, it uses
You can't avoid this, especially when working with strings. Wasm doesn't have a shadow stack, only locals & globals and linear memory (the heap).
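Applying the loader pattern from the docs snippet above to this module, a rough sketch of how a caller (or the benchmark) could avoid leaking the strings it allocates. The `levenshtein` export name is an assumption; only `__retain`, `__allocString`, and `__release` come from the quoted docs:

```js
// Sketch: allocate both strings in Wasm memory, call the exported
// distance function, then release the allocations so they can be collected.
const aPtr = myModule.__retain(myModule.__allocString("kitten"));
const bPtr = myModule.__retain(myModule.__allocString("sitting"));
try {
  const dist = myModule.levenshtein(aPtr, bPtr); // assumed export name
  console.log(dist); // expected: 3
} finally {
  // Without these releases, every benchmark iteration leaks two strings.
  myModule.__release(aPtr);
  myModule.__release(bPtr);
}
```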
@vladfrangu's benchmark PR got merged! The reports from Azure Pipelines:

Benchmark Results

```
$ node benchmark/benchmark
Running Word
levenshtein-wasm x 473 ops/sec ±1.01% (86 runs sampled)
js-levenshtein x 2,176 ops/sec ±0.92% (89 runs sampled)
talisman x 1,846 ops/sec ±1.00% (91 runs sampled)
levenshtein-edit-distance x 1,567 ops/sec ±1.23% (89 runs sampled)
leven x 1,532 ops/sec ±1.10% (89 runs sampled)
fast-levenshtein x 1,307 ops/sec ±1.02% (89 runs sampled)
Fastest is js-levenshtein

Running paragraph
levenshtein-wasm x 21.20 ops/sec ±1.56% (39 runs sampled)
js-levenshtein x 104 ops/sec ±0.96% (74 runs sampled)
talisman x 68.87 ops/sec ±1.07% (69 runs sampled)
levenshtein-edit-distance x 59.59 ops/sec ±1.24% (61 runs sampled)
leven x 60.67 ops/sec ±0.86% (62 runs sampled)
fast-levenshtein x 41.88 ops/sec ±1.67% (55 runs sampled)
Fastest is js-levenshtein

Running Sentence
levenshtein-wasm x 435 ops/sec ±0.85% (89 runs sampled)
js-levenshtein x 1,932 ops/sec ±1.01% (91 runs sampled)
talisman x 1,445 ops/sec ±0.79% (92 runs sampled)
levenshtein-edit-distance x 1,139 ops/sec ±1.20% (88 runs sampled)
leven x 1,157 ops/sec ±1.07% (90 runs sampled)
fast-levenshtein x 856 ops/sec ±1.12% (87 runs sampled)
Fastest is js-levenshtein

Done in 116.09s.
```

The newest master has the latest code from AssemblyScript. I also noticed the output code grew a lot (from around ~450 lines to ~2600 lines); however, this:

Lines 14 to 21 in 9cfdac7
I will port
Update: I have ported

Benchmark Results

```
Running Word
levenshtein-wasm x 623 ops/sec ±0.73% (92 runs sampled)
js-levenshtein x 1,815 ops/sec ±0.41% (93 runs sampled)
talisman x 1,666 ops/sec ±1.66% (90 runs sampled)
levenshtein-edit-distance x 1,441 ops/sec ±0.67% (88 runs sampled)
leven x 1,483 ops/sec ±0.63% (91 runs sampled)
fast-levenshtein x 1,055 ops/sec ±2.23% (83 runs sampled)
Fastest is js-levenshtein

Running paragraph
levenshtein-wasm x 34.44 ops/sec ±1.26% (59 runs sampled)
js-levenshtein x 72.18 ops/sec ±0.41% (72 runs sampled)
talisman x 56.72 ops/sec ±1.41% (60 runs sampled)
levenshtein-edit-distance x 55.15 ops/sec ±1.32% (70 runs sampled)
leven x 57.97 ops/sec ±0.33% (73 runs sampled)
fast-levenshtein x 35.40 ops/sec ±0.32% (61 runs sampled)
Fastest is js-levenshtein

Running Sentence
levenshtein-wasm x 779 ops/sec ±0.34% (93 runs sampled)
js-levenshtein x 1,630 ops/sec ±0.18% (95 runs sampled)
talisman x 1,274 ops/sec ±0.27% (95 runs sampled)
levenshtein-edit-distance x 1,141 ops/sec ±0.54% (94 runs sampled)
leven x 1,175 ops/sec ±0.27% (95 runs sampled)
fast-levenshtein x 726 ops/sec ±0.22% (93 runs sampled)
Fastest is js-levenshtein

Done in 120.07s.
```

I have just released

In PackagePhobia, the bundle size has grown from 77.0kB to 514kB, and the code performs greatly worse (from being almost as fast as
I guess this wasm version is a port of levenshtein-js. Wdyt about porting fast-levenshtein instead? It looks much simpler and faster, and it doesn't require storing the full matrix.
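For reference, the property mentioned here is that fast-levenshtein-style implementations keep only a single working row rather than the full (m+1)×(n+1) distance matrix. A hedged, AssemblyScript-flavoured sketch of that single-row approach (not the library's actual code, and not necessarily how the port should look):

```ts
// Single-row Levenshtein: only one row of the DP matrix is kept in memory.
export function distance(a: string, b: string): i32 {
  if (a == b) return 0;
  if (a.length == 0) return b.length;
  if (b.length == 0) return a.length;

  // row[j] holds the edit distance between a prefix of `a` and the first
  // j characters of `b` as the outer loop advances.
  const row = new Int32Array(b.length + 1);
  for (let j = 0; j <= b.length; j++) row[j] = j;

  for (let i = 1; i <= a.length; i++) {
    let prev = row[0]; // d[i-1][0] from the previous outer iteration
    row[0] = i;        // d[i][0]
    for (let j = 1; j <= b.length; j++) {
      const temp = row[j]; // d[i-1][j], needed as `prev` for the next j
      const cost = a.charCodeAt(i - 1) == b.charCodeAt(j - 1) ? 0 : 1;
      let val = prev + cost;                          // substitution
      if (row[j] + 1 < val) val = row[j] + 1;         // deletion
      if (row[j - 1] + 1 < val) val = row[j - 1] + 1; // insertion
      row[j] = val;
      prev = temp;
    }
  }
  return row[b.length];
}
```

The single row is also the buffer that can be cached and re-used across calls, which is where the allocation savings over the full-matrix port would come from.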