[WASM/segment-anything] Error: unreachable #1695

Closed
BladeTransformerLLC opened this issue Feb 11, 2024 · 8 comments
@BladeTransformerLLC

Hi there. The wasm segment-anything demo fails to create image embeddings with the following error in every browser I have tried (Chrome, Firefox, etc.):

lib-example.html:46 {error: RuntimeError: unreachable
    at __rg_oom (http://localhost:8000/build/m_bg.wasm:wasm-function[2615…}
lib-example.html:52 Uncaught (in promise) Error: Error: unreachable
    at Worker.messageHandler (lib-example.html:52:22)

model: MobileSAM Tiny
rustc: 1.76.0 (07dca489a 2024-02-04)
wasm-bindgen: 0.2.88

(screenshot attached: wasm console output)

@BladeTransformerLLC
Author

The error seems to actually occur here:

let data = self.sam.embeddings(&image_t)?;

which is defined at:

pub fn embeddings(&self, img: &Tensor) -> Result<Tensor> {
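
For anyone trying to reproduce this outside the browser, here is a minimal sketch of the same embeddings call on the native CPU backend. The constructor, weight file name, and input shape below follow the candle segment-anything example and are assumptions on my side; exact names may differ between candle versions.

use candle_core::{DType, Device, Tensor};
use candle_nn::VarBuilder;
use candle_transformers::models::segment_anything::sam;

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;
    // MobileSAM Tiny weights; the file name here is illustrative.
    let vb = unsafe {
        VarBuilder::from_mmaped_safetensors(
            &["mobile_sam-tiny-vitt.safetensors"],
            DType::F32,
            &device,
        )?
    };
    let sam = sam::Sam::new_tiny(vb)?;
    // Dummy (C, H, W) image tensor at the 1024x1024 resolution SAM expects.
    let image_t = Tensor::zeros((3, 1024, 1024), DType::F32, &device)?;
    // Same call that crashes in the wasm worker.
    let data = sam.embeddings(&image_t)?;
    println!("embeddings shape: {:?}", data.shape());
    Ok(())
}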

@LaurentMazare
Collaborator

@radames maybe you have an idea off the top of your head of what could be causing this?

@radames
Contributor

radames commented Feb 12, 2024

Interesting. Testing the embeddings call running on CPU, it works fine; however, the forward pass is crashing on wasm.
@LaurentMazare the only difference I see is this change from 2 months ago, can you trace it?

b0fe5e4#diff-93f2883354b5db7aed5fe70e86efcbf45fb51b2e53729c984c50db017f3eb367L30-L32

RuntimeError: unreachable
    at __rg_oom (http://localhost:3000/build/m_bg.wasm:wasm-function[2572]:0x259cb8)
    at __rust_alloc_error_handler (http://localhost:3000/build/m_bg.wasm:wasm-function[2771]:0x25acab)
    at alloc::alloc::handle_alloc_error::rt_error::he184dc17edadc515 (http://localhost:3000/build/m_bg.wasm:wasm-function[2871]:0x25b1e1)
    at alloc::alloc::handle_alloc_error::h6794baf93dd91773 (http://localhost:3000/build/m_bg.wasm:wasm-function[2870]:0x25b1d6)
    at <candle_core::cpu_backend::CpuStorage as candle_core::backend::BackendStorage>::matmul::h90f4b8097d8ce7fa (http://localhost:3000/build/m_bg.wasm:wasm-function[62]:0x8207b)
    at candle_core::storage::Storage::matmul::hbf128fe6d10adcec (http://localhost:3000/build/m_bg.wasm:wasm-function[1127]:0x20fdbe)
    at candle_core::tensor::Tensor::matmul::h67c69b91303a5f7b (http://localhost:3000/build/m_bg.wasm:wasm-function[218]:0x122e82)
    at <candle_transformers::models::segment_anything::tiny_vit::Attention as candle_core::Module>::forward::h766d633748eb5295 (http://localhost:3000/build/m_bg.wasm:wasm-function[94]:0xc66b3)
    at <candle_transformers::models::segment_anything::tiny_vit::TinyViTBlock as candle_core::Module>::forward::haaec9b20d4c4d579 (http://localhost:3000/build/m_bg.wasm:wasm-function[91]:0xc21ec)
    at <candle_transformers::models::segment_anything::tiny_vit::BasicLayer as candle_core::Module>::forward::ha59c134a2ce2c8a6 (http://localhost:3000/build/m_bg.wasm:wasm-function[1151]:0x2129a6)

@LaurentMazare
Collaborator

Interesting, thanks for looking into this. I'm not sure it would be related to the change tweaking the module trait for batchnorm.
Looking more into the backtrace, it seems that we're actually running out of memory. That's surprising, as the Tiny model involved here is pretty small; maybe a bug was introduced somewhere that creates tensors larger than expected.

@LaurentMazare
Collaborator

Actually I think I got to the bottom of this. @radames not sure how you got to the commit you mentioned, but it's really close to the problematic one, which is slightly earlier: 4290b8....
The main thing that changed is that batch-norm became learnable, and because of that the backprop graph was retained even in eval mode. This resulted in larger memory usage and caused the OOM. #1702 attempts to fix this by properly detaching the running mean/var tensors before applying the batch-norm so that no backprop graph is retained; from my testing it seems to fix the issue.
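
For readers following along, here is a minimal sketch of the idea behind the fix, not the actual candle-nn batch_norm code. Treat the exact detach() signature (assumed here to return a Tensor directly) and the broadcast-ready shapes of the running stats as assumptions of the sketch; they may differ across candle versions.

use candle_core::{Result, Tensor};

fn batch_norm_eval(
    x: &Tensor,            // input, e.g. (N, C, H, W)
    running_mean: &Tensor, // running stats, shaped to broadcast against x
    running_var: &Tensor,
    weight: &Tensor,       // learnable scale, broadcastable against x
    bias: &Tensor,         // learnable shift, broadcastable against x
    eps: f64,
) -> Result<Tensor> {
    // Detach drops the autograd history of the (now learnable) running stats;
    // without this, every forward pass in eval mode kept the whole backprop
    // graph alive, which is what blew up memory in the wasm demo.
    let mean = running_mean.detach();
    let var = running_var.detach();
    // x_hat = (x - mean) / sqrt(var + eps)
    let x_hat = x
        .broadcast_sub(&mean)?
        .broadcast_div(&var.affine(1.0, eps)?.sqrt()?)?;
    x_hat.broadcast_mul(weight)?.broadcast_add(bias)
}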

@LaurentMazare
Collaborator

#1702 has been merged, let me know if it's still an issue!

@BladeTransformerLLC
Author

It works now! Great work @LaurentMazare

@radames
Contributor

radames commented Feb 13, 2024

Amazing @LaurentMazare! I got to that commit via git blame on the most recent changes to the SAM model implementation.

@radames closed this as completed Feb 13, 2024