[WASM/segment-anything] Error: unreachable #1695

Closed
BladeTransformerLLC opened this issue Feb 11, 2024 · 8 comments
@BladeTransformerLLC

Hi there. The wasm segment-anything demo fails to create image embeddings with the following error in every browser I have tried (Chrome, Firefox, etc.):

lib-example.html:46 {error: RuntimeError: unreachable
    at __rg_oom (http://localhost:8000/build/m_bg.wasm:wasm-function[2615…}
lib-example.html:52 Uncaught (in promise) Error: Error: unreachable
    at Worker.messageHandler (lib-example.html:52:22)

model: MobileSAM Tiny
rustc: 1.76.0 (07dca489a 2024-02-04)
wasm-bindgen: 0.2.88

(screenshot attached: wasm console output)

@BladeTransformerLLC
Author

The error seems to actually occur here:

let data = self.sam.embeddings(&image_t)?;

which is defined at:

pub fn embeddings(&self, img: &Tensor) -> Result<Tensor> {
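
For anyone trying to reproduce this outside the browser, here is a minimal sketch of the same embeddings call on the native CPU backend. The constructor, weight file name, and input shape below follow the candle segment-anything example and are assumptions on my side; exact names may differ between candle versions.

use candle_core::{DType, Device, Tensor};
use candle_nn::VarBuilder;
use candle_transformers::models::segment_anything::sam;

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;
    // MobileSAM Tiny weights; the file name here is illustrative.
    let vb = unsafe {
        VarBuilder::from_mmaped_safetensors(
            &["mobile_sam-tiny-vitt.safetensors"],
            DType::F32,
            &device,
        )?
    };
    let sam = sam::Sam::new_tiny(vb)?;
    // Dummy (C, H, W) image tensor at the 1024x1024 resolution SAM expects.
    let image_t = Tensor::zeros((3, 1024, 1024), DType::F32, &device)?;
    // Same call that crashes in the wasm worker.
    let data = sam.embeddings(&image_t)?;
    println!("embeddings shape: {:?}", data.shape());
    Ok(())
}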

@LaurentMazare
Collaborator

@radames maybe you have an idea off the top of your head of what could be causing this?

@radames
Contributor

radames commented Feb 12, 2024

Interesting. Testing the embeddings call running on CPU, it works fine; however, the forward pass is crashing on wasm.
@LaurentMazare the only difference I see is this change from 2 months ago, can you trace it?

b0fe5e4#diff-93f2883354b5db7aed5fe70e86efcbf45fb51b2e53729c984c50db017f3eb367L30-L32

RuntimeError: unreachable
    at __rg_oom (http://localhost:3000/build/m_bg.wasm:wasm-function[2572]:0x259cb8)
    at __rust_alloc_error_handler (http://localhost:3000/build/m_bg.wasm:wasm-function[2771]:0x25acab)
    at alloc::alloc::handle_alloc_error::rt_error::he184dc17edadc515 (http://localhost:3000/build/m_bg.wasm:wasm-function[2871]:0x25b1e1)
    at alloc::alloc::handle_alloc_error::h6794baf93dd91773 (http://localhost:3000/build/m_bg.wasm:wasm-function[2870]:0x25b1d6)
    at <candle_core::cpu_backend::CpuStorage as candle_core::backend::BackendStorage>::matmul::h90f4b8097d8ce7fa (http://localhost:3000/build/m_bg.wasm:wasm-function[62]:0x8207b)
    at candle_core::storage::Storage::matmul::hbf128fe6d10adcec (http://localhost:3000/build/m_bg.wasm:wasm-function[1127]:0x20fdbe)
    at candle_core::tensor::Tensor::matmul::h67c69b91303a5f7b (http://localhost:3000/build/m_bg.wasm:wasm-function[218]:0x122e82)
    at <candle_transformers::models::segment_anything::tiny_vit::Attention as candle_core::Module>::forward::h766d633748eb5295 (http://localhost:3000/build/m_bg.wasm:wasm-function[94]:0xc66b3)
    at <candle_transformers::models::segment_anything::tiny_vit::TinyViTBlock as candle_core::Module>::forward::haaec9b20d4c4d579 (http://localhost:3000/build/m_bg.wasm:wasm-function[91]:0xc21ec)
    at <candle_transformers::models::segment_anything::tiny_vit::BasicLayer as candle_core::Module>::forward::ha59c134a2ce2c8a6 (http://localhost:3000/build/m_bg.wasm:wasm-function[1151]:0x2129a6)

@LaurentMazare
Collaborator

Interesting, thanks for looking into this. I'm not sure it would be related to the change tweaking the module trait for batchnorm.
Looking more into the backtrace, it seems that we're actually running out of memory. That's surprising, as the Tiny model involved here is pretty small; maybe a bug was introduced somewhere that creates tensors larger than expected.

@LaurentMazare
Collaborator

Actually I think I got to the bottom of this. @radames not sure how you got to the commit you mentioned, but it's really close to the problematic one, which is slightly earlier: 4290b8....
The main thing that changed is that batch-norm became learnable, and because of that the backprop graph was retained even in eval mode. This resulted in larger memory usage and caused the OOM. #1702 attempts to fix this by properly detaching the running mean/var tensors before applying the batch-norm so that no backprop graph is retained; from my testing it seems to fix the issue.
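
For readers following along, here is a minimal sketch of the idea behind the fix, not the actual candle-nn batch_norm code. Treat the exact detach() signature (assumed here to return a Tensor directly) and the broadcast-ready shapes of the running stats as assumptions of the sketch; they may differ across candle versions.

use candle_core::{Result, Tensor};

fn batch_norm_eval(
    x: &Tensor,            // input, e.g. (N, C, H, W)
    running_mean: &Tensor, // running stats, shaped to broadcast against x
    running_var: &Tensor,
    weight: &Tensor,       // learnable scale, broadcastable against x
    bias: &Tensor,         // learnable shift, broadcastable against x
    eps: f64,
) -> Result<Tensor> {
    // Detach drops the autograd history of the (now learnable) running stats;
    // without this, every forward pass in eval mode kept the whole backprop
    // graph alive, which is what blew up memory in the wasm demo.
    let mean = running_mean.detach();
    let var = running_var.detach();
    // x_hat = (x - mean) / sqrt(var + eps)
    let x_hat = x
        .broadcast_sub(&mean)?
        .broadcast_div(&var.affine(1.0, eps)?.sqrt()?)?;
    x_hat.broadcast_mul(weight)?.broadcast_add(bias)
}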

@LaurentMazare
Collaborator

#1702 has been merged, let me know if it's still an issue!

@BladeTransformerLLC
Author

It works now! Great work @LaurentMazare

@radames
Contributor

radames commented Feb 13, 2024

Amazing @LaurentMazare! I got to that commit via git blame on the most recent changes to the SAM model implementation.

@radames closed this as completed Feb 13, 2024