Could not import SGMV kernel from Punica, falling back to loop. #2465

Open
2 of 4 tasks
ksajan opened this issue Aug 28, 2024 · 3 comments

ksajan commented Aug 28, 2024

System Info

text-generation-launcher --env:

2024-08-28T05:17:36.254761Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.79.0
Commit sha: 21187c27c90acbec7f912b8af4feaec154de960f
Docker label: N/A
nvidia-smi:
N/A
xpu-smi:
N/A
2024-08-28T05:17:36.254797Z  INFO text_generation_launcher: Args {
    model_id: "bigscience/bloom-560m",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: None,
    speculate: None,
    dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: None,
    max_total_tokens: None,
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "0.0.0.0",
    port: 3000,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: true,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
}

No GPU; using the CPU version.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Installed Rust and created a virtual env with Python 3.9
  2. Installed Protoc
  3. Cloned the GitHub repo
  4. Ran the commands:
cd text-generation-inference/
BUILD_EXTENSIONS=True make install-cpu
  5. Then tried the example of running TGI locally using the falcon-7b model (a launch roughly like the sketch below), but after the download it fails to load with the error: Could not import SGMV kernel from Punica, falling back to loop.
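
The launch attempt was along these lines (a minimal sketch; the model id and port are illustrative, not copied from the failing run):

text-generation-launcher --model-id tiiuae/falcon-7b --port 3000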

Expected behavior

It should download the model and serve it without any errors.

@ErikKaum
Member

Hi @ksajan 👋

Thanks for filing the issue. I think the problem is that you're running on a CPU, and falcon-7b in TGI is only supported with kernels that require a GPU.

If you want to run TGI locally on CPU to test, I'd recommend choosing a smaller model that doesn't rely on special kernels. Or if your requirements are to use something like falcon-7b, then unfortunately you'll need a GPU machine.
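
For example, a minimal CPU-only launch with a small model could look like this (the model id and port here are only illustrative):

text-generation-launcher --model-id bigscience/bloom-560m --port 3000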

Let me know if I can help in any other way 🙌


ksajan commented Aug 29, 2024

@ErikKaum I tried running lmsys/vicuna-7b-v1.3 as well, which I can run using llama_cpp. I was actually trying to train the Medusa head described in the TGI documentation, but I was unable to run it in Google Colab with a GPU; it failed with a similar error.


ErikKaum commented Sep 5, 2024

Yeah, so the llama.cpp version probably uses different kernels that don't require GPUs.

When you built this for a GPU, did you use BUILD_EXTENSIONS=True make install-cpu or BUILD_EXTENSIONS=True make?

I'd nonetheless recommend using the Docker image to avoid building from source; it's usually a lot less hassle 👍
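
A rough sketch of the documented Docker invocation (the image tag, volume path, and model id below are illustrative, and --gpus all assumes a GPU host):

docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.2.0 \
  --model-id tiiuae/falcon-7b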
