api_server with --compile fails #773

Open
MithrilMan opened this issue Dec 21, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@MithrilMan
Contributor

MithrilMan commented Dec 21, 2024

Self Checks

  • This template is only for bug reports. For questions, please visit Discussions.
  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please submit issues in English; otherwise they will be closed. Thank you! :)
  • Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Self Hosted (Docker)

Environment Details

Latest Docker image (sha256:40c9620c1dfd8efb1063a5826e72243efafc5f19784af5fa0603238e06b7dd62).
I have an RTX 3090 Ti with CUDA (it works in the Gradio web UI).

Steps to Reproduce

Run api_server.py with these arguments:
--listen 0.0.0.0:8080 --llama-checkpoint-path "checkpoints/fish-speech-1.5" --decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" --decoder-config-name firefly_gan_vq --compile

✔️ Expected Behavior

The server should warm up and start successfully.

❌ Actual Behavior

The warm-up already fails with an error in the generate method where it calls decode_n_tokens:

2024-12-21 03:12:34.092 | INFO     | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-21 03:12:34.092 | INFO     | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-21 03:12:34.126 | INFO     | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-21 03:12:34.127 | INFO     | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
  0%|          | 0/1023 [00:00<?, ?it/s]/usr/local/lib/python3.12/contextlib.py:105: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
  0%|          | 0/1023 [00:05<?, ?it/s]
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/opt/fish-speech/tools/api_server.py", line 77, in initialize_app
    app.state.model_manager = ModelManager(
                              ^^^^^^^^^^^^^
  File "/opt/fish-speech/tools/server/model_manager.py", line 66, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/opt/fish-speech/tools/server/model_manager.py", line 122, in warm_up
    list(inference(request, tts_inference_engine))
  File "/opt/fish-speech/tools/server/inference.py", line 25, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (<HTTPStatus.INTERNAL_SERVER_ERROR: 500>, '\'Failed running call_function <built-in method empty_like of type object at 0x7fa3d7c4b5e0>(*(FakeTensor(..., device=\\\'cuda:0\\\', size=(102048,), dtype=torch.bfloat16),), **{}):\\nCannot set version_counter for inference tensor\\n\\nfrom user code:\\n   File "/opt/fish-speech/tools/llama/generate.py", line 266, in decode_one_token_ar\\n    sample(\\n  File "/opt/fish-speech/tools/llama/generate.py", line 135, in sample\\n    idx_next = multinomial_sample_one_no_sync(probs)\\n  File "/opt/fish-speech/tools/llama/generate.py", line 52, in multinomial_sample_one_no_sync\\n    q = torch.empty_like(probs_sort).exponential_(1)\\n\\nSet TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information\\n\\n\\nYou can suppress this exception and fall back to eager by setting:\\n    import torch._dynamo\\n    torch._dynamo.config.suppress_errors = True\\n\'')
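
For context, the exception points at torch.compile (Dynamo) tracing torch.empty_like on an inference tensor. A minimal sketch of what the traceback suggests is happening (my reconstruction, assuming the warm-up request runs under torch.inference_mode(); this is not the project's actual warm-up code):

import torch

@torch.compile
def sample_one(probs):
    # Mirrors multinomial_sample_one_no_sync in tools/llama/generate.py
    # (the frame named in the traceback above).
    q = torch.empty_like(probs).exponential_(1)
    return torch.argmax(probs / q, dim=-1, keepdim=True)

with torch.inference_mode():  # assumption: inference_mode is active during warm-up
    probs = torch.softmax(torch.randn(1024, device="cuda"), dim=-1)  # needs a CUDA device, as in the report
    sample_one(probs)  # may raise "Cannot set version_counter for inference tensor"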
@MithrilMan MithrilMan added the bug Something isn't working label Dec 21, 2024
@MithrilMan
Contributor Author

MithrilMan commented Dec 21, 2024

Here is the log when I launch run_webui with --compile (it works), just to confirm the error is specific to the API server:

[screenshot: run_webui warm-up log]

And here is a screenshot of the api_server error:

[screenshot: api_server error]

@MithrilMan
Contributor Author

MithrilMan commented Dec 21, 2024

Update: if I launch the api_server from bash, it works.

The problem occurs only when I launch the api_server from VS Code with its debugger, debugpy.

Here is the launch.json:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "api_server.py",
            "type": "debugpy",
            "request": "launch",
            "program": "tools/api_server.py",
            "console": "integratedTerminal",
            "args": [
                "--listen", "0.0.0.0:8080",
                "--llama-checkpoint-path", "checkpoints/fish-speech-1.5",
                "--decoder-checkpoint-path", "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth",
                "--decoder-config-name", "firefly_gan_vq",
                "--compile",
                "--half"
            ],
            "cwd": "${workspaceFolder}"
        }
    ]
}

@MithrilMan
Contributor Author

For now, my solution is to disable --compile when debugging within VS Code, even if that means wasted time; suggestions are welcome.
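
One possible (untested) workaround sketch: gate compilation on whether debugpy is loaded, so --compile can stay in launch.json. The maybe_compile helper is hypothetical, not part of fish-speech, and treating "debugpy" in sys.modules as "debugger attached" is a heuristic assumption:

import sys
import torch

def maybe_compile(fn, requested: bool):
    # Hypothetical helper: skip torch.compile when debugpy is loaded,
    # since compiled mode fails under the VS Code debugger here.
    debugger_active = "debugpy" in sys.modules
    return torch.compile(fn) if requested and not debugger_active else fn

Alternatively, the traceback's own suggestion (torch._dynamo.config.suppress_errors = True) would fall back to eager execution for the failing graph while keeping --compile elsewhere, at the cost of silently losing the speedup there.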
