Self Checks
- [x] This template is only for bug reports. For questions, please visit Discussions.
- [x] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit issues in English, or they will be closed. Thank you! :)
- [x] Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Docker)
Environment Details
Latest Docker image (sha256:40c9620c1dfd8efb1063a5826e72243efafc5f19784af5fa0603238e06b7dd62)
I have an RTX 3090 Ti with CUDA (inference works in the Gradio UI).
Steps to Reproduce
Run `api_server.py` with these arguments:
--listen 0.0.0.0:8080 --llama-checkpoint-path "checkpoints/fish-speech-1.5" --decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" --decoder-config-name firefly_gan_vq --compile
✔️ Expected Behavior
The API server should start up and complete the warm-up inference without errors.
❌ Actual Behavior
The warm-up already fails with an error in the `generate` method where it calls `decode_n_tokens`:
2024-12-21 03:12:34.092 | INFO | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-21 03:12:34.092 | INFO | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-21 03:12:34.126 | INFO | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-21 03:12:34.127 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
  0%|          | 0/1023 [00:00<?, ?it/s]
/usr/local/lib/python3.12/contextlib.py:105: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
self.gen = func(*args, **kwds)
0%| | 0/1023 [00:05<?, ?it/s]
ERROR: Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/opt/fish-speech/tools/api_server.py", line 77, in initialize_app
    app.state.model_manager = ModelManager(
                              ^^^^^^^^^^^^^
  File "/opt/fish-speech/tools/server/model_manager.py", line 66, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/opt/fish-speech/tools/server/model_manager.py", line 122, in warm_up
    list(inference(request, tts_inference_engine))
  File "/opt/fish-speech/tools/server/inference.py", line 25, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (<HTTPStatus.INTERNAL_SERVER_ERROR: 500>, 'Failed running call_function <built-in method empty_like of type object at 0x7fa3d7c4b5e0>(*(FakeTensor(..., device=\'cuda:0\', size=(102048,), dtype=torch.bfloat16),), **{}):
Cannot set version_counter for inference tensor

from user code:
  File "/opt/fish-speech/tools/llama/generate.py", line 266, in decode_one_token_ar
    sample(
  File "/opt/fish-speech/tools/llama/generate.py", line 135, in sample
    idx_next = multinomial_sample_one_no_sync(probs)
  File "/opt/fish-speech/tools/llama/generate.py", line 52, in multinomial_sample_one_no_sync
    q = torch.empty_like(probs_sort).exponential_(1)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
  import torch._dynamo
  torch._dynamo.config.suppress_errors = True
')
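For context, the failing line in the traceback is the exponential-sampling trick in `multinomial_sample_one_no_sync`, traced by `torch.compile` on what appears to be a tensor created under `torch.inference_mode()`. Below is a minimal standalone sketch of what I believe the failing interaction looks like; this is an assumption on my part, not fish-speech's actual call path (only the sampling pattern, size, and dtype are copied from the traceback):

```python
# Hedged repro sketch: compile the same sampling pattern as
# tools/llama/generate.py and call it on an inference-mode tensor.
import torch

@torch.compile
def sample_one(probs_sort: torch.Tensor) -> torch.Tensor:
    # Same pattern as multinomial_sample_one_no_sync in the traceback.
    q = torch.empty_like(probs_sort).exponential_(1)
    return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)

with torch.inference_mode():
    # size=(102048,) and dtype=torch.bfloat16 taken from the FakeTensor
    # in the error message.
    logits = torch.randn(102048, device="cuda", dtype=torch.bfloat16)
    probs = torch.softmax(logits, dim=-1)
    sample_one(probs)  # expected: "Cannot set version_counter for inference tensor"
```

The error message itself suggests a temporary eager fallback; if it applies here, it would trade away the `--compile` speed-up on the failing ops rather than fix the underlying conflict:

```python
# Fallback quoted verbatim from the error message (workaround, not a fix).
import torch._dynamo
torch._dynamo.config.suppress_errors = True
```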