
'timed out waiting for llama runner to start' in ~6 minutes when trying to load large model #246

alexander-potemkin opened this issue Aug 8, 2024 · 0 comments

I'm trying to send a query to the LLM with the following code:

from ollama import Client, Options

model_name = 'llama3.1:70b-instruct-q8_0'
# The request timeout (in seconds) is configured on the Client itself
client = Client(host='https://llama.my.server.com', timeout=360000)
pull_result = client.pull(model_name)

response = client.chat(
    model=model_name,
    # keep_alive is a chat() parameter, not an Options field
    keep_alive=0,
    messages=[
        # my_prompt is defined elsewhere and holds the prompt text
        {"role": "user", "content": my_prompt},
    ],
    options=Options(
        # num_ctx=128000,
        num_ctx=64000,
        use_mmap=True,
    ),
)

# chat() responses carry the text under message.content
print("Model response:", response['message']['content'])

The error I'm getting on the client is: ollama._types.ResponseError: timed out waiting for llama runner to start - progress 1.00 -
On the server, the log shows: Aug 8 22:19:19 myserver ollama[2197083]: time=2024-08-08T22:19:19.371+02:00 level=ERROR source=sched.go:451 msg="error loading llama server" error="timed out waiting for llama runner to start - progress 1.00 - "

To be fair, this is quite a stretch for the server: 64 GB RAM (plus a huge swap), 8 vCPUs (4 physical cores, I believe), and no GPU. But I'm fine with waiting a few hours if required, especially since I configured the timeouts, or at least I believe so, based on the examples and issues I've found here.
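For what it's worth, the client-side timeout above only governs the HTTP request; the "timed out waiting for llama runner to start" limit is enforced by the server's scheduler. If your Ollama build supports the OLLAMA_LOAD_TIMEOUT environment variable (added in later releases; treat its availability as an assumption and check `ollama serve --help`), a systemd override along these lines might raise the model-load deadline:

```shell
# Sketch, assuming a systemd-managed Ollama install and a version
# that recognizes OLLAMA_LOAD_TIMEOUT (verify with `ollama serve --help`).
sudo systemctl edit ollama.service
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_LOAD_TIMEOUT=2h"
# Then reload and restart so the new environment takes effect:
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

On a CPU-only box swapping a q8_0 70B model, the load can genuinely take far longer than the default allowance, so extending the server-side limit is the relevant knob rather than the client timeout.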

Any help would be much appreciated!
