-
Hi, I'm setting the vLLM max tokens with "--env VLLM_MAX_TOKENS=6048", but when I run any of the llama-stack-apps I keep getting the following error:

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 6048 tokens. However, you requested 6616 tokens (568 in the messages, 6048 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}

I've tried to find a parameter I can change to avoid overfilling the model's context, but I haven't been able to solve it. Any ideas to help with this?
-
You can change this when starting your vLLM server via its launch flags.
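For example, a minimal sketch of launching the OpenAI-compatible vLLM server with a larger context window (--max-model-len is the standard vLLM flag controlling context length; the model name and port below are placeholders):

    python -m vllm.entrypoints.openai.api_server \
        --model meta-llama/Llama-3.1-8B-Instruct \
        --max-model-len 8192 \
        --port 8000

Since the error above reports a 6048-token context, the alternative is to keep the server as-is and lower the client-side max_tokens so that message tokens plus completion tokens fit within 6048.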
-
I managed to sort this out at the client level. The AgentConfig class has a max_tokens field in the sampling_params config.
I wonder if this should be something passed back by the framework, similar to the model name.
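For reference, a minimal sketch of that client-side fix (the import path, model name, and surrounding fields are assumptions and may differ between llama-stack versions; the point is only the max_tokens field inside sampling_params):

    # Assumed import path; adjust to your llama-stack-client version.
    from llama_stack_client.types.agent_create_params import AgentConfig

    agent_config = AgentConfig(
        model="Llama3.1-8B-Instruct",   # placeholder model name
        instructions="You are a helpful assistant.",
        sampling_params={
            # Cap the completion so that prompt tokens + max_tokens stay
            # within the server's context length (6048 in the error above).
            "max_tokens": 512,
        },
    )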