2024-08-07:18:01:48,442 INFO [evaluator_utils.py:200] Request: Instance(request_type='generate_until', doc={'question': "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?", 'answer': 'Janet sells 16 - 3 - 4 = <<16-3-4=9>>9 duck eggs a day.\nShe makes 9 * 2 = $<<9*2=18>>18 every day at the farmer’s market.\n#### 18'}, arguments=(JsonChatStr(prompt='[{"role": "user", "content": "Q: Janet\\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers\' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers\' market?\\nA: Let\'s think step by step."}]'), {'until': ['Q:', '</s>', '<|im_end|>'], 'do_sample': False}), idx=0, metadata=('gsm8k_cot_zeroshot', 0, 1), resps=[], filtered_resps={}, task_name='gsm8k_cot_zeroshot', doc_id=0, repeats=1)
2024-08-07:18:01:48,442 INFO [evaluator.py:457] Running generate_until requests
Requesting API: 0%| | 0/10 [00:00<?, ?it/s]
2024-08-07:18:01:48,539 WARNING [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
2024-08-07:18:01:49,550 WARNING [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
2024-08-07:18:01:50,558 WARNING [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
Traceback (most recent call last):
File "/opt/conda/envs/py310/bin/lm_eval", line 8, in <module>
sys.exit(cli_evaluate())
File "/home/ubuntu/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
results = evaluator.simple_evaluate(
File "/home/ubuntu/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/lm-evaluation-harness/lm_eval/evaluator.py", line 296, in simple_evaluate
results = evaluate(
File "/home/ubuntu/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/lm-evaluation-harness/lm_eval/evaluator.py", line 468, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
File "/home/ubuntu/lm-evaluation-harness/lm_eval/models/api_models.py", line 562, in generate_until
outputs = retry(
File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 336, in wrapped_f
return copy(f, *args, **kw)
File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 475, in __call__
do = self.iter(retry_state=retry_state)
File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 376, in iter
result = action(retry_state)
File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 418, in exc_check
raise retry_exc.reraise()
File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 185, in reraise
raise self.last_attempt.result()
File "/opt/conda/envs/py310/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/opt/conda/envs/py310/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 478, in __call__
result = fn(*args, **kwargs)
File "/home/ubuntu/lm-evaluation-harness/lm_eval/models/api_models.py", line 345, in model_call
response.raise_for_status()
File "/opt/conda/envs/py310/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 424 Client Error: module 'tensorrt_llm.runtime' has no attribute 'to_word_list_format' for url: http://localhost:8080/v1/chat/completions/model
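The 424 response above comes from the server-side handler failing an attribute lookup: `tensorrt_llm.runtime.to_word_list_format`, which (as far as I can tell) is the helper that converts the `stop`/`until` strings into token-id "word lists" the runtime matches during decoding; the TensorRT-LLM version bundled in this container apparently no longer exposes it at that path. As a rough illustration of what such a conversion does (this is a toy sketch, not TensorRT-LLM's actual implementation), the stop sequences are tokenized and packed into two parallel rows of flattened ids and cumulative end offsets:

```python
# Illustrative sketch only -- NOT TensorRT-LLM's actual to_word_list_format.
# Stop sequences are packed into two parallel rows: flattened token ids
# and cumulative end offsets, so the runtime can scan generated tokens
# for any of the stop sequences.
from typing import Callable, List

def to_word_list_sketch(stop_words: List[str],
                        encode: Callable[[str], List[int]]) -> List[List[int]]:
    ids: List[int] = []      # flattened token ids of all stop sequences
    offsets: List[int] = []  # end offset of each sequence within `ids`
    for word in stop_words:
        ids.extend(encode(word))
        offsets.append(len(ids))
    return [ids, offsets]
```

With a real tokenizer's `encode`, the harness's stop list `['Q:', '</s>', '<|im_end|>']` would be converted this way before decoding; when the helper is missing, the conversion raises and the server surfaces HTTP 424.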
What have you tried to solve it?
I can confirm seeing this issue in djl-inference:0.29.0-tensorrtllm0.11.0-cu124.
Steps to reproduce:
Send a POST request with the stop parameter:
{
  "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\nYou are rolling a 12-sided dice twice.\n\nQuestion: Can I win more than once?\n<|eot_id|>\n\n<|start_header_id|>assistant<|end_header_id|> Answer:",
  "parameters": {
    "do_sample": false,
    "details": false,
    "temperature": 0.7,
    "top_p": 0.92,
    "max_new_tokens": 220,
    "stop": ["<|eot_id|>"]
  }
}
Note: the model does not stop on "<|eot_id|>" on its own, so the stop parameter is needed.
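The failing call can be reproduced outside lm_eval with a short stdlib-only script. The payload matches the request above; the `/invocations` path is an assumption here, so substitute whatever endpoint your container exposes:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    # Same body as the request above; the "stop" list is what triggers
    # the to_word_list_format lookup on the server side.
    return {
        "inputs": prompt,
        "parameters": {
            "do_sample": False,
            "details": False,
            "temperature": 0.7,
            "top_p": 0.92,
            "max_new_tokens": 220,
            "stop": ["<|eot_id|>"],
        },
    }

def post_json(url: str, payload: dict, timeout: float = 60.0):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")

if __name__ == "__main__":
    # Endpoint path is an assumption; adjust for your deployment.
    status, body = post_json(
        "http://localhost:8080/invocations",
        build_payload("Question: Can I win more than once? Answer:"),
    )
    print(status, body)  # an affected container responds with HTTP 424
```

Dropping the `"stop"` key from the payload should make the request succeed, which isolates the bug to the stop-word conversion path.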
Description
The LMI TensorRT-LLM containers show two different behaviors when testing the gsm8k dataset via lm_eval_harness on the llama-2-7b model.
Expected Behavior
lm_eval_harness should be able to generate a report when djl-inference:0.29.0-tensorrtllm0.11.0-cu124 is used.
Error Message
Error log from djl-inference:0.29.0-tensorrtllm0.11.0-cu124 (see above).
How to Reproduce?
Steps to reproduce
aws s3 sync s3://djl-llm/llama-2-7b-hf/ llama-2-7b-hf/
docker run -it --gpus all --shm-size 20g -v /home/ubuntu/trtllm/llama-2-7b:/opt/ml/model -p 8080:8080 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124
lm_eval --model local-chat-completions --tasks gsm8k_cot_zeroshot --model_args model=meta-llama/Meta-Llama-2-7B,base_url=http://localhost:8080/v1/chat/completions/model,tokenized_requests=True --limit 10 --apply_chat_template --write_out --log_samples --output_path ~/trtllm/lm_eval/output_llama-2-7b-gsm8k_cot_zeroshot_v11
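Before running the full harness, it can save time to confirm that the endpoint answers a single chat request at all. A minimal stdlib-only smoke test, using the same URL as the `base_url` in the command above (the request/response shape is assumed to be OpenAI-compatible, matching what `local-chat-completions` sends):

```python
import json
import urllib.request

def build_chat_body(content: str) -> dict:
    # Minimal OpenAI-style chat request body.
    return {"messages": [{"role": "user", "content": content}]}

def chat_once(base_url: str, content: str, timeout: float = 60.0) -> dict:
    req = urllib.request.Request(
        base_url,
        data=json.dumps(build_chat_body(content)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Same URL as the lm_eval --model_args base_url above.
    print(chat_once("http://localhost:8080/v1/chat/completions/model", "Say hi"))
```

If this single request succeeds but the harness run still fails with 424, the difference is likely the stop-word list the harness adds (`'until': ['Q:', '</s>', '<|im_end|>']` in the log above).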
What have you tried to solve it?