Issue encountered
While setting up the framework to evaluate with LLM-as-judge, it would be helpful to be able to test end-to-end without special credentials such as an OpenAI API key (openai_key) or an HF Pro subscription. The judge models currently available in src/lighteval/metrics/metrics.py are:
gpt-3.5-turbo
meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
When trying to call the Llama model with a free HF_TOKEN, the following error is raised:
(<class 'openai.BadRequestError'>, BadRequestError("Error code: 400 - {'error': 'Model requires a Pro subscription; check out hf.co/pricing to learn more. Make sure to include your HF token in your query.'}"))
Solution/Feature
I tried to define a new LLM judge using a smaller model, roughly like this:
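(Sketch only: the metric name, prompt template, and response parser below are illustrative placeholders, and the exact JudgeLLM constructor arguments may differ between lighteval versions. It follows the pattern of the existing judge definitions in src/lighteval/metrics/metrics.py.)

```python
import statistics

from lighteval.metrics.metrics_sample import JudgeLLM
from lighteval.metrics.utils import (
    MetricCategory,
    MetricUseCase,
    SampleLevelMetricGrouping,
)

# judge_prompt_template and process_judge_response are placeholders for
# your own judge prompt builder and score parser.
llm_judge_tinyllama = SampleLevelMetricGrouping(
    metric_name=["judge_score"],
    higher_is_better={"judge_score": True},
    category=MetricCategory.LLM_AS_JUDGE,
    use_case=MetricUseCase.SUMMARIZATION,
    sample_level_fn=JudgeLLM(
        # A small open model instead of gpt-3.5-turbo / Llama 405B:
        judge_model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        template=judge_prompt_template,
        process_judge_response=process_judge_response,
    ).compute,
    corpus_level_fn={"judge_score": statistics.mean},
)
```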
However, this gave a different error that I have not been able to resolve. The error points to the OpenAI API, even though the intent was to call a TinyLlama model:
INFO:httpx:HTTP Request: POST https://api-inference.huggingface.co/v1/chat/completions "HTTP/1.1 422 Unprocessable Entity"
WARNING:lighteval.logging.hierarchical_logger: (<class 'openai.UnprocessableEntityError'>, UnprocessableEntityError("Error code: 422 - {'error': 'Template error: template not found', 'error_type': 'template_error'}"))
INFO:httpx:HTTP Request: POST https://api-inference.huggingface.co/v1/chat/completions "HTTP/1.1 422 Unprocessable Entity"
WARNING:lighteval.logging.hierarchical_logger: (<class 'openai.UnprocessableEntityError'>, UnprocessableEntityError("Error code: 422 - {'error': 'Template error: template not found', 'error_type': 'template_error'}"))
INFO:httpx:HTTP Request: POST https://api-inference.huggingface.co/v1/chat/completions "HTTP/1.1 422 Unprocessable Entity"
WARNING:lighteval.logging.hierarchical_logger: (<class 'openai.UnprocessableEntityError'>, UnprocessableEntityError("Error code: 422 - {'error': 'Template error: template not found', 'error_type': 'template_error'}"))
WARNING:lighteval.logging.hierarchical_logger: } [0:00:48.373629]
WARNING:lighteval.logging.hierarchical_logger:} [0:00:56.466097]
Traceback (most recent call last):
File "/Users/chuandu/Documents/workspace/legal_llm_evaluation/llm_eval_env/bin/lighteval", line 8, in <module>
sys.exit(cli_evaluate())
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/__main__.py", line 58, in cli_evaluate
main_accelerate(args)
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/logging/hierarchical_logger.py", line 175, in wrapper
return fn(*args, **kwargs)
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/main_accelerate.py", line 92, in main
pipeline.evaluate()
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/pipeline.py", line 236, in evaluate
self._compute_metrics(sample_id_to_responses)
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/pipeline.py", line 288, in _compute_metrics
metrics = compute_metric(results=sample_responses, formatted_doc=doc, metrics=metric_category_metrics)
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/metrics/__init__.py", line 211, in apply_llm_as_judge_metric
outputs.update(metric.compute(predictions=predictions, formatted_doc=formatted_doc))
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/metrics/utils.py", line 74, in compute
return self.sample_level_fn(**kwargs) # result, formatted_doc,
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/metrics/metrics_sample.py", line 811, in compute
scores, messages, judgements = self.judge.evaluate_answer(questions, predictions, ref_answers)
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/metrics/llm_as_judge.py", line 158, in evaluate_answer
response = self.__call_api(prompt)
File "/Users/chuandu/Documents/workspace/lighteval/src/lighteval/metrics/llm_as_judge.py", line 259, in __call_api
raise Exception("Failed to get response from the API")
Exception: Failed to get response from the API
Thank you!
I suspect this model is not served on the fly by the free tier of inference endpoints - can you try Llama 3.1 70B, for example, or Command R+?
Thank you for the feedback! @JoelNiklaus figured out that it's because we should pass use_transformers=True when constructing the judge instance. Do you think it would be helpful to add an example like this in metrics.py or as a note in the README?
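For reference, the fix looks roughly like this (the arguments other than use_transformers are illustrative placeholders and may vary by lighteval version):

```python
from lighteval.metrics.metrics_sample import JudgeLLM

# Passing use_transformers=True makes the judge load and run the model
# locally via transformers, instead of routing requests through the
# OpenAI-compatible inference API endpoint (which produced the 422
# "Template error" responses above).
judge_fn = JudgeLLM(
    judge_model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    template=judge_prompt_template,                  # hypothetical prompt builder
    process_judge_response=process_judge_response,   # hypothetical score parser
    use_transformers=True,  # the key change
).compute
```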