Add new entries to model registry #141

Open: 2 commits to be merged into base: main

Conversation

Gnurro
Collaborator

@Gnurro Gnurro commented Dec 11, 2024

Added models: EuroLLM-9B-Instruct, QwQ-32B-Preview, aya-expanse-32b, Teuken-7B-instruct-research-v0.4 and Teuken-7B-instruct-commercial-v0.4.
The Teuken models may not work with the default Hugging Face backend without modification, as they rely on extensive custom transformers code and require trust_remote_code=True.

…instruct-research-v0.4 and Teuken-7B-instruct-commercial-v0.4 entries to the model registry
@sherzod-hakimov
Contributor

Please test whether each of these models can run one experiment of any game, and exclude the ones that don't run.

@Gnurro
Collaborator Author

Gnurro commented Dec 12, 2024

The Teuken models can be run as-is, but they require manual confirmation in the terminal each time they are loaded, which happens once per clemgame. This makes batch-running the entire benchmark much more complicated, because the prompt asking whether to run remote code has a rather short timeout, after which the model simply fails to load. This could be solved by adding trust_remote_code handling to the HF backend code and the model registry, as I've already done here: https://github.com/Gnurro/clembench/blob/hf_trust_remote_code/backends/huggingface_local_api.py
So far we have refrained from running models that require this, as is customary in LLM benchmarking, so I'll remove the Teuken model entries.
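
For illustration, here is a minimal sketch of how a per-model opt-in could work. It is not the code in the linked branch. The registry key `trust_remote_code`, the helper `build_load_kwargs`, and the `huggingface_id` values are assumptions made for this example; the only library fact relied on is that transformers' `from_pretrained()` accepts a `trust_remote_code` keyword argument.

```python
from typing import Any, Dict


def build_load_kwargs(model_entry: Dict[str, Any]) -> Dict[str, Any]:
    """Collect extra keyword arguments for from_pretrained() from a registry entry.

    Hypothetical helper: the 'trust_remote_code' registry key is an assumption,
    not a confirmed field of the clembench model registry.
    """
    kwargs: Dict[str, Any] = {}
    # Opt in only when the entry sets the flag to exactly True, so that
    # models without the key (or with a malformed value) stay on the safe default.
    if model_entry.get("trust_remote_code", False) is True:
        kwargs["trust_remote_code"] = True
    return kwargs


# Placeholder entries; the huggingface_id values are illustrative only.
teuken_entry = {
    "model_name": "Teuken-7B-instruct-research-v0.4",
    "huggingface_id": "example-org/Teuken-7B-instruct-research-v0.4",
    "trust_remote_code": True,
}
standard_entry = {
    "model_name": "EuroLLM-9B-Instruct",
    "huggingface_id": "example-org/EuroLLM-9B-Instruct",
}

print(build_load_kwargs(teuken_entry))
print(build_load_kwargs(standard_entry))

# In a backend, the result would be forwarded roughly like this (not executed here):
# model = AutoModelForCausalLM.from_pretrained(
#     entry["huggingface_id"], **build_load_kwargs(entry)
# )
```

With the flag passed explicitly, the confirmation prompt should no longer block batch runs, while every model without the key keeps the default behaviour. Whether enabling remote code is acceptable for a benchmark at all is a separate policy question.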

…cial-v0.4 entries from the model registry due to them requiring custom code to run.