Standalone nanotron config #285

Merged
merged 13 commits into from
Sep 4, 2024

hynky1999
Collaborator

What does this implement/fix? Explain your changes.

This PR moves the lighteval config into the lighteval codebase.

  • Enforces lighteval_config_path as the only way to read the lighteval config. The nanotron part is ignored, which makes the change less breaking.
  • Some typing corrections

Comments

  • Not tested yet: I don't want to rebase on the nanotron fix PR, so I'm waiting for that to be merged first.
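
As an illustration of the first change above, here is a minimal sketch of reading the lighteval config only from lighteval_config_path while ignoring anything embedded in the nanotron config. The names LightEvalConfigSketch and read_lighteval_config, the dataclass fields, and the JSON file format are assumptions made for this example; they are not taken from the PR diff, and the real config may be stored and parsed differently.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LightEvalConfigSketch:
    # Hypothetical fields, chosen only for illustration.
    output_dir: str
    override_bs: Optional[int] = None


def read_lighteval_config(
    nanotron_config: Dict[str, Any],
    lighteval_config_path: Optional[str],
) -> LightEvalConfigSketch:
    """Read the lighteval config from its own file and nowhere else."""
    # The standalone file is the only accepted source.
    if lighteval_config_path is None:
        raise ValueError("lighteval_config_path is required")
    # Any lighteval section inside nanotron_config is deliberately ignored.
    # JSON is an assumption here; it keeps the sketch dependency-free.
    with open(lighteval_config_path, encoding="utf-8") as f:
        raw = json.load(f)
    return LightEvalConfigSketch(**raw)
```

With this shape, a stale lighteval section left in a nanotron config does not change behavior, which is one way the change could stay less breaking for existing configs.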

@hynky1999 hynky1999 changed the title Standalon nanotron config Standalone nanotron config Aug 30, 2024
@NathanHB
Member

NathanHB commented Sep 3, 2024

I took a look, tested, and fixed a few bumps. It should work now, tell me what you think!

@hynky1999
Collaborator Author

hynky1999 commented Sep 3, 2024

@NathanHB Thanks for fixing the problems.
I made small nits to the config (changed the output_dir type to str, as that's what it is in the CLI interface).

Two more things:

  • I changed the nanotron model to take batch_size from override_bs (as the accelerate models do). I still think it's very confusing, but I'm leaving it as it is.
  • I noticed that batch detection for nanotron doesn't work and created an issue for it. I think it's low priority, but let's keep track of it: [BUG] Nanotron batch detection doesn't work #286
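
To make the two points above concrete, here is a rough sketch of how a model might pick its batch size: an explicit override_bs wins, and automatic detection is only a fallback. The function resolve_batch_size and the detect callback are hypothetical names for this example, and treating None as "no override" is an assumption; the actual nanotron model code in lighteval may use a different sentinel and structure.

```python
from typing import Callable, Optional


def resolve_batch_size(
    override_bs: Optional[int],
    detect: Callable[[], int],
) -> int:
    """Return the batch size to use for evaluation."""
    # An explicit override always takes precedence.
    if override_bs is not None:
        if override_bs <= 0:
            raise ValueError("override_bs must be a positive integer")
        return override_bs
    # Fallback: automatic detection (the part reported as not working in #286).
    return detect()
```

Setting override_bs explicitly sidesteps the detection path entirely, which may be why the detection bug can be treated as low priority.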

This was referenced Sep 3, 2024
@NathanHB NathanHB merged commit aaa8bbf into main Sep 4, 2024
2 checks passed