SFTTrainer raises NotImplementedError with IterableDataset #2138
Comments
The trl library can handle `IterableDataset`; this was actually fixed, check out this PR.
@dame-cell Yes, I did check the PR; you can see the last few comments mentioning to do …
This is what is happening: the function …
Hmm, did you try running the code without unsloth, just using the trl library?
Where? Can you provide the full traceback? I currently can't reproduce the error with the latest version (31b7820) of TRL; using an iterable dataset works as expected.
System Info
Google Colab
Description
When attempting to fine-tune a model using the `SFTTrainer` with an `IterableDataset`, an error occurs because the `SFTTrainer` expects a dataset that supports random access (`__getitem__`). This is problematic when working with large datasets that cannot be loaded into memory at once and require streaming.
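The failure mode described above can be reproduced without any ML libraries. The sketch below is a pure-Python stand-in, assuming (this is not confirmed in the issue) that the error comes from a base class whose `__getitem__` raises `NotImplementedError`, which mirrors how PyTorch's `torch.utils.data.Dataset` behaves for subclasses that only implement `__iter__`. `StreamingTextDataset`, `first_example_by_index` and `first_example_by_iteration` are hypothetical names for illustration.

```python
from typing import Iterator


class Dataset:
    """Stand-in for a map-style base class: subclasses are expected to define __getitem__."""

    def __getitem__(self, index: int) -> str:
        raise NotImplementedError("Subclasses of Dataset should implement __getitem__.")


class StreamingTextDataset(Dataset):
    """Hypothetical iterable-only dataset: yields records one at a time, no indexing."""

    def __init__(self, lines: Iterator[str]) -> None:
        self._lines = lines

    def __iter__(self) -> Iterator[str]:
        yield from self._lines


def first_example_by_index(dataset: Dataset) -> str:
    # Random access: only works if the dataset implements __getitem__.
    return dataset[0]


def first_example_by_iteration(dataset: StreamingTextDataset) -> str:
    # Sequential access: works on a stream without materializing it.
    return next(iter(dataset))


stream = StreamingTextDataset(iter(["first record", "second record"]))
try:
    first_example_by_index(stream)
except NotImplementedError as err:
    print(f"index access failed: {err}")
print(first_example_by_iteration(stream))  # prints "first record"
```

The point of the sketch is the asymmetry: any code path that indexes the dataset fails on a stream, while code that only iterates over it keeps working.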
Context: This issue is especially relevant when fine-tuning on very large datasets, where memory constraints make it impractical to load the entire dataset into memory.
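Streaming keeps memory use bounded by consuming the data in fixed-size chunks instead of loading it all at once. A minimal plain-Python sketch of that idea, with hypothetical helpers `stream_records` and `batched` (Python 3.12 ships `itertools.batched`, which yields tuples; this version is written out so it also runs on older interpreters). An in-memory `io.StringIO` stands in for a large file or remote source.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def stream_records(handle: TextIO) -> Iterator[str]:
    """Yield one stripped, non-empty line at a time; the source is never read whole."""
    for line in handle:
        line = line.strip()
        if line:
            yield line


def batched(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a stream into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a corpus too large to hold in memory.
corpus = io.StringIO("doc one\ndoc two\n\ndoc three\ndoc four\ndoc five\n")
for batch in batched(stream_records(corpus), batch_size=2):
    print(batch)
# ['doc one', 'doc two']
# ['doc three', 'doc four']
# ['doc five']
```

Only one batch of records is held at a time, which is the property a streaming `IterableDataset` is meant to preserve; the trade-off is that records can only be visited in order.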
Reproduction
Expected behavior
The `NotImplementedError` is raised when the trainer tries to access the dataset.
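A streamed dataset usually has no length either, so a trainer cannot turn a number of epochs into a number of steps. In `transformers.Trainer`, which `SFTTrainer` subclasses, this is why `max_steps` has to be set to a positive value when training on an `IterableDataset`. The sketch below mimics that rule with a hypothetical `resolve_total_steps` helper; it is not the library's code, and the error message is paraphrased.

```python
import math
from collections.abc import Sized
from typing import Optional


def resolve_total_steps(
    dataset: object,
    batch_size: int,
    num_epochs: float,
    max_steps: Optional[int] = None,
) -> int:
    """Hypothetical helper: decide how many optimizer steps to run."""
    if max_steps is not None and max_steps > 0:
        # An explicit step budget works for sized and unsized datasets alike.
        return max_steps
    if isinstance(dataset, Sized):
        # Map-style data: the length tells us how many batches one epoch has.
        steps_per_epoch = math.ceil(len(dataset) / batch_size)
        return math.ceil(steps_per_epoch * num_epochs)
    # A stream has no length, so epochs cannot be converted into steps.
    raise ValueError("max_steps must be set to a positive value for a dataset without a length")


print(resolve_total_steps(list(range(10)), batch_size=4, num_epochs=2))  # 6
print(resolve_total_steps((x for x in range(10)), batch_size=4, num_epochs=2, max_steps=100))  # 100
```

Checking for a length up front surfaces a clear configuration error early, instead of failing later when the data is accessed.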