I wonder how data shuffling works when using the streaming option. My understanding is that shuffling is applied within each buffer. For example, suppose I have 10,000 examples in total and set buffer_size to 1,000. I would expect the order of the buffers to be shuffled.
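You can refer to the documentation for details: https://huggingface.co/docs/datasets/v3.2.0/en/package_reference/main_classes#datasets.IterableDataset.shuffle and https://huggingface.co/docs/datasets/en/stream#shuffle. Shuffling is not applied to fixed blocks of buffer_size examples: IterableDataset.shuffle fills a buffer with buffer_size examples, then randomly samples examples from this buffer, replacing each selected example with the next one from the stream. If your dataset has multiple shards, the order of the shards is shuffled as well.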
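For intuition, the buffer behavior described in the documentation can be sketched in plain Python. This is a simplified re-implementation, not the library's actual code: `buffer_shuffle` and `shuffled_stream` are hypothetical names, and the shards are assumed to be in-memory lists. The key point is that the buffer slides over the stream instead of shuffling fixed blocks, so with 10,000 examples and buffer_size=1,000 the first example emitted always comes from the first 1,000 examples of the (shard-shuffled) stream.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffer_shuffle(stream: Iterable[T], buffer_size: int, rng: random.Random) -> Iterator[T]:
    """Approximate shuffle of a stream with a fixed-size buffer (hypothetical helper).

    Fill the buffer with the first `buffer_size` items, then for every new
    item yield a randomly chosen buffered item and put the new item in its slot.
    Assumes buffer_size >= 1.
    """
    buffer: List[T] = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer, nothing emitted yet
            continue
        i = rng.randrange(buffer_size)  # pick a random slot
        yield buffer[i]                 # emit the buffered item in that slot
        buffer[i] = item                # refill the slot with the incoming item
    # Stream exhausted: emit the remaining buffered items in random order.
    rng.shuffle(buffer)
    yield from buffer


def shuffled_stream(shards: List[List[T]], buffer_size: int, seed: int) -> Iterator[T]:
    """Shuffle the shard order first, then buffer-shuffle the concatenated shards."""
    rng = random.Random(seed)
    order = list(range(len(shards)))
    rng.shuffle(order)  # shard-level shuffle
    stream = (x for idx in order for x in shards[idx])
    return buffer_shuffle(stream, buffer_size, rng)


if __name__ == "__main__":
    # 10,000 examples split into 10 hypothetical shards of 1,000 examples each.
    data = list(range(10_000))
    shards = [data[k:k + 1_000] for k in range(0, 10_000, 1_000)]
    out = list(shuffled_stream(shards, buffer_size=1_000, seed=42))
    print(out[:10])
```

A larger buffer_size gives a closer approximation to a full shuffle at the cost of memory; if buffer_size is at least the dataset size, the sketch reduces to a full uniform shuffle performed once the stream ends.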