For example, `thangvip/cosmopedia_vi_math` has 300 splits, and loading just one of them takes a very long time.
This is because `load_dataset()` resolves the files of all the splits, even if only one is needed.
In `dataset-viewer` the splits are loaded in separate jobs, so this results in 300 jobs that each resolve all 300 splits: 300 × 300 = 90k calls to `/paths-info`.
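A rough toy model of why the call count grows quadratically, assuming each resolved split costs exactly one `/paths-info` call and that the per-split jobs share no cache. `FakeHub`, `load_one_split`, and `simulate_viewer_jobs` are hypothetical names for illustration only, not part of the `datasets` library or the `dataset-viewer` code.

```python
from dataclasses import dataclass


@dataclass
class FakeHub:
    """Hypothetical stand-in for the Hub that counts simulated /paths-info calls."""
    calls: int = 0

    def paths_info(self, split: str) -> list[str]:
        # Assumption: resolving one split's files costs one /paths-info call.
        self.calls += 1
        return [f"data/{split}.parquet"]  # hypothetical file layout


def load_one_split(hub: FakeHub, splits: list[str], wanted: str, eager: bool) -> list[str]:
    # eager=True mimics the reported behavior: every split is resolved up front.
    # eager=False resolves only the split that was actually requested.
    to_resolve = splits if eager else [wanted]
    resolved = {split: hub.paths_info(split) for split in to_resolve}
    return resolved[wanted]


def simulate_viewer_jobs(num_splits: int, eager: bool) -> int:
    splits = [f"split_{i}" for i in range(num_splits)]  # hypothetical split names
    hub = FakeHub()
    for wanted in splits:  # one job per split, as in dataset-viewer
        load_one_split(hub, splits, wanted, eager)
    return hub.calls


print(simulate_viewer_jobs(300, eager=True))   # 90000
print(simulate_viewer_jobs(300, eager=False))  # 300
```

Under these assumptions, resolving only the requested split brings the total from N² down to N calls for N splits.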
This should help fix this issue: #6832
I'm having a similar issue when using slices:
It seems to be downloading, loading, and generating splits using the entire dataset.