https://github.com/llm-jp/Megatron-LM/tree/nii-geniac-dump
hkiyomaru
t0-0
Dump raw training data for the LLM-jp-3 series. For each training instance, at least the following fields should be included:
- token_ids: A list of token IDs for the training instance
- training_step: Training step at which the training instance was processed
- dataset: Name of the dataset from which the instance was sourced
- document_ids: IDs of the documents associated with the training instance
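The record described by the fields above could be sketched as follows. This is only an illustration, not the format used in the actual dump: the four field names come from this issue, while the Python types (for example, string document IDs), the `TrainingInstance` class, the helper functions, and the choice of one JSON object per line are all assumptions.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class TrainingInstance:
    """Hypothetical container for one dumped training instance."""
    token_ids: List[int]      # token IDs for the training instance
    training_step: int        # step at which the instance was processed
    dataset: str              # name of the source dataset
    document_ids: List[str]   # IDs of the associated documents (type assumed)


def to_jsonl_line(instance: TrainingInstance) -> str:
    """Serialize one instance as a single JSON line (assumed output format)."""
    return json.dumps(asdict(instance), ensure_ascii=False)


def from_jsonl_line(line: str) -> TrainingInstance:
    """Parse a JSON line back into a TrainingInstance."""
    return TrainingInstance(**json.loads(line))


# Made-up values, for illustration only.
example = TrainingInstance(
    token_ids=[101, 2023, 102],
    training_step=0,
    dataset="example_dataset",
    document_ids=["doc-0"],
)
line = to_jsonl_line(example)
restored = from_jsonl_line(line)
```

Writing one record per line would let the dump be streamed and filtered by `training_step` or `dataset` without loading everything into memory, though a binary format may be more practical at full training scale.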