Use knowledge data for phase 1 training and skills data for phase 2 training in standalone script #113
Closes #86
Updates data_processing_op() to output two datasets, skills_processed_data and knowledge_processed_data, instead of a single processed_data output. These new datasets are stored in .../processed_data/knowledge and .../processed_data/skills respectively. This change requires that the SDG data contain both a knowledge_train_msgs*.jsonl file and a skills_train_msgs*.jsonl file. It normally should, but the step may fail if a small PR is used to generate the SDG data.

Updates the PYTORCH_TRAINING_JOB template in standalone.tpl to use knowledge_processed_data for phase 1 training and skills_processed_data for phase 2.

Note: This PR does not add this feature to the KFP; pytorchJob_manifest_op() still needs to be updated to reflect this change. That function has fallen a bit behind the standalone version at this point, so I will bring it back to parity in a later PR.
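
To illustrate the requirement that the SDG data contain both file types, here is a minimal sketch of an up-front check that maps each training phase to its input files. It is not the code in this PR: the function name find_phase_inputs, the dictionary keys, and the flat directory layout are assumptions made for illustration. Only the two filename patterns come from the description above.

```python
from pathlib import Path


def find_phase_inputs(sdg_dir):
    """Map each training phase to its SDG message files.

    Hypothetical helper: raises FileNotFoundError if either file type
    is missing, so a too-small SDG run fails before training starts.
    """
    sdg_path = Path(sdg_dir)
    # Filename patterns taken from the PR description; keys are illustrative.
    patterns = {
        "phase_1_knowledge": "knowledge_train_msgs*.jsonl",
        "phase_2_skills": "skills_train_msgs*.jsonl",
    }
    inputs = {}
    for phase, pattern in patterns.items():
        matches = sorted(sdg_path.glob(pattern))
        if not matches:
            raise FileNotFoundError(f"no {pattern} file found in {sdg_path}")
        inputs[phase] = matches
    return inputs
```

Failing early with a message that names the missing pattern may make the small-PR case easier to diagnose than a failure partway through data processing.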