Skip to content

Commit

Permalink
Merge pull request #378 from RobotSail/fix-docs
Browse files Browse the repository at this point in the history
fix: formatting error
  • Loading branch information
mergify[bot] authored Nov 14, 2024
2 parents b6f07a8 + 9dc14ca commit b72ac26
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/dataset_formats.md
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,7 @@ The generated samples are converted to a training dataset in in the "messages" f

### Leaf Node Dataset (Output)

In order to facilitate [data mixing](./data_mixing.md), the generated samples for each leaf node are stored at ```node_datasets_{self.date_suffix}/{leaf_node_path}.jsonl``. These datasets are suitable for either the "phase 1" (knowledge, aka "phase 0.7" or "p07") or the "phase 2" (skills, aka phase "1.0" or "p10") training phase, and are then referenced by the knowledge and skills data mixing recipes described below.
In order to facilitate [data mixing](./data_mixing.md), the generated samples for each leaf node are stored at `node_datasets_{self.date_suffix}/{leaf_node_path}.jsonl`. These datasets are suitable for either the "phase 1" (knowledge, aka "phase 0.7" or "p07") or the "phase 2" (skills, aka phase "1.0" or "p10") training phase, and are then referenced by the knowledge and skills data mixing recipes described below.

The contents of a dataset for a skill leaf node is straightforward - all of the the generated samples fields along with a "messages" column (as above in [Messages Training Dataset (Output)](#messages-training-dataset-output)) and an additional `id` column containing a unique UUID per sample.

Expand Down

0 comments on commit b72ac26

Please sign in to comment.