[Tracking issue] General dataset support #2071

qgallouedec · 2024-09-15T13:39:31Z

The aim is for all trainers to apply the same procedure in their init function:

if needed, apply the chat template, then
if needed, tokenize.

Support todo:

Standard dataset

Conversational dataset

Misc

Update docs/dataset_format.mdx

The text was updated successfully, but these errors were encountered:

qgallouedec added the ✨ enhancement New feature or request label Sep 15, 2024

This was referenced Sep 18, 2024

Conversational dataset support for Online DPO #2075

Merged

Add support for ShareGPT-formatted datasets #2083

Open

[RewardTrainer] Tokenize inputs within trainer #2102

Merged

BCOTrainer conversational dataset support #2107

Merged

This was referenced Sep 27, 2024

Conversational dataset support for DPOTrainer #2131

Merged

Conversational dataset support for CPOTrainer #2144

Merged

qgallouedec mentioned this issue Oct 5, 2024

Conversational dataset support for ORPOTrainer #2184

Merged

5 tasks

qgallouedec added the 🗃️ data Related to data label Oct 7, 2024

qgallouedec mentioned this issue Oct 18, 2024

Conversational dataset support for KTOTrainer #2248

Open

5 tasks

qgallouedec pinned this issue Oct 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Tracking issue] General dataset support #2071

[Tracking issue] General dataset support #2071

qgallouedec commented Sep 15, 2024 •

edited

Loading

[Tracking issue] General dataset support #2071

[Tracking issue] General dataset support #2071

Comments

qgallouedec commented Sep 15, 2024 • edited Loading

Support todo:

Standard dataset

Conversational dataset

Misc

qgallouedec commented Sep 15, 2024 •

edited

Loading