Newbie Qs: RLHF fine-tuning & dataset #6470

vtharmalingam · 2024-12-28T20:06:22Z

First off, thank you for the awesome library!!

I want to fine-tune Qwen with RLHF.

Here is the use case: my LLM responds to user queries, and each query/response pair is logged for human validation. A human reviewer scores each pair with a scalar value between 0 and 1, and these scored pairs make up the dataset for fine-tuning the model.

So my questions are:

What is the accepted dataset format? Will the format below work for fine-tuning? Also, please shed some light on how flexible the dataset structure is: if I need to add an extra key/value to the JSON for my domain, is that supported? If yes, which Python file or configuration do I need to edit to register the new field?

```json
[
    {
        "query": "What are the benefits of regular exercise?",
        "response": "Regular exercise boosts physical health, improves mental health, and enhances overall well-being. It helps in weight management and reduces the risk of chronic diseases.",
        "feedback": 0.9
    },
    {
        "query": "Explain the theory of relativity in simple terms.",
        "response": "The theory of relativity states that the laws of physics are the same for all non-accelerating observers, and that the speed of light is constant no matter how fast you are moving. It includes both special and general relativity.",
        "feedback": 0.8
    }
]
```
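
To make the intended schema concrete, here is a minimal sketch of a check for records of this shape: each record needs `query`, `response`, and a numeric `feedback` in [0, 1], and any extra keys are passed through untouched. This is not the library's loader, and it says nothing about which format the library actually accepts; `validate_records`, `REQUIRED_KEYS`, and the extra `domain` key are hypothetical names used only for illustration.

```python
import json

# Keys every record is assumed to carry (taken from the example above).
REQUIRED_KEYS = {"query", "response", "feedback"}


def validate_records(records):
    """Split records into valid ones and (index, reason) pairs for the rest."""
    valid, errors = [], []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append((i, "record is not an object"))
            continue
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            errors.append((i, f"missing keys: {sorted(missing)}"))
            continue
        fb = rec["feedback"]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(fb, (int, float)) and not isinstance(fb, bool)
        if not is_number or not 0.0 <= fb <= 1.0:
            errors.append((i, "feedback must be a number in [0, 1]"))
            continue
        valid.append(rec)  # extra keys (e.g. "domain") are kept as-is
    return valid, errors


# Shortened sample data; "domain" is a made-up extra field.
sample = json.loads("""
[
  {"query": "What are the benefits of regular exercise?",
   "response": "Regular exercise boosts physical health.",
   "feedback": 0.9},
  {"query": "Explain the theory of relativity in simple terms.",
   "response": "The laws of physics are the same for all non-accelerating observers.",
   "feedback": 0.8,
   "domain": "physics"},
  {"query": "No score here", "response": "..."},
  {"query": "Bad score", "response": "...", "feedback": 1.7}
]
""")

valid, errors = validate_records(sample)
print(len(valid), "valid records;", "problems:", errors)
```

Running a check like this before training would catch missing or out-of-range scores early, independent of whichever dataset format turns out to be supported.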

Thanks,
Tharma

github-actions bot added the `pending` label (This problem is yet to be addressed) on Dec 28, 2024
