Newbie Qs: RLHF fine-tuning & dataset #6470

Open
vtharmalingam opened this issue Dec 28, 2024 · 0 comments
Labels
pending This problem is yet to be addressed


vtharmalingam commented Dec 28, 2024

First off, thank you for the awesome library!!

I want to fine-tune Qwen with RLHF.

Here is the use case: my LLM responds to user queries, and each query and response pair is logged for human validation. The human feedback is given as a scalar value between 0 and 1. These records make up the dataset for fine-tuning the model.

So my questions are:

What dataset format is accepted? Will the format below work for fine-tuning? Also, please shed some light on how flexible the dataset structure is: if I need to add an extra key/value pair to the JSON for my domain, is that supported? If so, which Python file or configuration do I need to edit to register the new field?

```json
[
    {
        "query": "What are the benefits of regular exercise?",
        "response": "Regular exercise boosts physical health, improves mental health, and enhances overall well-being. It helps in weight management and reduces the risk of chronic diseases.",
        "feedback": 0.9
    },
    {
        "query": "Explain the theory of relativity in simple terms.",
        "response": "The theory of relativity states that the laws of physics are the same for all non-accelerating observers, and that the speed of light is constant no matter how fast you are moving. It includes both special and general relativity.",
        "feedback": 0.8
    }
]
```
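
To make the intended structure concrete, here is a minimal sketch of how records in this shape could be sanity-checked before training. It only assumes the three keys described above (`query`, `response`, `feedback`) and the 0 to 1 feedback range. The helper name `validate_records` and the extra `domain` key are hypothetical illustrations, not part of the library, and the sketch says nothing about which format the library actually accepts.

```python
import json

# Keys every record is expected to carry (taken from the format above).
REQUIRED_KEYS = {"query", "response", "feedback"}


def validate_records(records):
    """Return a list of (index, problem) tuples; an empty list means all records passed."""
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - set(rec)
        if missing:
            problems.append((i, f"missing keys: {sorted(missing)}"))
            continue
        fb = rec["feedback"]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(fb, (int, float)) and not isinstance(fb, bool)
        if not is_number or not 0.0 <= fb <= 1.0:
            problems.append((i, f"feedback must be a number in [0, 1], got {fb!r}"))
    return problems


# Hypothetical sample: the second record carries an extra "domain" key
# (tolerated by this check) and an out-of-range feedback value.
sample = json.loads("""
[
  {"query": "q1", "response": "r1", "feedback": 0.9},
  {"query": "q2", "response": "r2", "feedback": 1.5, "domain": "physics"},
  {"query": "q3", "feedback": 0.2}
]
""")

problems = validate_records(sample)
for index, problem in problems:
    print(f"record {index}: {problem}")
```

Extra keys are deliberately ignored here, which mirrors the flexibility I am asking about; whether the training pipeline ignores or consumes them is the open question.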

Thanks,
Tharma

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 28, 2024