Finetuning LLM models with PDF documents in h2o-llmstudio #719

sunilswain · 2024-05-23T05:39:27Z

Hi,
does anyone know how we can use PDF documents (e.g., a travel policy document) to fine-tune our model with LLM Studio?

Thanks.

Answered by psinger

psinger · 2024-05-23T06:35:52Z

Hi,

you would first need to generate input/output pairs from your documents. I already answered a similar question in #522, which might be helpful.

If you just want to do next-token training on the text of your PDFs, you would need to convert that text to a CSV file with raw text and follow this: https://docs.h2o.ai/h2o-llmstudio/faqs#what-if-my-data-is-not-in-question-and-answer-form-and-i-just-have-documents-how-can-i-fine-tune-the-llm-model
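
For the raw-text route, here is a minimal sketch of how the PDF-to-CSV step could look. It is not taken from LLM Studio itself. It assumes the third-party `pypdf` package for text extraction; the helper names, the `text` column name, the 2000-character chunk size, and the file names are all arbitrary choices for illustration. The linked FAQ describes how to point the experiment settings at the resulting column.

```python
import csv


def extract_pdf_text(pdf_path):
    """Return the text of a PDF as one string per page.

    Assumption: the third-party pypdf package is installed. It is imported
    lazily so the rest of this sketch works without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def chunk_text(text, max_chars=2000):
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def write_text_csv(texts, csv_path, column="text", max_chars=2000):
    """Write chunks of the given texts to a one-column CSV; return the row count."""
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])  # header row; the column name is an arbitrary choice
        for text in texts:
            for chunk in chunk_text(text, max_chars):
                writer.writerow([chunk])
                rows += 1
    return rows


# Hypothetical usage (file names are placeholders):
# pages = extract_pdf_text("travel_policy.pdf")
# write_text_csv(pages, "travel_policy.csv")
```

Text extracted from PDFs often contains headers, footers, and broken line wraps, so it is worth inspecting a few rows of the CSV before training on it.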
