From bd9ddbffd07a26aa7a07edc754eedae9c4f44af3 Mon Sep 17 00:00:00 2001
From: Kelly Brown <86735520+kelbrown20@users.noreply.github.com>
Date: Tue, 19 Nov 2024 11:20:53 -0500
Subject: [PATCH] [Docs] Update taxonomy docs to show PDF consumption (#1348)

**Description:**

InstructLab 0.21.0 uses a version of SDG that allows users to specify a PDF file they have in their git repository as a valid document type. Updating taxonomy docs due to this update

Def would love feedback about how this should look from the taxonomy perspective!

Signed-off-by: Kelly Brown
---
 README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 088d308a4..74e4ce178 100644
--- a/README.md
+++ b/README.md
@@ -256,8 +256,11 @@ Knowledge is supported by documents, such as a textbook, technical manual, encyc
 
 Knowledge in the taxonomy tree consists of a few more elements than skills:
 
+> [!IMPORTANT]
+> If you are using InstructLab version `0.21.0` or above, you can specify PDF files in your knowledge `qna.yaml` file as a valid document type. Any previous version of InstructLab still only consumes knowledge documents in markdown format.
+
 - Each knowledge node in the tree has a `qna.yaml`, similar to the format of the `qna.yaml` for skills.
-- ⭐ Knowledge submissions require you to create a Git repository, can be with GitHub, that contains the markdown files of your knowledge contributions. These contributions in your repository must use the markdown (.md) format.
+- ⭐ Knowledge submissions require you to create a Git repository, can be with GitHub, that contains the files of your knowledge contributions.
 - The `qna.yaml` includes parameters that contain information from your repository.
 
 > [!TIP]
@@ -279,9 +282,9 @@ The `qna.yaml` format must include the following fields:
     - `answer`: Specify the desired answer from the model.
   Each `qna.yaml` file needs at least three question and answer pairs per `context` chunk with a maximum word count of 250 words.
 - `document_outline`: Describe an overview of the document your submitting.
 - `document`: The source of your knowledge contribution.
-  - `repo`: The URL to your repository that holds your knowledge markdown files.
-  - `commit`: The SHA of the commit in your repository with your knowledge markdown files.
-  - `patterns`: A list of glob patterns specifying the markdown files in your repository. Any glob pattern that starts with `*`, such as `*.md`, must be quoted due to YAML rules. For example, `"*.md"`.
+  - `repo`: The URL to your repository that holds your knowledge files.
+  - `commit`: The SHA of the commit in your repository with your knowledge files.
+  - `patterns`: A list of glob patterns specifying the files in your repository. Any glob pattern that starts with `*`, such as `*.md`, must be quoted due to YAML rules. For example, `"*.md"`.
 
 ### Knowledge: YAML examples