[Docs] Updates for SDG README
Signed-off-by: Kelly Brown <[email protected]>
kelbrown20 committed Sep 25, 2024
1 parent 432c2d1 commit 0ef4af3
Showing 1 changed file with 72 additions and 1 deletion.
73 changes: 72 additions & 1 deletion README.md
@@ -1,8 +1,79 @@
# sdg
# Synthetic Data Generation (SDG)

![Lint](https://github.com/instructlab/sdg/actions/workflows/lint.yml/badge.svg?branch=main)
![Build](https://github.com/instructlab/sdg/actions/workflows/pypi.yaml/badge.svg?branch=main)
![Release](https://img.shields.io/github/v/release/instructlab/sdg)
![License](https://img.shields.io/github/license/instructlab/sdg)

Python library for Synthetic Data Generation

## Introduction

Synthetic Data Generation (SDG) is a process that creates an artificially generated dataset that mimics real data based on provided examples. SDG uses a YAML file containing question-and-answer pairs as input data.
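As a rough illustration of the input described above, the sketch below models question-and-answer pairs as plain Python data and checks that each pair is complete. The `seed_examples`, `question`, and `answer` key names and the `find_incomplete_pairs` helper are assumptions made for illustration, not a schema or API confirmed by this README. A real input file would be YAML rather than an in-memory dictionary.

```python
# Hypothetical in-memory equivalent of a YAML input file.
# The key names below are assumptions, not a confirmed SDG schema.
seed_data = {
    "seed_examples": [
        {
            "question": "What does SDG stand for?",
            "answer": "Synthetic Data Generation.",
        },
        {
            "question": "What format is the SDG input file?",
            "answer": "A YAML file of question-and-answer pairs.",
        },
    ]
}


def find_incomplete_pairs(data):
    """Return the indexes of pairs missing a non-empty question or answer."""
    incomplete = []
    for index, pair in enumerate(data.get("seed_examples", [])):
        question = str(pair.get("question", "")).strip()
        answer = str(pair.get("answer", "")).strip()
        if not question or not answer:
            incomplete.append(index)
    return incomplete


if __name__ == "__main__":
    # Print the indexes of any incomplete pairs in the sample data.
    print(find_incomplete_pairs(seed_data))
```

Checking the pairs before generation is a design choice of this sketch: malformed examples are cheaper to catch up front than after a long generation run.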

## Installing the SDG library

Clone the library and navigate to the repo:

```bash
git clone https://github.com/instructlab/sdg
cd sdg
```

Install the library:

```bash
pip install .
```

## Using the library

You can import SDG into your Python files with the following statements:

```python
from instructlab.sdg.generate_data import generate_data
from instructlab.sdg.utils import GenerateException
```
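The two imports above are typically used together: one starts generation and the other signals that it failed. The sketch below shows one possible way to wrap a generation call so that a `GenerateException` is logged instead of crashing the caller. The `run_generation` helper is hypothetical, and the arguments that `generate_data` expects are deliberately not shown because this README does not document them. The `ImportError` fallback exists only so the sketch runs where the library is not installed.

```python
import logging

try:
    from instructlab.sdg.generate_data import generate_data
    from instructlab.sdg.utils import GenerateException
except ImportError:
    # Fallback so this sketch still runs without the library installed.
    generate_data = None

    class GenerateException(Exception):
        """Stand-in for instructlab.sdg.utils.GenerateException."""


logger = logging.getLogger(__name__)


def run_generation(generate_fn, **kwargs):
    """Call generate_fn(**kwargs) and report whether it succeeded.

    Hypothetical helper: returns True on success and False when a
    GenerateException is raised. Other exceptions propagate unchanged.
    """
    try:
        generate_fn(**kwargs)
    except GenerateException as exc:
        logger.error("Data generation failed: %s", exc)
        return False
    return True
```

In real use, `generate_fn` would be `generate_data`, called with whatever keyword arguments your installed version of the library requires.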

## Pipelines

SDG uses four pipelines. Each pipeline has its own hardware requirements.
<!--TODO: Add explanations of pipelines-->

*Full* -

This pipeline is intended for running SDG on consumer-grade accelerators (GPUs).

*Simple* -

This pipeline is intended for running SDG on CPUs or GPU-enhanced CPUs.

### Pipeline architecture

All pipelines are written in YAML.

Knowledge:

Grounded Skills:

Freeform Skills:

<!--TODO: Add content here-->

## Repository structure

```bash
|-- sdg/src/instructlab/pipelines/ (1)
|-- sdg/src/instructlab/configs/ (2)
|-- sdg/src/instructlab/utils/ (3)
|-- sdg/docs/ (4)
|-- sdg/scripts/ (5)
|-- sdg/tests/ (6)
```

1. Contains the YAML code that configures the SDG pipelines
2.
3.
4.
5.
6. Contains all the CI tests for the SDG repository
