Augment datasets using Large Language Models by generating synthetic data. This repository has simple examples you can use to generate Synthetic Data and persist it to disk as a CSV file.
This example can run in Codespaces but you can use the following if you are cloniing this repository:
Install the dependencies
Create the virtual environment and install the dependencies:
python3 -m venv .venv
source .venv/bin/activate
.venv/bin/pip install -r requirements.txt
Here is a summary of what this repository will use:
- Llamafile for the LLM (alternatively you can use an OpenAI API compatible key and endpoint)
- OpenAI's Python API to connect to the LLM
- A large language model (LLM) to generate synthetic data like Mixtral or using an OpenAI API based service.