Feat/fireworks integration #2089

Open

somashekhar161 wants to merge 22 commits into zylon-ai:main from somashekhar161:feat/fireworks-integration

Changes from 6 commits

Commits (22)
f1ad995
FEAT: Added Fireworks Integration
somashekhar161 Sep 20, 2024
a38675b
FEAT: Added Fireworks Integration
somashekhar161 Sep 20, 2024
2d5865e
FEAT: Fixed UI mode name
somashekhar161 Sep 20, 2024
c5c1f9b
fixed UI height overflow
somashekhar161 Sep 20, 2024
241637f
FEAT: Added Fireworks integration
somashekhar161 Sep 20, 2024
4d22546
Added dockerfile and docker-compose for fireworks
somashekhar161 Sep 21, 2024
519c48b
Merge branch 'main' of github.com:somashekhar161/private-gpt into fea…
somashekhar161 Sep 24, 2024
9c3590e
Added embedding model option for fireworks; added documentation for …
somashekhar161 Sep 24, 2024
cecec30
fixed failing black check
somashekhar161 Sep 24, 2024
80f15a1
fixed ruff check
somashekhar161 Sep 24, 2024
b807e50
fixed mypy private_gpt for llama-index
somashekhar161 Sep 24, 2024
6a46060
fixed mypy ignored mypy-llama-index-embeddings-fireworks
somashekhar161 Sep 24, 2024
03e8809
fixed mypy ignored llama-index-embeddings-fireworks
somashekhar161 Sep 24, 2024
0ff7a06
fixed mypy ignored tool.mypy-llama_index.embeddings.fireworks
somashekhar161 Sep 24, 2024
b2ffe5b
fixed mypy
somashekhar161 Sep 24, 2024
16d1f60
fixed mypy ignored tool.mypy-llama_index.embeddings.fireworks
somashekhar161 Sep 24, 2024
b8cb49a
updated dependencies poetry lock
somashekhar161 Sep 24, 2024
2052ff4
added # type: ignore for embeddings.fireworks
somashekhar161 Sep 24, 2024
5334dda
fixed ruff and black
somashekhar161 Sep 24, 2024
c846b3f
reverted to main branch's dependency versions
somashekhar161 Sep 25, 2024
62985df
resolved dependencies
somashekhar161 Sep 26, 2024
c4be3f8
Merge branch 'zylon-ai:main' into feat/fireworks-integration
somashekhar161 Nov 4, 2024
54 changes: 54 additions & 0 deletions Dockerfile.fireworks
Collaborator:
I would say the Dockerfile is unnecessary in this case. If someone wants to use Fireworks, it's better to let them make the necessary modifications themselves.

Author:
Yes, the Dockerfile is only for running it. Even if users make the modifications themselves, the Dockerfile is still useful because of dependency errors, Python version mismatches, etc.

Collaborator:
I don't like this premise. Adding another Docker/docker-compose profile means more code to maintain for what is not the primary PGPT use case. If Fireworks were a fully local setup, it would probably be a nice addition.
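For concreteness, the "necessary modifications" would amount to something like the following local setup, per the extras this PR defines (a sketch, not part of the diff):

    # install private-gpt with the Fireworks extras instead of building a dedicated image
    poetry install --extras "ui llms-fireworks embeddings-fireworks vector-stores-qdrant embeddings-openai"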

@@ -0,0 +1,54 @@
FROM python:3.11.6-slim-bookworm as base

# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry==1.8.3
ENV PATH="/root/.local/bin:$PATH"
ENV PATH=".venv/bin/:$PATH"

RUN apt update && apt install -y \
build-essential

# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true

FROM base as dependencies
WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./

ARG POETRY_EXTRAS="ui llms-fireworks embeddings-fireworks vector-stores-qdrant embeddings-openai"
RUN poetry install --no-root --extras "${POETRY_EXTRAS}"

FROM base as app
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV APP_ENV=prod
ENV PYTHONPATH="$PYTHONPATH:/home/worker/app/private_gpt/"
EXPOSE 8080

# Prepare a non-root user
# More info about how to configure UIDs and GIDs in Docker:
# https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md

# Define the User ID (UID) for the non-root user
# UID 100 is chosen to avoid conflicts with existing system users
ARG UID=100

# Define the Group ID (GID) for the non-root user
# GID 65534 is often used for the 'nogroup' or 'nobody' group
ARG GID=65534

RUN adduser --system --gid ${GID} --uid ${UID} --home /home/worker worker
WORKDIR /home/worker/app

RUN chown worker /home/worker/app
RUN mkdir local_data && chown worker local_data
RUN mkdir models && chown worker models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker *.yaml .
COPY --chown=worker scripts/ scripts

USER worker
ENTRYPOINT python -m private_gpt
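As a rough usage sketch (not part of the diff), the image would be built and run along these lines, with the API key supplied at runtime:

    docker build -f Dockerfile.fireworks -t private-gpt-fireworks .
    docker run -e PGPT_PROFILES=fireworks -e FIREWORKS_API_KEY=your-key -p 3001:8080 private-gpt-fireworks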
25 changes: 21 additions & 4 deletions docker-compose.yaml
@@ -1,13 +1,12 @@
services:

#-----------------------------------
#---- Private-GPT services ---------
#-----------------------------------

# Private-GPT service for the Ollama CPU and GPU modes
# This service builds from an external Dockerfile and runs the Ollama mode.
private-gpt-ollama:
image: ${PGPT_IMAGE:-zylonai/private-gpt}:${PGPT_TAG:-0.6.2}-ollama # x-release-please-version
build:
context: .
dockerfile: Dockerfile.ollama
@@ -80,7 +79,7 @@ services:
ollama-cpu:
image: ollama/ollama:latest
volumes:
- ./models:/root/.ollama
- ./local_data:/root/.ollama
profiles:
- ""
- ollama-cpu
@@ -98,4 +97,22 @@ services:
count: 1
capabilities: [gpu]
profiles:
- ollama-cuda

# fireworks service
private-gpt-fireworks:
build:
context: .
dockerfile: Dockerfile.fireworks
volumes:
- ./local_data/:/home/worker/app/local_data
ports:
- "3001:8080"
environment:
PORT: 8080
PGPT_PROFILES: fireworks
FIREWORKS_API_KEY: ${FIREWORKS_API_KEY}
env_file:
- .env
profiles:
Collaborator:
Same as Dockerfile

Author:
Here I followed how the other Dockerfiles and docker-compose services are written, using them as a template so this one doesn't feel like an outsider :)

Collaborator:
I would say the same. It is not the core of PGPT, so we should not provide this kind of support. Our goal is to offer a 100% private solution, and in this area our two main providers are Ollama and Llama-CPP. Of course, this PR adds value for other people with the same needs, but it doesn't make sense to maintain it in docker-compose and a Dockerfile.

- fireworks
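
A hedged usage sketch for the compose path, assuming FIREWORKS_API_KEY is present in .env or the shell:

    docker compose --profile fireworks up --build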
4,339 changes: 2,329 additions & 2,010 deletions poetry.lock
Collaborator:
I'm not sure how the new dependencies were installed, but this bumps all dependencies. Please revert the current poetry.lock and update it using poetry lock --no-update; many of your changes are just version bumps. You will probably have to revert some formatting changes as well. Anyway, I will try to update llama-index to the latest version, including all the other dependencies. If you prefer to wait and merge once that is ready, I can ping you.

Author:
I had bumped it because I was getting errors when testing on GitHub; I will revert to the old versions :)

Large diffs are not rendered by default.
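
For reference, a minimal sketch of the revert the reviewer is asking for (assuming the fork's main is in sync with upstream):

    # restore the lockfile as it exists on main
    git checkout main -- poetry.lock
    # re-resolve only the newly added dependencies without bumping existing pins
    poetry lock --no-update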

13 changes: 13 additions & 0 deletions private_gpt/components/embedding/embedding_component.py
@@ -67,6 +67,19 @@ def __init__(self, settings: Settings) -> None:
api_key=api_key,
model=model,
)
case "fireworks":
try:
from llama_index.embeddings.fireworks import FireworksEmbedding
except ImportError as e:
raise ImportError(
"FireworksEmbedding dependencies not found, install with `poetry install --extras embeddings-fireworks`"
) from e

api_key = settings.fireworks.embedding_api_key or settings.fireworks.api_key

self.embedding_model = FireworksEmbedding(
api_key=api_key,
)
case "ollama":
try:
from llama_index.embeddings.ollama import ( # type: ignore
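For context, a minimal sketch of exercising the new embedding mode on its own (assuming the embeddings-fireworks extra is installed; model selection falls back to the package default):

    import os

    from llama_index.embeddings.fireworks import FireworksEmbedding

    # mirrors what EmbeddingComponent does in the "fireworks" case
    embedding_model = FireworksEmbedding(api_key=os.environ["FIREWORKS_API_KEY"])
    vector = embedding_model.get_text_embedding("hello private-gpt")
    print(len(vector))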
13 changes: 13 additions & 0 deletions private_gpt/components/llm/llm_component.py
@@ -102,6 +102,19 @@ def __init__(self, settings: Settings) -> None:
api_key=openai_settings.api_key,
model=openai_settings.model,
)
case "fireworks":
try:
from llama_index.llms.fireworks import Fireworks # type: ignore
except ImportError as e:
raise ImportError(
"fireworks dependencies not found, install with `poetry install --extras llms-fireworks`"
) from e

fireworks_settings = settings.fireworks
self.llm = Fireworks(
model=fireworks_settings.model,
api_key=fireworks_settings.api_key,
)
case "openailike":
try:
from llama_index.llms.openai_like import OpenAILike # type: ignore
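Similarly, a minimal sketch of the Fireworks LLM path in isolation (assuming the llms-fireworks extra is installed):

    import os

    from llama_index.llms.fireworks import Fireworks

    # mirrors what LLMComponent does in the "fireworks" case
    llm = Fireworks(
        model="accounts/fireworks/models/llama-v3p1-70b-instruct",
        api_key=os.environ["FIREWORKS_API_KEY"],
    )
    print(llm.complete("Say hello in one word."))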
19 changes: 18 additions & 1 deletion private_gpt/settings/settings.py
@@ -115,6 +115,7 @@ class LLMSettings(BaseModel):
"mock",
"ollama",
"gemini",
"fireworks",
]
max_new_tokens: int = Field(
256,
@@ -197,7 +198,7 @@ class HuggingFaceSettings(BaseModel):

class EmbeddingSettings(BaseModel):
mode: Literal[
"huggingface", "openai", "azopenai", "sagemaker", "ollama", "mock", "gemini"
"huggingface", "openai", "azopenai", "sagemaker", "ollama", "mock", "gemini","fireworks"
]
ingest_mode: Literal["simple", "batch", "parallel", "pipeline"] = Field(
"simple",
@@ -260,6 +261,21 @@ class OpenAISettings(BaseModel):
description="OpenAI embedding Model to use. Example: 'text-embedding-3-large'.",
)

class FireWorksSettings(BaseModel):
api_key: str
model: str = Field(
"accounts/fireworks/models/llama-v3p1-70b-instruct",
description="FireWorks Model to use. Example: 'accounts/fireworks/models/llama-v3p1-70b-instruct'.",
)
# embedding_api_base: str = Field(
# None,
# description="Base URL of OpenAI API. Example: 'https://api.openai.com/v1'.",
# )
embedding_api_key: str
# embedding_model: str = Field(
# "text-embedding-ada-002",
# description="OpenAI embedding Model to use. Example: 'text-embedding-3-large'.",
# )
jaluma marked this conversation as resolved.

class GeminiSettings(BaseModel):
api_key: str
@@ -586,6 +602,7 @@ class Settings(BaseModel):
huggingface: HuggingFaceSettings
sagemaker: SagemakerSettings
openai: OpenAISettings
fireworks: FireWorksSettings
gemini: GeminiSettings
ollama: OllamaSettings
azopenai: AzureOpenAISettings
3 changes: 2 additions & 1 deletion private_gpt/ui/ui.py
@@ -377,7 +377,7 @@ def _build_ui_blocks(self) -> gr.Blocks:
".contain { display: flex !important; flex-direction: column !important; }"
"#component-0, #component-3, #component-10, #component-8 { height: 100% !important; }"
"#chatbot { flex-grow: 1 !important; overflow: auto !important;}"
"#col { height: calc(100vh - 112px - 16px) !important; }"
"#col { min-height: calc(100vh - 112px - 16px) !important; }"
"hr { margin-top: 1em; margin-bottom: 1em; border: 0; border-top: 1px solid #FFF; }"
".avatar-image { background-color: antiquewhite; border-radius: 2px; }"
".footer { text-align: center; margin-top: 20px; font-size: 14px; display: flex; align-items: center; justify-content: center; }"
@@ -518,6 +518,7 @@ def get_model_label() -> str | None:
model_mapping = {
"llamacpp": config_settings.llamacpp.llm_hf_model_file,
"openai": config_settings.openai.model,
"fireworks": config_settings.fireworks.model,
"openailike": config_settings.openai.model,
"azopenai": config_settings.azopenai.llm_model,
"sagemaker": config_settings.sagemaker.llm_endpoint_name,
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -37,6 +37,9 @@ llama-index-vector-stores-postgres = {version ="^0.1.11", optional = true}
llama-index-vector-stores-clickhouse = {version ="^0.1.3", optional = true}
llama-index-storage-docstore-postgres = {version ="^0.1.3", optional = true}
llama-index-storage-index-store-postgres = {version ="^0.1.4", optional = true}
llama-index-llms-fireworks = {version = "^0.1.5", optional = true}
llama-index-embeddings-fireworks = {version = "^0.1.2", optional = true}

# Postgres
psycopg2-binary = {version ="^2.9.9", optional = true}
asyncpg = {version="^0.29.0", optional = true}
@@ -90,6 +93,8 @@ vector-stores-postgres = ["llama-index-vector-stores-postgres"]
vector-stores-milvus = ["llama-index-vector-stores-milvus"]
storage-nodestore-postgres = ["llama-index-storage-docstore-postgres","llama-index-storage-index-store-postgres","psycopg2-binary","asyncpg"]
rerank-sentence-transformers = ["torch", "sentence-transformers"]
llms-fireworks = ["llama-index-llms-fireworks"]
embeddings-fireworks = ["llama-index-embeddings-fireworks"]

[tool.poetry.group.dev.dependencies]
black = "^22"
13 changes: 13 additions & 0 deletions settings-fireworks.yaml
@@ -0,0 +1,13 @@
server:
env_name: ${APP_ENV:fireworks}

llm:
mode: fireworks

embedding:
mode: fireworks

fireworks:
api_key: ${FIREWORKS_API_KEY:}
model: "accounts/fireworks/models/llama-v3p1-70b-instruct"
#poetry install --extras "ui llms-fireworks embeddings-fireworks vector-stores-qdrant embeddings-openai"
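
A minimal sketch of running with this profile locally, assuming the extras from the comment above are installed:

    export FIREWORKS_API_KEY=your-key
    PGPT_PROFILES=fireworks python -m private_gpt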
27 changes: 16 additions & 11 deletions settings.yaml
@@ -52,7 +52,7 @@ llm:
context_window: 3900
# Select your tokenizer. Llama-index tokenizer is the default.
# tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
temperature: 0.1 # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

rag:
similarity_top_k: 2
@@ -68,19 +68,19 @@ summarize:
use_async: true

clickhouse:
host: localhost
port: 8443
username: admin
password: clickhouse
database: embeddings

llamacpp:
llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting
top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
top_p: 1.0 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
repeat_penalty: 1.1 # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)

embedding:
# Should be matching the value above in most cases
@@ -126,11 +126,16 @@ openai:
model: gpt-3.5-turbo
embedding_api_key: ${OPENAI_API_KEY:}

fireworks:
api_key: ${FIREWORKS_API_KEY:}
model: "accounts/fireworks/models/llama-v3p1-70b-instruct"
embedding_api_key: ${FIREWORKS_API_KEY:}

ollama:
llm_model: llama3.1
embedding_model: nomic-embed-text
api_base: http://localhost:11434
embedding_api_base: http://localhost:11434 # change if your embedding model runs on another ollama
keep_alive: 5m
request_timeout: 120.0
autopull_models: true