This model service is intended to be used as the basis for a chat application. It can hold arbitrarily long conversations with users, retaining a history of the conversation until it reaches the model's maximum context length. At that point, the service removes the earliest portions of the conversation from its memory.
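The exact truncation logic lives in the model service code, but the idea is a simple sliding window over the chat history. The sketch below is illustrative only (the function name, `count_tokens` callable, and `reserve_for_reply` parameter are placeholders, not the service's actual API):

```python
# Illustrative sketch of sliding-window history truncation; not the service's actual code.
def trim_history(messages, count_tokens, max_context, reserve_for_reply=256):
    """Drop the oldest messages until the prompt fits in the context window.

    messages: list of {"role": ..., "content": ...} dicts, oldest first
    count_tokens: callable that returns the token count of a string
    """
    budget = max_context - reserve_for_reply
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m["content"]) for m in trimmed) > budget:
        trimmed.pop(0)  # remove the earliest turn first
    return trimmed
```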
To use this model service, please follow the steps below:
This example assumes that the developer has already downloaded a copy of the model they would like to use onto their host machine and placed it in the /models
directory of this repo.
The two models that we have tested and recommend for this example are Llama2 and Mistral. Please download any of the GGUF variants you'd like to use.
- Llama2 - https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main
- Mistral - https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main
For a full list of supported model variants, please see the "Supported models" section of the llama.cpp repository.
cd models
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf
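If you'd like to sanity-check the downloaded file before building the image, a quick local smoke test with llama-cpp-python works well (this assumes you have run `pip install llama-cpp-python`; the model path matches the download above):

```python
# Optional smoke test of the downloaded GGUF file.
# Assumes llama-cpp-python is installed: pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="models/llama-2-7b-chat.Q5_K_S.gguf", n_ctx=2048)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```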
To build the image we use the build.sh
script, which temporarily copies the desired model and shared code into the build directory. This prevents large, unused model files in the repo from being loaded into the podman build context, which can cause a significant slowdown.
cd chatbot/model_services/builds
sh build.sh llama-2-7b-chat.Q5_K_S.gguf arm locallm
The user should provide the model name, the architecture, and the image name they want to use for the build.
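For readers who prefer Python, the sketch below is a rough equivalent of the staging-then-build flow described above; the real build.sh does this in shell, and the function name and paths here are placeholders only:

```python
# Illustration only: stage the chosen model into the build context, build, then clean up.
import shutil
import subprocess
from pathlib import Path

def build_image(model_file: str, arch: str, image_name: str,
                models_dir: Path = Path("../../models")) -> None:
    build_dir = Path(arch)                      # e.g. the arm or x86 build directory
    staged_model = build_dir / model_file
    shutil.copy(models_dir / model_file, staged_model)  # stage only the chosen model
    try:
        # build with only the staged files in the podman build context
        subprocess.run(["podman", "build", "-t", image_name, str(build_dir)], check=True)
    finally:
        staged_model.unlink()                   # keep large model files out of the build context
```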
Once the model service image is built, it can be run with the following:
podman run -it -p 7860:7860 locallm
You can now interact with the service by navigating to 0.0.0.0:7860
in your browser.
You can also use the ask.py
script under /ai_applications
to run the chat application in a terminal. If the --prompt
argument is left blank, it defaults to "Hello".
cd chatbot/ai_applications
python ask.py --prompt <YOUR-PROMPT>
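If you'd rather script your own client, the service exposed on port 7860 can also be called programmatically. The snippet below is a minimal sketch assuming the service is a Gradio app; the `api_name` value is an assumption and may differ in your build:

```python
# Minimal client sketch, assuming the service is a Gradio app on port 7860.
# The api_name is a guess; check the running app's API docs if the call fails.
from gradio_client import Client

client = Client("http://0.0.0.0:7860")
reply = client.predict("What is a GGUF file?", api_name="/chat")
print(reply)
```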
Now that we've developed an application locally that leverages an LLM, we'll want to share it with a wider audience. Let's get it off our machine and run it on OpenShift.
We'll need to rebuild the image for the x86 architecture for most use cases outside of our Mac. Since this is an AI workload, we will also want to take advantage of NVIDIA GPUs available outside our local machine. Therefore, this image's base image contains CUDA and builds llama.cpp specifically for a CUDA environment.
cd chatbot/model_services/builds
sh build.sh llama-2-7b-chat.Q5_K_S.gguf x86 locallm
Before building the image, if you'd prefer NOT to use CUDA and GPU acceleration, you can change line 6 of builds/x86/Containerfile
and set -DLLAMA_CUBLAS
to off:
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=off"
Once you log in to quay.io, you can push your newly built version of this LLM application to your repository for others to use.
podman login quay.io
podman push localhost/locallm quay.io/<YOUR-QUAY_REPO>/locallm
Now that your image lives in a remote repository, we can deploy it. Go to your OpenShift developer dashboard and select "+Add" to use the OpenShift UI to deploy the application.
Select "Container images"
Then fill out the form on the Deploy page with your quay.io image name and make sure to set the "Target port" to 7860.
Hit "Create" at the bottom and watch your application start.
Once the pods are up and the application is working, navigate to the "Routes" section and click on the link created for you to interact with your app.