Chat Application

This model service is intended to be used as the basis for a chat application. It is capable of holding arbitrarily long conversations with users and retains a history of the conversation until it reaches the maximum context length of the model. At that point, the service removes the earliest portions of the conversation from its memory.
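As a rough illustration of this sliding-window behavior, a history-trimming step could look like the sketch below. This is not the service's actual implementation; the message format and token counting are simplified assumptions (a real service would use the model's tokenizer).

```python
# Illustrative sketch only: drop the oldest messages once the conversation
# would exceed the model's context window. All names here are hypothetical.
MAX_CONTEXT_TOKENS = 4096  # e.g. the default Llama 2 context length

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())

def trim_history(history: list[dict], max_tokens: int = MAX_CONTEXT_TOKENS) -> list[dict]:
    """Remove the earliest messages until the conversation fits in the context window."""
    while history and sum(count_tokens(m["content"]) for m in history) > max_tokens:
        history.pop(0)  # forget the oldest exchange first
    return history

history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help you today?"},
]
history = trim_history(history)
```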

To use this model service, please follow the steps below:

Deploy Locally

Download model(s)

This example assumes that the developer already has a copy of the model they would like to use downloaded onto their host machine and placed in the /models directory of this repo.

The two models that we have tested and recommend for this example are Llama2 and Mistral. Please download any of the GGUF variants you'd like to use.

For a full list of supported model variants, please see the "Supported models" section of the llama.cpp repository.

cd models

wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf
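If you prefer not to use wget, the same file can also be fetched with the huggingface_hub Python package. The snippet below is an optional alternative; the repo id and filename simply mirror the wget URL above.

```python
# Optional alternative to wget: download the same GGUF file via huggingface_hub.
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q5_K_S.gguf",
    local_dir="models",  # place the file in the repo's /models directory
)
```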

Build the image

To build the image we use a build.sh script that temporarily copies the desired model and shared code into the build directory. This prevents large, unused model files in the repo from being loaded into the podman environment during the build, which can cause a significant slowdown.

cd chatbot/model_services/builds

sh build.sh llama-2-7b-chat.Q5_K_S.gguf arm locallm

The user should provide the model name, the target architecture, and the image name they want to use for the build.

Run the image

Once the model service image is built, it can be run with the following:

podman run -it -p 7860:7860 locallm
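Optionally, you can confirm the service is reachable before opening a browser. A quick check like the following (assuming the default port mapping above) should return HTTP 200 once the model service has started.

```python
# Simple reachability check for the running container (assumes port 7860).
import requests  # pip install requests

resp = requests.get("http://0.0.0.0:7860", timeout=10)
print(resp.status_code)  # expect 200 once the Gradio UI is being served
```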

Interact with the app

The service can now be interacted with by going to http://0.0.0.0:7860 in your browser.

You can also use the ask.py script under /ai_applications to run the chat application from a terminal. If the --prompt argument is left blank, it defaults to "Hello".

cd chatbot/ai_applications

python ask.py --prompt <YOUR-PROMPT>
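For reference, a minimal ask.py-style client might look roughly like the sketch below. This is illustrative only, not the script shipped in the repo; in particular, the Gradio endpoint name and call signature are assumptions (running client.view_api() lists the real endpoints).

```python
# Hypothetical sketch of a terminal client for the chat service.
# The endpoint name ("/chat") and argument signature are assumptions.
import argparse
from gradio_client import Client  # pip install gradio_client

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", default="Hello")
parser.add_argument("--url", default="http://0.0.0.0:7860")
args = parser.parse_args()

client = Client(args.url)
response = client.predict(args.prompt, api_name="/chat")
print(response)
```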

Deploy on OpenShift

Now that we've developed an application locally that leverages an LLM, we'll want to share it with a wider audience. Let's get it off our machine and run it on OpenShift.

Rebuild for x86

We'll need to rebuild the image for the x86 architecture for most use cases outside of our Mac. Since this is an AI workload, we will also want to take advantage of NVIDIA GPUs available outside our local machine. Therefore, this image's base image contains CUDA and builds llama.cpp specifically for a CUDA environment.

cd chatbot/model_services/builds

sh build.sh llama-2-7b-chat.Q5_K_S.gguf x86 locallm

Before building the image, if you would prefer not to use CUDA and GPU acceleration, you can change line 6 of builds/x86/Containerfile by setting -DLLAMA_CUBLAS to off:

ENV CMAKE_ARGS="-DLLAMA_CUBLAS=off"

Push to Quay

Once you log in to quay.io, you can push your own newly built version of this LLM application to your repository for use by others.

podman login quay.io
podman push localhost/locallm quay.io/<YOUR-QUAY_REPO>/locallm

Deploy

Now that your model lives in a remote repository, we can deploy it. Go to your OpenShift developer dashboard and select "+Add" to use the OpenShift UI to deploy the application.

Select "Container images"

Then fill out the form on the Deploy page with your quay.io image name and make sure to set the "Target port" to 7860.

Hit "Create" at the bottom and watch your application start.

Once the pods are up and the application is working, navigate to the "Routes" section and click on the link created for you to interact with your app.