This is the main deployment repository for lcls-cu-inj-model-deployment-test, an online deployment of the lcls-cu-inj-model model using LUME-Model.
This repository was created from the lume-model-deployment-template using copier. The template provides a structured, reproducible approach to containerizing and deploying machine learning models with LUME-Model in an online environment, ensuring consistency and ease of use across projects while minimizing deployment boilerplate.
Please refer to the original template repository for detailed documentation on its features and usage. If the template is updated and you want to apply the changes to your project, run:

```
copier update
```

This will re-apply the template, preserving your answers and customizations where possible.
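Copier records the answers you gave during generation in an answers file (typically .copier-answers.yml, if the template includes one), which can be worth reviewing before an update. A minimal sketch, assuming the default answers-file location:

```
cat .copier-answers.yml   # answers recorded when the project was generated
copier update             # re-applies the template using those answers
```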
Before using this template, ensure you have registered your model in MLflow and prepared the PV mapping for your deployment. Please see the original template repository's README for instructions on how to do this.
> [!IMPORTANT]
> There is currently no validation implemented for this; the user must ensure that the config matches the LUME-model and that the mapping is defined correctly.
In a Python environment with copier installed, run:

```
mkdir lume-example-deployment
copier copy gh:slaclab/lume-model-deployment-template lume-example-deployment
cd lume-example-deployment
```

Copy the pv_mapping.yaml file that you created in step 0, with your PV names and how they map to the model features, to the src/online_model/configs/ directory. If you want output PVs to be written back to EPICS, ensure that the output PVs are included in this mapping as well.
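The exact schema of pv_mapping.yaml is defined by the template, so treat the following as a loose illustration only: the keys, structure, and PV names below are hypothetical, and the real format is documented in the template README.

```
# Hypothetical sketch -- the actual pv_mapping.yaml schema is defined by the
# template; see its README. PV names below are made up for illustration.
cat > src/online_model/configs/pv_mapping.yaml <<'EOF'
input_variables:
  "SOL1:solenoid_field_scale": SOLN:IN20:121:BDES   # model feature -> EPICS PV
output_variables:
  sigma_x: LUME:CU_INJ:SIGMA_X                      # written back to EPICS
EOF
```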
Create a new repository on GitHub under the slaclab org (e.g., lume-example-deployment). Note that the repository must be public; otherwise, additional configuration (tokens/authorizations) is needed.
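Alternatively, a minimal sketch using the GitHub CLI, if you have it installed and are authorized for the slaclab org:

```
gh repo create slaclab/lume-example-deployment --public
```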
Then run:

```
git init
git add -A
git commit -m "init commit"
git remote add origin https://github.com/slaclab/lume-example-deployment.git
git push --set-upstream origin main
```

Once pushed, this will automatically trigger a GitHub Actions workflow to build and push the Docker image to the GitHub Container Registry under slaclab. Once that's done, you can deploy to your target Kubernetes cluster. If ArgoCD is already set up, it will automatically deploy the new image.
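As a quick check that the workflow succeeded, you can try pulling the image. The path below assumes the image name follows the registered-model-name convention described in the caution below, and the tag is a guess; check the package page under the slaclab org for the actual tags:

```
docker pull ghcr.io/slaclab/lcls-cu-inj-model:latest   # image name/tag assumed
```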
> [!CAUTION]
> The image name is set to the registered model name. If the model has already been deployed from another repository under the same name, the image will already exist in the GitHub Container Registry under slaclab/<other_deployment_name>. In this case, you will need to either delete the existing image from the registry (after making sure it is no longer being used) or change the registered name to a new, unique name (e.g., if it's a new type of deployment).
To check your deployment, go to your MLflow experiment to see whether the run is active with no errors; you can plot the input/output variables under run -> model metrics.
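If you prefer the command line, a sketch using the MLflow CLI, with placeholder tracking URI and experiment ID (look the ID up in the MLflow UI):

```
export MLFLOW_TRACKING_URI=https://<your-mlflow-server>   # placeholder URL
mlflow runs list --experiment-id <experiment-id>          # list recent runs
```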
If you have access to the vcluster, you can use kubectl to find the pod name and then follow its logs:

```
kubectl get pods
kubectl logs -f <pod-name>
```

If you want to test the deployment locally before pushing to GitHub, you can build and run the Docker image locally. Make sure that you are either on the SLAC network or otherwise have access to the MLflow server you are using, and that you have Docker installed.
> [!IMPORTANT]
> The "test" interface does not connect to EPICS and does not use the pv_mapping or any EPICS-related transforms or config. It does not test any I/O interface; instead, it generates random values for each input variable from their specified ranges. The output values are therefore not meaningful, but this allows you to test most of the inference run.
To build and run the Docker image locally with the test interface, run:

```
docker build --build-arg INTERFACE=test -t lume:test .
docker run -e INTERFACE=test -e MODEL_VERSION=1 lume:test
```

You can also access the container's shell with:

```
docker run -it lume:test bash
```

If you want to test with a local MLflow server, run the following in a terminal with MLflow installed, using whatever port is available (e.g., 8082) (see the MLflow docs):
```
mlflow server --host 127.0.0.1 --port 8082 --gunicorn-opts "--timeout=60"
```

Then edit the template_config to set mlflow_tracking_uri="http://127.0.0.1:8082".
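Note that 127.0.0.1 inside the container refers to the container itself, not your host, so the container may not reach a local MLflow server with the default network settings. Two common workarounds, assuming the setup above:

```
# Linux: share the host's network so http://127.0.0.1:8082 reaches the server
docker run --network host -e INTERFACE=test -e MODEL_VERSION=1 lume:test

# macOS/Windows (Docker Desktop): keep the default network and instead set
# mlflow_tracking_uri="http://host.docker.internal:8082" in the template_config
```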
For more details, see the lume-model documentation and the Copier documentation.