🍱 BentoML: The Unified Serving Framework for AI Systems
BentoML is a Python library for building online serving systems optimized for AI applications and model inference. It supports serving any model format or runtime together with custom Python code, and offers key primitives for serving optimizations, task queues, batching, multi-model chains, distributed orchestration, and multi-GPU serving.
🦾 OpenLLM: Self-hosting Large Language Models Made Easy
Run any open-source LLM (Llama 3.1, Qwen2, Phi3, and more) or custom model as an OpenAI-compatible API with a single command. OpenLLM features a built-in chat UI, state-of-the-art inference performance, and a simplified workflow for production-grade cloud deployment.
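A sketch of the single-command workflow mentioned above, using OpenLLM's CLI; the model identifier shown is illustrative, and available model names depend on the installed OpenLLM release:

```shell
# Install OpenLLM
pip install openllm

# Serve a model as an OpenAI-compatible API with a chat UI
# (model identifier is an example; list available ones with `openllm model list`)
openllm serve llama3.1:8b
```

Once the server is up, any OpenAI-compatible client can point its base URL at the local endpoint.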
☁️ BentoCloud: Fast and scalable infrastructure for building and running BentoML in the cloud
BentoCloud is the complete platform for enterprise AI teams to build and scale Compound AI systems. It brings cutting-edge AI infrastructure into your cloud environment, enabling AI teams to run inference with unparalleled efficiency, rapidly iterate on system design, and effortlessly scale in production with full observability.
👀 Follow us on X @bentomlai and LinkedIn
📖 Read our blog