High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.
| Feature | Description |
|---|---|
| 🚀 Maximize GPU Utilization | Cache-aware routing understands your inference engine's KV cache state—whether SGLang, vLLM, or TensorRT-LLM—to reuse prefixes and reduce redundant computation. |
| 🔌 One API, Any Backend | Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint. |
| ⚡ Built for Speed | Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running. |
| 🔒 Enterprise Control | Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure. |
| 📊 Full Observability | 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation—know exactly what's happening at every layer. |
API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.
Install — pick your preferred method:
```shell
# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg
```

Run — point SMG at your inference workers:
```shell
# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high-availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001
```

Use — send requests to the gateway:
```shell
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
```

That's it. SMG is now load-balancing requests across your workers.
| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |
| Feature | Description |
|---|---|
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
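To illustrate the idea behind a policy like `consistent_hashing`, here is a generic hash-ring sketch: each worker gets several points on a ring, and a request key maps to the nearest point clockwise, so adding or removing a worker only remaps a small fraction of keys. This is an illustration of the technique, not SMG's implementation; the worker URLs are placeholders.

```python
# Generic consistent-hashing ring (illustrative, not SMG internals).
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, workers, replicas=64):
        # Each worker contributes `replicas` virtual points on the ring,
        # which smooths out the key distribution across workers.
        self._ring = sorted(
            (_hash(f"{w}#{i}"), w) for w in workers for i in range(replicas)
        )
        self._points = [p for p, _ in self._ring]

    def pick(self, key: str) -> str:
        """Map a request key (e.g. a session id) to a worker."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["http://gpu1:8000", "http://gpu2:8000"])
worker = ring.pick("session-42")  # same key always lands on the same worker
```

Pinning a key (session, tenant, or prefix hash) to a stable worker is what makes such policies pair well with KV-cache reuse: repeated requests for the same context keep hitting the worker that already holds the cached prefix.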
| Documentation | Description |
|---|---|
| Getting Started | Installation and first steps |
| Architecture | How SMG works |
| Configuration | CLI reference and options |
| API Reference | OpenAI-compatible endpoints |
| Kubernetes Setup | In-cluster discovery and production setup |
We welcome contributions! See Contributing Guide for details.