Release v0.1.0 - Iris · vllm-project/semantic-router

What's Changed

feat: support auto-enable reasoning mode based on intention by @Xunzhuo in #1
fix: remove unneeded todo and verify CI by @Xunzhuo in #2
project: add bench and site owners by @Xunzhuo in #4
project: add code of conduct by @Xunzhuo in #5
chore: unify docker images by @Xunzhuo in #6
fix: use the correct go test file name. by @yafengio in #7
ci: disable notify action for now by @Xunzhuo in #10
docs: semantic cache stale types and implementation by @gluonfield in #9
chore: rm readthedocs as it's deprecated by @Xunzhuo in #12
Removed redundant / from code img by @tao12345666333 in #13
chore: Update CONTRIBUTING.md by @cryo-zd in #17
chore: add DCO requirement in CONTRIBUTING.md by @cryo-zd in #18
fix(cache): cleanup expired cache entries during update operations by @QIN2DIM in #16
chore(logging): unify the logging method by @ZeroZ-lab in #19
fix: make reasoning effort configurable by @OneZero-Y in #21
docs: add vsr star history diagram by @Xunzhuo in #26
docs: add repo link in CONTRIBUTING.md by @cryo-zd in #27
project: add acknowledgements to huggingface-candle by @Xunzhuo in #28
chore: replace fmt.Printf with log.Printf for logging by @cryo-zd in #29
doc: update workflow to create config.yaml by @rootfs in #30
feat: implement batch classification API by @OneZero-Y in #24
chore: 1) install rust if not present 2) expose bench params in env var by @rootfs in #54
feat: Add comprehensive monitoring metrics for batch classification API by @OneZero-Y in #58
docs: add pre-commit requirement code quality checks to contributing by @OneZero-Y in #60
feat: reasoning model controller by @tao12345666333 in #56
test: add unit tests for getModelFamilyAndTemplateParam by @tao12345666333 in #63
docs: add reasoning model metrics by @tao12345666333 in #64
feat: add test framework for classifier with dependency injection by @aeft in #57
project: add vllm semantic router v0.1 roadmap by @Xunzhuo in #22
test: add unit test around ttft pkg by @yuluo-yx in #68
feat: code polish on classifier by @yuluo-yx in #67
feat: robust model name filter for DeepSeek by @tao12345666333 in #69
fix: correct candle-binding replace path in go.mod files by @aeft in #65
project: add blog section by @Xunzhuo in #70
chore: only run the workflow notify-owners on vllm-project/semantic-router by @liangyuanpeng in #72
feat(observability): structured JSON logs and event fields by @tao12345666333 in #66
chore: Normalize comment punctuation to use English period by @cryo-zd in #79
chore: Use (*OpenAIRouter)(nil) for interface compliance check by @cryo-zd in #77
pricing: add currency label and change the metric name to llm_model_cost_total by @tao12345666333 in #80
test: add go vet to CI by @cryo-zd in #81
feat(logging): adopt zap as unified logging library by @tao12345666333 in #83
docs: add python install setups in install-local by @yuluo-yx in #78
feat(config): watch config file and hot-reload router without restart by @tao12345666333 in #84
chore: remove GPU and model params in config. Backend and model aware optimization will be handled in the control plane by @rootfs in #93
chore: add go mod tidy check by @Xunzhuo in #99
fix: startup config for docker-compose by @liangyuanpeng in #73
fix: don't set reasoning effort for non-reasoning models by @rootfs in #97
chore: add github action badge in README by @yuluo-yx in #102
refactor: use slices.Contains for readability and consistency by @cryo-zd in #104
test: add more test cases and refactor SelectBestModelForCategory/SelectBestModelFromList/InitializeJailbreakClassifier for testability by @aeft in #101
docs: add github action badge for docs index by @yuluo-yx in #103
feat: add milvus persistent storage support by @rootfs in #105
Slight readme changes by @LysandreJik in #25
refactor: move classifier model init to classifier.go and unify the classifier model init logic by @aeft in #113
docs: add eslint check for docs website by @yuluo-yx in #114
Refactor: use worker pool for batch classification concurrency by @cryo-zd in #115
feat: add comprehensive unit tests for entropy-based routing. Tests c… by @rootfs in #112
docs: reasoning quickstart by @tao12345666333 in #110
o11y: Add TTFT and TPOT histograms for SLOs by @tao12345666333 in #126
docs: add markdown lint check and fix md lint style by @yuluo-yx in #117
Feature Enhancement: Batch Inference Support in candle-binding by @OneZero-Y in #71
infra: add yaml lint check and fix yaml style by @yuluo-yx in #131
perf: enable concurrent classification via Arc+clone by @cryo-zd in #127
feat: implement dataset-agnostic router reasoning benchmark by @rootfs in #125
o11y: Add request error counters by @tao12345666333 in #132
logging: unify stdlib log usage to pkg/observability (zap) by @tao12345666333 in #134
fix: add comments for readability by @JaredforReal in #135
docs(installation): update Go version requirement and add test tip for model downloads by @samzong in #146
docs: reorder the quickstart pages by @Xunzhuo in #143
project: add ack for kubernetes by @Xunzhuo in #141
docs: sync blog from official vLLM by @Xunzhuo in #142
infra: refactor makefile by @yuluo-yx in #149
infra: update Dockerfile.extproc by @yuluo-yx in #158
fix: use request id to locate the correct cache entry to update by @aeft in #154
feat: add codespell check and tidy linter check config files by @yuluo-yx in #159
fix: miss copy tools dir in dockerfile by @lengrongfu in #161
metrics: Add request-level token histograms by @tao12345666333 in #157
docs: add repo URL in docker/README.md by @cryo-zd in #163
[Docs] remove discarded fields from documents by @lengrongfu in #165
Correct tools directory copy command in Dockerfile by @yuluo-yx in #171
feat: add basic cache eviction policy: LRU/LFU/FIFO by @aeft in #166
docs: Model Performance Evaluation Guide by @JaredforReal in #136
api: add semantic route support by @Xunzhuo in #147
infra: update Dockerfile.extproc by @yuluo-yx in #169
chore: adjust max tokens for different models in router bench by @rootfs in #137
feat: add more content for contribution docs by @yuluo-yx in #175
fix: avoid double counting cache hits by @cryo-zd in #177
docs(router.md): add error metrics and example queries for llm_request_errors_total by @samzong in #156
docs: add docker compose quickstart by @JaredforReal in #181
docs: add detailed category section by @Xunzhuo in #183
feat: fix precommit container error by @yuluo-yx in #182
feat: update rust version in docs by @yuluo-yx in #176
feat: add v1/models endpoint by @JaredforReal in #186
feat: when run make precommit-local, check container runtime by @yuluo-yx in #187
refactor: move use_reasoning to the model level from the category level to support non-reasoning models by @rootfs in #178
fix: fix the timing of precommit image build by @yuluo-yx in #188
feat: Update .gitignore for AI docs by @JaredforReal in #191
feat: Support generic categories and MMLU-Pro mapping by @tao12345666333 in #192
api: remove unused health-check path in configuration by @Xunzhuo in #201
feat: Implement testing profile with mock vllm in docker compose by @JaredforReal in #190
feat: add validation for vllm endpoint address by @Xunzhuo in #202
feat: add config validation to NewCacheBackend by @cryo-zd in #204
docs: add note around model name consistency by @Xunzhuo in #205
security: add security attributes related to root usage to container definitions by @fcanogab in #214
docs: add run precommit by docker or podman by @yuluo-yx in #218
fix: docker compose testing profile with mock-vllm failed to IPv4 validation by @JaredforReal in #219
docs: network tips by @JaredforReal in #208
feat: set up Grafana and Prometheus for Observability and Monitoring by @JaredforReal in #222
project: add promotion rules by @Xunzhuo in #212
feat: validate eviction policy in cache config by @cryo-zd in #223
docs: add tutorials for semantic cache by @Xunzhuo in #230
docs: refactor and reorganize the contents by @Xunzhuo in #235
docs: k8s quickstart and observability with k8s by @JaredforReal in #225
feat: when run test-vllm, get model from openai models api by @yuluo-yx in #236
infra: cache models in test-and-build GHA by @yuluo-yx in #237
infra: fix models cache GHA by @yuluo-yx in #238
feat: add mock vLLM infrastructure for lightweight e2e testing by @yossiovadia in #228
LLM-Katan Terminal animation demo in the readme files by @yossiovadia in #240
optimize: use openai go sdk ChatCompletion to replace map struct by @yuluo-yx in #246
chore: correct misplaced comment for struct UnifiedClassifier by @cryo-zd in #247
fix: LoRA Model Training Configuration and Data Balance by @OneZero-Y in #233
infra: add GHA restore key by @yuluo-yx in #244
perf: optimize FindSimilarTools by early pruning by @cryo-zd in #248
metrics: Add TTFT/TPOT p95 dashboard by @tao12345666333 in #250
feat: enhance terminal demo with improved layout and OpenAI compatibility showcase by @yossiovadia in #249
ci: avoid HF 429 on PRs by caching models and downloading minimal mod… by @tao12345666333 in #252
ci: support running docker-release in upper case user fork by @Xunzhuo in #258
feat: add multi-architecture support for Envoy and Golang by @Aias00 in #264
feat: support domain level auto system prompt injection by @Xunzhuo in #257
Fix: Envoy ext_proc 500 error when both value and raw_value are set in HeaderValue by @ztang2370 in #255
feat: support kubernetes environment by @Xunzhuo in #245
metrics: TTFT in streaming mode by @tao12345666333 in #203
feat: containerize and auto-release llm-katan by @Xunzhuo in #259
test: Add unit test to ensure header mutations only set one of Value or RawValue fields by @ztang2370 in #271
docs(style): add theme switching to the document website by @yuluo-yx in #221
[Docs] Use Docusaurus style for admonitions in install-doc by @windsonsea in #262
feat: support respond vsr decision in header by @Xunzhuo in #273
fix: force install hf_transfer to avoid missing pkg by @rootfs in #287
Update README.md by @yossiovadia in #289
test: add test for ToolsDatabase by @cryo-zd in #284
docs: add mermaid modal by @yuluo-yx in #288
feat: enable E2E testing with LLM Katan - 00-client-request-test by @yossiovadia in #290
feat: implement comprehensive ExtProc testing with cache bypass by @yossiovadia in #292
feat: support /v1/models in direct response by @Xunzhuo in #283
feat: add stream mode support by @AkisAya in #282
feat: support injection system prompt response header by @Xunzhuo in #297
docs: Fix documentation links in README.md by @danchev in #298
feat: add Grafana+Prometheus in k8s by @JaredforReal in #294
chore: update misplaced comments by @cryo-zd in #300
e2e test: 02-router-classification: verify router classification by @yossiovadia in #302
03 classification api test by @yossiovadia in #304
docs: use ts to replace js in docs website by @yuluo-yx in #299
feat(infra): enhance Docker workflows with Buildx and QEMU setup by @Aias00 in #307
fix: broken link in readme by @Xunzhuo in #316
feat: add open webui pipe by @Xunzhuo in #315
feat: add system prompt toggle endpoint by @rootfs in #301
Fix/improve batch classification test by @yossiovadia in #319
fix: use unified classifier in intent classification API when available by @yossiovadia in #320
feat: add CI test for k8s core deployment by @JaredforReal in #317
Fix Envoy container health check by replacing wget with curl by @Copilot in #323
Fix API silent failures and add OpenAPI 3.0 spec with Swagger UI by @Copilot in #326
Add OpenTelemetry Distributed Tracing for Fine-Grained Observability by @Copilot in #322
fix: use both unified and legacy classifier to prevent failure by @rootfs in #332
fix: use classification unit test by @rootfs in #333
feat: add comprehensive PII detection test suite by @yossiovadia in #334
Feature/add jailbreak detection test by @yossiovadia in #331
Feature/improve pii extproc testing by @yossiovadia in #335
feat(app): add direct execution support for local development by @FeiDaLI in #341
feat: add reasoning rate & cost & refusal rates by @JaredforReal in #327
perf: optimize FindSimilar by tracking best match by @cryo-zd in #347
docs: container connectivity troubleshooting by @JaredforReal in #346
chore: optimize Docker CI for faster builds and multi-architecture support by @Aias00 in #349
Bench: Add more dataset in router evaluation by @rootfs in #270
fix: enhance llm-katan OpenAI API compatibility for issue #241 by @yossiovadia in #354
Refactor(FindSimilar): MilvusCache to use Milvus Search API by @srini-abhiram in #352
add wiki article training by @joyful-ii-V-I in #353
chore: fix pre-commit failures in #353 by @rootfs in #357
fix: resolve streaming clients hanging on security blocks (issue #355) by @yossiovadia in #356
feat: add design spec for additional prompt classification by @rootfs in #358
docs: move proposals to site by @Xunzhuo in #361
refactor(headers): centralize custom HTTP headers into dedicated package by @Xunzhuo in #362
feat: refactor observability configs for Compose and add for Local by @JaredforReal in #351
docs: add NVIDIA Dynamo integration proposal by @Xunzhuo in #373
fix: keep memory cache metrics accurate by @cryo-zd in #372
OpenShift Deployment with GPU Support by @yossiovadia in #376
fix: resolve semantic cache hit streaming response format issue by @Xunzhuo in #378
feat: enhance CI pipeline with improved caching and multi-arch support by @Aias00 in #360
refactor(structure): deploy and tools by @JaredforReal in #377
Openshift observability by @yossiovadia in #381
Openshift openwebui integration clean by @yossiovadia in #384
feat: enrich open webui chain of thought by @Xunzhuo in #379
docs: update readme to add open-webui chat demo by @Xunzhuo in #387
chore: clean-up unused diagrams by @Xunzhuo in #386
fix: fix docs website dark theme promotion and team button font not showing by @yuluo-yx in #390
feat: add out-of-tree and mcp based classification support by @rootfs in #375
feat: Modern Dashboard MVP by @JaredforReal in #388
feat: support inferencepool v1 by @Xunzhuo in #393
fix: remove log tail limit in validation script for model loading detection by @yossiovadia in #392
docs(config): add accuracy/latency/token-efficiency recipes and guide by @tao12345666333 in #394
feat: publish and release dashboard image by @Xunzhuo in #395
feat(Istio): integrate with Istio gateway via extproc by @srampal in #229
feat: add dashboard landing page by @Xunzhuo in #396
feat: add auto to online demo by @Xunzhuo in #400
docs: Add the tag to the unclear mermaid diagrams by @yuluo-yx in #398
feat(dashboard): add comprehensive configuration editing UI by @Xunzhuo in #402
infra: add ts and tsx support for precommit hook by @yuluo-yx in #403
feat(dashboard): enhance UI with navigation improvements and layout by @Xunzhuo in #405
feat: k8s support and some fixes by @JaredforReal in #407
feat: add topology for vllm dash by @Xunzhuo in #409
project: add publication and talk sections by @Xunzhuo in #206
chore: add rootfs and yuluo-yx as website owners by @yuluo-yx in #399
docs: add missing observability articles to sidebar by @Xunzhuo in #412
refactor(config): move reasoning fields from Category to ModelScore by @Xunzhuo in #414
infra: add golangci lint check by @yuluo-yx in #401
refactor(config): remove models field from vLLM endpoints by @Xunzhuo in #413
fix(make): mark model downloads with .downloaded sentinel (#309) by @samzong in #410
feat: enable system prompt inject from mcp server based classifier by @rootfs in #408
Docs: Add integration proposal for PS and SR by @zerofishnoodles in #418
feat(dashboard): enhance UI with collapsible sidebar, improved monitoring, and docker-compose updates by @Xunzhuo in #422
feat: add mcp classification server doc and example embedding based mcp classification server by @rootfs in #417
fix: fix the torch dependency for doc build by @rootfs in #428
ux: add quickstart script by @Xunzhuo in #424
fix: stop returning expired in-memory cache hits by @cryo-zd in #423
feat: use decoder only model for mcp classification server by @rootfs in #427
feat(website): add YouTube dashboard demo section to homepage by @Xunzhuo in #433
feat: make llm-katan as default in docker compose up by @JaredforReal in #426
doc: add dashboard.md in overview & update README by @JaredforReal in #432
feat(website): add News page with articles about vLLM Semantic Router by @wangchen615 in #435
docs: add tentative bi-weekly community meetings schedule by @wangchen615 in #198
chore(e2e): remove legacy mock/real vLLM test modes and Makefile targets by @samzong in #421
deploy: update docker compose file by @yuluo-yx in #425
feat: add OpenShift demo scripts and documentation by @yossiovadia in #446
fix: add missing files in istio deployment by @srampal in #449
Enhancement: Use milvus vector database for mcp-classifier-server in examples by @JackLCL in #445
fix: CI error & pre-commit & add MiniLM-L12-v2 & docker-compose-down by @JaredforReal in #450
feat: add tracing to docker compose by @JaredforReal in #434
fix: python pre-commit error by @JaredforReal in #458
feat: standardize editor configs for cross-platform development by @yuluo-yx in #456
docs(readme): add Latest News and Previous News sections by @Xunzhuo in #460
feat(website): add new projects to acknowledgements section by @Xunzhuo in #461
fix: README by @JaredforReal in #463
fix: add binary attributes for image files to prevent line ending conversion by @OneZero-Y in #459
fix: fix docker build for the mock-vllm component and wrong vsr_base_url in vLLM Semantic Router Pipe by @carlory in #462
optimize: optimize makefile target help by @yuluo-yx in #455
chore: add docker makefile target help by @yuluo-yx in #467
feat: fine tune qwen3 for knowledge specialization by @rootfs in #447
docs: add error prompts when installing VSR using Docker Compose. by @yuluo-yx in #470
Openshift dashboard clean by @yossiovadia in #469
chore: limit make test to minimal model download by @cryo-zd in #472
feat: add support for MoM model name by @Xunzhuo in #474
project: add preview for mom request by @Xunzhuo in #475
feat: add knob for /v1/models to control if respond real models. by @Xunzhuo in #476
chore: Update test description from Math to General by @carlory in #483
feat: add HuggingChat support by @JaredforReal in #477
project: 2025 Q4 roadmap by @Xunzhuo in #487
feat: add shellcheck precommit hook by @yuluo-yx in #488
project: add q4 roadmap news by @Xunzhuo in #495
fix missing shellcheck in pre-commit image by @carlory in #497
docs: update contributing docs by @yuluo-yx in #501
feat(demo): enhance OpenShift demo scripts with improved UX by @yossiovadia in #478
fix: fix precommit Argument list too long error by @yuluo-yx in #502
feat: enforce milvus dial timeout if set by @cryo-zd in #503
Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs by @Copilot in #506
Allow semantic cache similarity threshold to be set at the category level by @Copilot in #493
Allow jailbreak detection and threshold to be configured at the category level by @Copilot in #508
Allow PII detection threshold to be set at the category level by @Copilot in #510
Fix: The caller information points to the wrapper function instead of the actual call location by @carlory in #518
feat: Implement hybrid cache that use in-memory index and milvus based doc store by @rootfs in #504
feat: add dashboard & openwebui to k8s deploy by @JaredforReal in #411
refactor: Implement modular candle-binding architecture (#254) by @rootfs in #266
fix: cache test import error by @OneZero-Y in #515
website: add scroll top btn by @yuluo-yx in #535
Add more News Blogs by @Xunzhuo in #543
refactor: k8s ci by @JaredforReal in #540
fix(website/news): fix the author name for decoding semantic router blog by @psinghal20 in #544
fix: hnsw heap polarity by @cryo-zd in #550
chore: upgrade rust version to 1.90 in all related Dockerfiles by @carlory in #499
fix: /app/extproc-server: /lib64/libc.so.6: version GLIBC_2.39 not found by @carlory in #551
feat(routing): Implement in-tree keyword-based routing by @srini-abhiram in #546
fix(k8s ci): extend wait windows in the workflow by @JaredforReal in #553
fix: Resolve quickstart script failures and add automated testing by @yehudit1987 in #548
feat(llm-katan): add CPU quantization for faster inference by @yossiovadia in #556
Fix regression to Istio deployment caused by recent commits by @srampal in #558
docs: Add keyword classifier configuration guide by @srini-abhiram in #559
chore: add wikipedia_data to .gitignore by @carlory in #563
docs: update architecture and add req flow by @Xunzhuo in #562
feat: add qwen3 lora adapter support in candle-binding by @rootfs in #549
fix: make command warning & CI pre-commit error by @JaredforReal in #569
docs: fix the display of the mobile menu. by @yuluo-yx in #570
refactor(core): restructure project architecture by @Xunzhuo in #572
refactor(config): reorganize configuration structure with hierarchical grouping by @Xunzhuo in #574
fix: building on non-cuda platforms without nvcc by @NickJLange in #576
refactor(config): restructure config to use nested model objects by @Xunzhuo in #577
paper: Category-Aware Semantic Caching for Heterogeneous LLM Workloads by @Xunzhuo in #578
feat(router): add intent-aware LoRA routing support by @Xunzhuo in #579
test(e2e): expand classification coverage and fix cache test issues by @yossiovadia in #585
chore: help command for the makefile rollback by @yuluo-yx in #583
fix: fix of deployment on openshift huggingface cli issues by @cooktheryan in #588
feat(llm-d): integrate vsr with llm-d by @srampal in #589
fix: correct HNSW frontier comparisons in hybrid cache by @cryo-zd in #587
[Docs] Add production stack integration tutorial by @zerofishnoodles in #592
refactor: k8s aigw deploy mode by @Xunzhuo in #597
feat: add integration with vLLM AIBrix by @Xunzhuo in #599
refactor: router core by @Xunzhuo in #601
fix: resolve classify_unified_batch interior mutability issue by @OneZero-Y in #596
fix(tests): resolve skipped BERT similarity model tests (Section 1/5) by @yehudit1987 in #600
fix: resolve LoRA training accuracy regression (issue #584) by @yossiovadia in #590
Add Blog for Modular LoRA by @Xunzhuo in #534
[Blog]: Semantic Tool Selection by @Xunzhuo in #604
feat(website): simplify publications page UI and optimize mobile display by @Xunzhuo in #605
docs: redirect kubernetes installation page to ai-gateway guide by @Xunzhuo in #603
[Docs] Simplify estimation data content by @Xunzhuo in #607
fix(tests): enable all 5 Milvus hybrid cache tests (Section 2/5) by @yehudit1987 in #602
fix: correct yaml linting hook to call yaml-lint instead of markdown-lint by @yossiovadia in #609
feat: add embedding model continuous batching scheduler by @rootfs in #564
Revert "fix: correct yaml linting hook to call yaml-lint instead of markdown-lint" by @rootfs in #610
chore: fix milvus cache unit test by @rootfs in #612
fix: correct yaml linting hook and fix trailing spaces/comment spacing by @yossiovadia in #611
Feat: fix-issue-336: Implement In-Tree Embedding Similarity Matching by @Sophie8 in #606
feat(openshift): Split vllm-katan-a and vllm-katan-b to run on separate pods rather than the same semantic router pod. by @szedan-rh in #593
chore: fix cache unit test by @rootfs in #613
fix: Memory Management in FFI Error Handling by @OneZero-Y in #614
fix: parse Milvus snake_case config fields correctly by @cryo-zd in #616
feat: add helm deploy support by @yuluo-yx in #532
infra(ci): add GHA exec condition by @yuluo-yx in #619
[Refactor] Remove ClassifyCategory and add embedding classifier config by @Xunzhuo in #620
fix(tests): Enable TestCandleBertTokensWithLabels and expose CI failures (Section 4/5) by @yehudit1987 in #621
[Doc]: update editUrl in docusaurus config to point to the correct website directory by @petecheslock in #622
fix: auto-generate lora_config.json in training script by @yossiovadia in #629
[Doc] Update the llm-d doc wording, use the official llm-d container image by @srampal in #631
test: Improve e2e-classification tests. by @yossiovadia in #630
feat: removes the dependency of once_cell by @htiennv in #633
[Doc] Reorganize intelligent routing tutorials into focused guides by @Xunzhuo in #636
Fix OpenShift Dashboard Playground OpenWebUI Connection by @szedan-rh in #634
fix(openshift): add ChatUI (HuggingChat) deployment with MongoDB support by @szedan-rh in #637
Test: Validate Unified Classifier correctly chooses between LoRA path and Traditional path for inference. by @yossiovadia in #639
[Feat]: VSR + public LLM/ OpenAI + local llm + istio + LLM-d deployment guide by @srampal in #643
ci(helm): add workflow to publish Helm chart to GHCR on merge by @Xunzhuo in #649
fix(helm): remove namespace template to resolve installation conflicts by @Xunzhuo in #651
[Misc] Reduce initial delay for liveness and readiness probes by @Xunzhuo in #652
[Doc] Migrate Helm README to helm-docs format and remove example values files by @Xunzhuo in #653
[Feat] Add automate e2e test framework for extensible integration tests by @Xunzhuo in #655
[Integration]: Add integration with Kserve functionality by @cooktheryan in #566
chore: enhance moderator by @rootfs in #670
Spam filter by @rootfs in #671
chore: refactor spam filter by @rootfs in #672
feat(e2e): enhance setup-only mode and add startup banner by @Xunzhuo in #673
[feat]: Add DeBERTa v3 prompt injection detection support by @yuezhu1 in #674
[CI/Build] Fail e2e tests when accuracy is 0% by @Xunzhuo in #676
✨ feat(helm): add support for extra initContainer env variables. by @samzong in #679
feat: Implement ReDoS-safe regex scanning provider by @srini-abhiram in #644
fix(tests): resolve 3 skipped model directory tests (Section 3/5) by @yehudit1987 in #632
feat: add Jaeger tracing observability to OpenShift deployment by @szedan-rh in #646
[CI/Build] Fix compilerBrokenImport on macOS M1 by @carlory in #682
fix: Grafana monitoring page iframe embedding and dynamic cluster configuration by @szedan-rh in #642
chore: update community meeting calendar by @rootfs in #685
fix: fixed the font display issue on the team page in dark mode. by @yuluo-yx in #689
[Feat]: Signal-Decision Driven Semantic Routing with Dynamic Plugin Architecture by @Xunzhuo in #681
Add E2E tests for keyword routing (Issue #667) by @szedan-rh in #684
feat: Add aibrix profile for E2E testing framework by @yehudit1987 in #688
chore: Delete test_file.txt by @yuluo-yx in #697
infra(precommit): fix md precommit error by @yuluo-yx in #700
📝 docs(gaie): add Gateway API inference extension docs (#664) by @samzong in #677
feat(e2e): Add comprehensive signal-decision engine test coverage by @yehudit1987 in #695
fix(647): enable LoRA PII auto-detection with minimal changes by @yossiovadia in #709
fix(api): expose actual PII confidence scores instead of hardcoded 0.9 by @yossiovadia in #718
[Bugfix] adjust istio config to align with new architecture by @srampal in #711
docs: add SEO config by @yuluo-yx in #719
doc: Fix lost documentation links by adding the missing sidebar entries by @samzong in #721
fix: keep existing InMemory HNSW nodes searchable after eviction by @cryo-zd in #722
📝 doc(architecture): add gateway integrations overview by @samzong in #720
chore: adjust github ci exec condition by @yuluo-yx in #704
fix: Move keyword routing tests to e2e framework and validate matched_keywords by @szedan-rh in #694
fix the CI test for the quickstart.sh script: fall back to minimal models if downloading embeddinggemma-300m fails by @szedan-rh in #737
feat: add LLM-D profile for E2E testing framework by @samzong in #705
feat: add RedisVL as new semantic cache storage by @rootfs in #734
docs(installation): update model_config examples and clarify vLLM backend setup by @samzong in #741
docs: add DeepWiki badge to README.md, enable auto refresh. by @samzong in #744
Bugfix: rename server_keyword.py.py to server_keyword.py by @samzong in #745
[feat]Support Qwen/Qwen3Guard-Gen-0.6B for prompt_guard by @yuezhu1 in #748
feat(e2e): add comprehensive E2E test coverage for MCP classifier by @szedan-rh in #743
feat: optimize cache, add checkConnection by @yuluo-yx in #739
[Bugfix]: owner-notification: checkout base repo (not PR head) to eli… by @samzong in #747
feat: Add istio profile for E2E testing framework by @asaadbalum in #728
[Feat] add model-downloader image and CI workflow for ghcr publishing by @samzong in #738
test: Redis CI bootstrap by @cryo-zd in #751
✨ feat(observability): add configurable Prometheus metrics endpoint by @samzong in #740
[Fix] workflow(owner-notification): fix workflow error by @samzong in #756
test(e2e): add embedding signal E2E tests for CRDs by @yehudit1987 in #749
Proposal: add TruthLens for Hallucination Detection and Mitigation by @Xunzhuo in #758
[Misc]: 🔧 chore(ci): simplify precommit-publish workflow by removing nightly date tag generation by @samzong in #753
[Feat] helm: use downloader image and add global.imageRegistry support by @samzong in #759
[chore] Add Qwen3Guard category extraction support by @yuezhu1 in #761
[CI] refactor helm publish workflow fix PR test error by @samzong in #762
fix(pii): resolve inconsistent PII detection for EMAIL_ADDRESS by @yehudit1987 in #765
[CI] feat(ci): Optimize CI workflows with concurrency and path filtering by @samzong in #763
feat: fix podman support in docker-compose targets and quickstart.sh by @liavweiss in #772
fix(tests): add CI failure tolerance and fix 4 embedding tests (Section 5/5) by @yehudit1987 in #623
[Feat] Add HuggingFace Spaces playground for semantic router by @Xunzhuo in #779
[CI] 🔧 chore(ci): skip workflows for draft pull requests by @samzong in #776
feat: Add production-stack profile for E2E testing framework by @liavweiss in #767
[Doc] Add Signal-Decision Architecture blog to README news by @Xunzhuo in #783
feat(cache): implement O(1) eviction policies and O(k) TTL cleanup by @asaadbalum in #781
fix(ci): optimize docker integration tests with minimal compose by @noalimoy in #786
fix(dashboard): ensure devDependencies are installed during Docker build by @noalimoy in #780
[Misc] 🔧 chore(kube): generate kind config if missing before cluster creation by @samzong in #775
feat(classifier): enable LoRA auto-detection for intent classification by @yossiovadia in #726
[Feat] add time-windowed endpoint metrics for load balancing by @tao12345666333 in #742
Initial PR for performance test on integration test that runs on CI by @szedan-rh in #778
[Doc]: correct minor typos and formatting in documentation files by @wilsonwu in #794
fix(test): correct relative path for PII LoRA model in auto-detection test by @yossiovadia in #788
docs: add redis cache doc to sidebar by @cryo-zd in #795
perf(e2e): reduce test case count to optimize CI execution time by @yossiovadia in #797
[feat] Fact Check Model Training by @yuezhu1 in #810
feat(deployment): add startupProbe for slow model loading by @noalimoy in #809
[Feat] Add reasoning mode evaluation benchmark (Issue #42) by @asaadbalum in #791
Move model storage to the /mnt directory on both the host and the Kin… by @liavweiss in #792
[Feat][Memory] Add OpenAI Response API support by @Xunzhuo in #802
Feat: Add Hallucination Detection Gatekeeper by @Xunzhuo in #799
Fix: pin dep version to make sure integration tests pass by @Xunzhuo in #815
[DOC]✨ feat(milvus): add Milvus deployment into Kubernetes and semantic cache support by @samzong in #773
[Feat]: Add Dynamo E2E test profile with GPU support by @abdallahsamabd in #789
feat(llm-katan): Add Kubernetes deployment support by @noalimoy in #710
Fix the performance test report by @szedan-rh in #801
feat(classifier): enable LoRA auto-detection for jailbreak classification by @yossiovadia in #812
[Doc] Add new cookbook category and common errors to troubleshooting by @samzong in #818
fix(ci): use minimal models for nightly performance baseline by @szedan-rh in #825
[Feat] Feature: New Python-based Model Manager by @samzong in #820
Add hybrid routing tests, Keyword → Embedding → BERT → MCP by @szedan-rh in #829
Add Entropy testing for reasoning decision according to probabiliti… by @szedan-rh in #833
Disable the performance comparison against baseline, keep just the per… by @szedan-rh in #836
update: Improve Model Manager Configuration and CI Integration by @JaredforReal in #830
[Misc] fix(dashboard): proxy Jaeger /dependencies route by @samzong in #839
Adding new tests for reasoning filter by @szedan-rh in #843
[CI] e2e: add Response API basic operations tests by @tao12345666333 in #826
Sponsor: Add AMD Partnership by @Xunzhuo in #847
feat: add hallucination bench by @rootfs in #838
Test: Add comprehensive tests for PII and TLS utility modules by @JaredforReal in #840
[Misc] [Dashboard/frontend] fix: regenerate package-lock.json with official npm registry by @samzong in #846
Feature: add finance factual benchmark for hallucination detection by @Sophie8 in #851
[Feat] [Dashboard/Frontend] Add configurable port support for Open WebUI iframe by @samzong in #844
[Feat]: add upstream request span and trace context propagation for distributed tracing by @HanFa in #852
refactor: remove unused MappingPath from FactCheckModelConfig by @Xunzhuo in #854
[Bugfix]: StatefulSet readiness detection and add Dynamo demo video by @abdallahsamabd in #856
fix: resolve empty/wrong domain classifications by @yehudit1987 in #827
[Feat] All-in-One Docker image for single-container by @samzong in #845
feat: dashboard playground tab connection failure by @liavweiss in #850
[Misc] 🔧 chore(docker-stack.yml): disable arm64 build in docker-stack workflow due to buildx limitations by @samzong in #859
[Feat] Add dashboard checks and CI workflow by @samzong in #861
fix: refactor documentation and improve clarity across multiple doc files by @wilsonwu in #865
feat(hf-playground): add more models to hf playground by @Xunzhuo in #864
Created comprehensive test coverage in req_filter_tools_test.go with … by @szedan-rh in #848
[CI] ci/optimize e2e profile matrix by @samzong in #870
[CI] fix(ci): remove paths-ignore in integration test dynamic workflow by @samzong in #873
[Feat] Implement VSR CLI tool for better user experience by @srini-abhiram in #824
[CI] Fix curl network errors by switching to official setup actions by @samzong in #860
[Bugfix]: enable kv cache for frontend in disaggregated router deployment and add more categories in classifier by @abdallahsamabd in #869
[Misc] 🔧 chore(build-cli): conditional rust build for build-cli by @samzong in #874
[Misc] extract C float-array conversion helper by @ErikJiang in #883
feat: Fix Playground admin signup: proxy OpenWebUI /workspace+/auth and route /api/v1 via dashboard by @liavweiss in #884
Bugfix: add config validation and fix state mutation by @henschwartz in #880
fix(tsconfig): add ignoreDeprecations option to TypeScript configuration by @wilsonwu in #885
refactor: mom models handling by @Xunzhuo in #862
Fix(CI): pass the dashboard build failures by @Xunzhuo in #887
♻️ refactor(modeldownload): detect and use correct HuggingFace CLI by @samzong in #891
deploy(k8s): remove llmd-base default namespace by @scydas in #892
[CI] fix/llmd auth reviewer binding error and e2e ci-change filter by @samzong in #894
Feat: Add vLLM-SR PYPI Support by @Xunzhuo in #896
refactor(config): simplify external model configuration for guardrails by @Xunzhuo in #899
Feat(core): Add User Feedback Signals Support by @Xunzhuo in #900
feat(dashboard): replace external chat UI with native React component by @asaadbalum in #888
Project: Re-Organize the Layout by @Xunzhuo in #902
Feat(router): add preference-based Routing by @Xunzhuo in #912
[Misc] ✨ feat(website): add react‑icons and use icons on team page by @samzong in #914
Fix dashboard config validation and routing for partial updates (Issue #857) by @henschwartz in #909
[CI] 🔧 chore(ci): move all dockerfile to tools/docker and update Dockerfile paths by @samzong in #915
Docs: Update Outdated Contents by @Xunzhuo in #916
Docs: add hallucination detection guide to content safety tutorials by @Xunzhuo in #919
[Misc] Refactor embedding dimension validation by @ErikJiang in #876
Fix(CI): update decision engine to pass when no decision matched by @Xunzhuo in #923
[Misc] 📝 docs(pr-template): add CLI & Dashboard type to PR template by @samzong in #924
[Dashboard] ♻️ refactor(dashboard): drop OpenWebUI & ChatUI depends for dashboard by @samzong in #920
Chore: clean-up unused files by @Xunzhuo in #926
Feat(dashboard,router): add enhanced UI components and signal tracking by @Xunzhuo in #927
fix: inject chat_template_kwargs=false when use_reasoning is disabled (Qwen3/DeepSeek) by @liavweiss in #890
[Doc]: add NVIDIA Dynamo installation guide by @abdallahsamabd in #931
fix: streaming cache incremental chunks for cache hits + cache streaming responses by @liavweiss in #937
docs: fix memory values in embedding routing performance table by @liavweiss in #939
[CI/Build][Dashboard] Fix OpenShift dashboard build context by @nerdalert in #942
[CI/Build][Dashboard] Update dashboard build to Go 1.24.1 by @nerdalert in #941
feat(dashboard): align dashboard with vllm-sr CLI functionality by @asaadbalum in #932
Project: Update Team with New Members by @Xunzhuo in #945
💄 style(team): prevent company name wrap and fix spacing by @samzong in #947
feat(dashboard): correct CSS class names and CLI command reference by @asaadbalum in #944
fix: regenerate response ID and timestamp for cache hits to enable proper observability by @liavweiss in #946
Chore: Add alias for Local Models by @Xunzhuo in #943
fix(dashboard): route chat completions through Envoy proxy by @yehudit1987 in #936
fix(cache): initialize embedding models before semantic cache (#928) by @noalimoy in #948
Feat: Support Path Suffix for LLM Endpoints by @Xunzhuo in #949

New Contributors

@yafengio made their first contribution in #7
@gluonfield made their first contribution in #9
@tao12345666333 made their first contribution in #13
@cryo-zd made their first contribution in #17
@QIN2DIM made their first contribution in #16
@ZeroZ-lab made their first contribution in #19
@aeft made their first contribution in #57
@liangyuanpeng made their first contribution in #72
@LysandreJik made their first contribution in #25
@JaredforReal made their first contribution in #135
@samzong made their first contribution in #146
@lengrongfu made their first contribution in #161
@fcanogab made their first contribution in #214
@yossiovadia made their first contribution in #228
@Aias00 made their first contribution in #264
@ztang2370 made their first contribution in #255
@windsonsea made their first contribution in #262
@AkisAya made their first contribution in #282
@danchev made their first contribution in #298
@Copilot made their first contribution in #323
@FeiDaLI made their first contribution in #341
@srini-abhiram made their first contribution in #352
@joyful-ii-V-I made their first contribution in #353
@srampal made their first contribution in #229
@zerofishnoodles made their first contribution in #418
@wangchen615 made their first contribution in #435
@JackLCL made their first contribution in #445
@psinghal20 made their first contribution in #544
@yehudit1987 made their first contribution in #548
@NickJLange made their first contribution in #576
@cooktheryan made their first contribution in #588
@Sophie8 made their first contribution in #606
@szedan-rh made their first contribution in #593
@petecheslock made their first contribution in #622
@htiennv made their first contribution in #633
@yuezhu1 made their first contribution in #674
@asaadbalum made their first contribution in #728
@liavweiss made their first contribution in #772
@noalimoy made their first contribution in #786
@wilsonwu made their first contribution in #794
@abdallahsamabd made their first contribution in #789
@HanFa made their first contribution in #852
@ErikJiang made their first contribution in #883
@henschwartz made their first contribution in #880
@scydas made their first contribution in #892
@nerdalert made their first contribution in #942

Full Changelog: https://github.com/vllm-project/semantic-router/commits/v0.1.0
