What's Changed
- feat: support auto-enable reasoning mode based on intention by @Xunzhuo in #1
- fix: remove no needed todo and verify CI by @Xunzhuo in #2
- project: add bench and site owners by @Xunzhuo in #4
- project: add code of conduct by @Xunzhuo in #5
- chore: unify docker images by @Xunzhuo in #6
- fix: use the correct go test file name. by @yafengio in #7
- ci: disable notify action for now by @Xunzhuo in #10
- docs: semantic cache stale types and implementation by @gluonfield in #9
- chore: rm readthedocs as it's deprecated by @Xunzhuo in #12
- Removed redundant / from code img by @tao12345666333 in #13
- chore: Update CONTRIBUTING.md by @cryo-zd in #17
- chore: add DCO requirement in CONTRIBUTING.md by @cryo-zd in #18
- fix(cache): cleanup expired cache entries during update operations by @QIN2DIM in #16
- chore(logging): unify the logging method by @ZeroZ-lab in #19
- fix: make reasoning effort configurable by @OneZero-Y in #21
- docs: add vsr star history diagram by @Xunzhuo in #26
- docs: add repo link in CONTRIBUTING.md by @cryo-zd in #27
- project: add acknowledgements to huggingface-candle by @Xunzhuo in #28
- chore: replace fmt.Printf with log.Printf for logging by @cryo-zd in #29
- doc: update workflow to create config.yaml by @rootfs in #30
- feat: implement batch classification API by @OneZero-Y in #24
- chore: 1) install rust if not present 2) expose bench params in env var by @rootfs in #54
- feat: Add comprehensive monitoring metrics for batch classification API by @OneZero-Y in #58
- docs: add pre-commit requirement code quality checks to contributing by @OneZero-Y in #60
- feat: reasoning model controller by @tao12345666333 in #56
- test: add unit tests for getModelFamilyAndTemplateParam by @tao12345666333 in #63
- docs: add reasoning model metrics by @tao12345666333 in #64
- feat: add test framework for classifier with dependency injection by @aeft in #57
- project: add vllm semantic router v0.1 roadmap by @Xunzhuo in #22
- test: add unit test around ttft pkg by @yuluo-yx in #68
- feat: code polish on classifier by @yuluo-yx in #67
- feat: robust model name filter for DeepSeek by @tao12345666333 in #69
- fix: correct candle-binding replace path in go.mod files by @aeft in #65
- project: add blog section by @Xunzhuo in #70
- chore: only run the workflow notify-owners on vllm-project/semantic-router by @liangyuanpeng in #72
- feat(observability): structured JSON logs and event fields by @tao12345666333 in #66
- chore: Normalize comment punctuation to use English period by @cryo-zd in #79
- chore: Use (*OpenAIRouter)(nil) for interface compliance check by @cryo-zd in #77
- pricing: add currency label and change the metric name to llm_model_cost_total by @tao12345666333 in #80
- test: add go vet to CI by @cryo-zd in #81
- feat(logging): adopt zap as unified logging library by @tao12345666333 in #83
- docs: add python install setups in install-local by @yuluo-yx in #78
- feat(config): watch config file and hot-reload router without restart by @tao12345666333 in #84
- chore: remove GPU and model params in config. Backend and model aware optimization will be handled in the control plane by @rootfs in #93
- chore: add go mod tidy check by @Xunzhuo in #99
- fix: startup config for docker-compose by @liangyuanpeng in #73
- fix: don't set reasoning effort for non-reasoning models by @rootfs in #97
- chore: add github action badge in README by @yuluo-yx in #102
- refactor: use slices.Contains for readability and consistency by @cryo-zd in #104
- test: add more test cases and refactor SelectBestModelForCategory/SelectBestModelFromList/InitializeJailbreakClassifier for testability by @aeft in #101
- docs: add github action badge for docs index by @yuluo-yx in #103
- feat: add milvus persistent storage support by @rootfs in #105
- Slight readme changes by @LysandreJik in #25
- refactor: move classifier model init to classifier.go and unify the classifier model init logic by @aeft in #113
- docs: add eslint check for docs website by @yuluo-yx in #114
- Refactor: use worker pool for batch classification concurrency by @cryo-zd in #115
- feat: add comprehensive unit tests for entropy-based routing. Tests c… by @rootfs in #112
- docs: reasoning quickstart by @tao12345666333 in #110
- o11y: Add TTFT and TPOT histograms for SLOs by @tao12345666333 in #126
- docs: add markdown lint check and fix md lint style by @yuluo-yx in #117
- Feature Enhancement: Batch Inference Support in candle-binding by @OneZero-Y in #71
- infra: add yaml lint check and fix yaml style by @yuluo-yx in #131
- perf: enable concurrent classification via Arc+clone by @cryo-zd in #127
- feat: implement dataset-agnostic router reasoning benchmark by @rootfs in #125
- o11y: Add request error counters by @tao12345666333 in #132
- logging: unify stdlib log usage to pkg/observability (zap) by @tao12345666333 in #134
- fix: add comments for readability by @JaredforReal in #135
- docs(installation): update Go version requirement and add test tip for model downloads by @samzong in #146
- docs: reorder the quickstart pages by @Xunzhuo in #143
- project: add ack for kubernetes by @Xunzhuo in #141
- docs: sync blog from official vLLM by @Xunzhuo in #142
- infra: refactor makefile by @yuluo-yx in #149
- infra: update Dockerfile.extproc by @yuluo-yx in #158
- fix: use request id to locate the correct cache entry to update by @aeft in #154
- feat: add codespell check and tidy linter check config files by @yuluo-yx in #159
- fix: miss copy tools dir in dockerfile by @lengrongfu in #161
- metrics: Add request-level token histograms by @tao12345666333 in #157
- docs: add repo URL in docker/README.md by @cryo-zd in #163
- [Docs] remove discarded fields from documents by @lengrongfu in #165
- Correct tools directory copy command in Dockerfile by @yuluo-yx in #171
- feat: add basic cache eviction policy: LRU/LFU/FIFO by @aeft in #166
- docs: Model Performance Evaluation Guide by @JaredforReal in #136
- api: add semantic route support by @Xunzhuo in #147
- infra: update Dockerfile.extproc by @yuluo-yx in #169
- chore: add just max token for different models in router bench by @rootfs in #137
- feat: add more content for contribution docs by @yuluo-yx in #175
- fix: avoid double counting cache hits by @cryo-zd in #177
- docs(router.md): add error metrics and example queries for llm_request_errors_total by @samzong in #156
- docs: add docker compose quickstart by @JaredforReal in #181
- docs: add detailed category section by @Xunzhuo in #183
- feat: fix precommit container error by @yuluo-yx in #182
- feat: update rust version in docs by @yuluo-yx in #176
- feat: add v1/models endpoint by @JaredforReal in #186
- feat: when run make precommit-local, check container runtime by @yuluo-yx in #187
- refactor: move use_reasoning to the model level from the category level to support non-reasoning models by @rootfs in #178
- fix: fix the timing of precommit image build by @yuluo-yx in #188
- feat: Update .gitignore for AI docs by @JaredforReal in #191
- feat: Support generic categories and MMLU-Pro mapping by @tao12345666333 in #192
- api: remove unused health-check path in configuration by @Xunzhuo in #201
- feat: Implement testing profile with mock vllm in docker compose by @JaredforReal in #190
- feat: add validation for vllm endpoint address by @Xunzhuo in #202
- feat: add config validation to NewCacheBackend by @cryo-zd in #204
- docs: add note around model name consistency by @Xunzhuo in #205
- security: add security attributes related to root usage to container definitions by @fcanogab in #214
- docs: add run precommit by docker or podman by @yuluo-yx in #218
- fix: docker compose testing profile with mock-vllm failed to IPv4 validation by @JaredforReal in #219
- docs: network tips by @JaredforReal in #208
- feat: set up Grafana and Prometheus for Observability and Monitoring by @JaredforReal in #222
- project: add promotion rules by @Xunzhuo in #212
- feat: validate eviction policy in cache config by @cryo-zd in #223
- docs: add tutorials for semantic cache by @Xunzhuo in #230
- docs: refactor and reorganize the contents by @Xunzhuo in #235
- docs: k8s quickstart and observability with k8s by @JaredforReal in #225
- feat: when run test-vllm, get model from openai models api by @yuluo-yx in #236
- infra: cache models in test-and-build GHA by @yuluo-yx in #237
- infra: fix models cache GHA by @yuluo-yx in #238
- feat: add mock vLLM infrastructure for lightweight e2e testing by @yossiovadia in #228
- LLM-Katan Terminal animation demo in the readme files by @yossiovadia in #240
- optimize: use openai go sdk ChatCompletion replace map struct by @yuluo-yx in #246
- chore: correct misplaced comment for struct UnifiedClassifier by @cryo-zd in #247
- fix: LoRA Model Training Configuration and Data Balance by @OneZero-Y in #233
- infra: add GHA restore key by @yuluo-yx in #244
- perf: optimize FindSimilarTools by early pruning by @cryo-zd in #248
- metrics: Add TTFT/TPOT p95 dashboard by @tao12345666333 in #250
- feat: enhance terminal demo with improved layout and OpenAI compatibility showcase by @yossiovadia in #249
- ci: avoid HF 429 on PRs by caching models and downloading minimal mod… by @tao12345666333 in #252
- ci: support running docker-release in upper case user fork by @Xunzhuo in #258
- feat: add multi-architecture support for Envoy and Golang by @Aias00 in #264
- feat: support domain level auto system prompt injection by @Xunzhuo in #257
- Fix: Envoy ext_proc 500 error when both value and raw_value are set in HeaderValue by @ztang2370 in #255
- feat: support kubernetes environment by @Xunzhuo in #245
- metrics: TTFT in streaming mode by @tao12345666333 in #203
- feat: containerize and auto-release llm-katan by @Xunzhuo in #259
- test: Add unit test to ensure header mutations only set one of Value or RawValue fields by @ztang2370 in #271
- docs(style): add theme switching to the document website by @yuluo-yx in #221
- [Docs] Use Docusaurus style for admonitions in install-doc by @windsonsea in #262
- feat: support respond vsr decision in header by @Xunzhuo in #273
- fix: force install hf_transfer to avoid missing pkg by @rootfs in #287
- Update README.md by @yossiovadia in #289
- test: add test for ToolsDatabase by @cryo-zd in #284
- docs: add mermaid modal by @yuluo-yx in #288
- feat: enable E2E testing with LLM Katan - 00-client-request-test by @yossiovadia in #290
- feat: implement comprehensive ExtProc testing with cache bypass by @yossiovadia in #292
- feat: support /v1/models in direct response by @Xunzhuo in #283
- feat: add stream mode support by @AkisAya in #282
- feat: support injection system prompt response header by @Xunzhuo in #297
- docs: Fix documentation links in README.md by @danchev in #298
- feat: add Grafana+Prometheus in k8s by @JaredforReal in #294
- chore: update misplaced comments by @cryo-zd in #300
- e2e test: 02-router-classification: verify router classification by @yossiovadia in #302
- 03 classification api test by @yossiovadia in #304
- docs: use ts replace js in docs website by @yuluo-yx in #299
- feat(infra): enhance Docker workflows with Buildx and QEMU setup by @Aias00 in #307
- fix: broken link in readme by @Xunzhuo in #316
- feat: add open webui pipe by @Xunzhuo in #315
- feat: add system prompt toggle endpoint by @rootfs in #301
- Fix/improve batch classification test by @yossiovadia in #319
- fix: use unified classifier in intent classification API when available by @yossiovadia in #320
- feat: add CI test for k8s core deployment by @JaredforReal in #317
- Fix Envoy container health check by replacing wget with curl by @Copilot in #323
- Fix API silent failures and add OpenAPI 3.0 spec with Swagger UI by @Copilot in #326
- Add OpenTelemetry Distributed Tracing for Fine-Grained Observability by @Copilot in #322
- fix: use both unified and legacy classifier to prevent failure by @rootfs in #332
- fix: use classification unit test by @rootfs in #333
- feat: add comprehensive PII detection test suite by @yossiovadia in #334
- Feature/add jailbreak detection test by @yossiovadia in #331
- Feature/improve pii extproc testing by @yossiovadia in #335
- feat(app): add direct execution support for local development by @FeiDaLI in #341
- feat: add reasoning rate & cost & refusal rates by @JaredforReal in #327
- perf: optimize FindSimilar by tracking best match by @cryo-zd in #347
- docs: container connectivity troubleshooting by @JaredforReal in #346
- chore: optimize Docker CI for faster builds and multi-architecture support by @Aias00 in #349
- Bench: Add more dataset in router evaluation by @rootfs in #270
- fix: enhance llm-katan OpenAI API compatibility for issue #241 by @yossiovadia in #354
- Refactor(FindSimilar): MilvusCache to use Milvus Search API by @srini-abhiram in #352
- add wiki article training by @joyful-ii-V-I in #353
- chore: fix pre-commit failures in #353 by @rootfs in #357
- fix: resolve streaming clients hanging on security blocks (issue #355) by @yossiovadia in #356
- feat: add design spec for additional prompt classification by @rootfs in #358
- docs: move proposals to site by @Xunzhuo in #361
- refactor(headers): centralize custom HTTP headers into dedicated package by @Xunzhuo in #362
- feat: refactor observability configs for Compose and add for Local by @JaredforReal in #351
- docs: add NVIDIA Dynamo integration proposal by @Xunzhuo in #373
- fix: keep memory cache metrics accurate by @cryo-zd in #372
- OpenShift Deployment with GPU Support by @yossiovadia in #376
- fix: resolve semantic cache hit streaming response format issue by @Xunzhuo in #378
- feat: enhance CI pipeline with improved caching and multi-arch support by @Aias00 in #360
- refactor(structure): deploy and tools by @JaredforReal in #377
- Openshift observability by @yossiovadia in #381
- Openshift openwebui integration clean by @yossiovadia in #384
- feat: enrich open webui chain of thought by @Xunzhuo in #379
- docs: update readme to add open-webui chat demo by @Xunzhuo in #387
- chore: clean-up unused diagrams by @Xunzhuo in #386
- fix: fix docs website dark theme promotion and team btn not show font bug by @yuluo-yx in #390
- feat: add out-of-tree and mcp based classification support by @rootfs in #375
- feat: Modern Dashboard MVP by @JaredforReal in #388
- feat: support inferencepool v1 by @Xunzhuo in #393
- fix: remove log tail limit in validation script for model loading detection by @yossiovadia in #392
- docs(config): add accuracy/latency/token-efficiency recipes and guide by @tao12345666333 in #394
- feat: publish and release dashboard image by @Xunzhuo in #395
- feat(Istio): integrate with Istio gateway via extproc by @srampal in #229
- feat: add dashboard landing page by @Xunzhuo in #396
- feat: add auto to online demo by @Xunzhuo in #400
- docs: Add the tag to the unclear mermaid diagrams by @yuluo-yx in #398
- feat(dashboard): add comprehensive configuration editing UI by @Xunzhuo in #402
- infra: add ts and tsx support for precommit hook by @yuluo-yx in #403
- feat(dashboard): enhance UI with navigation improvements and layout by @Xunzhuo in #405
- feat: k8s support and some fixes by @JaredforReal in #407
- feat: add topology for vllm dash by @Xunzhuo in #409
- project: add publication and talk sections by @Xunzhuo in #206
- chore: add rootfs and yuluo-yx as website owners by @yuluo-yx in #399
- docs: add missing observability articles to sidebar by @Xunzhuo in #412
- refactor(config): move reasoning fields from Category to ModelScore by @Xunzhuo in #414
- infra: add golangci lint check by @yuluo-yx in #401
- refactor(config): remove models field from vLLM endpoints by @Xunzhuo in #413
- fix(make): mark model downloads with .downloaded sentinel (#309) by @samzong in #410
- feat: enable system prompt inject from mcp server based classifier by @rootfs in #408
- Docs: Add integration proposal for PS and SR by @zerofishnoodles in #418
- feat(dashboard): enhance UI with collapsible sidebar, improved monitoring, and docker-compose updates by @Xunzhuo in #422
- feat: add mcp classification server doc and example embedding based mcp classification server by @rootfs in #417
- fix: fix the torch dependency for doc build by @rootfs in #428
- ux: add quickstart script by @Xunzhuo in #424
- fix: stop returning expired in-memory cache hits by @cryo-zd in #423
- feat: use decoder only model for mcp classification server by @rootfs in #427
- feat(website): add YouTube dashboard demo section to homepage by @Xunzhuo in #433
- feat: make llm-katan as default in docker compose up by @JaredforReal in #426
- doc: add dashboard.md in overview & update README by @JaredforReal in #432
- feat(website): add News page with articles about vLLM Semantic Router by @wangchen615 in #435
- docs: add tentative bi-weekly community meetings schedule by @wangchen615 in #198
- chore(e2e): remove legacy mock/real vLLM test modes and Makefile targets by @samzong in #421
- deploy: update docker compose file by @yuluo-yx in #425
- feat: add OpenShift demo scripts and documentation by @yossiovadia in #446
- fix: add missing files in istio deployment by @srampal in #449
- Enhancement: Use milvus vector database for mcp-classifier-server in examples by @JackLCL in #445
- fix: CI error & pre-commit & add MiniLM-L12-v2 & docker-compose-down by @JaredforReal in #450
- feat: add tracing to docker compose by @JaredforReal in #434
- fix: python pre-commit error by @JaredforReal in #458
- feat: standardize editor configs for cross-platform development by @yuluo-yx in #456
- docs(readme): add Latest News and Previous News sections by @Xunzhuo in #460
- feat(website): add new projects to acknowledgements section by @Xunzhuo in #461
- fix: README by @JaredforReal in #463
- fix: add binary attributes for image files to prevent line ending conversion by @OneZero-Y in #459
- fix: fix docker build for the mock-vllm component and wrong vsr_base_url in vLLM Semantic Router Pipe by @carlory in #462
- optimize: optimize makefile target help by @yuluo-yx in #455
- chore: add docker makefile target help by @yuluo-yx in #467
- feat: fine tune qwen3 for knowledge specialization by @rootfs in #447
- docs: add error prompts when installing VSR using Docker Compose. by @yuluo-yx in #470
- Openshift dashboard clean by @yossiovadia in #469
- chore: limit make test to minimal model download by @cryo-zd in #472
- feat: add support for MoM model name by @Xunzhuo in #474
- project: add preview for mom request by @Xunzhuo in #475
- feat: add knob for /v1/models to control if respond real models. by @Xunzhuo in #476
- chore: Update test description from Math to General by @carlory in #483
- feat: add HuggingChat support by @JaredforReal in #477
- project: 2025 Q4 roadmap by @Xunzhuo in #487
- feat: add shellcheck precommit hook by @yuluo-yx in #488
- project: add q4 roadmap news by @Xunzhuo in #495
- fix missing shellcheck in pre-commit image by @carlory in #497
- docs: update contributing docs by @yuluo-yx in #501
- feat(demo): enhance OpenShift demo scripts with improved UX by @yossiovadia in #478
- fix: fix precommit Argument list too long error by @yuluo-yx in #502
- feat: enforce milvus dial timeout if set by @cryo-zd in #503
- Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs by @Copilot in #506
- Allow semantic cache similarity threshold to be set at the category level by @Copilot in #493
- Allow jailbreak detection and threshold to be configured at the category level by @Copilot in #508
- Allow PII detection threshold to be set at the category level by @Copilot in #510
- Fix: The caller information points to the wrapper function instead of the actual call location by @carlory in #518
- feat: Implement hybrid cache that use in-memory index and milvus based doc store by @rootfs in #504
- feat: add dashboard & openwebui to k8s deploy by @JaredforReal in #411
- refactor: Implement modular candle-binding architecture (#254) by @rootfs in #266
- fix: cache test import error by @OneZero-Y in #515
- website: add scroll top btn by @yuluo-yx in #535
- Add more News Blogs by @Xunzhuo in #543
- refactor: k8s ci by @JaredforReal in #540
- fix(website/news): fix the author name for decoding semantic router blog by @psinghal20 in #544
- fix: hnsw heap polarity by @cryo-zd in #550
- chore: upgrade rust version to 1.90 in all related Dockerfiles by @carlory in #499
- fix: /app/extproc-server: /lib64/libc.so.6: version GLIBC_2.39 not found by @carlory in #551
- feat(routing): Implement in-tree keyword-based routing by @srini-abhiram in #546
- fix(k8s ci): extend wait windows in the workflow by @JaredforReal in #553
- fix: Resolve quickstart script failures and add automated testing by @yehudit1987 in #548
- feat(llm-katan): add CPU quantization for faster inference by @yossiovadia in #556
- Fix regression to Istio deployment caused by recent commits by @srampal in #558
- docs: Add keyword classifier configuration guide by @srini-abhiram in #559
- chore: add wikipedia_data to .gitignore by @carlory in #563
- docs: update architecture and add req flow by @Xunzhuo in #562
- feat: add qwen3 lora adapter support in candle-binding by @rootfs in #549
- fix: make command warning & CI pre-commit error by @JaredforReal in #569
- docs: fix the display of the mobile menu. by @yuluo-yx in #570
- refactor(core): restructure project architecture by @Xunzhuo in #572
- refactor(config): reorganize configuration structure with hierarchical grouping by @Xunzhuo in #574
- fix: building on non-cuda platforms without nvcc by @NickJLange in #576
- refactor(config): restructure config to use nested model objects by @Xunzhuo in #577
- paper: Category-Aware Semantic Caching for Heterogeneous LLM Workloads by @Xunzhuo in #578
- feat(router): add intent-aware LoRA routing support by @Xunzhuo in #579
- test(e2e): expand classification coverage and fix cache test issues by @yossiovadia in #585
- chore: help command for the makefile rollback by @yuluo-yx in #583
- fix: fix of deployment on openshift huggingface cli issues by @cooktheryan in #588
- feat(llm-d): integrate vsr with llm-d by @srampal in #589
- fix: correct HNSW frontier comparisons in hybrid cache by @cryo-zd in #587
- [Docs] Add production stack integration tutorial by @zerofishnoodles in #592
- refactor: k8s aigw deploy mode by @Xunzhuo in #597
- feat: add integration with vLLM AIBrix by @Xunzhuo in #599
- refactor: router core by @Xunzhuo in #601
- fix: resolve classify_unified_batch interior mutability issue by @OneZero-Y in #596
- fix(tests): resolve skipped BERT similarity model tests (Section 1/5) by @yehudit1987 in #600
- fix: resolve LoRA training accuracy regression (issue #584) by @yossiovadia in #590
- Add Blog for Modular LoRA by @Xunzhuo in #534
- [Blog]: Semantic Tool Selection by @Xunzhuo in #604
- feat(website): simplify publications page UI and optimize mobile display by @Xunzhuo in #605
- docs: redirect kubernetes installation page to ai-gateway guide by @Xunzhuo in #603
- [Docs] Simplify estimation data content by @Xunzhuo in #607
- fix(tests): enable all 5 Milvus hybrid cache tests (Section 2/5) by @yehudit1987 in #602
- fix: correct yaml linting hook to call yaml-lint instead of markdown-lint by @yossiovadia in #609
- feat: add embedding model continuous batching scheduler by @rootfs in #564
- Revert "fix: correct yaml linting hook to call yaml-lint instead of markdown-lint" by @rootfs in #610
- chore: fix milvus cache unit test by @rootfs in #612
- fix: correct yaml linting hook and fix trailing spaces/comment spacing by @yossiovadia in #611
- Feat: fix-issue-336: Implement In-Tree Embedding Similarity Matching by @Sophie8 in #606
- feat(openshift): Split vllm-katan-a and vllm-katan-b to run on separate pods rather than the same semantic router pod. by @szedan-rh in #593
- chore: fix cache unit test by @rootfs in #613
- fix: Memory Management in FFI Error Handling by @OneZero-Y in #614
- fix: parse Milvus snake_case config fields correctly by @cryo-zd in #616
- feat: add helm support deploy support by @yuluo-yx in #532
- infra(ci): add GHA exec condition by @yuluo-yx in #619
- [Refactor] Remove ClassifyCategory and add embedding classifier config by @Xunzhuo in #620
- fix(tests): Enable TestCandleBertTokensWithLabels and expose CI failures (Section 4/5) by @yehudit1987 in #621
- [Doc]: update editUrl in docusaurus config to point to the correct website directory by @petecheslock in #622
- fix: auto-generate lora_config.json in training script by @yossiovadia in #629
- [Doc] Update the llm-d doc wording, use the official llm-d container image by @srampal in #631
- test: Improve e2e-classification tests. by @yossiovadia in #630
- feat: removes the dependency of `once_cell` by @htiennv in #633
- [Doc] Reorganize intelligent routing tutorials into focused guides by @Xunzhuo in #636
- Fix OpenShift Dashboard Playground OpenWebUI Connection by @szedan-rh in #634
- fix(openshift): add ChatUI (HuggingChat) deployment with MongoDB support by @szedan-rh in #637
- Test: Validate Unified Classifier correctly chooses between LoRA path and Traditional path for inference. by @yossiovadia in #639
- [Feat]: VSR + public LLM/ OpenAI + local llm + istio + LLM-d deployment guide by @srampal in #643
- ci(helm): add workflow to publish Helm chart to GHCR on merge by @Xunzhuo in #649
- fix(helm): remove namespace template to resolve installation conflicts by @Xunzhuo in #651
- [Misc] Reduce initial delay for liveness and readiness probes by @Xunzhuo in #652
- [Doc] Migrate Helm README to helm-docs format and remove example values files by @Xunzhuo in #653
- [Feat] Add automate e2e test framework for extensible integration tests by @Xunzhuo in #655
- [Integration]: Add integration with Kserve functionality by @cooktheryan in #566
- chore: enhance moderator by @rootfs in #670
- Spam filter by @rootfs in #671
- chore: refactor spam filter by @rootfs in #672
- feat(e2e): enhance setup-only mode and add startup banner by @Xunzhuo in #673
- [feat]: Add DeBERTa v3 prompt injection detection support by @yuezhu1 in #674
- [CI/Build] Fail e2e tests when accuracy is 0% by @Xunzhuo in #676
- ✨ feat(helm): add support for extra initContainer env variables. by @samzong in #679
- feat: Implement ReDoS-safe regex scanning provider by @srini-abhiram in #644
- fix(tests): resolve 3 skipped model directory tests (Section 3/5) by @yehudit1987 in #632
- feat: add Jaeger tracing observability to OpenShift deployment by @szedan-rh in #646
- [CI/Build] Fix compilerBrokenImport on macOS M1 by @carlory in #682
- fix: Grafana monitoring page iframe embedding and dynamic cluster configuration by @szedan-rh in #642
- chore: update community meeting calendar by @rootfs in #685
- fix: fixed the font display issue on the team page in dark mode. by @yuluo-yx in #689
- [Feat]: Signal-Decision Driven Semantic Routing with Dynamic Plugin Architecture by @Xunzhuo in #681
- Add E2E tests for keyword routing (Issue #667) by @szedan-rh in #684
- feat: Add aibrix profile for E2E testing framework by @yehudit1987 in #688
- chore: Delete test_file.txt by @yuluo-yx in #697
- infra(precommit): fix md precommit error by @yuluo-yx in #700
- 📝 docs(gaie): add Gateway API inference extension docs (#664) by @samzong in #677
- feat(e2e): Add comprehensive signal-decision engine test coverage by @yehudit1987 in #695
- fix(647): enable LoRA PII auto-detection with minimal changes by @yossiovadia in #709
- fix(api): expose actual PII confidence scores instead of hardcoded 0.9 by @yossiovadia in #718
- [Bugfix] adjust istio config to align with new architecture by @srampal in #711
- docs: add SEO config by @yuluo-yx in #719
- doc: Fix lost documentation links by adding the missing sidebar entries by @samzong in #721
- fix: keep existing InMemory HNSW nodes searchable after eviction by @cryo-zd in #722
- 📝 doc(architecture): add gateway integrations overview by @samzong in #720
- chore: adjust github ci exec condition by @yuluo-yx in #704
- fix: Move keyword routing tests to e2e framework and validate matched_keywords by @szedan-rh in #694
- fix the CI test for the quickstart.sh script: if downloading embeddinggemma-300m fails, fall back to minimal models by @szedan-rh in #737
- feat: add LLM-D profile for E2E testing framework by @samzong in #705
- feat: add RedisVL as new semantic cache storage by @rootfs in #734
- docs(installation): update model_config examples and clarify vLLM backend setup by @samzong in #741
- docs: add DeepWiki badge to README.md, enable auto refresh. by @samzong in #744
- Bugfix: rename server_keyword.py.py to server_keyword.py by @samzong in #745
- [feat]Support Qwen/Qwen3Guard-Gen-0.6B for prompt_guard by @yuezhu1 in #748
- feat(e2e): add comprehensive E2E test coverage for MCP classifier by @szedan-rh in #743
- feat: optimize cache, add checkConnection by @yuluo-yx in #739
- [Bugfix]: owner-notification: checkout base repo (not PR head) to eli… by @samzong in #747
- feat: Add istio profile for E2E testing framework by @asaadbalum in #728
- [Feat] add model-downloader image and CI workflow for ghcr publishing by @samzong in #738
- test: Redis CI bootstrap by @cryo-zd in #751
- ✨ feat(observability): add configurable Prometheus metrics endpoint by @samzong in #740
- [Fix] workflow(owner-notification): fix workflow error by @samzong in #756
- test(e2e): add embedding signal E2E tests for CRDs by @yehudit1987 in #749
- Proposal: add TruthLens for Hallucination Detection and Mitigation by @Xunzhuo in #758
- [Misc]: 🔧 chore(ci): simplify precommit-publish workflow by removing nightly date tag generation by @samzong in #753
- [Feat] helm: use downloader image and add global.imageRegistry support by @samzong in #759
- [chore] Add Qwen3Guard category extraction support by @yuezhu1 in #761
- [CI] refactor helm publish workflow fix PR test error by @samzong in #762
- fix(pii): resolve inconsistent PII detection for EMAIL_ADDRESS by @yehudit1987 in #765
- [CI] feat(ci): Optimize CI workflows with concurrency and path filtering by @samzong in #763
- feat: fix podman supporting in docker-compose targets and quickstart.sh by @liavweiss in #772
- fix(tests): add CI failure tolerance and fix 4 embedding tests (Section 5/5) by @yehudit1987 in #623
- [Feat] Add HuggingFace Spaces playground for semantic router by @Xunzhuo in #779
- [CI] 🔧 chore(ci): skip workflows for draft pull requests by @samzong in #776
- feat: Add production-stack profile for E2E testing framework by @liavweiss in #767
- [Doc] Add Signal-Decision Architecture blog to README news by @Xunzhuo in #783
- feat(cache): implement O(1) eviction policies and O(k) TTL cleanup by @asaadbalum in #781
- fix(ci): optimize docker integration tests with minimal compose by @noalimoy in #786
- fix(dashboard): ensure devDependencies are installed during Docker build by @noalimoy in #780
- [Misc] 🔧 chore(kube): generate kind config if missing before cluster creation by @samzong in #775
- feat(classifier): enable LoRA auto-detection for intent classification by @yossiovadia in #726
- [Feat] add time-windowed endpoint metrics for load balancing by @tao12345666333 in #742
- Initial PR for performance test on integration test that running on CI by @szedan-rh in #778
- [Doc]: correct minor typos and formatting in documentation files by @wilsonwu in #794
- fix(test): correct relative path for PII LoRA model in auto-detection test by @yossiovadia in #788
- docs: add redis cache doc to sidebar by @cryo-zd in #795
- perf(e2e): reduce test case count to optimize CI execution time by @yossiovadia in #797
- [feat] Fact Check Model Training by @yuezhu1 in #810
- feat(deployment): add startupProbe for slow model loading by @noalimoy in #809
- [Feat] Add reasoning mode evaluation benchmark (Issue #42) by @asaadbalum in #791
- Move model storage to the /mnt directory on both the host and the Kin… by @liavweiss in #792
- [Feat][Memory] Add OpenAI Response API support by @Xunzhuo in #802
- Feat: Add Hallucination Detection Gatekeeper by @Xunzhuo in #799
- Fix: pin dep version to make sure integration tests pass by @Xunzhuo in #815
- [DOC]✨ feat(milvus): add Milvus deployment into Kubernetes and semantic cache support by @samzong in #773
- [Feat]: Add Dynamo E2E test profile with GPU support by @abdallahsamabd in #789
- feat(llm-katan): Add Kubernetes deployment support by @noalimoy in #710
- Fix the performance test report by @szedan-rh in #801
- feat(classifier): enable LoRA auto-detection for jailbreak classification by @yossiovadia in #812
- [Doc] Add new cookbook category and common errors to troubleshooting by @samzong in #818
- fix(ci): use minimal models for nightly performance baseline by @szedan-rh in #825
- [Feat] Feature: New Python-based Model Manager by @samzong in #820
- Add hybrid routing tests, Keyword → Embedding → BERT → MCP by @szedan-rh in #829
- Add Entropy testing for reasoning decision according to probabiliti… by @szedan-rh in #833
- Disable the performance comparison against baseline, keep just the per… by @szedan-rh in #836
- update: Improve Model Manager Configuration and CI Integration by @JaredforReal in #830
- [Misc] fix(dashboard): proxy Jaeger /dependencies route by @samzong in #839
- Adding new tests for reasoning filter by @szedan-rh in #843
- [CI] e2e: add Response API basic operations tests by @tao12345666333 in #826
- Sponsor: Add AMD Partnership by @Xunzhuo in #847
- feat: add hallucination bench by @rootfs in #838
- Test: Add comprehensive tests for PII and TLS utility modules by @JaredforReal in #840
- [Misc] [Dashboard/frontend] fix: regenerate package-lock.json with official npm registry by @samzong in #846
- Feature: add finance factual benchmark for hallucination detection by @Sophie8 in #851
- [Feat] [Dashboard/Frontend] Add configurable port support for Open WebUI iframe by @samzong in #844
- [Feat]: add upstream request span and trace context propagation for distributed tracing by @HanFa in #852
- refactor: remove unused MappingPath from FactCheckModelConfig by @Xunzhuo in #854
- [Bugfix]: StatefulSet readiness detection and add Dynamo demo video by @abdallahsamabd in #856
- fix: resolve empty/wrong domain classifications by @yehudit1987 in #827
- [Feat] All-in-One Docker image for single-container by @samzong in #845
- feat: dashboard playground tab connection failure by @liavweiss in #850
- [Misc] 🔧 chore(docker-stack.yml): disable arm64 build in docker-stack workflow due to buildx limitations by @samzong in #859
- [Feat] Add dashboard checks and CI workflow by @samzong in #861
- fix: refactor documentation and improve clarity across multiple doc files by @wilsonwu in #865
- feat(hf-playground): add more models to hf playground by @Xunzhuo in #864
- Created comprehensive test coverage in req_filter_tools_test.go with … by @szedan-rh in #848
- [CI] ci/optimize e2e profile matrix by @samzong in #870
- [CI] fix(ci): remove paths-ignore in integration test dynamic workflow by @samzong in #873
- [Feat] Implement VSR CLI tool for better user experience by @srini-abhiram in #824
- [CI] Fix curl network errors by switching to official setup actions by @samzong in #860
- [Bugfix]: enable kv cache for frontend in disaggregated router deployment and add more categories in classifier by @abdallahsamabd in #869
- [Misc] 🔧 chore(build-cli): conditional rust build for build-cli by @samzong in #874
- [Misc] extract C float-array conversion helper by @ErikJiang in #883
- feat: Fix Playground admin signup: proxy OpenWebUI /workspace+/auth and route /api/v1 via dashboard by @liavweiss in #884
- Bugfix: add config validation and fix state mutation by @henschwartz in #880
- fix(tsconfig): add ignoreDeprecations option to TypeScript configuration by @wilsonwu in #885
- refactor: mom models handling by @Xunzhuo in #862
- Fix(CI): pass the dashboard build failures by @Xunzhuo in #887
- ♻️ refactor(modeldownload): detect and use correct HuggingFace CLI by @samzong in #891
- deploy(k8s): remove llmd-base default namespace by @scydas in #892
- [CI] fix/llmd auth reviewer binding error and e2e ci-change filter by @samzong in #894
- Feat: Add vLLM-SR PyPI Support by @Xunzhuo in #896
- refactor(config): simplify external model configuration for guardrails by @Xunzhuo in #899
- Feat(core): Add User Feedback Signals Support by @Xunzhuo in #900
- feat(dashboard): replace external chat UI with native React component by @asaadbalum in #888
- Project: Re-Organize the Layout by @Xunzhuo in #902
- Feat(router): add preference-based Routing by @Xunzhuo in #912
- [Misc] ✨ feat(website): add react‑icons and use icons on team page by @samzong in #914
- Fix dashboard config validation and routing for partial updates (Issue #857) by @henschwartz in #909
- [CI] 🔧 chore(ci): move all dockerfile to tools/docker and update Dockerfile paths by @samzong in #915
- Docs: Update Outdated Contents by @Xunzhuo in #916
- Docs: add hallucination detection guide to content safety tutorials by @Xunzhuo in #919
- [Misc] Refactor embedding dimension validation by @ErikJiang in #876
- Fix(CI): update decision engine to pass when no decision matched by @Xunzhuo in #923
- [Misc] 📝 docs(pr-template): add CLI & Dashboard type to PR template by @samzong in #924
- [Dashboard] ♻️ refactor(dashboard): drop OpenWebUI & ChatUI depends for dashboard by @samzong in #920
- Chore: clean-up unused files by @Xunzhuo in #926
- Feat(dashboard,router): add enhanced UI components and signal tracking by @Xunzhuo in #927
- fix: inject chat_template_kwargs=false when use_reasoning is disabled (Qwen3/DeepSeek) by @liavweiss in #890
- [Doc]: add NVIDIA Dynamo installation guide by @abdallahsamabd in #931
- fix: streaming cache incremental chunks for cache hits + cache streaming responses by @liavweiss in #937
- docs: fix memory values in embedding routing performance table by @liavweiss in #939
- [CI/Build][Dashboard] Fix OpenShift dashboard build context by @nerdalert in #942
- [CI/Build][Dashboard] Update dashboard build to Go 1.24.1 by @nerdalert in #941
- feat(dashboard): align dashboard with vllm-sr CLI functionality by @asaadbalum in #932
- Project: Update Team with New Members by @Xunzhuo in #945
- 💄 style(team): prevent company name wrap and fix spacing by @samzong in #947
- feat(dashboard): correct CSS class names and CLI command reference by @asaadbalum in #944
- fix: regenerate response ID and timestamp for cache hits to enable proper observability by @liavweiss in #946
- Chore: Add alias for Local Models by @Xunzhuo in #943
- fix(dashboard): route chat completions through Envoy proxy by @yehudit1987 in #936
- fix(cache): initialize embedding models before semantic cache (#928) by @noalimoy in #948
- Feat: Support Path Suffix for LLM Endpoints by @Xunzhuo in #949
New Contributors
- @yafengio made their first contribution in #7
- @gluonfield made their first contribution in #9
- @tao12345666333 made their first contribution in #13
- @cryo-zd made their first contribution in #17
- @QIN2DIM made their first contribution in #16
- @ZeroZ-lab made their first contribution in #19
- @aeft made their first contribution in #57
- @liangyuanpeng made their first contribution in #72
- @LysandreJik made their first contribution in #25
- @JaredforReal made their first contribution in #135
- @samzong made their first contribution in #146
- @lengrongfu made their first contribution in #161
- @fcanogab made their first contribution in #214
- @yossiovadia made their first contribution in #228
- @Aias00 made their first contribution in #264
- @ztang2370 made their first contribution in #255
- @windsonsea made their first contribution in #262
- @AkisAya made their first contribution in #282
- @danchev made their first contribution in #298
- @Copilot made their first contribution in #323
- @FeiDaLI made their first contribution in #341
- @srini-abhiram made their first contribution in #352
- @joyful-ii-V-I made their first contribution in #353
- @srampal made their first contribution in #229
- @zerofishnoodles made their first contribution in #418
- @wangchen615 made their first contribution in #435
- @JackLCL made their first contribution in #445
- @psinghal20 made their first contribution in #544
- @yehudit1987 made their first contribution in #548
- @NickJLange made their first contribution in #576
- @cooktheryan made their first contribution in #588
- @Sophie8 made their first contribution in #606
- @szedan-rh made their first contribution in #593
- @petecheslock made their first contribution in #622
- @htiennv made their first contribution in #633
- @yuezhu1 made their first contribution in #674
- @asaadbalum made their first contribution in #728
- @liavweiss made their first contribution in #772
- @noalimoy made their first contribution in #786
- @wilsonwu made their first contribution in #794
- @abdallahsamabd made their first contribution in #789
- @HanFa made their first contribution in #852
- @ErikJiang made their first contribution in #883
- @henschwartz made their first contribution in #880
- @scydas made their first contribution in #892
- @nerdalert made their first contribution in #942
Full Changelog: https://github.com/vllm-project/semantic-router/commits/v0.1.0