v0.1.0 - Iris

@Xunzhuo released this 05 Jan 05:59
· 32 commits to main since this release
8ad0c46
iris-1

What's Changed

  • feat: support auto-enable reasoning mode based on intention by @Xunzhuo in #1
  • fix: remove unneeded todo and verify CI by @Xunzhuo in #2
  • project: add bench and site owners by @Xunzhuo in #4
  • project: add code of conduct by @Xunzhuo in #5
  • chore: unify docker images by @Xunzhuo in #6
  • fix: use the correct go test file name. by @yafengio in #7
  • ci: disable notify action for now by @Xunzhuo in #10
  • docs: semantic cache stale types and implementation by @gluonfield in #9
  • chore: rm readthedocs as it's deprecated by @Xunzhuo in #12
  • Removed redundant / from code img by @tao12345666333 in #13
  • chore: Update CONTRIBUTING.md by @cryo-zd in #17
  • chore: add DCO requirement in CONTRIBUTING.md by @cryo-zd in #18
  • fix(cache): cleanup expired cache entries during update operations by @QIN2DIM in #16
  • chore(logging): unify the logging method by @ZeroZ-lab in #19
  • fix: make reasoning effort configurable by @OneZero-Y in #21
  • docs: add vsr star history diagram by @Xunzhuo in #26
  • docs: add repo link in CONTRIBUTING.md by @cryo-zd in #27
  • project: add acknowledgements to huggingface-candle by @Xunzhuo in #28
  • chore: replace fmt.Printf with log.Printf for logging by @cryo-zd in #29
  • doc: update workflow to create config.yaml by @rootfs in #30
  • feat: implement batch classification API by @OneZero-Y in #24
  • chore: 1) install rust if not present 2) expose bench params in env var by @rootfs in #54
  • feat: Add comprehensive monitoring metrics for batch classification API by @OneZero-Y in #58
  • docs: add pre-commit requirement code quality checks to contributing by @OneZero-Y in #60
  • feat: reasoning model controller by @tao12345666333 in #56
  • test: add unit tests for getModelFamilyAndTemplateParam by @tao12345666333 in #63
  • docs: add reasoning model metrics by @tao12345666333 in #64
  • feat: add test framework for classifier with dependency injection by @aeft in #57
  • project: add vllm semantic router v0.1 roadmap by @Xunzhuo in #22
  • test: add unit test around ttft pkg by @yuluo-yx in #68
  • feat: code polish on classifier by @yuluo-yx in #67
  • feat: robust model name filter for DeepSeek by @tao12345666333 in #69
  • fix: correct candle-binding replace path in go.mod files by @aeft in #65
  • project: add blog section by @Xunzhuo in #70
  • chore: only run the workflow notify-owners on vllm-project/semantic-router by @liangyuanpeng in #72
  • feat(observability): structured JSON logs and event fields by @tao12345666333 in #66
  • chore: Normalize comment punctuation to use English period by @cryo-zd in #79
  • chore: Use (*OpenAIRouter)(nil) for interface compliance check by @cryo-zd in #77
  • pricing: add currency label and change the metric name to llm_model_cost_total by @tao12345666333 in #80
  • test: add go vet to CI by @cryo-zd in #81
  • feat(logging): adopt zap as unified logging library by @tao12345666333 in #83
  • docs: add python install setups in install-local by @yuluo-yx in #78
  • feat(config): watch config file and hot-reload router without restart by @tao12345666333 in #84
  • chore: remove GPU and model params in config. Backend and model aware optimization will be handled in the control plane by @rootfs in #93
  • chore: add go mod tidy check by @Xunzhuo in #99
  • fix: startup config for docker-compose by @liangyuanpeng in #73
  • fix: don't set reasoning effort for non-reasoning models by @rootfs in #97
  • chore: add github action badge in README by @yuluo-yx in #102
  • refactor: use slices.Contains for readability and consistency by @cryo-zd in #104
  • test: add more test cases and refactor SelectBestModelForCategory/SelectBestModelFromList/InitializeJailbreakClassifier for testability by @aeft in #101
  • docs: add github action badge for docs index by @yuluo-yx in #103
  • feat: add milvus persistent storage support by @rootfs in #105
  • Slight readme changes by @LysandreJik in #25
  • refactor: move classifier model init to classifier.go and unify the classifier model init logic by @aeft in #113
  • docs: add eslint check for docs website by @yuluo-yx in #114
  • Refactor: use worker pool for batch classification concurrency by @cryo-zd in #115
  • feat: add comprehensive unit tests for entropy-based routing. Tests c… by @rootfs in #112
  • docs: reasoning quickstart by @tao12345666333 in #110
  • o11y: Add TTFT and TPOT histograms for SLOs by @tao12345666333 in #126
  • docs: add markdown lint check and fix md lint style by @yuluo-yx in #117
  • Feature Enhancement: Batch Inference Support in candle-binding by @OneZero-Y in #71
  • infra: add yaml lint check and fix yaml style by @yuluo-yx in #131
  • perf: enable concurrent classification via Arc+clone by @cryo-zd in #127
  • feat: implement dataset-agnostic router reasoning benchmark by @rootfs in #125
  • o11y: Add request error counters by @tao12345666333 in #132
  • logging: unify stdlib log usage to pkg/observability (zap) by @tao12345666333 in #134
  • fix: add comments for readability by @JaredforReal in #135
  • docs(installation): update Go version requirement and add test tip for model downloads by @samzong in #146
  • docs: reorder the quickstart pages by @Xunzhuo in #143
  • project: add ack for kubernetes by @Xunzhuo in #141
  • docs: sync blog from official vLLM by @Xunzhuo in #142
  • infra: refactor makefile by @yuluo-yx in #149
  • infra: update Dockerfile.extproc by @yuluo-yx in #158
  • fix: use request id to locate the correct cache entry to update by @aeft in #154
  • feat: add codespell check and tidy linter check config files by @yuluo-yx in #159
  • fix: miss copy tools dir in dockerfile by @lengrongfu in #161
  • metrics: Add request-level token histograms by @tao12345666333 in #157
  • docs: add repo URL in docker/README.md by @cryo-zd in #163
  • [Docs] remove discarded fields from documents by @lengrongfu in #165
  • Correct tools directory copy command in Dockerfile by @yuluo-yx in #171
  • feat: add basic cache eviction policy: LRU/LFU/FIFO by @aeft in #166
  • docs: Model Performance Evaluation Guide by @JaredforReal in #136
  • api: add semantic route support by @Xunzhuo in #147
  • infra: update Dockerfile.extproc by @yuluo-yx in #169
  • chore: adjust max token for different models in router bench by @rootfs in #137
  • feat: add more content for contribution docs by @yuluo-yx in #175
  • fix: avoid double counting cache hits by @cryo-zd in #177
  • docs(router.md): add error metrics and example queries for llm_request_errors_total by @samzong in #156
  • docs: add docker compose quickstart by @JaredforReal in #181
  • docs: add detailed category section by @Xunzhuo in #183
  • feat: fix precommit container error by @yuluo-yx in #182
  • feat: update rust version in docs by @yuluo-yx in #176
  • feat: add v1/models endpoint by @JaredforReal in #186
  • feat: when run make precommit-local, check container runtime by @yuluo-yx in #187
  • refactor: move use_reasoning to the model level from the category level to support non-reasoning models by @rootfs in #178
  • fix: fix the timing of precommit image build by @yuluo-yx in #188
  • feat: Update .gitignore for AI docs by @JaredforReal in #191
  • feat: Support generic categories and MMLU-Pro mapping by @tao12345666333 in #192
  • api: remove unused health-check path in configuration by @Xunzhuo in #201
  • feat: Implement testing profile with mock vllm in docker compose by @JaredforReal in #190
  • feat: add validation for vllm endpoint address by @Xunzhuo in #202
  • feat: add config validation to NewCacheBackend by @cryo-zd in #204
  • docs: add note around model name consistency by @Xunzhuo in #205
  • security: add security attributes related to root usage to container definitions by @fcanogab in #214
  • docs: add run precommit by docker or podman by @yuluo-yx in #218
  • fix: docker compose testing profile with mock-vllm failed to IPv4 validation by @JaredforReal in #219
  • docs: network tips by @JaredforReal in #208
  • feat: set up Grafana and Prometheus for Observability and Monitoring by @JaredforReal in #222
  • project: add promotion rules by @Xunzhuo in #212
  • feat: validate eviction policy in cache config by @cryo-zd in #223
  • docs: add tutorials for semantic cache by @Xunzhuo in #230
  • docs: refactor and reorganize the contents by @Xunzhuo in #235
  • docs: k8s quickstart and observability with k8s by @JaredforReal in #225
  • feat: when run test-vllm, get model from openai models api by @yuluo-yx in #236
  • infra: cache models in test-and-build GHA by @yuluo-yx in #237
  • infra: fix models cache GHA by @yuluo-yx in #238
  • feat: add mock vLLM infrastructure for lightweight e2e testing by @yossiovadia in #228
  • LLM-Katan Terminal animation demo in the readme files by @yossiovadia in #240
  • optimize: use openai go sdk ChatCompletion replace map struct by @yuluo-yx in #246
  • chore: correct misplaced comment for struct UnifiedClassifier by @cryo-zd in #247
  • fix: LoRA Model Training Configuration and Data Balance by @OneZero-Y in #233
  • infra: add GHA restore key by @yuluo-yx in #244
  • perf: optimize FindSimilarTools by early pruning by @cryo-zd in #248
  • metrics: Add TTFT/TPOT p95 dashboard by @tao12345666333 in #250
  • feat: enhance terminal demo with improved layout and OpenAI compatibility showcase by @yossiovadia in #249
  • ci: avoid HF 429 on PRs by caching models and downloading minimal mod… by @tao12345666333 in #252
  • ci: support running docker-release in upper case user fork by @Xunzhuo in #258
  • feat: add multi-architecture support for Envoy and Golang by @Aias00 in #264
  • feat: support domain level auto system prompt injection by @Xunzhuo in #257
  • Fix: Envoy ext_proc 500 error when both value and raw_value are set in HeaderValue by @ztang2370 in #255
  • feat: support kubernetes environment by @Xunzhuo in #245
  • metrics: TTFT in streaming mode by @tao12345666333 in #203
  • feat: containerize and auto-release llm-katan by @Xunzhuo in #259
  • test: Add unit test to ensure header mutations only set one of Value or RawValue fields by @ztang2370 in #271
  • docs(style): add theme switching to the document website by @yuluo-yx in #221
  • [Docs] Use Docusaurus style for admonitions in install-doc by @windsonsea in #262
  • feat: support respond vsr decision in header by @Xunzhuo in #273
  • fix: force install hf_transfer to avoid missing pkg by @rootfs in #287
  • Update README.md by @yossiovadia in #289
  • test: add test for ToolsDatabase by @cryo-zd in #284
  • docs: add mermaid modal by @yuluo-yx in #288
  • feat: enable E2E testing with LLM Katan - 00-client-request-test by @yossiovadia in #290
  • feat: implement comprehensive ExtProc testing with cache bypass by @yossiovadia in #292
  • feat: support /v1/models in direct response by @Xunzhuo in #283
  • feat: add stream mode support by @AkisAya in #282
  • feat: support injection system prompt response header by @Xunzhuo in #297
  • docs: Fix documentation links in README.md by @danchev in #298
  • feat: add Grafana+Prometheus in k8s by @JaredforReal in #294
  • chore: update misplaced comments by @cryo-zd in #300
  • e2e test: 02-router-classification: verify router classification by @yossiovadia in #302
  • 03 classification api test by @yossiovadia in #304
  • docs: use ts replace js in docs website by @yuluo-yx in #299
  • feat(infra): enhance Docker workflows with Buildx and QEMU setup by @Aias00 in #307
  • fix: broken link in readme by @Xunzhuo in #316
  • feat: add open webui pipe by @Xunzhuo in #315
  • feat: add system prompt toggle endpoint by @rootfs in #301
  • Fix/improve batch classification test by @yossiovadia in #319
  • fix: use unified classifier in intent classification API when available by @yossiovadia in #320
  • feat: add CI test for k8s core deployment by @JaredforReal in #317
  • Fix Envoy container health check by replacing wget with curl by @Copilot in #323
  • Fix API silent failures and add OpenAPI 3.0 spec with Swagger UI by @Copilot in #326
  • Add OpenTelemetry Distributed Tracing for Fine-Grained Observability by @Copilot in #322
  • fix: use both unified and legacy classifier to prevent failure by @rootfs in #332
  • fix: use classification unit test by @rootfs in #333
  • feat: add comprehensive PII detection test suite by @yossiovadia in #334
  • Feature/add jailbreak detection test by @yossiovadia in #331
  • Feature/improve pii extproc testing by @yossiovadia in #335
  • feat(app): add direct execution support for local development by @FeiDaLI in #341
  • feat: add reasoning rate & cost & refusal rates by @JaredforReal in #327
  • perf: optimize FindSimilar by tracking best match by @cryo-zd in #347
  • docs: container connectivity troubleshooting by @JaredforReal in #346
  • chore: optimize Docker CI for faster builds and multi-architecture support by @Aias00 in #349
  • Bench: Add more dataset in router evaluation by @rootfs in #270
  • fix: enhance llm-katan OpenAI API compatibility for issue #241 by @yossiovadia in #354
  • Refactor(FindSimilar): MilvusCache to use Milvus Search API by @srini-abhiram in #352
  • add wiki article training by @joyful-ii-V-I in #353
  • chore: fix pre-commit failures in #353 by @rootfs in #357
  • fix: resolve streaming clients hanging on security blocks (issue #355) by @yossiovadia in #356
  • feat: add design spec for additional prompt classification by @rootfs in #358
  • docs: move proposals to site by @Xunzhuo in #361
  • refactor(headers): centralize custom HTTP headers into dedicated package by @Xunzhuo in #362
  • feat: refactor observability configs for Compose and add for Local by @JaredforReal in #351
  • docs: add NVIDIA Dynamo integration proposal by @Xunzhuo in #373
  • fix: keep memory cache metrics accurate by @cryo-zd in #372
  • OpenShift Deployment with GPU Support by @yossiovadia in #376
  • fix: resolve semantic cache hit streaming response format issue by @Xunzhuo in #378
  • feat: enhance CI pipeline with improved caching and multi-arch support by @Aias00 in #360
  • refactor(structure): deploy and tools by @JaredforReal in #377
  • Openshift observability by @yossiovadia in #381
  • Openshift openwebui integration clean by @yossiovadia in #384
  • feat: enrich open webui chain of thought by @Xunzhuo in #379
  • docs: update readme to add open-webui chat demo by @Xunzhuo in #387
  • chore: clean-up unused diagrams by @Xunzhuo in #386
  • fix: fix docs website dark theme promotion and team btn not showing font bug by @yuluo-yx in #390
  • feat: add out-of-tree and mcp based classification support by @rootfs in #375
  • feat: Modern Dashboard MVP by @JaredforReal in #388
  • feat: support inferencepool v1 by @Xunzhuo in #393
  • fix: remove log tail limit in validation script for model loading detection by @yossiovadia in #392
  • docs(config): add accuracy/latency/token-efficiency recipes and guide by @tao12345666333 in #394
  • feat: publish and release dashboard image by @Xunzhuo in #395
  • feat(Istio): integrate with Istio gateway via extproc by @srampal in #229
  • feat: add dashboard landing page by @Xunzhuo in #396
  • feat: add auto to online demo by @Xunzhuo in #400
  • docs: Add the tag to the unclear mermaid diagrams by @yuluo-yx in #398
  • feat(dashboard): add comprehensive configuration editing UI by @Xunzhuo in #402
  • infra: add ts and tsx support for precommit hook by @yuluo-yx in #403
  • feat(dashboard): enhance UI with navigation improvements and layout by @Xunzhuo in #405
  • feat: k8s support and some fixes by @JaredforReal in #407
  • feat: add topology for vllm dash by @Xunzhuo in #409
  • project: add publication and talk sections by @Xunzhuo in #206
  • chore: add rootfs and yuluo-yx as website owners by @yuluo-yx in #399
  • docs: add missing observability articles to sidebar by @Xunzhuo in #412
  • refactor(config): move reasoning fields from Category to ModelScore by @Xunzhuo in #414
  • infra: add golangci lint check by @yuluo-yx in #401
  • refactor(config): remove models field from vLLM endpoints by @Xunzhuo in #413
  • fix(make): mark model downloads with .downloaded sentinel (#309) by @samzong in #410
  • feat: enable system prompt inject from mcp server based classifier by @rootfs in #408
  • Docs: Add integration proposal for PS and SR by @zerofishnoodles in #418
  • feat(dashboard): enhance UI with collapsible sidebar, improved monitoring, and docker-compose updates by @Xunzhuo in #422
  • feat: add mcp classification server doc and example embedding based mcp classification server by @rootfs in #417
  • fix: fix the torch dependency for doc build by @rootfs in #428
  • ux: add quickstart script by @Xunzhuo in #424
  • fix: stop returning expired in-memory cache hits by @cryo-zd in #423
  • feat: use decoder only model for mcp classification server by @rootfs in #427
  • feat(website): add YouTube dashboard demo section to homepage by @Xunzhuo in #433
  • feat: make llm-katan as default in docker compose up by @JaredforReal in #426
  • doc: add dashboard.md in overview & update README by @JaredforReal in #432
  • feat(website): add News page with articles about vLLM Semantic Router by @wangchen615 in #435
  • docs: add tentative bi-weekly community meetings schedule by @wangchen615 in #198
  • chore(e2e): remove legacy mock/real vLLM test modes and Makefile targets by @samzong in #421
  • deploy: update docker compose file by @yuluo-yx in #425
  • feat: add OpenShift demo scripts and documentation by @yossiovadia in #446
  • fix: add missing files in istio deployment by @srampal in #449
  • Enhancement: Use milvus vector database for mcp-classifier-server in examples by @JackLCL in #445
  • fix: CI error & pre-commit & add MiniLM-L12-v2 & docker-compose-down by @JaredforReal in #450
  • feat: add tracing to docker compose by @JaredforReal in #434
  • fix: python pre-commit error by @JaredforReal in #458
  • feat: standardize editor configs for cross-platform development by @yuluo-yx in #456
  • docs(readme): add Latest News and Previous News sections by @Xunzhuo in #460
  • feat(website): add new projects to acknowledgements section by @Xunzhuo in #461
  • fix: README by @JaredforReal in #463
  • fix: add binary attributes for image files to prevent line ending conversion by @OneZero-Y in #459
  • fix: fix docker build for the mock-vllm component and wrong vsr_base_url in vLLM Semantic Router Pipe by @carlory in #462
  • optimize: optimize makefile target help by @yuluo-yx in #455
  • chore: add docker makefile target help by @yuluo-yx in #467
  • feat: fine tune qwen3 for knowledge specialization by @rootfs in #447
  • docs: add error prompts when installing VSR using Docker Compose. by @yuluo-yx in #470
  • Openshift dashboard clean by @yossiovadia in #469
  • chore: limit make test to minimal model download by @cryo-zd in #472
  • feat: add support for MoM model name by @Xunzhuo in #474
  • project: add preview for mom request by @Xunzhuo in #475
  • feat: add knob for /v1/models to control if respond real models. by @Xunzhuo in #476
  • chore: Update test description from Math to General by @carlory in #483
  • feat: add HuggingChat support by @JaredforReal in #477
  • project: 2025 Q4 roadmap by @Xunzhuo in #487
  • feat: add shellcheck precommit hook by @yuluo-yx in #488
  • project: add q4 roadmap news by @Xunzhuo in #495
  • fix missing shellcheck in pre-commit image by @carlory in #497
  • docs: update contributing docs by @yuluo-yx in #501
  • feat(demo): enhance OpenShift demo scripts with improved UX by @yossiovadia in #478
  • fix: fix precommit Argument list too long error by @yuluo-yx in #502
  • feat: enforce milvus dial timeout if set by @cryo-zd in #503
  • Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs by @Copilot in #506
  • Allow semantic cache similarity threshold to be set at the category level by @Copilot in #493
  • Allow jailbreak detection and threshold to be configured at the category level by @Copilot in #508
  • Allow PII detection threshold to be set at the category level by @Copilot in #510
  • Fix: The caller information points to the wrapper function instead of the actual call location by @carlory in #518
  • feat: Implement hybrid cache that use in-memory index and milvus based doc store by @rootfs in #504
  • feat: add dashboard & openwebui to k8s deploy by @JaredforReal in #411
  • refactor: Implement modular candle-binding architecture (#254) by @rootfs in #266
  • fix: cache test import error by @OneZero-Y in #515
  • website: add scroll top btn by @yuluo-yx in #535
  • Add more News Blogs by @Xunzhuo in #543
  • refactor: k8s ci by @JaredforReal in #540
  • fix(website/news): fix the author name for decoding semantic router blog by @psinghal20 in #544
  • fix: hnsw heap polarity by @cryo-zd in #550
  • chore: upgrade rust version to 1.90 in all related Dockerfiles by @carlory in #499
  • fix: /app/extproc-server: /lib64/libc.so.6: version GLIBC_2.39 not found by @carlory in #551
  • feat(routing): Implement in-tree keyword-based routing by @srini-abhiram in #546
  • fix(k8s ci): extend wait windows in the workflow by @JaredforReal in #553
  • fix: Resolve quickstart script failures and add automated testing by @yehudit1987 in #548
  • feat(llm-katan): add CPU quantization for faster inference by @yossiovadia in #556
  • Fix regression to Istio deployment caused by recent commits by @srampal in #558
  • docs: Add keyword classifier configuration guide by @srini-abhiram in #559
  • chore: add wikipedia_data to .gitignore by @carlory in #563
  • docs: update architecture and add req flow by @Xunzhuo in #562
  • feat: add qwen3 lora adapter support in candle-binding by @rootfs in #549
  • fix: make command warning & CI pre-commit error by @JaredforReal in #569
  • docs: fix the display of the mobile menu. by @yuluo-yx in #570
  • refactor(core): restructure project architecture by @Xunzhuo in #572
  • refactor(config): reorganize configuration structure with hierarchical grouping by @Xunzhuo in #574
  • fix: building on non-cuda platforms without nvcc by @NickJLange in #576
  • refactor(config): restructure config to use nested model objects by @Xunzhuo in #577
  • paper: Category-Aware Semantic Caching for Heterogeneous LLM Workloads by @Xunzhuo in #578
  • feat(router): add intent-aware LoRA routing support by @Xunzhuo in #579
  • test(e2e): expand classification coverage and fix cache test issues by @yossiovadia in #585
  • chore: help command for the makefile rollback by @yuluo-yx in #583
  • fix: fix of deployment on openshift huggingface cli issues by @cooktheryan in #588
  • feat(llm-d): integrate vsr with llm-d by @srampal in #589
  • fix: correct HNSW frontier comparisons in hybrid cache by @cryo-zd in #587
  • [Docs] Add production stack integration tutorial by @zerofishnoodles in #592
  • refactor: k8s aigw deploy mode by @Xunzhuo in #597
  • feat: add integration with vLLM AIBrix by @Xunzhuo in #599
  • refactor: router core by @Xunzhuo in #601
  • fix: resolve classify_unified_batch interior mutability issue by @OneZero-Y in #596
  • fix(tests): resolve skipped BERT similarity model tests (Section 1/5) by @yehudit1987 in #600
  • fix: resolve LoRA training accuracy regression (issue #584) by @yossiovadia in #590
  • Add Blog for Modular LoRA by @Xunzhuo in #534
  • [Blog]: Semantic Tool Selection by @Xunzhuo in #604
  • feat(website): simplify publications page UI and optimize mobile display by @Xunzhuo in #605
  • docs: redirect kubernetes installation page to ai-gateway guide by @Xunzhuo in #603
  • [Docs] Simplify estimation data content by @Xunzhuo in #607
  • fix(tests): enable all 5 Milvus hybrid cache tests (Section 2/5) by @yehudit1987 in #602
  • fix: correct yaml linting hook to call yaml-lint instead of markdown-lint by @yossiovadia in #609
  • feat: add embedding model continuous batching scheduler by @rootfs in #564
  • Revert "fix: correct yaml linting hook to call yaml-lint instead of markdown-lint" by @rootfs in #610
  • chore: fix milvus cache unit test by @rootfs in #612
  • fix: correct yaml linting hook and fix trailing spaces/comment spacing by @yossiovadia in #611
  • Feat: fix-issue-336: Implement In-Tree Embedding Similarity Matching by @Sophie8 in #606
  • feat(openshift): Split vllm-katan-a and vllm-katan-b to run on separate pods rather than the same semantic router pod. by @szedan-rh in #593
  • chore: fix cache unit test by @rootfs in #613
  • fix: Memory Management in FFI Error Handling by @OneZero-Y in #614
  • fix: parse Milvus snake_case config fields correctly by @cryo-zd in #616
  • feat: add helm deploy support by @yuluo-yx in #532
  • infra(ci): add GHA exec condition by @yuluo-yx in #619
  • [Refactor] Remove ClassifyCategory and add embedding classifier config by @Xunzhuo in #620
  • fix(tests): Enable TestCandleBertTokensWithLabels and expose CI failures (Section 4/5) by @yehudit1987 in #621
  • [Doc]: update editUrl in docusaurus config to point to the correct website directory by @petecheslock in #622
  • fix: auto-generate lora_config.json in training script by @yossiovadia in #629
  • [Doc] Update the llm-d doc wording, use the official llm-d container image by @srampal in #631
  • test: Improve e2e-classification tests. by @yossiovadia in #630
  • feat: removes the dependency of once_cell by @htiennv in #633
  • [Doc] Reorganize intelligent routing tutorials into focused guides by @Xunzhuo in #636
  • Fix OpenShift Dashboard Playground OpenWebUI Connection by @szedan-rh in #634
  • fix(openshift): add ChatUI (HuggingChat) deployment with MongoDB support by @szedan-rh in #637
  • Test: Validate Unified Classifier correctly chooses between LoRA path and Traditional path for inference. by @yossiovadia in #639
  • [Feat]: VSR + public LLM/ OpenAI + local llm + istio + LLM-d deployment guide by @srampal in #643
  • ci(helm): add workflow to publish Helm chart to GHCR on merge by @Xunzhuo in #649
  • fix(helm): remove namespace template to resolve installation conflicts by @Xunzhuo in #651
  • [Misc] Reduce initial delay for liveness and readiness probes by @Xunzhuo in #652
  • [Doc] Migrate Helm README to helm-docs format and remove example values files by @Xunzhuo in #653
  • [Feat] Add automate e2e test framework for extensible integration tests by @Xunzhuo in #655
  • [Integration]: Add integration with Kserve functionality by @cooktheryan in #566
  • chore: enhance moderator by @rootfs in #670
  • Spam filter by @rootfs in #671
  • chore: refactor spam filter by @rootfs in #672
  • feat(e2e): enhance setup-only mode and add startup banner by @Xunzhuo in #673
  • [feat]: Add DeBERTa v3 prompt injection detection support by @yuezhu1 in #674
  • [CI/Build] Fail e2e tests when accuracy is 0% by @Xunzhuo in #676
  • ✨ feat(helm): add support for extra initContainer env variables. by @samzong in #679
  • feat: Implement ReDoS-safe regex scanning provider by @srini-abhiram in #644
  • fix(tests): resolve 3 skipped model directory tests (Section 3/5) by @yehudit1987 in #632
  • feat: add Jaeger tracing observability to OpenShift deployment by @szedan-rh in #646
  • [CI/Build] Fix compilerBrokenImport on macOS M1 by @carlory in #682
  • fix: Grafana monitoring page iframe embedding and dynamic cluster configuration by @szedan-rh in #642
  • chore: update community meeting calendar by @rootfs in #685
  • fix: fixed the font display issue on the team page in dark mode. by @yuluo-yx in #689
  • [Feat]: Signal-Decision Driven Semantic Routing with Dynamic Plugin Architecture by @Xunzhuo in #681
  • Add E2E tests for keyword routing (Issue #667) by @szedan-rh in #684
  • feat: Add aibrix profile for E2E testing framework by @yehudit1987 in #688
  • chore: Delete test_file.txt by @yuluo-yx in #697
  • infra(precommit): fix md precommit error by @yuluo-yx in #700
  • 📝 docs(gaie): add Gateway API inference extension docs (#664) by @samzong in #677
  • feat(e2e): Add comprehensive signal-decision engine test coverage by @yehudit1987 in #695
  • fix(647): enable LoRA PII auto-detection with minimal changes by @yossiovadia in #709
  • fix(api): expose actual PII confidence scores instead of hardcoded 0.9 by @yossiovadia in #718
  • [Bugfix] adjust istio config to align with new architecture by @srampal in #711
  • docs: add SEO config by @yuluo-yx in #719
  • doc: Fix lost documentation links by adding the missing sidebar entries by @samzong in #721
  • fix: keep existing InMemory HNSW nodes searchable after eviction by @cryo-zd in #722
  • 📝 doc(architecture): add gateway integrations overview by @samzong in #720
  • chore: adjust github ci exec condition by @yuluo-yx in #704
  • fix: Move keyword routing tests to e2e framework and validate matched_keywords by @szedan-rh in #694
  • fix the CI test for the quickstart.sh script: fall back to minimal models if downloading embeddinggemma-300m fails by @szedan-rh in #737
  • feat: add LLM-D profile for E2E testing framework by @samzong in #705
  • feat: add RedisVL as new semantic cache storage by @rootfs in #734
  • docs(installation): update model_config examples and clarify vLLM backend setup by @samzong in #741
  • docs: add DeepWiki badge to README.md, enable auto refresh. by @samzong in #744
  • Bugfix: rename server_keyword.py.py to server_keyword.py by @samzong in #745
  • [feat] Support Qwen/Qwen3Guard-Gen-0.6B for prompt_guard by @yuezhu1 in #748
  • feat(e2e): add comprehensive E2E test coverage for MCP classifier by @szedan-rh in #743
  • feat: optimize cache, add checkConnection by @yuluo-yx in #739
  • [Bugfix]: owner-notification: checkout base repo (not PR head) to eli… by @samzong in #747
  • feat: Add istio profile for E2E testing framework by @asaadbalum in #728
  • [Feat] add model-downloader image and CI workflow for ghcr publishing by @samzong in #738
  • test: Redis CI bootstrap by @cryo-zd in #751
  • ✨ feat(observability): add configurable Prometheus metrics endpoint by @samzong in #740
  • [Fix] workflow(owner-notification): fix workflow error by @samzong in #756
  • test(e2e): add embedding signal E2E tests for CRDs by @yehudit1987 in #749
  • Proposal: add TruthLens for Hallucination Detection and Mitigation by @Xunzhuo in #758
  • [Misc]: 🔧 chore(ci): simplify precommit-publish workflow by removing nightly date tag generation by @samzong in #753
  • [Feat] helm: use downloader image and add global.imageRegistry support by @samzong in #759
  • [chore] Add Qwen3Guard category extraction support by @yuezhu1 in #761
  • [CI] refactor helm publish workflow fix PR test error by @samzong in #762
  • fix(pii): resolve inconsistent PII detection for EMAIL_ADDRESS by @yehudit1987 in #765
  • [CI] feat(ci): Optimize CI workflows with concurrency and path filtering by @samzong in #763
  • feat: fix podman support in docker-compose targets and quickstart.sh by @liavweiss in #772
  • fix(tests): add CI failure tolerance and fix 4 embedding tests (Section 5/5) by @yehudit1987 in #623
  • [Feat] Add HuggingFace Spaces playground for semantic router by @Xunzhuo in #779
  • [CI] 🔧 chore(ci): skip workflows for draft pull requests by @samzong in #776
  • feat: Add production-stack profile for E2E testing framework by @liavweiss in #767
  • [Doc] Add Signal-Decision Architecture blog to README news by @Xunzhuo in #783
  • feat(cache): implement O(1) eviction policies and O(k) TTL cleanup by @asaadbalum in #781
  • fix(ci): optimize docker integration tests with minimal compose by @noalimoy in #786
  • fix(dashboard): ensure devDependencies are installed during Docker build by @noalimoy in #780
  • [Misc] 🔧 chore(kube): generate kind config if missing before cluster creation by @samzong in #775
  • feat(classifier): enable LoRA auto-detection for intent classification by @yossiovadia in #726
  • [Feat] add time-windowed endpoint metrics for load balancing by @tao12345666333 in #742
  • Initial PR for performance test on integration test that is running on CI by @szedan-rh in #778
  • [Doc]: correct minor typos and formatting in documentation files by @wilsonwu in #794
  • fix(test): correct relative path for PII LoRA model in auto-detection test by @yossiovadia in #788
  • docs: add redis cache doc to sidebar by @cryo-zd in #795
  • perf(e2e): reduce test case count to optimize CI execution time by @yossiovadia in #797
  • [feat] Fact Check Model Training by @yuezhu1 in #810
  • feat(deployment): add startupProbe for slow model loading by @noalimoy in #809
  • [Feat] Add reasoning mode evaluation benchmark (Issue #42) by @asaadbalum in #791
  • Move model storage to the /mnt directory on both the host and the Kin… by @liavweiss in #792
  • [Feat][Memory] Add OpenAI Response API support by @Xunzhuo in #802
  • Feat: Add Hallucination Detection Gatekeeper by @Xunzhuo in #799
  • Fix: pin dep version to make sure integration tests pass by @Xunzhuo in #815
  • [DOC]✨ feat(milvus): add Milvus deployment into Kubernetes and semantic cache support by @samzong in #773
  • [Feat]: Add Dynamo E2E test profile with GPU support by @abdallahsamabd in #789
  • feat(llm-katan): Add Kubernetes deployment support by @noalimoy in #710
  • Fix the performance test report by @szedan-rh in #801
  • feat(classifier): enable LoRA auto-detection for jailbreak classification by @yossiovadia in #812
  • [Doc] Add new cookbook category and common errors to troubleshooting by @samzong in #818
  • fix(ci): use minimal models for nightly performance baseline by @szedan-rh in #825
  • [Feat] Feature: New Python-based Model Manager by @samzong in #820
  • Add hybrid routing tests, Keyword → Embedding → BERT → MCP by @szedan-rh in #829
  • Add Entropy testing for reasoning decision according to probabiliti… by @szedan-rh in #833
  • Disable the performance comparison against baseline, keep just the per… by @szedan-rh in #836
  • update: Improve Model Manager Configuration and CI Integration by @JaredforReal in #830
  • [Misc] fix(dashboard): proxy Jaeger /dependencies route by @samzong in #839
  • Adding new tests for reasoning filter by @szedan-rh in #843
  • [CI] e2e: add Response API basic operations tests by @tao12345666333 in #826
  • Sponsor: Add AMD Partnership by @Xunzhuo in #847
  • feat: add hallucination bench by @rootfs in #838
  • Test: Add comprehensive tests for PII and TLS utility modules by @JaredforReal in #840
  • [Misc] [Dashboard/frontend] fix: regenerate package-lock.json with official npm registry by @samzong in #846
  • Feature: add finance factual benchmark for hallucination detection by @Sophie8 in #851
  • [Feat] [Dashboard/Frontend] Add configurable port support for Open WebUI iframe by @samzong in #844
  • [Feat]: add upstream request span and trace context propagation for distributed tracing by @HanFa in #852
  • refactor: remove unused MappingPath from FactCheckModelConfig by @Xunzhuo in #854
  • [Bugfix]: StatefulSet readiness detection and add Dynamo demo video by @abdallahsamabd in #856
  • fix: resolve empty/wrong domain classifications by @yehudit1987 in #827
  • [Feat] All-in-One Docker image for single-container by @samzong in #845
  • feat: dashboard playground tab connection failure by @liavweiss in #850
  • [Misc] 🔧 chore(docker-stack.yml): disable arm64 build in docker-stack workflow due to buildx limitations by @samzong in #859
  • [Feat] Add dashboard checks and CI workflow by @samzong in #861
  • fix: refactor documentation and improve clarity across multiple doc files by @wilsonwu in #865
  • feat(hf-playground): add more models to hf playground by @Xunzhuo in #864
  • Created comprehensive test coverage in req_filter_tools_test.go with … by @szedan-rh in #848
  • [CI] ci/optimize e2e profile matrix by @samzong in #870
  • [CI] fix(ci): remove paths-ignore in integration test dynamic workflow by @samzong in #873
  • [Feat] Implement VSR CLI tool for better user experience by @srini-abhiram in #824
  • [CI] Fix curl network errors by switching to official setup actions by @samzong in #860
  • [Bugfix]: enable kv cache for frontend in disaggregated router deployment and add more categories in classifier by @abdallahsamabd in #869
  • [Misc] 🔧 chore(build-cli): conditional rust build for build-cli by @samzong in #874
  • [Misc] extract C float-array conversion helper by @ErikJiang in #883
  • feat: Fix Playground admin signup: proxy OpenWebUI /workspace+/auth and route /api/v1 via dashboard by @liavweiss in #884
  • Bugfix: add config validation and fix state mutation by @henschwartz in #880
  • fix(tsconfig): add ignoreDeprecations option to TypeScript configuration by @wilsonwu in #885
  • refactor: mom models handling by @Xunzhuo in #862
  • Fix(CI): pass the dashboard build failures by @Xunzhuo in #887
  • ♻️ refactor(modeldownload): detect and use correct HuggingFace CLI by @samzong in #891
  • deploy(k8s): remove llmd-base default namespace by @scydas in #892
  • [CI] fix/llmd auth reviewer binding error and e2e ci-change filter by @samzong in #894
  • Feat: Add vLLM-SR PYPI Support by @Xunzhuo in #896
  • refactor(config): simplify external model configuration for guardrails by @Xunzhuo in #899
  • Feat(core): Add User Feedback Signals Support by @Xunzhuo in #900
  • feat(dashboard): replace external chat UI with native React component by @asaadbalum in #888
  • Project: Re-Organize the Layout by @Xunzhuo in #902
  • Feat(router): add preference-based Routing by @Xunzhuo in #912
  • [Misc] ✨ feat(website): add react‑icons and use icons on team page by @samzong in #914
  • Fix dashboard config validation and routing for partial updates (Issue #857) by @henschwartz in #909
  • [CI] 🔧 chore(ci): move all dockerfile to tools/docker and update Dockerfile paths by @samzong in #915
  • Docs: Update Outdated Contents by @Xunzhuo in #916
  • Docs: add hallucination detection guide to content safety tutorials by @Xunzhuo in #919
  • [Misc] Refactor embedding dimension validation by @ErikJiang in #876
  • Fix(CI): update decision engine to pass when no decision matched by @Xunzhuo in #923
  • [Misc] 📝 docs(pr-template): add CLI & Dashboard type to PR template by @samzong in #924
  • [Dashboard] ♻️ refactor(dashboard): drop OpenWebUI & ChatUI depends for dashboard by @samzong in #920
  • Chore: clean-up unused files by @Xunzhuo in #926
  • Feat(dashboard,router): add enhanced UI components and signal tracking by @Xunzhuo in #927
  • fix: inject chat_template_kwargs=false when use_reasoning is disabled (Qwen3/DeepSeek) by @liavweiss in #890
  • [Doc]: add NVIDIA Dynamo installation guide by @abdallahsamabd in #931
  • fix: streaming cache incremental chunks for cache hits + cache streaming responses by @liavweiss in #937
  • docs: fix memory values in embedding routing performance table by @liavweiss in #939
  • [CI/Build][Dashboard] Fix OpenShift dashboard build context by @nerdalert in #942
  • [CI/Build][Dashboard] Update dashboard build to Go 1.24.1 by @nerdalert in #941
  • feat(dashboard): align dashboard with vllm-sr CLI functionality by @asaadbalum in #932
  • Project: Update Team with New Members by @Xunzhuo in #945
  • 💄 style(team): prevent company name wrap and fix spacing by @samzong in #947
  • feat(dashboard): correct CSS class names and CLI command reference by @asaadbalum in #944
  • fix: regenerate response ID and timestamp for cache hits to enable proper observability by @liavweiss in #946
  • Chore: Add alias for Local Models by @Xunzhuo in #943
  • fix(dashboard): route chat completions through Envoy proxy by @yehudit1987 in #936
  • fix(cache): initialize embedding models before semantic cache (#928) by @noalimoy in #948
  • Feat: Support Path Suffix for LLM Endpoints by @Xunzhuo in #949

New Contributors

Full Changelog: https://github.com/vllm-project/semantic-router/commits/v0.1.0