# Architecture for Jan to support multiple Inference Engines #1271

freelerobot started this conversation in Feature Ideas.
## Context

Previous thread: #771
## Solution

I envision an architecture in Jan that has the following:

- **Models Extension**: serves the `/models` API endpoint
- **Inference Extension**: serves `/chat/completions` (later `/audio/speech`) and routes each request to the right engine based on the model's `model.json`
- **Extension for each Inference Engine**: implements the `/chat/completions` endpoint for its engine
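To make the division of responsibilities concrete, here is a minimal TypeScript sketch of what these extension contracts could look like. This is a hypothetical illustration, not Jan's actual extension API: `ModelConfig`, `InferenceEngineExtension`, and every field name below are assumptions.

```typescript
/** Parsed model.json; the `engine` field is what drives routing. */
interface ModelConfig {
  id: string;                            // e.g. "llama2-70b-intel-bigdl"
  engine: string;                        // e.g. "nitro", "intel-bigdl"
  parameters?: Record<string, unknown>;  // engine-specific settings (assumed)
}

/** OpenAI-compatible /chat/completions request (simplified). */
interface ChatCompletionRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  stream?: boolean;
}

/** Implemented once per inference engine (Nitro, intel-bigdl, ...). */
interface InferenceEngineExtension {
  /** Matched against the `engine` field of model.json. */
  readonly engineId: string;
  /** Runs inference and yields the response as SSE-style chunks. */
  chatCompletions(req: ChatCompletionRequest): AsyncIterable<string>;
}

/** The Inference Extension only routes; it never runs inference itself. */
interface InferenceExtension {
  register(engine: InferenceEngineExtension): void;
  chatCompletions(req: ChatCompletionRequest): AsyncIterable<string>;
}
```

Under this split, supporting a new engine means registering one more `InferenceEngineExtension`; the routing code never changes.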
## Example

### File Tree
- `model.json` for `gpt4-32k-1603`
- `engine.json` example for Nitro
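Neither file's schema is spelled out in the thread, so the following is only a guess at their contents, written as TypeScript literals against the hypothetical `ModelConfig` shape above. Every field besides the ids is an assumption.

```typescript
// Hypothetical contents of models/gpt4-32k-1603/model.json.
// Only the model id comes from the thread; the rest is assumed.
const gpt4Model: ModelConfig = {
  id: "gpt4-32k-1603",
  engine: "openai",                  // assumed: routed to a remote OpenAI engine extension
  parameters: { max_tokens: 32768 }, // assumed engine-specific setting
};

// Hypothetical contents of an engine.json for Nitro.
const nitroEngine = {
  id: "nitro",
  host: "127.0.0.1", // assumed: where the local Nitro server listens
  port: 3928,        // Nitro's default port
};
```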
### Execution Path

1. A `/chat/completions` request comes in for `llama2-70b-intel-bigdl`.
2. The Inference Extension loads the `model.json` for `llama2-70b-intel-bigdl` and sees that its engine is `intel-bigdl`.
3. The Inference Extension routes the request to the `intel-bigdl` Inference Engine Extension.
4. The `intel-bigdl` Inference Engine Extension takes in the `/chat/completions` request, runs inference, and returns the result through SSE.
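Steps 2 through 4 boil down to a map lookup plus delegation. Here is a sketch in the same hypothetical terms as above; `loadModelConfig`, the `models/<model-id>/model.json` layout, and the engine registry are all assumptions, not Jan's real implementation.

```typescript
import { readFile } from "node:fs/promises";
import * as path from "node:path";

// Engines register themselves here, keyed by engineId (assumed mechanism).
const engines = new Map<string, InferenceEngineExtension>();

// Assumed on-disk layout: models/<model-id>/model.json.
async function loadModelConfig(modelId: string): Promise<ModelConfig> {
  const raw = await readFile(path.join("models", modelId, "model.json"), "utf8");
  return JSON.parse(raw) as ModelConfig;
}

/** Steps 2-4: read model.json, pick the engine, delegate, stream back. */
async function* routeChatCompletion(
  req: ChatCompletionRequest,
): AsyncGenerator<string> {
  const config = await loadModelConfig(req.model); // step 2: read model.json
  const engine = engines.get(config.engine);       // step 3: pick the engine
  if (!engine) {
    throw new Error(`No inference engine registered for "${config.engine}"`);
  }
  yield* engine.chatCompletions(req);              // step 4: SSE chunks flow back
}
```

Because the router yields chunks as the engine produces them, the SSE stream from the engine extension passes straight through to the client.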