Possibility of using Generate API after exporting for inference on device for a custom LLM model - Android? #819
-
Yes, you can bring your own custom models, or models that share an architecture with the already-supported ones (e.g. Gemma, LLaMA, Phi, etc.).
Yes, you can use your own ONNX model; you are not bound to the example models shown. To create the supporting files needed to run with ONNX Runtime GenAI (e.g. `genai_config.json` and the tokenizer files), the model builder will try to create these for you.
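Roughly, the flow looks like the sketch below. The paths, prompt, and `int4`/`cpu` options are placeholders, and the exact Python method names have shifted a bit between releases, so check them against the version you install; the Java/Kotlin bindings used on Android expose the same Model/Tokenizer/Generator concepts.

```python
# Minimal sketch, assuming a model folder produced by the model builder.
#
# Step 1: create model.onnx + genai_config.json + tokenizer files, e.g. from a shell:
#   python3 -m onnxruntime_genai.models.builder \
#       -m <huggingface-id-or-local-folder> -o ./out -p int4 -e cpu
#
# Step 2: drive the exported folder with the Generate API.
import onnxruntime_genai as og

model = og.Model("./out")                  # folder produced by the builder
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)  # search/sampling options

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Why is the sky blue?"))

# Token-by-token generation loop.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```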
-
I'm still fairly new to this framework, but I would like to thank the contributors for their effort in providing us with this tool for generative AI.
My question is whether it is possible to take our own custom models, or models with a Gemma or TinyLlama architecture, export them for inference, and then use this onnxruntime-genai framework to run faster text generation on device (Android)?
The repository states that the supported models are the Gemma and Llama architectures; however, there aren't many examples provided for this situation apart from Phi-3, and in that case we are downloading an already pretrained model.
In my case, I would like to export an already existing transformer causal LM model and then use it with this framework for on-device inference.
Would that be possible if we have a model which produces the logits output from the graph? For example, suppose we already had an `inference_model.onnx` file available that accepts `token_ids` as input and outputs the logits, with almost the same structure that `phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx` uses: inputs `input_ids` and `attention_mask`, and output `logits` plus hidden states (a rough sketch of how we currently drive such a graph is included below). Or are we only bound to the models supported in the given examples?
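For context, here is a rough sketch of how we currently run that kind of graph directly with onnxruntime; the file name, the I/O names, and the sample token ids are just our own illustration, not something from this repo.

```python
import numpy as np
import onnxruntime as ort

# Our exported causal LM graph: input_ids + attention_mask in, logits out.
session = ort.InferenceSession("inference_model.onnx", providers=["CPUExecutionProvider"])

input_ids = np.array([[1, 4521, 29901]], dtype=np.int64)  # token ids from our tokenizer
attention_mask = np.ones_like(input_ids)

# Request only the logits output (the graph may also expose hidden states).
(logits,) = session.run(
    ["logits"],
    {"input_ids": input_ids, "attention_mask": attention_mask},
)
next_token = int(np.argmax(logits[0, -1]))  # greedy choice of the next token
```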
Thank you in advance for your answer. :)