Model topology configuration

Quantization and device mapping in one file.

Use a simple model topology to configure per-layer ISQ and device mapping with a single YAML file (examples here)!

To support a per-layer mix of ISQ types, Mistral.rs can load a model topology YAML file. This YAML file is formatted as follows:

  1. Top-level keys are either:
    • A range of layers (start-end) where start < end. start is inclusive and end is exclusive.
    • A single layer number
  2. Each key maps to the topology for that range or layer:
    • An optional key (isq) which maps to a single value, which can be any ISQ type. If not specified, no ISQ is applied to this range of layers.
    • An optional key (device) which maps to a single value, which is one of the below. If not specified, the default loading device will be used.
      • cpu
      • cuda[ORDINAL]
      • metal[ORDINAL]
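
To make the key and device formats above concrete, here is a minimal Python sketch of how they could be parsed. It is an illustration only, not the parser Mistral.rs uses; the function names `parse_layer_key` and `parse_device` are hypothetical, and details such as whitespace handling are assumptions.

```python
import re

# Hypothetical helpers for illustration; not part of the Mistral.rs API.

def parse_layer_key(key):
    """Turn a top-level key such as "0-8" or "5" into (start, end), end exclusive."""
    text = str(key).strip()  # a YAML loader may hand back a bare `5` as an int
    if "-" in text:
        start_s, end_s = text.split("-", 1)
        start, end = int(start_s), int(end_s)
        if start >= end:
            raise ValueError(f"range {text!r} must satisfy start < end")
        return start, end
    layer = int(text)
    return layer, layer + 1  # a single layer is the range [layer, layer + 1)


_DEVICE_RE = re.compile(r"^(cuda|metal)\[(\d+)\]$")

def parse_device(value):
    """Turn "cpu", "cuda[N]" or "metal[N]" into (kind, ordinal); ordinal is None for cpu."""
    if value == "cpu":
        return ("cpu", None)
    match = _DEVICE_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized device {value!r}")
    return (match.group(1), int(match.group(2)))
```

With these helpers, `parse_layer_key("0-8")` yields the half-open range `(0, 8)` and `parse_device("cuda[0]")` yields `("cuda", 0)`.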

Note that:

  • The topology for the range is expanded to fill the range
  • If ranges overlap, the range with the higher end layer takes precedence and overwrites the other on the layers they share
  • Any layers which are not covered will have no topology mapping. They will inherit any other ISQ setting (e.g. one set with --isq/in_situ_quant).
  • For any layer that is covered by the topology, the topology value overrides any other ISQ setting (e.g. one set with --isq/in_situ_quant).
  • The topology device mapping will override any other device mapping.
  • When using UQFF, only the device mapping is relevant.

For example:

0-8:
  isq: Q3K
  device: cuda[0]
8-16:
  isq: Q4K
  device: cpu
16-24:
  isq: Q6K
# Skip 24-28
28-32:
  isq: Q8_0
  device: cuda[0]
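
The rules in the notes above can be sketched as a small Python function that expands a topology into one entry per layer. This is a rough model of the described behavior, not the Mistral.rs implementation. The name `expand_topology`, the `num_layers` and `default_isq` parameters, and the choices that an overlapping range replaces the whole entry and that equal end layers are resolved by file order are all assumptions made for illustration.

```python
# Illustrative model of the rules above; not the Mistral.rs implementation.

def expand_topology(topology, num_layers, default_isq=None):
    """Expand {"start-end" | "layer": {"isq": ..., "device": ...}} to a per-layer list.

    `default_isq` stands in for an ISQ set elsewhere (e.g. via --isq); it only
    applies to layers the topology does not cover.
    """
    entries = []
    for key, spec in topology.items():
        text = str(key)
        if "-" in text:
            start, end = (int(part) for part in text.split("-", 1))
        else:
            start = int(text)
            end = start + 1
        entries.append((start, end, spec or {}))

    # Apply ranges in ascending order of end layer, so the range with the
    # higher end is written last and wins on overlapping layers.
    entries.sort(key=lambda entry: entry[1])

    layers = [None] * num_layers
    for start, end, spec in entries:
        for index in range(start, min(end, num_layers)):
            # Assumption: the whole entry is replaced, not merged key by key.
            layers[index] = {"isq": spec.get("isq"), "device": spec.get("device")}

    # Uncovered layers inherit the other ISQ setting and the default device (None).
    return [
        layer if layer is not None else {"isq": default_isq, "device": None}
        for layer in layers
    ]


# The example topology above, written as the dict a YAML loader would produce.
EXAMPLE = {
    "0-8": {"isq": "Q3K", "device": "cuda[0]"},
    "8-16": {"isq": "Q4K", "device": "cpu"},
    "16-24": {"isq": "Q6K"},
    "28-32": {"isq": "Q8_0", "device": "cuda[0]"},
}

# 32 layers and a fallback ISQ of Q4K are assumed values for this demo.
for index, layer in enumerate(expand_topology(EXAMPLE, 32, default_isq="Q4K")):
    print(index, layer)
```

In this model, layers 24 to 27 are not covered, so they pick up the fallback ISQ and the default device, while layers 16 to 23 get Q6K on the default device because their entry omits device.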

Model topologies may be applied to all model types.

CLI example

Note

You should replace --features ... with one of the features specified here, or remove it for pure CPU inference.

cargo run --features ... -- -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml   

HTTP server example

Note

You should replace --features ... with one of the features specified here, or remove it for pure CPU inference.

cargo run --features ... -- --port 1234 plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml   

Rust example

Example here.

Python example

Example here.