# Rama (Rust-Llama)

Experimental project for local inference, intended as a testbed for improving on-device inference performance. Llama.cpp is a huge codebase with many features; Rama has a single implementation, running in fp32.
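The bulk of the work in an fp32 transformer forward pass is row-major matrix-vector multiplication. As a rough illustration of what a naive fp32 kernel looks like (the function name and shapes here are illustrative, not taken from the Rama codebase):

```rust
/// Naive fp32 matrix-vector multiply: y = W * x, with W stored
/// row-major as a flat slice of `rows * cols` elements.
/// Hypothetical sketch, not Rama's actual implementation.
fn matvec_f32(w: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| {
            // Dot product of row `r` of W with x.
            w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .map(|(a, b)| a * b)
                .sum()
        })
        .collect()
}

fn main() {
    // 2x3 matrix times a length-3 vector.
    let w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let x = [1.0, 0.5, 2.0];
    let y = matvec_f32(&w, &x, 2, 3);
    assert_eq!(y, vec![8.0, 18.5]);
    println!("{:?}", y);
}
```

Kernels like this, run once per weight matrix per token, are where most of the tok/s budget goes, which is why fp32 with no SIMD or quantization is slow.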

Current performance on an M3 Pro for Llama 2 7B (Oct 2024): 0.2 tok/s.