Experimental project for local inference. This project is intended as a testbed for improving on-device inference performance. Llama.cpp is a huge codebase with many features; Rama, by contrast, has a single implementation running in fp32.
Current performance (Oct 2024): 0.2 tok/s on an M3 Pro running Llama 2 7B.
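For reference, a minimal sketch of how a tok/s figure like the one above can be measured: time a fixed number of decode steps and divide. `decode_next_token()` is a hypothetical stand-in for the actual decode step, not Rama's real API.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical placeholder for one fp32 decode step (not Rama's real API).
static int decode_next_token() {
    // ... run the forward pass and sample one token ...
    return 0;
}

int main() {
    const int n_tokens = 32;  // number of decode steps to time

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n_tokens; ++i) {
        decode_next_token();
    }
    auto t1 = std::chrono::steady_clock::now();

    // tokens generated divided by wall-clock seconds
    double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("%.2f tok/s\n", n_tokens / secs);
}
```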