This is a pure C# port of Alfonso² Peterssen's Java port of Andrej Karpathy's awesome llama2.c, a very simple implementation for running inference on models with a Llama2-like transformer-based LLM architecture.
Requires the .NET 8 SDK.
The code expects tokenizer.bin in the current directory.
The sample stories15M.bin model can be found here.
To build and run:
dotnet run -c Release stories15M.bin
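
Since the program needs both tokenizer.bin and the model file to be present, a small pre-flight check can save a failed run. The following is only a sketch: `check_inputs` is a hypothetical helper name, not part of this repository, and it assumes the model file sits in the current directory next to tokenizer.bin.

```shell
# Hypothetical helper (not part of the repo): confirm that tokenizer.bin and
# the model file exist in the current directory before starting a run.
check_inputs() {
    model="${1:-stories15M.bin}"   # default to the sample model name
    for f in tokenizer.bin "$model"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    return 0
}
```

It could then be chained with the run command, for example `check_inputs stories15M.bin && dotnet run -c Release stories15M.bin`.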