
Commit 7bbc7fe

Merge branch 'main' into feat/use-llama-cpp-server
vansangpfiev authored Jan 7, 2025
2 parents: 1903170 + 44412ee
Showing 3 changed files with 3 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -148,3 +148,4 @@ Table of parameters
 |`flash_attn` | Boolean| To enable Flash Attention, default is true|
 |`cache_type` | String| KV cache type: f16, q8_0, q4_0, default is f16|
 |`use_mmap` | Boolean| To enable mmap, default is true|
+|`ctx_shift` | Boolean| To enable context shift, default is true|
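
For context, a load-model request body that exercises these options might look like the sketch below. Only the field names and defaults come from the table and the diff; the values are illustrative, and a real request would also carry model-specific fields that are omitted here.

    {
      "flash_attn": true,
      "cache_type": "f16",
      "use_mmap": true,
      "ctx_shift": false,
      "ngl": 300
    }

Sending `"ctx_shift": false` would presumably opt out of llama.cpp's context shifting, which otherwise evicts the oldest tokens to make room once the context window fills.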
1 change: 1 addition & 0 deletions src/llama_engine.cc
@@ -712,6 +712,7 @@ bool LlamaEngine::LoadModelImpl(std::shared_ptr<Json::Value> json_body) {
     }
   }
 
+  params.ctx_shift = json_body->get("ctx_shift", true).asBool();
   params.n_gpu_layers =
       json_body->get("ngl", 300)
           .asInt();  // change from 100 -> 300 since llama 3.1 has 292 gpu layers
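
The new line follows the same jsoncpp pattern as the surrounding options: `Json::Value::get(key, default)` falls back to the default when the key is absent, so existing clients that never send `ctx_shift` keep the old behavior. Below is a minimal standalone sketch of that pattern; the `Params` struct is a hypothetical stand-in for the engine's real parameter struct, not code from this repository.

    #include <json/json.h>  // jsoncpp
    #include <iostream>
    #include <sstream>

    // Hypothetical stand-in for the engine's parameter struct.
    struct Params {
      bool ctx_shift = true;
      int n_gpu_layers = 300;
    };

    int main() {
      // A request body like the one LoadModelImpl receives; "ctx_shift"
      // and "ngl" are the keys used in the diff above.
      std::istringstream body(R"({"ctx_shift": false, "ngl": 40})");

      Json::Value json_body;
      body >> json_body;

      Params params;
      // get(key, default) returns the default when the key is missing,
      // so both fields stay optional for callers.
      params.ctx_shift = json_body.get("ctx_shift", true).asBool();
      params.n_gpu_layers = json_body.get("ngl", 300).asInt();

      std::cout << "ctx_shift=" << params.ctx_shift
                << " ngl=" << params.n_gpu_layers << "\n";
    }

Compiled against jsoncpp, this prints `ctx_shift=0 ngl=40`; dropping either key from the input falls back to the defaults seen in the diff.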
