
Support GGUF BF16 tensors #691

Merged: 5 commits from gguf_bf16 into master, Aug 21, 2024
Conversation

EricLBuehler
Owner

github-actions bot commented Aug 17, 2024

Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   11          102          101            0            1
 Python                 46         2018         1718           62          238
 TOML                   20          618          545           11           62
 YAML                    1            9            8            1            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               28         1961            0         1482          479
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 6          408          365           19           24
 |- TOML                 2           75           63            0           12
 (Total)                           2649          620         1501          528
-------------------------------------------------------------------------------
 Rust                  196        60138        54573         1104         4461
 |- Markdown           101          927           13          864           50
 (Total)                          61065        54586         1968         4511
===============================================================================
 Total                 311        65357        57367         2660         5330
===============================================================================
  

@EricLBuehler EricLBuehler merged commit 34d0bb6 into master Aug 21, 2024
17 checks passed
@EricLBuehler EricLBuehler deleted the gguf_bf16 branch August 21, 2024 13:31
EricLBuehler added a commit that referenced this pull request Aug 24, 2024
* Support GGUF bf16 tensors

* Fix loading of bf16 ggml tensor

* Fix dequant of bf16

* Use merged rev
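The bf16 loading and dequantization commits above rest on one property of the format: a bfloat16 value is simply the upper 16 bits of an IEEE-754 f32, so dequantization is a widening shift. A minimal sketch of that conversion (illustrative only, not the repository's actual code):

```rust
/// Convert a raw bf16 bit pattern (as stored in a GGUF tensor) to f32
/// by placing it in the upper 16 bits of the f32 encoding.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Dequantize a buffer of little-endian bf16 values into f32.
fn dequantize_bf16(raw: &[u8]) -> Vec<f32> {
    raw.chunks_exact(2)
        .map(|c| bf16_to_f32(u16::from_le_bytes([c[0], c[1]])))
        .collect()
}

fn main() {
    // 0x3F80 is bf16 for 1.0 (same sign/exponent layout as f32's 0x3F800000).
    assert_eq!(bf16_to_f32(0x3F80), 1.0);
    assert_eq!(bf16_to_f32(0xBF80), -1.0);
    // A raw buffer holding [1.0, -1.0] as little-endian bf16.
    assert_eq!(dequantize_bf16(&[0x80, 0x3F, 0x80, 0xBF]), vec![1.0, -1.0]);
    println!("ok");
}
```

The reverse direction (f32 to bf16 by truncating the low 16 bits) loses mantissa precision, which is why bf16 tensors are typically dequantized to f32 for compute rather than round-tripped.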
EricLBuehler added a commit that referenced this pull request Aug 27, 2024
* Implement dry penalty

* Add dry sampling params to requests

* Handle it

* Clippy

* Review: "Implement DRY penalty" (#645)

* Silence bogus Clippy warning

Clippy's suggestion cannot be implemented because of borrowing issues

* Get rid of unnecessary type annotations

Interesting that Clippy doesn't catch this

* Store default sequence breakers in a slice

It's nicer when the length is not hardcoded

* Make default sequence breakers private

No need to leak this as it's not used elsewhere

* Limit match length

Avoids quadratic runtime and potential DoS with adversarial inputs

Ref oobabooga/text-generation-webui#6047
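The quadratic blow-up the commit above guards against comes from scanning ever-longer repeated suffixes of the context on every generated token; capping the match length bounds that scan to a constant. A hedged sketch of such a cap (the function name and shape are illustrative, not the crate's API):

```rust
/// Longest common suffix of `a` and `b`, capped at `max_len`.
/// The cap keeps per-token work at O(max_len) instead of O(context length),
/// avoiding quadratic total runtime on adversarial, highly repetitive input.
fn common_suffix_len(a: &[u32], b: &[u32], max_len: usize) -> usize {
    a.iter()
        .rev()
        .zip(b.iter().rev())
        .take(max_len) // the DoS guard: never compare more than max_len tokens
        .take_while(|(x, y)| x == y)
        .count()
}

fn main() {
    let ctx = [1, 2, 3, 4];
    let tail = [9, 3, 4];
    // Suffixes [3, 4] match, so the uncapped length is 2.
    assert_eq!(common_suffix_len(&ctx, &tail, 50), 2);
    // With a cap of 1, the scan stops after a single comparison.
    assert_eq!(common_suffix_len(&ctx, &tail, 1), 1);
    println!("ok");
}
```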

* "Fix" sequence breaker tokenization

Most tokenizers encode punctuation tokens differently depending on where they occur in the input, and which tokens surround them. With the default sequence breakers, the appropriate encoding usually corresponds to the encoding produced when the token occurs after a word, rather than by itself. To emulate this, prefix the token with "a" before encoding, and extract the final token of the result.

See LostRuins/koboldcpp#982 for a correct solution to this problem.
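The workaround described above exists because encoding a breaker string on its own can yield a different token id than encoding it after a word. A toy sketch of the prefix-and-take-last trick, using a stand-in encoder to mimic that position-dependence (real code would call the model's tokenizer):

```rust
/// Toy stand-in for a real tokenizer: punctuation at the start of the
/// input gets a different id than punctuation following a word, mimicking
/// how BPE tokenizers encode context-dependently. (Illustrative only.)
fn toy_encode(text: &str) -> Vec<u32> {
    text.chars()
        .enumerate()
        .map(|(i, ch)| match ch {
            'a' => 1,
            '.' if i == 0 => 100, // standalone-style encoding
            '.' => 200,           // word-following encoding
            _ => 0,
        })
        .collect()
}

/// Encode a sequence breaker the way it usually appears mid-text:
/// prefix it with "a", encode, and keep only the final token.
fn encode_breaker(breaker: &str) -> u32 {
    let prefixed = format!("a{breaker}");
    *toy_encode(&prefixed).last().expect("non-empty encoding")
}

fn main() {
    // Encoding "." alone gives the standalone id...
    assert_eq!(toy_encode("."), vec![100]);
    // ...but the prefix trick recovers the word-following id.
    assert_eq!(encode_breaker("."), 200);
    println!("ok");
}
```

As the commit message notes, this is an approximation; the linked koboldcpp change is cited as a more correct approach.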

* Nicer

* Even better

* Complete merge

* Fix saturating sub

* Handle when no context

* Make context the entire sequence and refactor

* Remove slicing for all

* Fix the bug with penalty

Credit to @p-e-w for finding this!

Co-authored-by: Philipp Emanuel Weidmann <[email protected]>

* Add custom logits processor API (#702)

* Add custom logits processor api

* Typos

* Nicer interface and update example

* Fix doctest

* Update docs

* Update exports

* Add Gemma 2 PagedAttention support (#704)

* Add gemma2 paged attn support

* Non cuda support?

* Remove error

* It works

* Faster RmsNorm in gemma/gemma2 (#703)

* Fix bug in metal isq (#706)

* Support GGUF BF16 tensors (#691)

* Support GGUF bf16 tensors

* Fix loading of bf16 ggml tensor

* Fix dequant of bf16

* Use merged rev

* Softcapping, real batching + sliding window support for Flash Attention (#707)

* Flash attention varlen kind of works

* Seems to work

* Now it's nice

* Sliding window support and clippy

* Remove warning

* Support smollm

* Update rev to match merged

* Remove some usages of 'pub' in models (#708)

* Support the Phi 3.5 V model (#710)

* Update image_seq_len

* Update the examples

* Format

* Implement the Phi 3.5 MoE model (#709)

* Copy the model

* Add most of it

* Add the blocksparse moe parts

* Clippy

* Fix mscales

* A batch of fixes

* Correctly cast it

* Handle isq on gate

* Even more progress

* Runs now

* Clippy

* Fix to use layernorm

* Remove unused

* Add docs

* Add more docs

* Apply review comments

* Update readme

---------

Co-authored-by: Philipp Emanuel Weidmann <[email protected]>