Add the F8E4M3 dtype for CUDA and CPU #2546

Closed
wants to merge 94 commits into from

Commits on May 15, 2024

  1. Mistral.rs Squash Changes (#4)

    * Offset it
    
    * Freeze
    
    * Offset it
    
    * Offset it
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Try out vllm impl again
    
    * Remove debugs
    
    * Polish it up
    
    * Polish it up
    
    * Clippy
    
    * Remove test file
    
    * Add config for if neox
    
    * Fix bug
    
    * Fix bug
    
    * Cast cache type on rust side
    
    * Cast types
    
    * To dtype
    
    * Drop temp
    
    * Update casting
    
    * Update casting
    
    * Update casting
    
    * Create dtype in bf16
    
    * Check type
    
    * Debug
    
    * Check dtype
    
    * Check dtype
    
    * Check dtype
    
    * Check dtype
    
    * Check dtype
    
    * Check dtype
    
    * Check dtype
    
    * Check dtype
    
    * Check dtype
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Check old method
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Use mistral slow rope impl
    
    * Resetting
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Remove debug
    
    * Debug
    
    * Debug
    
    * Remove debug
    
    * Remove debug
    
    * Debug
    
    * Remove debug
    
    * Debug
    
    * Remove debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Debug
    
    * Try to use 3dim rotemb fused
    
    * Try to use 3dim rotemb fused
    
    * Remove contig and debug
    
    * Check handling
    
    * Cleanup
    
    * Fix
    
    * Remove prints
    
    * Lower block dim
    
    * Use fused layernorm
    
    * Pass batch size
    
    * Simplify internal API
    
    * Simplify internal API
    
    * Try slow
    
    * Try candle layer norm
    
    * Try candle layer norm
    
    * Fix dep of candle layer norm
    
    * Reshape input for rank 2
    
    * Reshape input for rank 2
    
    * Fix ref
    
    * Code style
    
    * Make dep optional
    
    * Ensure contig
    
    * Ensure contig
    
    * Ensure contig
    
    * Debug contig dmmv error
    
    * Debug contig dmmv error
    
    * Debug contig dmmv error
    
    * Debug contig dmmv error
    
    * Try other method
    
    * Try other method
    
    * Try other method
    
    * Try other method
    
    * Try other method
    
    * Use typestate to optimize
    
    * Use typestate to optimize
    
    * Fixes
    
    * Fixes
    
    * Fixes
    
    * Fixes
    
    * Fixes
    
    * Debug via using slow rmsnorm
    
    * Debug via using slow rope
    
    * Remove debug
    
    * More debugging
    
    * Remove debug
    
    * Remove debug
    
    * Remove debug
    
    * Add better error enum
    
    * Fix diff marker
    
    * Fix some things
    
    * Fix some things
    
    * Fix some things
    
    * Fix dummy backends
    
    * Re add from storage noop
    
    * Fix removed kvconcat custom op
    
    * Fix erroneous feature gate
    
    * Complete metal backend refactoring
    
    * Check if calling
    
    * Check if calling
    
    * Update default for force dmmv
    
    * Load atomic
    
    * Debug
    
    * Use mmvq
    
    * Update
    
    * Add the empty functions
    
    * Add rope new_partial function
    
    * Make variant of qmatmul pub
    
    * Make variant of qmatmul pub
    
    * Add the varbuilder set_device function
    
    * Only link stdc++ if target has msvc
    
    * Only link stdc++ if target has msvc
    
    * Only link stdc++ if target has msvc
    
    * Only link stdc++ if target has msvc
    
    * Handle case of device mapping
    
    * Handle case of device mapping
    
    * Add getter
    
    * Fix
    
    * Fix
    
    * Support nvcc flags in flash attn
    
    * Support nvcc flags in flash attn
    
    * Support nvcc flags in flash attn
    
    * Support nvcc flags in flash attn
    
    * Support nvcc flags in flash attn
    
    * Fixes
    
    * Fixes
    
    * Fix the tests
    
    * Fix the tests
    EricLBuehler authored May 15, 2024
    83a9e88
  2. 4e82fab

Commits on May 16, 2024

  1. 37cafcc
  2. 5892fac

Commits on May 18, 2024

  1. 9b151f5

Commits on May 19, 2024

  1. Remove candle-layer-norm (#6)

    * Support flash-attn in quantized phi3. (huggingface#2194)
    
    * Use flash-attn in gemma. (huggingface#2195)
    
    * Use flash-attn in gemma.
    
    * Fix flash-attn for head dim 256.
    
    * Remove candle-layer-norm
    
    ---------
    
    Co-authored-by: Laurent Mazare <[email protected]>
    EricLBuehler and LaurentMazare authored May 19, 2024
    ea49ea2
  2. Merge

    EricLBuehler committed May 19, 2024
    38f8d9e

Commits on May 27, 2024

  1. Merge

    EricLBuehler committed May 27, 2024
    c10fc33

Commits on May 28, 2024

  1. 527ebcc

Commits on May 29, 2024

  1. bfc197b

Commits on May 30, 2024

  1. 0c2ac76

Commits on Jun 1, 2024

  1. cb3dbc2

Commits on Jun 3, 2024

  1. Add a set_dtype method

    EricLBuehler committed Jun 3, 2024
    faa9435
  2. 462d948

Commits on Jun 4, 2024

  1. 5c06acd

Commits on Jun 9, 2024

  1. 696acaa
  2. Implement unfold (#8)

    * Add unfold
    
    * Format
    EricLBuehler authored Jun 9, 2024
    0936406

Commits on Jun 11, 2024

  1. 636de1d
  2. f52e234

Commits on Jun 29, 2024

  1. Add QTensor::quantize_onto (#12)

    * Add the quantize_onto api
    
    * Take ref
    
    * Clippy
    
    * Format
    
    * Add error checking
    EricLBuehler authored Jun 29, 2024
    bb8f6f0
  2. implement Slice op (huggingface#2260)

    shua authored and EricLBuehler committed Jun 29, 2024
    5b04d96
  3. Fix the fast bf16 gemm cublas kernels. (huggingface#2274)

    * Use flash-attn in gemma.
    
    * Fix for the fast bf16 cublas gemm.
    
    * Fix some clippy lints.
    
    * Fix another lint.
    
    * Proper clippy fix.
    LaurentMazare authored and EricLBuehler committed Jun 29, 2024
    f7095bb
  4. b55b360
  5. Depth Anything v2 (huggingface#2279)

    * define structs
    
    * construct ResidualConvUnit
    
    * forward() for ResidualConvUnit
    
    * implement FeatureFusionBlock
    
    * implement Scratch
    
    * implement DPTHead
    
    * add identity module
    
    * implement forward for DPTHead
    
    * add get_intermediate_layers to DinoVisionTransformer
    
    * implement DepthAnythingV2
    
    * some minor tweaks
    
    * fix compile errors
    
    * fix var builder prefixes
    
    * setup initial example
    
    * use fixed patch size of 37 (518 / 14)
    
    * debugged until output
    
    * print min and max values
    
    * add some dynamism to the output location
    
    * scale input image
    
    * extract prep function
    
    * extract output path function
    
    * normalize image with magic mean and std
    
    * add spectral coloring
    
    * squeeze in the right place
    
    * make interpolation optional
    
    * use bail instead of panic
    
    * omit unnecessary Shape call
    
    * remove empty curly braces
    
    * use bail instead of assert
    
    * use vb and pp
    
    * remove closures
    
    * extract config object
    
    * Apply rustfmt.
    
    * Fix some clippy lints.
    
    * More lints.
    
    * Use the array methods.
    
    ---------
    
    Co-authored-by: laurent <[email protected]>
    2 people authored and EricLBuehler committed Jun 29, 2024
    08e93a6
  6. Adding Gemm and ArgMax operators to candle-onnx (huggingface#2231)

    * feat(gemm): implement Gemm operator in candle-onnx
    
    * feat(onnx): Add support for ArgMax operator in candle-onnx
    
    * Apply rustfmt.
    
    * Remove argmax as it was already present.
    
    ---------
    
    Co-authored-by: Laurent <[email protected]>
    2 people authored and EricLBuehler committed Jun 29, 2024
    5df1ae2
  7. Add DINOv2Reg4 + PlantCLEF2024 (huggingface#2293)

    * Add: DINOv2Reg4 with PlantCLEF2024 weights and example ( See https://arxiv.org/abs/2309.16588 and https://zenodo.org/records/10848263 )
    
    * Remove extra files + update README to download them + remove extra lines
    
    * minor fix (README remove extra spaces)
    
    * minor fix (README: Fix image url)
    
    * Modif: Add back interpolate_pos_encoding() + fix when no interpolation + remove extra comments + Update README ( source image changed and so the predictions )
    
    * Fix: Improve code readability with '$ cargo clippy' and '$ cargo fmt'
    
    * Another clippy fix.
    
    ---------
    
    Co-authored-by: x-VEspit <[email protected]>
    Co-authored-by: laurent <[email protected]>
    3 people authored and EricLBuehler committed Jun 29, 2024
    0bb678c
  8. b438cba

Commits on Jun 30, 2024

  1. Patch metal function

    EricLBuehler committed Jun 30, 2024
    b7a3e34

Commits on Jul 15, 2024

  1. Complete merge

    EricLBuehler committed Jul 15, 2024
    c967be9

Commits on Jul 26, 2024

  1. Expose cublas handle

    EricLBuehler committed Jul 26, 2024
    9e09d7f
  2. 8b357f6

Commits on Jul 31, 2024

  1. 2064fb0

Commits on Aug 4, 2024

  1. 1a48767
  2. Update docs

    EricLBuehler committed Aug 4, 2024
    7bbcf00
  3. 1bf7101
  4. Rename

    EricLBuehler committed Aug 4, 2024
    d6d3d18
  5. e20d85a
  6. Update sdpa function

    EricLBuehler committed Aug 4, 2024
    8d2f32a
  7. Add matmul_alpha

    EricLBuehler committed Aug 4, 2024
    9f144d6
  8. c830f26

Commits on Aug 5, 2024

  1. Add it to mistral

    EricLBuehler committed Aug 5, 2024
    86d0876
  2. Add it to q llama

    EricLBuehler committed Aug 5, 2024
    8d8889c
  3. Add attention benches

    EricLBuehler committed Aug 5, 2024
    d18eb13
  4. Fixes

    EricLBuehler committed Aug 5, 2024
    d71b7d7
  5. 412e9f4

Commits on Aug 7, 2024

  1. Simplify things a bit

    EricLBuehler committed Aug 7, 2024
    27ca77e

Commits on Aug 9, 2024

  1. Mistral.rs GPTQ dev PR (#14)

    * Add i32 dtype for cpu and cuda, with kernels
    
    * Fix cuda i32
    
    * Fix cpu i32
    
    * Add cuda map impls for i32
    
    * Start to add to metal
    
    * Add the kernels
    
    * Oops
    
    * Fix dtype cast in safetensors
    
    * Oops
    
    * Oops
    
    * Add bf16 to i32 and vice versa casts
    EricLBuehler authored Aug 9, 2024
    7ad6494

Commits on Aug 14, 2024

  1. Fix on metal

    EricLBuehler committed Aug 14, 2024
    6f0e190
  2. Add the flux model for image generation. (huggingface#2390)

    * Add the flux autoencoder.
    
    * Add the encoder down-blocks.
    
    * Upsampling in the decoder.
    
    * Sketch the flow matching model.
    
    * More flux model.
    
    * Add some of the positional embeddings.
    
    * Add the rope embeddings.
    
    * Add the sampling functions.
    
    * Add the flux example.
    
    * Fix the T5 bits.
    
    * Proper T5 tokenizer.
    
    * Clip encoder path fix.
    
    * Get the clip embeddings.
    
    * No configurable weights in layer norm.
    
    * More weights related fixes.
    
    * Yet another shape fix.
    
    * DType fix.
    
    * Fix a couple more shape issues.
    
    * DType fixes.
    
    * Fix the latent dims.
    
    * Fix more shape issues.
    
    * Autoencoder fixes.
    
    * Get some generations out.
    
    * Bugfix.
    
    * T5 padding.
    
    * Clippy fix.
    
    * Add the decode only mode.
    
    * Fix.
    
    * More fixes.
    
    * Finally get some generations to work.
    
    * Add readme.
    LaurentMazare authored and EricLBuehler committed Aug 14, 2024
    ec55f58
  3. 0a146d7
  4. 0f55c37
  5. aef4eba
  6. c301efa
  7. add models support and example for THUDM/glm-4 (huggingface#2362)

    * add models support and example for THUDM/glm-4
    
    * fix the ci report
    
    * fmt
    
    * fix
    
    * Update README.org
    
    * Update README.org
    
    * fmt
    
    * Update README.org
    
    * README.md add codegeex4
    
    * README.md add glm4
    
    * Typo.
    
    * change expect into ?
    
    ---------
    
    Co-authored-by: Laurent Mazare <[email protected]>
    2 people authored and EricLBuehler committed Aug 14, 2024
    fd0e933
  8. Add the MMDiT model of Stable Diffusion 3 (huggingface#2397)

    * add mmdit of stable diffusion 3
    
    lint
    
    add comments
    
    * correct a misplaced comment
    
    * fix cargo fmt
    
    * fix clippy error
    
    * use bail! instead of assert!
    
    * use get_on_dim in splitting qkv
    Czxck001 authored and EricLBuehler committed Aug 14, 2024
    f8e2b36
  9. 0e78d29
  10. fix: usage of actions/checkout@v2 (huggingface#2403)

    * chore: changes from formatting on save
    
    * fix: usage of `actions/checkout@v2`
    hamirmahal authored and EricLBuehler committed Aug 14, 2024
    1b796b9
  11. Fix issues in the encodec example README.md (huggingface#2407)

    Also squeeze the first dimension of the codes tensor in the example file to get the expected three dimensions.
    jnises authored and EricLBuehler committed Aug 14, 2024
    c9cdd54
  12. Soft Non-Maximum Suppression (huggingface#2400)

    * Soft NMS with thresholds
    
    * NMS Test
    
    * Soft nms w/ boxes removed below threshold
    
    * Soft nms test
    
    * No longer removing bounding boxes to fit Soft-NMS focus
    
    * Initialize confidence
    
    * Added comments
    
    * Refactored out updating based on IOU/sigma
    
    * Score_threshold -> confidence_threshold for clarity
    
    * Remove bboxes below confidence threshold
    
    * Softnms basic functionality test
    
    * Softnms confidence decay test
    
    * Softnms confidence threshold test
    
    * Softnms no overlapping bbox test
    
    * Testing confidence after no overlap test
    
    * Single bbox and no bbox tests
    
    * Signify test completion
    
    * Handling result of test functions
    
    * Checking all pairs of bboxes instead of a forward pass
    
    * Equal confidence overlap test
    
    * Clarified tests for implementation
    
    * No longer dropping boxes, just setting to 0.0
    
    * Formatted w/ cargo
    onichmath authored and EricLBuehler committed Aug 14, 2024
    283a5cf
  13. Add documentation examples for Tensor::i and Tensor::narrow methods (huggingface#2308)
    
    * Add documentation examples for `Tensor` methods
    
    * Apply fmt.
    
    * Cosmetic tweaks.
    
    ---------
    
    Co-authored-by: Laurent <[email protected]>
    2 people authored and EricLBuehler committed Aug 14, 2024
    de719a2
  14. 2e72a3d
  15. d7a9bd0
  16. Clippy fixes. (huggingface#2415)

    * Clippy fixes.
    
    * Bump the web_sys required version.
    LaurentMazare authored and EricLBuehler committed Aug 14, 2024
    3d40ffc
  17. c5c5d49
  18. Build fixes

    EricLBuehler committed Aug 14, 2024
    2386e4e
  19. Merge branch 'sdpa'

    EricLBuehler committed Aug 14, 2024
    a38053f

Commits on Aug 21, 2024

  1. Add GGUF BF16 support (#17)

    * Add GGUF bf16 type support
    
    * Add non avx impl for vec_dot_bf16
    
    * Fix from_u32
    
    * Fix loading
    
    * Fix dequant of bf16
    EricLBuehler authored Aug 21, 2024
    1b1974e

Commits on Aug 22, 2024

  1. 36bd9f9
  2. Complete merge

    EricLBuehler committed Aug 22, 2024
    6fbddd6
  3. Add softcapping support to flash attention (#18)

    * Expose the softcap methods
    
    * Add some tests
    
    * Fix generics
    EricLBuehler authored Aug 22, 2024
    f706ef2

Commits on Sep 2, 2024

  1. Update kernels for metal bf16 (#19)

    * Update kernels for metal bf16
    
    * Fix typo
    
    * Check if have bfloat
    EricLBuehler authored Sep 2, 2024
    3c8e120

Commits on Sep 5, 2024

  1. 014f140

Commits on Sep 6, 2024

  1. f317df8
  2. onnx: workaround pow with negative base (huggingface#2439)

    * onnx: workaround pow with negative base
    
    rather than fully defining pow in the cpu backend (as in huggingface#2318),
    this implements a much smaller change which is sufficient to evaluate silero-vad
    onnx models. Specifically, checking if pow is run with 2.0 exponent, and if so
    evaluate as simply `x*x` instead of the cpu backend of `e^(2.0 * ln(x))`.
    
    * PR: use Tensor::powf instead
    
    powf correctly handles a negative base.
    shua authored and EricLBuehler committed Sep 6, 2024
    8a9d2be
  3. onnx: support negative index in Gather (huggingface#2440)

    index_select does not support negative indexing, but
    this change adds just enough workarounds in onnx to
    allow evaluating silero-vad models (which make use of
    negative indices).
    shua authored and EricLBuehler committed Sep 6, 2024
    a7142d3
  4. silero-vad v5 example (huggingface#2321)

    * silero-vad v5 example
    
    This change adds an example of how to run silero-vad v5
    
    * PR: rename 'vad' to 'silero-vad'
    
    * Update README.md
    
    ---------
    
    Co-authored-by: Laurent Mazare <[email protected]>
    2 people authored and EricLBuehler committed Sep 6, 2024
    f62d7e8
  5. Fix for parler-tts, do not add the last slice of padding tokens. (huggingface#2442)
    
    * Fix for parler-tts, do not add the last slice of padding tokens.
    
    * Support for the mini model.
    LaurentMazare authored and EricLBuehler committed Sep 6, 2024
    ceab78e
  6. 5b4c593
  7. fix: qwen2 lm_head loading huggingface#2443 (huggingface#2445)

    Co-authored-by: Yi Xu <[email protected]>
    2 people authored and EricLBuehler committed Sep 6, 2024
    ef9649c
  8. Update cudarc to 0.12. (huggingface#2451)

    * Update cudarc to 0.12.
    
    * Some cudnn tweaks.
    LaurentMazare authored and EricLBuehler committed Sep 6, 2024
    7412bd0
  9. FastViT fixes. (huggingface#2452)

    * correct optional SE layer dimensions.
     * head_dim instead of num_heads is 32.
     * update test example output.
    janimo authored and EricLBuehler committed Sep 6, 2024
    8e39086
  10. MobileCLIP models S1 and S2 (huggingface#2454)

    * Allow loading images with given std and mean
    
    * OpenCLIP text encoder component
    
    * Two MobileCLIP models
    
    * Clippy fixes.
    
    ---------
    
    Co-authored-by: Laurent <[email protected]>
    2 people authored and EricLBuehler committed Sep 6, 2024
    8632a2f
  11. Fix FLUX.1 weights (huggingface#2457)

    * fix FLUX.1 weights
    
    * added flux1-dev.safetensors
    eugenehp authored and EricLBuehler committed Sep 6, 2024
    f492c04
  12. Clippy fixes for 1.81.0. (huggingface#2461)

    * Clippy fixes for 1.81.0.
    
    * Another fix.
    LaurentMazare authored and EricLBuehler committed Sep 6, 2024
    91e0c6e
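
The two onnx commits above (huggingface#2439 and huggingface#2440) describe small workarounds: evaluating a power of 2.0 as `x*x` rather than `e^(2.0 * ln(x))`, and accepting negative indices in Gather. The following is a minimal sketch of both ideas in plain Rust. It does not use candle and is not the code from those commits; the function names (`pow_via_exp_ln`, `pow_with_square_shortcut`, `normalize_index`) are hypothetical and chosen only for illustration. The index wrapping assumes the usual ONNX convention that a negative index counts back from the end of the axis.

```rust
/// Power computed as e^(p * ln(x)), the formulation the commit message
/// attributes to the cpu backend. For a negative `x`, `ln(x)` is NaN,
/// so the whole expression is NaN.
fn pow_via_exp_ln(x: f64, p: f64) -> f64 {
    (p * x.ln()).exp()
}

/// Hypothetical helper: special-case an exponent of exactly 2.0 as `x * x`,
/// and fall back to the exp/ln formulation otherwise.
fn pow_with_square_shortcut(x: f64, p: f64) -> f64 {
    if p == 2.0 {
        x * x
    } else {
        pow_via_exp_ln(x, p)
    }
}

/// Hypothetical helper: map a possibly negative index onto `0..dim_len`.
/// Returns `None` when the index is out of range in either direction.
fn normalize_index(idx: i64, dim_len: usize) -> Option<usize> {
    let len = dim_len as i64;
    let wrapped = if idx < 0 { idx + len } else { idx };
    if (0..len).contains(&wrapped) {
        Some(wrapped as usize)
    } else {
        None
    }
}

fn main() {
    let x = -3.0_f64;
    println!("exp/ln formulation: {}", pow_via_exp_ln(x, 2.0));
    println!("square shortcut:    {}", pow_with_square_shortcut(x, 2.0));
    println!("std powf:           {}", x.powf(2.0));
    println!("index -1 on len 4:  {:?}", normalize_index(-1, 4));
}
```

The follow-up bullet in huggingface#2439 switches to `Tensor::powf`, which the commit message says handles a negative base correctly; `f64::powf` plays the analogous role in this sketch.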

Commits on Sep 11, 2024

  1. Improve candle_core::Error to make it more ergonomic (#21)

    * Bump the version to 0.6.1. (huggingface#2438)
    
    * onnx: workaround pow with negative base (huggingface#2439)
    
    * onnx: workaround pow with negative base
    
    rather than fully defining pow in the cpu backend (as in huggingface#2318),
    this implements a much smaller change which is sufficient to evaluate silero-vad
    onnx models. Specifically, checking if pow is run with 2.0 exponent, and if so
    evaluate as simply `x*x` instead of the cpu backend of `e^(2.0 * ln(x))`.
    
    * PR: use Tensor::powf instead
    
    powf correctly handles a negative base.
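
The failure mode described in this commit message can be reproduced with scalar math. The sketch below is only an illustration: it uses plain `f32` operations as a stand-in for the tensor ops, and the function names `pow_via_exp_ln` and `pow_with_square_shortcut` are hypothetical, not taken from the candle code base.

```rust
// Illustrative only: plain f32 math standing in for the tensor operations.

/// pow computed as e^(exponent * ln(x)); ln of a negative number is NaN,
/// so the result is NaN for any negative base.
fn pow_via_exp_ln(x: f32, exponent: f32) -> f32 {
    (exponent * x.ln()).exp()
}

/// Hypothetical shortcut: an exponent of exactly 2.0 is evaluated as x * x,
/// every other exponent falls back to the exp/ln form.
fn pow_with_square_shortcut(x: f32, exponent: f32) -> f32 {
    if exponent == 2.0 {
        x * x
    } else {
        pow_via_exp_ln(x, exponent)
    }
}

fn main() {
    let x = -3.0_f32;
    // The exp/ln form cannot handle the negative base.
    println!("exp/ln:   {}", pow_via_exp_ln(x, 2.0));
    // The shortcut avoids ln entirely for the squared case.
    println!("shortcut: {}", pow_with_square_shortcut(x, 2.0));
    // The standard library powf (a scalar analogue of the final fix)
    // also handles a negative base with an integral exponent.
    println!("powf:     {}", x.powf(2.0));
}
```

The shortcut only covers the exponent 2.0, which is why a `powf`-style operation is the more general fix.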
    
    * onnx: support negative index in Gather (huggingface#2440)
    
    index_select does not support negative indexing, but
    this change adds just enough workarounds in onnx to
    allow evaluating silero-vad models (which make use of
    negative indices).
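
A common way to support negative indices on top of a selection op that only accepts non-negative ones is to normalize them first. The sketch below assumes that approach; `normalize_indices` is a hypothetical helper written for illustration and is not the code added in the commit.

```rust
/// Hypothetical helper: map possibly negative indices into 0..dim_size,
/// so that -1 refers to the last element, -2 to the one before it, and so on.
fn normalize_indices(indices: &[i64], dim_size: usize) -> Result<Vec<u32>, String> {
    let size = dim_size as i64;
    indices
        .iter()
        .map(|&i| {
            // Shift negative indices by the dimension size.
            let idx = if i < 0 { i + size } else { i };
            if idx < 0 || idx >= size {
                Err(format!(
                    "index {i} out of range for dimension of size {dim_size}"
                ))
            } else {
                Ok(idx as u32)
            }
        })
        .collect()
}

fn main() {
    // With a dimension of size 3, -1 maps to 2 and -3 maps to 0.
    println!("{:?}", normalize_indices(&[0, -1, -3], 3));
    // -4 is still out of range after shifting and is reported as an error.
    println!("{:?}", normalize_indices(&[-4], 3));
}
```

The normalized indices can then be passed to an op that only accepts non-negative indices.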
    
    * silero-vad v5 example (huggingface#2321)
    
    * silero-vad v5 example
    
    This change adds an example of how to run silero-vad v5
    
    * PR: rename 'vad' to 'silero-vad'
    
    * Update README.md
    
    ---------
    
    Co-authored-by: Laurent Mazare
    
    * Fix for parler-tts, do not add the last slice of padding tokens. (huggingface#2442)
    
    * Fix for parler-tts, do not add the last slice of padding tokens.
    
    * Support for the mini model.
    
    * Add FastViT model. (huggingface#2444)
    
    * fix: qwen2 lm_head loading huggingface#2443 (huggingface#2445)
    
    Co-authored-by: Yi Xu
    
    * Update cudarc to 0.12. (huggingface#2451)
    
    * Update cudarc to 0.12.
    
    * Some cudnn tweaks.
    
    * FastViT fixes. (huggingface#2452)
    
    * correct optional SE layer dimensions.
     * head_dim instead of num_heads is 32.
     * update test example output.
    
    * MobileCLIP models S1 and S2 (huggingface#2454)
    
    * Allow loading images with given std and mean
    
    * OpenCLIP text encoder component
    
    * Two MobileCLIP models
    
    * Clippy fixes.
    
    ---------
    
    Co-authored-by: Laurent
    
    * Fix FLUX.1 weights (huggingface#2457)
    
    * fix FLUX.1 weights
    
    * added flux1-dev.safetensors
    
    * Clippy fixes for 1.81.0. (huggingface#2461)
    
    * Clippy fixes for 1.81.0.
    
    * Another fix.
    
    * Make Error::msg more in line with anyhow::Error::msg
    
    * Add context trait
    
    * Even more flexible
    
    * Format
    
    ---------
    
    Co-authored-by: Laurent Mazare
    Co-authored-by: shua
    Co-authored-by: Jani Monoses
    Co-authored-by: ilookee
    Co-authored-by: Yi Xu
    Co-authored-by: Eugene Hauptmann
    7 people authored Sep 11, 2024
    ad84486
  2. Add API to get current device seed (#22)

    * Add api to get current seed
    
    * Remove cell for rwlock
    EricLBuehler authored Sep 11, 2024
    7f5a470

Commits on Sep 13, 2024

  1. 9240d03
  2. 8a99f7c

Commits on Sep 15, 2024

  1. Add the i16 dtype (2) (#26)

    * Add the i16 dtype
    
    * Added I16 and I32 to fix the missing arms issue (candle-onnx/eval)
    
    * Update rust-ci.yml
    
    * Update ci_cuda.yaml
    
    * fmt adjustment
    
    * Revert "Update rust-ci.yml"
    
    This reverts commit f659d36.
    
    * Revert "Update ci_cuda.yaml"
    
    This reverts commit 62a4b39.
    ro99 authored Sep 15, 2024
    9e31a19

Commits on Oct 2, 2024

  1. d08212c
  2. c04861d
  3. Fix dtype cast

    EricLBuehler committed Oct 2, 2024
    156ebd1

Commits on Oct 3, 2024

  1. Fix set_dtype

    EricLBuehler committed Oct 3, 2024
    20a57c4

Commits on Oct 6, 2024

  1. 121bdfd