Pull requests: pytorch/FBGEMM
Pull requests list
Improve FP8 BMM heuristic for large shapes and MoE E2E performance
cla signed
fb-exported
#3344
opened Nov 8, 2024 by
jiawenliu64
Add new optimizer state row_counter for Adam [Backend]
cla signed
fb-exported
#3342
opened Nov 8, 2024 by
spcyppt
Remove unused-variable in /fbcode/deeplearning/fbgemm/fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_table_batched_embeddings.cpp
cla signed
fb-exported
#3335
opened Nov 6, 2024 by
r-barnes
Fix global namespace pollution in ATen/Dispatch.h
cla signed
fb-exported
#3334
opened Nov 6, 2024 by
slyfox3
open-source SLL jagged_dense_bmm
cla signed
fb-exported
#3331
opened Nov 5, 2024 by
brad-mengchi
Remove unused-variable in some generated code
cla signed
fb-exported
#3327
opened Nov 5, 2024 by
r-barnes
Add support for int32_t indices in TBE training (2/N)
cla signed
fb-exported
#3326
opened Nov 5, 2024 by
q10
Add template info into generated files
cla signed
fb-exported
#3325
opened Nov 5, 2024 by
q10
Kernel support for multiple buckets per rank
cla signed
fb-exported
#3323
opened Nov 5, 2024 by
dstaay-fb
Update benchmark test for Int32_t Indices
cla signed
fb-exported
#3317
opened Nov 3, 2024 by
q10
Unified Prefetching API for CPU TBE
cla signed
fb-exported
#3314
opened Nov 2, 2024 by
excelle08
Allow registering SSD prefetcher after initialization of the TBE module
cla signed
fb-exported
#3313
opened Nov 2, 2024 by
excelle08
Store the prefetching results in a queue in SSD prefetcher, retrieve later in .forward() method
cla signed
fb-exported
#3312
opened Nov 2, 2024 by
excelle08
Add manual loop unroll for rocm devices in fwd pass
cla signed
module: rocm
#3309
opened Nov 1, 2024 by
avbokovoy
Reorganize sparse block bucketize macros
cla signed
fb-exported
#3296
opened Oct 30, 2024 by
q10
Refactor repeated code in sparse_block_bucketize
cla signed
fb-exported
#3295
opened Oct 30, 2024 by
q10
Add large my_size support in _block_bucketize_pooled_sparse_features_cuda_kernel2
cla signed
fb-exported
#3294
opened Oct 30, 2024 by
sryap
Use c10::irange in deeplearning/fbgemm/BUCK +10
cla signed
fb-exported
#3288
opened Oct 29, 2024 by
q10