Ryan Spring rdspring1

I contribute to PyTorch, Lightning-AI Thunder, and Nvidia/Fuser.

90 followers · 44 following

Pinned

NVIDIA/Fuser Public

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

C++ · 314 stars · 55 forks

RUSH-LAB/LSH_Memory Public

One-Shot Learning using Nearest-Neighbor Search (NNS) and Locality-Sensitive Hashing (LSH)

Python · 73 stars · 16 forks

PyTorch_GBW_LM Public

PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

Python · 123 stars · 20 forks

Count-Sketch-Optimizers Public

A compressed adaptive optimizer for training large-scale deep learning models using PyTorch

Python · 27 stars · 13 forks

LSH-Mutual-Information Public

Use LSH Sampling for Mutual Information Estimation

Python 5

lightning-thunder Public

Forked from Lightning-AI/lightning-thunder

Source to source compiler for PyTorch. It makes PyTorch programs faster on single accelerators and distributed.

Python

536 contributions in the last year

Activity overview

Contributed to NVIDIA/Fuser, Lightning-AI/lightning-thunder, rdspring1/lightning-thunder and 1 other repository

Contribution activity

March 2025

Created 6 commits in 1 repository

NVIDIA/Fuser 6 commits

Created a pull request in NVIDIA/Fuser that received 9 comments

Mar 18

Enforce shared memory alignment for TMA LoadStoreOps

This PR enforces the bytes alignment requirements for TMA LoadStoreOps, which prevents IMA and incorrect results. If TMA LoadStoreOp is not detecte…

+51 −6 lines changed • 9 comments

Opened 4 other pull requests in 1 repository

NVIDIA/Fuser 3 open 1 merged

Add silu and bias epilogue matmul tests
This contribution was made on Mar 18
[RFC] Create a basic binding for CPP Fusion in python frontend using AI Coding Tools
This contribution was made on Mar 14
Load Epilogue Inputs with LdMatrix in Hopper Matmul Scheduler
This contribution was made on Mar 13
Enable hard-coded index for LdMatrix and create basic copy tutorial
This contribution was made on Mar 7

Reviewed 21 pull requests in 1 repository

NVIDIA/Fuser 21 pull requests

Deprecate ParallelType::MisalignedVectorize
This contribution was made on Mar 24
Introduce use_stmatrix parameter to MatmulParams
This contribution was made on Mar 19
Change c10::irange to iota, part 1
This contribution was made on Mar 19
Load Epilogue Inputs with LdMatrix in Hopper Matmul Scheduler
This contribution was made on Mar 19
Mark supports_segmentation=False in test_issue1273
This contribution was made on Mar 17
Add Blackwell MMA macros
This contribution was made on Mar 17
Check that warps are only accessing the subpartition of TMem that it can access
This contribution was made on Mar 14
indexAccumulate python api
This contribution was made on Mar 14
TMem check the stride of outer dims
This contribution was made on Mar 14
add register count checks for warp specialization with register sharing
This contribution was made on Mar 14
Fix C++23 backport of zip and enumerate
This contribution was made on Mar 13
Make Hopper mma tests sparse
This contribution was made on Mar 12
register sharing, add launch bound, disable tests with illegal paras
This contribution was made on Mar 12
Indexing for TMem ld and st
This contribution was made on Mar 11
Tensor memory 32x32b data path pattern matching
This contribution was made on Mar 11
Enable hard-coded index for LdMatrix and create basic copy tutorial
This contribution was made on Mar 11
redo register sharing PR-3972
This contribution was made on Mar 10
Translate MatmulOp and LinearOp on Hopper without AxisMapping
This contribution was made on Mar 10
Automatically save MatmulParams in extra_info in benchmarks
This contribution was made on Mar 7
Update Hopper default matmul heuristic
This contribution was made on Mar 6
format toString for SetMaxNReg and Return
This contribution was made on Mar 3

Opened 1 issue in 1 repository

NVIDIA/Fuser 1 open

Refactor IndexLowering::handle(const LoadStoreOp* ldst)
This contribution was made on Mar 11

