
Sampling on the GPU for as long as possible #626

Open
wants to merge 10 commits into
base: master

Conversation

EricLBuehler
Owner

@EricLBuehler EricLBuehler commented Jul 25, 2024

Currently, we apply all sampling:

  • Sequentially
  • On the CPU

This is super slow. This PR refactors the sampling system to do as much of the sampling work as possible on the GPU, in parallel, up to the point where we need to copy the final token and logprobs to the CPU. The final GPU <> CPU sync happens only at that point.
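To make the data flow concrete, here is a minimal sketch in plain Rust with no GPU involved. Everything in it is hypothetical and is not the mistral.rs or candle API: `DeviceLogits` is a stand-in for a `[batch, vocab]` tensor in device memory, greedy argmax stands in for the real sampler, and the returned counter only models how many `f32`-sized values would cross the device/host boundary in each approach.

```rust
/// Result of sampling one sequence: the chosen token and its log-probability.
#[derive(Debug, Clone, Copy)]
pub struct Sampled {
    pub token: u32,
    pub logprob: f32,
}

/// Hypothetical stand-in for a row-major [batch, vocab] logits tensor on the device.
pub struct DeviceLogits {
    data: Vec<f32>,
    vocab: usize,
}

impl DeviceLogits {
    pub fn new(data: Vec<f32>, vocab: usize) -> Self {
        assert!(vocab > 0 && data.len() % vocab == 0, "data must be [batch, vocab]");
        Self { data, vocab }
    }
}

/// Temperature scaling, log-softmax, and greedy argmax over one row.
/// Assumption: greedy selection replaces the real (top-k / top-p / random) sampler.
fn sample_row(row: &[f32], temperature: f32) -> Sampled {
    assert!(temperature > 0.0, "temperature must be positive");
    let scaled: Vec<f32> = row.iter().map(|&x| x / temperature).collect();
    // Subtract the max before exponentiating for numerical stability.
    let max = scaled.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = scaled.iter().map(|&x| (x - max).exp()).sum::<f32>().ln() + max;
    let (token, best) = scaled.iter().copied().enumerate().fold(
        (0usize, f32::NEG_INFINITY),
        |acc, (i, x)| if x > acc.1 { (i, x) } else { acc },
    );
    Sampled { token: token as u32, logprob: best - log_sum_exp }
}

/// Models the current approach: each sequence's full logits row is copied to
/// the host, then sampled there, one sequence after another.
pub fn sample_sequential_on_host(logits: &DeviceLogits, temperature: f32) -> (Vec<Sampled>, usize) {
    let mut copied = 0;
    let mut out = Vec::new();
    for row in logits.data.chunks(logits.vocab) {
        let host_row: Vec<f32> = row.to_vec(); // models a device -> host copy of `vocab` values
        copied += host_row.len();
        out.push(sample_row(&host_row, temperature));
    }
    (out, copied)
}

/// Models the proposed approach: every row is processed "on the device" and
/// only the final (token, logprob) pair per sequence is copied to the host.
pub fn sample_batched_on_device(logits: &DeviceLogits, temperature: f32) -> (Vec<Sampled>, usize) {
    let on_device: Vec<Sampled> = logits
        .data
        .chunks(logits.vocab)
        .map(|row| sample_row(row, temperature))
        .collect();
    let copied = on_device.len() * 2; // one token id + one logprob per sequence
    (on_device, copied)
}
```

Both functions return the same tokens and logprobs. The difference is in the modelled transfer: the host path copies `batch * vocab` values, while the device path copies `2 * batch`, which is why deferring the sync matters for large vocabularies.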


github-actions bot commented Jul 25, 2024

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   11          102          101            0            1
 Python                 41         1586         1368           46          172
 TOML                   19          564          498           11           55
-------------------------------------------------------------------------------
 Jupyter Notebooks       2            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               24         1832            0         1382          450
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 6          407          364           19           24
 |- TOML                 2           75           63            0           12
 (Total)                           2519          619         1401          499
-------------------------------------------------------------------------------
 Rust                  168        54909        49845          983         4081
 |- Markdown            90          850           13          787           50
 (Total)                          55759        49858         1770         4131
===============================================================================
 Total                 270        59504        52234         2422         4848
===============================================================================
  

@EricLBuehler
Owner Author

This is pending some resolution of huggingface/candle#2361. Without that, we still have to do a huge GPU <> CPU sync early.
