@mahmoudhas

Refactors the pipeline to use standalone functions for the different steps and applies torch.compile where possible. Observed the following improvements with torch.compile:

  • The UNet forward pass went down from 200 ms to 100 ms (per denoising step)
  • The VAE forward pass went down from 250 ms to 110 ms (per inference chunk)
  • The ImageProcessor (face and mask detection) time went down from 290 ms to 230 ms (per inference chunk)

The compilation takes around 1 minute.
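
For context, a minimal sketch of the approach, with placeholder modules standing in for the real models (the actual pipeline wires up its own UNet, VAE, and schedulers):

```python
import torch

# Placeholder stand-ins for the pipeline's UNet and VAE decode steps;
# shapes and architectures here are illustrative only.
unet = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)
vae_decode = torch.nn.ConvTranspose2d(4, 3, kernel_size=3, padding=1)

# Compile once up front; the first call pays the ~1 minute compilation
# cost, and subsequent calls run the optimized kernels.
compiled_unet = torch.compile(unet)
compiled_vae = torch.compile(vae_decode)

latents = torch.randn(1, 4, 64, 64)
for _ in range(20):                   # denoising loop (one UNet pass per step)
    latents = compiled_unet(latents)
frames = compiled_vae(latents)        # per-chunk VAE decode
```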

TODO for production readiness:

  • Re-use the Python process to serve multiple requests, saving compilation time
  • Remove the torch profiler
  • Pad the last inference chunk so it doesn't trigger a recompilation (see the sketch below)
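
On the padding item: torch.compile specializes each compiled graph on input shapes, so a shorter final chunk compiles a second graph. A minimal sketch of one way to avoid that, assuming a frame-first layout and a repeat-last-frame filler (`chunk_size` and `pad_chunk` are hypothetical names, not the pipeline's code):

```python
import torch

def pad_chunk(chunk: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """Pad the frame dimension up to chunk_size so every chunk has the
    same shape and torch.compile reuses the already-compiled graph."""
    pad = chunk_size - chunk.shape[0]
    if pad > 0:
        # Repeat the last frame as filler; the padded outputs are
        # sliced off again after inference.
        filler = chunk[-1:].expand(pad, *chunk.shape[1:])
        chunk = torch.cat([chunk, filler], dim=0)
    return chunk

last_chunk = torch.randn(5, 4, 64, 64)            # shorter final chunk
last_chunk = pad_chunk(last_chunk, chunk_size=8)  # now (8, 4, 64, 64)
```

An alternative would be marking the chunk dimension as dynamic so one graph handles all lengths, at the cost of less shape-specialized kernels.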
