Time IDs in SDXL training #9459

christopher5106 · 2024-09-18T09:12:21Z

christopher5106
Sep 18, 2024

I was trying to understand the time IDs in the SDXL training script. Each example gets an array such as [1024, 1034, 0, 10, 1024, 1024], which is fed directly into the Timesteps forward pass. I'm wondering what happens there: how do the integers original_height, original_width, crop_coord_top, crop_coord_left, target_resolution, target_resolution each get a kind of exponential embedding, and what is the expected output? There is something I'm missing. Thanks for your help!

asomoza · 2024-09-18T16:40:16Z

asomoza
Sep 18, 2024
Maintainer

cc: @sayakpaul


christopher5106 · 2024-09-18T20:48:15Z

christopher5106
Sep 18, 2024
Author

This is important to me because I'm training a classifier on these latents for classifier guidance, but I can only train it on crops (I don't have the full data). I need to be sure that when I apply the classifier to the whole image, the time IDs will be the same as for a crop at the same place in the image.
From reading the code, I can't infer exactly what they do.
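
For reference, a minimal sketch of how the six values are assembled, assuming the ordering listed in the question (original size, crop top-left, target size). `compute_time_ids` here is an illustrative helper written for this thread, not a quote from the training script.

```python
def compute_time_ids(original_size, crop_top_left, target_size):
    """Concatenate three (height/top, width/left) pairs into one flat list of six integers."""
    return list(original_size) + list(crop_top_left) + list(target_size)

# A crop taken at (top=0, left=10) from a 1024x1034 image, with a 1024x1024 target.
crop_ids = compute_time_ids((1024, 1034), (0, 10), (1024, 1024))

# The same image without cropping would use crop coordinates (0, 0).
full_ids = compute_time_ids((1024, 1034), (0, 0), (1024, 1024))

print(crop_ids)  # [1024, 1034, 0, 10, 1024, 1024]
print(full_ids)  # [1024, 1034, 0, 0, 1024, 1024]
```

Under this assumption the two lists differ only in the crop entries, so the conditioning for a crop matches the conditioning for the whole image only if the same crop coordinates are passed in both cases.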


sayakpaul · 2024-09-19T02:24:41Z

sayakpaul
Sep 19, 2024
Maintainer

These are micro-conditions introduced in the SDXL paper. You can read more about them in Section 2.2 of the paper: https://arxiv.org/abs/2307.01952.
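
As a rough illustration of the "exponential embedding" asked about in the original question, here is a dependency-free sketch of a sinusoidal (Fourier-feature) embedding applied to each time ID independently. It assumes 256 dimensions per value, a maximum period of 10000, and a cos-then-sin layout; these settings and the function names are assumptions for illustration and should be checked against the UNet config you actually use.

```python
import math

def sinusoidal_embedding(value, dim=256, max_period=10000.0):
    """Embed one scalar as [cos(v * f_i)..., sin(v * f_i)...] over geometrically spaced frequencies."""
    half = dim // 2
    # Frequencies decay exponentially from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    angles = [value * f for f in freqs]
    return [math.cos(a) for a in angles] + [math.sin(a) for a in angles]

def embed_time_ids(time_ids, dim=256):
    """Embed each integer independently and concatenate the results into one flat vector."""
    out = []
    for v in time_ids:
        out.extend(sinusoidal_embedding(float(v), dim))
    return out

# (original_height, original_width, crop_top, crop_left, target_height, target_width)
time_ids = [1024, 1034, 0, 10, 1024, 1024]
vec = embed_time_ids(time_ids)
print(len(vec))  # 6 * 256 = 1536
```

Each integer is embedded on its own, so equal values (here the three 1024 entries) produce identical 256-dimensional slices regardless of position. As far as I can tell from the diffusers UNet code, the flattened vector is then concatenated with the pooled text embedding, passed through a small MLP, and added to the timestep embedding; treat that description as a reading of the code rather than a specification.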

