
about training 240 frames~ #97

Open
WuZhongQing opened this issue Oct 23, 2024 · 12 comments

@WuZhongQing commented Oct 23, 2024

Hi, thanks again for open-sourcing this. I noticed that there is no difference between the 16-frame YAML and the 61-frame YAML except `sc_attn_index`, so I'm wondering whether I can train 240 frames by just changing the model's `sc_attn_index`? Looking forward to your reply ~

@WuZhongQing changed the title from "about traing 220 frames~" to "about training 220 frames~" Oct 23, 2024
@WuZhongQing changed the title from "about training 220 frames~" to "about training 240 frames~" Oct 23, 2024
@flymin (Member) commented Oct 23, 2024

It is possible. Actually, we are limited by GPU memory (80 GB A800), so we only train up to 60 frames.
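For context, a `sc_attn_index`-style config typically lists, per frame, which other frames that frame's cross-frame attention reads from. A minimal sketch (my own assumption of such a scheme — anchor frame plus temporal neighbors, not necessarily this repo's exact rule) shows why only this list needs to change with clip length:

```python
# Hypothetical sketch (not the repo's actual logic): build a per-frame
# cross-frame attention index, where each frame attends to the first
# (anchor) frame and its immediate temporal neighbors. Extending the
# clip length then only means regenerating this list for more frames.
def build_sc_attn_index(num_frames):
    index = []
    for t in range(num_frames):
        neighbors = {0, max(t - 1, 0), t, min(t + 1, num_frames - 1)}
        index.append(sorted(neighbors))
    return index

# A 16-frame and a 240-frame config would differ only in this list's length.
print(len(build_sc_attn_index(16)))   # 16
print(len(build_sc_attn_index(240)))  # 240
print(build_sc_attn_index(16)[5])     # [0, 4, 5, 6]
```

Under this assumption, the remaining limits are GPU memory and whatever the model's temporal modules were trained to handle, not the config format itself.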

@WuZhongQing (Author)

> It is possible. Actually, we are limited by GPU memory (80 GB A800), so we only train up to 60 frames.

Thanks for your reply. Have you ever thought about reducing the GPU memory requirement?

@WuZhongQing (Author)

Another question: did you ever think about using SVD to generate video? Is the quality of SVD-generated video not good enough?

@flymin (Member) commented Oct 23, 2024

> Thanks for your reply. Have you ever thought about reducing the GPU memory requirement?

We've done a lot to save GPU memory. You may check the details of our implementation.

> Another question: did you ever think about using SVD to generate video? Is the quality of SVD-generated video not good enough?

Currently, you can refer to Vista, which is based on SVD but lacks fine-grained controllability. We will discuss this problem in our new work; the paper will come out soon. Stay tuned.

@WuZhongQing (Author)

> Thanks for your reply. Have you ever thought about reducing the GPU memory requirement?
>
> We've done a lot to save GPU memory. You may check the details of our implementation.
>
> Another question: did you ever think about using SVD to generate video? Is the quality of SVD-generated video not good enough?
>
> Currently, you can refer to Vista, which is based on SVD but lacks fine-grained controllability. We will discuss this problem in our new work; the paper will come out soon. Stay tuned.

Thanks a lot.

@WuZhongQing (Author)

> It is possible. Actually, we are limited by GPU memory (80 GB A800), so we only train up to 60 frames.

And can I ask what the difference is between video generation and image generation? Is it just increasing the batch size?

@flymin (Member) commented Oct 24, 2024

> And can I ask what the difference is between video generation and image generation? Is it just increasing the batch size?

They are fundamentally different. Images are 2D, but videos are 3D (with a temporal dimension). From a resource perspective, one simple example: many high-resolution image generation models only support training with batch size 1. The training/inference cost of video can easily explode, and the model needs to gain more capability.
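The scaling point above can be made concrete with back-of-the-envelope arithmetic. This sketch uses assumed latent shapes (the batch/channel/resolution numbers here are illustrative, not this repo's actual values); the only real claim is that the temporal axis multiplies memory roughly linearly in the frame count:

```python
# Back-of-the-envelope sketch with assumed shapes: a video latent adds a
# temporal axis (B, C, T, H, W) on top of an image latent (B, C, H, W),
# so memory scales roughly linearly with the number of frames.
def latent_numel(batch, channels, height, width, frames=1):
    return batch * channels * frames * height * width

image     = latent_numel(batch=1, channels=4, height=56, width=100)
video_60  = latent_numel(batch=1, channels=4, height=56, width=100, frames=60)
video_240 = latent_numel(batch=1, channels=4, height=56, width=100, frames=240)

print(video_60 // image)   # 60: a 60-frame clip costs ~60x one image
print(video_240 // image)  # 240
```

And this only counts latents; temporal attention layers add further cost that can grow faster than linearly in the frame count, which is why increasing batch size is not a meaningful comparison.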

@WuZhongQing (Author) commented Oct 24, 2024

> And can I ask what the difference is between video generation and image generation? Is it just increasing the batch size?
>
> They are fundamentally different. Images are 2D, but videos are 3D (with a temporal dimension). From a resource perspective, one simple example: many high-resolution image generation models only support training with batch size 1. The training/inference cost of video can easily explode, and the model needs to gain more capability.

Thanks for your answer. I noticed that you don't use any images to generate latents (you only use BEV) when generating video. My question is: what about using 8 images + 8 random latents to generate a 16-frame video? Could this help improve temporal continuity for generating long videos?

@flymin (Member) commented Oct 24, 2024

> Thanks for your answer. I noticed that you don't use any images to generate latents (you only use BEV) when generating video. My question is: what about using 8 images + 8 random latents to generate a 16-frame video? Could this help improve temporal continuity for generating long videos?

I think you are asking about future frame prediction. This can be thought of as a downstream task of the video generation model. There are some inference tricks to do so, similar to image inpainting. In any case, it relies on the capability of the video generation model.
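One common inpainting-style trick (my own toy illustration, not this repo's code — it resembles RePaint-style conditioning for diffusion models) is to overwrite the latents of already-known frames at every denoising step, so the sampler only has to "fill in" the unknown future frames:

```python
# Toy sketch of inpainting-style future-frame prediction. The model and
# noising process are stand-ins; real code would use the actual diffusion
# model and scheduler. One float per frame stands in for a frame latent.
import random

def denoise_step(latents, step):
    # Stand-in for one reverse-diffusion update by a real model.
    return [0.5 * x for x in latents]

def renoise(value, step):
    # Stand-in for the forward (noising) process at a given step.
    # Kept deterministic here so the example's output is predictable.
    return value

def sample_with_known_frames(num_frames, known, steps=4):
    # known: dict {frame_index: known_frame_latent}
    latents = [random.gauss(0.0, 1.0) for _ in range(num_frames)]
    for step in reversed(range(steps)):
        latents = denoise_step(latents, step)
        # The trick: re-inject the known frames at every step, so the
        # model's updates to the unknown frames stay consistent with them.
        for idx, value in known.items():
            latents[idx] = renoise(value, step)
    return latents

out = sample_with_known_frames(16, {i: float(i) for i in range(8)})
print(out[:8])  # first 8 frames are pinned to the known values
```

As the reply notes, this is purely an inference-time trick: how coherent the generated future frames are still depends entirely on the underlying video model.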

@WuZhongQing (Author)

> Thanks for your answer. I noticed that you don't use any images to generate latents (you only use BEV) when generating video. My question is: what about using 8 images + 8 random latents to generate a 16-frame video? Could this help improve temporal continuity for generating long videos?
>
> I think you are asking about future frame prediction. This can be thought of as a downstream task of the video generation model. There are some inference tricks to do so, similar to image inpainting. In any case, it relies on the capability of the video generation model.

Thank you very much.

@WuZhongQing (Author)

> Thanks for your answer. I noticed that you don't use any images to generate latents (you only use BEV) when generating video. My question is: what about using 8 images + 8 random latents to generate a 16-frame video? Could this help improve temporal continuity for generating long videos?
>
> I think you are asking about future frame prediction. This can be thought of as a downstream task of the video generation model. There are some inference tricks to do so, similar to image inpainting. In any case, it relies on the capability of the video generation model.
>
> Thank you very much.

Could you please offer me some help with those inference tricks, or share some links? I want to try ~


This issue is stale because it has been open for 7 days with no activity. If you do not have any follow-ups, the issue will be closed soon.

@github-actions bot added the stale label Oct 31, 2024