Experiments with video tokenization. #37

Open
NilanEkanayake opened this issue Oct 15, 2024 · 2 comments

Comments

@NilanEkanayake

NilanEkanayake commented Oct 15, 2024

I made some changes to the model (3D convs) and trained the small one with 128 tokens on 128p, 16-frame videos that were pre-compressed with CogVideoX's VAE, using an MSE loss.
It turned out better than I expected, considering how fast training was on consumer hardware (a couple of hours).
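For a rough sense of how 3D convs affect sequence length, here is a minimal sketch that applies the standard convolution output-size formula along time, height, and width. The latent grid `(4, 16, 16)` and the non-overlapping `(1, 2, 2)` patch size are hypothetical values chosen for illustration, not the settings used in this run; the real shape depends on the VAE's temporal/spatial compression and the model's patch size. The resulting patch embeddings would then be compressed into the model's fixed token budget (128 tokens here).

```python
from typing import Sequence, Tuple


def conv_out_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length along one axis for a standard (non-dilated) convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_out_shape(
    in_thw: Sequence[int],
    kernel_thw: Sequence[int],
    stride_thw: Sequence[int],
    padding_thw: Sequence[int] = (0, 0, 0),
) -> Tuple[int, ...]:
    """Apply the 1D formula independently along (time, height, width)."""
    return tuple(
        conv_out_size(s, k, st, p)
        for s, k, st, p in zip(in_thw, kernel_thw, stride_thw, padding_thw)
    )


# Hypothetical latent grid (T, H, W) after VAE compression: an assumption.
latent_thw = (4, 16, 16)
# Hypothetical non-overlapping 3D patches (kernel == stride): also an assumption.
patch = (1, 2, 2)

grid = conv3d_out_shape(latent_thw, patch, patch)
num_patches = grid[0] * grid[1] * grid[2]
print(grid, num_patches)  # (4, 8, 8) 256
```

Making the temporal kernel larger than 1 would merge neighbouring latent frames into each patch, shrinking the patch sequence further before the tokenizer sees it.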

There's a lot of potential here, and I think I can improve the performance much further.

@tanzheen

@NilanEkanayake I would be extremely interested in this because I am currently trying to tokenise sign language videos to input into an LLM here for translation tasks!

@NilanEkanayake
Author

NilanEkanayake commented Oct 15, 2024

> @NilanEkanayake I would be extremely interested in this because I am currently trying to tokenise sign language videos to input into an LLM here for translation tasks!

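It compresses fixed-length videos, so I'm not sure how well it would work for that. You'd have to split longer inputs into fixed-length clips, tokenize each one, and string the resulting token sequences together, so the number of tokens would grow with the length of the input.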
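The clip-chaining idea can be sketched as follows, assuming 16-frame clips to match the training setup. `split_into_clips`, `tokenize_video`, and `fake_tokenize_clip` are hypothetical names; the stub tokenizer simply returns 128 ids per clip in place of the real model, and padding the last clip by repeating its final frame is one arbitrary choice among several.

```python
from typing import Any, Callable, List, Sequence


def split_into_clips(frames: Sequence[Any], clip_len: int = 16) -> List[List[Any]]:
    """Split a variable-length video into fixed-length clips.

    The last clip is padded by repeating its final frame (an assumption;
    zero frames or dropping the remainder are alternatives).
    """
    if not frames:
        return []
    clips = []
    for start in range(0, len(frames), clip_len):
        clip = list(frames[start:start + clip_len])
        clip += [clip[-1]] * (clip_len - len(clip))
        clips.append(clip)
    return clips


def tokenize_video(
    frames: Sequence[Any],
    tokenize_clip: Callable[[List[Any]], List[int]],
    clip_len: int = 16,
) -> List[int]:
    """Tokenize each clip independently and concatenate the token ids."""
    tokens: List[int] = []
    for clip in split_into_clips(frames, clip_len):
        tokens.extend(tokenize_clip(clip))
    return tokens


# Hypothetical stand-in for the real tokenizer: 128 ids per clip.
def fake_tokenize_clip(clip: List[Any]) -> List[int]:
    return [len(clip)] * 128


video = list(range(40))  # 40 dummy "frames"
clips = split_into_clips(video)
print(len(clips), clips[-1][-3:])  # 3 [39, 39, 39]
print(len(tokenize_video(video, fake_tokenize_clip)))  # 384
```

Because each clip is encoded independently, the LLM gets no explicit signal about clip boundaries; inserting a separator token between clips is one possible way to provide it.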

You might have better luck training a custom model from scratch, where the model takes in the videos and produces a translation, instead of using an LLM with a video tokenizer on top.

Have you tried feeding pose-estimation output to the LLM instead? That would sidestep the tokenizer's quality limits and be a lot more flexible.
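One way to hand pose data to an LLM is to serialize keypoints as plain text. This minimal sketch assumes a pose estimator has already produced 2D `(x, y)` keypoints per frame, normalized to `[0, 1]`; the 100-bin quantization and the `f0: x,y x,y ...` line format are hypothetical choices for illustration, and keypoint extraction itself is out of scope.

```python
from typing import List, Sequence, Tuple

# (x, y) keypoint, assumed normalized to [0, 1] by the pose estimator.
Keypoint = Tuple[float, float]


def quantize(value: float, num_bins: int = 100) -> int:
    """Map a coordinate in [0, 1] to an integer bin, clamping out-of-range values."""
    value = min(max(value, 0.0), 1.0)
    return min(int(value * num_bins), num_bins - 1)


def poses_to_text(frames: Sequence[Sequence[Keypoint]], num_bins: int = 100) -> str:
    """Serialize per-frame keypoints into one text line per frame."""
    lines: List[str] = []
    for i, keypoints in enumerate(frames):
        coords = " ".join(
            f"{quantize(x, num_bins)},{quantize(y, num_bins)}" for x, y in keypoints
        )
        lines.append(f"f{i}: {coords}")
    return "\n".join(lines)


# Two frames with two hypothetical keypoints each.
example = [
    [(0.5, 0.25), (0.375, 0.75)],
    [(0.5, 0.25), (0.625, 0.5)],
]
print(poses_to_text(example))
# f0: 50,25 37,75
# f1: 50,25 62,50
```

Unlike fixed-length video clips, this representation scales naturally with video length, and coarser bins or fewer keypoints trade precision for shorter prompts.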


2 participants