Training code for audio_encoder & connector

Hi VITA team,

Thanks for open sourcing this - I've learnt a bunch from it.

<img width="644" alt="Image" src="https://github.com/user-attachments/assets/439f83cd-d5f9-48a7-8871-5128a36924e6" />

Do you have the training code for the audio encoder and connector? It doesn't have to be neat; a code dump of whatever you have would be fine. I'm trying to reproduce the training but having trouble. For example: did you align audio with Qwen by freezing Qwen and training only the encoder or connector? Or did you also fine-tune part of the Qwen model to align with the encoder or connector?

It seems like all the scripts freeze the audio_encoder, so I'm assuming the code that trains it isn't in the repo.
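For concreteness, the first setup I'm asking about (Qwen frozen, only the encoder and connector trained) would look roughly like the toy sketch below. Every module name, shape, and hyperparameter here is a placeholder I made up for illustration; none of it is taken from the VITA code, and the tiny `nn.Linear` layers just stand in for the real models.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins (NOT VITA's real modules or sizes).
audio_encoder = nn.Linear(80, 32)  # pretend: audio frames -> audio features
connector = nn.Linear(32, 64)      # pretend: audio features -> LLM embedding space
llm = nn.Linear(64, 100)           # pretend: the Qwen LLM producing token logits


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Enable or disable gradient tracking for every parameter of a module."""
    for param in module.parameters():
        param.requires_grad = trainable


# Option 1: freeze the LLM, train only the encoder and connector.
set_trainable(llm, False)
set_trainable(audio_encoder, True)
set_trainable(connector, True)

# Hand the optimizer only the parameters that should be updated.
trainable_params = [
    p for m in (audio_encoder, connector) for p in m.parameters()
]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)

# One dummy training step on random data.
llm_weight_before = llm.weight.detach().clone()
features = torch.randn(4, 80)
targets = torch.randint(0, 100, (4,))

logits = llm(connector(audio_encoder(features)))
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()   # gradients pass through the frozen LLM into the connector
optimizer.step()
```

The second option would be the same thing with `set_trainable(llm, True)` on some or all of the LLM's layers and those parameters added to the optimizer. Knowing which of the two you used (and at which stage) would help a lot.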

<img width="423" alt="Image" src="https://github.com/user-attachments/assets/22a59d59-07d2-4ce0-a620-cf084dce582f" />


Training code for audio_encoder & connector #121
