
Add CNNs like ResNets, ConvNeXt, etc. #30

@yhli-ml

Thanks to the authors and contributors for this novel, inspiring work; it is very well presented.

Here is my question: is there any possibility that an SAE could be used to interpret the intermediate-layer activations of pure vision models like ResNet?

Compared to ViT activations, ResNet activations are far less discriminative, or to put it another way, far simpler. But as the authors found, “SAEs enable systematic observation of learned features, revealing fundamental differences between models”. ResNet presumably has its own characteristic traits, so what would they look like through the lens of an SAE?
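
To make the question concrete, here is a rough sketch of the kind of setup I have in mind: a plain ReLU autoencoder with an L1 sparsity penalty, trained on ResNet-50 activations captured with a forward hook. The layer choice, dictionary size, and hyperparameters are all placeholders I made up, not anything from the paper or this repo:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class SparseAutoencoder(nn.Module):
    """Vanilla ReLU SAE: overcomplete dictionary with an L1 sparsity penalty."""
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse codes
        return self.decoder(z), z

resnet = models.resnet50(weights="IMAGENET1K_V2").eval()

# Capture intermediate activations with a forward hook, e.g. from layer3.
acts = {}
def hook(module, inputs, output):
    # (B, C, H, W) -> (B*H*W, C): each spatial position becomes one sample.
    acts["layer3"] = output.permute(0, 2, 3, 1).reshape(-1, output.shape[1])

resnet.layer3.register_forward_hook(hook)

sae = SparseAutoencoder(d_in=1024, d_hidden=8192)  # layer3 of ResNet-50 has 1024 channels
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3  # placeholder sparsity strength

def train_step(images):
    with torch.no_grad():          # frozen backbone; only the SAE trains
        resnet(images)
    x = acts["layer3"]
    recon, z = sae(x)
    loss = (recon - x).pow(2).mean() + l1_coeff * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The reshape treats every spatial position as a separate sample, which seems like the closest CNN analogue of training an SAE on ViT patch tokens.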

Additionally, I noticed that SAEs are usually trained on very large-scale datasets. Why is that? If we trained an SAE on a very small-scale dataset, say CIFAR-100, what would happen?
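
To make the small-data case concrete: CIFAR-100 has only 50k training images, so the pool of activations would be tiny compared with ImageNet-scale SAE training. A sketch of wiring it into the hypothetical `train_step` above (the transforms and batch size are again placeholders):

```python
from torch.utils.data import DataLoader
import torchvision.transforms as T
from torchvision.datasets import CIFAR100

transform = T.Compose([
    T.Resize(224),  # upsample 32x32 images so the ResNet stem behaves normally
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
loader = DataLoader(
    CIFAR100(root="data", train=True, download=True, transform=transform),
    batch_size=64, shuffle=True,
)

for epoch in range(1):  # one epoch = only ~50k images' worth of activations
    for images, _ in loader:
        train_step(images)
```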

(I understand that text-modality supervision tends to be important, so this is just a question out of curiosity; general or coarse-grained answers would be fine. Hope it does not look silly :D)
