set the co-occurrence matrix #25

Open
rouge012 opened this issue Nov 28, 2022 · 15 comments


@rouge012

    hi @rouge012,

The co-occurrence matrix $A \in R^{N_v \times N_o}$ is a two-dimensional matrix, where $N_v$ is the number of verb categories and $N_o$ is the number of object categories. We can initialize $A$ as a zero matrix. Each object has annotated verbs, and we set the corresponding positions of $A$ to 1. For example, if "apple" is combinable with "eat" and "cut" in the dataset, we set the positions of ("eat", "apple") and ("cut", "apple") in $A$ to 1.
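The construction described here can be sketched in a few lines of Python. This is only an illustration, not the repository's code: the function name `build_cooccurrence`, the toy vocabularies, and the `annotations` dictionary are all hypothetical, and only the "apple" / "eat" / "cut" pairing comes from the comment itself.

```python
def build_cooccurrence(verbs, objects, annotations):
    """Return an N_v x N_o matrix of 0/1 entries as a list of rows.

    `annotations` maps each object name to the verbs annotated for it.
    All names here are hypothetical, for illustration only.
    """
    verb_index = {verb: i for i, verb in enumerate(verbs)}
    object_index = {obj: j for j, obj in enumerate(objects)}

    # Start from a zero matrix with one row per verb, one column per object.
    A = [[0] * len(objects) for _ in verbs]

    # Mark every annotated (verb, object) pair with 1.
    for obj, obj_verbs in annotations.items():
        for verb in obj_verbs:
            A[verb_index[verb]][object_index[obj]] = 1
    return A


# Toy example (assumed vocabulary, not the HICO-DET label set).
verbs = ["eat", "cut", "ride"]
objects = ["apple", "bicycle"]
annotations = {"apple": ["eat", "cut"], "bicycle": ["ride"]}
A = build_cooccurrence(verbs, objects, annotations)
```

With this toy input, the column for "apple" has a 1 in the "eat" and "cut" rows and a 0 in the "ride" row.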

Feel free to post if you have further questions

Regards,

Originally posted by @zhihou7 in #4 (comment)

@rouge012 rouge012 changed the title from "hi @rouge012," to "set the co-occurrence matrix" on Nov 28, 2022
@zhihou7
Owner

zhihou7 commented Nov 28, 2022 via email

@rouge012
Author

rouge012 commented Dec 4, 2022

Hi! When I run tools/Train_ATL_HICO.py, I get the error below:
TypeError: head_to_tail_ho() takes 7 positional arguments but 9 were given

Please help.

@zhihou7
Owner

zhihou7 commented Dec 5, 2022

Hi @rouge012,

Thanks for your comments. It seems the released code base differs a bit from my local code in some functions. I have updated it and uploaded the new code. Feel free to ask if you have further questions.

Regards,
Zhi Hou

@rouge012
Author

rouge012 commented Dec 6, 2022

Thank you for the quick response! I get a new error when I run tools/Train_ATL_HICO.py:
ValueError: The passed save_path is not a valid checkpoint: ./Weights/res101_faster_rcnn_iter_1190000

Please help. Thank You in advance.

@zhihou7
Owner

zhihou7 commented Dec 6, 2022

Hi, you should download the pre-trained weights as instructed in

gdown 1IbR4kiWgLF8seaKjOMmwaHs0Bfwl5Dq1 -O Weights/res50_faster_rcnn_iter_1190000.ckpt.data-00000-of-00001
into the directory "./Weights"

gdown 0B1_fAEgxdnvJR1N3c1FYRGo1S1U -O Weights/coco_900-1190k.tgz

and untar it.
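Before rerunning training, it can help to check which files actually exist for the checkpoint prefix named in the error. The sketch below rests on an assumption about TensorFlow 1.x behaviour: that a restorable checkpoint prefix corresponds either to a `<prefix>.index` file plus one or more `<prefix>.data-*` shards, or to a single file with exactly the prefix name. The helper names `describe_checkpoint` and `looks_restorable` are hypothetical and not part of the repository or of TensorFlow.

```python
import glob
import os


def describe_checkpoint(prefix):
    """Report which files exist on disk for a checkpoint prefix."""
    return {
        "index": os.path.exists(prefix + ".index"),
        "data_shards": sorted(glob.glob(glob.escape(prefix) + ".data-*")),
        "single_file": os.path.isfile(prefix),
    }


def looks_restorable(prefix):
    """Heuristic only: True if the expected files for `prefix` are present."""
    info = describe_checkpoint(prefix)
    sharded = info["index"] and len(info["data_shards"]) > 0
    return bool(info["single_file"] or sharded)


if __name__ == "__main__":
    # Prefix copied from the error message in this thread.
    prefix = "./Weights/res101_faster_rcnn_iter_1190000"
    print(describe_checkpoint(prefix))
    print("looks restorable:", looks_restorable(prefix))
```

If the check fails, comparing the printed file list with the names produced by the download commands shows whether the files are missing or simply named differently from the prefix the script expects.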

Regards,

@rouge012
Author

rouge012 commented Dec 7, 2022

Why is the total loss NaN when I run Train_ATL_HICO.py? I have downloaded the hico_20160224_det dataset. I use Python 3.7, TensorFlow 1.14.0, and CUDA 11.1.
[attached image]

Please help. Thank You!

@zhihou7
Owner

zhihou7 commented Dec 7, 2022

Hi, is the loss NaN from the beginning, or only after thousands of iterations? As I remember, it is not NaN at the beginning. Empirically, it is normal for NaN to appear occasionally.
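To answer the question of when NaN first appears, and whether it is occasional or persistent, the logged loss values can be scanned with a small helper. This is a minimal sketch that assumes the per-iteration total loss has already been collected into a Python list; the function names and the sample values are hypothetical.

```python
import math


def first_nan_step(losses):
    """Return the index of the first NaN loss, or None if there is none."""
    for step, loss in enumerate(losses):
        if math.isnan(loss):
            return step
    return None


def nan_fraction(losses):
    """Return the fraction of logged losses that are NaN."""
    if not losses:
        return 0.0
    return sum(1 for loss in losses if math.isnan(loss)) / len(losses)


# Made-up values standing in for a real training log.
logged = [0.91, 0.74, float("nan"), 0.69]
step = first_nan_step(logged)
fraction = nan_fraction(logged)
```

A small fraction with finite values afterwards matches the "occasional" case; a loss that stays NaN from some step onward points to a different problem.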

[attached image]

@rouge012
Author

rouge012 commented Dec 7, 2022

Thank you for the quick response! I found that it becomes NaN after about two hundred iterations. I didn't download the V-COCO dataset; does that have anything to do with this?

@zhihou7
Owner

zhihou7 commented Dec 8, 2022

That's confusing. I use an environment similar to yours:

CUDA 10.0.130, Python 3.7.2, TensorFlow 1.14.1, V100 16 GB

According to your log, many errors seem to occur during optimization.

Regards,

@Harzva

Harzva commented Mar 24, 2023

Hi, I'm very interested in co-occurrence matrices. Can you elaborate on how you obtain them, and in particular how infeasible interactions or combinations are culled? Is the culling strategy learned during training, end to end with the model? Or is the co-occurrence matrix computed in advance and fed to the network, and if so, how is it obtained? Many thanks.

@zhihou7
Owner

zhihou7 commented Mar 24, 2023

Hi @Harzva,
Thanks for your interest. We do not pre-define a co-occurrence matrix; we learn it from the data. In each iteration during optimization, we get the predictions for all the composite HOI features. We then use these predictions to update the concept confidence matrix according to the verb and object categories of the composite HOIs. Specifically, we update the concept matrix as a running mean: we keep a matrix that counts how often each pair has occurred, and in each iteration we average the new concept confidence with the previous values.
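The running-mean update described here can be sketched as follows. This is one reading of the description, not the repository's implementation: the class name `ConceptMatrix`, its `update` signature, and the choice to update one (verb, object) pair at a time are assumptions, and the sample scores are made up.

```python
class ConceptMatrix:
    """Running mean of prediction confidence per (verb, object) pair.

    Hypothetical sketch: `mean` holds the averaged confidence and
    `count` holds how many predictions each pair has received.
    """

    def __init__(self, num_verbs, num_objects):
        self.mean = [[0.0] * num_objects for _ in range(num_verbs)]
        self.count = [[0] * num_objects for _ in range(num_verbs)]

    def update(self, verb_ids, object_ids, scores):
        """Fold one batch of composite-HOI predictions into the running mean."""
        for v, o, s in zip(verb_ids, object_ids, scores):
            self.count[v][o] += 1
            # Incremental mean: new_mean = old_mean + (x - old_mean) / n
            self.mean[v][o] += (s - self.mean[v][o]) / self.count[v][o]


# Toy usage with 2 verbs and 3 objects.
matrix = ConceptMatrix(num_verbs=2, num_objects=3)
matrix.update(verb_ids=[0, 0, 1], object_ids=[1, 1, 2], scores=[0.5, 1.0, 0.2])
```

Keeping the count matrix alongside the mean means each update needs only the previous average and the pair's count, so no history of past predictions has to be stored.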

For self-compositional learning, we use the confidence matrix to build pseudo labels for the composite HOI features, to avoid bias toward known concepts. If we treat it as a positive-unlabeled learning approach, self-compositional learning makes use of the unlabeled composite HOI features.

Feel free to contact me if you have further questions.

@Harzva

Harzva commented Mar 24, 2023

Thank you for answering the questions above. I still have a few more, sorry for the inconvenience.
The paper describes the fabricator as an MLP, but I see a GAN used in the code. Is the fabricator a GAN, or can an MLP also give good results? Also, how does the concept matrix come about? L_CL, L_hoi, and L_hoi_sp are binary cross-entropy losses. Are all three binary classifications, that is, just a judgement of true or false? I think it's a good idea to add the fake features to the minibatch and keep a balanced ratio of fake to real features, but are these fake features prepared in advance, or are they generated during end-to-end training? As I understand it, minibatch data is prepared in advance.

From the paper: "Then, we fix the pre-trained model and train the randomly initialized object fabricator via the loss function for the fabricator branch LCL." Why not just train jointly here? Does the multi-stage strategy give a significant improvement?

Learning the co-occurrence matrix should consume computational resources, so why not set it up a priori? A fixed feasibility matrix, where each composition is 0 or 1, seems like a good approach. For example, would it now be possible to use something like GPT-2 or GPT-3 to replace the computation of the feasibility judgement?

@zhihou7
Owner

zhihou7 commented Mar 25, 2023

Hi @Harzva,
Thanks for your questions. We actually do not use adversarial training. We use an MLP to generate the object features and combine them with verb features to optimize the network jointly. I think this mainly balances the distribution. According to our observation, the quality of the generated object features does matter.

In FCL, we directly use the label space to build the concept matrix. It is predefined but misses a lot of reasonable concepts. Therefore, in the last paper, we propose discovering the reasonable concepts.

L_{cl}, L_hoi, and L_hoi_sp are all binary cross-entropy losses because the labels are multi-hot: 117 dimensions, one per verb category.
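To illustrate why a multi-hot label calls for a binary loss per dimension rather than a single softmax, here is a minimal sketch of sigmoid binary cross-entropy over a 117-dimensional verb vector. It is not the repository's loss code: the function names, the mean reduction over dimensions, and the example verb indices are assumptions. Only the dimension 117 comes from the comment above.

```python
import math

NUM_VERBS = 117  # number of verb categories stated in the thread


def multi_hot(active_ids, size=NUM_VERBS):
    """Build a multi-hot label: 1.0 at each active verb index, 0.0 elsewhere."""
    label = [0.0] * size
    for i in active_ids:
        label[i] = 1.0
    return label


def sigmoid_bce(logits, labels):
    """Mean binary cross-entropy over dimensions, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x * y + log(1 + exp(-|x|)),
    which equals -[y * log(p) + (1 - y) * log(1 - p)] with p = sigmoid(x).
    """
    total = 0.0
    for x, y in zip(logits, labels):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One interaction can carry several verbs at once (indices are made up).
labels = multi_hot([3, 40])
logits = [0.0] * NUM_VERBS  # an uninformed prediction: p = 0.5 everywhere
loss = sigmoid_bce(logits, labels)
```

Each of the 117 dimensions is an independent yes/no decision, so several verbs can be active for the same human-object pair.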

For the optimization step, the multi-step strategy is just for the long-tailed HOI detection method. As I mentioned in the paper, it is difficult to train the network to achieve a better result (it does mean one-step does not work). From my current point of view, it is quite tricky. Frankly speaking, I think it is because I was too naive at that time. For zero-shot HOI detection, we observe that one-step is better.

You are right. Frankly speaking, I have struggled with this question a lot recently. If we just want a good co-occurrence matrix, I think a large language model is a good way to complete it. GPT is amazingly strong! I even suspect that many vision problems become meaningless now that GPT has emerged. But mining knowledge from pure visual data is also valuable for developing or understanding deep neural networks. From the perspective of learning (judging the perception ability of neural networks), I think it is valuable to complete the co-occurrence matrix from visual data only, since human beings do not infer reasonable concepts from prior knowledge but reason about them through object similarity or something like that.

Thanks for your questions. Feel free to ask if you have further questions.

@Harzva

Harzva commented Apr 1, 2023

Yes, as far as I know, there is no method in Compositional Zero-Shot Learning that judges the feasibility of combinations using pure visual information; most methods borrow NLP techniques to determine feasibility. I think your work is very meaningful and opens up another technical route. It's a great inspiration to me. However, you could also try to take advantage of NLP techniques in HOI, especially the latest and most effective ones such as the GPT series. If your technical route is based on pure visual information, once you incorporate multimodal information like CLIP or NLP techniques, you can use the best GPT models available. Or don't use it; just use the best one.

@zhihou7
Owner

zhihou7 commented Apr 3, 2023

Yes. Thanks for your comment. I think it is valuable to mine knowledge from pure visual information, because the knowledge base of LLMs also largely comes from the visual world, though extracted by human beings.
