set the co-occurrence matrix #25
Comments
https://github.com/zhihou7/HOI-CL/blob/master/misc/hoi_to_obj.pkl
https://github.com/zhihou7/HOI-CL/blob/master/misc/hoi_to_vb.pkl
These are the two files. Note that they are derived from the dataset, so they do not include all reasonable HOI categories.
From: rouge012 ***@***.***>
Sent: Tuesday, November 29, 2022 12:26:50 AM
To: zhihou7/HOI-CL ***@***.***>
Cc: Zhi Hou ***@***.***>; Mention ***@***.***>
Subject: Re: [zhihou7/HOI-CL] hi @rouge012, (Issue #25)
Thanks for the quick and detailed clarification, and I am wondering where I can find the code for setting the co-occurrence matrix. Thank you!
Hi! When I run tools/Train_ATL_HICO.py, I get the error below. Please help.
Hi @rouge012, thanks for your comments. It seems the released code base differs a bit from my local code in some functions. I have updated it and uploaded the new code. Feel free to ask if you have further questions. Regards,
Thank you for the quick response! I get a new error when I run tools/Train_ATL_HICO.py. Please help. Thank you in advance.
Hi, you should download the pre-trained weights as instructed in HOI-CL/misc/download_dataset.sh (line 89 in 85e15d3) and untar them. Regards,
Thank you for the quick response! I found that training produces NaN after a couple of hundred iterations. I didn't download the V-COCO dataset. Does that have anything to do with this?
That's confusing. I use an environment similar to yours: cuda/10.0.130, python 3.7.2, tensorflow 1.14.1, V100 16 GB. According to your log, there seem to be many errors during the optimization. Regards,
Hi, I'm very interested in co-occurrence matrices. Could you elaborate on how they are obtained, and in particular on how infeasible interactions or combinations are culled? Is the culling strategy learned during end-to-end training of the model, or are the co-occurrence matrices computed in advance and fed to the network? If so, how are they obtained? Many thanks.
Hi @Harzva, for self-compositional learning, we use the confidence matrix to build pseudo labels for the composite HOI features, which avoids bias toward known concepts. Viewed as a positive-unlabeled learning approach, self-compositional learning makes use of the unlabeled composite HOI features. Feel free to contact me if you have further questions.
Thank you for answering the questions above. I still have a few more, sorry for the inconvenience. From the paper: "Then, we fix the pre-trained model and train the randomly initialized object fabricator via the loss function for the fabricator branch L_{CL}." Why not just train jointly here? Does the multi-stage training give a significant improvement? Computing the co-occurrence matrices should consume computational resources, so why not set them up a priori? That would be a simple way to fix the feasibility matrix, with each composition set to 0 or 1. For example, is it now possible to use something like GPT-2 or GPT-3 to replace the computation of the feasibility judgement?
Hi @Harzva,
In FCL, we directly use the label space to build the concept matrix. It is predefined, but it misses a lot of reasonable concepts. Therefore, in the latest paper, we propose discovering the reasonable concepts.
L_{cl}, L_hoi, and L_hoi_sp are three binary losses because the labels are multi-hot, with 117 dimensions for the verb categories.
For the optimization, the multi-step strategy is only for the long-tailed HOI detection method. As I mentioned in the paper, it is difficult to train the network to achieve a better result (this does not mean that one-step training does not work). From my current point of view, it is quite tricky. Frankly speaking, I think it is because I was too naive at that time. For zero-shot HOI detection, we observe that one-step training is better.
You are right. Frankly speaking, I have been struggling with this question a lot recently. If we just want a good co-occurrence matrix, I think a large language model is a good way to complete it. GPT is amazingly strong! I even suspect that a lot of vision problems became meaningless after GPT emerged. But mining knowledge from purely visual data is also valuable for developing or understanding deep neural networks. From the perspective of learning (judging the perception ability of neural networks), I think it is valuable to complete the co-occurrence matrix from visual data only, since human beings do not infer the reasonable concepts from prior knowledge, but reason about them through object similarity or something like that.
Thanks for your questions. Feel free to ask if you have further questions.
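To make the point about multi-hot labels concrete, here is a minimal sketch of a 117-dimensional multi-hot verb label scored with a per-class binary (sigmoid) cross-entropy. It is an illustration only, not the repository's code: the helper names `multi_hot` and `binary_cross_entropy`, the example verb indices, and the choice to average over classes are all assumptions.

```python
import math

NUM_VERBS = 117  # verb label dimension mentioned in the thread


def multi_hot(active_indices, size=NUM_VERBS):
    """Build a multi-hot label: 1.0 at every annotated verb index, 0.0 elsewhere."""
    label = [0.0] * size
    for i in active_indices:
        label[i] = 1.0
    return label


def binary_cross_entropy(logits, labels):
    """Sigmoid cross-entropy per class, averaged over classes (averaging is an assumption)."""
    total = 0.0
    for x, y in zip(logits, labels):
        # Numerically stable form of -y*log(sigmoid(x)) - (1-y)*log(1-sigmoid(x))
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(labels)


# Hypothetical example: one human-object pair annotated with two verbs at once.
labels = multi_hot([3, 42])
logits = [0.0] * NUM_VERBS  # an untrained predictor that outputs 0.5 for every verb
loss = binary_cross_entropy(logits, labels)
```

Because each verb gets its own independent binary term, several verbs can be active for the same pair, which a softmax over verbs would not allow.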
Yes, as far as I know, no method in Compositional Zero-Shot Learning judges the feasibility of combinations using purely visual information; most methods borrow NLP techniques to determine feasibility. I think your work is very meaningful and opens up another technical route. It's a great inspiration to me. However, you could also try to take advantage of NLP techniques in HOI, especially the latest and most effective ones such as the GPT series. Even if your technical route is based on purely visual information, once you incorporate multimodal information like CLIP or NLP techniques, you can use the best GPT models available. Or don't use them, and just use whichever approach works best.
Yes. Thanks for your comment. I think it is valuable to mine knowledge from purely visual information, because the knowledge base of LLMs also largely comes from the visual world, only extracted by human beings.
The co-occurrence matrix $A \in \mathbb{R}^{N_v \times N_o}$ is a two-dimensional matrix, where $N_v$ is the number of verb categories and $N_o$ is the number of object categories. We can initialize $A$ as a zero matrix. Each object has annotated verbs, and we set the corresponding positions of $A$ to 1. For example, if "apple" is combinable with "eat" and "cut" in the dataset, we set the positions of (eat, apple) and (cut, apple) in $A$ to 1.
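The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name `build_cooccurrence_matrix`, the toy verb and object lists, and the representation of annotations as (verb, object) name pairs are all hypothetical.

```python
def build_cooccurrence_matrix(annotations, verbs, objects):
    """Return an N_v x N_o matrix A with A[v][o] = 1 for every annotated (verb, object) pair."""
    verb_index = {name: i for i, name in enumerate(verbs)}
    object_index = {name: j for j, name in enumerate(objects)}

    # Start from a zero matrix: rows are verbs, columns are objects.
    A = [[0] * len(objects) for _ in verbs]

    # Mark every verb-object combination that appears in the annotations.
    for verb, obj in annotations:
        A[verb_index[verb]][object_index[obj]] = 1
    return A


# Toy data (hypothetical): "apple" is annotated with "eat" and "cut".
verbs = ["eat", "cut", "ride"]
objects = ["apple", "bicycle"]
annotations = [("eat", "apple"), ("cut", "apple"), ("ride", "bicycle")]

A = build_cooccurrence_matrix(annotations, verbs, objects)
```

A matrix built this way only records combinations present in the annotations, so feasible but unannotated pairs stay at 0.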
Feel free to post if you have further questions
Regards,
Originally posted by @zhihou7 in #4 (comment)