Missing modalities are a common challenge in real-world multimodal learning scenarios, occurring during both training and testing. Existing methods for managing missing modalities often require the design of separate prompts for each modality or missing case, leading to complex designs and a substantial increase in the number of parameters to be learned. As the number of modalities grows, these methods become increasingly inefficient due to parameter redundancy. To address these issues, we propose Evidence-based Parameter-Efficient Prompting (EPE-P), a novel and parameter-efficient method for pretrained multimodal networks. Our approach introduces a streamlined design that integrates prompting information across different modalities, reducing complexity and mitigating redundant parameters. Furthermore, we propose an Evidence-based Loss function to better handle the uncertainty associated with missing modalities, improving the model’s decision-making.
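The Evidence-based Loss follows the spirit of Evidential Deep Learning: class logits are mapped to non-negative evidence, which parameterizes a Dirichlet distribution whose total strength also yields an uncertainty estimate for the prediction. Below is a minimal NumPy sketch of this idea; the softplus evidence mapping, the type-II maximum-likelihood form, and the `edl_loss` name are illustrative assumptions, not the exact EPE-P loss.

```python
import numpy as np

def edl_loss(logits, y_onehot):
    """Hedged sketch of an evidential (Dirichlet-based) classification loss.

    logits:   (batch, K) raw class scores
    y_onehot: (batch, K) one-hot ground-truth labels
    Returns the mean loss and a per-sample uncertainty in (0, 1].
    """
    evidence = np.log1p(np.exp(logits))            # softplus -> non-negative evidence
    alpha = evidence + 1.0                         # Dirichlet concentration parameters
    strength = alpha.sum(axis=-1, keepdims=True)   # Dirichlet strength S = sum(alpha)
    # Type-II maximum-likelihood loss: sum_k y_k * (log S - log alpha_k)
    loss = (y_onehot * (np.log(strength) - np.log(alpha))).sum(axis=-1)
    # Uncertainty mass u = K / S: high when total evidence is low
    uncertainty = alpha.shape[-1] / strength.squeeze(-1)
    return loss.mean(), uncertainty
```

With zero evidence the Dirichlet is uniform, so uncertainty is highest; as evidence for the true class grows, both the loss and the uncertainty shrink, which is what makes this family of losses attractive when some modalities are missing.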
- Download the datasets following the instructions in `vilt/datasets/DATA.md` and preprocess them with `make_arrow.py`.
- Download the pre-trained baseline model ViLT weights from here.
- Set up the environment: `pip install -r requirements.txt`.
- Available run parameters are listed in `vilt/config.py`; then run `python run.py --[parameters_chosen]`.
- For testing, set `test_only=True`, which defaults to `False` for training.
Our code is based on missing-aware-prompts (MAP) and Evidential Deep Learning (EDL).