
POET: Prompt Offset Tuning for Continual Human Action Adaptation

ECCV 2024, Oral Presentation | Project Page

Authors: Prachi Garg, K J Joseph, Vineeth N Balasubramanian, Necati Cihan Camgoz, Chengde Wan, Kenrick Kin, Weiguang Si, Shugao Ma, and Fernando De La Torre

(Figure: POET method overview)

Abstract

POET enables users to personalize their experience by adding new action classes efficiently and continually whenever they want.

We demonstrate the efficacy of prompt tuning a lightweight backbone pretrained exclusively on the base-class data. We propose a novel spatio-temporal learnable prompt offset tuning approach, and are the first to apply such prompt tuning to Graph Neural Networks.
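
To make the idea concrete, here is a minimal sketch of offset-style prompt tuning on skeleton features, assuming a (batch, channels, frames, joints) layout; the class name and shapes are illustrative, not the repository's actual implementation:

import torch
import torch.nn as nn

class PromptOffsetBlock(nn.Module):
    # learnable spatio-temporal offsets added to skeleton features ahead of
    # a (frozen) graph layer; names and shapes here are illustrative only
    def __init__(self, channels=64, frames=64, joints=25):
        super().__init__()
        # one learnable offset per (channel, frame, joint) position
        self.prompt = nn.Parameter(torch.zeros(1, channels, frames, joints))

    def forward(self, x):
        # x: (batch, channels, frames, joints) skeleton feature map
        return x + self.prompt  # offset tuning: add to the features, don't replace them

feats = torch.randn(8, 64, 64, 25)
print(PromptOffsetBlock()(feats).shape)  # torch.Size([8, 64, 64, 25])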

We contribute two new benchmarks for this problem setting in human action recognition: (i) the NTU RGB+D dataset for activity recognition, and (ii) the SHREC-2017 dataset for hand gesture recognition.

🚀 Release Overview and Updates

⬜ Code for the Gesture Recognition benchmark on SHREC 2017, using the DG-STA graph transformer backbone.

✅ [Jan 3, 2025] Released our 10+1 sets of few-shot splits of the NTU RGB+D 60 skeleton-joint dataset for full reproducibility here.

✅ Released POET training and evaluation code for our Activity Recognition benchmark on the NTU RGB+D dataset. We use the CTR-GCN backbone.

✅ Additionally, this release includes (i) the base-step model checkpoints, and (ii) a few-shot data file.

📌 Note: additional code for the adaptation of various baselines and ablations can be made available upon request.

Installation

  1. Clone the repository:
git clone <repository-url>
cd POET-continual-action-recognition
  2. Create a new conda environment:
conda create -n poet python=3.8
conda activate poet
  3. Install PyTorch and the CUDA toolkit:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  4. Install the remaining dependencies from requirements.txt:
pip install -r requirements.txt
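
To sanity-check that the environment sees PyTorch and the GPU (optional):

import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())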

Dataset Preparation

TL;DR of our continual learning task:

 - We divide the 60 daily action classes in the NTU RGB+D skeleton action recognition dataset into 40 base classes and 20 incremental classes (5x4). We train the base model with full supervision and initialize the prompts.

 - We add 5 new classes to the model sequentially, over 4 continual user sessions, with each class trained using only 5 samples. We fine-tune only the expanded classifier and the prompt components (prompt pool, prompt keys, and query adapter), freezing the rest of the network; see the sketch after this list.

 - Our privacy-aware setting is rehearsal-free and does not store any previous-class samples or exemplars. POET is therefore a prompt-tuning-only solution that acts as a plug-and-play module for most graph convolutional and graph transformer architectures.
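
As a rough illustration of that update rule (module names and sizes are hypothetical, not the repository's actual code), each session expands the classifier and keeps the backbone frozen:

import torch
import torch.nn as nn

def expand_classifier(fc: nn.Linear, num_new: int) -> nn.Linear:
    # grow the output layer by num_new classes, copying over the old weights
    new_fc = nn.Linear(fc.in_features, fc.out_features + num_new)
    with torch.no_grad():
        new_fc.weight[: fc.out_features] = fc.weight
        new_fc.bias[: fc.out_features] = fc.bias
    return new_fc

backbone = nn.Linear(256, 256)   # stand-in for the frozen skeleton backbone
classifier = expand_classifier(nn.Linear(256, 40), num_new=5)  # 40 -> 45 classes
for p in backbone.parameters():
    p.requires_grad = False      # only classifier + prompt components are tuned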
  1. Download the NTU RGB+D 60 dataset and preprocess it following the instructions in the original CTR-GCN repository. A sample few-shot data file is here.

  2. Set the paths to the data files in temp_24nov.yaml, under feeder -> data_path and few_shot_data_file.
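
For reference, the relevant entries might look like the following (the paths are placeholders and the surrounding structure is an assumption; only the feeder -> data_path and few_shot_data_file keys come from the step above):

# temp_24nov.yaml (sketch)
feeder:
  data_path: /path/to/ntu60_preprocessed_data      # preprocessed skeleton data
  few_shot_data_file: /path/to/few_shot_set1_file  # one of the 10+1 few-shot splits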

Training

The POET_final_10run.sh script performs incremental learning over 4 user sessions (steps), each adding 5 classes (see the sketch after the list):

  • Step 1: Classes 40-45
  • Step 2: Classes 45-50
  • Step 3: Classes 50-55
  • Step 4: Classes 55-60
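
The ranges above are easiest to read as half-open intervals (our assumption; each step adds exactly 5 labels):

# step i adds labels [40 + 5*i, 45 + 5*i)
sessions = [list(range(40 + 5 * i, 45 + 5 * i)) for i in range(4)]
print(sessions[0])  # [40, 41, 42, 43, 44]
print(sessions[3])  # [55, 56, 57, 58, 59]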

To run training:

# Run for a specific few-shot data file
./POET_train.sh 1  # For set1

# Run for multiple sets
./POET_train.sh 1 2 3 4 5 6 7 8 9 10

This script will:

  1. Train on each incremental step
  2. Evaluate performance: (A) average accuracy over all classes; (B) average class accuracy on old classes only; (C) average class accuracy on new classes only; (D) the harmonic mean (HM) of old and new (see the sketch after this list).
  3. Save model checkpoints and evaluation metrics
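
Metric (D) is the standard harmonic mean of old- and new-class accuracy; a small sketch (the example numbers are arbitrary):

def harmonic_mean(old_acc: float, new_acc: float) -> float:
    # harmonic mean (HM) of old- and new-class accuracy, metric (D)
    if old_acc + new_acc == 0:
        return 0.0
    return 2 * old_acc * new_acc / (old_acc + new_acc)

print(harmonic_mean(72.0, 48.0))  # 57.6 -- penalizes old/new imbalance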

To run only evaluation:

# Run for a specific few-shot data file
./POET_eval.sh 1  # For set1

# Run for multiple sets
./POET_eval.sh 1 2 3 4 5 6 7 8 9 10

Per-run performance numbers are provided in this file for reproducibility and comparison.

Key Parameters

📌 Note: parameters that are experiment-specific and stay fixed across continual sessions are specified in this config file. Arguments that change across continual sessions regardless of the experiment (e.g., the new class labels added in a session) are in argparser_continual.py.

  • --k_shot: Number of samples per class for few-shot learning (default: 5)
  • --prompt_layer: Which layer to add prompts (default: 1)
  • --device: GPU device ID to use
  • --save_name_args: Experiment name for saving results
  • --prompt_sim_reg: Enable prompt similarity regularization
  • --classifier_average_init: Initialize new classifier weights as the average of the old ones (see the sketch after this list)
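
For intuition on --classifier_average_init, the new class weights start at the mean of the existing ones; a minimal sketch under assumed shapes (not the repository's actual code):

import torch
import torch.nn as nn

def average_init_new_rows(fc: nn.Linear, num_new: int) -> None:
    # set the last num_new classifier rows to the mean of the old rows
    with torch.no_grad():
        fc.weight[-num_new:] = fc.weight[:-num_new].mean(dim=0)
        fc.bias[-num_new:] = fc.bias[:-num_new].mean()

fc = nn.Linear(256, 45)          # 40 old + 5 new classes (illustrative sizes)
average_init_new_rows(fc, 5)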

Results are saved in:

work_dir/ntu60/csub/ctrgcn_prompt/
├── checkpoints/
├── logs/
├── plots/
└── results.csv

Acknowledgements

We thank the authors of CTR-GCN and Learning to Prompt for Continual Learning, as well as its PyTorch reimplementation. These were useful starting points for our project.

Citation

If you find our work useful for your project, please consider citing it:

@inproceedings{garg2024poet,
  title={POET: Prompt Offset Tuning for Continual Human Action Adaptation},
  author={Garg, Prachi and Joseph, KJ and Balasubramanian, Vineeth N and Camgoz, Necati Cihan and Wan, Chengde and Kin, Kenrick and Si, Weiguang and Ma, Shugao and De La Torre, Fernando},
  booktitle={European Conference on Computer Vision},
  pages={436--455},
  year={2024},
  organization={Springer}
}