🎬⏮️ "Previously on ..." From Recaps to Story Summarization
(Teaser video: S08E23_recap.mp4)
- About
- Setting up the repository
- Feature Extraction
- Downloading and Setting up the data directories
- Train TaleSumm with different configurations
- Inference on TaleSumm to create summaries
- License
- Bibtex
This is the official code repository for the CVPR 2024 paper "Previously on ..." From Recaps to Story Summarization. This repository contains the implementation of TaleSumm, a Transformer-based hierarchical model, on our proposed dataset PlotSnap. TaleSumm processes entire episodes by creating compact shot 🎞️ and dialog 🗣️ representations, and predicts importance scores for each video shot and dialog utterance by enabling interactions between local story groups. Our model leverages multiple modalities, including visual and dialog features, to capture a comprehensive understanding of important shots in complex movie environments. Additionally, we provide the pre-trained weights for TaleSumm as well as all the pre-trained feature backbones used in feature extraction. On top of that, we provide pre-extracted features for all episodes: per-frame embeddings using DenseNet169, CLIP, and MViT, and dialog features with a fine-tuned RoBERTa backbone.
- Clone the repository and change the working directory to the project's root.
$ git clone https://github.com/katha-ai/RecapStorySumm-CVPR2024
$ cd RecapStorySumm-CVPR2024
- This project strictly requires python==3.8. Create a virtual environment using Conda.
$ conda create -n storysumm python=3.8
$ conda activate storysumm
(storysumm) $ pip install -r requirements.txt
OR
Create a virtual environment using pip (make sure you have Python 3.8 installed).
$ python3.8 -m pip install virtualenv
$ python3.8 -m virtualenv storysumm
$ source storysumm/bin/activate
(storysumm) $ pip install -r requirements.txt
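Optionally, sanity-check the environment before continuing. The one-liner below assumes PyTorch is installed via requirements.txt (an assumption about the dependency list); adjust the import if your setup differs.
(storysumm) $ python -c "import sys, torch; print(sys.version.split()[0], torch.__version__, torch.cuda.is_available())"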
🛠️ Configure the configs/base.yaml file
- Add the absolute paths to the project directory in configs/base.yaml.
- E.g., if you have cloned the repository at /home/user/RecapStorySumm-CVPR2024 and want to download the model checkpoints and data features, then the path variables in configs/base.yaml would be:
root: "/home/user/RecapStorySumm-CVPR2024"
# Save PlotSnap data features here
data_path: "${root}/data"
split_dir: "${root}/configs/data_configs/splits"
# To save dialog (and vision) backbones
cache_dir: "${root}/cache/"
# use the following for model checkpoints
ckpt_path: "${root}/checkpoints/storysumm"
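As an optional sanity check (not part of the repository), you can load the config with OmegaConf, whose ${...} interpolation syntax the file appears to use, and confirm that the paths resolve to real directories:

```python
# Quick check: load base.yaml and verify that ${root}-based paths resolve.
import os
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/base.yaml")
for key in ("data_path", "split_dir", "cache_dir", "ckpt_path"):
    path = cfg[key]  # ${root} interpolation is resolved on access
    print(f"{key}: {path} (exists: {os.path.isdir(path)})")
```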
Refer to configs/trainer_config.yaml and configs/inference_config.yaml for the default parameter configurations used during training and inference, respectively.
Follow the instructions in feature_extractors/README.md [WIP] to extract the required features from any given video and prepare it for summarization.
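Until that README is complete, the sketch below only illustrates the general idea of per-frame visual embeddings. It uses Hugging Face transformers' CLIP and OpenCV as stand-ins; the repository's actual backbones, frame sampling, and shot segmentation may differ, and episode.mp4 is a placeholder path.

```python
# Illustrative sketch: per-frame CLIP embeddings for a video file.
import cv2
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

cap = cv2.VideoCapture("episode.mp4")  # placeholder input video
frame_feats = []
with torch.no_grad():
    ok, frame = cap.read()
    while ok:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        inputs = processor(images=rgb, return_tensors="pt")
        frame_feats.append(model.get_image_features(**inputs))  # shape [1, 512]
        ok, frame = cap.read()
cap.release()
frame_feats = torch.cat(frame_feats)  # one embedding per decoded frame
```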
Note that we have already provided the pre-extracted features for PlotSnap below.
You can also use wget to download these files:
# Download the features (described below) into the data/ folder
LINK="https://iiitaphyd-my.sharepoint.com/:u:/g/personal/makarand_tapaswi_iiit_ac_in/EdEsWTvAEg5Iuo1cAUNmVq4Bipauv5nGdTdXAtMidWR5GA?e=dLWkNo"
wget -O data $LINK
| File name | Contents | Comments |
|---|---|---|
| 24 | | Contains S02 to S09 directories, which occupy 92 GB of disk space. |
| Prison Break | | This occupies 22 GB of disk space. |
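After extracting the downloads under data/, a rough check of size and layout can be done as follows; the directory names here are illustrative and should be matched to whatever the archives actually contain.
$ du -sh data/*
$ ls data/24   # expect season folders (S02 ... S09)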
# Create the checkpoints folder `checkpoints/storysumm` in the project's root (if not already present) and place all checkpoints in it, one by one.
mkdir -p <absolute_path_to_root>/checkpoints/storysumm
# OR (simply do the following).
# Now download the pre-trained weights (listed below) into the checkpoints/ folder
LINK="https://iiitaphyd-my.sharepoint.com/:u:/g/personal/makarand_tapaswi_iiit_ac_in/ES91ZF90ArJGiXkEa53-kJABNytKOyOSQlr03dnTf6bKKg?e=PN1Gir"
wget -O checkpoints $LINK
| File name | Comments | Training command |
|---|---|---|
| TaleSumm-IntraCVT\|S[1,2,3,4,5] | IntraCVT split i=0,1,2,3,4 checkpoints of TaleSumm | (storysumm) $ python -m trainer split_type='intra-loocv' |
| TaleSumm-Final | Final checkpoint of TaleSumm, to be used in production | (storysumm) $ python -m trainer split_type='final-split.yaml' |
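To verify a download, you can inspect a checkpoint with plain PyTorch. The file name below comes from the table above; its extension and internal layout are assumptions, so adapt the path to the file you actually downloaded.

```python
# Minimal sketch: peek inside a downloaded checkpoint.
import torch

ckpt = torch.load("checkpoints/storysumm/TaleSumm-Final", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model/optimizer state, epoch, metrics
else:
    print(type(ckpt))
```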
After completing the above, you can now train TaleSumm on a 12 GB NVIDIA RTX 2080 Ti GPU! You can also use the pre-trained weights provided in the Download section.
Note: It is recommended to use wandb to log and track your experiments.
Using the default values given in configs/base.yaml:
- To train TaleSumm on PlotSnap, use the default config (no argument required).
(storysumm) $ python -m trainer
- To train TaleSumm with a specific modality (valid keywords: vid, dia, both).
(storysumm) $ python -m trainer modality=both
- To train TaleSumm on a specific series (valid keywords: 24, prison-break, all).
(storysumm) $ python -m trainer series='24'
- To change the split type to be used for training (valid keywords: cross-series, intra-loocv, inter-loocv, default-split.yaml, fandom-split.yaml).
(storysumm) $ python -m trainer split_type=cross-series
- To choose which visual features to train on, pass a list of the features to be used (valid keywords: imagenet, mvit, clip).
(storysumm) $ python -m trainer visual_features=['imagenet','mvit','clip']
- To choose the fusion style for the visual features (valid keywords: concat, stack, mul).
(storysumm) $ python -m trainer feat_fusion_style=concat
- To choose the type of attention in the model (valid keywords: sparse, full).
(storysumm) $ python -m trainer attention_type=sparse
- To disable Group tokens in the model.
(storysumm) $ python -m trainer withGROUP=False
NOTE: If withGROUP is True, then computeGROUPloss needs to be True as well.
- To enable wandb logging (recommended).
(storysumm) $ python -m trainer wandb.logging=True
NOTE: We used 4 GPUs during training, which is why the gpus parameter in the configuration is set to [0,1,2,3]. If you plan to use more or fewer GPUs, please enter their GPU IDs accordingly.
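Multiple overrides of the kind listed above can be combined in a single command. The particular combination below is only an illustration of the override syntax, not a recommended setting:
(storysumm) $ python -m trainer series='24' modality=both split_type=intra-loocv wandb.logging=True gpus=[0,1]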
To summarize a new video using TaleSumm, use the following command:
(storysumm) $ python -m inference <overrides for inference_config.yaml>
NOTE: We used 4 GPUs during training, which is why the gpus parameter in the configuration is set to [0,1,2,3]. If you plan to use more or fewer GPUs, please enter their GPU IDs accordingly.
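Overrides follow the same key=value style as training. The keys shown below (series and ckpt_path, taken from the configs discussed above) are only illustrative; check configs/inference_config.yaml for the names it actually exposes.
(storysumm) $ python -m inference series='24' ckpt_path='<path_to_checkpoint_dir>'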
This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.
If you find any part of this repository useful, please cite the following paper!
@inproceedings{singh2024previously,
  title = {{"Previously on ..." From Recaps to Story Summarization}},
  author = {Aditya Kumar Singh and Dhruv Srivastava and Makarand Tapaswi},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024},
}