Background subtraction (BGS) is a fundamental task in computer vision with applications in video surveillance, object tracking, and people recognition. Despite recent advancements, many deep learning-based BGS algorithms rely on large models to extract high-level representations, demanding significant computational resources and leading to inefficiencies in processing video streams. To address these limitations, we introduce the Sequential Feature Feedback Network (SeqFeedNet), a novel supervised algorithm for BGS in unseen videos that operates without additional pre-processing models. SeqFeedNet innovatively incorporates time-scale diverse sequential features and employs a feedback mechanism at each iteration. Moreover, we propose the Sequential Fit Training (SeqFiT) technique, which enhances model convergence during training. Evaluated on the CDnet 2014 dataset, SeqFeedNet not only achieves the best F-Measure among BGS methods for unseen videos that do not rely on a pre-trained segmentation model, but also runs about 5 times faster than the best competing supervised algorithm, making it well suited to real-world deployment.
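As a rough illustration of the feedback idea (hypothetical module and names, not the actual SeqFeedNet implementation): sequential features persist across frames, are concatenated with each incoming frame, and are overwritten by the encoder output so that every iteration feeds the next.

```python
import torch
import torch.nn as nn

class FeedbackBGSSketch(nn.Module):
    """Minimal sketch of per-frame feature feedback (illustrative only)."""

    def __init__(self, ch=16):
        super().__init__()
        # The encoder sees the current frame plus the fed-back features.
        self.encode = nn.Conv2d(3 + ch, ch, kernel_size=3, padding=1)
        self.segment = nn.Conv2d(ch, 1, kernel_size=1)  # FG-probability head

    def forward(self, frames):
        # frames: (T, 3, H, W) clip; features carry over between frames.
        feats = torch.zeros(1, self.encode.out_channels, *frames.shape[-2:])
        masks = []
        for frame in frames:
            x = torch.cat([frame.unsqueeze(0), feats], dim=1)
            feats = torch.relu(self.encode(x))   # feedback: features updated
            masks.append(torch.sigmoid(self.segment(feats)))
        return torch.cat(masks)                  # (T, 1, H, W) FG probabilities
```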
| Previous | SeqFeedNet (Proposed) |
|---|---|
- Comparison of methods according to the per-category F-Measure for unseen videos from CDnet 2014. Without a pre-trained segmentation model, SeqFeedNet achieves the best BGS performance on unseen videos.
- Qualitative comparison of practical demands for BGS. SeqFeedNet aims to achieve all the key factors for practical application and deployment.
- FPS is calculated using a PyTorch 2.1 implementation on a single Nvidia GeForce RTX 3090 GPU (see the timing sketch below). Note: ZBS was not included in the speed tests due to its higher setup costs, although it is implemented in C++ and reportedly reaches about 20 FPS on an A100 GPU according to its publication. SeqFeedNet is ~5 times faster than the best supervised BGS algorithm.
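For context, a minimal sketch of how an FPS figure like this can be measured in PyTorch (the model and frame list are placeholders, not the benchmark harness used for the table):

```python
import time
import torch

def measure_fps(model, frames, device="cuda:0"):
    """frames: list of (1, C, H, W) tensors; returns processed frames per second."""
    model = model.to(device).eval()
    with torch.no_grad():
        torch.cuda.synchronize(device)   # drain pending GPU work before timing
        start = time.perf_counter()
        for frame in frames:
            model(frame.to(device))
        torch.cuda.synchronize(device)   # wait for the last kernel to finish
        elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```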
Overall, SeqFeedNet is a leading BGS algorithm for real-world application scenarios.
Follow the link to download the evaluation models mentioned in the paper: https://drive.google.com/drive/folders/1GljFqnQp7vxh_96-moHWdUFEdTriTvh2?usp=sharing
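If you prefer to script the download, the third-party gdown package can fetch a public Drive folder (an optional convenience, not part of this repo; install it first with pip install gdown):

```python
import gdown

# Fetch the released evaluation models from the public Drive folder.
url = "https://drive.google.com/drive/folders/1GljFqnQp7vxh_96-moHWdUFEdTriTvh2?usp=sharing"
gdown.download_folder(url, output="evaluation_models", quiet=False)
```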
- Overall and per-category cross-validation results of SeqFeedNet on the CDnet 2014 dataset.
| Model | baseline | cameraJitter | badWeather | dynamicBackground | intermittentObjectMotion | lowFramerate | nightVideos | PTZ | shadow | thermal | turbulence | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cv1 | 0.9719 | 0.8808 | 0.8846 | 0.9353 | 0.9111 | 0.1843 | 0.6239 | 0.1541 | 0.9493 | 0.7675 | 0.8938 | 0.7415 |
| cv2 | 0.9685 | 0.8874 | 0.9432 | 0.3776 | 0.8228 | 0.9826 | 0.7419 | 0.9151 | 0.9133 | 0.9423 | 0.7864 | 0.8437 |
| cv3 | 0.9881 | 0.9129 | 0.7645 | 0.9674 | 0.9004 | 0.9589 | 0.6953 | 0.8387 | 0.9457 | 0.9384 | 0.9816 | 0.8993 |
| cv4 | 0.9701 | 0.9509 | 0.7823 | 0.8166 | 0.7478 | 0.9201 | 0.9336 | 0.8876 | 0.9438 | 0.9426 | 0.8330 | 0.8844 |
| Overall | 0.9747 | 0.9080 | 0.8437 | 0.7743 | 0.8455 | 0.7615 | 0.7487 | 0.6989 | 0.9380 | 0.8977 | 0.8737 | 0.8422 |
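Each value in the Overall row is the unweighted mean of the four folds above it, and each per-fold Overall score averages across the eleven categories; a quick sanity check:

```python
from statistics import mean

# Per-fold baseline F-Measures copied from the table above.
baseline = [0.9719, 0.9685, 0.9881, 0.9701]
print(mean(baseline))   # ≈ 0.97465, matching the 0.9747 in the Overall row

# Per-fold Overall F-Measures from the rightmost column.
per_fold_overall = [0.7415, 0.8437, 0.8993, 0.8844]
print(mean(per_fold_overall))   # ≈ 0.84222, matching the 0.8422 overall score
```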
- The findings reveal that increasing the number of sequential loops by including additional frame groups significantly boosts the performance of SeqFeedNet.
Tested with Python 3.10, CUDA 12.1, and PyTorch 2.1.1.

```
pip3 install -r requirements.txt
pip3 install torch torchvision
```

- Download the CDnet 2014 dataset: http://jacarini.dinf.usherbrooke.ca/static/dataset/dataset2014.zip
- Download the empty-background and recent-background training data provided by M. Ozan Tezcan et al. at the following link: https://drive.google.com/drive/folders/1fskxV1paCsoZvqTVLjnlAdPOCHk1_XmF
- Prepare the folder structure as follows:

```
Data
├── currentFr -> cdnet2014/dataset/
├── emptyBg   -> BSUV-Net-2.0_Training_Data/emptyBg
└── recentBg  -> BSUV-Net-2.0_Training_Data/recentBg
```
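The arrows above indicate symbolic links; a minimal Python sketch to create them (the target paths are assumptions about where you extracted the downloads):

```python
import os

# Map each Data/ entry to the assumed extraction path of its source.
links = {
    "Data/currentFr": "cdnet2014/dataset",
    "Data/emptyBg": "BSUV-Net-2.0_Training_Data/emptyBg",
    "Data/recentBg": "BSUV-Net-2.0_Training_Data/recentBg",
}

os.makedirs("Data", exist_ok=True)
for link, target in links.items():
    if not os.path.islink(link):
        os.symlink(os.path.abspath(target), link)  # absolute targets stay valid
```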
```
cross_validation_set=<number_of_set>
python3 training.py --device 0 -epochs 200 --batch_size 9 -workers 9 -cv $cross_validation_set -imghw 224-224 -use-t2val -opt Adam -out $cross_validation_set
```

The model's weights will be saved to the out/ directory.
```
cross_validation_set=<number_of_set>

# generate measurement csv
python3 testing.py -cv $cross_validation_set --device 0 -weight <model_weight>

# generate measurement csv & video result
python3 testing.py -cv $cross_validation_set --device 0 -weight <model_weight> -save
```

In the video result, the first row shows, from left to right: input, label, prediction mask (threshold = 0.5), and prediction probability.
The second row displays the sequential features' visual results.
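The displayed mask is simply the probability map binarized at the 0.5 threshold; as a generic illustration (not code from this repository):

```python
import torch

# prob: per-pixel foreground probability map, values in [0, 1]
prob = torch.rand(1, 224, 224)   # placeholder probabilities for illustration
mask = (prob > 0.5).float()      # the binary prediction mask shown in the video
```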
- PTZ/intermittentPan link: https://drive.google.com/file/d/12-dvboDZkgFxo4dM-YxM1wBMwJJWCbB1
| Frame No. | Sequential Features |
|---|---|
| 1 | ![]() |
| 2 | ![]() |
| 3 | ![]() |
From Frame 1 (initialization) to Frame 3, it is clear that the sequential features adapt quickly.
- shadow/peopleInShade link: https://drive.google.com/file/d/1WbR_CjdeGmQn2NpPwPVL57v_eL2RmEK1/view?usp=drive_link
| Judgment of shadow | Results |
|---|---|
| As FG | ![]() |
| As BG | ![]() |
The result shows that the sequential features can delineate the shadow region, preventing the shadow from being misjudged as foreground.