The PyTorch implementation of our paper:
Chenchen Zhao, Yeqiang Qian, and Ming Yang. Monocular Pedestrian Orientation Estimation Based on Deep 2D-3D Feedforward. Pattern Recognition 2020
[paper] [learn more]
We propose a test-time monocular 2D pedestrian orientation estimation model. The model takes image features and 2D & 3D (train-time) dimension information as inputs and outputs the estimated orientation of each pedestrian object. The model ranks 9th out of 162 on the KITTI Pedestrian Orientation Estimation Evaluation benchmark
Code inspired by Deep3DBox
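As a rough illustration of this input/output flow (not the actual network released here), the sketch below fuses a pooled image feature vector with embedded dimension inputs and regresses a single orientation angle per pedestrian. The module name, feature sizes, and the sin/cos angle parameterization are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class OrientationHead(nn.Module):
    """Hypothetical sketch: fuse image features with dimension embeddings
    and regress one orientation angle per pedestrian object."""

    def __init__(self, feat_dim=512, dim_embed_dim=64, hidden=256):
        super().__init__()
        # embeds the 2D & 3D dimension inputs, e.g. [w2d, h2d, h3d, w3d, l3d]
        self.dim_embed = nn.Sequential(nn.Linear(5, dim_embed_dim), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(feat_dim + dim_embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),                     # (sin, cos) of the angle
        )

    def forward(self, img_feat, dims):
        fused = torch.cat([img_feat, self.dim_embed(dims)], dim=1)
        sin_cos = nn.functional.normalize(self.head(fused), dim=1)
        return torch.atan2(sin_cos[:, 0], sin_cos[:, 1])   # angle in radians
```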
3.5 years later...
- Rewrite the code
- Add high-frequency embedding strategies for the input 2D & 3D dimensions, similar to the timestep embedding used in diffusion models (see the sketch below)
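Below is a minimal sketch of such a sinusoidal (high-frequency) embedding, in the style of diffusion-model timestep embeddings; the function name, embedding size, and frequency base are assumptions, not the exact implementation in this repository.

```python
import math
import torch

def sinusoidal_embedding(x, dim=64, max_freq=10000.0):
    """Map each scalar dimension value to a high-frequency sin/cos vector,
    analogous to the timestep embedding used in diffusion models.

    x: tensor of shape (N,) holding one 2D/3D dimension value per object.
    Returns a tensor of shape (N, dim).
    """
    half = dim // 2
    freqs = torch.exp(-math.log(max_freq) *
                      torch.arange(half, dtype=torch.float32) / half)
    angles = x[:, None].float() * freqs[None, :]        # (N, half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)  # (N, dim)
```

Each 2D & 3D dimension scalar would be embedded this way and concatenated with the image features before being fed to the network.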
Run `conda env create -f environment.yaml && conda activate ffnet` to create and activate a conda virtual environment named `ffnet`
Run `python main.py train` to train a model
Run `python main.py val` to validate the performance of the model on the validation split
Run `python main.py test` to evaluate on the test set and record the results in the format required by the benchmark. Valid 2D pedestrian object detection results are required
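For reference, KITTI benchmark submissions are plain-text files (one per test image) with one line per detection; the sketch below writes one result line with the estimated observation angle `alpha` filled in. The field order follows the public KITTI object label format, but the placeholder values for the unused 3D fields, the output path, and the function name are assumptions rather than what this repository actually does.

```python
# Hypothetical sketch of writing one KITTI-format result line per detection.
# Fields: type, truncation, occlusion, alpha, bbox(x1 y1 x2 y2),
#         dimensions(h w l), location(x y z), rotation_y, score.
def kitti_result_line(bbox, alpha, score):
    x1, y1, x2, y2 = bbox
    # 3D fields are not evaluated by the orientation benchmark; use placeholders.
    return ("Pedestrian -1 -1 {:.2f} {:.2f} {:.2f} {:.2f} {:.2f} "
            "-1 -1 -1 -1000 -1000 -1000 -10 {:.2f}").format(
        alpha, x1, y1, x2, y2, score)

with open("results/000000.txt", "w") as f:   # one file per test image
    f.write(kitti_result_line((710.4, 144.0, 820.3, 307.9), -1.57, 0.90) + "\n")
```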
We thank the authors of LED for providing the 2D detection results, which we use directly for the benchmark submission
Modify `args.py` for customized experimental settings
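As a hypothetical illustration of what such settings might look like, here is an argparse-style configuration sketch; every argument name and default below is an assumption, not taken from the actual `args.py`.

```python
# Hypothetical illustration only: argument names and defaults are assumptions.
import argparse

def get_args():
    parser = argparse.ArgumentParser(description="FFNet experimental settings")
    parser.add_argument("mode", choices=["train", "val", "test"])
    parser.add_argument("--data-root", default="data/kitti", help="KITTI dataset root")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--checkpoint", default=None, help="path to a saved model")
    return parser.parse_args()
```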
Model | Moderate (%) | Easy (%) | Hard (%) | Runtime (s) | Ranking |
---|---|---|---|---|---|
FFNet (ours) | 59.17 | 69.17 | 54.95 | 0.22 | 2nd |
SubCNN 1 | 66.28 | 78.33 | 61.37 | 2.00 | 1st |
Mono3D 2 | 58.12 | 68.58 | 54.94 | 4.20 | 3rd |
MonoPSR 3 | 56.30 | 70.56 | 49.84 | 0.20 | 4th |
Shift R-CNN 4 | 48.81 | 65.39 | 41.05 | 0.25 | 5th |
FFNet backbone | 38.92 | 46.36 | 35.63 | 0.21 | 6th |
@article{zhao2020monocular,
title={Monocular pedestrian orientation estimation based on deep 2{D}-3{D} feedforward},
author={Zhao, Chenchen and Qian, Yeqiang and Yang, Ming},
journal={Pattern Recognition},
volume={100},
pages={107182},
year={2020},
publisher={Elsevier}
}