L4P is a feed-forward foundation model designed for multiple low-level 4D vision perception tasks. Given a monocular video without camera poses, L4P jointly solves several tasks using a shared video encoder backbone and lightweight, task-specific heads. The model is currently trained for depth, optical flow, 2D/3D point tracking, dynamic motion segmentation, and camera pose estimation, and can be extended to support additional tasks.
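The overall design, a shared video encoder feeding lightweight task-specific heads, can be illustrated with a short PyTorch sketch. This is a conceptual illustration only, not the L4P architecture; all module names, layer choices, and shapes below are assumptions.

```python
# Conceptual sketch of a shared video encoder with lightweight task heads.
# NOT the actual L4P implementation; names, layers, and shapes are illustrative.
import torch
import torch.nn as nn

class SharedBackboneMultiTask(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Placeholder "video encoder": (B, 3, T, H, W) -> (B, C, T, H/8, W/8).
        self.encoder = nn.Conv3d(3, feat_dim, kernel_size=(1, 8, 8), stride=(1, 8, 8))
        # Lightweight per-task heads decoding the shared features.
        self.heads = nn.ModuleDict({
            "depth": nn.Conv2d(feat_dim, 1, 1),
            "flow": nn.Conv2d(feat_dim, 2, 1),
            "dynamic_mask": nn.Conv2d(feat_dim, 1, 1),
        })

    def forward(self, video):                        # video: (B, T, 3, H, W)
        feats = self.encoder(video.transpose(1, 2))  # (B, C, T, H/8, W/8)
        b, c, t, h, w = feats.shape
        flat = feats.transpose(1, 2).reshape(b * t, c, h, w)
        # Every head reuses the same features; only the decoders are task-specific.
        return {name: head(flat).reshape(b, t, -1, h, w)
                for name, head in self.heads.items()}

model = SharedBackboneMultiTask()
outputs = model(torch.randn(1, 8, 3, 224, 224))
print({k: tuple(v.shape) for k, v in outputs.items()})
```

The appeal of this kind of design is that the expensive video features are computed once and reused by every head, so supporting a new task mainly means adding a new lightweight head.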
- [2025/11] Paper is accepted at 3DV 2026 (Oral).
- [2025/9] We released the inference code.
The codebase is based on PyTorch Lightning and Lightning CLI.
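For readers unfamiliar with Lightning CLI, a minimal entry point looks roughly like the sketch below. It is illustrative only and is not this repository's entry point; DemoModel and DemoDataModule are placeholder names.

```python
# Minimal Lightning CLI sketch (illustrative only; not this repo's entry point).
import torch
from torch.utils.data import DataLoader, TensorDataset
import lightning.pytorch as pl
from lightning.pytorch.cli import LightningCLI

class DemoModel(pl.LightningModule):
    def __init__(self, lr: float = 1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

class DemoDataModule(pl.LightningDataModule):
    def train_dataloader(self):
        x, y = torch.randn(64, 8), torch.randn(64, 1)
        return DataLoader(TensorDataset(x, y), batch_size=16)

if __name__ == "__main__":
    # Exposes fit/validate/test/predict subcommands plus a YAML config interface.
    LightningCLI(DemoModel, DemoDataModule)
```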
conda create -n l4p python=3.10
conda activate l4p
pip install -r env/requirements.txt
You might need to install ffmpeg for mediapy; follow the instructions here.
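To verify that mediapy can reach ffmpeg, a short sanity check like the following can help (the output path is arbitrary):

```python
# Quick sanity check that mediapy can encode video through ffmpeg.
import numpy as np
import mediapy as media

# 10 black 64x64 RGB frames; writing them exercises mediapy's ffmpeg backend.
frames = np.zeros((10, 64, 64, 3), dtype=np.uint8)
media.write_video("/tmp/ffmpeg_check.mp4", frames, fps=10)  # fails if ffmpeg is missing
print("mediapy could encode video, so ffmpeg is available")
```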
The following assumes that Docker and the NVIDIA Container Toolkit are properly installed.
Everything needed to build the Docker image locally is provided in the env directory.
To build the Docker image for local use, run: docker build . -t l4p:local -f env/Dockerfile
This will set up everything, including additional functionality for development using Docker and VSCode.
If you run into issues with the viser library, try building the image again.
To develop in VSCode with Docker, use the provided devcontainer file: .devcontainer/devcontainer.json.
Depending on your needs, you may want to update the mount paths based on where you store your data, results, SSH, and config files.
This can be done by modifying the mounts section in .devcontainer/devcontainer.json.
Once inside the Docker container, activate the conda environment with source /workspace/miniconda3/bin/activate l4p.
We provide a demo showing several examples of running the model on all the tasks we support.
Download weights and sample data using:
cd weights
bash download.sh
cd -
cd demo/data
bash download.sh
cd -
Run the demo notebook demo/demo.ipynb, or run the Python script: cd demo; python demo.py.
If you get an OutOfMemoryError, you can set the flag limit_gpu_mem_usage=True.
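If you are unsure whether your GPU needs it, you can check free GPU memory with standard PyTorch calls before enabling the flag; the 16 GB threshold below is an arbitrary example, not a value from this repository:

```python
# Check free GPU memory to decide whether to enable limit_gpu_mem_usage.
import torch

if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"GPU memory free: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")
    limit_gpu_mem_usage = free_bytes < 16e9  # arbitrary threshold; tune for your GPU
else:
    limit_gpu_mem_usage = True
print("limit_gpu_mem_usage =", limit_gpu_mem_usage)
```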
Below are example visualizations from our model for depth, flow and 2D tracks.
Because we estimate camera poses (with or without input intrinsics), we can visualize depth, camera poses, and 3D tracks within a consistent reference frame.
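As a rough sketch of how per-frame depth and camera poses end up in one reference frame, the standard pinhole unprojection followed by a camera-to-world transform looks like this. It is a minimal NumPy illustration, assuming a 4x4 camera-to-world pose T_wc and intrinsics K; it is not the demo's visualization code.

```python
# Unproject a depth map to world-space 3D points using intrinsics K and a
# camera-to-world pose T_wc (4x4). Illustrative only; not the demo's code.
import numpy as np

def depth_to_world_points(depth, K, T_wc):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))          # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)        # homogeneous pixels (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                         # camera rays (H, W, 3)
    pts_cam = rays * depth[..., None]                       # points in the camera frame
    pts_cam_h = np.concatenate([pts_cam, np.ones_like(depth)[..., None]], axis=-1)
    pts_world = pts_cam_h @ T_wc.T                          # transform to the world frame
    return pts_world[..., :3]

# Example with dummy data at the model's 224 x 224 resolution.
depth = np.full((224, 224), 2.0)
K = np.array([[200.0, 0.0, 112.0], [0.0, 200.0, 112.0], [0.0, 0.0, 1.0]])
T_wc = np.eye(4)
print(depth_to_world_points(depth, K, T_wc).shape)  # (224, 224, 3)
```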
- Our approach is limited to 224 x 224 resolution.
- Depth, camera poses, and 3D tracks are generated by different heads in a feed-forward manner, so they might not be perfectly consistent with each other.
- Our current implementation of pose alignment between overlapping windows runs on the CPU and is therefore somewhat slow; a faster version is coming soon (see the sketch below for the general idea).
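For context on what such an alignment does: a common way to stitch per-window pose estimates is to rigidly align the camera centers of frames shared by consecutive windows (a Kabsch/Umeyama-style fit). The sketch below shows that general idea under these assumptions; it is not the repository's implementation.

```python
# Rigid (SE(3)) alignment of two overlapping pose-window trajectories via the
# Kabsch algorithm on shared camera centers. General idea only; not the repo's code.
import numpy as np

def align_rigid(src, dst):
    """Find R, t such that R @ src_i + t ~= dst_i for (N, 3) camera centers."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # fix an improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy example: window B's centers are a rotated and shifted copy of window A's overlap.
rng = np.random.default_rng(0)
overlap_a = rng.normal(size=(8, 3))
angle = np.deg2rad(10.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
overlap_b = overlap_a @ R_true.T + np.array([0.5, -0.2, 1.0])
R, t = align_rigid(overlap_b, overlap_a)     # map window B into window A's frame
print(np.abs(overlap_b @ R.T + t - overlap_a).max())  # ~0
```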
The sample results shown above are from:
- Perazzi et al., A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Gao et al., Monocular Dynamic View Synthesis: A Reality Check, In Advances in Neural Information Processing Systems (NeurIPS), 2022.
@inproceedings{badki2026l4p,
title={{L4P}: {T}owards Unified Low-Level {4D} Vision Perception},
author={Badki, Abhishek and Su, Hang and Wen, Bowen and Gallo, Orazio},
booktitle={International Conference on 3D Vision (3DV)},
year={2026}
}