Gangwei Xu1,2,* · Haotong Lin3,* · Hongcheng Luo2 · Xianqi Wang1 · Jingfeng Yao1
Lianghui Zhu1 · Yuechuan Pu2 · Cheng Chi2 · Haiyang Sun2,† · Bing Wang2
Guang Chen2 · Hangjun Ye2 · Sida Peng3 · Xin Yang1,†,✉️
1HUST 2Xiaomi EV 3Zhejiang University
*co-first author †project leader ✉️ corresponding author
This work presents Pixel-Perfect Depth, a monocular depth estimation model built on pixel-space diffusion transformers. Unlike existing discriminative and generative models, its estimated depth maps yield high-quality, flying-pixel-free point clouds.
Overview of Pixel-Perfect Depth. We perform diffusion generation directly in pixel space without using any VAE.
- Pixel-space diffusion generation (operating directly in image space, without VAE or latent representations), capable of producing flying-pixel-free point clouds from estimated depth maps.
- Our model integrates the discriminative representation (ViT) into generative modeling (DiT), fully leveraging the strengths of both paradigms (see the sketch after this list).
- Our network architecture is purely transformer-based, containing no convolutional layers.
- Although our model is trained at a fixed resolution of 1024×768, it can flexibly support various input resolutions and aspect ratios during inference.
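To make the second point concrete, below is a minimal, illustrative sketch of how semantic tokens from a frozen ViT encoder could prompt a DiT block via cross-attention. This is not the released implementation: the module name, shapes, and the cross-attention mechanism are assumptions for illustration; the code in this repo defines the actual architecture.

```python
# Illustrative sketch only: how frozen ViT semantics could prompt a DiT block.
# Names, shapes, and the conditioning mechanism are assumptions, not the released code.
import torch
import torch.nn as nn


class SemanticsPromptedBlock(nn.Module):
    """A DiT-style block whose pixel-space depth tokens cross-attend to ViT semantic tokens."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor, semantics: torch.Tensor) -> torch.Tensor:
        # x: noisy depth tokens [B, N, C], patchified directly from pixel space (no VAE)
        # semantics: frozen ViT features [B, M, C], e.g. from Depth Anything v2 or MoGe 2
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        x = x + self.cross_attn(self.norm2(x), semantics, semantics, need_weights=False)[0]
        x = x + self.mlp(self.norm3(x))
        return x


if __name__ == "__main__":
    block = SemanticsPromptedBlock(dim=768)
    depth_tokens = torch.randn(1, 48 * 64, 768)  # e.g. a 768x1024 input with 16x16 patches
    vit_tokens = torch.randn(1, 1370, 768)       # whatever token count the ViT produces
    print(block(depth_tokens, vit_tokens).shape)  # torch.Size([1, 3072, 768])
```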
- 2025-12-20: We release the training code for PPD, featuring a two-stage pipeline: 512×512 pre-training followed by 1024×768 fine-tuning.
- 2025-12-01: We release a new PPD model together with its weights, which leverage MoGe2 to provide semantics and deliver a 20–30% improvement on zero-shot benchmarks.
- 2025-10-01: Paper, project page, code, models, and demo are all released.
Our pretrained models are available on the Hugging Face Hub:
| Model | Semantics | Params | Checkpoint | Training Resolution |
|---|---|---|---|---|
| PPD | DA2 | 500M | Download | 1024×768 |
| PPD | MoGe2 | 500M | Download | 1024×768 |
git clone https://github.com/gangweix/pixel-perfect-depth
cd pixel-perfect-depth
pip install -r requirements.txt

Download our pretrained model ppd.pth and put it under the checkpoints/ directory.
In addition, you also need to download the pretrained model depth_anything_v2_vitl.pth (or moge2.pt) and put it under the checkpoints/ directory.
python run.py

Generating point clouds requires metric depth and camera intrinsics from MoGe.
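For reference, unprojecting a metric depth map with pinhole intrinsics into a point cloud follows the standard camera model. The sketch below is a generic illustration with our own variable names, not the exact code inside run_point_cloud.py.

```python
# Generic pinhole unprojection sketch (not the exact run_point_cloud.py code).
# depth: metric depth map [H, W]; K: 3x3 camera intrinsics with fx, fy, cx, cy.
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx                           # back-project along camera rays
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)  # [H*W, 3] points in the camera frame
```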
Please download the pretrained model moge2.pt and place it under the checkpoints/ folder.
python run_point_cloud.py --save_pcd

Our training strategy follows a two-stage curriculum (a sketch of the per-step objective follows the two stages below):
- Stage 1: Pre-training. Conducted at 512×512 resolution on the Hypersim dataset.
python main.py --cfg_file ppd/configs/train_pretrain.yaml pl_trainer.devices=8
- Stage 2: Fine-tuning. Conducted at 1024×768 resolution on a mixture of five datasets.
python main.py --cfg_file ppd/configs/train_finetune.yaml pl_trainer.devices=8
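For orientation, here is a minimal sketch of one pixel-space diffusion training step. It assumes a standard flow-matching (rectified-flow) objective, which many recent DiT-based models use; the actual loss, schedule, and model interface are defined by the YAML configs above, and `model` below is a stand-in for the semantics-prompted DiT.

```python
# Hedged sketch of one pixel-space diffusion training step (flow-matching style).
# The real objective and schedule live in the YAML configs; names here are illustrative.
import torch
import torch.nn.functional as F

def training_step(model, depth_gt, image_semantics):
    # depth_gt: ground-truth depth normalized to [-1, 1], shape [B, 1, H, W]
    # image_semantics: frozen ViT features conditioning the DiT
    noise = torch.randn_like(depth_gt)
    t = torch.rand(depth_gt.shape[0], device=depth_gt.device)  # timestep in [0, 1]
    t_ = t.view(-1, 1, 1, 1)
    x_t = (1.0 - t_) * depth_gt + t_ * noise                   # linear interpolation path
    target_velocity = noise - depth_gt                         # rectified-flow target
    pred_velocity = model(x_t, t, image_semantics)             # operates on pixels, no VAE
    return F.mse_loss(pred_velocity, target_velocity)
```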
Our model preserves more fine-grained details than Depth Anything v2 and MoGe 2, while being significantly more robust than Depth Pro.
We are grateful to the Depth Anything V2, MoGe and DiT teams for their code and model release. We would also like to sincerely thank the NeurIPS reviewers for their appreciation of this work (ratings: 5, 5, 5, 5).
If you find this project useful, please consider citing:
@article{xu2025pixel,
title={Pixel-perfect depth with semantics-prompted diffusion transformers},
author={Xu, Gangwei and Lin, Haotong and Luo, Hongcheng and Wang, Xianqi and Yao, Jingfeng and Zhu, Lianghui and Pu, Yuechuan and Chi, Cheng and Sun, Haiyang and Wang, Bing and others},
journal={arXiv preprint arXiv:2510.07316},
year={2025}
}