Official implementation for
Efficient 3D Semantic Segmentation with Superpoint Transformer (ICCV 2023)
Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering (3DV 2024 Oral)
If you ❤️ or simply use this project, don't forget to give the repository a ⭐, it means a lot to us!
@article{robert2023spt,
title={Efficient 3D Semantic Segmentation with Superpoint Transformer},
author={Robert, Damien and Raguet, Hugo and Landrieu, Loic},
journal={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2023}
}
@article{robert2024scalable,
title={Scalable 3D Panoptic Segmentation as Superpoint Graph Clustering},
author={Robert, Damien and Raguet, Hugo and Landrieu, Loic},
journal={Proceedings of the IEEE International Conference on 3D Vision},
year={2024}
}
Superpoint Transformer (SPT) is a superpoint-based transformer 🤖 architecture that efficiently ⚡ performs semantic segmentation on large-scale 3D scenes. This method includes a fast algorithm that partitions 🧩 point clouds into a hierarchical superpoint structure, as well as a self-attention mechanism to exploit the relationships between superpoints at multiple scales.
✨ SPT in numbers ✨ |
---|
S3DIS 6-Fold (76.0 mIoU) |
KITTI-360 Val (63.5 mIoU) |
DALES (79.6 mIoU) |
212k parameters (PointNeXt ÷ 200, Stratified Transformer ÷ 40) |
⚡ S3DIS training in 3h on 1 GPU (PointNeXt ÷ 7, Stratified Transformer ÷ 70) |
⚡ Preprocessing 7× faster than SPG |
SuperCluster is a superpoint-based architecture for panoptic segmentation of (very) large 3D scenes, based on SPT. We formulate the panoptic segmentation task as a scalable superpoint graph clustering task. To this end, our model is trained to predict the input parameters of a graph optimization problem whose solution is a panoptic segmentation. This formulation allows supervising our model with per-node and per-edge objectives only, circumventing the need for computing an actual panoptic segmentation and associated matching issues at train time. At inference time, our fast parallelized algorithm solves the small graph optimization problem, yielding object instances. Due to its lightweight backbone and scalable formulation, SuperCluster can process scenes of unprecedented scale at once, on a single GPU, with fewer than 1M parameters.
✨ SuperCluster in numbers ✨ |
---|
S3DIS 6-Fold (55.9 PQ) |
S3DIS Area 5 (50.1 PQ) |
ScanNet Val (58.7 PQ) |
KITTI-360 Val (48.3 PQ) |
DALES (61.2 PQ) |
212k parameters (PointGroup ÷ 37) |
⚡ S3DIS training in 4h on 1 GPU |
⚡ 7.8 km² tile of 18M points in 10.1s on 1 GPU |
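To make the idea of turning per-node and per-edge predictions into object instances more concrete, here is a deliberately simplified sketch. It is not the graph optimization solved by SuperCluster: it swaps in plain thresholding followed by connected components (union-find). The function name, the 0.5 threshold, and the toy inputs are all assumptions made for illustration.

```python
def cluster_superpoints(node_classes, edges, affinities, threshold=0.5):
    """Toy instance grouping: merge adjacent superpoints that share a
    predicted class and whose predicted edge affinity exceeds `threshold`.
    This is a connected-components stand-in, NOT the SuperCluster solver.
    """
    parent = list(range(len(node_classes)))

    def find(i):
        # Follow parents up to the root, compressing the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), affinity in zip(edges, affinities):
        if affinity > threshold and node_classes[i] == node_classes[j]:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive instance indices, in order of appearance
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(node_classes))]


# Hypothetical toy scene: 5 superpoints, 2 semantic classes, a chain of 4 edges
node_classes = [0, 0, 1, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
affinities = [0.9, 0.8, 0.7, 0.2]

instances = cluster_superpoints(node_classes, edges, affinities)
print(instances)  # one instance index per superpoint
```

In this toy scene, the second edge is ignored because its endpoints have different classes, and the last edge is ignored because its affinity is below the threshold.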
- 27.06.2024 Released our Superpoint Transformer tutorial slides, notebook, and video. Check these out if you are getting started with the project!
- 21.06.2024 Damien will be giving a tutorial on Superpoint Transformer on 27.06.2024 at 1pm CEST. Make sure to come if you want to gain some hands-on experience with the project! Registration here.
- 28.02.2024 Major code release for panoptic segmentation, implementing Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering. This new version also implements long-awaited features such as Lightning's predict() behavior, voxel-resolution and full-resolution prediction. Some changes in the dependencies and repository structure are not backward-compatible. If you were already using an earlier version of the code, we recommend re-installing your conda environment and re-running the preprocessing of your datasets.
- 15.10.2023 Our paper Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering was accepted for an oral presentation at 3DV 2024
- 06.10.2023 Come see our poster for Efficient 3D Semantic Segmentation with Superpoint Transformer at ICCV 2023
- 14.07.2023 Our paper Efficient 3D Semantic Segmentation with Superpoint Transformer was accepted at ICCV 2023
- 15.06.2023 Official release
This project was tested with:
- Linux OS
- 64G RAM
- NVIDIA GTX 1080 Ti 11G, NVIDIA V100 32G, NVIDIA A40 48G
- CUDA 11.8 and 12.1
- conda 23.3.1
Simply run install.sh to install all dependencies in a new conda environment named spt.
# Creates a conda env named 'spt' and installs dependencies
./install.sh
Note: See the Datasets page for setting up your dataset path and file structure.
└── superpoint_transformer
    │
    ├── configs                   # Hydra configs
    │   ├── callbacks                 # Callbacks configs
    │   ├── data                      # Data configs
    │   ├── debug                     # Debugging configs
    │   ├── experiment                # Experiment configs
    │   ├── extras                    # Extra utilities configs
    │   ├── hparams_search            # Hyperparameter search configs
    │   ├── hydra                     # Hydra configs
    │   ├── local                     # Local configs
    │   ├── logger                    # Logger configs
    │   ├── model                     # Model configs
    │   ├── paths                     # Project paths configs
    │   ├── trainer                   # Trainer configs
    │   │
    │   ├── eval.yaml                 # Main config for evaluation
    │   └── train.yaml                # Main config for training
    │
    ├── data                      # Project data (see docs/datasets.md)
    │
    ├── docs                      # Documentation
    │
    ├── logs                      # Logs generated by hydra and lightning loggers
    │
    ├── media                     # Media illustrating the project
    │
    ├── notebooks                 # Jupyter notebooks
    │
    ├── scripts                   # Shell scripts
    │
    ├── src                       # Source code
    │   ├── data                      # Data structure for hierarchical partitions
    │   ├── datamodules               # Lightning DataModules
    │   ├── datasets                  # Datasets
    │   ├── dependencies              # Compiled dependencies
    │   ├── loader                    # DataLoader
    │   ├── loss                      # Loss
    │   ├── metrics                   # Metrics
    │   ├── models                    # Model architecture
    │   ├── nn                        # Model building blocks
    │   ├── optim                     # Optimization
    │   ├── transforms                # Functions for transforms, pre-transforms, etc
    │   ├── utils                     # Utilities
    │   ├── visualization             # Interactive visualization tool
    │   │
    │   ├── eval.py                   # Run evaluation
    │   └── train.py                  # Run training
    │
    ├── tests                     # Tests of any kind
    │
    ├── .env.example              # Example of file for storing private environment variables
    ├── .gitignore                # List of files ignored by git
    ├── .pre-commit-config.yaml   # Configuration of pre-commit hooks for code formatting
    ├── install.sh                # Installation script
    ├── LICENSE                   # Project license
    └── README.md
Note: See the Datasets page for further details on data/.
Note: See the Logs page for further details on logs/.
See the Datasets page to set up your datasets.
Use the following command structure for evaluating our models from a checkpoint file checkpoint.ckpt, where <task> should be semantic for using SPT and panoptic for using SuperCluster:
# Evaluate for <task> segmentation on <dataset>
python src/eval.py experiment=<task>/<dataset> ckpt_path=/path/to/your/checkpoint.ckpt
Some examples:
# Evaluate SPT on S3DIS Fold 5
python src/eval.py experiment=semantic/s3dis datamodule.fold=5 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SPT on KITTI-360 Val
python src/eval.py experiment=semantic/kitti360 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SPT on DALES
python src/eval.py experiment=semantic/dales ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SuperCluster on S3DIS Fold 5
python src/eval.py experiment=panoptic/s3dis datamodule.fold=5 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SuperCluster on S3DIS Fold 5 with {wall, floor, ceiling} as 'stuff'
python src/eval.py experiment=panoptic/s3dis_with_stuff datamodule.fold=5 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SuperCluster on ScanNet Val
python src/eval.py experiment=panoptic/scannet ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SuperCluster on KITTI-360 Val
python src/eval.py experiment=panoptic/kitti360 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SuperCluster on DALES
python src/eval.py experiment=panoptic/dales ckpt_path=/path/to/your/checkpoint.ckpt
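When many evaluations need to be launched, the command structure above can be assembled programmatically. The following is a minimal sketch: the helper `build_eval_command` is hypothetical (it is not part of the repository) and only reproduces the argument layout shown in the examples above.

```python
import shlex


def build_eval_command(task, dataset, ckpt_path, fold=None):
    """Assemble the argument list for src/eval.py (hypothetical helper).

    `task` is 'semantic' (SPT) or 'panoptic' (SuperCluster), and `fold`
    is only passed for datasets that use it, such as S3DIS.
    """
    if task not in ("semantic", "panoptic"):
        raise ValueError(f"Unknown task: {task!r}")
    args = ["python", "src/eval.py", f"experiment={task}/{dataset}"]
    if fold is not None:
        args.append(f"datamodule.fold={fold}")
    args.append(f"ckpt_path={ckpt_path}")
    return args


cmd = build_eval_command(
    "panoptic", "s3dis", "/path/to/your/checkpoint.ckpt", fold=5)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell-quoting issues with unusual checkpoint paths.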
Note:
The pretrained weights of the SPT and SPT-nano models for S3DIS 6-Fold, KITTI-360 Val, and DALES are available at:
The pretrained weights of the SuperCluster models for S3DIS 6-Fold, S3DIS 6-Fold with stuff, ScanNet Val, KITTI-360 Val, and DALES are available at:
Use the following command structure for training our models on a 32G-GPU, where <task> should be semantic for using SPT and panoptic for using SuperCluster:
# Train for <task> segmentation on <dataset>
python src/train.py experiment=<task>/<dataset>
Some examples:
# Train SPT on S3DIS Fold 5
python src/train.py experiment=semantic/s3dis datamodule.fold=5
# Train SPT on KITTI-360 Val
python src/train.py experiment=semantic/kitti360
# Train SPT on DALES
python src/train.py experiment=semantic/dales
# Train SuperCluster on S3DIS Fold 5
python src/train.py experiment=panoptic/s3dis datamodule.fold=5
# Train SuperCluster on S3DIS Fold 5 with {wall, floor, ceiling} as 'stuff'
python src/train.py experiment=panoptic/s3dis_with_stuff datamodule.fold=5
# Train SuperCluster on ScanNet Val
python src/train.py experiment=panoptic/scannet
# Train SuperCluster on KITTI-360 Val
python src/train.py experiment=panoptic/kitti360
# Train SuperCluster on DALES
python src/train.py experiment=panoptic/dales
Use the following to train on an 11G-GPU (training time and performance may vary):
# Train SPT on S3DIS Fold 5
python src/train.py experiment=semantic/s3dis_11g datamodule.fold=5
# Train SPT on KITTI-360 Val
python src/train.py experiment=semantic/kitti360_11g
# Train SPT on DALES
python src/train.py experiment=semantic/dales_11g
# Train SuperCluster on S3DIS Fold 5
python src/train.py experiment=panoptic/s3dis_11g datamodule.fold=5
# Train SuperCluster on S3DIS Fold 5 with {wall, floor, ceiling} as 'stuff'
python src/train.py experiment=panoptic/s3dis_with_stuff_11g datamodule.fold=5
# Train SuperCluster on ScanNet Val
python src/train.py experiment=panoptic/scannet_11g
# Train SuperCluster on KITTI-360 Val
python src/train.py experiment=panoptic/kitti360_11g
# Train SuperCluster on DALES
python src/train.py experiment=panoptic/dales_11g
Note: Encountering CUDA Out-Of-Memory errors? See our dedicated troubleshooting section.
Note: Other ready-to-use configs are provided in configs/experiment/. You can easily design your own experiments by composing configs:
# Train Nano-3 for 50 epochs on DALES
python src/train.py datamodule=dales model=nano-3 trainer.max_epochs=50
See Lightning-Hydra for more information on how the config system works and all the awesome perks of the Lightning+Hydra combo.
Note: By default, your logs will automatically be uploaded to Weights and Biases, from where you can track and compare your experiments. Other loggers are available in configs/logger/. See Lightning-Hydra for more information on the logging options.
Both SPT and SuperCluster inherit from LightningModule and implement predict_step(), which permits using PyTorch Lightning's Trainer.predict() mechanism.
from src.models.semantic import SemanticSegmentationModule
from torch.utils.data import DataLoader
from pytorch_lightning import Trainer
# Predict behavior for semantic segmentation from a DataLoader
# (any DataLoader yielding batches in the format the model expects)
dataloader = DataLoader(...)
model = SemanticSegmentationModule(...)
trainer = Trainer(...)
# Trainer.predict() returns a list with one predict_step() result per batch
predictions = trainer.predict(model=model, dataloaders=dataloader)
batch, output = predictions[0]
This, however, still requires you to instantiate a Trainer, a DataLoader, and a model with relevant parameters.
For a little more simplicity, all our datasets inherit from LightningDataModule and implement predict_dataloader() by pointing to their corresponding test set by default. This permits directly passing a datamodule to PyTorch Lightning's Trainer.predict() without explicitly instantiating a DataLoader.
from src.models.semantic import SemanticSegmentationModule
from src.datamodules.s3dis import S3DISDataModule
from pytorch_lightning import Trainer
# Predict behavior for semantic segmentation on S3DIS
datamodule = S3DISDataModule(...)
model = SemanticSegmentationModule(...)
trainer = Trainer(...)
# Trainer.predict() returns a list with one predict_step() result per batch
predictions = trainer.predict(model=model, datamodule=datamodule)
batch, output = predictions[0]
For more details on how to instantiate these, as well as the output format of our model, we strongly encourage you to play with our demo notebook and have a look at the src/eval.py script.
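PyTorch Lightning's Trainer.predict() returns a list holding one predict_step() result per batch. The sketch below shows one way to separate such a list into batches and outputs. It assumes each entry is a (batch, output) pair, as the snippets above suggest; the helper `split_predictions` is hypothetical, and plain strings stand in for the real objects so the example runs without the project installed.

```python
def split_predictions(predictions):
    """Split a list of (batch, output) pairs into two parallel lists.

    Assumption: every item returned by Trainer.predict() is a
    (batch, output) pair produced by the model's predict_step().
    """
    batches, outputs = [], []
    for batch, output in predictions:
        batches.append(batch)
        outputs.append(output)
    return batches, outputs


# Stand-in for the list returned by trainer.predict(...)
predictions = [("batch_0", "output_0"), ("batch_1", "output_1")]
batches, outputs = split_predictions(predictions)
print(len(batches), len(outputs))
```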
By design, our models only need to produce predictions for the superpoints of the partition.
At inference time, however, we often need the predictions on the voxels or on the full-resolution input point cloud.
See our demo notebook for more details on these.
For running a pretrained model on your own point cloud, please refer to our tutorial slides, notebook, and video.
Our hierarchical superpoint partition is computed at preprocessing time. Its construction involves several steps whose parametrization must be adapted to your specific dataset and task. Please refer to our tutorial slides, notebook, and video to better understand this process and tune it to your needs.
One specificity of SuperCluster is that the model is not trained to explicitly do panoptic segmentation, but to predict the input parameters of a superpoint graph clustering problem whose solution is a panoptic segmentation.
For this reason, the hyperparameters for this graph optimization problem are selected after training, with a grid search on the training or validation set. We find that fairly similar hyperparameters yield the best performance on all our datasets (see our paper's appendix). Yet, you may want to explore these hyperparameters for your own dataset. To this end, see our demo notebook for parameterizing the panoptic segmentation.
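As a rough illustration of such a post-training grid search, the sketch below exhaustively scores every combination of a small parameter grid. The parameter names, the candidate values, and the `dummy_pq` scoring function are all hypothetical placeholders: in practice, the scoring function would run the graph clustering on the training or validation set and return the panoptic quality.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return the best-scoring parameter combination and its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical hyperparameter names and candidate values
param_grid = {
    "regularization": [1, 10, 20],
    "spatial_weight": [0.01, 0.1],
}


def dummy_pq(params):
    # Placeholder score peaking at regularization=10, spatial_weight=0.1.
    # A real score would come from evaluating the panoptic segmentation.
    return (-abs(params["regularization"] - 10)
            - abs(params["spatial_weight"] - 0.1))


best_params, best_score = grid_search(param_grid, dummy_pq)
print(best_params, best_score)
```

Because the model does not need to be retrained between trials, each evaluation only involves re-solving the clustering problem, which keeps an exhaustive search over a small grid affordable.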
We provide notebooks to help you get started with manipulating our core data structures, configs loading, dataset and model instantiation, inference on each dataset, and visualization.
In particular, we created an interactive visualization tool ✨ which can be used to produce shareable HTMLs. Demos of how to use this tool are provided in the notebooks. Additionally, examples of such HTML files are provided in media/visualizations.7z.
Location | Content |
---|---|
README | General introduction to the project |
docs/data_structures | Introduction to the core data structures of this project: Data, NAG, Cluster, and InstanceData |
docs/datasets | Introduction to our implemented datasets, to our BaseDataset class, and how to create your own dataset inheriting from it |
docs/logging | Introduction to logging and the project's logs/ structure |
docs/visualization | Introduction to our interactive 3D visualization tool |
Note: We endeavoured to comment our code as much as possible to make this project usable. If you don't find the answer you are looking for in the docs/, make sure to have a look at the source code and past issues. Still, if you find some parts are unclear or some more documentation would be needed, feel free to let us know by creating an issue!
Here are some common issues and tips for tackling them.
Our default configurations are designed for a 32G-GPU. Yet, SPT and SuperCluster can run on an 11G-GPU, with minor time and performance variations.
We provide configs in configs/experiment/semantic for training SPT on an 11G-GPU:
# Train SPT on S3DIS Fold 5
python src/train.py experiment=semantic/s3dis_11g datamodule.fold=5
# Train SPT on KITTI-360 Val
python src/train.py experiment=semantic/kitti360_11g
# Train SPT on DALES
python src/train.py experiment=semantic/dales_11g
Similarly, we provide configs in configs/experiment/panoptic for training SuperCluster on an 11G-GPU:
# Train SuperCluster on S3DIS Fold 5
python src/train.py experiment=panoptic/s3dis_11g datamodule.fold=5
# Train SuperCluster on S3DIS Fold 5 with {wall, floor, ceiling} as 'stuff'
python src/train.py experiment=panoptic/s3dis_with_stuff_11g datamodule.fold=5
# Train SuperCluster on ScanNet Val
python src/train.py experiment=panoptic/scannet_11g
# Train SuperCluster on KITTI-360 Val
python src/train.py experiment=panoptic/kitti360_11g
# Train SuperCluster on DALES
python src/train.py experiment=panoptic/dales_11g
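The naming pattern of these configs can be captured in a small helper that picks an experiment name from the available GPU memory. This is only a sketch: the helper `experiment_name` is hypothetical, and the rule of falling back to the `_11g` configs for any GPU with less than 32G is an assumption, since only 32G and 11G setups are described here.

```python
def experiment_name(task, dataset, gpu_memory_gb):
    """Pick an experiment config name from the GPU memory (hypothetical).

    Assumption: default configs target 32G GPUs, and the '_11g' variants
    are used for anything smaller.
    """
    if task not in ("semantic", "panoptic"):
        raise ValueError(f"Unknown task: {task!r}")
    suffix = "_11g" if gpu_memory_gb < 32 else ""
    return f"{task}/{dataset}{suffix}"


name = experiment_name("panoptic", "s3dis_with_stuff", gpu_memory_gb=11)
print(f"python src/train.py experiment={name} datamodule.fold=5")
```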
Having some CUDA OOM errors? Here are some parameters you can play with to mitigate GPU memory use, based on when the error occurs.
Parameters affecting CUDA memory.
Legend: 🟡 Preprocessing | 🔴 Training | 🟣 Inference (including validation and testing during training)
Parameter | Description | When |
---|---|---|
datamodule.xy_tiling | Splits dataset tiles into xy_tiling^2 smaller tiles, based on a regular XY grid. Ideal for square-shaped tiles à la DALES. Note this will affect the number of training steps. | 🟡🟣 |
datamodule.pc_tiling | Splits dataset tiles into 2^pc_tiling smaller tiles, based on their principal component. Ideal for varying tile shapes à la S3DIS and KITTI-360. Note this will affect the number of training steps. | 🟡🟣 |
datamodule.max_num_nodes | Limits the number of nodes in the training batches. | 🔴 |
datamodule.max_num_edges | Limits the number of edges in the training batches. | 🔴 |
datamodule.voxel | Increasing voxel size will reduce preprocessing, training and inference times but will reduce performance. | 🟡🔴🟣 |
datamodule.pcp_regularization | Regularization for partition levels. The larger, the fewer the superpoints. | 🟡🔴🟣 |
datamodule.pcp_spatial_weight | Importance of the 3D position in the partition. The smaller, the fewer the superpoints. | 🟡🔴🟣 |
datamodule.pcp_cutoff | Minimum superpoint size. The larger, the fewer the superpoints. | 🟡🔴🟣 |
datamodule.graph_k_max | Maximum number of adjacent nodes in the superpoint graphs. The smaller, the fewer the superedges. | 🟡🔴🟣 |
datamodule.graph_gap | Maximum distance between adjacent superpoints in the superpoint graphs. The smaller, the fewer the superedges. | 🟡🔴🟣 |
datamodule.graph_chunk | Reduce to avoid OOM when RadiusHorizontalGraph preprocesses the superpoint graph. | 🟡 |
datamodule.dataloader.batch_size | Controls the number of loaded tiles. Each train batch is composed of batch_size * datamodule.sample_graph_k spherical samplings. Inference is performed on entire validation and test tiles, without spherical sampling. | 🔴🟣 |
datamodule.sample_segment_ratio | Randomly drops a fraction of the superpoints at each partition level. | 🔴 |
datamodule.sample_graph_k | Controls the number of spherical samples in the train batches. | 🔴 |
datamodule.sample_graph_r | Controls the radius of spherical samples in the train batches. Set to sample_graph_r<=0 to use the entire tile without spherical sampling. | 🔴 |
datamodule.sample_point_min | Controls the minimum number of sampled points in the train batches. | 🔴 |
datamodule.sample_point_max | Controls the maximum number of sampled points in the train batches. | 🔴 |
callbacks.gradient_accumulator.scheduling | Gradient accumulation. Can be used to train with smaller batches, with more training steps. | 🔴 |
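The arithmetic behind the tiling and batching parameters above can be checked with a few lines of Python. This is only a sketch of the formulas stated in the table (xy_tiling^2, 2^pc_tiling, and batch_size * sample_graph_k); the function names and the example values are hypothetical and are not taken from the project's configs.

```python
def xy_tile_count(xy_tiling):
    """Number of smaller tiles produced by a regular XY grid split."""
    return xy_tiling ** 2


def pc_tile_count(pc_tiling):
    """Number of smaller tiles produced by recursive principal-component splits."""
    return 2 ** pc_tiling


def samples_per_train_batch(batch_size, sample_graph_k):
    """Number of spherical samplings composing each train batch."""
    return batch_size * sample_graph_k


# Hypothetical example values
print(xy_tile_count(3))               # tiles per dataset tile with xy_tiling=3
print(pc_tile_count(3))               # tiles per dataset tile with pc_tiling=3
print(samples_per_train_batch(4, 4))  # samplings with batch_size=4, sample_graph_k=4
```

Since tiling multiplies the number of tiles, it also changes the number of training steps per epoch, as noted in the table.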
- This project was built using the Lightning-Hydra template.
- The main data structures of this work rely on PyTorch Geometric.
- Some point cloud operations were inspired by the Torch-Points3D framework, although not merged with the official project at this point.
- For the KITTI-360 dataset, some code from the official KITTI-360 repository was used.
- Some superpoint-graph-related operations were inspired by Superpoint Graph.
- The hierarchical superpoint partition and graph clustering are computed using Parallel Cut-Pursuit.
If your work uses all or part of the present code, please include the following citations:
@article{robert2023spt,
title={Efficient 3D Semantic Segmentation with Superpoint Transformer},
author={Robert, Damien and Raguet, Hugo and Landrieu, Loic},
journal={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2023}
}
@article{robert2024scalable,
title={Scalable 3D Panoptic Segmentation as Superpoint Graph Clustering},
author={Robert, Damien and Raguet, Hugo and Landrieu, Loic},
journal={Proceedings of the IEEE International Conference on 3D Vision},
year={2024}
}
You can find our SPT paper and SuperCluster paper on arXiv.
Also, if you ❤️ or simply use this project, don't forget to give the repository a ⭐, it means a lot to us!