Finding the Next Best View for Object Recognition through Maximum Entropy Viewpoint Selection

A collection of scripts related to my Master's thesis: a method for finding the most informative camera positions for multiview object recognition.

Thesis Report PDF: https://drive.google.com/file/d/1bxV0k1IZEmBeeNDRbrTXAjw1fXbQj_C8/view (soon to be published at https://fse.studenttheses.ub.rug.nl/31411/)

There are two methods: one based on differentiable rendering and one based on point clouds (see the thesis report for details). Both are implemented in PyTorch.

Setup 🧑‍🔧

  1. Create a conda environment:

    conda create --name nbv_mevs_env python=3.8
    conda activate nbv_mevs_env

  2. Install PyTorch (the important part is to use a version that has the entr function):

    pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu111/torch_nightly.html

  3. Install the other dependencies using pip and conda (I would have preferred to use only conda, but neural_renderer is not available there):

    pip install -r requirements_pip.txt
    conda install --file requirements.txt

You might need a separate conda environment for the point cloud method, since it was developed with a different version of PyTorch. For that, check out the am/thesis branch on my fork of PointNet2_PyTorch. The pipeline script (below) works in either environment. The graph triangulation script build_graph_from_spherical_coords (which uses stripy) might also need a separate environment.
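As background for the entr requirement in step 2: PyTorch exposes an elementwise entropy term, -x ln(x), as torch.special.entr, which is presumably the function meant there. The sketch below mirrors that definition in plain Python and uses it to pick the candidate with the highest entropy. The helper names and the idea of comparing per-view probability lists are illustrative assumptions, not the thesis code; see the report for how views are actually scored.

```python
import math


def entr(x: float) -> float:
    """Plain-Python mirror of the elementwise term -x * ln(x)."""
    if x < 0:
        return float("-inf")  # same convention as torch.special.entr
    if x == 0:
        return 0.0  # limit of -x * ln(x) as x -> 0
    return -x * math.log(x)


def entropy(probs) -> float:
    """Shannon entropy (in nats) of one discrete distribution."""
    return sum(entr(p) for p in probs)


def max_entropy_index(distributions) -> int:
    """Index of the distribution with the highest entropy (hypothetical helper)."""
    scores = [entropy(d) for d in distributions]
    return max(range(len(scores)), key=scores.__getitem__)
```

A PyTorch version without entr would fail at the point where this term is computed, which is why the install step matters more than the exact nightly build.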

Usage 🧑‍💻

The main script is classification_pipeline in the pipeline directory. Given the paths to the object mesh and the checkpoint files, plus the desired method, it runs the pipeline for that method. For more info, run:

python3 pipeline/classification_pipeline.py --help

The script evaluate_pipeline was used to run the pipeline on the entire test set.

Datasets are not tracked and can be obtained by running the scripts in the generate_datasets directory. You can get more info by running each script with the --help flag, e.g.:

python3 generate_datasets/generate_view_dataset.py --help

Note that you need ModelNet10 downloaded and extracted (classification_pipeline assumes it is in ~/datasets/ModelNet10). You can get it here.
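A quick way to confirm the dataset is where the pipeline expects it is to count the meshes per category. This is a minimal sketch, assuming the usual ModelNet10 layout of one directory per category containing .off files; count_meshes and DEFAULT_ROOT are hypothetical names, not part of this repository.

```python
from pathlib import Path

# Default location taken from the note above; the helper itself is hypothetical.
DEFAULT_ROOT = Path.home() / "datasets" / "ModelNet10"


def count_meshes(root=DEFAULT_ROOT) -> dict:
    """Map each category directory to the number of .off meshes under it."""
    root = Path(root)
    if not root.is_dir():
        return {}  # dataset missing or path is not a directory
    return {
        d.name: sum(1 for _ in d.rglob("*.off"))
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```

An empty result from count_meshes() means the directory is missing, and a category with zero meshes suggests the archive was only partially extracted.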

When training, data is cached with lmdb to speed up loading. However, the cache database files can get quite large for images (in my experience, 10-20 times the size of a PNG dataset), so make sure you have plenty of disk space.
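To check the disk space up front, the 10-20x figure can be turned into a rough estimate. The sketch below sums the PNG files of a dataset and compares a worst-case cache size against the free space on the target drive. The function names and the default factor of 20 are assumptions based on the estimate above, not values read from the training code.

```python
import shutil
from pathlib import Path


def png_dataset_bytes(dataset_dir) -> int:
    """Total size in bytes of all .png files under dataset_dir."""
    return sum(f.stat().st_size for f in Path(dataset_dir).rglob("*.png"))


def cache_fits(dataset_dir, cache_dir, factor: int = 20):
    """Return (fits, needed_bytes, free_bytes) for a worst-case cache estimate."""
    needed = png_dataset_bytes(dataset_dir) * factor
    free = shutil.disk_usage(cache_dir).free  # free space on the cache's drive
    return needed <= free, needed, free
```

The result is only an upper-bound guess; the actual database size depends on how the images are stored in the cache.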

You might need to prepend PYTHONPATH=. to the commands for the imports to work.
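The same effect can be had when launching the scripts from Python rather than from a shell. This is a minimal sketch, assuming the scripts are run from the repository root; env_with_repo_root and run_from_repo are hypothetical helpers, not part of this repository.

```python
import os
import subprocess
import sys


def env_with_repo_root(repo_root: str = ".") -> dict:
    """Copy the current environment and put repo_root first on PYTHONPATH."""
    env = dict(os.environ)
    root = os.path.abspath(repo_root)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = root + os.pathsep + existing if existing else root
    return env


def run_from_repo(script: str, *args: str, repo_root: str = "."):
    """Run a repository script with the repo root importable (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, script, *args],
        cwd=repo_root,
        env=env_with_repo_root(repo_root),
        check=False,
    )
```

For example, run_from_repo("pipeline/classification_pipeline.py", "--help") corresponds to prepending PYTHONPATH=. to the help command shown earlier.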

Visualization 📊

The figures in the thesis can be generated using the scripts in the visualization directory. The output will be saved in the assets directory.

License

MIT License