This project presents a pipeline for robot navigation that uses the ClipSeg and Segment Anything models to generate masks of traversable paths in images. The approach is well suited to paths that are already visible and have high contrast against their surroundings. To retrieve a new path where none is visible, a second pipeline prompting ClipSeg and Stable Diffusion is implemented.
*(Example results: original images alongside the final masks produced by the pipeline.)*
- Clone the repository locally and pip install `navigate-with-image-language-model` with:

```
git clone https://github.com/DmblnNicole/Navigation-with-image-language-model.git
pip install -e .
```
- Install dependencies:

```
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/facebookresearch/segment-anything.git
```
- Download the checkpoint for Segment Anything model type `vit_h` (ViT-H SAM model) and save it in the root folder of the repository.
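Once the checkpoint is in place, loading it follows the standard `segment_anything` API. This is a minimal sketch, not the repository's own code; the checkpoint filename is an assumption:

```python
def load_sam_predictor(checkpoint_path='sam_vit_h_4b8939.pth', model_type='vit_h'):
    """Load a SAM checkpoint and wrap it in a predictor.

    The import is done lazily so the function can be defined (and this
    sketch read) without segment-anything installed.
    """
    from segment_anything import sam_model_registry, SamPredictor
    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    return SamPredictor(sam)
```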
The file `pipeline/eval.py` runs the pipeline and contains all adjustable settings, such as text prompts, paths to the image data, and the model types.
- Choose your model type and specify if output masks should be visualized.

```python
if __name__ == '__main__':
    main('sam', visualize=False)
```
If `visualize=True`, the masks are saved in a new folder called `output`.
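The saving behaviour can be sketched as follows. `save_mask` is a hypothetical helper, not the repository's code; it writes each binary mask as a plain-text PGM image so the sketch needs only the standard library:

```python
import os

def save_mask(mask, name, visualize=True, out_dir='output'):
    """Write a binary mask (2-D list of 0/1) as an ASCII PGM file.

    Mirrors the pipeline's behaviour: masks are only written when
    visualize is True, into an output folder created on first use.
    """
    if not visualize:
        return None
    os.makedirs(out_dir, exist_ok=True)
    h, w = len(mask), len(mask[0])
    path = os.path.join(out_dir, name + '.pgm')
    with open(path, 'w') as f:
        f.write(f'P2\n{w} {h}\n255\n')  # PGM header: magic, size, max value
        for row in mask:
            f.write(' '.join('255' if v else '0' for v in row) + '\n')
    return path
```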
Change text prompts and upload your own dataset.
- Upload image data and specify the path:

```python
data_path = '../data/images/hike/edge'
```

- Upload ground truth masks and specify the path:

```python
GT_dir = '../data/GT/GT_hike'
```

- Choose your text prompt:

```python
word_mask = 'A bright photo of a road to walk on'
```
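ClipSeg scores every pixel for similarity to the text prompt, and turning those scores into a binary mask is a simple threshold. A minimal, dependency-free sketch of that step; the threshold value is an assumption, not the pipeline's setting:

```python
def prompt_scores_to_mask(scores, threshold=0.5):
    """Binarize per-pixel prompt-similarity scores into a 0/1 mask.

    scores: 2-D list of floats in [0, 1], e.g. ClipSeg logits after a
    sigmoid. Pixels at or above the threshold count as path.
    """
    return [[1 if s >= threshold else 0 for s in row] for row in scores]
```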
The pipeline comprising ClipSeg and Segment Anything was evaluated on a dataset extracted from YouTube videos, shown above. The dataset consists of images with visible, high-contrast paths. While the primary objective is to segment paths that are already visible, the method also produces results on images of forest terrain where no clear path is visible.
*(Example results on forest terrain: final masks produced by the pipeline.)*
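With ground-truth masks available, the pipeline's output can be scored; intersection-over-union (IoU) is the usual metric for this kind of segmentation evaluation. A minimal sketch (the repository may compute its scores differently):

```python
def iou(pred, gt):
    """Intersection-over-union between two binary masks (2-D lists of 0/1)."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0
```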