Navigation with image language model

This project presents a pipeline for robot navigation that uses the ClipSeg and Segment Anything models to generate masks for traversable paths in images. The approach is well suited to paths that are already visible and have high contrast. To retrieve a new path, a second pipeline that prompts ClipSeg and Stable Diffusion is implemented.

ClipSeg and Segment Anything

Original image → final mask (example frame from the YouTube dataset).
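
For reference, here is a minimal sketch of such a two-stage pipeline. It is not the repository's code: the Hugging Face ClipSeg checkpoint, the top-5 point-sampling heuristic, and the file names are assumptions made for illustration.

import numpy as np
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
from segment_anything import sam_model_registry, SamPredictor

# Stage 1: ClipSeg turns a text prompt into a coarse relevance heatmap.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
clipseg = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")
image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
inputs = processor(text=["A bright photo of a road to walk on"],
                   images=[image], return_tensors="pt")
with torch.no_grad():
    heat = torch.sigmoid(clipseg(**inputs).logits).squeeze().numpy()  # 352x352

# Stage 2: the hottest heatmap locations become point prompts for SAM,
# which refines them into a sharp traversability mask.
ys, xs = np.unravel_index(np.argsort(heat, axis=None)[-5:], heat.shape)
points = np.stack([xs * image.width / heat.shape[1],
                   ys * image.height / heat.shape[0]], axis=1)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(np.array(image))
masks, _, _ = predictor.predict(point_coords=points,
                                point_labels=np.ones(len(points)),  # all foreground
                                multimask_output=False)
final_mask = masks[0]  # boolean HxW mask of the traversable path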

ClipSeg and Stable Diffusion

Original image → final mask (example from the hike dataset).
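
The README does not spell out how Stable Diffusion is prompted here; the sketch below shows one generic way to chain the two models, feeding a ClipSeg mask into Stable Diffusion inpainting through the diffusers library. It is illustrative only and not taken from this repository.

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Illustrative only: inpaint a walkable path into the region a ClipSeg
# mask highlights. Both input file names are hypothetical.
image = Image.open("example.jpg").convert("RGB")
mask = Image.open("clipseg_mask.png").convert("L")  # white = region to repaint

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
result = pipe(prompt="a clear path to walk on",
              image=image, mask_image=mask).images[0]
result.save("sd_inpainted.png")

A final mask would then be obtained by segmenting the generated image again, for example with the ClipSeg + Segment Anything stage sketched above.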

Installation

  1. Clone the repository and install the navigate-with-image-language-model package locally:
git clone https://github.com/DmblnNicole/Navigation-with-image-language-model.git
pip install -e .
  2. Install dependencies:
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/facebookresearch/segment-anything.git
  3. Download the checkpoint for Segment Anything model type vit_h here: ViT-H SAM model and save it in the root folder of the repository. A quick sanity check follows this list.
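
To verify the setup, a minimal check such as the following can be used. It is not part of the repository, and it assumes the checkpoint was saved under its standard file name sam_vit_h_4b8939.pth:

# Sanity check (not part of the repository): confirm both dependencies
# import and the downloaded ViT-H checkpoint loads from the repository root.
import clip  # openai/CLIP
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
print("CLIP imported and SAM ViT-H checkpoint loaded.")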

Getting started

The file pipeline/eval.py runs the pipeline and contains all adjustable settings, such as text prompts, paths to image data, and the model types.

  • Choose your model type and specify whether output masks should be visualized:
if __name__ == '__main__':
    main('sam', visualize=False)

If visualize=True, the masks are saved to a new folder called output.
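
Assuming the Stable Diffusion variant runs through the same entry point, switching variants might look as follows; the 'sd' model-type string is a guess, so check pipeline/eval.py for the actual values:

if __name__ == '__main__':
    # Hypothetical: 'sd' would select the ClipSeg + Stable Diffusion pipeline;
    # the actual model-type strings are defined in pipeline/eval.py.
    main('sd', visualize=True)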

Optional

Change text prompts and upload your own dataset; a combined example follows this list.

  • Upload image data and specify path
data_path='../data/images/hike/edge'
  • Upload ground truth masks and specify path
GT_dir = '../data/GT/GT_hike'
  • Choose your text prompt
word_mask='A bright photo of a road to walk on'
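
Put together, the adjustable values inside pipeline/eval.py look like this (the values are the examples above; the comments are added here for illustration):

data_path = '../data/images/hike/edge'             # input images
GT_dir = '../data/GT/GT_hike'                      # ground-truth masks
word_mask = 'A bright photo of a road to walk on'  # ClipSeg text prompt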

Experimental Results

The pipeline comprising ClipSeg and Segment Anything was evaluated on a dataset extracted from YouTube videos, as shown above. This dataset consists of images with visible, high-contrast paths. While the primary objective is to segment paths that are already visible, the method also produces results for images of forest terrain where no clear path is visible.
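
The README provides a ground-truth directory but does not state the evaluation metric; intersection-over-union is a standard choice for mask quality and is shown below purely as an illustration:

import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    # Intersection-over-union between two binary masks of equal shape.
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0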

Two final masks on forest-terrain images from a wide-angle front camera.

About

Project supervised by Robotic Systems Lab in the context of Perception and Learning at ETH 2023.
