Accelerating SAM #637

Open · wants to merge 1 commit into base: main
14 changes: 5 additions & 9 deletions CODE_OF_CONDUCT.md
@@ -23,13 +23,13 @@ include:
Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
professional setting

## Our Responsibilities

@@ -52,14 +52,10 @@ project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

This Code of Conduct also applies outside the project spaces when there is a
reasonable belief that an individual's behavior may have a negative impact on
the project or its community.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <opensource-conduct@fb.com>. All
reported by contacting the project team at <conduct@pytorch.org>. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
@@ -77,4 +73,4 @@ available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
https://www.contributor-covenant.org/faq
13 changes: 7 additions & 6 deletions CONTRIBUTING.md
@@ -1,31 +1,32 @@
# Contributing to segment-anything
# Contributing to segment-anything-fast
We want to make contributing to this project as easy and transparent as
possible.


## Pull Requests
We actively welcome your pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints, using the `linter.sh` script in the project's root directory. Linting requires `black==23.*`, `isort==5.12.0`, `flake8`, and `mypy`.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.
to do this once to work on any of Meta's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
Meta has a [bounty program](https://www.facebook.com/whitehat/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.

## License
By contributing to segment-anything, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
By contributing to `segment-anything-fast`, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
6 changes: 3 additions & 3 deletions LICENSE
@@ -1,4 +1,4 @@
Apache License
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright [2023] Lightning AI

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -198,4 +198,4 @@
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
limitations under the License.
173 changes: 37 additions & 136 deletions README.md
@@ -1,171 +1,72 @@
# Segment Anything
# Segment anything ... Fast

**[Meta AI Research, FAIR](https://ai.facebook.com/research/)**
This work is based on a fork of https://github.com/facebookresearch/segment-anything

[Alexander Kirillov](https://alexander-kirillov.github.io/), [Eric Mintun](https://ericmintun.github.io/), [Nikhila Ravi](https://nikhilaravi.com/), [Hanzi Mao](https://hanzimao.me/), Chloe Rolland, Laura Gustafson, [Tete Xiao](https://tetexiao.com), [Spencer Whitehead](https://www.spencerwhitehead.com/), Alex Berg, Wan-Yen Lo, [Piotr Dollar](https://pdollar.github.io/), [Ross Girshick](https://www.rossgirshick.info/)
The corresponding blog post is https://pytorch.org/blog/accelerating-generative-ai/

[[`Paper`](https://ai.facebook.com/research/publications/segment-anything/)] [[`Project`](https://segment-anything.com/)] [[`Demo`](https://segment-anything.com/demo)] [[`Dataset`](https://segment-anything.com/dataset/index.html)] [[`Blog`](https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/)] [[`BibTeX`](#citing-segment-anything)]

![SAM design](assets/model_diagram.png?raw=true)

The **Segment Anything Model (SAM)** produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a [dataset](https://segment-anything.com/dataset/index.html) of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.

<p float="left">
<img src="assets/masks1.png?raw=true" width="37.25%" />
<img src="assets/masks2.jpg?raw=true" width="61.5%" />
</p>

## Installation

The code requires `python>=3.8`, as well as `pytorch>=1.7` and `torchvision>=0.8`. Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

Install Segment Anything:
Step 1

```
pip install git+https://github.com/facebookresearch/segment-anything.git
```

or clone the repository locally and install with

```
git clone [email protected]:facebookresearch/segment-anything.git
cd segment-anything; pip install -e .
```
Get the latest PyTorch nightly

The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. `jupyter` is also required to run the example notebooks.

For example:
```
pip install opencv-python pycocotools matplotlib onnxruntime onnx
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
```

## <a name="GettingStarted"></a>Getting Started
Installation instructions vary by platform. Please see the website https://pytorch.org/

First download a [model checkpoint](#model-checkpoints). Then the model can be used in just a few lines to get masks from a given prompt:

```
from segment_anything import SamPredictor, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
predictor = SamPredictor(sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)
```
Step 2

or generate masks for an entire image:
Install the package

```
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(<your_image>)
pip install git+https://github.com/pytorch-labs/segment-anything-fast.git
```

Additionally, masks can be generated for images from the command line:
## Usage

```
python scripts/amg.py --checkpoint <path/to/checkpoint> --model-type <model_type> --input <image_or_folder> --output <path/to/output>
```
The package acts like a drop-in replacement for segment-anything.

See the examples notebooks on [using SAM with prompts](/notebooks/predictor_example.ipynb) and [automatically generating masks](/notebooks/automatic_mask_generator_example.ipynb) for more details.
So, for example, if you're currently doing `from segment_anything import sam_model_registry`, you should be able to do `from segment_anything_fast import sam_model_registry`.
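
A minimal sketch of the swap (the `vit_h` model type and checkpoint path below are placeholders for whatever you use today):

```python
# Before: from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
# After — only the package name changes:
from segment_anything_fast import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="checkpoints/sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)
```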

<p float="left">
<img src="assets/notebook1.png?raw=true" width="49.1%" />
<img src="assets/notebook2.png?raw=true" width="48.9%" />
</p>
However, you're likely here because you want to try a fast inference version. So we also created a `sam_model_fast_registry` that automatically (see the sketch after this list):
- Sets `eval` mode
- Uses `bfloat16`
- Enables `torch.compile` with max-autotune
- Uses a custom Triton kernel that implements SDPA for relative positional encodings for long sequence lengths
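
A minimal sketch of the fast registry in use, mirroring `amg_example/amg_example.py` (the checkpoint path is the one that example assumes):

```python
from segment_anything_fast import sam_model_fast_registry, SamAutomaticMaskGenerator

# The fast registry builds the same ViT-H model, but already in eval mode,
# cast to bfloat16, and wrapped in torch.compile with max-autotune.
sam = sam_model_fast_registry["vit_h"](checkpoint="checkpoints/sam_vit_h_4b8939.pth")
sam.to(device="cuda")
mask_generator = SamAutomaticMaskGenerator(sam, process_batch_size=8)
masks = mask_generator.generate(image)  # `image` is an RGB numpy array
```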

## ONNX Export
The custom Triton kernel in particular was written for the A100. If you're not using an A100, we will try to rerun autotuning on your device and locally save the best configs.
You might still run into performance issues, so you can disable the kernel by setting the environment variable `SEGMENT_ANYTHING_FAST_USE_FLASH_4=0`.
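
For example, one way to do that from Python — the exact point at which the package reads the variable isn't documented here, so setting it before the import is the safe assumption (exporting it in your shell works just as well):

```python
import os

# Disable the custom Triton SDPA kernel; set before importing the package.
os.environ["SEGMENT_ANYTHING_FAST_USE_FLASH_4"] = "0"

from segment_anything_fast import sam_model_fast_registry
```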

SAM's lightweight mask decoder can be exported to ONNX format so that it can be run in any environment that supports ONNX runtime, such as in-browser as showcased in the [demo](https://segment-anything.com/demo). Export the model with
Please also note that the first time you run this model, you'll likely need to wait a bit for it to compile.

```
python scripts/export_onnx_model.py --checkpoint <path/to/checkpoint> --model-type <model_type> --output <path/to/output>
```
If you'd like to see the details on how to reproduce all results, please see the README in the experiments folder above.

See the [example notebook](https://github.com/facebookresearch/segment-anything/blob/main/notebooks/onnx_model_example.ipynb) for details on how to combine image preprocessing via SAM's backbone with mask prediction using the ONNX model. It is recommended to use the latest stable version of PyTorch for ONNX export.
Please don't hesitate to open a GitHub issue if you're missing functionality or find a bug. Thank you.

### Web demo
## Results

The `demo/` folder has a simple one page React app which shows how to run mask prediction with the exported ONNX model in a web browser with multithreading. Please see [`demo/README.md`](https://github.com/facebookresearch/segment-anything/blob/main/demo/README.md) for more details.
The results show a waterfall of techniques.

## <a name="Models"></a>Model Checkpoints
Left to right, these techniques are combined.

Three model versions of the model are available with different backbone sizes. These models can be instantiated by running
That means the very last bar is the combination of:
- bfloat16
- torch.compile with max-autotune
- [torch.scaled_dot_product_attention](https://pytorch.org/docs/main/generated/torch.nn.functional.scaled_dot_product_attention.html)
- A custom Triton kernel that implements SDPA for relative positional encodings for long sequence lengths
- NestedTensors
- Dynamic int8 symmetric quantization
- 2:4 sparse format
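
As a rough illustration of the first three items only (a sketch, not the repository's implementation; `model`, `q`, `k`, and `v` are placeholders):

```python
import torch
import torch.nn.functional as F

# bfloat16 + eval mode, then torch.compile with max-autotune.
model = torch.nn.Linear(8, 8)  # placeholder standing in for the SAM image encoder
model = model.eval().to(torch.bfloat16)
model = torch.compile(model, mode="max-autotune")

# Fused SDPA replaces a manual softmax(q @ k.T / sqrt(d)) @ v.
q = k = v = torch.randn(1, 4, 16, 8, dtype=torch.bfloat16)
out = F.scaled_dot_product_attention(q, k, v)
```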

```
from segment_anything import sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
```

Click the links below to download the checkpoint for the corresponding model type.

- **`default` or `vit_h`: [ViT-H SAM model.](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth)**
- `vit_l`: [ViT-L SAM model.](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth)
- `vit_b`: [ViT-B SAM model.](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth)

## Dataset

See [here](https://ai.facebook.com/datasets/segment-anything/) for an overview of the datastet. The dataset can be downloaded [here](https://ai.facebook.com/datasets/segment-anything-downloads/). By downloading the datasets you agree that you have read and accepted the terms of the SA-1B Dataset Research License.

We save masks per image as a json file. It can be loaded as a dictionary in python in the below format.

```python
{
    "image"            : image_info,
    "annotations"      : [annotation],
}

image_info {
    "image_id"         : int,              # Image id
    "width"            : int,              # Image width
    "height"           : int,              # Image height
    "file_name"        : str,              # Image filename
}

annotation {
    "id"               : int,              # Annotation id
    "segmentation"     : dict,             # Mask saved in COCO RLE format.
    "bbox"             : [x, y, w, h],     # The box around the mask, in XYWH format
    "area"             : int,              # The area in pixels of the mask
    "predicted_iou"    : float,            # The model's own prediction of the mask's quality
    "stability_score"  : float,            # A measure of the mask's quality
    "crop_box"         : [x, y, w, h],     # The crop of the image used to generate the mask, in XYWH format
    "point_coords"     : [[x, y]],         # The point coordinates input to the model to generate the mask
}
```

Image ids can be found in sa_images_ids.txt which can be downloaded using the above [link](https://ai.facebook.com/datasets/segment-anything-downloads/) as well.

To decode a mask in COCO RLE format into binary:

```
from pycocotools import mask as mask_utils
mask = mask_utils.decode(annotation["segmentation"])
```

See [here](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py) for more instructions to manipulate masks stored in RLE format.
![High level results](experiments/bar_chart.svg)

## License

The model is licensed under the [Apache 2.0 license](LICENSE).

## Contributing

See [contributing](CONTRIBUTING.md) and the [code of conduct](CODE_OF_CONDUCT.md).

## Contributors

The Segment Anything project was made possible with the help of many contributors (alphabetical):

Aaron Adcock, Vaibhav Aggarwal, Morteza Behrooz, Cheng-Yang Fu, Ashley Gabriel, Ahuva Goldstand, Allen Goodman, Sumanth Gurram, Jiabo Hu, Somya Jain, Devansh Kukreja, Robert Kuo, Joshua Lane, Yanghao Li, Lilian Luong, Jitendra Malik, Mallika Malhotra, William Ngan, Omkar Parkhi, Nikhil Raina, Dirk Rowe, Neil Sejoor, Vanessa Stark, Bala Varadarajan, Bram Wasti, Zachary Winstrom

## Citing Segment Anything

If you use SAM or SA-1B in your research, please use the following BibTeX entry.

```
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
```
`segment-anything-fast` is released under the [Apache 2.0](https://github.com/pytorch-labs/segment-anything-fast/blob/main/LICENSE) license.
5 changes: 5 additions & 0 deletions amg_example/README.md
@@ -0,0 +1,5 @@
To run this example you need to download the `vit_h` checkpoint and put it into a local folder named `checkpoints`.

You can find the checkpoint for `vit_h` here: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

To read the image you also need to install opencv-python: https://pypi.org/project/opencv-python/
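
A minimal setup sketch using only the standard library (the folder and file names match what `amg_example.py` expects; the checkpoint is a large download, roughly 2.4 GB):

```python
import os
import urllib.request

# Create the folder amg_example.py looks in and fetch the ViT-H weights.
os.makedirs("checkpoints", exist_ok=True)
ckpt = "checkpoints/sam_vit_h_4b8939.pth"
if not os.path.exists(ckpt):
    urllib.request.urlretrieve(
        "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
        ckpt,
    )
```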
78 changes: 78 additions & 0 deletions amg_example/amg_example.py
@@ -0,0 +1,78 @@
import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2
import torch.utils.benchmark as benchmark

def profiler_runner(path, fn, *args, **kwargs):
    with torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CPU,
                        torch.profiler.ProfilerActivity.CUDA],
            record_shapes=True) as prof:
        result = fn(*args, **kwargs)
    print(f"Saving trace under {path}")
    prof.export_chrome_trace(path)
    return result

def show_anns(anns):
    if len(anns) == 0:
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)

    img = np.ones((sorted_anns[0]['segmentation'].shape[0], sorted_anns[0]['segmentation'].shape[1], 4))
    img[:,:,3] = 0
    for ann in sorted_anns:
        m = ann['segmentation']
        color_mask = np.concatenate([np.random.random(3), [0.35]])
        img[m] = color_mask
    ax.imshow(img)

image = cv2.imread('dog.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)


from segment_anything_fast import sam_model_registry, sam_model_fast_registry, SamAutomaticMaskGenerator

sam_checkpoint = "checkpoints/sam_vit_h_4b8939.pth"
model_type = "vit_h"
device = "cuda"

sam = sam_model_fast_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
mask_generator = SamAutomaticMaskGenerator(sam, process_batch_size=8)

# Run thrice for warmup
masks = mask_generator.generate(image)
masks = mask_generator.generate(image)
masks = mask_generator.generate(image)

# Save an example
plt.figure(figsize=(image.shape[1]/100., image.shape[0]/100.), dpi=100)
plt.imshow(image)
show_anns(masks)
plt.axis('off')
plt.tight_layout()
plt.savefig('dog_mask_fast.png', format='png')

# Benchmark
torch.cuda.synchronize()
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)
start_event.record()
for _ in range(10):
    masks = mask_generator.generate(image)
end_event.record()
torch.cuda.synchronize()
# elapsed_time returns milliseconds; report the average per run.
print(start_event.elapsed_time(end_event) / 10.)

# Save a GPU trace
profiler_runner(f"amg_example_trace.json.gz", mask_generator.generate, image)

# Write out memory usage
max_memory_allocated_bytes = torch.cuda.max_memory_allocated()
_, total_memory = torch.cuda.mem_get_info()
max_memory_allocated_percentage = int(100 * (max_memory_allocated_bytes / total_memory))
max_memory_allocated_bytes = max_memory_allocated_bytes >> 20
print(f"memory(MiB): {max_memory_allocated_bytes} memory(%): {max_memory_allocated_percentage}")
Binary file added amg_example/amg_example_trace.json.gz
Binary file not shown.
File renamed without changes
Binary file added amg_example/dog_mask.png
Binary file not shown.
Binary file added amg_example/dog_mask_fast.png
Binary file not shown.
Binary file removed assets/masks1.png
Binary file not shown.
Binary file removed assets/masks2.jpg
Binary file not shown.
Binary file removed assets/minidemo.gif
Binary file not shown.
Binary file removed assets/model_diagram.png
Binary file not shown.
Binary file removed assets/notebook1.png
Binary file not shown.
Binary file removed assets/notebook2.png
Binary file not shown.