This is a U-Net model designed to perform semantic segmentation. The model was trained from scratch on the CamVid dataset using the PyTorch framework. Training used median frequency balancing for class weighting. For details about the original floating-point model, see U-Net: Convolutional Networks for Biomedical Image Segmentation.
The model input is a blob that consists of a single image with shape 1x3x368x480
in BGR order. The pixel values are integers in the [0, 255] range.
The model output for unet-camvid-onnx-0001
is, for each input pixel, the probability of that pixel belonging to each of the 12 classes of the CamVid dataset.
Metric | Value |
---|---|
GFlops | 260.1 |
MParams | 31.03 |
Source framework | PyTorch* |
The quality metrics were calculated on the CamVid validation dataset. The unlabeled
class was ignored during metrics calculation.
Metric | Value |
---|---|
mIoU | 71.95% |
IoU = TP / (TP + FN + FP), where:

- TP - number of true positive pixels for a given class
- FN - number of false negative pixels for a given class
- FP - number of false positive pixels for a given class
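
As an illustration of the formula above, here is a minimal sketch of per-class IoU and its mean on integer label maps, with one label excluded from the calculation. The function names `per_class_iou` and `mean_iou`, and the `ignore_label` parameter, are hypothetical. The index of the unlabeled class is not stated here, so it is passed in explicitly. This is not the evaluation code that produced the reported value, which may accumulate counts differently across the dataset.

```python
import numpy as np


def per_class_iou(pred, target, num_classes, ignore_label):
    """Return {class_id: IoU} computed as TP / (TP + FN + FP).

    pred and target are integer label maps of the same shape.
    Pixels whose ground-truth label equals ignore_label are skipped
    (assumption: this mirrors ignoring the unlabeled class).
    """
    valid = target != ignore_label
    pred, target = pred[valid], target[valid]

    ious = {}
    for c in range(num_classes):
        if c == ignore_label:
            continue
        tp = np.sum((pred == c) & (target == c))
        fp = np.sum((pred == c) & (target != c))
        fn = np.sum((pred != c) & (target == c))
        denom = tp + fp + fn
        # A class absent from both maps has no defined IoU.
        ious[c] = tp / denom if denom > 0 else float("nan")
    return ious


def mean_iou(ious):
    """Average the per-class IoU values, skipping undefined (NaN) entries."""
    return float(np.nanmean(list(ious.values())))
```

Passing the ignored label as a parameter keeps the sketch independent of any particular class ordering.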
Image, shape - 1,3,368,480, format is B,C,H,W, where:

- B - batch size
- C - channel
- H - height
- W - width

Channel order is BGR.
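
The layout described above can be sketched with NumPy as follows. The helper name `to_input_blob` is hypothetical, and the sketch assumes the image is already a 368x480 BGR array in H,W,C order (as returned by many image loaders). Resizing and any dtype conversion a particular runtime may need are left out, since they are not specified here.

```python
import numpy as np

EXPECTED_HWC = (368, 480, 3)  # height, width, channels (BGR)


def to_input_blob(image_bgr):
    """Rearrange an H,W,C BGR image into a 1,3,368,480 (B,C,H,W) blob.

    Pixel values are left untouched, i.e. integers in the [0, 255] range.
    """
    if image_bgr.shape != EXPECTED_HWC:
        raise ValueError(f"expected shape {EXPECTED_HWC}, got {image_bgr.shape}")
    chw = image_bgr.transpose(2, 0, 1)  # H,W,C -> C,H,W
    return np.ascontiguousarray(chw[np.newaxis, ...])  # add batch dimension
```

For example, `to_input_blob(np.zeros((368, 480, 3), dtype=np.uint8)).shape` is `(1, 3, 368, 480)`.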
Semantic segmentation class probabilities map, shape - 1,12,368,480, output data format is B,C,H,W, where:

- B - batch size
- C - predicted probabilities of the input pixel belonging to class C, in the [0, 1] range
- H - vertical coordinate of the input pixel
- W - horizontal coordinate of the input pixel
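
A common way to turn such a probability map into a segmentation mask is to take the most probable class at each pixel. The sketch below assumes the output is available as a NumPy array with the shape given above. The helper name `probabilities_to_class_map` is hypothetical and not part of any model API.

```python
import numpy as np

NUM_CLASSES = 12  # CamVid classes, as stated for this model


def probabilities_to_class_map(probs):
    """Convert a B,C,H,W probability map into a B,H,W map of class indices."""
    if probs.ndim != 4 or probs.shape[1] != NUM_CLASSES:
        raise ValueError(f"expected shape (B, {NUM_CLASSES}, H, W), got {probs.shape}")
    # argmax over the class axis picks the most probable class per pixel.
    return probs.argmax(axis=1)
```

The result has one integer per pixel, which can be compared directly against a ground-truth label map of the same height and width.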
[*] Other names and brands may be claimed as the property of others.