In the past few years, a number of neural networks for image segmentation have been designed with considerable success. One of the most prominent examples is the U-Net by Ronneberger et al. (2015) [1]. The network has a symmetric, U-shaped architecture, from which it gets its name.
CONTRACTING PATH
The contracting path is a convolutional network that doubles the number of feature channels at each downsampling step. Each step applies two unpadded 3x3 convolutions, each followed by a ReLU, and then a 2x2 max pooling layer with a stride of 2.
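The effect of the unpadded convolutions and the pooling on the feature map shape can be traced with a few lines of arithmetic. The following is a minimal sketch, not the implementation used here: the function name `contracting_path`, the 64 starting channels, the four pooling steps, and the 572-pixel input are assumptions taken from the figure in the original paper [1].

```python
def contracting_path(size, base_channels=64, pool_steps=4):
    """Trace (channels, spatial size) after the two convolutions of each step.

    Assumed values: 64 base channels and 4 pooling steps, as in the
    original paper's figure. `size` is the side length of a square input.
    """
    stages = []
    channels = base_channels
    for step in range(pool_steps + 1):
        # Two unpadded 3x3 convolutions: each one removes 2 pixels per axis.
        size -= 2 * 2
        if size <= 0:
            raise ValueError("input is too small for this many steps")
        stages.append((channels, size))
        if step < pool_steps:
            if size % 2:
                raise ValueError("2x2 pooling needs an even feature map size")
            size //= 2      # 2x2 max pooling with stride 2 halves the size
            channels *= 2   # the next step doubles the feature channels
    return stages


for channels, size in contracting_path(572):
    print(f"{channels:5d} channels, {size} x {size}")
```

The last entry of the returned list is the bottom of the "U"; the earlier entries are the feature maps that the skip connections later reuse.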
EXPANSIVE PATH
In the expansive path, the feature maps are upsampled using a 2x2 up-convolution (transposed convolution). Afterwards, two 3x3 convolutions, each followed by a ReLU, are applied. In contrast to the downsampling steps, they halve the number of feature channels. To give the network more high-resolution information, skip connections (gray arrows) between the contracting and expansive paths are used. They are realized by cropping the output of the last convolution of the corresponding contracting step (white box) to the same spatial size as the output of the up-convolution (blue box) in the current expansive step. Both results are then concatenated and used as the input to the 3x3 convolutions. Finally, the segmented output is generated by a 1x1 convolution that maps to the desired number of classes (segments).
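The crop-and-concatenate step is easiest to follow as shape bookkeeping. The sketch below is illustrative only. The helper names `center_crop_range` and `expansive_step` are hypothetical, and the crop is assumed to be centered, which the text above does not state. The channel counts follow the original paper's figure [1], where the up-convolution halves the channels and the first 3x3 convolution after the concatenation halves them again. The skip sizes are those of a 572x572 input tile.

```python
def center_crop_range(skip_size, target_size):
    """Return (start, stop) indices of a centered crop along one axis."""
    if target_size > skip_size:
        raise ValueError("cannot crop to a larger size")
    start = (skip_size - target_size) // 2
    return start, start + target_size


def expansive_step(channels, size, skip_channels, skip_size):
    """Shape bookkeeping for one expansive step (hypothetical helper)."""
    up_channels = channels // 2   # 2x2 up-convolution halves the channels
    up_size = size * 2            # ... and doubles the spatial size
    crop = center_crop_range(skip_size, up_size)
    concat_channels = up_channels + skip_channels
    out_channels = concat_channels // 2   # the 3x3 convolutions halve the channels
    out_size = up_size - 2 * 2            # two unpadded 3x3 convolutions
    return crop, concat_channels, (out_channels, out_size)


# Assumed skip feature maps (channels, size) for a 572x572 input, deepest first.
skips = [(512, 64), (256, 136), (128, 280), (64, 568)]
state = (1024, 28)  # assumed bottom of the "U": 1024 channels at 28x28
for skip_channels, skip_size in skips:
    crop, concat_channels, state = expansive_step(*state, skip_channels, skip_size)
    print(f"crop {skip_size} -> {crop}, concat {concat_channels} ch, output {state}")
```

Under these assumptions the last expansive step ends with 64 channels at 388x388, and the final 1x1 convolution only changes the channel count to the number of classes.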
The original network structure as outlined by the paper was maintained, with a few minor changes to improve results:
- Batch normalization was added after each convolution step in the model, except for the final layer.
- Padding was applied to the upsampled tensor to resolve dimension mismatches. As a result, the output of the network has the same size as the input.
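The padding change can be illustrated with the same kind of size arithmetic. This sketch rests on two assumptions that the list above does not spell out: the 3x3 convolutions are also padded so that they preserve the spatial size, and the upsampled tensor is padded up to the size of its skip connection before concatenation. The function names `pad_amounts` and `padded_unet_sizes` are hypothetical.

```python
def pad_amounts(up_size, skip_size):
    """Padding (before, after) that grows up_size to skip_size on one axis."""
    diff = skip_size - up_size
    if diff < 0:
        raise ValueError("upsampled tensor is larger than the skip tensor")
    before = diff // 2
    return before, diff - before


def padded_unet_sizes(size, pool_steps=4):
    """Return the output size and the padding needed at each expansive step.

    Assumes size-preserving (padded) convolutions, so only pooling and
    up-convolution change the spatial size.
    """
    skips = []
    for _ in range(pool_steps):
        skips.append(size)   # feature map size kept for the skip connection
        size //= 2           # 2x2 max pooling rounds odd sizes down
    pads = []
    for skip_size in reversed(skips):
        size *= 2            # 2x2 up-convolution doubles the size
        pads.append(pad_amounts(size, skip_size))
        size = skip_size     # after padding, sizes match for concatenation
    return size, pads


out_size, pads = padded_unet_sizes(101)
print(out_size, pads)
```

For input sizes divisible by 2 at every pooling step, no padding is needed. For other sizes, such as the 101 pixels used here, pooling rounds down and the up-convolution comes out one pixel short, which the padding makes up for.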
REFERENCES
[1] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.