This is a text spotting composite model that simultaneously detects and recognizes text. The model detects symbol sequences separated by spaces and performs recognition without a dictionary. The model is built on top of the Mask-RCNN framework with an additional attention-based text recognition head.
Alphabet is alphanumeric: `abcdefghijklmnopqrstuvwxyz0123456789`.
Metric | Value |
---|---|
Word spotting hmean ICDAR2015, without a dictionary | 64.81% |
Source framework | PyTorch* |
The word spotting hmean is defined and measured according to the Incidental Scene Text (ICDAR2015) challenge.
The text-spotting-0003-detector model is a Mask-RCNN-based text detector with a ResNet50 backbone and an additional text features output.
Metric | Value |
---|---|
GFlops | 184.495 |
MParams | 27.010 |
- Name: `im_data`, shape: [1x3x768x1280]. An input image in the [1xCxHxW] format. The expected channel order is BGR.
- Name: `im_info`, shape: [1x3]. Image information: processed image height, processed image width, and processed image scale with respect to the original image resolution.
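The two inputs above could be prepared along the following lines. This is a minimal sketch, not the reference preprocessing: the function name `prepare_inputs`, the aspect-ratio-preserving nearest-neighbour resize, the zero padding at the bottom and right, the `float32` data type, and the reading of `im_info` as (resized height, resized width, resize scale) are all assumptions made for illustration.

```python
import numpy as np

# Input geometry taken from the `im_data` shape [1x3x768x1280].
NET_H, NET_W = 768, 1280


def prepare_inputs(image_bgr):
    """Build (im_data, im_info) from an HxWx3 BGR image.

    Assumptions (not stated in the model description): nearest-neighbour
    resize that keeps the aspect ratio, zero padding, float32 tensors.
    """
    src_h, src_w = image_bgr.shape[:2]
    scale = min(NET_H / src_h, NET_W / src_w)
    new_h = min(NET_H, max(1, int(round(src_h * scale))))
    new_w = min(NET_W, max(1, int(round(src_w * scale))))

    # Nearest-neighbour resize using plain NumPy indexing.
    rows = np.clip((np.arange(new_h) / scale).astype(int), 0, src_h - 1)
    cols = np.clip((np.arange(new_w) / scale).astype(int), 0, src_w - 1)
    resized = image_bgr[rows[:, None], cols]

    # Pad to the fixed network resolution, then go from HWC to 1xCxHxW.
    padded = np.zeros((NET_H, NET_W, 3), dtype=np.float32)
    padded[:new_h, :new_w] = resized
    im_data = padded.transpose(2, 0, 1)[np.newaxis]

    # Processed height, processed width, scale w.r.t. the original image.
    im_info = np.array([[new_h, new_w, scale]], dtype=np.float32)
    return im_data, im_info
```

A production pipeline would more likely resize with an image library; NumPy indexing is used here only to keep the sketch dependency-free.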
- Name: `labels`, shape: [100]. Contiguous integer class ID for every detected object; `0` is for the text class.
- Name: `boxes`, shape: [100x5]. Bounding boxes around every detected object in the (top_left_x, top_left_y, bottom_right_x, bottom_right_y, confidence) format.
- Name: `masks`, shape: [100x28x28]. Text segmentation masks for every output bounding box.
- Name: `text_features.0`, shape: [100x64x28x28]. Text features that are fed to a text recognition head.
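Since the detector always returns 100 slots, a consumer will typically keep only the confident text detections before running recognition. The sketch below illustrates one way to do that; the function name `select_text_detections` and the default threshold of 0.5 are hypothetical choices, not values from the model description.

```python
import numpy as np


def select_text_detections(labels, boxes, masks, text_features, threshold=0.5):
    """Keep detections of the text class (label 0) above a confidence threshold.

    The threshold value is an assumption chosen for illustration.
    """
    confidences = boxes[:, 4]          # last column of each box is the confidence
    keep = (labels == 0) & (confidences > threshold)
    return (
        boxes[keep, :4],               # (x1, y1, x2, y2) per kept detection
        confidences[keep],
        masks[keep],                   # [K x 28 x 28]
        text_features[keep],           # [K x 64 x 28 x 28], input to the recognizer
    )
```

Each kept row of `text_features` can then be passed, one at a time, to the recognition head as a [1x64x28x28] tensor.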
The text-spotting-0003-recognizer-encoder model is a fully convolutional encoder of the text recognition head.
Metric | Value |
---|---|
GFlops | 2.082 |
MParams | 1.328 |
Name: `input`, shape: [1x64x28x28]. Text recognition features obtained from the detection part.
Name: `output`, shape: [1x256x28x28]. Encoded text recognition features.
Metric | Value |
---|---|
GFlops | 0.106 |
MParams | 0.283 |
- Name: `encoder_outputs`, shape: [1x(28*28)x256]. Encoded text recognition features.
- Name: `prev_symbol`, shape: [1x1]. Index in the alphabet of the previously generated symbol.
- Name: `prev_hidden`, shape: [1x1x256]. Previous hidden state of the GRU.
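The encoder emits features as [1x256x28x28], while `encoder_outputs` expects [1x(28*28)x256], so a layout change is needed between the two. The following is a sketch under the assumption that spatial positions are flattened in row-major order and channels are moved to the last axis; the helper name `to_decoder_layout` is hypothetical.

```python
import numpy as np


def to_decoder_layout(encoded):
    """Convert [N x C x H x W] encoder features to [N x (H*W) x C].

    Assumes row-major flattening of the spatial grid.
    """
    n, c, h, w = encoded.shape
    flat = encoded.reshape(n, c, h * w)   # merge the two spatial axes
    return flat.transpose(0, 2, 1)        # move channels to the last axis
```

With this layout, the feature vector for spatial position (row, col) sits at index `row * 28 + col` along the second axis.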
- Name: `output`, shape: [1x38]. Encoded text recognition features. Indices starting from 2 correspond to symbols from the alphabet. Indices 0 and 1 are the special Start of Sequence and End of Sequence symbols, respectively (36 alphabet symbols plus 2 special symbols give 38 entries).
- Name: `hidden`, shape: [1x1x256]. Current hidden state of the GRU.
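Given the index convention above, recognition can be sketched as a greedy loop that calls the decoder one symbol at a time. Everything beyond the tensor shapes and the index mapping is an assumption here: `decoder_step` is a hypothetical callable standing in for one decoder inference, the initial hidden state is assumed to be zeros, the first `prev_symbol` is assumed to be the Start of Sequence index, and `max_len=28` is an arbitrary cap.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
SOS, EOS = 0, 1  # special indices; alphabet symbols start at index 2


def greedy_decode(decoder_step, encoder_outputs, max_len=28):
    """Greedily decode one text instance.

    `decoder_step(encoder_outputs, prev_symbol, prev_hidden)` is a
    hypothetical wrapper around one decoder inference that returns
    (output [1x38], hidden [1x1x256]).
    """
    prev_symbol = np.full((1, 1), SOS, dtype=np.float32)   # assumed start token
    prev_hidden = np.zeros((1, 1, 256), dtype=np.float32)  # assumed zero state
    text = ""
    for _ in range(max_len):
        output, prev_hidden = decoder_step(encoder_outputs, prev_symbol, prev_hidden)
        index = int(np.argmax(output, axis=1)[0])
        if index == EOS:
            break
        if index >= 2:
            text += ALPHABET[index - 2]    # shift past the two special symbols
        prev_symbol = np.full((1, 1), index, dtype=np.float32)
    return text
```

Greedy argmax is the simplest choice; a beam search over the same `output` scores would be a drop-in alternative.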
[*] Other names and brands may be claimed as the property of others.