forked from rykov8/ssd_keras
-
Notifications
You must be signed in to change notification settings - Fork 86
/
TODO.txt
129 lines (99 loc) · 4.01 KB
/
TODO.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
fix data pipeline
fix load_weights, det by_name=False, rec by_name=True
detector that only detects polygon
prior_wh / 2 when input size is increased from 512 to 1024
docu, input shape, type and scale of models
update readme
images on top
input generator with dataset skips last incomplet batch
class weights?
plot_log, plot_history
plotpath = os.path.join(checkpath, 'plots')
os.makedirs(plotpath, exist_ok=True)
plt.savefig(os.path.join(plotpath, 'history_loss.png'), bbox_inches='tight') # jpg/png/pgf
ssd decoding layer, https://github.com/pierluigiferrari/ssd_keras/blob/master/keras_layers/keras_layer_DecodeDetections.py
DONE
replace model.fit_generator, eager training
end2end py and ipynb files
update to tf2, no sessions
rename sl_model > det_model
rename crnn_graph > rec_model
do recognition for all text instances from one image in one recognition sample
use separable convolution
requirements.txt
train models with natural aspect ratio
cleanup and update debugging notebooks
detection and recognition in one model?
change naming
true_labels, pred_labels
input/image size?
'plot' in 'draw'
'inter layer segments', inter vs within
gt_util_... vs gtu_...
GTUtility.split, GTUtility.merge
nms_thresh nms_threshold / iou_threshold
keep_top_k top_k
good_rboxs good_rboxes
.hdf5 > .h5
data augmentation
augmentation and batch size in validation?
add random rotation, translation, zero padding
SSD
use numba, check if numba is available and use conditional decorator
reduce memory usage
check if prior boxes are always at the same location in the receptive field
is it related to bad results on small objects?
convert fc6, fc7 vgg layers to conv layers
training time is much longer, is it caused by the missing pretrained fc layers?
more native image aspect ratio 128 * (4, 3) = (512, 384), but largest box is not in center
ResNet https://github.com/hzm8341/SSD_keras_restnet/blob/master/SSD_ResNet/SSD_resnet.py
MobileNet https://github.com/tanakataiki/ssd_keras/blob/master/ssd300MobileNet.py
do not crop boxes outside image
try input normalisation
DONE
remove flip option for aspect ratios
use softmax for each class, not categorical cross entropy
DSOD
precision, recall, f-mean / f-measure
GT background class
sample images from generator
https://github.com/scardine/image_size
no one_hot in gt_util
CustomCallbacks in ssd_traing, for learning rate and logging
try focal loss, with alpha = 1 and gamma=3 can boost 1.4% mAP on pascal voc 2007
custom epoch size, checkpoint callback, test, eval interval?
improve decoding performance, sparse, faster nms
hard negative mining?, fix it
virtual batch_size
SegLink
width of segment ground truth at the left and right edge?
directed links to get text orientation
training data, on icdar combined words to lines lead to better results
display overlap of segments
training
lambdas = 1
first train segments, then linking
parameters for sl loss and focal loss
debug
check if parameters get properly loaded
why some boxes are wrong
prior box is assigned to ground truth with smallest distance (if it lies in multiple gt boxes), is this the case?
DONE
grid search
confidence of combined boxes is the mean conf of the segments weighted by there area
more natural input ratio
TextBoxes / TextBoxes++
use recognition score to reduce false positives in detection
DONE
write a layer for decoding and cropping TBPP output
CRNN
larger / variable input width
more units
RGB input?
check synthtext? 'small caps' etc
normalize input, subtract mean
DONE
train fully convolutional CNN, should be faster then RNN
larger alphabet
GRU
editdistance / levenshtein distance