#!/bin/bash
__doc__='
This shell script serves as an executable example of how to train and evaluate
a fusion model on SMART project data.

This tutorial assumes you have:

1. Set up the project DVC repo.

2. Registered the location of your DVC repo with geowatch_dvc.

3. Pulled the appropriate dataset (in this case Drop4) and unzipped the
   annotations.

4. Written a script that predicts the features you would like to test.

5. Installed the IARPA metrics code:

   # Clone this repo and pip install it into your watch environment
   https://gitlab.kitware.com/smart/metrics-and-test-framework

See these docs for details:

../docs/getting_started_dvc.rst
../docs/access_dvc_repos.rst
../docs/using_geowatch_dvc.rst

This tutorial will cover:

1. Predicting your features.
2. Training a fusion model with your features.
3. Packaging your fusion model checkpoints.
4. Evaluating your fusion model against the baseline.
'
DATA_DVC_DPATH=$(geowatch_dvc --tags='phase2_data' --hardware=auto)
EXPT_DVC_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware=auto)
echo "
EXPT_DVC_DPATH=$EXPT_DVC_DPATH
DATA_DVC_DPATH=$DATA_DVC_DPATH
"
__doc_compute_feature__='
Your predict command must specify:

1. the path to the input kwcoco file
2. the path to the output kwcoco file that will contain your features
   (avoid requiring that other output paths be specified; use default
   paths relative to the directory of the output kwcoco file)
3. the path to your model(s)
4. any other CLI parameters needed to configure the details of feature
   prediction.

You will have to specify the exact details for your features, but as an
example we provide a script that will predict invariant features if your
machine has enough resources (you need over 100GB of RAM as of 2022-12-21; we
would like to fix this in the future).

You will also need to ensure the referenced model is pulled from the
experiments DVC repo.
'
compute_features(){
    # A bash function that runs invariant prediction on a kwcoco file.
    SRC_KWCOCO_FPATH=$1
    DST_KWCOCO_FPATH=$2
    python -m geowatch.tasks.invariants.predict \
        --input_kwcoco="$SRC_KWCOCO_FPATH" \
        --output_kwcoco="$DST_KWCOCO_FPATH" \
        --pretext_package_path="$EXPT_DVC_DPATH"/models/uky/uky_invariants_2022_12_05/TA1_pretext_model/pretext_package.pt \
        --input_space_scale=30GSD \
        --window_space_scale=30GSD \
        --patch_size=256 \
        --do_pca 0 \
        --patch_overlap=0.0 \
        --num_workers="2" \
        --write_workers 2 \
        --tasks before_after pretext
}
# Compute your features on the train and validation datasets
compute_features \
    "$DATA_DVC_DPATH"/Drop4-BAS/data_train.kwcoco.json \
    "$DATA_DVC_DPATH"/Drop4-BAS/data_train_invariants.kwcoco.json

compute_features \
    "$DATA_DVC_DPATH"/Drop4-BAS/data_vali.kwcoco.json \
    "$DATA_DVC_DPATH"/Drop4-BAS/data_vali_invariants.kwcoco.json
# After your model predicts the outputs, you should be able to use the
# geowatch visualize tool to inspect your features. The specific channels you
# select will depend on the output of your predict script.
python -m geowatch visualize "$DATA_DVC_DPATH"/Drop4-BAS/data_vali_invariants.kwcoco.json \
    --channels "invariants.5:8,invariants.8:11,invariants.14:17" \
    --stack=only --workers=avail --animate=True \
    --draw_anns=False
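# If you are unsure which channel names your predict script wrote, inspecting
# the output kwcoco file is one way to find out. (A hedged suggestion: the
# geowatch stats subcommand summarizes per-sensor channels in recent versions.)
python -m geowatch stats "$DATA_DVC_DPATH"/Drop4-BAS/data_vali_invariants.kwcoco.json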
# shellcheck disable=SC2016
__doc_data_splits__='
Because only some of the regions actually need 100GB to compute the invariants,
it is possible to split the train and validation kwcoco files into one kwcoco
file per video and run the compute_features function on each output
individually.

.. code:: bash

    python -m geowatch.cli.split_videos \
        --src "$DATA_DVC_DPATH/Drop4-BAS/data_train.kwcoco.json" \
              "$DATA_DVC_DPATH/Drop4-BAS/data_vali.kwcoco.json" \
        --dst_dpath "$DATA_DVC_DPATH/Drop4-BAS/"

In fact, if your feature prediction script is registered with the
prepare_teamfeats tool, then you can schedule prediction to run on all of them
individually. You can specify a pattern as the input to this tool.

.. code:: bash

    python -m geowatch.cli.queue_cli.prepare_teamfeats \
        --base_fpath \
            "$DATA_DVC_DPATH/Drop4-BAS/data_train_*.kwcoco.json" \
            "$DATA_DVC_DPATH/Drop4-BAS/data_vali_*.kwcoco.json" \
        --expt_dpath="$EXPT_DVC_DPATH" \
        --with_landcover=0 \
        --with_materials=0 \
        --with_invariants=0 \
        --with_invariants2=1 \
        --with_depth=0 \
        --gres=0, --workers=1 --backend=tmux --run=1

You can then union any custom set of regions into a train and validation kwcoco
file for the subsequent steps.

.. code:: bash

    DATA_DVC_DPATH=$(geowatch_dvc --tags=phase2_data --hardware=auto)
    EXPT_DVC_DPATH=$(geowatch_dvc --tags=phase2_expt --hardware=auto)
    kwcoco union \
        --src $DATA_DVC_DPATH/Drop4-BAS/*_train_*_uky_invariants*.kwcoco.json \
        --dst $DATA_DVC_DPATH/Drop4-BAS/combo_train_I2.kwcoco.json
    kwcoco union \
        --src $DATA_DVC_DPATH/Drop4-BAS/*_vali_*_uky_invariants*.kwcoco.json \
        --dst $DATA_DVC_DPATH/Drop4-BAS/combo_vali_I2.kwcoco.json

We recognize that this is currently a pain point, but we hope that the existing
tools make it somewhat easier to solve or work around problems, and that our
tooling will improve to make this even easier in the future.
'
__doc_run_fusion__='
Now that we have train and validation kwcoco datasets that contain our
computed features, we can train or fine-tune a fusion model.

The following is a set of baseline settings that you should start with. We
also encourage you to try other hyperparameter settings to maximize the
effectiveness of your features, but you should train at least once with this
configuration as a baseline.
'
# Set according to your hardware requirements
# TODO: expose the unused GPU script and use that.
export CUDA_VISIBLE_DEVICES=0
DATA_DVC_DPATH=$(geowatch_dvc --tags='phase2_data' --hardware='auto')
EXPT_DVC_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
DATASET_CODE=Drop4-BAS
KWCOCO_BUNDLE_DPATH=$DATA_DVC_DPATH/$DATASET_CODE
# You should specify a unique name for your experiment.
# This name will be the default identifier in reports generated by the
# geowatch mlops framework.
EXPERIMENT_NAME=Drop4_BAS_my_feature_experiment_$(date --iso-8601)
# These are the paths to the kwcoco files that should contain your features
TRAIN_FPATH=$KWCOCO_BUNDLE_DPATH/data_train_invariants.kwcoco.json
VALI_FPATH=$KWCOCO_BUNDLE_DPATH/data_vali_invariants.kwcoco.json
# The pretrained state should be checked out of DVC. This is the best BAS
# model as of 2022-12-21; we will partially initialize a subset of the network
# with these weights.
PRETRAINED_STATE="$EXPT_DVC_DPATH"/models/fusion/Drop4-BAS/packages/Drop4_TuneV323_BAS_30GSD_BGRNSH_V2/package_epoch0_step41.pt.pt
# You can use the model_stats command to inspect details about any fusion model.
geowatch model_stats "$PRETRAINED_STATE"
# shellcheck disable=SC2016
__doc_channel_conf__='
When training a fusion model, you must specify a channel configuration.

By default we recommend adding your features as a separate "stream" in
addition to the original six raw bands.

Remember, early fused channels are separated with a pipe (|) and late fused
channel groups are separated with a comma. This means that in the sensorchan
configuration you should separate your channels from the raw channels with a
comma, e.g.

    blue|green|red|nir|swir16|swir22,invariants.0:17

By default each channel is assumed to exist in every sensor. You can specify
which channels belong to which sensors by prefixing a group. For instance:

    (S2,L8):(blue|green|red|nir|swir16|swir22),(S2):(invariants.0:17)

The above uses the S2 and L8 raw bands, but only adds the invariants from
Sentinel-2 images.

You may try early fusing your features with the RGB channels, or any more
complex input channel scheme, but you must train the simple late fused network
as a baseline.
'
CHANNELS="(S2,L8):(blue|green|red|nir|swir16|swir22),(S2,L8):(invariants.0:17)"
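# A hedged illustration (not the required baseline): an early-fusion variant
# would pipe the invariants into the same stream as the raw bands, e.g.
#   CHANNELS="(S2,L8):(blue|green|red|nir|swir16|swir22|invariants.0:17)"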
# We recommend this training directory layout to differentiate
# training runs on different machines / from different people.
WORKDIR=$EXPT_DVC_DPATH/training/$HOSTNAME/$USER
DEFAULT_ROOT_DIR=$WORKDIR/$DATASET_CODE/runs/$EXPERIMENT_NAME
MAX_STEPS=10000
TARGET_LR=5e-5
python -m geowatch.tasks.fusion fit --config "
data:
    num_workers             : 3
    train_dataset           : $TRAIN_FPATH
    vali_dataset            : $VALI_FPATH
    channels                : '$CHANNELS'
    time_steps              : 5
    chip_dims               : '224,224'
    batch_size              : 2
    window_space_scale      : 10GSD
    input_space_scale       : 10GSD
    output_space_scale      : 10GSD
    dist_weights            : false
    neg_to_pos_ratio        : 0.1
    time_sampling           : soft2-contiguous-hardish3
    time_span               : '3m-6m-1y'
    use_centered_positives  : true
    normalize_inputs        : 128
    temporal_dropout        : 0.5
    resample_invalid_frames : 1
    quality_threshold       : 0.8
model:
    class_path: MultimodalTransformer
    init_args:
        name                   : $EXPERIMENT_NAME
        arch_name              : smt_it_stm_p8
        tokenizer              : linconv
        decoder                : mlp
        stream_channels        : 16
        saliency_weights       : 1:70
        class_loss             : focal
        saliency_loss          : focal
        global_change_weight   : 0.00
        global_class_weight    : 0.00
        global_saliency_weight : 1.00
lr_scheduler:
    class_path: torch.optim.lr_scheduler.OneCycleLR
    init_args:
        max_lr: $TARGET_LR
        total_steps: $MAX_STEPS
        anneal_strategy: linear
        pct_start: 0.05
optimizer:
    class_path: torch.optim.Adam
    init_args:
        lr: $TARGET_LR
        weight_decay: 1e-3
        betas:
            - 0.9
            - 0.99
trainer:
    accumulate_grad_batches: 8
    default_root_dir        : $DEFAULT_ROOT_DIR
    accelerator             : gpu
    devices                 : 0,
    #devices                : 0,1
    #strategy               : ddp
    check_val_every_n_epoch: 1
    enable_checkpointing: true
    enable_model_summary: true
    log_every_n_steps: 5
    logger: true
    max_steps: $MAX_STEPS
    num_sanity_val_steps: 0
    replace_sampler_ddp: true
    track_grad_norm: 2
initializer:
    init: $PRETRAINED_STATE
"
# The result of training will be a list of checkpoints in the lightning
# output directory:
ls "$DEFAULT_ROOT_DIR"/lightning_logs/*/checkpoints/*.ckpt
# To use them we need to ensure they are packaged. Let's assume we have a
# checkpoint. This command grabs the first one it finds; you should be more
# selective about the one(s) you choose.
CHECKPOINT_FPATH=$(for i in "$DEFAULT_ROOT_DIR"/lightning_logs/*/checkpoints/*.ckpt; do printf '%s\n' "$i"; break; done)
echo "CHECKPOINT_FPATH = $CHECKPOINT_FPATH"
# Repackage it as follows (this command may change in the future to make this
# easier / more robust, but it should work in this context):
python -m geowatch.mlops.repackager "$CHECKPOINT_FPATH"
# That should have written a .pt package with a similar name. To make this
# bash script work, we will just glob for a package and assume it is the one
# we want.
PACKAGE_FPATH=$(for i in "$DEFAULT_ROOT_DIR"/lightning_logs/*/checkpoints/*.pt; do printf '%s\n' "$i"; break; done)
echo "PACKAGE_FPATH = $PACKAGE_FPATH"
__doc_eval__='
Now we have a trained, packaged model that is aware of your team features. The
goal is to use it to demonstrate an improvement in the IARPA scores. This can
be done using the mlops framework. You can specify multiple values for an
option to grid search over the Cartesian product of all settings. You should
at least include your model and the baseline model to determine if your
features are driving an improvement in the scores.
'
DATA_DVC_DPATH=$(geowatch_dvc --tags='phase2_data' --hardware=auto)
EXPT_DVC_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware=auto)
BASELINE_PACKAGE_FPATH="$EXPT_DVC_DPATH"/models/fusion/Drop4-BAS/packages/Drop4_TuneV323_BAS_30GSD_BGRNSH_V2/package_epoch0_step41.pt.pt
geowatch model_stats "$BASELINE_PACKAGE_FPATH"
# NOTE:
# The schedule evaluation script originally ran on a single kwcoco file that
# contained all of the validation regions. A more stable way to run the system
# is to split the larger validation dataset into a single kwcoco file per
# region, and then run it on all regions separately.
python -m geowatch.cli.split_videos --src "$DATA_DVC_DPATH"/Drop4-BAS/data_vali_invariants.kwcoco.json
python -m geowatch.mlops.schedule_evaluation \
    --params="
        matrix:
            bas_pxl.package_fpath:
                - $PACKAGE_FPATH
                - $BASELINE_PACKAGE_FPATH
            bas_pxl.channels:
                - 'auto'
            bas_pxl.test_dataset:
                - $DATA_DVC_DPATH/Drop4-BAS/data_vali_KR_R001_uky_invariants.kwcoco.json
                - $DATA_DVC_DPATH/Drop4-BAS/data_vali_KR_R002_uky_invariants.kwcoco.json
                - $DATA_DVC_DPATH/Drop4-BAS/data_vali_US_R007_uky_invariants.kwcoco.json
            bas_pxl.chip_dims: auto
            bas_pxl.chip_overlap: 0.3
            bas_pxl.window_space_scale: auto
            bas_pxl.output_space_scale: auto
            bas_pxl.input_space_scale: auto
            bas_pxl.time_span: auto
            bas_pxl.time_sampling: auto
            bas_poly.moving_window_size: null
            bas_poly.thresh:
                - 0.1
            bas_pxl.enabled: 1
            bas_poly.enabled: 1
            bas_poly_eval.enabled: 1
            bas_pxl_eval.enabled: 1
            bas_poly_viz.enabled: 1
    " \
    --root_dpath="$EXPT_DVC_DPATH/_evaluations" \
    --devices="0," --queue_size=1 \
    --backend=tmux --queue_name "demo-queue" \
    --pipeline=bas \
    --run=1
### NOTE:
# The above script assumes that your bashrc activates the appropriate
# virtualenv by default. If this is not the case you will need to specify an
# additional argument to `geowatch.mlops.schedule_evaluation`, namely
# ``--virtualenv_cmd``. For instance, if you have a conda environment named
# "watch", you would add ``--virtualenv_cmd="watch"`` to the command.
__doc_mlops__='
This script will run through the entire BAS pipeline and output results in the
"root_dpath".

The names of the outputs are chosen based on a hash of the configuration, which
enables us to reuse existing results. Symlinks are set up such that it is clear
what previous steps a specific result relied on.

The important part is that there will be a folder for each pipeline node.
The bas_pxl_eval node holds the IARPA evaluation results and stores the
metrics we are interested in.

For now you should manually inspect those results, but in the future
the mlops framework will contain a way to aggregate and analyze results
automatically.
'
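# A hedged starting point for manual inspection: the layout under root_dpath
# is hash-based, so listing the node directories helps locate the metrics.
find "$EXPT_DVC_DPATH/_evaluations" -maxdepth 3 -type d | sort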