In this segmentation example we will import the ADE20K Outdoors dataset as a Cassandra dataset and then read the data into NVIDIA DALI.
As a first step, download the raw files from the dataset's Kaggle page:
or, if you have the Kaggle API installed, you can simply run:
$ kaggle datasets download -d residentmario/ade20k-outdoors
In the following we will assume the original images are stored in the
/data/ade20k/
directory.
We begin by starting the Cassandra server shipped with the provided Docker container:
# Start Cassandra server
$ /cassandra/bin/cassandra
Note that the shell prompt returns immediately. Wait until "state jump to NORMAL"
appears in the log output (about one minute).
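Instead of watching the log by hand, you can poll the cluster status with nodetool until the local node reports Up/Normal ("UN"). This is a minimal sketch, assuming nodetool ships alongside the server binary at /cassandra/bin, as in the container above:

```shell
# Wait (up to ~2 minutes) for the local Cassandra node to report Up/Normal.
# Takes an optional path to nodetool; defaults to the container's location.
wait_for_cassandra() {
    for _ in $(seq 1 24); do
        # "UN" at the start of a status line means Up/Normal
        "${1:-/cassandra/bin/nodetool}" status 2>/dev/null | grep -q '^UN' && return 0
        sleep 5
    done
    return 1
}

# Usage:
# wait_for_cassandra && echo "Cassandra is ready"
```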
The following commands insert the original dataset into Cassandra and use the plugin to read the images into NVIDIA DALI.
# - Create the tables in the Cassandra DB
$ cd examples/ade20k/
$ /cassandra/bin/cqlsh -f create_tables.cql
# - Fill the tables with data and metadata
$ python3 extract_serial.py /data/ade20k/images/training/ /data/ade20k/annotations/training/ --data-table=ade20k.data --metadata-table=ade20k.metadata
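The extraction step pairs each training image with its annotation mask; in ADE20K the two share the same file stem (e.g. an image ADE_train_00000001.jpg and a mask ADE_train_00000001.png). A minimal, hypothetical sketch of that pairing logic (not the actual code of extract_serial.py):

```python
from pathlib import Path

def pair_images_with_masks(image_root, mask_root):
    """Pair each .jpg image with the .png mask sharing its file stem."""
    masks = {p.stem: p for p in Path(mask_root).glob("*.png")}
    return [(img, masks[img.stem])
            for img in sorted(Path(image_root).glob("*.jpg"))
            if img.stem in masks]
```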
# - Read the list of UUIDs and cache it to disk
$ python3 cache_uuids.py --metadata-table=ade20k.metadata --rows-fn=ade20k.rows
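Caching the row UUIDs lets later runs skip the metadata scan. The on-disk format used by cache_uuids.py is plugin-specific; the sketch below only illustrates the idea with a generic pickle round-trip (hypothetical helpers, not the script's actual code):

```python
import pickle

def save_rows(rows, path):
    # Persist the list of row UUIDs so later runs can skip the metadata scan
    with open(path, "wb") as f:
        pickle.dump(rows, f)

def load_rows(path):
    # Restore the cached list of row UUIDs
    with open(path, "rb") as f:
        return pickle.load(f)
```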
# - Tight loop data loading test in host memory
$ python3 loop_read.py --data-table=ade20k.data --rows-fn=ade20k.rows
# - Tight loop data loading test in GPU memory (GPU:0)
$ python3 loop_read.py --data-table=ade20k.data --rows-fn=ade20k.rows --use-gpu
# - Sharded, tight loop data loading test, using 2 processes via torchrun
$ torchrun --nproc_per_node=2 loop_read.py --data-table=ade20k.data --rows-fn=ade20k.rows
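When launched via torchrun, each process reads a disjoint shard of the cached rows. One common scheme is a strided split; a minimal sketch of that idea (a hypothetical helper, not the plugin's actual sharding code):

```python
def shard_rows(rows, shard_id, num_shards):
    # Strided split: worker k takes elements k, k + num_shards, k + 2*num_shards, ...
    # Every row lands in exactly one shard, so the workers cover the dataset
    # without overlap.
    return rows[shard_id::num_shards]
```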
The same scripts can be used to read the dataset from the filesystem, using the standard DALI file reader.
# - Tight loop data loading test in host memory
$ python3 loop_read.py --reader=file --image-root=/data/ade20k/images/ --mask-root=/data/ade20k/annotations/
# - Tight loop data loading test in GPU memory (GPU:0)
$ python3 loop_read.py --reader=file --image-root=/data/ade20k/images/ --mask-root=/data/ade20k/annotations/ --use-gpu
# - Sharded, tight loop data loading test, using 2 processes via torchrun
$ torchrun --nproc_per_node=2 loop_read.py --reader=file --image-root=/data/ade20k/images/ --mask-root=/data/ade20k/annotations/