Unlike hierarchical POSIX, object storage is flat, treating forward slash ('/') in object names as simply another symbol.
But that's not the whole truth. The other part is that a user may want to operate on (i.e., list, load, shuffle, copy, transform, etc.) a subset of objects in a dataset that, for lack of a better word, looks exactly like a directory.
In fact, users often want to do exactly that.
Train, for instance, on all audio files under en_es_synthetic/v1/train/, or similar.
In object storage, the term for "what looks like a directory" is virtual directory or synthetic directory.
The motivation may become clearer if I say that the entire real-life dataset contains many millions of objects and numerous virtual directories, including the aforementioned en_es_synthetic/v1/train/.
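To make the idea concrete, here is a minimal Python sketch of how virtual directories fall out of a flat list of object names. It is only an illustration, not aistore code: the helper names `virtual_dirs` and `under` and all object names are hypothetical, and only the en_es_synthetic/v1/train/ prefix is taken from the text above.

```python
def virtual_dirs(names):
    """Return every virtual directory implied by '/' in a flat list of object names."""
    dirs = set()
    for name in names:
        parts = name.split("/")[:-1]  # drop the last component (the object itself)
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")
    return sorted(dirs)


def under(names, vdir):
    """Select the objects 'inside' a virtual directory: a plain prefix match."""
    return [n for n in names if n.startswith(vdir)]


# Hypothetical object names; the store itself only ever sees these flat strings.
names = [
    "en_es_synthetic/v1/train/0001.wav",
    "en_es_synthetic/v1/train/0002.wav",
    "en_es_synthetic/v1/val/0001.wav",
    "README.md",
]

print(virtual_dirs(names))
print(under(names, "en_es_synthetic/v1/train/"))
```

Nothing in the flat namespace changes here: a "directory" is just a shared prefix that ends with a forward slash, and operating on it means operating on every object whose name starts with that prefix.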
Needless to say, aistore provides for all of that and more. There is a certain subtlety, however, that is best illustrated with examples.
- normally, remote backends do not return virtual directories, with two exceptions:
  - the list-objects operation is non-recursive (API: apc.LsNoRecursion in the control message; CLI: the --nr switch);
  - the bucket in question contains some sort of special directory that shows up anyway (e.g., bucket inventory).
- list-objects will always return virtual directories, assuming:
  - the corresponding backend's response includes those (see above), and
  - the user does not specify apc.LsNoDirs (CLI: --no-dirs)
- the output is always sorted alphanumerically, directories first
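The rules above can be approximated with a small toy model in Python. This is a sketch under stated assumptions, not aistore's implementation: the function `list_objects` and its parameters are hypothetical stand-ins for apc.LsNoRecursion (--nr) and apc.LsNoDirs (--no-dirs), and the "special directory" is assumed to be an entry whose name ends with a forward slash. The sample names are the ones that appear in the s3://speech listings in this section.

```python
def list_objects(names, prefix="", non_recursive=False, no_dirs=False):
    """Toy model of list-objects over a flat namespace (not aistore's code).

    non_recursive stands in for apc.LsNoRecursion (CLI --nr);
    no_dirs stands in for apc.LsNoDirs (CLI --no-dirs).
    """
    dirs, objs = set(), set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find("/")
        if non_recursive and cut != -1:
            # Everything below the next '/' collapses into one virtual directory.
            dirs.add(prefix + rest[: cut + 1])
        elif name.endswith("/"):
            # Assumption: a directory entry that the backend returns on its own.
            dirs.add(name)
        else:
            objs.add(name)
    if no_dirs:
        dirs = set()
    # Directories first, each group sorted by name.
    return sorted(dirs) + sorted(objs)


BUCKET = [
    ".inventory/speech/data/",
    ".inventory/speech/2024-05-31T01-00Z/manifest.checksum",
    ".inventory/speech/2024-05-31T01-00Z/manifest.json",
    ".inventory/speech/data/985fc9cb-5957-4fc8-b26d-092685a747e8.csv.gz",
    ".inventory/speech/data/9dac8de5-cff9-432c-9663-b054ae5ce357.csv.gz",
    ".inventory/speech/hive/dt=2024-05-30-01-00/symlink.txt",
    ".inventory/speech/hive/dt=2024-05-31-01-00/symlink.txt",
]

for entry in list_objects(BUCKET, ".inventory/speech/", non_recursive=True):
    print(entry)
```

With the non-recursive flag set, the model prints only the three virtual directories directly under .inventory/speech/. Without it, every matching object is listed, preceded by any directory entry present in the input.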
$ ais ls s3://speech --prefix .inventory
NAME SIZE CACHED
.inventory/speech/data/
.inventory/speech/2024-05-31T01-00Z/manifest.checksum 33B no
.inventory/speech/2024-05-31T01-00Z/manifest.json 406B no
.inventory/speech/data/985fc9cb-5957-4fc8-b26d-092685a747e8.csv.gz 54.14MiB no
.inventory/speech/data/9dac8de5-cff9-432c-9663-b054ae5ce357.csv.gz 54.14MiB no
.inventory/speech/hive/dt=2024-05-30-01-00/symlink.txt 85B no
.inventory/speech/hive/dt=2024-05-31-01-00/symlink.txt 85B no
$ ais ls s3://speech/.inventory
$ ais ls s3://speech --prefix .inventory --no-dirs
NAME SIZE CACHED
.inventory/speech/2024-05-31T01-00Z/manifest.checksum 33B no
.inventory/speech/2024-05-31T01-00Z/manifest.json 406B no
.inventory/speech/data/985fc9cb-5957-4fc8-b26d-092685a747e8.csv.gz 54.14MiB no
.inventory/speech/data/9dac8de5-cff9-432c-9663-b054ae5ce357.csv.gz 54.14MiB no
.inventory/speech/hive/dt=2024-05-30-01-00/symlink.txt 85B no
.inventory/speech/hive/dt=2024-05-31-01-00/symlink.txt 85B no
$ ais ls s3://speech --prefix .inventory/speech/ --nr
NAME SIZE CACHED
.inventory/speech/2024-05-31T01-00Z/
.inventory/speech/data/
.inventory/speech/hive/
$ ais ls s3://speech --prefix .inventory/speech/data/ --nr
NAME SIZE CACHED
.inventory/speech/data/
.inventory/speech/data/985fc9cb-5957-4fc8-b26d-092685a747e8.csv.gz 54.14MiB no
.inventory/speech/data/9dac8de5-cff9-432c-9663-b054ae5ce357.csv.gz 54.14MiB no