Architecture
The lowest-level public interface to the TileDB library is the C API. The symbols from the C++ source are not exported, so all other APIs, including the C++ API, wrap the C API.
Both the C and C++ APIs are included in the core repository. All other APIs reside in their own repositories. Everything in the core source (tiledb/sm/*) except for the c_api and cpp_api directories is inaccessible outside of the source itself.
C API: https://github.com/TileDB-Inc/TileDB/tree/dev/tiledb/sm/c_api
C++ API: https://github.com/TileDB-Inc/TileDB/tree/dev/tiledb/sm/cpp_api
Python API: https://github.com/TileDB-Inc/TileDB-Py
R API: https://github.com/TileDB-Inc/TileDB-R
Go API: https://github.com/TileDB-Inc/TileDB-Go
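The layering described above (an exported C interface with higher-level APIs wrapping it) can be illustrated with a minimal sketch. The `demo_ctx_*` functions and the `Context` wrapper below are hypothetical stand-ins written for this example; they are not TileDB's actual symbols or signatures.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <string>

// ---- Hypothetical C layer: the only symbols a library would export. ----
extern "C" {
typedef struct demo_ctx_t demo_ctx_t;  // opaque handle, layout hidden from callers
int demo_ctx_alloc(demo_ctx_t** ctx);
void demo_ctx_free(demo_ctx_t** ctx);
int demo_ctx_set_tag(demo_ctx_t* ctx, const char* tag);
int demo_ctx_get_tag(const demo_ctx_t* ctx, const char** tag);
}

// Internal C++ state behind the opaque handle (not visible to API users).
struct demo_ctx_t {
  std::string tag;
};

extern "C" int demo_ctx_alloc(demo_ctx_t** ctx) {
  if (ctx == nullptr) return -1;
  *ctx = new (std::nothrow) demo_ctx_t();
  return *ctx != nullptr ? 0 : -1;
}

extern "C" void demo_ctx_free(demo_ctx_t** ctx) {
  if (ctx == nullptr) return;
  delete *ctx;
  *ctx = nullptr;
}

extern "C" int demo_ctx_set_tag(demo_ctx_t* ctx, const char* tag) {
  if (ctx == nullptr || tag == nullptr) return -1;
  try {
    ctx->tag = tag;  // exceptions must not cross the C boundary
  } catch (...) {
    return -1;
  }
  return 0;
}

extern "C" int demo_ctx_get_tag(const demo_ctx_t* ctx, const char** tag) {
  if (ctx == nullptr || tag == nullptr) return -1;
  *tag = ctx->tag.c_str();
  return 0;
}

// ---- Hypothetical C++ layer: an RAII wrapper built only on the C calls. ----
class Context {
 public:
  Context() {
    if (demo_ctx_alloc(&ctx_) != 0) throw std::runtime_error("alloc failed");
  }
  ~Context() { demo_ctx_free(&ctx_); }
  Context(const Context&) = delete;
  Context& operator=(const Context&) = delete;

  void set_tag(const std::string& tag) {
    if (demo_ctx_set_tag(ctx_, tag.c_str()) != 0)
      throw std::runtime_error("set_tag failed");
  }

  std::string tag() const {
    const char* out = nullptr;
    if (demo_ctx_get_tag(ctx_, &out) != 0)
      throw std::runtime_error("get_tag failed");
    return out;
  }

 private:
  demo_ctx_t* ctx_ = nullptr;
};
```

In this sketch the wrapper converts C-style status codes into exceptions and ties the handle's lifetime to an object, which is the kind of convenience a higher-level API can add without needing access to the internal C++ symbols.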
This is a broad but non-exhaustive ownership graph of the major classes within the TileDB core.
Array: An in-memory representation of a single on-disk TileDB array.
ArraySchema: Defines an array.
Attribute: Defines a single attribute.
Domain: Defines the array domain.
Dimension: Defines a dimension within the array domain.
FragmentMetadata: An in-memory representation of a single on-disk fragment's metadata.
RTree: Contains minimum bounding rectangles (MBRs) for a single fragment.
Context: A session state.
StorageManager: Facilitates all access between the user and the on-disk files.
VFS: The virtual filesystem interface that abstracts IO from the configured backend/filesystem.
Posix: IO interface to a POSIX-compliant filesystem.
Win: IO interface to a Windows filesystem.
S3: IO interface to an S3 bucket.
Azure: IO interface to an Azure Storage Blob.
GCS: IO interface to a Google Cloud Storage Bucket.
Consolidator: Implements the consolidation operations for fragment data, fragment metadata, and array metadata.
Query: Defines and provides state for a single IO query.
Reader: IO state for a read-query.
SubarrayPartitioner: Slices a single subarray into smaller subarrays.
Subarray: Defines the bounds of an IO operation within an array.
FilterPipeline: Transforms data between memory and disk during an IO operation, depending on the defined filters within the schema.
Tile: An in-memory representation of an on-disk data tile.
ChunkedBuffer: Organizes tile data into chunks.
Writer: IO state for a write-query.
The current on-disk format spec can be found here:
https://github.com/TileDB-Inc/TileDB/blob/dev/format_spec/FORMAT_SPEC.md.
The Array class provides an in-memory representation of a single TileDB array. The ArraySchema class stores the contents of the __array_schema.tdb file. The Domain, Dimension, and Attribute classes represent the sections of the array schema that they are named for. The Metadata class represents the __meta directory and nested files. The FragmentMetadata class represents a single __fragment_metadata.tdb file, one per fragment. Tile data (e.g. attr.tdb and attr_var.tdb) is stored within instances of Tile.
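The file-to-class correspondence above can be summarized as a small lookup. This is only an illustrative sketch: the `CoreClass` enum and `class_for_path` function are hypothetical names invented here, the matching rules are simplified from the paragraph above, and the on-disk format spec remains the authoritative description of the layout.

```cpp
#include <cassert>
#include <string>

// Hypothetical tags naming the core class that represents a given file.
enum class CoreClass { ArraySchema, Metadata, FragmentMetadata, Tile, Unknown };

// True if `s` ends with `suffix`.
static bool ends_with(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Last path component, assuming '/' separators.
static std::string basename_of(const std::string& path) {
  const auto slash = path.find_last_of('/');
  return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Map a path inside an array directory to the class that represents it.
CoreClass class_for_path(const std::string& path) {
  // Anything under a __meta directory belongs to Metadata.
  if (path.rfind("__meta/", 0) == 0 ||
      path.find("/__meta/") != std::string::npos)
    return CoreClass::Metadata;

  const std::string name = basename_of(path);
  if (name == "__array_schema.tdb") return CoreClass::ArraySchema;
  if (name == "__fragment_metadata.tdb") return CoreClass::FragmentMetadata;
  // Simplification: any remaining .tdb file is treated as tile data.
  if (ends_with(name, ".tdb")) return CoreClass::Tile;
  return CoreClass::Unknown;
}
```

For example, with made-up directory names, `class_for_path("my_array/frag_1/attr_var.tdb")` yields `CoreClass::Tile`, while `class_for_path("my_array/__array_schema.tdb")` yields `CoreClass::ArraySchema`.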
The VFS (virtual filesystem) provides a filesystem-like interface for file management and IO. It abstracts the six currently available "backends" (sometimes referred to as "filesystems"): POSIX, Windows, AWS S3, Azure Blob Storage, Google Cloud Storage, and Hadoop Distributed File System.
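One way to picture this abstraction is a dispatch step that picks a backend from the URI. The sketch below is not the VFS implementation: the `Backend` enum and `backend_for_uri` function are hypothetical, and the scheme prefixes (`s3://`, `azure://`, `gcs://`, `gs://`, `hdfs://`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// The six backends named in the text.
enum class Backend { Posix, Windows, S3, Azure, GCS, HDFS };

// True if `s` begins with `prefix`.
static bool starts_with(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Choose a backend from the URI scheme (assumed prefixes, see lead-in).
// URIs without a recognized remote scheme fall through to the local
// filesystem, which is Windows or POSIX depending on the platform flag.
Backend backend_for_uri(const std::string& uri, bool on_windows = false) {
  if (starts_with(uri, "s3://")) return Backend::S3;
  if (starts_with(uri, "azure://")) return Backend::Azure;
  if (starts_with(uri, "gcs://") || starts_with(uri, "gs://"))
    return Backend::GCS;
  if (starts_with(uri, "hdfs://")) return Backend::HDFS;
  return on_windows ? Backend::Windows : Backend::Posix;
}
```

Callers in such a design only ever hand a URI to the VFS; which backend services the request is decided in one place, so adding a backend does not change calling code.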
The read path serves two primary functions:
- Large reads are split into smaller batched reads.
- Small reads are modified to read ahead more bytes than requested. After the read, the excess bytes are cached in memory. Read-ahead buffers are cached by URI under an LRU policy.
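The two read-path behaviors above can be sketched as follows. This is a simplified illustration under stated assumptions, not the VFS code: `split_read`, `ReadAheadCache`, and `read_with_read_ahead` are hypothetical names, the cache holds one buffer per URI and is bounded by entry count rather than bytes, and an in-memory string stands in for the backend file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One contiguous byte range: [offset, offset + nbytes).
struct ByteRange {
  uint64_t offset;
  uint64_t nbytes;
};

// Split one large read into batches of at most `batch_size` bytes.
std::vector<ByteRange> split_read(uint64_t offset, uint64_t nbytes,
                                  uint64_t batch_size) {
  assert(batch_size > 0);
  std::vector<ByteRange> batches;
  uint64_t done = 0;
  while (done < nbytes) {
    const uint64_t n = std::min(batch_size, nbytes - done);
    batches.push_back({offset + done, n});
    done += n;
  }
  return batches;
}

// LRU cache of read-ahead buffers, keyed by URI (one buffer per URI).
class ReadAheadCache {
 public:
  explicit ReadAheadCache(std::size_t max_entries)
      : max_entries_(max_entries) {}

  // Store the buffer read from `uri` starting at `offset`.
  void insert(const std::string& uri, uint64_t offset, std::string data) {
    auto it = index_.find(uri);
    if (it != index_.end()) lru_.erase(it->second);
    lru_.push_front(Entry{uri, offset, std::move(data)});
    index_[uri] = lru_.begin();
    if (lru_.size() > max_entries_) {  // evict the least recently used
      index_.erase(lru_.back().uri);
      lru_.pop_back();
    }
  }

  // Serve [offset, offset + nbytes) if the cached buffer fully covers it.
  bool read(const std::string& uri, uint64_t offset, uint64_t nbytes,
            std::string* out) {
    auto it = index_.find(uri);
    if (it == index_.end()) return false;
    const Entry& e = *it->second;
    if (offset < e.offset || offset + nbytes > e.offset + e.data.size())
      return false;
    *out = e.data.substr(offset - e.offset, nbytes);
    lru_.splice(lru_.begin(), lru_, it->second);  // mark most recently used
    return true;
  }

  std::size_t size() const { return lru_.size(); }

 private:
  struct Entry {
    std::string uri;
    uint64_t offset;
    std::string data;
  };
  std::size_t max_entries_;
  std::list<Entry> lru_;
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

// Small read with read-ahead. `file` stands in for the backend contents and
// `offset` is assumed to lie within it. For simplicity the whole buffer
// (requested bytes plus the excess) is cached.
std::string read_with_read_ahead(ReadAheadCache& cache, const std::string& uri,
                                 const std::string& file, uint64_t offset,
                                 uint64_t nbytes, uint64_t read_ahead) {
  std::string out;
  if (cache.read(uri, offset, nbytes, &out)) return out;
  const uint64_t want = std::max(nbytes, read_ahead);
  std::string buf = file.substr(offset, want);  // clamped at end of file
  out = buf.substr(0, nbytes);
  cache.insert(uri, offset, std::move(buf));
  return out;
}
```

With this sketch, a 3-byte read at offset 2 with an 8-byte read-ahead caches bytes 2 through 9, so a following read of bytes 5 through 8 from the same URI is served from memory without touching the backend.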
The write path passes each write request directly to the backend, deferring parallelization, write caching, and write flushing to that backend.