Logblock: An LSM-based block device

Logblock is a software block device built on the concept of log-structured merge-trees (LSMs).

Logblock is inspired by how the Flash Translation Layer (FTL) of an SSD works. But instead of NAND devices, we have blob storage systems, which are conceptually similar at the API level. By pushing the FTL up the stack, we can build block devices on top of abstract distributed storage systems, decoupling the block device from the underlying storage hardware. This is more CPU/memory intensive than simply mapping block ranges on a storage device to a logical block device, which is largely how traditional distributed block devices are implemented. But decoupling from the storage hardware creates a large amount of flexibility in terms of data allocation, replication, encoding, etc.

How to use

Currently, logblock is very basic and only supports storing data on the local filesystem. The logblock binary is located in cmd/logblock and can be run as:

% ./logblock -size 64G /dev/nbd0 /path/to/data/storage/directory
% sudo mke2fs /dev/nbd0
% sudo mount /dev/nbd0 /mnt/test

Overview

Conceptually, a block device can be thought of as a key-value store (a very limited one). The key is a fixed 8-byte integer, which is the block index (or LBA), and the value is simply the block data. Therefore, one can trivially build a block device on top of their favourite key-value database.
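
To make the key-value view concrete, here is a minimal sketch in Go of a block device backed by an in-memory map. It is not logblock's code: the `KVBlockDevice` type, its methods, and the 4096-byte `blockSize` are all assumptions made for illustration.

```go
package main

import "fmt"

// blockSize is an assumed block size; the README does not specify one.
const blockSize = 4096

// KVBlockDevice is a hypothetical block device backed by a key-value map.
// The key is the block index (LBA) and the value is the block data.
type KVBlockDevice struct {
	numBlocks uint64
	blocks    map[uint64][]byte
}

// NewKVBlockDevice creates an empty device with numBlocks addressable blocks.
func NewKVBlockDevice(numBlocks uint64) *KVBlockDevice {
	return &KVBlockDevice{numBlocks: numBlocks, blocks: make(map[uint64][]byte)}
}

// WriteBlock stores a copy of data under the given LBA.
func (d *KVBlockDevice) WriteBlock(lba uint64, data []byte) error {
	if lba >= d.numBlocks {
		return fmt.Errorf("lba %d out of range (device has %d blocks)", lba, d.numBlocks)
	}
	if len(data) != blockSize {
		return fmt.Errorf("data is %d bytes, want %d", len(data), blockSize)
	}
	d.blocks[lba] = append([]byte(nil), data...)
	return nil
}

// ReadBlock returns a copy of the block at lba.
// Blocks that were never written read back as zeros.
func (d *KVBlockDevice) ReadBlock(lba uint64) ([]byte, error) {
	if lba >= d.numBlocks {
		return nil, fmt.Errorf("lba %d out of range (device has %d blocks)", lba, d.numBlocks)
	}
	out := make([]byte, blockSize)
	copy(out, d.blocks[lba]) // a missing key yields nil, so nothing is copied
	return out, nil
}
```

Swapping the map for a client to a real key-value database would leave the `WriteBlock`/`ReadBlock` interface unchanged, which is the sense in which the mapping is trivial.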

This is not particularly useful for building a locally-attached block device (maybe except for simulating a very large device for testing). However, one can use a distributed KV-store to build scalable, fault-tolerant block devices for VMs and other applications.

Logblock goes one step further, and instead of building on top of a database, it leverages the properties of a block device (i.e. fixed-size keys and values) to build a more efficient solution.
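
One way fixed-size keys and values can help, sketched below in Go: if every record is an 8-byte key followed by one block of data, the position of the i-th record can be computed arithmetically, with no length prefixes or variable-length parsing. The record layout, the 4096-byte `blockSize`, and the helper names are assumptions for illustration and are not taken from logblock's actual on-disk format.

```go
package main

import (
	"encoding/binary"
	"errors"
)

const (
	blockSize  = 4096                // assumed block size
	keySize    = 8                   // 8-byte block index (LBA)
	recordSize = keySize + blockSize // every record has the same length
)

// appendRecord appends one fixed-size record to buf:
// an 8-byte big-endian LBA followed by exactly one block of data.
func appendRecord(buf []byte, lba uint64, data []byte) ([]byte, error) {
	if len(data) != blockSize {
		return buf, errors.New("data must be exactly one block")
	}
	var key [keySize]byte
	binary.BigEndian.PutUint64(key[:], lba)
	buf = append(buf, key[:]...)
	return append(buf, data...), nil
}

// recordAt returns the LBA and data of the i-th record.
// Because records are fixed-size, its offset is simply i * recordSize.
func recordAt(buf []byte, i int) (uint64, []byte, error) {
	start := i * recordSize
	if i < 0 || start+recordSize > len(buf) {
		return 0, nil, errors.New("record index out of range")
	}
	rec := buf[start : start+recordSize]
	return binary.BigEndian.Uint64(rec[:keySize]), rec[keySize:], nil
}
```

A general-purpose database has to handle arbitrary key and value lengths, so it cannot rely on this kind of direct offset calculation.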

As in an LSM, writes are first appended to a write-ahead log. Once the log reaches a certain size, it is "compacted" into a sparse block format. A block mapping keeps track of which file contains which block. The compaction step also frees up space occupied by blocks that have since been overwritten by more recent writes.
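
The write path described above can be sketched in Go as follows. This is an in-memory toy, not logblock's implementation: the `Device`, `segment`, and `logEntry` types, the entry-count compaction threshold, and the 4096-byte `blockSize` are all assumptions, and the "files" are plain maps.

```go
package main

const blockSize = 4096 // assumed block size

// logEntry is one write in the in-memory stand-in for the write-ahead log.
type logEntry struct {
	lba  uint64
	data []byte
}

// segment stands in for one compacted sparse block file:
// it holds only the blocks that were actually written.
type segment struct {
	blocks map[uint64][]byte
}

// Device is a hypothetical log-structured block device.
type Device struct {
	maxLogEntries int                 // compaction threshold (assumed: entry count)
	log           []logEntry          // write-ahead log
	segments      []*segment          // compacted segments, oldest first
	mapping       map[uint64]*segment // block mapping: LBA -> segment holding it
}

func NewDevice(maxLogEntries int) *Device {
	return &Device{maxLogEntries: maxLogEntries, mapping: make(map[uint64]*segment)}
}

// Write appends to the log and compacts once the log is full.
func (d *Device) Write(lba uint64, data []byte) {
	d.log = append(d.log, logEntry{lba: lba, data: append([]byte(nil), data...)})
	if len(d.log) >= d.maxLogEntries {
		d.compact()
	}
}

// compact turns the log into a new segment, updates the block mapping,
// and frees blocks in older segments that have been overwritten.
func (d *Device) compact() {
	seg := &segment{blocks: make(map[uint64][]byte)}
	for _, e := range d.log {
		seg.blocks[e.lba] = e.data // later log entries win
	}
	for lba := range seg.blocks {
		if old := d.mapping[lba]; old != nil {
			delete(old.blocks, lba) // reclaim the stale copy
		}
		d.mapping[lba] = seg
	}
	d.log = nil
	// Keep only segments that still hold live blocks.
	live := make([]*segment, 0, len(d.segments)+1)
	for _, s := range append(d.segments, seg) {
		if len(s.blocks) > 0 {
			live = append(live, s)
		}
	}
	d.segments = live
}

// Read checks the log (newest first), then the block mapping.
// Blocks that were never written read back as zeros.
func (d *Device) Read(lba uint64) []byte {
	for i := len(d.log) - 1; i >= 0; i-- {
		if d.log[i].lba == lba {
			return d.log[i].data
		}
	}
	if seg := d.mapping[lba]; seg != nil {
		return seg.blocks[lba]
	}
	return make([]byte, blockSize)
}
```

In this sketch a read consults the log before the block mapping, so recent writes are visible before they have been compacted. A real implementation would persist the log and segments and would need crash recovery, none of which is shown here.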

QNAs (Questions Nobody Asked)

Why is it called "logblock2"?

Because it's a "Log"-structured "Block" device. Also, naming is hard.

No, why is it called logblock_2_?

Because I have an original logblock git repo, which is >5 years old. It has multiple iterations of the write-ahead log and sparse block format, various block mapping data structures, a "graveyard" of unused code, and a git history which is starting to look like https://xkcd.com/1296/. It was probably best to start a new repo as I open-source this project.

How is this better than existing-distributed-block-device?

Honestly, it isn't. For one, this isn't a complete project, but rather a technical demo. It has too many missing features to be useful, not the least of which is integration with a blob/log storage backend.

TODO

TRIM support
Re-designed metadata model
Storage backends (e.g. S3, HDFS)
Performance improvements
More tests
Alternate compaction strategies

LICENSE

Logblock is released under a BSD 3-Clause License.
