akmistry/logblock2

Logblock: An LSM-based block device

Logblock is a software block device built on the concept of log-structured merge-trees (LSMs).

Logblock is inspired by how the Flash Translation Layer (FTL) of an SSD works. But instead of NAND devices, we have blob storage systems, which are conceptually similar at the API level. By pushing the FTL up the stack, we can build block devices on top of abstract distributed storage systems, decoupling the block device from the underlying storage hardware. This is more CPU/memory intensive than simply mapping block ranges on a storage device to a logical block device, which is largely how traditional distributed block devices are implemented. But decoupling from the storage hardware creates a large amount of flexibility in terms of data allocation, replication, encoding, etc.

How to use

Currently, logblock is very basic and only supports storing data on the local filesystem. The logblock binary is located in cmd/logblock and can be run as:

% ./logblock -size 64G /dev/nbd0 /path/to/data/storage/directory
% sudo mke2fs /dev/nbd0
% sudo mount /dev/nbd0 /mnt/test

Overview

Conceptually, a block device can be thought of as a key-value store (a very limited one). The key is a fixed 8-byte integer, which is the block index (or LBA), and the value is simply the block data. Therefore, one can trivially build a block device on top of their favourite key-value database.
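
To make the key-value analogy concrete, here is a minimal sketch in Python. It is not logblock's code: the `KVBlockDevice` class, the 4096-byte block size, the big-endian key encoding, and the "unwritten blocks read as zeros" behaviour are all assumptions made for illustration, and a plain dict stands in for the key-value database.

```python
import struct

BLOCK_SIZE = 4096  # assumed block size; the README does not specify one


class KVBlockDevice:
    """Toy block device backed by a dict acting as the key-value store."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.store = {}  # 8-byte key -> block data

    @staticmethod
    def _key(lba):
        # Encode the block index (LBA) as a fixed 8-byte integer.
        # Big-endian is an arbitrary choice for this sketch.
        return struct.pack(">Q", lba)

    def write_block(self, lba, data):
        if not 0 <= lba < self.num_blocks:
            raise IndexError("LBA out of range")
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.store[self._key(lba)] = bytes(data)

    def read_block(self, lba):
        if not 0 <= lba < self.num_blocks:
            raise IndexError("LBA out of range")
        # Assumption: blocks that were never written read back as zeros.
        return self.store.get(self._key(lba), bytes(BLOCK_SIZE))
```

Swapping the dict for a real key-value database would only change the two lines that touch `self.store`.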

This is not particularly useful for building a locally-attached block device (except maybe for simulating a very large device for testing). However, one can use a distributed KV-store to build scalable, fault-tolerant block devices for VMs and other applications.

Logblock goes one step further, and instead of building on top of a database, it leverages the properties of a block device (i.e. fixed-size keys and values) to build a more efficient solution.
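
One way to see why fixed sizes help: if every record is the same length, the position of the Nth record is plain arithmetic, so no length prefixes or per-record index are needed. The sketch below illustrates that idea only. The record layout (8-byte LBA followed by one block), the 4096-byte block size, and the names `append_record` and `read_record` are assumptions, not logblock's actual on-disk format.

```python
import io
import struct

BLOCK_SIZE = 4096              # assumed block size
HEADER = struct.Struct(">Q")   # 8-byte block index (LBA)
RECORD_SIZE = HEADER.size + BLOCK_SIZE


def append_record(f, lba, data):
    """Append one fixed-size record and return its index in the file."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    end = f.seek(0, io.SEEK_END)
    f.write(HEADER.pack(lba) + data)
    return end // RECORD_SIZE


def read_record(f, index):
    """Read record number `index` by computing its offset directly."""
    f.seek(index * RECORD_SIZE)
    raw = f.read(RECORD_SIZE)
    if len(raw) != RECORD_SIZE:
        raise IndexError("no such record")
    (lba,) = HEADER.unpack_from(raw)
    return lba, raw[HEADER.size:]
```

A general-purpose database has to handle variable-length keys and values, so it cannot take this shortcut.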

As in any LSM, incoming writes are first appended to a write-ahead log. Once the log reaches a certain size, it is "compacted" into a sparse block format. A block mapping keeps track of which file contains which block. The compaction step also frees up space occupied by blocks which have since been overwritten by more recent writes.
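
The write path described above can be sketched as a toy, in-memory model. Everything here is hypothetical and simplified: the `ToyLogBlock` class, the entry-count threshold `log_limit`, the dict-per-segment stand-in for the sparse block format, and the rule that a segment is dropped once none of its blocks are live. Logblock's real files, thresholds, and compaction strategy may differ.

```python
BLOCK_SIZE = 4096  # assumed block size
ZERO_BLOCK = bytes(BLOCK_SIZE)


class ToyLogBlock:
    def __init__(self, log_limit=4):
        self.log_limit = log_limit  # compact once the log holds this many writes
        self.log = []               # write-ahead log: (lba, data) in write order
        self.segments = {}          # segment id -> {lba: data}, stands in for files
        self.block_map = {}         # lba -> id of the segment holding the live copy
        self.next_segment_id = 0

    def write_block(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.log.append((lba, bytes(data)))
        if len(self.log) >= self.log_limit:
            self.compact()

    def read_block(self, lba):
        # The log holds the newest data, so search it newest-first.
        for logged_lba, data in reversed(self.log):
            if logged_lba == lba:
                return data
        seg_id = self.block_map.get(lba)
        if seg_id is None:
            return ZERO_BLOCK  # assumption: unwritten blocks read as zeros
        return self.segments[seg_id][lba]

    def compact(self):
        if not self.log:
            return
        # Keep only the most recent write for each block in the log.
        newest = {}
        for lba, data in self.log:
            newest[lba] = data
        seg_id = self.next_segment_id
        self.next_segment_id += 1
        self.segments[seg_id] = dict(sorted(newest.items()))
        for lba in newest:
            old = self.block_map.get(lba)
            self.block_map[lba] = seg_id
            if old is not None:
                # Free the overwritten copy; drop segments with no live blocks.
                del self.segments[old][lba]
                if not self.segments[old]:
                    del self.segments[old]
        self.log.clear()
```

Reads check the log first because it always holds the most recent data; only blocks absent from the log are looked up through the block mapping.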

QNAs (Questions Nobody Asked)

Why is it called "logblock2"?

Because it's a "Log"-structured "Block" device. Also, naming is hard.

No, why is it called logblock_2_?

Because I have an original logblock git repo, which is >5 years old. It has multiple iterations of the write-ahead log and sparse block format, various block mapping data structures, a "graveyard" of unused code, and a git history which is starting to look like https://xkcd.com/1296/. It was probably best to start a new repo as I open-source this project.

How is this better than existing-distributed-block-device?

Honestly, it isn't. For one, this isn't a complete project, but rather a technical demo. It has too many missing features to be useful, not the least of which is integration with a blob/log storage backend.

TODO

  • TRIM support
  • Re-designed metadata model
  • Storage backends (e.g. S3, HDFS)
  • Performance improvements
  • More tests
  • Alternate compaction strategies

LICENSE

Logblock is released under a BSD 3-Clause License.
