Frz is a simple command-line utility that makes it possible to manage large amounts of data with Git in a completely peer-to-peer way. It can scan a repository to make sure that all files are present and undamaged, and if some files are missing or broken, it can look for good copies in arbitrary directory trees.
Frz replaces your files with symlinks that point to
content-indexed files in a directory named .frz. Frz manages
the (large) files in there, and Git manages the (tiny) symlinks.
You can use any Git workflow you like for the symlinks: sync with a
central repository, sync peer-to-peer, use tags, use branches, etc.
All you need to remember is to use frz add instead of git add.
Whenever a Git operation leaves you with symlinks that point to .frz
content files that you don’t yet have, you run frz fill to fetch the
missing content from a place that does have it.
frz fill scans the directory tree for symlinks whose targets are
missing, which is fast; there is also frz repair, which additionally
checks that the content files that you already have are not damaged.
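To get a feel for what a fill pass has to resolve, you can list dangling symlinks with standard tools. The following is only a sketch built on POSIX find, not how Frz is implemented; the function name list_dangling_links is made up for illustration.

```shell
# Hypothetical helper (not part of Frz): print every symlink under the given
# directory whose target does not exist, without descending into .git or .frz.
list_dangling_links() {
    find "$1" \( -name .git -o -name .frz \) -prune -o \
        -type l ! -exec test -e {} \; -print
}

# Self-contained demo in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/.frz/blake3" "$demo/foo"
echo data > "$demo/.frz/blake3/present"
ln -s .frz/blake3/present "$demo/ok.bin"          # target exists
ln -s ../.frz/blake3/absent "$demo/foo/gone.bin"  # target is missing
dangling=$(list_dangling_links "$demo")
echo "$dangling"
```

Only foo/gone.bin is reported, because test -e follows the link and fails when the target is absent.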
- You don’t need frz if you just want to read your data. All frz does is write-protect your files and move them to a separate directory, leaving a directory tree with symlinks behind.
- Frz never stores multiple copies of your content. If you have a 10 TB disk, you can use Frz and Git to manage very close to 10 TB of data.
- Frz deduplicates your data on a file level. If you have two files with the same content, Frz stores only one copy.
- You can use the full suite of Git tools and workflows to manage the tree of symlinks. Sync with a central repository, sync peer-to-peer, use tags, use branches, inspect the commit graph, restore old revisions, etc. Frz doesn’t get in the way; you just need to run frz fill whenever a Git operation leaves you with symlinks that point to .frz content files that you don’t yet have.
- There’s no Frz server. frz fill and frz repair fetch files from any directory tree, not even necessarily another Frz repository.
- Content synchronisation, content error detection, and content repair are the same thing. When you run frz fill to fetch content that you don’t yet have, Frz does exactly the same thing as when you run frz repair, except that it skips some expensive verification of the content files that you already have. This makes Frz simpler to use and simpler to develop, and reduces the amount of code (and the number of bugs) in seldom-used error paths.
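The ideas in the list above (write-protect, move aside, leave a symlink, keep one copy per distinct content) can be tried out with a few lines of shell. This is a sketch under stated assumptions, not Frz's code: cas_add is a hypothetical name, sha256sum stands in for the hash Frz uses, and the store is a flat directory.

```shell
# Hypothetical sketch (not Frz's code): store a file under its content hash
# and leave a symlink behind. sha256sum is a stand-in hash function.
cas_add() {
    file=$1 store=$2
    hash=$(sha256sum "$file" | cut -d' ' -f1)
    mkdir -p "$store"
    if [ -e "$store/$hash" ]; then
        rm -f "$file"                 # same content already stored: deduplicate
    else
        chmod a-w "$file"             # write-protect, then move into the store
        mv "$file" "$store/$hash"
    fi
    ln -s "$store/$hash" "$file"      # link to the store path as given
}

demo=$(mktemp -d)
printf 'same bytes' > "$demo/a.bin"
printf 'same bytes' > "$demo/b.bin"
cas_add "$demo/a.bin" "$demo/store"
cas_add "$demo/b.bin" "$demo/store"
stored=$(ls "$demo/store" | wc -l | tr -d ' ')
echo "content files stored: $stored"
```

Both a.bin and b.bin end up as symlinks to the same stored file, and both still read back the original bytes.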
Frz is distributed under the Apache 2.0 license. See LICENSE.txt for details.
You need CMake, Ninja, GCC, and libgit2.
On Debian and Ubuntu, sudo apt install cmake ninja gcc libgit2-dev.
On macOS with MacPorts, sudo port install cmake ninja gcc10 libgit2. When invoking cmake (see below),
use the extra options -D CMAKE_C_COMPILER=/opt/local/bin/gcc-mp-10 -D CMAKE_CXX_COMPILER=/opt/local/bin/g++-mp-10 (replacing /opt/local/
with whatever install directory you chose for MacPorts).
Download the sources; then, in that directory,
$ mkdir build
$ cd build
$ cmake -G Ninja -D CMAKE_BUILD_TYPE=Release ..
$ ninja frz
$ ./frz --help # run the new frz binary!
There’s a separate page with technical details for the curious.
Contributions are welcome! See the developer documentation.
We only have two modestly-sized files in this example, but Frz is designed to work with terabytes of data and as many files as Git can handle.
$ mkdir -p frz-test/foo
$ cd frz-test
$ head -c 1G /dev/urandom > one.bin
$ head -c 2G /dev/urandom > foo/two.bin
One day, there will be a frz init command. Until then, we have to do
this by hand:
$ git init
Initialized empty Git repository in /tmp/frz-test/.git/
$ mkdir .frz
$ echo .frz >> .git/info/exclude
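Since the same two setup steps are needed in every working tree (including clones), they can be wrapped in a small function. This is a hypothetical helper, not a Frz command; the name frz_setup_by_hand is made up, and it only adds a check so that .frz is not listed twice.

```shell
# Hypothetical helper (not a Frz command): do the by-hand setup in an
# existing Git working tree. Safe to run more than once.
frz_setup_by_hand() {
    dir=$1
    [ -d "$dir/.git" ] || { echo "not a Git working tree: $dir" >&2; return 1; }
    mkdir -p "$dir/.frz" "$dir/.git/info"
    # Append ".frz" to the local exclude file only if it is not already listed.
    grep -qxF .frz "$dir/.git/info/exclude" 2>/dev/null ||
        echo .frz >> "$dir/.git/info/exclude"
}

# Demo on a stand-in directory; in real use, run "git init" or "git clone" first.
repo=$(mktemp -d)
mkdir -p "$repo/.git/info"
frz_setup_by_hand "$repo"
frz_setup_by_hand "$repo"
cat "$repo/.git/info/exclude"
```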
frz add will add all files in a specified directory tree. “.” is
the current directory, so this command will add everything:
$ frz add .
+ foo/two.bin
+ one.bin
2 files successfully added
0 files successfully added and deduplicated
0 directory entries skipped because they weren't regular files
0 files skipped because of errors
We can see that frz add replaced the files with symlinks, and that
the file contents are stored in the .frz directory:
$ ls -l one.bin
lrwxrwxrwx 1 user user 72 Feb 21 13:59 one.bin -> .frz/blake3/nr/r6/ttns8389pbzhmdgk1bf319bh6m3hmukans2nhg4ze85h73q1000000
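Comparing this link target with the index name that frz repair prints for the same file later in this walkthrough suggests that the path is the hash with its first four characters split off into two directory levels. That reading is inferred from the sample output rather than from a documented format, and frz_hash_from_target below is a hypothetical helper, not part of Frz.

```shell
# Hypothetical helper: rebuild the content hash from a link target by
# dropping everything up to the algorithm directory and joining the rest.
frz_hash_from_target() {
    rest=${1#*.frz/blake3/}             # nr/r6/ttns...
    printf '%s\n' "$rest" | tr -d '/'   # nrr6ttns...
}

target=.frz/blake3/nr/r6/ttns8389pbzhmdgk1bf319bh6m3hmukans2nhg4ze85h73q1000000
hash=$(frz_hash_from_target "$target")
echo "$hash"
```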
Now commit it. frz add already ran git add for us, so we only need
to run git commit:
$ git commit -m "My first commit"
2 files changed, 2 insertions(+)
create mode 120000 foo/two.bin
create mode 120000 one.bin
$ cd ..
$ git clone frz-test frz-test-clone
Cloning into 'frz-test-clone'...
done.
$ cd frz-test-clone
$ mkdir .frz
$ echo .frz >> .git/info/exclude
At this point, we have the whole git-controlled tree of symlinks, but
are missing the content files that the symlinks point to. frz fill
will fix that for us, if we point it to a directory tree where copies
of the required files can be found:
$ frz fill --copy-from ../frz-test
Checking that referenced content is present...
Listing files in /tmp/frz-test... done (34 files)
Hashing files... done (0 files, 2147483648 bytes)
Hashing files... done (0 files, 1073741824 bytes)
Checking that referenced content is present... done (2 links)
Content files
  2 missing (restored)
  0 missing (not restored)
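Because the source tree does not have to be a Frz repository, a file can only be recognised by its content, not by its name or location. The sketch below illustrates that idea with standard tools. It is not Frz's algorithm: fetch_by_hash is a hypothetical name, sha256sum stands in for the real hash, and every candidate file is hashed, which a real tool would presumably try to avoid.

```shell
# Hypothetical sketch (not Frz's code): search a directory tree for a file
# whose checksum equals the wanted one and copy it to the destination path.
# Assumes file names without newlines.
fetch_by_hash() {
    want=$1 tree=$2 dest=$3
    find "$tree" -type f | while IFS= read -r candidate; do
        got=$(sha256sum "$candidate" | cut -d' ' -f1)
        if [ "$got" = "$want" ]; then
            cp "$candidate" "$dest"
            break
        fi
    done
    [ -e "$dest" ]   # succeed only if the destination now exists
}

demo=$(mktemp -d)
mkdir -p "$demo/backup/deep" "$demo/store"
printf 'payload' > "$demo/backup/deep/anything.dat"
printf 'other' > "$demo/backup/unrelated.dat"
want=$(printf 'payload' | sha256sum | cut -d' ' -f1)
fetch_by_hash "$want" "$demo/backup" "$demo/store/$want" && echo restored
```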
We remove the write protection from one of our files and overwrite the
first kilobyte. When we run frz repair, the checksum fails to match,
and frz looks for a good copy of the file in the directory we
specified.
$ chmod u+w one.bin
$ dd if=/dev/urandom of=one.bin bs=1k count=1 conv=notrunc
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000210962 s, 4.9 MB/s
$ frz repair --copy-from ../frz-test
Checking index links and content files...
Removing nrr6ttns8389pbzhmdgk1bf319bh6m3hmukans2nhg4ze85h73q1000000 from the index because it points to x8, which has the wrong hash (5jw4kk42pw8fxabdz10wzxyx5g1zd0j93txtfcze39gsrf30tnbh000000).
Checking index links and content files... done (2 links, 2 files)
Checking orphaned content files...
Adding 5jw4kk42pw8fxabdz10wzxyx5g1zd0j93txtfcze39gsrf30tnbh000000 to the index, pointing to x8 (content was already present, but not indexed).
Checking orphaned content files... done (1 files, 1073741824 bytes)
Checking that referenced content is present...
Listing files in /tmp/frz-test... done (34 files)
Hashing files... done (0 files, 1073741824 bytes)
Checking that referenced content is present... done (2 links)
Index symlinks
  1 OK
  1 bad (removed)
  1 missing (recreated)
Content files
  0 duplicates (moved aside)
  1 missing (restored)
  0 missing (not restored)
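The damage check at the heart of this step can be sketched as recomputing each content file's checksum and comparing it with the name it is stored under. The example simplifies heavily: it assumes a flat store where every file is named after its sha256sum, whereas the output above shows that Frz keeps a separate index of links to its content files and uses a different hash. list_damaged is a hypothetical name.

```shell
# Hypothetical sketch (not Frz's code): report content files whose checksum
# no longer matches their file name.
list_damaged() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        got=$(sha256sum "$f" | cut -d' ' -f1)
        [ "$got" = "$(basename "$f")" ] || echo "$f"
    done
}

demo=$(mktemp -d)
for content in alpha beta; do
    name=$(printf '%s' "$content" | sha256sum | cut -d' ' -f1)
    printf '%s' "$content" > "$demo/$name"
done
printf 'X' >> "$demo/$name"       # damage the last file written ("beta")
damaged=$(list_damaged "$demo")
echo "$damaged"
```

Only the file that was modified after being stored is reported; the untouched one still hashes to its own name.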