Merge pull request #70 from poettering/v2-prep
preparations for v2
poettering authored Jul 26, 2017
2 parents effa00a + 23a89b6 commit 68d4928
Showing 8 changed files with 280 additions and 53 deletions.
70 changes: 70 additions & 0 deletions NEWS
@@ -0,0 +1,70 @@
~~~ casync 2 ~~~

- casync now supports retrieving index and chunk data from sftp:// URLs (in
addition to the existing ftp://, http:// and https:// support).

- casync will now honour $TMPDIR if it is set, for placing temporary files and
directories.

- casync now saves/restores basic btrfs subvolume information. (Specifically it
will store whether a directory is a subvolume, and whether it has the
read-only bit set.) Control this metadata option with the new
--with=subvolume/--without=subvolume and
--with=subvolume-ro/--without=subvolume-ro switches.

- casync now saves/restores SELinux label information. Control this metadata
option with the new --with=selinux/--without=selinux switches.

- The libgcrypt dependency has been replaced with an OpenSSL dependency, as
that appears to be better supported today, and may be used to generate
SHA512/256 hashes (see below).

- casync now permits selecting the hash function to use with the new --digest=
option. SHA512/256 is now supported in addition to the old SHA256 algorithm,
which continues to be supported. The new default however is SHA512/256, as it
is substantially faster at otherwise equal properties on today's 64bit
processors. In specific environments SHA256 might perform better, hence both
algorithms remain supported. Index files contain information about the hash
algorithm used, hence automatic compatibility is retained.

- casync now permits selecting the compression format to use with the new
option --compression=. In addition to the originally supported xz compression,
gzip and zstd compression are now supported, the latter being the new default
as it provides excellent compression at very high speeds. It's OK to mix
chunks compressed with different algorithms in the same store, but of course
clients downloading them need to be new enough to read chunks in non-xz
formats. Note that the file suffix for compressed chunks changed ".xz" →
".cacnk", as they now may contain either compression, and continuing to use
the ".xz" suffix would be misleading. To retain compatibility with older
casync, the environment variable $CASYNC_COMPRESSED_CHUNK_SUFFIX may be set
to ".xz", to force usage of the old suffix.

- When extracting archives or archive indexes a subset of the metadata stored
in the archive may now be selected to be replayed, using the usual --with=
and --without= options. For example, if an archive containing full metadata
is extracted with --without=privileged only the unprivileged metadata fields
are extracted (i.e. no file ownership, ACLs, SELinux labels, ...).

- After completing an operation, statistics about downloaded chunks are now
shown.

- When invoking "casync mkdev" the third parameter may now be an arbitrarily
selected path below /dev which is then created as a symlink to the block
device used, and registered with udev. This means the usual device
enumeration will find the block device under the name picked. Example:

# casync mkdev /somepath/tomy/index-file.caibx /dev/quux

This will expose the block image /somepath/tomy/index-file.caibx as /dev/quux.

Contributions from: David Guibert, enkore, Felipe Sateler, Jesus Rodriguez,
John Paul Adrian Glaubitz, Lennart Poettering, Martin Pitt, Silvio Fricke,
Zbigniew Jędrzejewski-Szmek

~~~ casync 1 ~~~

- Initial release

Contributions from: Daniel Mack, Djalal Harouni, Lennart Poettering, Martin
Pitt, Nikita Puzyryov, Thomas Hindoe Paaboel Andersen, Zbigniew
Jędrzejewski-Szmek
6 changes: 4 additions & 2 deletions README.md
@@ -61,8 +61,9 @@ sizes are pretty evenly distributed, without the file boundaries
affecting them.

The "chunking" algorithm is based on the buzhash rolling hash
function. SHA256 is used as strong hash function to generate digests
of the chunks. xz is used to compress the individual chunks.
function. SHA512/256 is used as strong hash function to generate digests of the
chunks (alternatively: SHA256). zstd is used to compress the individual chunks
(alternatively xz or gzip).

Is this new? Conceptually, not too much. This uses well-known concepts,
implemented in a variety of other projects, and puts them together in a
@@ -88,6 +89,7 @@ but there are other systems that use similar algorithms, in particular:
2. .caidx → index file referring to a directory tree (i.e. a .catar file)
3. .caibx → index file referring to a blob (i.e. any other file)
4. .castr → chunk store directory (where we store chunks under their hashes)
5. .cacnk → a compressed chunk in a chunk store (i.e. one of the files stored below a .castr directory)
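
A minimal sketch of how a tool could pick the suffix for compressed chunk files. The NEWS entry in this commit states that the default is now ".cacnk" and that $CASYNC_COMPRESSED_CHUNK_SUFFIX may be set to ".xz" to force the old suffix. The helper names below are hypothetical and are not taken from casync's sources; the only behaviour assumed is that an unset variable means "use the default".

```c
#include <stdlib.h>

/* Hypothetical helpers for illustration; these names do not come from casync. */

/* Pick the suffix from an already-fetched environment value (NULL means unset). */
static const char *chunk_suffix_from(const char *env_value) {
        return env_value ? env_value : ".cacnk";
}

/* Convenience wrapper that consults the environment variable named in NEWS. */
static const char *chunk_suffix(void) {
        return chunk_suffix_from(getenv("CASYNC_COMPRESSED_CHUNK_SUFFIX"));
}
```

Splitting the lookup from the decision keeps the decision testable without touching the process environment.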

## Operations on directory trees

42 changes: 24 additions & 18 deletions doc/casync.rst
@@ -24,15 +24,20 @@ Commands
--------

| **casync** **make** [*ARCHIVE* | *ARCHIVE_INDEX*] [*DIRECTORY*]
| **casync** **make** [*ARCHIVE* | *ARCHIVE_INDEX* | *BLOB_INDEX*] *FILE* | *DEVICE*
| **casync** **make** [*BLOB_INDEX*] *FILE* | *DEVICE*
This will create either a .catar archive or an .caidx index for the given
*DIRECTORY*, or a .caibx index for the given *FILE* or block *DEVICE*. The type
of output is decided based on the extension. *DIRECTORY* is optional, and
the current directory will be used if not specified.
of output is automatically chosen based on the file extension (this may be
overridden with ``--what=``). *DIRECTORY* is optional, and the current directory
will be used if not specified.

When a .caidx or .caibx file is created, a .castr storage directory will be
created too (see ``--store=`` option).
created too, by default located in the same directory, and named
``default.castr`` unless configured otherwise (see ``--store=`` option).

The metadata included in the archive is controlled by the ``--with-*`` and
``--without-*`` options.

|
| **casync** extract [*ARCHIVE* | *ARCHIVE_INDEX*] [*DIRECTORY*]
@@ -43,6 +48,9 @@ into the specified *DIRECTORY*, or the contents specified by *BLOB_INDEX*
to the specified *FILE* or block *DEVICE*. *DIRECTORY* may be omitted,
and the current directory will be used by default.

The metadata replayed from the archive is controlled by the ``--with-*`` and
``--without-*`` options.

|
| **casync** list [*ARCHIVE* | *ARCHIVE_INDEX* | *DIRECTORY*]
@@ -60,17 +68,14 @@ The output includes the permission mask and file names::
|
| **casync** mtree [*ARCHIVE* | *ARCHIVE_INDEX* | *DIRECTORY*]
This is similar to **list**, but includes information about each entry
in a key=value format::
This is similar to **list**, but includes information about each entry in the
key=value format defined by BSD mtree(5):

$ casync mtree /usr/share/doc/casync
. type=dir mode=0755 uid=0 gid=0 time=1500343585.721189650
README.md type=file mode=0644 size=7286 uid=0 gid=0 time=1498175562.000000000 sha256digest=af75eacac1f00abf6adaa7510a2c7fe00a4636daf9ea910d69d96f0a4ae85df4
TODO type=file mode=0644 size=2395 uid=0 gid=0 time=1498175562.000000000 sha256digest=316f11a03c08ec39f0328ab1f7446bd048507d3fbeafffe7c32fad4942244b7d

What information is included is influenced by the ``--with-*`` and
``--without-*`` options.

|
| **casync** stat [*ARCHIVE* | *ARCHIVE_INDEX* | *DIRECTORY*] [*PATH*]
@@ -94,7 +99,7 @@ Example output::
|
| **casync** digest [*ARCHIVE* | *BLOB* | *ARCHIVE_INDEX* | *BLOB_INDEX* | *DIRECTORY*]
This will compute and print the SHA256 checksum of the argument.
This will compute and print the checksum of the argument.
The argument is optional and defaults to the current directory::

$ casync digest
@@ -107,14 +112,14 @@ The argument is optional and defaults to the current directory::
| **casync** mount [*ARCHIVE* | *ARCHIVE_INDEX*] *PATH*
This will mount the specified .catar archive or .caidx index at the
specified *PATH*, using the fuse protocol.
specified *PATH*, using the FUSE protocol.

|
| **casync** mkdev [*BLOB* | *BLOB_INDEX*] [*NODE*]
This will create a block device *NODE* with the contents specified
by the .caibx *BLOB_INDEX* or just the file or block device *BLOB*,
using the nbd protocol.
using the NBD protocol.

Example::

@@ -135,11 +140,12 @@ General options:
--help, -h Show terse help output
--verbose, -v Show terse status information during runtime
--store=PATH The primary chunk store to use
--extra-store=PATH Additional chunk store to look for chunks in
--chunk-size=<[MIN]:AVG:[MAX]> The minimal/average/maximum number of bytes in a chunk
--digest=<sha256|sha512-256> The digest algorithm to use.
--seed=PATH Additional file or directory to use as seed
--rate-limit-bps=LIMIT Maximum bandwidth in bytes/s for remote communication
--extra-store=<PATH> Additional chunk store to look for chunks in
--chunk-size=<[MIN:]AVG[:MAX]> The minimal/average/maximum number of bytes in a chunk
--digest=<DIGEST> Pick digest algorithm (sha512-256 or sha256)
--compression=<COMPRESSION> Pick compression algorithm (zstd, xz or gzip)
--seed=<PATH> Additional file or directory to use as seed
--rate-limit-bps=<LIMIT> Maximum bandwidth in bytes/s for remote communication
--exclude-nodump=no Don't exclude files with chattr(1)'s +d **nodump** flag when creating archive
--exclude-submounts=yes Exclude submounts when creating archive
--reflink=no Don't create reflinks from seeds when extracting
@@ -150,7 +156,7 @@ General options:
--seed-output=no Don't implicitly add pre-existing output as seed when extracting
--recursive=no List non-recursively
--uid-shift=<yes|SHIFT> Shift UIDs/GIDs
--uid-range=RANGE Restrict UIDs/GIDs to range
--uid-range=<RANGE> Restrict UIDs/GIDs to range

Input/output selector:

4 changes: 3 additions & 1 deletion meson.build
@@ -1,5 +1,5 @@
project('casync', 'c',
version : '0.1',
version : '2',
license : 'LGPLv2+',
default_options: [
'c_std=gnu99',
@@ -62,6 +62,8 @@ foreach arg : c_args
endforeach

conf = configuration_data()
conf.set_quoted('PACKAGE_VERSION', meson.project_version())

conf.set('_GNU_SOURCE', true)
conf.set('__SANE_USERSPACE_TYPES__', true)
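
A short sketch of why the new `conf.set_quoted('PACKAGE_VERSION', ...)` line matters for the C side: `set_quoted()` emits the value as a C string literal in the generated configuration header, so it can be pasted next to other literals in a format string. The fallback `#define` and the `format_version()` helper below are stand-ins for illustration only; they are assumptions, not part of the commit.

```c
#include <stdio.h>

/* Assumed stand-in for the definition meson's set_quoted() writes into the
 * generated configuration header; the real build supplies it. */
#ifndef PACKAGE_VERSION
#define PACKAGE_VERSION "2"
#endif

/* Hypothetical helper: formats "<name> <version>\n" into buf.
 * Adjacent string literals are concatenated at compile time, so
 * "%s " PACKAGE_VERSION "\n" becomes the single literal "%s 2\n". */
static int format_version(char *buf, size_t size, const char *name) {
        return snprintf(buf, size, "%s " PACKAGE_VERSION "\n", name);
}
```

Because the version ends up inside the format string, a value containing `%` would be interpreted by `printf`; passing it as a separate `%s` argument avoids that if such versions are ever expected.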

49 changes: 19 additions & 30 deletions src/casync-tool.c
@@ -70,6 +70,7 @@ static void help(void) {
"%1$s [OPTIONS...] mkdev [BLOB|BLOB_INDEX] [NODE]\n\n"
"Content-Addressable Data Synchronization Tool\n\n"
" -h --help Show this help\n"
" --version Show brief version information\n"
" -v --verbose Show terse status information during runtime\n"
" --store=PATH The primary chunk store to use\n"
" --extra-store=PATH Additional chunk store to look for chunks in\n"
@@ -155,6 +156,11 @@ static void help(void) {
program_invocation_short_name);
}

static void version(void) {
printf("%s " PACKAGE_VERSION "\n",
program_invocation_short_name);
}

static int parse_chunk_sizes(const char *v, size_t *ret_min, size_t *ret_avg, size_t *ret_max) {
uint64_t a, b, c;
char *k;
@@ -269,10 +275,12 @@ static int parse_argv(int argc, char *argv[]) {
ARG_MKDIR,
ARG_DIGEST,
ARG_COMPRESSION,
ARG_VERSION,
};

static const struct option options[] = {
{ "help", no_argument, NULL, 'h' },
{ "version", no_argument, NULL, ARG_VERSION },
{ "verbose", no_argument, NULL, 'v' },
{ "store", required_argument, NULL, ARG_STORE },
{ "extra-store", required_argument, NULL, ARG_EXTRA_STORE },
@@ -315,6 +323,10 @@ static int parse_argv(int argc, char *argv[]) {
help();
return 0;

case ARG_VERSION:
version();
return 0;

case 'v':
arg_verbose = true;
break;
@@ -607,6 +619,7 @@ static int parse_argv(int argc, char *argv[]) {

static int set_default_store(const char *index_path) {
const char *e;
int r;

if (arg_store)
return 0;
@@ -616,39 +629,13 @@ static int set_default_store(const char *index_path) {
/* If the default store is set via an environment variable, use that */
arg_store = strdup(e);
else if (index_path) {
char *d;
CaLocatorClass c;

/* Otherwise, derive it from the index file path */

c = ca_classify_locator(index_path);
if (c < 0) {
fprintf(stderr, "Failed to automatically derive store location: %s\n", index_path);
return -EINVAL;
}

if (c == CA_LOCATOR_URL) {
const char *p;

p = index_path + strcspn(index_path, ";?");
for (;;) {
if (p <= index_path)
break;

if (p[-1] == '/')
break;

p--;
}

d = strndupa(index_path, p - index_path);
arg_store = strjoin(d, "default.castr");
} else {
d = dirname_malloc(index_path);
if (!d)
return log_oom();
arg_store = strjoin(d, "/default.castr");
free(d);
r = ca_locator_patch_last_component(index_path, "default.castr", &arg_store);
if (r < 0) {
fprintf(stderr, "Failed to automatically derive store location from index: %s\n", strerror(-r));
return r;
}
} else
/* And if we don't know any, then place it in the current directory */
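
The hunk above replaces the open-coded URL/path handling with a single call to `ca_locator_patch_last_component()`, whose body is not part of this diff. As a rough sketch of what "patching the last component" could look like, modelled on the removed code (cut a URL at the first `;` or `?`, then walk back to the last `/`): the function name `patch_last_component_sketch()` is hypothetical, the `://` check is a simplification standing in for `ca_classify_locator()`, and the real helper may well behave differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration only; NOT casync's ca_locator_patch_last_component(). */
static int patch_last_component_sketch(const char *locator, const char *last, char **ret) {
        size_t stem, n;
        char *s;

        assert(locator);
        assert(last);
        assert(ret);

        /* For URLs, ignore everything from the first ';' or '?' on, as the
         * removed code did. Assumption: "://" is enough to recognise a URL. */
        stem = strstr(locator, "://") ? strcspn(locator, ";?") : strlen(locator);

        /* Walk back until just after the last '/', or to the start of the string. */
        while (stem > 0 && locator[stem - 1] != '/')
                stem--;

        n = strlen(last);
        s = malloc(stem + n + 1);
        if (!s)
                return -ENOMEM;

        memcpy(s, locator, stem);      /* directory part, including the trailing '/' */
        memcpy(s + stem, last, n + 1); /* new last component plus the terminating NUL */

        *ret = s;
        return 0;
}
```

With this sketch, `/srv/images/foo.caidx` becomes `/srv/images/default.castr`, and a bare `foo.caidx` becomes `default.castr`, whereas the removed `dirname_malloc()` path would have produced `./default.castr`. The caller owns the returned string and must `free()` it.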
@@ -4043,5 +4030,7 @@ int main(int argc, char *argv[]) {
strv_free(arg_extra_stores);
strv_free(arg_seeds);

/* fprintf(stderr, PID_FMT ": exiting with error code: %s\n", getpid(), strerror(-r)); */

return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}