Merge pull request #70 from poettering/v2-prep

preparations for v2
systemd · Jul 26, 2017 · 68d4928
2 parents effa00a + 23a89b6

Showing 8 changed files with 280 additions and 53 deletions.
diff --git a/NEWS b/NEWS
@@ -0,0 +1,70 @@
+~~~ casync 2 ~~~
+
+- casync now supports retrieving index and chunk data from sftp:// URLs. (In
+  addition to the existing ftp://, http:// and https:// support).
+
+- casync will now honour $TMP if it is set, for placing temporary files and
+  directories.
+
+- casync now saves/restores basic btrfs subvolume information. (Specifically it
+  will store whether a directory is a subvolume, and whether it has the
+  read-only bit set.) Control this metadata option with the new
+  --with=subvolume/--without=subvolume and
+  --with=subvolume-ro/--without=subvolume-ro switches.
+
+- casync now saves/restores SELinux label information. Control this metadata
+  option with the new --with=selinux/--without=selinux switches.
+
+- The libgcrypt dependency has been replaced with an OpenSSL dependency, as
+  that appears to be better supported today, and may be used to generate
+  SHA512/256 hashes (see below).
+
+- casync now permits selecting the hash function to use with the new --digest=
+  option. SHA512/256 is now supported in addition to the old SHA256 algorithm,
+  which continues to be supported. The new default however is SHA512/256, as it
+  is substantially faster at otherwise equal properties on today's 64bit
+  processors. In specific environments SHA256 might perform better, hence both
+  algorithms remain supported. Index files contain information about the hash
+  algorithm used, hence automatic compatibility is retained.
+
+- casync now permits selecting the compression format to use with the new
+  option --compression=. In addition to the originally supported xz compression,
+  gzip and zstd compression are now supported, the latter being the new default
+  as it provides excellent compression at very high speeds. It's OK to mix
+  chunks compressed with different algorithms in the same store, but of course
+  clients downloading them need to be new enough to read chunks in non-xz
+  formats. Note that the file suffix for compressed chunks changed ".xz" →
+  ".cacnk", as they now may contain either compression, and continuing to use
+  the ".xz" suffix would be misleading. To retain compatibility with older
+  casync, the environment variable $CASYNC_COMPRESSED_CHUNK_SUFFIX may be set
+  to ".xz", to force usage of the old suffix.
+
+- When extracting archives or archive indexes a subset of the metadata stored
+  in the archive may now be selected to be replayed, using the usual --with=
+  and --without= options. For example, if an archive containing full metadata
+  is extracted with --without=privileged only the unprivileged metadata fields
+  are extracted (i.e. no file ownership, ACLs, SELinux labels, ...).
+
+- After completing an operation statistics about downloaded chunks are now
+  shown.
+
+- When invoking "casync mkdev" the third parameter may now be an arbitrarily
+  selected path below /dev which is then created as a symlink to the block
+  device used, and registered with udev. This means the usual device
+  enumeration will find the block device under the name picked. Example:
+
+          # casync mkdev /somepath/tomy/index-file.caibx /dev/quux
+
+  This will expose the block image /somepath/tomy/index-file.caibx as /dev/quux.
+
+Contributions from: David Guibert, enkore, Felipe Sateler, Jesus Rodriguez,
+John Paul Adrian Glaubitz, Lennart Poettering, Martin Pitt, Silvio Fricke,
+Zbigniew Jędrzejewski-Szmek
+
+~~~ casync 1 ~~~
+
+- Initial release
+
+Contributions from: Daniel Mack, Djalal Harouni, Lennart Poettering, Martin
+Pitt, Nikita Puzyryov, Thomas Hindoe Paaboel Andersen, Zbigniew
+Jędrzejewski-Szmek
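
The NEWS entry above says that `$CASYNC_COMPRESSED_CHUNK_SUFFIX` may be set to ".xz" to keep the old chunk file suffix. As a minimal sketch of how such a lookup could work: the function name `compressed_chunk_suffix()` is hypothetical and not taken from the casync sources, and treating an empty value like an unset one is an assumption.

```c
#include <stdlib.h>

/* Hypothetical helper, not from the casync sources: returns the file
 * suffix to use for compressed chunks in a chunk store. */
const char *compressed_chunk_suffix(void) {
        const char *e;

        /* An override from the environment wins, e.g. ".xz" for stores
         * that still have to serve older clients. Assumption: an empty
         * value is treated like an unset variable. */
        e = getenv("CASYNC_COMPRESSED_CHUNK_SUFFIX");
        if (e && *e)
                return e;

        /* Suffix introduced with casync 2 according to the NEWS text. */
        return ".cacnk";
}
```

Reading the variable on every call keeps the sketch stateless; a real tool would more likely resolve it once at startup.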
diff --git a/README.md b/README.md
@@ -61,8 +61,9 @@ sizes are pretty evenly distributed, without the file boundaries
 affecting them.
 
 The "chunking" algorithm is based on the buzhash rolling hash
-function. SHA256 is used as strong hash function to generate digests
-of the chunks. xz is used to compress the individual chunks.
+function. SHA512/256 is used as strong hash function to generate digests of the
+chunks (alternatively: SHA256). zstd is used to compress the individual chunks
+(alternatively xz or gzip).
 
 Is this new? Conceptually, not too much. This uses well-known concepts,
 implemented in a variety of other projects, and puts them together in a
@@ -88,6 +89,7 @@ but there are other systems that use similar algorithms, in particular:
 2. .caidx → index file referring to a directory tree (i.e. a .catar file)
 3. .caibx → index file referring to a blob (i.e. any other file)
 4. .castr → chunk store directory (where we store chunks under their hashes)
+5. .cacnk → a compressed chunk in a chunk store (i.e. one of the files stored below a .castr directory)
 
 ## Operations on directory trees
 

diff --git a/doc/casync.rst b/doc/casync.rst
@@ -24,15 +24,20 @@ Commands
 --------
 
 | **casync** **make** [*ARCHIVE* | *ARCHIVE_INDEX*] [*DIRECTORY*]
-| **casync** **make** [*ARCHIVE* | *ARCHIVE_INDEX* | *BLOB_INDEX*] *FILE* | *DEVICE*
+| **casync** **make** [*BLOB_INDEX*] *FILE* | *DEVICE*
 
 This will create either a .catar archive or a .caidx index for the given
 *DIRECTORY*, or a .caibx index for the given *FILE* or block *DEVICE*. The type
-of output is decided based on the extension. *DIRECTORY* is optional, and
-the current directory will be used if not specified.
+of output is automatically chosen based on the file extension (this may be
+overridden with ``--what=``). *DIRECTORY* is optional, and the current directory
+will be used if not specified.
 
 When a .caidx or .caibx file is created, a .castr storage directory will be
-created too (see ``--store=`` option).
+created too, by default located in the same directory, and named
+``default.castr`` unless configured otherwise (see ``--store=`` option).
+
+The metadata included in the archive is controlled by the ``--with-*`` and
+``--without-*`` options.
 
 |
 | **casync** extract [*ARCHIVE* | *ARCHIVE_INDEX*] [*DIRECTORY*]
@@ -43,6 +48,9 @@ into the specified *DIRECTORY*, or the contents specified by *BLOB_INDEX*
 to the specified *FILE* or block *DEVICE*. *DIRECTORY* may be omitted,
 and the current directory will be used by default.
 
+The metadata replayed from the archive is controlled by the ``--with-*`` and
+``--without-*`` options.
+
 |
 | **casync** list [*ARCHIVE* | *ARCHIVE_INDEX* | *DIRECTORY*]
 
@@ -60,17 +68,14 @@ The output includes the permission mask and file names::
 |
 | **casync** mtree [*ARCHIVE* | *ARCHIVE_INDEX* | *DIRECTORY*]
 
-This is similar to **list**, but includes information about each entry
-in a key=value format::
+This is similar to **list**, but includes information about each entry in the
+key=value format defined by BSD mtree(5)::
 
   $ casync mtree /usr/share/doc/casync
   . type=dir mode=0755 uid=0 gid=0 time=1500343585.721189650
   README.md type=file mode=0644 size=7286 uid=0 gid=0 time=1498175562.000000000 sha256digest=af75eacac1f00abf6adaa7510a2c7fe00a4636daf9ea910d69d96f0a4ae85df4
   TODO type=file mode=0644 size=2395 uid=0 gid=0 time=1498175562.000000000 sha256digest=316f11a03c08ec39f0328ab1f7446bd048507d3fbeafffe7c32fad4942244b7d
 
-What information is included is influenced by the ``--with-*`` and
-``--without-*`` options.
-
 |
 | **casync** stat [*ARCHIVE* | *ARCHIVE_INDEX* | *DIRECTORY*] [*PATH*]
 
@@ -94,7 +99,7 @@ Example output::
 |
 | **casync** digest [*ARCHIVE* | *BLOB* | *ARCHIVE_INDEX* | *BLOB_INDEX* | *DIRECTORY*]
 
-This will compute and print the SHA256 checksum of the argument.
+This will compute and print the checksum of the argument.
 The argument is optional and defaults to the current directory::
 
   $ casync digest
@@ -107,14 +112,14 @@ The argument is optional and defaults to the current directory::
 | **casync** mount [*ARCHIVE* | *ARCHIVE_INDEX*] *PATH*
 
 This will mount the specified .catar archive or .caidx index at the
-specified *PATH*, using the fuse protocol.
+specified *PATH*, using the FUSE protocol.
 
 |
 | **casync** mkdev [*BLOB* | *BLOB_INDEX*] [*NODE*]
 
 This will create a block device *NODE* with the contents specified
 by the .caibx *BLOB_INDEX* or just the file or block device *BLOB*,
-using the nbd protocol.
+using the NBD protocol.
 
 Example::
 
@@ -135,11 +140,12 @@ General options:
 --help, -h                      Show terse help output
 --verbose, -v                   Show terse status information during runtime
 --store=PATH                    The primary chunk store to use
---extra-store=PATH              Additional chunk store to look for chunks in
---chunk-size=<[MIN]:AVG:[MAX]>  The minimal/average/maximum number of bytes in a chunk
---digest=<sha256|sha512-256>    The digest algorithm to use.
---seed=PATH                     Additional file or directory to use as seed
---rate-limit-bps=LIMIT          Maximum bandwidth in bytes/s for remote communication
+--extra-store=<PATH>            Additional chunk store to look for chunks in
+--chunk-size=<[MIN:]AVG[:MAX]>  The minimal/average/maximum number of bytes in a chunk
+--digest=<DIGEST>               Pick digest algorithm (sha512-256 or sha256)
+--compression=<COMPRESSION>     Pick compression algorithm (zstd, xz or gzip)
+--seed=<PATH>                   Additional file or directory to use as seed
+--rate-limit-bps=<LIMIT>        Maximum bandwidth in bytes/s for remote communication
 --exclude-nodump=no             Don't exclude files with chattr(1)'s +d **nodump** flag when creating archive
 --exclude-submounts=yes         Exclude submounts when creating archive
 --reflink=no                    Don't create reflinks from seeds when extracting
@@ -150,7 +156,7 @@ General options:
 --seed-output=no                Don't implicitly add pre-existing output as seed when extracting
 --recursive=no                  List non-recursively
 --uid-shift=<yes|SHIFT>         Shift UIDs/GIDs
---uid-range=RANGE               Restrict UIDs/GIDs to range
+--uid-range=<RANGE>             Restrict UIDs/GIDs to range
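
The `--chunk-size=<[MIN:]AVG[:MAX]>` syntax documented above could be parsed roughly as follows. This is only a sketch under stated assumptions: `parse_chunk_size_spec()` is a hypothetical name, it accepts either a single `AVG` value or the full `MIN:AVG:MAX` triple and rejects the ambiguous two-field form, and it reports an omitted minimum or maximum as 0. The behaviour of casync's own `parse_chunk_sizes()` may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for "AVG" or "MIN:AVG:MAX". Returns 0 on success or
 * a negative errno-style code. Omitted MIN/MAX are reported as 0. */
int parse_chunk_size_spec(const char *v, uint64_t *ret_min,
                          uint64_t *ret_avg, uint64_t *ret_max) {
        uint64_t f[3];
        size_t n = 0;
        const char *p = v;

        if (!v || !ret_min || !ret_avg || !ret_max)
                return -EINVAL;

        for (;;) {
                unsigned long long u;
                char *e;

                /* strtoull() tolerates signs and leading whitespace, so
                 * insist on a digit; also refuse a fourth field. */
                if (*p < '0' || *p > '9' || n >= 3)
                        return -EINVAL;

                errno = 0;
                u = strtoull(p, &e, 10);
                if (errno != 0)
                        return -errno;
                f[n++] = u;

                if (*e == 0)
                        break;
                if (*e != ':')
                        return -EINVAL;
                p = e + 1;
        }

        if (n == 1) {
                *ret_min = 0;
                *ret_avg = f[0];
                *ret_max = 0;
        } else if (n == 3) {
                /* Assumption: the three sizes must be ordered. */
                if (f[0] > f[1] || f[1] > f[2])
                        return -EINVAL;
                *ret_min = f[0];
                *ret_avg = f[1];
                *ret_max = f[2];
        } else
                return -EINVAL;

        return 0;
}
```

With this sketch, "16384" yields only an average, "4096:16384:65536" yields all three values, and "4096:16384" is rejected.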
 
 Input/output selector:
 

diff --git a/meson.build b/meson.build
@@ -1,5 +1,5 @@
 project('casync', 'c',
-        version : '0.1',
+        version : '2',
         license : 'LGPLv2+',
         default_options: [
                 'c_std=gnu99',
@@ -62,6 +62,8 @@ foreach arg : c_args
 endforeach
 
 conf = configuration_data()
+conf.set_quoted('PACKAGE_VERSION', meson.project_version())
+
 conf.set('_GNU_SOURCE', true)
 conf.set('__SANE_USERSPACE_TYPES__', true)
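
For context on the `set_quoted()` call above: it emits `PACKAGE_VERSION` as a quoted string macro, which lets C code paste the version into a format string at compile time, as the new `version()` function in src/casync-tool.c does. A small sketch of that pattern follows; the fallback `#define` and the helper `format_version()` are illustrative assumptions, since the generated config header is not part of this diff.

```c
#include <stddef.h>
#include <stdio.h>

/* Fallback so the sketch compiles on its own. In the real build the macro
 * is assumed to come from the header generated out of "conf". */
#ifndef PACKAGE_VERSION
#define PACKAGE_VERSION "2"
#endif

/* Hypothetical helper: writes "<name> <version>\n" into buf and returns
 * the snprintf() result. */
int format_version(char *buf, size_t size, const char *name) {
        /* Adjacent string literals are concatenated by the compiler, so
         * the format string below becomes "%s 2\n" with the fallback. */
        return snprintf(buf, size, "%s " PACKAGE_VERSION "\n", name);
}
```

Because the macro expands to a string literal, no extra `%s` argument is needed for the version itself.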
 

diff --git a/src/casync-tool.c b/src/casync-tool.c
@@ -70,6 +70,7 @@ static void help(void) {
                "%1$s [OPTIONS...] mkdev [BLOB|BLOB_INDEX] [NODE]\n\n"
                "Content-Addressable Data Synchronization Tool\n\n"
                "  -h --help                  Show this help\n"
+               "     --version               Show brief version information\n"
                "  -v --verbose               Show terse status information during runtime\n"
                "     --store=PATH            The primary chunk store to use\n"
                "     --extra-store=PATH      Additional chunk store to look for chunks in\n"
@@ -155,6 +156,11 @@ static void help(void) {
                program_invocation_short_name);
 }
 
+static void version(void) {
+        printf("%s " PACKAGE_VERSION "\n",
+               program_invocation_short_name);
+}
+
 static int parse_chunk_sizes(const char *v, size_t *ret_min, size_t *ret_avg, size_t *ret_max) {
         uint64_t a, b, c;
         char *k;
@@ -269,10 +275,12 @@ static int parse_argv(int argc, char *argv[]) {
                 ARG_MKDIR,
                 ARG_DIGEST,
                 ARG_COMPRESSION,
+                ARG_VERSION,
         };
 
         static const struct option options[] = {
                 { "help",              no_argument,       NULL, 'h'                   },
+                { "version",           no_argument,       NULL, ARG_VERSION           },
                 { "verbose",           no_argument,       NULL, 'v'                   },
                 { "store",             required_argument, NULL, ARG_STORE             },
                 { "extra-store",       required_argument, NULL, ARG_EXTRA_STORE       },
@@ -315,6 +323,10 @@ static int parse_argv(int argc, char *argv[]) {
                         help();
                         return 0;
 
+                case ARG_VERSION:
+                        version();
+                        return 0;
+
                 case 'v':
                         arg_verbose = true;
                         break;
@@ -607,6 +619,7 @@ static int parse_argv(int argc, char *argv[]) {
 
 static int set_default_store(const char *index_path) {
         const char *e;
+        int r;
 
         if (arg_store)
                 return 0;
@@ -616,39 +629,13 @@ static int set_default_store(const char *index_path) {
                 /* If the default store is set via an environment variable, use that */
                 arg_store = strdup(e);
         else if (index_path) {
-                char *d;
-                CaLocatorClass c;
 
                 /* Otherwise, derive it from the index file path */
 
-                c = ca_classify_locator(index_path);
-                if (c < 0) {
-                        fprintf(stderr, "Failed to automatically derive store location: %s\n", index_path);
-                        return -EINVAL;
-                }
-
-                if (c == CA_LOCATOR_URL) {
-                        const char *p;
-
-                        p = index_path + strcspn(index_path, ";?");
-                        for (;;) {
-                                if (p <= index_path)
-                                        break;
-
-                                if (p[-1] == '/')
-                                        break;
-
-                                p--;
-                        }
-
-                        d = strndupa(index_path, p - index_path);
-                        arg_store = strjoin(d, "default.castr");
-                } else {
-                        d = dirname_malloc(index_path);
-                        if (!d)
-                                return log_oom();
-                        arg_store = strjoin(d, "/default.castr");
-                        free(d);
+                r = ca_locator_patch_last_component(index_path, "default.castr", &arg_store);
+                if (r < 0) {
+                        fprintf(stderr, "Failed to automatically derive store location from index: %s\n", strerror(-r));
+                        return r;
                 }
         } else
                 /* And if we don't know any, then place it in the current directory */
@@ -4043,5 +4030,7 @@ int main(int argc, char *argv[]) {
         strv_free(arg_extra_stores);
         strv_free(arg_seeds);
 
+        /* fprintf(stderr, PID_FMT ": exiting with error code: %s\n", getpid(), strerror(-r)); */
+
         return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
 }
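
The `set_default_store()` hunk above replaces the open-coded URL/path handling with a call to `ca_locator_patch_last_component()`, whose body is not part of this diff. Going only by the removed lines, a helper of that kind might look like the sketch below. The name `patch_last_component()` is a hypothetical stand-in, and detecting URLs by the presence of "://" is an assumption (the removed code used `ca_classify_locator()`).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ca_locator_patch_last_component(): replaces
 * the last component of a path or URL with "name". On success stores a
 * newly allocated string in *ret and returns 0; otherwise returns a
 * negative errno-style code. */
int patch_last_component(const char *locator, const char *name, char **ret) {
        size_t end, name_len;
        char *s;

        if (!locator || !name || !ret)
                return -EINVAL;

        /* Assumption: anything containing "://" is a URL. For URLs, stop
         * before any ";parameters" or "?query" part, as the removed code
         * did with strcspn(index_path, ";?"). */
        if (strstr(locator, "://"))
                end = strcspn(locator, ";?");
        else
                end = strlen(locator);

        /* Walk back to just after the last '/' (or to the start). */
        while (end > 0 && locator[end - 1] != '/')
                end--;

        name_len = strlen(name);
        s = malloc(end + name_len + 1);
        if (!s)
                return -ENOMEM;

        memcpy(s, locator, end);
        memcpy(s + end, name, name_len + 1); /* copies the trailing NUL too */

        *ret = s;
        return 0;
}
```

Called with "/srv/images/foo.caidx" and "default.castr", the sketch yields "/srv/images/default.castr"; a bare file name without any directory part yields just "default.castr", which resolves relative to the current directory.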