conf

conf is a derive-based config parser aimed at the practically-minded web developer building large web projects.

API Docs | Proc-macro Reference | Examples

Overview

conf uses clap under the hood to parse CLI arguments and generate help text.

conf has an intentionally similar proc-macro API to clap-derive, but it is not a fork. It is a new library with different goals. It offers some powerful features and support that clap-derive does not, which help with the configuration of large projects. But it also doesn't offer some features of clap, which I have found to be less useful in a typical web project.

The features that you get for this bargain are:

  • You can assign a prefix to a structure's fields when flattening it into another structure, and you can similarly do env prefixing in a controlled way.
  • You get ALL the errors and not just one of them if some required env is missing and/or several of the values are invalid. In my searching I found that surprisingly few config crates out there actually do this. Very helpful if your deployments take a while.
  • Isolation & testability around env. clap only supports reading env values from std::env::var_os.
    • If you want to test what happens when different variables are set, your tests can become racy.
    • If you want to test a component that takes config as an argument, and use ::parse_from to initialize the config, then your tests will pass or fail depending on your local env.
    • If you want to implement Default based on the default values you declared on your structure, you can't really because you can't isolate it from env.
    • conf lets you pass an iterator to represent a snapshot of the environment.
  • Support for env aliases. clap supports aliases for command-line arguments but not for env. Make changes without breaking compatibility.
  • You can declare fields which are only read from env and cannot be read from args at all.
  • You can declare fields which represent secrets. This controls whether or not the entire value should be printed in error messages if it fails to parse.
  • Support for an optional-flatten syntax. This can be simpler and more idiomatic than using argument groups and such in clap-derive.
  • Support for user-defined validation predicates. This allows you to express constraints that can't be expressed in clap.
  • Support for layered config. This means that you can use data loaded from a file as an additional source for config values, alongside args and env.
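
To make the "all the errors" and "snapshot of the environment" points concrete, here is a minimal std-only sketch of the idea. It does not use conf and is not how conf is implemented. The ServerConfig struct, the PORT and WORKERS variable names, and both function names are hypothetical, chosen only for illustration.

```rust
use std::collections::HashMap;
use std::fmt::Display;
use std::str::FromStr;

/// Hypothetical config with two required values, used only for this sketch.
#[derive(Debug, PartialEq)]
pub struct ServerConfig {
    pub port: u16,
    pub workers: u32,
}

/// Parse from a snapshot of the environment (any iterator of key/value pairs),
/// collecting every problem instead of stopping at the first one.
pub fn parse_from_env<I>(env: I) -> Result<ServerConfig, Vec<String>>
where
    I: IntoIterator<Item = (String, String)>,
{
    let env: HashMap<String, String> = env.into_iter().collect();
    let mut errors = Vec::new();

    // Both reads always run, so a problem with PORT does not hide one with WORKERS.
    let port = read_required::<u16>(&env, "PORT", &mut errors);
    let workers = read_required::<u32>(&env, "WORKERS", &mut errors);

    match (port, workers) {
        (Some(port), Some(workers)) => Ok(ServerConfig { port, workers }),
        _ => Err(errors),
    }
}

/// Read and parse one required variable, recording an error on failure.
fn read_required<T>(
    env: &HashMap<String, String>,
    key: &str,
    errors: &mut Vec<String>,
) -> Option<T>
where
    T: FromStr,
    T::Err: Display,
{
    match env.get(key) {
        None => {
            errors.push(format!("missing required env '{key}'"));
            None
        }
        Some(raw) => match raw.parse::<T>() {
            Ok(value) => Some(value),
            Err(err) => {
                errors.push(format!("invalid value '{raw}' for env '{key}': {err}"));
                None
            }
        },
    }
}
```

Because the environment is passed in as an iterator, a test can supply exactly the variables it cares about and never touch the process environment, which is what keeps such tests from being racy.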

As of version ??? conf supports using config content in any serde-compatible format, such as JSON, YAML, TOML, etc., as a hierarchical config layer. The same commitment to "All the errors and not just one of them" holds. There are several advantages of this integrated approach:

  • Other popular approaches to hierarchical config include using clap for CLI argument parsing only, and then folding the results of that into a library like figment or config, which can also manage env, files, and compositing it all together.
    • However, typically this creates a maintenance burden, because if a required field could be read via clap or could be read from env or a config file, it needs to be Option<T> for clap and T in the final config structure, so you end up needing to maintain two parallel structures.
    • If these structures get out of sync, there isn't really any tooling to help you figure it out and the error messages may be confusing.
    • Dividing the information between two structures this way means that clap isn't aware of the other ways that a value can be read. But clap is responsible for generating the --help text, and so this causes the documentation of the config to be incomplete and makes it harder for users to figure out how to use your program.
    • It leads to poor quality error reporting, because crates like figment and config rely on serde::Deserialize to marshal the composited data onto your final structure. This precludes giving multiple error reports if there are multiple problems in different parts of the config. (See MOTIVATION.md for more discussion.)
  • When using conf instead, all of these problems are avoided. Notably, conf provides its own proc-macro, and so we can walk the serde::de::Deserializer ourselves and ensure that we get comprehensive error reporting, even if serde_derive::Deserialize would have stopped at the first error.
  • conf can also be used together with figment advantageously. See Multiple config files for more on this.

conf is heavily influenced by clap-derive and the earlier struct-opt which I used for years. They are both great and became popular for a reason.

Where there is overlap, conf tries to stay extremely close to clap-derive syntax and behavior, in most cases, for familiarity and ease of migrating a large project. In some cases, there are small deviations from the behavior of clap-derive to either help avoid mistakes, or to make the defaults closer to a good 12-factor app behavior. For some advanced features of clap, conf has a way to achieve the same thing, but we took a different approach. This is typically in an attempt to simplify how it works for the user of the derive macro, to have fewer named concepts, or to ease maintenance going forward. (Because we don't offer an analogue of the clap_builder API, the design tradeoffs are different.)

The public API here is restricted to the Conf and Subcommands traits, proc-macros to derive them, and one error type. It is hoped that this will both reduce the learning curve and ease future development and maintenance.

See MOTIVATION.md for more discussion about this project and the other various alternatives out there.

Using conf in a cargo project

First add conf to the dependencies in your Cargo.toml file:

[dependencies]
conf = "0.1"

Then, create a struct which represents the configuration data your application needs to read on startup. This struct should derive the Conf trait, and the conf attributes should be used to describe how each field can be read.

use conf::Conf;
use url::Url; // assumed: `Url` below is the type from the `url` crate

#[derive(Conf)]
pub struct Config {
    /// This is a string parameter, which can be read from args as `--my-param` or from env as `MY_PARAM`.
    #[arg(long, env)]
    my_param: String,

    /// This flag corresponds to `-f` or `--force` in args
    #[arg(short, long)]
    force: bool,

    /// URL to hit, which can be read from args as `--url` or from env as `URL`.
    #[arg(long, env)]
    url: Url, // This works because Url implements `FromStr`.
}

Finally, you can parse the config:

    let config = Config::parse();

Usually you would call that somewhere in fn main() and then use the config to initialize your application.

The parse() function will automatically add a --help option for users that contains auto-generated documentation, based on your doc strings.

Additionally, if parsing fails for some reason, it will display a helpful error message and exit.

(The Conf trait offers a few variants of this function, which you can read about in the docs.)

Generally, the CLI and the help text that are generated are meant to conform to POSIX and GNU conventions. Read more about this in the clap documentation.

A tour

A field in your struct can be read from a few sources:

  • #[arg(short)] means that it has an associated "short" command-line option, such as -u. By default the first letter of your field is used. This can be overridden with #[arg(short='t')] for example.
  • #[arg(long)] means that it has an associated "long" command-line option, such as --url. By default the kebab-case name of your field is used. This can be overridden with #[arg(long="target-url")] for example.
  • #[arg(env)] means that it has an associated environment variable, such as URL. By default the upper snake-case name of your field is used. This can be overridden with #[arg(env="TARGET_URL")] for example.
  • #[arg(default_value)] specifies a default value for this field if none of the other three possible sources provides one.

Such attributes can be combined by separating them with commas, for example #[arg(long, env, default_value="x")] means the field has an associated long option, an associated environment variable, and a default value if both of these are omitted.
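
The default naming rules described above can be sketched with plain string functions. This is only an illustration of the rules as stated here, assuming the field name is already a snake_case Rust identifier. The function names are hypothetical and are not part of conf's API.

```rust
/// Default short option: the first letter of the field name.
pub fn default_short(field: &str) -> Option<char> {
    field.chars().next()
}

/// Default long option: the kebab-case form of the field name.
/// Assumes the field name is snake_case, so only '_' needs replacing.
pub fn default_long(field: &str) -> String {
    field.replace('_', "-")
}

/// Default env variable: the upper snake-case form of the field name.
pub fn default_env(field: &str) -> String {
    field.to_uppercase()
}
```

So a field called my_param would get `--my-param` and `MY_PARAM`, matching the doc comments in the first example.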

Your field can have any type as long as it implements FromStr, and this will be used to parse it. The type bool is special: it generates a "flag", which takes no value on the command line, rather than a "parameter". Option<T> is also special, and indicates that the value is optional rather than required. You can also specify an alternative parsing function using value_parser.
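
If you haven't written a FromStr implementation before, here is a small std-only sketch of a custom field type. The Percent type and its error type are hypothetical examples, not something conf provides. Any type with a FromStr impl along these lines should be usable as a field type in the way described above.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical field type: a whole percentage between 0 and 100.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Percent(pub u8);

/// Error returned when a string is not a valid percentage.
#[derive(Debug, PartialEq)]
pub struct ParsePercentError(String);

impl fmt::Display for ParsePercentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl std::error::Error for ParsePercentError {}

impl FromStr for Percent {
    type Err = ParsePercentError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept both "75" and "75%".
        let digits = s.strip_suffix('%').unwrap_or(s);
        let value: u16 = digits
            .parse()
            .map_err(|_| ParsePercentError(format!("'{s}' is not a valid percentage")))?;
        if value > 100 {
            return Err(ParsePercentError(format!("{value} is greater than 100")));
        }
        Ok(Percent(value as u8))
    }
}
```

The error type implements Display so that a parser can include it in a human-readable message.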

So far this is almost exactly the same as clap-derive. Where it gets more interesting is the flatten option.

You may have one structure that derives Conf and declares a bunch of related config values:

#[derive(Conf)]
pub struct DbConfig {
    /// Database connection URL.
    #[arg(long)]
    pub db_url: String,

    /// Set the maximum number of connections of the pool.
    #[arg(long)]
    pub db_max_connections: Option<u32>,

    /// Set the minimum number of connections of the pool.
    #[arg(long)]
    pub db_min_connections: Option<u32>,

    /// Set the timeout duration when acquiring a connection.
    #[arg(long)]
    pub db_connect_timeout: Option<u64>,

    /// Set the maximum amount of time to spend waiting for acquiring a connection.
    #[arg(long)]
    pub db_acquire_timeout: Option<u64>,

    /// Set the idle duration before closing a connection.
    #[arg(long)]
    pub db_idle_timeout: Option<u64>,

    /// Set the maximum lifetime of individual connections.
    #[arg(long)]
    pub db_max_lifetime: Option<u64>
}

Then you can "flatten" it into a larger Conf structure using the conf(flatten) attribute.

#[derive(Conf)]
pub struct Config {
    /// Database
    #[conf(flatten)]
    db: DbConfig,
}

Intuitively, this is meant to read a lot like the serde(flatten) attribute, and has a similar behavior. During parsing, the parser behaves as if every field of DbConfig were declared within Config, and generates matching options, env, and help, but then the parsed values actually get stored in subfields of the .db field.

Using flatten can save a lot of labor. For example, suppose your web application consists of ten different web services, and they all need a DbConfig. Instead of duplicating all the values, any env, any defaults, any help text, in each Config that you have, you can write that once and then flatten it ten times. Then, later when you discover that DbConfig should contain another value, you only have to add it to DbConfig once, and every service that uses DbConfig will get the new config parameter. Also, when you need to initialize your db connection, you can just pass it the entire .db field rather than pick out needed config arguments one-by-one.

Where conf differs from clap-derive is that we expect that you will use flatten in your project quite a lot.

For example, you might need to do this:

#[derive(Conf)]
pub struct Config {
    #[conf(flatten)]
    pub auth_service: HttpClientConfig,

    #[conf(flatten)]
    pub friend_service: HttpClientConfig,

    #[conf(flatten)]
    pub snaps_service: HttpClientConfig,
}

because logically, you have three different http clients that you need to configure.

However with clap-derive, this is going to cause a problem, because when the fields from HttpClientConfig get flattened, their names will collide, and the parser will reject it as ambiguous.

When using conf, you can resolve it by declaring a prefix.

#[derive(Conf)]
pub struct Config {
    #[conf(flatten, prefix)]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix)]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix)]
    pub snaps_service: HttpClientConfig,
}

This will cause every option associated with the auth_service structure to get a prefix, derived from the field name, auth_service, on any long-form options and on any env variables. The prefix will be kebab-case for long-form options and upper snake-case for env variables. And similarly for friend_service and snaps_service.
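
As a rough sketch of what that means for the generated names, the functions below join a field-derived prefix onto an inner option or env name. The function names are hypothetical, and the separators used for joining ('-' for long options, '_' for env) are my assumption for illustration. Check the reference for the exact rules conf applies.

```rust
/// Long option name for an inner option flattened under `prefix_field`.
/// Assumption: the kebab-case prefix is joined with '-'.
pub fn prefixed_long(prefix_field: &str, inner_long: &str) -> String {
    format!("{}-{}", prefix_field.replace('_', "-"), inner_long)
}

/// Env variable name for an inner variable flattened under `prefix_field`.
/// Assumption: the upper snake-case prefix is joined with '_'.
pub fn prefixed_env(prefix_field: &str, inner_env: &str) -> String {
    format!("{}_{}", prefix_field.to_uppercase(), inner_env)
}
```

Under these assumptions, a `url` field inside the structure flattened at auth_service would be read from `--auth-service-url` or `AUTH_SERVICE_URL`, and the same field under friend_service would no longer collide with it.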

You can also override this prefix:

#[derive(Conf)]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: HttpClientConfig,
}

You can also configure env prefixes and option prefixes separately if you want that. Setting env_prefix will cause env vars to be prefixed, but not options. long_prefix will cause long-form options to be prefixed, but not env vars. (Short options are never prefixed, so there is not usually a good way to resolve a conflict among them. Short options should be used with caution in a large project.)

Finally, you can also declare prefixes at the level of a struct rather than a field. So for example, if you need every environment variable your program reads to be prefixed with ACME_, you can achieve that very easily.

#[derive(Conf)]
#[conf(env_prefix="ACME_")]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: HttpClientConfig,
}

Option<T> can also be used with a flattened structure, so if one of these services is optional, you can simply write:

#[derive(Conf)]
#[conf(env_prefix="ACME_")]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: Option<HttpClientConfig>,
}

You can read about all the attributes and usage in the docs or the REFERENCE.md, but hopefully this is enough to get started.

See also the examples.

Topics

This section discusses more advanced features and usage patterns, as well as alternatives.

Reading files

Sometimes, a web service needs to read a file on startup.

One way this can be done in conf is by using the value_parser feature, which works much as it does in clap.

A value_parser is a function that takes a &str and returns either a value or an error.

For example, if you need to read a yaml file on startup according to a schema, one way you could do that is

use conf::Conf;
use serde::Deserialize;
use std::{error::Error, fs};

#[derive(Deserialize)]
pub struct MyYamlSchema {
    pub example: String,
}

#[derive(Conf)]
pub struct Config {
    #[conf(long, env, value_parser = |file: &str| -> Result<MyYamlSchema, Box<dyn Error>> {
        // Treat the value as a path: read the file, then parse its contents as yaml.
        Ok(serde_yaml::from_str(&fs::read_to_string(file)?)?)
    })]
    pub yaml_file: MyYamlSchema,
}

This will read a file path either from CLI args or from env, then attempt to open the file and parse it according to the yaml schema.

If your value_parser is complex or needs to be reused, the best practice is to put it in a named function.

#[derive(Conf)]
pub struct Config {
    #[conf(long, env, value_parser = utils::read_yaml_file)]
    pub yaml_file: MyYamlSchema,
}

This can be a good pattern for things like reading a certificate or a cryptographic key from a file, which you want to check on startup. This way you will fail fast if the file is not found or is invalid, but also report all other config problems at the same time.
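
For the key-file case, a named parser function might look something like the following std-only sketch. The function name and the hex-encoded file format are hypothetical. The only thing it takes from the description above is the shape: a function from &str to a Result.

```rust
use std::error::Error;
use std::fs;

/// Hypothetical value parser: treat the input as a file path, read the file,
/// and decode its contents as a hex-encoded key.
pub fn read_hex_key_file(path: &str) -> Result<Vec<u8>, Box<dyn Error>> {
    let text = fs::read_to_string(path)?;
    let hex = text.trim();

    if hex.is_empty() {
        return Err(format!("key file '{path}' is empty").into());
    }
    if !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("key file '{path}' contains non-hex characters").into());
    }
    if hex.len() % 2 != 0 {
        return Err(format!("key file '{path}' has an odd number of hex digits").into());
    }

    // Every character is ASCII at this point, so slicing by byte index is safe.
    let mut key = Vec::with_capacity(hex.len() / 2);
    for i in (0..hex.len()).step_by(2) {
        key.push(u8::from_str_radix(&hex[i..i + 2], 16)?);
    }
    Ok(key)
}
```

The error messages deliberately mention the path but never the file contents, which matters if the file holds key material.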

This kind of approach would always read the key from a file, but would allow you to specify the file path either in args or in env. This is not the same thing as hierarchical config files though, which we'll discuss next.

Hierarchical config

Hierarchical config is the idea that config values should be merged in from files as well as from args and env.

Applications should follow a hierarchical configuration structure. Use the following order, from highest priority to lowest.

  1. Command-line arguments
  2. Environment variables
  3. Directory or repository-scoped configuration
  4. User-scoped configuration
  5. System-wide configuration
  6. Default configuration shipped with the program.

conf has strong built-in support for (1), (2), and (6) here. To get the others, there are basically two approaches.

.env files

The simplest approach to hierarchical config, IMO, is to use a crate like dotenvy. This crate can search for an .env file, and then set env values if they are not already set in your program. You can do this right before calling Config::parse(), and in this manner achieve hierarchical config, with args > env > .env file > defaults. You can load multiple .env files this way if you need to, searching user-provided paths, default paths, and so on.
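
The "set env values if they are not already set" behavior is what gives you env > .env file. Here is a deliberately simplified std-only sketch of that rule, operating on a map rather than the process environment. It is not dotenvy's parser: it handles only bare KEY=VALUE lines and '#' comments, with no quoting, escaping, or `export` handling, and the function name is hypothetical.

```rust
use std::collections::HashMap;

/// Apply KEY=VALUE lines to an environment snapshot without overriding
/// any variable that is already set.
pub fn apply_dotenv(env: &mut HashMap<String, String>, dotenv_text: &str) {
    for line in dotenv_text.lines() {
        let line = line.trim();
        // Skip blank lines and comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split at the first '=' only, so values may themselves contain '='.
        if let Some((key, value)) = line.split_once('=') {
            env.entry(key.trim().to_string())
                .or_insert_with(|| value.trim().to_string());
        }
    }
}
```

Calling this once per file, from highest priority to lowest, gives the layering described above: the first source to set a variable wins.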

In web applications, I often use this approach for development rather than production, and I recommend this approach especially for smaller projects.

If your application has a lot of required values, it may take an engineer a while to figure out how to just run it locally. But you may not want to provide default values in the program that would not be appropriate in production, for safety. Instead, you can provide a .env file which is checked in to the repo, with values which are appropriate for local testing / CI. Then an engineer can use cargo run and it will just work. When you go to build docker containers, you can leave out these .env files, and then be sure that in the deployed environment, kubernetes or similar is in total control, and any missing or misspelled values in the helm charts and whatnot will be loud and fail fast.

These .env files work well if you are using diesel, because the diesel cli tool also uses dotenvy to search for a .env file and find the DATABASE_URL when managing database migrations locally.

You can also pass .env files directly to docker run if you want to test docker containers locally.

This is a very traditional approach to configuring 12-factor apps. You get most of the benefits of having config files, but it's also typically easier to deploy the app if it doesn't require files to be mounted into a container, and the config is typically easier to change in a deployed environment if it is based on environment variables.

The biggest drawback of this approach is that you are limited to things that can easily be expressed in a .env format. If your config structure logically contains arrays of structs, it may not be very natural to express that in .env.

Another drawback is that the .env format doesn't really have a spec, and there are many divergent parser implementations. Eventually you may run into incompatibilities between what docker does, what bash does, and what the numerous dotenv libraries in different programming languages do. This is typically annoying but not insurmountable.

General config files

Alternatively, you may prefer that your application can load layered config from a file in a more structured format.

In the conf API, self-describing structured data like this is called a "document". (conf doesn't really care if it actually came from a file.)

To use a document as a source for layered config in conf, you can do the following:

  1. You must have the serde feature enabled in conf, which is on by default.

    You must annotate your structs with #[conf(serde)]. This can create additional build-time requirements -- fields in your structs might need to implement serde::Deserialize depending on how they are annotated.

  2. First, determine the file path and load the document content. For example,

    let config_path = std::env::var("CONFIG").ok().unwrap_or_else(|| "config.yaml".to_owned());
    
    let doc_content: serde_yaml::Value = serde_yaml::from_reader(fs::File::open(&config_path).unwrap()).unwrap();

    Note that conf doesn't force you to use any particular library or error handling discipline here. You may prefer to skip the file if it is not specified, or not found, or invalid, and try to proceed without it.

  3. Next, use the builder API to parse an instance of your structure.

    let config = MyConfig::conf_builder()
                 .doc(config_path, doc_content)
                 .parse();

    The builder uses std::env::vars_os and std::env::args_os as env and args sources by default, but these can be overridden if desired. The config_path string parameter is used in error messages.

Intuitively what happens is, conf attempts to initialize your struct, mapping the yaml data onto it, similar to serde::Deserialize. However, for any fields in your Conf struct, if there are multiple value sources, the priority is args > env > serde > defaults. So values from the serde::Deserializer can be shadowed, and also holes in the serde data can be filled from defaults and so on.

Any value_parser is run only after the available value sources and their priorities have been resolved.
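
The priority rule for a single field can be sketched in a few lines of std-only Rust. This is a simplification for illustration, not conf's implementation: document values are modeled as plain strings here, whereas conf reads them through serde, and the Source enum and resolve function are hypothetical names.

```rust
/// Where a resolved value came from, in priority order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    Args,
    Env,
    Document,
    Default,
}

/// Pick the highest-priority source that provides a value:
/// args > env > document > defaults.
pub fn resolve<'a>(
    arg: Option<&'a str>,
    env: Option<&'a str>,
    doc: Option<&'a str>,
    default: Option<&'a str>,
) -> Option<(Source, &'a str)> {
    arg.map(|v| (Source::Args, v))
        .or(env.map(|v| (Source::Env, v)))
        .or(doc.map(|v| (Source::Document, v)))
        .or(default.map(|v| (Source::Default, v)))
}
```

Note that parsing happens after this step. If env shadows the document with a value that fails to parse, the result is an error about the env value, not a silent fallback to the document.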

conf will work best if you use a "self-describing" format, which has a type like serde_yaml::Value or serde_json::Value which can hold any valid yaml or json, and you deserialize into that first. In particular, it's not recommended to do the following, even if it would avoid some copies:

   // Not recommended
   let config = MyConfig::conf_builder()
                .doc(config_path, serde_yaml::Deserializer::from_reader(fs::File::open(&config_path).unwrap()))
                .parse();

If the file is not valid yaml or json, then at some point in the middle of the walk, the deserializer may be in a broken state, and any further attempts to interact with it will yield errors. Then conf may report numerous errors as it tries to read data for different parts of your structure, giving up on failing branches and continuing to try on other branches. These errors may distract from the root cause. By deserializing into a Value type first, and failing fast if that doesn't work, you can avoid this scenario.

See a worked example which is under test if you like.

Multiple config files

A limitation of conf is that you can only pass it one document in this manner -- you can't call ConfBuilder::doc multiple times and pass a series of progressively lower-priority file contents.

However, you can use other libraries to help with this.

   let content: figment::Value
     = Figment::new()
       .merge(Json::file("file1"))
       .merge(Json::file("file2"))
       .extract()?;

The Figment::extract function invokes serde::Deserialize, and so can only report one error. But extracting into a figment::Value is not expected to fail, since this is the internal representation that figment uses. The figment::Value can then be passed to conf as a document, since it implements serde::de::Deserializer. Then conf is driving the initialization of your struct, and not serde_derive, which retains all the benefits of conf's design.

In this manner, you can get all 6 categories of hierarchical config in your app if needed, without significant restrictions on config file formats.
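
If it helps to see what "compositing" several documents into one means, here is a toy std-only sketch of a recursive merge over a tiny value type. It illustrates the general layering idea only and is not a description of figment's internals. The Value enum, merge, and map_of are all hypothetical names for this sketch.

```rust
use std::collections::BTreeMap;

/// A toy document value: either a string leaf or a nested map.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Map(BTreeMap<String, Value>),
}

/// Merge `incoming` into `base`. On conflict the incoming value wins,
/// except that two maps are merged key by key.
pub fn merge(base: Value, incoming: Value) -> Value {
    match (base, incoming) {
        (Value::Map(mut base_map), Value::Map(incoming_map)) => {
            for (key, incoming_value) in incoming_map {
                let merged = match base_map.remove(&key) {
                    Some(existing) => merge(existing, incoming_value),
                    None => incoming_value,
                };
                base_map.insert(key, merged);
            }
            Value::Map(base_map)
        }
        // A leaf on either side: the incoming value replaces the base value.
        (_, incoming) => incoming,
    }
}

/// Convenience constructor for building maps in examples and tests.
pub fn map_of(entries: &[(&str, Value)]) -> Value {
    Value::Map(
        entries
            .iter()
            .map(|(key, value)| ((*key).to_string(), value.clone()))
            .collect(),
    )
}
```

Merging a system-wide document first and a user-scoped document second leaves you with one document in which the user's values win, and that single document is what gets handed to conf.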

You can see a more complete example and tests in the repo.

In the future, we may extend our API so that the figment::Metadata, which tracks the provenance of individual values, can also be passed on to conf and used in error messages.

Documenting the config file format

The suggested way to help users of your program understand the config file format is:

  • Have some examples committed to your repo, and have tests that they parse correctly
  • Either distribute these with the documentation, or along with the release artifacts, or bake them into the binary and add a CLI option which makes the binary emit them.

For example, the AWS CLI tool provides options to emit a config skeleton for many commands, such as aws ecs register-task-definition --generate-cli-skeleton.

Secrets

conf tries to provide the most helpful and detailed errors that it can, and also to report as many problems as it can when parsing fails.

Usually, if a user-provided value cannot be parsed, we want to provide the value and the error in the error message to help debugging. But if the value represents a secret, then logging its value is bad.

To prevent conf from logging the value, you can mark the field as secret.

    #[arg(env, secret)]
    pub api_key: ApiKey

When conf knows that something is a secret, it will avoid revealing the value when generating any kind of error message or help text. conf will also describe it with the [secret] tag in the help text.
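
The effect on error messages can be sketched like this. The function name and the message wording are hypothetical and are not conf's actual output. The point is only that the raw value is left out when the field is marked secret.

```rust
/// Describe a parse failure for one field, omitting the raw value
/// when the field is marked as a secret.
pub fn describe_parse_failure(name: &str, raw: &str, error: &str, secret: bool) -> String {
    if secret {
        // The raw value is deliberately not included here.
        format!("invalid value for '{name}' [secret]: {error}")
    } else {
        format!("invalid value '{raw}' for '{name}': {error}")
    }
}
```

One caveat with any scheme like this: if the error produced by the parser itself echoes the input, the value can still leak through that text, so parsers for secret types should avoid quoting their input.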

Handling secrets is a complex topic and much of the discussion is out of scope here. We'll offer just three points of guidance around this tool.

  1. The more valuable the secrets are, and the more challenging the threat model is, the more time it makes sense to spend working on defensive measures. The converse is also true. No one really has context to judge this except you, so instead of offering one-size-fits-all guidance, I prefer to think in terms of a sliding scale.
  2. If you're at a point where systematically marking things secret seems like a good idea, then you should also be using special types to manage the secrets. For example, using SecretString from the secrecy crate instead of String will prevent your password from appearing in debug logs after it has been loaded. There are alternatives out there if the secrecy crate doesn't work for your use-case. This is usually a pretty low-effort improvement, and it goes hand-in-hand with what the secret marking does.
    • It's very easy to expose your secret by accident if you don't do something like this. For example, just by putting a #[tracing::instrument] annotation on a function that some day takes a config struct, you could accidentally log your password.
  3. If you're at a point where you think you need to systematically zeroize all copies of your secret that reside in process memory when they are no longer needed, then you are past the point where you can use an environment variable to pass the secret value to the application. Your application most likely needs to read the secret value from a file instead.
    • The rust standard library handles environment values as std::ffi::OsString internally and in its API, but this type cannot be securely zeroized. There are no public APIs to mutably access the underlying bytes, and no public APIs that would otherwise do this for you.
    • At a lower level, glibc exposes the environment as char **environ, makes copies of the entire environment whenever it is changed using set_var or similar, and leaks the old values. It is difficult to systematically ensure that all of these copies are cleaned up if they contain sensitive data. environ often gets copied by other things very early in the process. The rust standard library also interacts with the environment via these glibc APIs, which means that typical rust libraries like dotenvy do as well.
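
To illustrate point 2, here is a minimal std-only sketch of a wrapper type whose Debug output hides its contents. The Redacted and ClientConfig types are hypothetical. In a real project you would more likely reach for the secrecy crate or a similar library, which also deals with concerns this sketch ignores, such as zeroizing memory.

```rust
use std::fmt;

/// Wrapper that hides its contents in Debug output.
pub struct Redacted<T>(T);

impl<T> Redacted<T> {
    pub fn new(value: T) -> Self {
        Redacted(value)
    }

    /// Access to the inner value is explicit, so uses are easy to audit.
    pub fn expose(&self) -> &T {
        &self.0
    }
}

impl<T> fmt::Debug for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical config struct holding a secret.
#[derive(Debug)]
pub struct ClientConfig {
    pub url: String,
    pub api_key: Redacted<String>,
}
```

With this in place, formatting a ClientConfig with `{:?}`, which is what an instrumented function would do with its arguments, prints the url but not the key.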

Argument groups and constraints

clap has support for the concept of "argument groups" (ArgGroup) and also "dependencies" among Arg's. This is used to create additional conditions that must be satisfied for the config to be valid, and error messages if it is invalid. clap provides many functions on Arg and on ArgGroup which can be used to define various kinds of constraints, such as conditional dependency or mutual exclusion, between Arg's or ArgGroup's.

The main reason to use these features in clap is that it will generate nicely formatted errors if these constraints are violated, and then you don't have to worry about handling the situation in your application code.

conf similarly wants to support adding constraints in this manner that are checked during parsing, but the design goal is that all of these errors should be reportable alongside all the other types of errors.

For several reasons, conf chose to offer a different API than clap for these purposes.

  • In clap, this API was designed first for the clap builder API, and then exposed via the clap-derive API.
  • There are about a dozen functions exposed in total, and multiple named concepts (Arg is now joined by ArgGroup which is different from Args)
  • The API relies on explicit id values for Arg's and ArgGroups, but this is less idiomatic in the derive API. The derive API is simpler from the user's point of view if these id's are not really exposed and are more like implementation details.
  • The API often provides multiple ways to do the same thing, which makes code that uses it less predictable.
  • The API has many defaults that I find hard to remember. For example, in an ArgGroup, does required default to true or false? Does multiple default to true or false? These defaults are different for an Args.
  • Sometimes the API doesn't feel idiomatic. For example if I have a group of options where if one of them appears, all of them must appear, the most idiomatic thing is if the API can give me a single Option that includes all of them. Otherwise I have to unwrap a bunch of options in application code, on the assumption that my constraint works as expected.

conf provides one mechanism for idiomatically representing when some collection of arguments are optional-but-mutually-required. Then it provides a few one-offs to express exclusivity between arguments. Finally, it provides a very general mechanism that can express arbitrary constraints.

flatten-optional

conf supports the following syntax:

#[derive(Conf)]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: Option<HttpClientConfig>,
}

Intuitively, this means that the snaps_service config is optional, and if none of those fields appear, that's not an error, and snaps_service will be None in the parsed config object. However, if any of the fields of snaps_service appear, then all of its required fields must appear, and parsing the entire flattened object must succeed.

This allows the code that consumes the conditional config to be simpler -- you can just match on whether snaps_service is present or absent, and the type system encodes that when any of those fields are present, all are present. And you can express which arguments in the group are required to be present or not by marking them optional or not (or giving them a default value), within HttpClientConfig.

This feature actually covers every use-case I've had in real-life for argument groups and constraints in clap across all my web projects, and I like it because I feel that it introduces fewer named concepts and promotes code reuse. The same struct can be flattened in a required way in one setting and in an optional way in another setting.

Hopefully it's easy to remember what it means, just by looking at the type of the data, and thinking about what would have to happen for it to succeed. If we can't see any of the (prefixed) substructure's fields appearing, then we return None. If we see some of them appearing, it indicates that we're supposed to be producing a Some. Once we decide that we're supposed to produce Some, it's an error if we can't do so in the normal (non-optional) manner for flatten'ed structures.
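
That decision procedure can be written out as a small std-only sketch. It reads only from an env map to keep things short. The HttpClientSketch struct, its fields, and the URL and TIMEOUT_SECS variable names are hypothetical stand-ins, since HttpClientConfig's fields aren't shown here, and this is not conf's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a flattened struct with one required
/// and one optional field.
#[derive(Debug, PartialEq)]
pub struct HttpClientSketch {
    pub url: String,
    pub timeout_secs: Option<u64>,
}

/// Returns Ok(None) if no field of the group appears. If any field appears,
/// the whole group must parse, and all problems are reported together.
pub fn parse_optional_group(
    env: &HashMap<String, String>,
    prefix: &str,
) -> Result<Option<HttpClientSketch>, Vec<String>> {
    let url_key = format!("{prefix}URL");
    let timeout_key = format!("{prefix}TIMEOUT_SECS");
    let url = env.get(&url_key);
    let timeout = env.get(&timeout_key);

    // None of the group's fields appear: the group is simply absent.
    if url.is_none() && timeout.is_none() {
        return Ok(None);
    }

    // Some field appears, so we are committed to producing Some or failing.
    let mut errors = Vec::new();
    if url.is_none() {
        errors.push(format!("missing required env '{url_key}'"));
    }
    let timeout_secs = match timeout {
        None => None,
        Some(raw) => match raw.parse::<u64>() {
            Ok(value) => Some(value),
            Err(err) => {
                errors.push(format!("invalid value '{raw}' for env '{timeout_key}': {err}"));
                None
            }
        },
    };

    match url {
        Some(url) if errors.is_empty() => Ok(Some(HttpClientSketch {
            url: url.clone(),
            timeout_secs,
        })),
        _ => Err(errors),
    }
}
```

Setting only the optional timeout is enough to commit to Some, at which point the missing required url becomes an error rather than a silent None.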

one_of_fields

conf provides a simple way to specify that some fields in a struct are mutually exclusive.

#[derive(Conf)]
#[conf(at_most_one_of_fields(a, b, c))]
pub struct FooConfig {
    #[conf(short, long)]
    pub a: bool,
    #[conf(short, long)]
    pub b: Option<String>,
    #[conf(long, env)]
    pub c: Vec<String>,
}

When used with two fields, it provides a way to translate many usages of conflicts_with in the clap-derive API.

When used with all fields in a struct, it is similar to an ArgGroup with multiple=false and required=false in the clap-derive API.

This also works with the flatten-optional feature, so one or more optional flattened groups can be made exclusive with each other or with simple arguments in this structure.

However, it can only be used with fields on the struct that is marked with this attribute, and cannot be used with fields inside of flattened structs, or elsewhere in the structure.

conf provides a variation which requires exactly one of the fields to appear.

#[derive(Conf)]
#[conf(one_of_fields(a, b, c))]
pub struct FooConfig {
    #[conf(short, long)]
    pub a: bool,
    #[conf(short, long)]
    pub b: Option<String>,
    #[conf(long, env)]
    pub c: Vec<String>,
}

When used with all fields in a struct, this is similar to an ArgGroup with multiple=false and required=true in the clap-derive API.

Finally, conf provides one more variation, which requires at least one of the fields to appear.

#[derive(Conf)]
#[conf(at_least_one_of_fields(a, b, c))]
pub struct FooConfig {
    #[conf(short, long)]
    pub a: bool,
    #[conf(short, long)]
    pub b: Option<String>,
    #[conf(long, env)]
    pub c: Vec<String>,
}

When used with all fields in a struct, this is similar to an ArgGroup with multiple=true and required=true in the clap-derive API.

Any of these attributes can be used multiple times on the same struct to create multiple constraints that apply to that struct.
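One way to keep the three variations straight is to think of each as a bound on how many of the listed fields are present. The sketch below models that in plain Rust. It is not conf's implementation: the `Constraint` enum and the `check` and `presence` helpers are hypothetical, and the rule used here for "present" (a flag that is set, an Option that is Some, a Vec that is non-empty) is an assumption made for illustration.

```rust
/// Plain copy of the example struct, without the conf attributes.
struct FooConfig {
    a: bool,
    b: Option<String>,
    c: Vec<String>,
}

/// Hypothetical names for the three struct-level constraints.
#[derive(Debug, Clone, Copy)]
enum Constraint {
    AtMostOne,  // like at_most_one_of_fields
    ExactlyOne, // like one_of_fields
    AtLeastOne, // like at_least_one_of_fields
}

/// Assumed presence rule for this sketch, one flag per listed field.
fn presence(cfg: &FooConfig) -> [bool; 3] {
    [cfg.a, cfg.b.is_some(), !cfg.c.is_empty()]
}

/// Each constraint is a bound on the number of present fields.
fn check(constraint: Constraint, present: &[bool]) -> Result<(), String> {
    let count = present.iter().filter(|p| **p).count();
    let ok = match constraint {
        Constraint::AtMostOne => count <= 1,
        Constraint::ExactlyOne => count == 1,
        Constraint::AtLeastOne => count >= 1,
    };
    if ok {
        Ok(())
    } else {
        Err(format!(
            "{constraint:?} violated: {count} of {} fields present",
            present.len()
        ))
    }
}
```

Applying several constraints to the same struct then just means running `check` once per constraint, each with its own list of fields.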

validation predicate

flatten-optional and one_of_fields provide some easy-to-understand ways to create dependencies and exclusion constraints between different optional fields in a conf structure. They can directly translate many simple uses of ArgGroup and some of the constraints in the clap-derive API. But, there are many other constraint types supported by clap that don't translate directly into this, and we don't support declaring arg group membership directly on a field, which is something that clap does support.

At the same time, there are other kinds of constraints you might have a legitimate use for that you can't express in clap's API. For example, one of your arguments might be a Url object, and you might want to require that if the Url starts with https then some other options are required. As far as I know, there's no way to do this in clap.

Instead of providing direct analogues for every function in clap's constraint API, conf supports user-defined validation predicates on a per-struct basis.

A validation predicate is a function that takes &T where T is the struct at hand, and returns Result<(), impl Display>.

It behaves similarly to value_parser, in that any function expression can be accepted.
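As a rough sketch of the https example mentioned earlier, a predicate is just an ordinary function. The struct, its field names, and the `validate_tls` function below are hypothetical, the url is kept as a String to stay dependency-free, and the conf attributes are left out so the snippet compiles on its own. In a real project the struct would derive Conf and name this function in its struct-level validation_predicate attribute (see the proc-macro reference for the exact syntax).

```rust
/// Hypothetical config struct, shown without #[derive(Conf)] so that this
/// sketch needs no external crates.
pub struct ServerConfig {
    pub url: String,
    pub tls_cert_path: Option<String>,
    pub tls_key_path: Option<String>,
}

/// Takes &T and returns Result<(), impl Display>; String is used as the
/// error type here because it implements Display.
fn validate_tls(cfg: &ServerConfig) -> Result<(), String> {
    let needs_tls = cfg.url.starts_with("https");
    let has_tls = cfg.tls_cert_path.is_some() && cfg.tls_key_path.is_some();
    if needs_tls && !has_tls {
        return Err(
            "url uses https, so tls_cert_path and tls_key_path are both required"
                .to_string(),
        );
    }
    Ok(())
}
```

Because the check is plain Rust, the error message can say exactly which combination of values was rejected and why.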

The idea here is, rather than adding increasing numbers of one-off constraint types to conf, or enabling you to write non-local constraints using proc-macro attributes, it will be more maintainable for you and for conf if you just express what you want in rust code, once your constraints get sophisticated enough. There's both less API for you to learn and remember, and less API surface area for conf to test and maintain. You will also be able to generate very precise error messages when complex constraints fail.

Using these features together, you can express any kind of constraint you want to impose on your config structure, and hopefully make it feel idiomatic and natural.


Given that the validation_predicate for a T runs after we have actually parsed a T, why have this feature at all? The users could just run such functions on their own after Config::parse succeeds.

The benefit of using the validation_predicate is that if a predicate fails, conf is still able to report those errors and any other errors that occurred elsewhere in the tree.

For example, in this config struct:

#[derive(Conf)]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: Option<HttpClientConfig>,
}

It's possible that when parsing a Config, the auth_service fails to parse because of a missing required argument, friend_service fails to parse because of a missing argument and an invalid value, and snaps_service parses but fails its validation predicate. In this scenario conf will report all of these errors, which distinguishes it from other crates in this genre.

Who should use this crate?

The best reason to use this crate is if you have a medium-to-large project, such as a web app consisting of multiple services, which has a lot of configuration needs. You have multiple services that have several common subsystems or components, and these components have subcomponents, some of which are shared, etc., all of which should read config from the environment in accordance with 12-factor style, and may need to read more such config on short notice. You may already be using clap-derive but have run into limitations as your project has grown.

The purpose of the crate is to help you arrange all of that config in the simplest and most maintainable way possible, while still ensuring that all values that are needed are checked for on program startup (failing fast), reporting as many configuration errors as possible in the most helpful way possible when your deployment goes bad, and providing automated --help documentation of all of the config that is being read.

If you think that this crate is a good fit for you, the suggested way to use it is:

  • Whenever you have a component that you think should use a value that is read on startup, you should create a config struct for that component. You should derive(Conf) on that struct, and pass that config struct to the component on initialization. The config struct should live in the same module as the component that it is configuring.
  • If your component is initialized by a larger component, then that component should have its own config struct and you should use flatten to assemble it. You should usually use the prefix and help_prefix options when flattening.
  • Each binary target should have a config struct, and should ::parse() it in fn main().

This way, whenever you discover in the future that you need to add more config values for one of your small components, all you have to do is add it to the associated config struct, and it will automatically appear in every service that needs it, as many times as needed with appropriate prefixing, without you having to plumb it through every step of the way. Additionally, it makes it easier to create correct config for any future services or tools. And it causes all of your services and tools to have a similar, predictable style, and to have all of their config documented in --help, even pretty obscure environment variables and such, which usually just don't get documented if you choose to read them directly from std::env instead.

When should clap-derive be preferred to this crate?

This crate defines itself somewhat differently from clap-derive and has different features and goals.

  • clap-derive is meant to be an alternative to the clap builder API, and exposes essentially all of the features of the builder.
  • According to its maintainers, clap itself is primarily a CLI argument parser, and many simple features around env support, such as arguments that can only be read from env, are considered out of scope.

conf places emphasis on features differently.

  • env is actually the most important thing for a 12-factor web app.
  • conf has a different architecture, such that it's easier to pass information at runtime between a struct and the struct that it is flattened into, in both directions. This enables many of the new features that it brings to the table. The details are not part of the public API, so that they can be extended to support new features without a breaking change.
  • conf has very specific goals around error reporting. We want to return as many config errors as possible at once, because deployment might take a relatively long time.

In order to meet its goals, conf does not use clap to handle env at all. clap is only used to parse CLI arguments as strings, and to render help text, which are the two things that it is best at.

This crate can expose more features of the underlying clap builder and get closer towards the feature set offered by clap-derive, but will probably never expose all of them -- we can only expose features that we are sure will work well with the additional features that we have created, like flatten-with-prefix, and will work well with the manner in which we are using the underlying clap builder. The most interesting features are those that can be motivated by common web development needs.

If you have very specific CLI argument parsing needs, or if you need pixel-perfect help text, you will be better off using clap directly instead of this crate, because you will have more control that way. clap is the most mature and feature-complete CLI argument parser out there, by a wide margin.

In many web projects, you don't really have such needs. You aren't making very sophisticated use of clap, your project is small, and you don't particularly need any features of conf either, so you will be able to use clap-derive or conf equally well and not notice very much difference.

If you prefer, you can stick with clap-derive, and then only if you find that you need flatten-with-prefix or another feature, try to switch to conf at that point.

conf is designed to make this migration relatively easy for such projects. (Indeed, I started working on conf because I had several large projects on clap-derive and I was hitting limitations and being forced into workarounds that I wasn't happy with, and I couldn't find a wholly satisfactory alternative.) If you find that you get stuck when trying to migrate, you can open a discussion and we can try to help.

License

Code is available under MIT or Apache 2 at your option.