Skip to content

Latest commit

 

History

History
391 lines (303 loc) · 11.3 KB

configuration.md

File metadata and controls

391 lines (303 loc) · 11.3 KB

Configuration

Benthos pipelines are configured in a YAML file that consists of a number of root sections, the key parts being:

Arranged like so:

input:
  type: kafka_balanced
  kafka_balanced:
    addresses: [ TODO ]
    topics: [ foo, bar ]
    consumer_group: foogroup

buffer:
  type: none

pipeline:
  processors:
  - type: jmespath
    jmespath:
      query: '{ message: @, meta: { link_count: length(links) } }'

output:
  type: s3
  s3:
    bucket: TODO
    path: "${!metadata:kafka_topic}/${!json_field:message.id}.json"

Config examples for every input, output and processor type can be found here.

These types are hierarchical. For example, an input can have a list of child processor types attached to it, which in turn can have their own condition or even more processor children.

This is powerful but can potentially lead to large and cumbersome configuration files. This document outlines tooling provided by Benthos to help with writing and managing these more complex configuration files.

Testing

For guidance on how to write and run unit tests for your configuration files read this guide.

Contents

Concise Configuration

It's often possible to make your configuration files more concise but less explicit by omitting the type field in components as well as any fields that are default. For example, the above configuration could be written as:

input:
  kafka_balanced:
    addresses: [ TODO ]
    topics: [ foo, bar ]
    consumer_group: foogroup

pipeline:
  processors:
  - jmespath:
      query: '{ message: @, meta: { link_count: length(links) } }'

output:
  s3:
    bucket: TODO
    path: "${!metadata:kafka_topic}/${!json_field:message.id}.json"

Customising Your Configuration

Sometimes it's useful to write a configuration where certain fields can be defined during deployment. For this purpose Benthos supports environment variable interpolation, allowing you to set fields in your config with environment variables like so:

input:
  type: kafka_balanced
  kafka_balanced:
    addresses:
    - ${KAFKA_BROKER:localhost:9092}
    topics:
    - ${KAFKA_TOPIC:default-topic}

This is very useful for sharing configuration files across different deployment environments.

It's also possible to use type fields for environment variable based feature toggles. For example, the following config:

type: ${FEATURE:noop}
jmespath:
  query: '{ enveloped: @ }'
text:
  operator: set
  text: "Wrapped: ${!content}"

Allows us to use the env var FEATURE to choose between two different processor steps (or neither.)

Reusing Configuration Snippets

It's possible to break a large configuration file into smaller parts with JSON references. Benthos doesn't yet support the full specification as it only resolves local or file path URIs, but this still allows you to break your configs down significantly.

To reference a config snippet use the $ref keyword:

local_reference:
  $ref: '#/path/to/field'

file_reference:
  $ref: './foo.yaml'

file_field_reference:
  $ref: './foo.yaml#/path/to/field'

For example, suppose we have a configuration snippet saved under ./config/foo.yaml:

pipeline:
  processors:
  - type: cache
    cache:
      operator: get
      key: ${!json_field:id}
      cache: objects

And we wished to use this snippet within a larger configuration file ./config/bar.yaml. We can do so by adding an object with a key $ref and a string value which is the path to our snippet:

pipeline:
  processors:
  - type: decompress
    decompress:
      algorithm: gzip

  - "$ref": "./foo.yaml#/pipeline/processors/0"

When Benthos loads this config, it will resolve the reference, resulting in this configuration:

pipeline:
  processors:
  - type: decompress
    decompress:
      algorithm: gzip

  - type: cache
    cache:
      operator: get
      key: ${!json_field:id}
      cache: objects

Note that the path of a reference is relative to the configuration file containing the reference, therefore the path used above is ./foo.yaml and not ./config/foo.yaml.

If you like, these references can even be nested.

It is further possible to use environment variables to specify which snippet to load. This works because environment variable interpolations within configurations are resolved before references are resolved.

pipeline:
  processors:
  - type: decompress
    decompress:
      algorithm: gzip

  - "$ref": "./${TARGET_SNIPPET}#/pipeline/processors/0"

Running the above with TARGET_SNIPPET=foo.yaml benthos -c ./config/bar.yaml would be equivalent to the previous example.

Enabling Discovery

The discoverability of configuration fields is a common headache with any configuration driven application. The classic solution is to provide curated documentation that is often hosted on a dedicated site. Benthos does this by generating a markdown document per configuration section.

However, a user often only needs to get their hands on a short, runnable example config file for their use case. They just need to see the format and field names as the fields themselves are usually self explanatory. Forcing such a user to navigate a website, scrolling through paragraphs of text, seems inefficient when all they actually needed to see was something like:

input:
  type: amqp
  amqp:
    url: amqp://guest:guest@localhost:5672/
    consumer_tag: benthos-consumer
    exchange: benthos-exchange
    exchange_type: direct
    key: benthos-key
    prefetch_count: 10
    prefetch_size: 0
    queue: benthos-queue
output:
  type: stdout

In order to make this process easier Benthos is able to generate usable configuration examples for any types, and you can do this from the binary using the --example flag in combination with --print-yaml or --print-json. If, for example, we wanted to generate a config with a websocket input, a Kafka output and a JMESPath processor in the middle, we could do it with the following command:

benthos --print-yaml --example websocket,kafka,jmespath

There are also examples within the config directory, where there is a config file for each input and output type, and inside the processors subdirectory there is a file showing each processor type, and so on.

All of these generated configuration examples also include other useful config sections such as metrics, logging, etc with sensible defaults.

Printing Every Field

The format of a Benthos config file naturally exposes all of the options for a section when it's printed with all default values. For example, in a fictional section foo, which has type options bar, baz and qux, if you were to print the entire default foo section of a config it would look something like this:

foo:
  type: bar
  bar:
    field1: default_value
    field2: 2
  baz:
    field3: another_default_value
  qux:
    field4: false

Which tells you that section foo supports the three object types bar, baz and qux, and defaults to type bar. It also shows you the fields that each section has, and their default values.

The Benthos binary is able to print a JSON or YAML config file containing every section in this format with the commands benthos --print-yaml --all and benthos --print-json --all. This can be extremely useful for quick and dirty config discovery when the full repo isn't at hand.

As a user you could create a new config file with:

benthos --print-yaml --all > conf.yaml

And simply delete all lines for sections you aren't interested in, then you are left with the full set of fields you want.

Alternatively, using tools such as jq you can extract specific type fields:

# Get a list of all input types:
benthos --print-json --all | jq '.input | keys'

# Get all Kafka input fields:
benthos --print-json --all | jq '.input.kafka'

# Get all AMQP output fields:
benthos --print-json --all | jq '.output.amqp'

# Get a list of all processor types:
benthos --print-json --all | jq '.pipeline.processors[0] | keys'

# Get all JSON processor fields:
benthos --print-json --all | jq '.pipeline.processors[0].json'

Help With Debugging

Once you have a config written you now move onto the next headache of proving that it works, and understanding why it doesn't. Benthos, like most good config driven services, performs validation on configs and tries to provide sensible error messages.

However, with validation it can be hard to capture all problems, and the user usually understands their intentions better than the service. In order to help expose and diagnose config errors Benthos provides two mechanisms, linting and echoing.

Linting

Benthos has a lint command (--lint) that, after parsing a config file, will print any errors it detects.

The main goal of the linter is to expose instances where fields within a provided config are valid JSON or YAML but don't actually affect the behaviour of Benthos. These are useful for pointing out typos in object keys or the use of deprecated fields.

For example, imagine we have a config foo.yaml, where we intend to read from AMQP, but there is a typo in our config struct:

input:
  type: amqp
  amqq:
    url: amqp://guest:guest@rabbitmqserver:5672/

This config is parse successfully, and Benthos will simply ignore the amqq key and run using default values for the amqp input. This is therefore an easy error to miss, but if we use the linter it will immediately report the problem:

$ benthos -c ./foo.yaml --lint
input: Key 'amqq' found but is ignored

Which points us to exactly where the problem is.

Echoing

Echoing is where Benthos can print back your configuration after it has been parsed. It is done with the --print-yaml and --print-json commands, which print the Benthos configuration in YAML and JSON format respectively. Since this is done after parsing and applying your config it is able to show you exactly how your config was interpretted:

benthos -c ./your-config.yaml --print-yaml

You can check the output of the above command to see if certain sections are missing or fields are incorrect, which allows you to pinpoint typos in the config.

If your configuration is complex, and the behaviour that you notice implies a certain section is at fault, then you can drill down into that section by using tools such as jq:

# Check the second processor config
benthos -c ./your-config.yaml --print-json | jq '.pipeline.processors[1]'

# Check the condition of a filter processor
benthos -c ./your-config.yaml --print-json | jq '.pipeline.processors[0].filter'