Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(docs): Add ADR on structured logging. #1018

Merged
merged 17 commits into from
Oct 20, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .config/dictionaries/project.dic
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@ icudtl
ideascale
idents
ilap
IMEI
Instantitation
integ
Intellij
Expand All @@ -121,8 +122,8 @@ jormungandr
Jörmungandr
junitreport
junitxml
Keyhash
keychains
Keyhash
keyserver
keyspace
KUBECONFIG
Expand Down
8 changes: 7 additions & 1 deletion .earthlyignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
# Files and directories created by pub

**/.dart_tool/
**/.packages
**/build/
Expand All @@ -11,4 +12,9 @@
**/Earthfile

# node related
**/node_modules/

**/node_modules/

# Rust related

**/target
4 changes: 3 additions & 1 deletion .vscode/extensions.json
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@
"bbenoist.vagrant",
"ms-kubernetes-tools.vscode-kubernetes-tools",
"fill-labs.dependi",
"lawrencegrant.cql"
"lawrencegrant.cql",
"ldez.ignore-files",
"davidlday.languagetool-linter"
]
}
12 changes: 10 additions & 2 deletions .vscode/settings.recommended.json
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,9 @@
"cSpell.autoFormatConfigFile": true,
"cSpell.language": "en,en-US",
"files.associations": {
"Earthfile": "earthfile"
"Earthfile": "earthfile",
".pages": "yaml",
".earthlyignore": "ignore"
},
"rust-analyzer.linkedProjects": [
"./catalyst-gateway/Cargo.toml"
Expand Down Expand Up @@ -95,5 +97,11 @@
"rust-analyzer.lens.enable": true,
"vs-kubernetes": {
"vs-kubernetes.kubeconfig": "utilities/local-cluster/shared/k3s.yaml"
}
},
"dependi.rust.informPatchUpdates": true,
"dependi.decoration.compatible.style": null,
"git.enableCommitSigning": true,
"languageToolLinter.external.url": "https://api.languagetoolplus.com",
"languageToolLinter.languageTool.language": "en-US",
"languageToolLinter.languageTool.motherTongue": "en-US"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
---
title: 0007 Api Versions
adr:
author: Steven Johnson
created: 16-Oct-2024
status: proposed
tags:
- api
---

## Context

The Catalyst Voices backend service, known as Catalyst Gateway, provides an HTTP API to front end services.

It is required that the API be:

* Stable enough for Frontend integrations to be reliable;
* Flexible enough to allow endpoints to evolve, be modified or removed over time.

A secondary consideration is that wherever practical, the API be defined such that it can be replicated on the Hermes Engine.

## Assumptions

That Catalyst Gateway will provide its API's with an HTTP API, defined by an OpenAPI specification.

## Decision

All Individual API Endpoints will be named and versioned according to the following rules:

1. All endpoints URIs will be under the `/api` path in the backend.
2. The general structure of any endpoint will be `/api/<version>/<path>`
3. `<version>` defines a version of the endpoint supplied by a `<path>` and is **NOT** aligned with any other versions.
4. `<version>` **MUST** either be `draft` or matching the regular expression `^v(?!0)\d{1,}$`[^1]
5. `<path>` Can be any path of at least one element as required for the endpoint.
6. **ALL** new endpoints start as `draft` they are promoted to a versioned endpoint by a deliberate decision.
7. New endpoints **MUST NOT** be versioned other than `draft` in their initial PR.
8. New endpoints can only be versioned after the behavior and form of the endpoint has been validated, in a subsequent PR.
9. *ALL* draft endpoints are subject to change and can be safely modified in any way.
10. `draft` Versioned endpoints can be used by frontend code.
However, if the API changes or disappears, the front end should fail gracefully.
11. The numbered version of an endpoint will always start at `v1` and increments sequentially.
12. API versions are **ONLY** incremented if their behavior changes in a way that can reasonably be considered a breaking change
with respect to the currently published OpenAPI specification for that Endpoint.
13. The API endpoint versions are not aligned to any semantic versioning of the backend.
14. `draft` endpoints should not be tested in CI, in such a way as they break the CI pipeline if they change.
15. **ALL** versioned endpoints **SHOULD** have CI tests which ensure the API endpoint itself has not changed in a breaking way.
16. Code generation can generate code for `draft` endpoints, provided doing so does not result in breaking CI.
17. Any Integration tests written against `draft` endpoints should not fail.
They may produce warnings if they do not match expected outputs.
18. Re-versioning an endpoint is NOT required if the change to it is backward compatible with the existing endpoint.
A non-exhaustive list of possible cases for this are:
* A new *OPTIONAL* query parameter is added to an endpoint.
* A response field is added, and the OpenAPI document defined that `"additionalProperties": true`
in the schema definition where the additional field is to be added.
* A previously undocumented behavior is documented.
In this case, the behavior must already exist, it is not a breaking change to clarify documentation.
19. *ALL* non-breaking changes proposed in a PR to existing API's,
or migrating an API from `draft` to a versioned API **MUST** be signed off by
* at least 1 Architect on the Team; and
* the Engineering manager; and preferably
* A senior member of the team with primary responsibility for the frontend.
20. When a new version of an endpoint is released, the existing endpoint **MUST** be marked as `deprecated`
in the OpenAPI specification for the endpoint.
This **MUST** occur in the same PR as the PR that moves the endpoint from `draft` to a versioned API.
21. PRs which version a `draft` endpoint **SHOULD ONLY** include the following.
There can be no other changes to the logic in an API versioning PR.
* The change from `draft` to the required version;
* If it is supersedes an existing endpoint version, marking that endpoint as `deprecated`.
* Updating any draft integration tests to match the new version of the endpoint, and ensure they will Fail CI, if they fail.

The purpose of these rules is to allow us to iterate quickly and implement new API endpoints.
And remove unnecessary risks of breaking the front end, or CI while adding new endpoints.

Currently, the API is mostly unstable.
We also do not have any external consumers of the API to consider.
However, once we enter production with the API we will need a strategy for deprecating and removing obsolete API endpoints.
That will be the subject of a further ADR related specifically to that topic.

## Risks

There are no significant risks identified for the ADR.

## Consequences

If we do not do this, change management of the API will quickly become difficult and unreliable.

## More Information

* [Free Code Camp - How to Version a REST API](https://www.freecodecamp.org/news/how-to-version-a-rest-api/)
* [Postman - API Versioning](https://www.postman.com/api-platform/api-versioning/)

[^1]: `^` asserts the position at the start of the string.</br>
`v` matches the character `v`.</br>
`(?!0)` is a negative look-ahead assertion that checks if the digit after 'v' is not equal to 0.</br>
`\d{1,}` matches one or more digits (1 to unlimited) which ensures that there are no leading zeros in the numeric part of the string.</br>
`$` asserts the position at the end of the string.</br>
For example, `v123` would be a valid string, but `v0` or `v00789` would not match this pattern because they have leading zeros in the numeric part of the string.
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
---
title: 0008 Structured Logging
adr:
author: Steven Johnson
created: 17-Oct-2024
status: proposed
tags:
- logging
---

<!-- cspell: words Sematext,Stackify -->

## Context

Both Backend and Frontend components of Catalyst Voices produce log messages.
These log message help in development, but also in fault-finding in production.

Structured logging is a way of logging which separates the log text message from the data pertinent to the logged event.

## Assumptions

That both the Frontend and Backend are using logging libraries capable of structured logging.

## Decision

* We will use structured logging for all log messages that are to be sent to a log collection system.
* We will not use string formatting within structured log messages, but embed all data within fields in the log message.
* All log levels *MUST* use structured logging.
* It is possible that **DEBUG** or **TRACE** level logs can selectively be enabled in production to help investigate issues.
* Log text messages should be specific enough to identify the actual log messages when searching or summarizing logs.
* Avoid using the same log message multiple times in the same code.
* As far as we are able, we should be consistent with field names.
* For example, we always use "error" for error values, not "error" or "err" or "e".
stevenj marked this conversation as resolved.
Show resolved Hide resolved
* Fields do not need to have the same name as the variable, what is important is the field name be unambiguous and consistent.
* Logs *MUST* include the date and time of the event being logged.
* Date and Time *MUST* always be referenced to UTC, and not Local Time.
* Date and Time *SHOULD* be formatted according to [RFC3339] (An open formalized subset of [ISO8601]).
* If [RFC3339] is not possible to use, then [ISO8601] format *MAY* be used instead.
* Logs *SHOULD*, if possible, include the filename, and line of the log message in the code.
* Loggers *SHOULD* be configured to automatically include log fields required for every log message.
* All logs *MUST* obfuscate a client's private information before logging.
* A non-exhaustive list of private information includes:
* IP Addresses
* User-Agent Strings
* Location Information (Longitude/Latitude)
* Device identifiers (IMEI, MAC Address, etc.)
* Browser Fingerprints
* For data like this, that uniquely identifies a client or session, the obfuscation of that data should be consistent.
* For example, `ip addresses` can be hashed and turned into a [UUID].
* The [UUID] in this case will always be the same for an `ip address`.
* The privacy of the client is protected because it is not easily possible to derive the `ip address` from the [UUID].
* This gives us the ability to cluster multiple interactions from a common connection without revealing the users' identity.
* It also allows the Logs to help provide support to a user if they supply their IP Address. (Client Directed Unmasking).
* *NEVER* log Security sensitive information either in the clear or obfuscated, such as (non-exhaustive):
* Usernames
* Email Addresses
* Passwords
* API Keys
* Private Encryption Keys

*NOTE :* **IF AND ONLY IF** log messages are a developer convenience,
**AND** will never be collected for online diagnostic purposes,
**THEN** they do not need to follow this ADR.
*IF in doubt, use structured logging.*

### Log Levels

The Standard Abstract Log levels are defined (from the highest priority to lowest) as:

* **CRITICAL** *(Optional)* - When a system has completely failed.
* **ERROR** can also be used for this.
* **ERROR** - Something has failed which normally should not.
* Does not have to be a fatal failure.
* The Log message should clearly show if the error is fatal or unrecoverable.
* For example, a failed DB connection is an *ERROR* but is not fatal.
* Errors should be considered not fatal, unless they explicitly state otherwise.
* **WARNING** - Something which can happen, but normally shouldn't.
* **INFO** - Important System informational messages.
* **DEBUG** - Highest level detailed system operation logs.
* **TRACE** - Verbose and detailed system operation logs.

Individual languages may use different terms for these, and may have more or less defined levels.
Use the language's native log levels which map closest to this list.
For example, some languages do not have *CRITICAL*, we would map that level to *ERROR*.
In all such cases, use the next most appropriate level which matches closest with the above definitions.

#### Logging in Libraries

* Libraries *SHOULD NOT* use *INFO* level logs.
* this level of log should be used only by applications or services.
* Libraries *MAY* use any other log level.

### Log Verbosity

Logs should aim to provide exactly the right amount of information, and no more.

* All **ERROR** logs should be logged in the most appropriate location.
* **ERROR** or **WARNING** logs, *SHOULD* include enough context to identify the actual source of the error.
* Info level logs should only be used to provide ***IMPORTANT*** operational data.
* Such as:
* Build or configuration information.
* The start of internal persistent services
* **INFO** logs should not typically continuously stream as a result of normal system use.
* API Endpoint statistics logs are an example of an exception to this guidance.
* However, they *SHOULD* also have a method of selectively enabling/disabling these *INFO* logs.
* The same principle *SHOULD* apply to any other regularly logged **INFO** level logs that may be required.
* **DEBUG** level logs should be assumed to be enabled selectively in production.
* Their purpose *SHOULD* be aimed to helping to diagnose faults.
* Detailed and Verbose or Streaming Debug level logs should be at **TRACE** level.

### Example of Unstructured Logging to avoid

An example of unstructured logging in rust, which is not to be used:

```rust
error!("Hello {name} An error occurred in {thing}, doing {something:?}: {err}");
```

The problems this example exhibits are:

1. This is difficult to process with automation.
2. It would need to be searched with a regex.
3. The fixed words probably appear in similar patterns in other log messages, making it difficult to discern what the log is about.
4. It may be easy for a developer to read.
However, it is not easy to read when there are tens of thousands of logs from multiple instances of a service.

### Example of Structured Logging to utilize

Instead, we should use:

```rust
error!(name=name, thing=%thing, something=?something, error=?err, "An error occurred processing named updates to the database");
```

* Each piece of dynamic data is a field, including the string representation of the error.
* The message explains what the error is and helps locate the log both in the log messages itself, and in the code.

## Risks

The only risk relates to not doing structured logging.
It will make the system harder to manage for Operations and Support staff.

***ALWAYS REMEMBER:
Log messages are intended to be primarily read by operations and support staff.
Do not assume they know how the code works, just because you might.
Attempt to make log messages helpful to the people who will work with the system.***

## Consequences

Structured logging makes searching, summarizing and parsing log messages significantly easier.
Without it, log messages quickly become difficult to use, which defeats the purpose of generating them.

## More Information

* [Sematext: What is structured logging](https://sematext.com/glossary/structured-logging/)
* [Stackify: What Is Structured Logging and Why Developers Need It](https://stackify.com/what-is-structured-logging-and-why-developers-need-it/)

[RFC3339]: <https://www.rfc-editor.org/rfc/rfc3339>
[ISO8601]: <https://www.iso.org/iso-8601-date-and-time-format.html>
[UUID]: <https://www.rfc-editor.org/rfc/rfc9562.html>
Loading