input-output-hk · minikin · Oct 20, 2024 · Oct 16, 2024 · Oct 16, 2024 · Oct 16, 2024
diff --git a/.config/dictionaries/project.dic b/.config/dictionaries/project.dic
@@ -106,6 +106,7 @@ icudtl
 ideascale
 idents
 ilap
+IMEI
 Instantitation
 integ
 Intellij
@@ -121,8 +122,8 @@ jormungandr
 Jörmungandr
 junitreport
 junitxml
-Keyhash
 keychains
+Keyhash
 keyserver
 keyspace
 KUBECONFIG

diff --git a/.earthlyignore b/.earthlyignore
@@ -1,4 +1,5 @@
 # Files and directories created by pub
+
 **/.dart_tool/
 **/.packages
 **/build/
@@ -11,4 +12,9 @@
 **/Earthfile
 
 # node related
-**/node_modules/
+
+**/node_modules/
+
+# Rust related
+
+**/target
diff --git a/.vscode/extensions.json b/.vscode/extensions.json
@@ -20,6 +20,8 @@
         "bbenoist.vagrant",
         "ms-kubernetes-tools.vscode-kubernetes-tools",
         "fill-labs.dependi",
-        "lawrencegrant.cql"
+        "lawrencegrant.cql",
+        "ldez.ignore-files",
+        "davidlday.languagetool-linter"
     ]
 }
diff --git a/.vscode/settings.recommended.json b/.vscode/settings.recommended.json
@@ -52,7 +52,9 @@
   "cSpell.autoFormatConfigFile": true,
   "cSpell.language": "en,en-US",
   "files.associations": {
-    "Earthfile": "earthfile"
+    "Earthfile": "earthfile",
+    ".pages": "yaml",
+    ".earthlyignore": "ignore"
   },
   "rust-analyzer.linkedProjects": [
     "./catalyst-gateway/Cargo.toml"
@@ -95,5 +97,11 @@
   "rust-analyzer.lens.enable": true,
   "vs-kubernetes": {
     "vs-kubernetes.kubeconfig": "utilities/local-cluster/shared/k3s.yaml"
-  }
+  },
+  "dependi.rust.informPatchUpdates": true,
+  "dependi.decoration.compatible.style": null,
+  "git.enableCommitSigning": true,
+  "languageToolLinter.external.url": "https://api.languagetoolplus.com",
+  "languageToolLinter.languageTool.language": "en-US",
+  "languageToolLinter.languageTool.motherTongue": "en-US"
 }
diff --git a/docs/src/architecture/09_architecture_decisions/0007-api-versions.md b/docs/src/architecture/09_architecture_decisions/0007-api-versions.md
@@ -0,0 +1,97 @@
+---
+    title: 0007 Api Versions
+    adr:
+        author: Steven Johnson
+        created: 16-Oct-2024
+        status:  proposed
+    tags:
+        - api
+---
+
+## Context
+
+The Catalyst Voices backend service, known as Catalyst Gateway, provides an HTTP API to front end services.
+
+It is required that the API be:
+
+* Stable enough for Frontend integrations to be reliable;
+* Flexible enough to allow endpoints to evolve, be modified or removed over time.
+
+A secondary consideration is that wherever practical, the API be defined such that it can be replicated on the Hermes Engine.
+
+## Assumptions
+
+That Catalyst Gateway will provide its APIs over HTTP, as defined by an OpenAPI specification.
+
+## Decision
+
+All Individual API Endpoints will be named and versioned according to the following rules:
+
+1. All endpoint URIs will be under the `/api` path in the backend.
+2. The general structure of any endpoint will be `/api/<version>/<path>`
+3. `<version>` defines a version of the endpoint supplied by a `<path>` and is **NOT** aligned with any other versions.
+4. `<version>` **MUST** either be `draft` or match the regular expression `^v(?!0)\d{1,}$`[^1]
+5. `<path>` can be any path of at least one element as required for the endpoint.
+6. **ALL** new endpoints start as `draft`; they are promoted to a versioned endpoint by a deliberate decision.
+7. New endpoints **MUST NOT** be versioned other than `draft` in their initial PR.
+8. New endpoints can only be versioned after the behavior and form of the endpoint has been validated, in a subsequent PR.
+9. *ALL* draft endpoints are subject to change and can be safely modified in any way.
+10. `draft` Versioned endpoints can be used by frontend code.  
+However, if the API changes or disappears, the front end should fail gracefully.
+11. The numbered version of an endpoint will always start at `v1` and increments sequentially.
+12. API versions are **ONLY** incremented if their behavior changes in a way that can reasonably be considered a breaking change
+with respect to the currently published OpenAPI specification for that Endpoint.
+13. The API endpoint versions are not aligned to any semantic versioning of the backend.
+14. `draft` endpoints should not be tested in CI in such a way that a change to them breaks the CI pipeline.
+15. **ALL** versioned endpoints **SHOULD** have CI tests which ensure the API endpoint itself has not changed in a breaking way.
+16. Code generation can generate code for `draft` endpoints, provided doing so does not result in breaking CI.
+17. Any Integration tests written against `draft` endpoints should not fail.  
+They may produce warnings if they do not match expected outputs.
+18. Re-versioning an endpoint is NOT required if the change to it is backward compatible with the existing endpoint.
+A non-exhaustive list of possible cases for this are:
+    * A new *OPTIONAL* query parameter is added to an endpoint.
+    * A response field is added, and the OpenAPI document defines `"additionalProperties": true`
+in the schema definition where the additional field is to be added.
+    * A previously undocumented behavior is documented.  
+In this case, the behavior must already exist; clarifying documentation is not a breaking change.
+19. *ALL* non-breaking changes proposed in a PR to existing APIs,
+or migrating an API from `draft` to a versioned API **MUST** be signed off by
+    * at least 1 Architect on the Team; and
+    * the Engineering manager; and preferably
+    * A senior member of the team with primary responsibility for the frontend.
+20. When a new version of an endpoint is released, the existing endpoint **MUST** be marked as `deprecated`
+in the OpenAPI specification for the endpoint.
+This **MUST** occur in the same PR as the PR that moves the endpoint from `draft` to a versioned API.
+21. PRs which version a `draft` endpoint **SHOULD ONLY** include the following.
+There can be no other changes to the logic in an API versioning PR.
+    * The change from `draft` to the required version;
+    * If it supersedes an existing endpoint version, marking that endpoint as `deprecated`.
+    * Updating any draft integration tests to match the new version of the endpoint, and ensuring that a failing test now fails CI.
+
+The purpose of these rules is to allow us to iterate quickly and implement new API endpoints,
+while removing unnecessary risks of breaking the front end or CI when adding new endpoints.
+
+Currently, the API is mostly unstable.  
+We also do not have any external consumers of the API to consider.
+However, once we enter production with the API we will need a strategy for deprecating and removing obsolete API endpoints.
+That will be the subject of a further ADR related specifically to that topic.
+
+## Risks
+
+There are no significant risks identified for the ADR.
+
+## Consequences
+
+If we do not do this, change management of the API will quickly become difficult and unreliable.
+
+## More Information
+
+* [Free Code Camp - How to Version a REST API](https://www.freecodecamp.org/news/how-to-version-a-rest-api/)
+* [Postman - API Versioning](https://www.postman.com/api-platform/api-versioning/)
+
+[^1]: `^` asserts the position at the start of the string.<br>
+`v` matches the character `v`.<br>
+`(?!0)` is a negative look-ahead assertion that checks that the character after `v` is not `0`, which ensures there are no leading zeros in the numeric part of the string.<br>
+`\d{1,}` matches one or more digits (1 to unlimited).<br>
+`$` asserts the position at the end of the string.<br>
+For example, `v123` would be a valid string, but `v0` or `v00789` would not match this pattern because their numeric part starts with a zero.
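Review note on rule 4 of ADR 0007: the Rust `regex` crate does not support look-around, so `^v(?!0)\d{1,}$` cannot be compiled with it as written. Below is a minimal std-only sketch of an equivalent check. The function name `is_valid_api_version` is hypothetical, and the sketch assumes `\d` is meant to match ASCII digits only.

```rust
/// Returns true if `version` is `draft` or has the shape described by
/// `^v(?!0)\d{1,}$`. Hypothetical helper, not part of the ADR.
pub fn is_valid_api_version(version: &str) -> bool {
    if version == "draft" {
        return true;
    }
    // The version must start with a literal lowercase `v`.
    let digits = match version.strip_prefix('v') {
        Some(rest) => rest,
        None => return false,
    };
    // At least one digit, no leading zero, ASCII digits only (assumption).
    !digits.is_empty()
        && !digits.starts_with('0')
        && digits.bytes().all(|b| b.is_ascii_digit())
}
```

With this check, `draft`, `v1` and `v123` are accepted, while `v0`, `v00789`, `v` and `V1` are rejected.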
@@ -0,0 +1,160 @@
+---
+    title: 0008 Structured Logging
+    adr:
+        author: Steven Johnson
+        created: 17-Oct-2024
+        status:  proposed
+    tags:
+        - logging
+---
+
+<!-- cspell: words Sematext,Stackify -->
+
+## Context
+
+Both Backend and Frontend components of Catalyst Voices produce log messages.
+These log messages help in development, and also in fault-finding in production.
+
+Structured logging is a way of logging which separates the log text message from the data pertinent to the logged event.
+
+## Assumptions
+
+That both the Frontend and Backend are using logging libraries capable of structured logging.
+
+## Decision
+
+* We will use structured logging for all log messages that are to be sent to a log collection system.
+* We will not use string formatting within structured log messages, but embed all data within fields in the log message.
+* All log levels *MUST* use structured logging.
+  * It is possible that **DEBUG** or **TRACE** level logs can selectively be enabled in production to help investigate issues.
+* Log text messages should be specific enough to identify the actual log messages when searching or summarizing logs.
+  * Avoid using the same log message multiple times in the same code.
+* As far as we are able, we should be consistent with field names.
+  * For example, we always use "error" for error values, not "err" or "e".
+  * Fields do not need to have the same name as the variable; what is important is that the field name be unambiguous and consistent.
+* Logs *MUST* include the date and time of the event being logged.
+  * Date and Time *MUST* always be referenced to UTC, and not Local Time.
+  * Date and Time *SHOULD* be formatted according to [RFC3339] (An open formalized subset of [ISO8601]).
+  * If [RFC3339] is not possible to use, then [ISO8601] format *MAY* be used instead.
+* Logs *SHOULD*, if possible, include the filename, and line of the log message in the code.
+* Loggers *SHOULD* be configured to automatically include log fields required for every log message.
+* All logs *MUST* obfuscate a client's private information before logging.
+  * A non-exhaustive list of private information includes:
+    * IP Addresses
+    * User-Agent Strings
+    * Location Information (Longitude/Latitude)
+    * Device identifiers (IMEI, MAC Address, etc.)
+    * Browser Fingerprints
+  * For data like this, that uniquely identifies a client or session, the obfuscation of that data should be consistent.
+    * For example, `ip addresses` can be hashed and turned into a [UUID].
+      * The [UUID] in this case will always be the same for an `ip address`.
+      * The privacy of the client is protected because it is not easily possible to derive the `ip address` from the [UUID].
+      * This gives us the ability to cluster multiple interactions from a common connection without revealing the users' identity.
+      * It also allows the Logs to help provide support to a user if they supply their IP Address. (Client Directed Unmasking).
+* *NEVER* log Security sensitive information either in the clear or obfuscated, such as (non-exhaustive):
+  * Usernames
+  * Email Addresses
+  * Passwords
+  * API Keys
+  * Private Encryption Keys
+
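Review note on the date and time rules in the Decision list above: a minimal std-only sketch of producing an [RFC3339] timestamp referenced to UTC. The function names are hypothetical, and a real service would more likely rely on its logging library or a date/time crate. The day-to-date conversion is the widely used civil-from-days arithmetic, restricted here to times at or after the Unix epoch.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Converts days since 1970-01-01 (non-negative) to (year, month, day).
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z / 146_097; // 400-year eras; z is non-negative here
    let doe = z - era * 146_097; // day of era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Formats seconds since the Unix epoch as an RFC3339 UTC timestamp.
pub fn rfc3339_utc(unix_secs: u64) -> String {
    let (year, month, day) = civil_from_days((unix_secs / 86_400) as i64);
    let rem = unix_secs % 86_400; // seconds within the day
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year, month, day, rem / 3_600, (rem % 3_600) / 60, rem % 60
    )
}

/// Current time as an RFC3339 UTC timestamp (falls back to the epoch on clock error).
pub fn now_rfc3339_utc() -> String {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    rfc3339_utc(secs)
}
```

The trailing `Z` marks the timestamp as UTC, which keeps logs from different hosts directly comparable.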
+*NOTE :* **IF AND ONLY IF** log messages are a developer convenience,
+**AND** will never be collected for online diagnostic purposes,
+**THEN** they do not need to follow this ADR.
+*IF in doubt, use structured logging.*
+
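Review note on the consistent obfuscation of IP addresses described in the Decision list above: a minimal std-only sketch of turning an IP address into a stable, UUID-shaped identifier. Everything named here is an assumption for illustration: the `obfuscate_ip` helper, the `secret` salt (added because the IPv4 space is small enough to enumerate against an unsalted hash), and the FNV-1a style hash, which is a placeholder and is NOT cryptographically secure. A real deployment would presumably use a keyed cryptographic hash instead.

```rust
use std::net::IpAddr;

/// Placeholder 64-bit FNV-1a style hash with a seed. NOT cryptographically secure.
fn fnv1a64(seed: u64, data: &[u8]) -> u64 {
    let mut hash = 0xcbf2_9ce4_8422_2325u64 ^ seed;
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Maps an IP address to the same UUID-shaped string every time it is called
/// with the same `secret`. Hypothetical helper, not part of the ADR.
pub fn obfuscate_ip(ip: IpAddr, secret: &[u8]) -> String {
    // Hash the secret followed by the raw address bytes.
    let mut input = secret.to_vec();
    match ip {
        IpAddr::V4(v4) => input.extend_from_slice(&v4.octets()),
        IpAddr::V6(v6) => input.extend_from_slice(&v6.octets()),
    }
    let mut bytes = [0u8; 16];
    bytes[..8].copy_from_slice(&fnv1a64(0, &input).to_be_bytes());
    bytes[8..].copy_from_slice(&fnv1a64(1, &input).to_be_bytes());
    // Mark as an RFC 9562 version 8 (custom) UUID with the standard variant bits.
    bytes[6] = (bytes[6] & 0x0f) | 0x80;
    bytes[8] = (bytes[8] & 0x3f) | 0x80;
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32]
    )
}
```

Because the mapping is deterministic, support staff holding the same `secret` can recompute the identifier from an IP address a user supplies, which is the Client Directed Unmasking case.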
+### Log Levels
+
+The Standard Abstract Log levels are defined (from the highest priority to lowest) as:
+
+* **CRITICAL** *(Optional)* - When a system has completely failed.
+  * **ERROR** can also be used for this.
+* **ERROR** - Something has failed which normally should not.
+  * Does not have to be a fatal failure.
+  * The Log message should clearly show if the error is fatal or unrecoverable.
+    * For example, a failed DB connection is an *ERROR* but is not fatal.
+    * Errors should be considered not fatal, unless they explicitly state otherwise.
+* **WARNING** - Something which can happen, but normally shouldn't.
+* **INFO** - Important System informational messages.
+* **DEBUG** - Highest level detailed system operation logs.
+* **TRACE** - Verbose and detailed system operation logs.
+
+Individual languages may use different terms for these, and may have more or fewer defined levels.
+Use the language's native log levels which map closest to this list.
+For example, some languages do not have *CRITICAL*, we would map that level to *ERROR*.
+In all such cases, use the next most appropriate level which matches closest with the above definitions.
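Review note on the level mapping described above: a minimal sketch of mapping the abstract levels onto a language's native ones. `AbstractLevel`, `NativeLevel` and `to_native` are hypothetical names. `NativeLevel` mirrors the five levels offered by the common Rust `log` and `tracing` crates, neither of which has a *CRITICAL* level.

```rust
/// The abstract levels listed in this ADR, highest priority first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AbstractLevel {
    Critical,
    Error,
    Warning,
    Info,
    Debug,
    Trace,
}

/// Stand-in for a language's native levels (here: five levels, no CRITICAL).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NativeLevel {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Maps each abstract level to the closest native level.
pub fn to_native(level: AbstractLevel) -> NativeLevel {
    match level {
        // No native CRITICAL exists, so it maps to the closest level, ERROR.
        AbstractLevel::Critical | AbstractLevel::Error => NativeLevel::Error,
        AbstractLevel::Warning => NativeLevel::Warn,
        AbstractLevel::Info => NativeLevel::Info,
        AbstractLevel::Debug => NativeLevel::Debug,
        AbstractLevel::Trace => NativeLevel::Trace,
    }
}
```

Since *CRITICAL* and *ERROR* collapse into one native level here, the log message itself should state whether the failure is fatal.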
+
+#### Logging in Libraries
+
+* Libraries *SHOULD NOT* use *INFO* level logs.
+  * This level of log should be used only by applications or services.
+* Libraries *MAY* use any other log level.
+
+### Log Verbosity
+
+Logs should aim to provide exactly the right amount of information, and no more.
+
+* All **ERROR** logs should be logged in the most appropriate location.
+  * **ERROR** or **WARNING** logs, *SHOULD* include enough context to identify the actual source of the error.
+* Info level logs should only be used to provide ***IMPORTANT*** operational data.
+  * Such as:
+    * Build or configuration information.
+    * The start of internal persistent services
+  * **INFO** logs should not typically continuously stream as a result of normal system use.
+  * API Endpoint statistics logs are an example of an exception to this guidance.
+    * However, they *SHOULD* also have a method of selectively enabling/disabling these *INFO* logs.
+    * The same principle *SHOULD* apply to any other regularly logged **INFO** level logs that may be required.
+* **DEBUG** level logs should be assumed to be enabled selectively in production.
+  * Their purpose *SHOULD* be to help diagnose faults.
+* Detailed and Verbose or Streaming Debug level logs should be at **TRACE** level.
+
+### Example of Unstructured Logging to avoid
+
+An example of unstructured logging in Rust, which is not to be used:
+
+```rust
+    error!("Hello {name} An error occurred in {thing}, doing {something:?}: {err}");
+```
+
+The problems this example exhibits are:
+
+1. This is difficult to process with automation.
+2. It would need to be searched with a regex.
+3. The fixed words probably appear in similar patterns in other log messages, making it difficult to discern what the log is about.
+4. It may be easy for a developer to read.
+However, it is not easy to read when there are tens of thousands of logs from multiple instances of a service.
+
+### Example of Structured Logging to utilize
+
+Instead, we should use:
+
+```rust
+    error!(name=name, thing=%thing, something=?something, error=?err, "An error occurred processing named updates to the database");
+```
+
+* Each piece of dynamic data is a field, including the string representation of the error.
+* The message explains what the error is and helps locate the log both in the log messages themselves and in the code.
+
+## Risks
+
+The only risk relates to not doing structured logging.
+It will make the system harder to manage for Operations and Support staff.
+
+***ALWAYS REMEMBER:
+Log messages are intended to be primarily read by operations and support staff.
+Do not assume they know how the code works, just because you might.
+Attempt to make log messages helpful to the people who will work with the system.***
+
+## Consequences
+
+Structured logging makes searching, summarizing and parsing log messages significantly easier.
+Without it, log messages quickly become difficult to use, which defeats the purpose of generating them.
+
+## More Information
+
+* [Sematext: What is structured logging](https://sematext.com/glossary/structured-logging/)
+* [Stackify: What Is Structured Logging and Why Developers Need It](https://stackify.com/what-is-structured-logging-and-why-developers-need-it/)
+
+[RFC3339]: <https://www.rfc-editor.org/rfc/rfc3339>
+[ISO8601]: <https://www.iso.org/iso-8601-date-and-time-format.html>
+[UUID]: <https://www.rfc-editor.org/rfc/rfc9562.html>