We have libraries the following libaries which match the logging standards defined below:
- Go - github.com/ONSdigital/log.go
- Java - com.github.onsdigital.logging
- Javascript - app/utilities/log.js
Apps should log all important events, especially failures, as well as enough events to be able to trace the execution logic through each major step without being overly noisy. Any information which adds value is worth including in the event. If you're unsure, log more rather than less.
A few ideas of useful things to log are:
- application startup (with sanitised application config output under
data
) - log all failures
- http requests
- requests to data stores and other third party services (e.g. mongo, postgres, kafka, neptune, etc.)
- log events at significant steps throughout the execution flow
- sometimes it is useful to have
started x
andfinished x
for very long running events, but usually a single event can suffice - avoid back to back log statements as a general rule, especially if the second event implies the first
- sometimes it is useful to have
Always consider the performance impact of logging:
- loops
- log-specific conditionals ("should I log this")
- JSON serialisation
- deeply nested data structures
NEVER log passwords, credentials, auth headers, IP addresses, or other sensitive information in any form.
We have defined the formatting and data structures, including field names and types, we use for logging.
This is important to allow the centralised logging service to properly index the data and make it searchable. Any logs not adhering to the specification may be lost.
Any changes, including the addition of new fields, must be agreed by the team as a whole before being added to this spec. Any changes to this spec will require changes to all of the logging libraries as well as the centralised logging service.
Although "human readable" formatting is useful for local development, all apps deployed to an environment must comply to the following formatting rules.
- All logs must be JSON Lines
- All logs must be UTF8 encoded
- All logs must not be colour formatted
- All logs must not contain newlines
- All logs must not contain tabs
- All logs must not be string escaped
The following are the only top level fields. If you have app specific details you wish to add, it is likely that they should be added under data
rather than creating a new field.
Field name | Required | Type | Example | Description |
---|---|---|---|---|
created_at |
Yes | datetime 1 |
"2019-01-21T16:19:12.356Z" |
Date and time of the log event (ISO8601 extended date and time string) |
namespace |
Yes | string |
"dp-frontend-router" |
Service name or other identifier |
event |
Yes | string |
"connecting to mongodb" |
The event being logged |
trace_id |
No2 | string |
Trace ID from OpenCensus | |
span_id |
No | string |
Span ID from OpenCensus | |
severity |
v2 - Yes | int8 |
2 |
The event severity level code |
http |
No | http |
HTTP event data | |
auth |
No | auth |
Authentication event data | |
errors |
No | []error |
Error event data | |
raw |
No | string |
Log message captured from a third party library | |
data |
No | object 3 |
Arbitrary key-value pairs |
1 All dates must be UTC and in ISO8601 extended date time format with at least millisecond precision: yyyy-MM-dd'T'HH:mm:ss.SSSZZ
(e.g. 2019-01-21T16:19:12.356Z
).
2 Not mandatory, but must be included for all events created during the handling of a request.
3 The event-specific details in data
are input as an object/map in code, but are stored as a text string by the centralised logging service to allow additional detail to be added to an event without needing to worry about key name collision. These details can still be searched using a general text search on the data
field.
HTTP events are so common that we've defined a specific top-level field to capture it. The http
field defines the common http event data.
This can be used to log inbound or outbound HTTP event data.
Field name | Required | Type | Example | Description |
---|---|---|---|---|
method |
Yes | string |
"GET" |
The HTTP method used |
scheme |
Yes | string |
"https" |
The scheme or protocol from the URL |
host |
Yes | string |
"10.100.20.1" |
The hostname or ip from the URL |
port |
Yes | int32 |
443 |
The port number from the URL |
path |
Yes | string |
"/healthcheck" |
The request path from the URL |
query |
No | string |
"x=1&y=1" |
The unparsed query string from the URL |
status_code |
No | int16 |
200 |
The HTTP status code |
started_at |
Yes | datetime 1 |
"2019-01-21T16:19:12.356Z" |
ISO8601 date string of the request start time |
ended_at |
No | datetime 1 |
"2019-01-21T16:19:12.356Z" |
ISO8601 date string of the request end time |
duration |
No | int64 |
236578 |
Difference between started_at and ended_at in nanoseconds |
response_content_length |
No | int64 |
34 |
The length of the response |
1 All dates must be UTC and in ISO8601 extended date time format with at least millisecond precision: yyyy-MM-dd'T'HH:mm:ss.SSSZZ
(e.g. 2019-01-21T16:19:12.356Z
).
The auth
field defines the common auth event data.
Field name | Required | Type | Example | Description |
---|---|---|---|---|
identity |
Yes | string |
"dp-import-tracker" |
The user id or service id |
identity_type |
Yes | string |
"service" |
The identity type (i.e. user or service ) |
Errors are logged as an array. Each element of the array represents a single error and has the following fields.
Field name | Required | Type | Example | Description |
---|---|---|---|---|
message |
Yes | string |
"connection refused" |
The error cause |
stack_trace |
No | []stack_trace |
Stack trace as an array | |
data |
No | object 1 |
Arbitrary key-value pairs |
1 The error-specific details in data
are input as an object/map in code, but are stored as a text string by the centralised logging service to allow additional detail to be added to an event without needing to worry about key name collision. These details can still be searched using a general text search on the data
field.
The stack trace is logged as an array. Each element of the array has the following fields.
Field name | Required | Type | Example | Description |
---|---|---|---|---|
file |
No | string |
"/some/path/main.go" |
The source file containing the code throwing an error |
function |
No | string |
"main.main" |
The function in which the error is occurring |
line |
No | int16 |
18 |
The line in the source file that is throwing the error |
The event
field is a brief description of the type of event.
A good event
description is:
- lowercase
- not a sentence (i.e. is brief and without a full stop)
- generic (e.g.
http request
) - kept consistent where possible
- not a formatted string (additional information should be provided via other fields)
For example:
'event': 'file could not be read from s3'
These are guidelines only - there are some exceptions where it's ok to ignore them, for example when using uppercase to refer to a constant or environment variable:
'event': 'DEBUG is not a boolean'
The event severity levels are used to identify how critical an event is, especially failure events.
There are four severity levels:
Code | Severity | Description |
---|---|---|
0 | FATAL | An catastrophic failure event resulting in the death of the application |
1 | ERROR | An unrecoverable failure event that halts the current flow of execution |
2 | WARN | A failure event that can be managed within the current flow of execution (e.g. retried) |
3 | INFO | All non-failure events |
For example, if in the process of serving an HTTP request an action is retried three times, the first two failures would have a severity of WARN
, whereas the third time would have an ERROR
.
The log libraries attempt to intercept all logs from third party libraries in order to prevent collision with our logging specification. These intercepted log messages should be output as a string under the raw
field with the event
as "third party log"
and severity
of 3
(INFO). All other relevant fields (e.g. created_at
, namespace
, trace_id
, span_id
, etc) should be set as they would normally.
For example:
{
"created_at" : "2019-02-01T13:45:24.157Z",
"namespace" : "my-app",
"trace_id": "1105cb0c04f86a4b6a1abaf74246b87f",
"severity" : 3,
"event": "third party log",
"raw" : "Started ServerConnector@7f4fedd{HTTP/1.1,[http/1.1]}{0.0.0.0:4567}"
}
There are two arbitrary data
fields in the logging spec. One at the top level for general event specific data and one under error
for error specific data. These fields are a map/object in code, but will be stored as a string field by the centralised logging service to allow additional detail to be added to an event without needing to worry about key name collision. These details can still be searched using a general text search on the data
field.
The data
fields are a means of adding event specific details that help in understanding the event, but are not common enough to justify a field in the spec. These fields are an important part of the event data and it is important that the data be useful and well structured.
Following creation of the v2 library in log.go, severity
is now a mandatory field when creating log events. In addition to this, it is not possible to pass severity
as an optional field, therefore removing the possibility of duplicate severity
logs.
Four additional functions have been added to the log library: Info
, Warn
, Error
and Fatal
. These functions wrap the existing Event
function and have been named to correspond with their severity level (shown here). So, for example, the Warn
function corresponds with the WARN severity. These functions can be used to provide information about a specific severity level rather than using the more generic Event
function.
As we already had a public Error
function that is used in our apps to display error messages as an option in the logs, the original Error
function has now been renamed to FormatError
, which also ensures that the wrapper function names (Info
, Warn
, Error
& Fatal
) are consistent with their error level. The functionality of FormatError
remains unchanged.
The Error
and Fatal
functions also require that an error is passed as an argument. An example of this for the Fatal
function would be:
log.Fatal(ctx, "failed to shutdown http server", errors.New("testfatal"))
FormatError
is called within the Error
and Fatal
functions, which ensures that the StackTrace
information is also shown in the log.
TODO: Add kafka log data to the logging spec
- Events should refer to an ID to tie the events across services together - whether this is the same ID as used in context, or another identifier, which has previously been logged alongside the context
- log
topic
and (if consuming)group
andoffset