emmiegit commented Jan 23, 2026

This uses the exn crate to change all of DEEPWELL's errors to carry context. This strategy is described in Stop Forwarding Errors, Start Designing Them, which discusses how approaches like thiserror, despite being easy, simply pass the error along as-is, with no context about how you got there.

It uses the example of:

It’s 3am. Production is down. You’re staring at a log line that says:

Error: serialization error: expected ',' or '}' at line 3, column 7

I think this is a compelling opening argument against error forwarding.

A number of ideas are mentioned there, such as human-readable vs machine-processable errors (like if a request can be retried or not), but for our purposes we have a few needs:

  • We need integer error codes for JSONRPC and for end-user error localization.
  • We need errors to be human-readable for debugging.
  • We don't need machine handling such as retries, since all requests go to the outside world, as DEEPWELL is an internal API for framerail et al.

As such, the solution I came up with has two parts: the string message (free-form, describing the current stage, intended for developers) and the "error type", which embodies both the error code / type of operation and carries any ancillary data to be presented in the JSONRPC error.
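A minimal std-only sketch of that two-part shape may help. The names ErrorType, LayerError and empty_slug_error are hypothetical and not taken from DEEPWELL or exn. Only the code 4300 and its message come from the example trace in this description.

```rust
use std::fmt;

/// Hypothetical "error type": the integer code plus optional ancillary data
/// that would end up in the JSONRPC error.
#[derive(Debug, Clone, PartialEq)]
struct ErrorType {
    code: i32,
    extra: Option<String>,
}

/// One layer of an error: a free-form developer message plus the error type.
#[derive(Debug, Clone, PartialEq)]
struct LayerError {
    message: String,
    error_type: ErrorType,
}

impl fmt::Display for LayerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same "[code] message" shape as the trace lines in this PR.
        write!(f, "[{}] {}", self.error_type.code, self.message)
    }
}

impl std::error::Error for LayerError {}

/// Builds the bottom-tier error seen at the end of the example trace.
fn empty_slug_error() -> LayerError {
    LayerError {
        message: String::from("cannot create page with empty slug"),
        error_type: ErrorType { code: 4300, extra: None },
    }
}
```

Keeping the message and the error type separate means the message can change freely for debugging while the code stays stable for localization.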


Speaking of developers, the key distinction with exn is that simply using ? will not propagate an error of a different type; you need to call .or_raise() to add context about this layer. This is technically optional if the error type is the same, so it is less enforced in our crate, but we should endeavor to add an error layer in every function that's not the simplest wrapper. This way we can see the logic that resulted in the current error.

Throughout the code, I use a make_error closure to create errors. Note that with exn it is an anti-pattern to make the error message describe the particular step that failed. Instead, it should describe what the current layer is trying to do, hence the identical error message throughout a layer.

The only exception to this is bottom-tier errors, where the message should be the specific item that failed. We can manually kick off errors with bail!, and you can see there are a few root-level errors for things like "page not found".

So, if I'm in a function called rerender_page(), then the error message should be something like failed to rerender page [...more context here] and a more specific error like "page not found" would be wrapped by this higher level.
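The layering rule can be sketched without the exn crate. The following is a std-only emulation of the idea, not exn's API: Layered, root, wrap and or_wrap are hypothetical stand-ins for the roles that exn's error type, bail! and .or_raise() play in this PR, and find_page and render are made-up helpers.

```rust
use std::fmt;

/// Hypothetical context-carrying error: one message per layer, outermost first.
#[derive(Debug)]
struct Layered {
    frames: Vec<String>,
}

impl Layered {
    /// Start a bottom-tier error naming the specific item that failed.
    fn root(message: impl Into<String>) -> Self {
        Layered { frames: vec![message.into()] }
    }

    /// Wrap this error in a new outer layer.
    fn wrap(mut self, message: String) -> Self {
        self.frames.insert(0, message);
        self
    }
}

impl fmt::Display for Layered {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.frames.join("\n|-> "))
    }
}

/// Hypothetical extension trait that adds a layer only on the error path.
trait OrWrap<T> {
    fn or_wrap<F: FnOnce() -> String>(self, make: F) -> Result<T, Layered>;
}

impl<T> OrWrap<T> for Result<T, Layered> {
    fn or_wrap<F: FnOnce() -> String>(self, make: F) -> Result<T, Layered> {
        self.map_err(|err| err.wrap(make()))
    }
}

/// Bottom tier: the message is the specific thing that went wrong.
fn find_page(slug: &str) -> Result<u64, Layered> {
    if slug == "start" {
        Ok(1)
    } else {
        Err(Layered::root("page not found"))
    }
}

fn render(page_id: u64) -> Result<String, Layered> {
    Ok(format!("<rendered page {page_id}>"))
}

fn rerender_page(slug: &str) -> Result<String, Layered> {
    // One closure for the whole function: every fallible step in this layer
    // reports what the layer wanted to do, not which step failed.
    let make_error = || format!("failed to rerender page '{slug}'");

    let page_id = find_page(slug).or_wrap(make_error)?;
    let html = render(page_id).or_wrap(make_error)?;
    Ok(html)
}
```

Calling rerender_page with an unknown slug yields two frames: the layer message on the outside and "page not found" underneath it.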


With this information, we can look at an example error response:

[1001] method 'page_move' failed, at src/api.rs:309:5
|
|-> [1009] failed to move page, at src/endpoints/page.rs:235:10
|
|-> [1009] failed to move page 'start' to '' (ID 1) in site ID 3, performed by user ID 2, at src/services/page/service.rs:359:14
|
|-> [4300] cannot create page with empty slug, at src/services/page/service.rs:1152:13

With the error JSON itself being:

{
    "code": 4300,
    "message": "Page slug cannot be empty",
    "data": {
        "call_trace": "[1001] method 'page_move' failed, at src/api.rs:309:5\n|\n|-> [1009] failed to move page, at src/endpoints/page.rs:235:10\n|\n|-> [1009] failed to move page 'start' to '' (ID 1) in site ID 3, performed by user ID 2, at src/services/page/service.rs:359:14\n|\n|-> [4300] cannot create page with empty slug, at src/services/page/service.rs:1152:13",
        "code_trace": [
            1001,
            1009,
            1009,
            4300
        ],
        "extra": null
    }
}
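As a rough illustration of how the fields in that JSON relate to the trace, here is a std-only sketch. Frame and the three functions are hypothetical names, not DEEPWELL's. It assumes the top-level code is the code of the deepest frame that has one, which matches the single JSON example above but is not confirmed beyond it.

```rust
/// Hypothetical trace frame. `code` is None for errors that carry no
/// integer code, such as an error from another library.
struct Frame {
    code: Option<i32>,
    message: String,
    location: String,
}

/// Renders one frame in the "[code] message, at file:line:col" shape.
fn render_frame(frame: &Frame) -> String {
    match frame.code {
        Some(code) => format!("[{}] {}, at {}", code, frame.message, frame.location),
        None => format!("{}, at {}", frame.message, frame.location),
    }
}

/// Joins frames, outermost first, using the separator seen in "call_trace".
fn call_trace(frames: &[Frame]) -> String {
    frames
        .iter()
        .map(render_frame)
        .collect::<Vec<_>>()
        .join("\n|\n|-> ")
}

/// Collects the integer codes in order, skipping frames without one.
fn code_trace(frames: &[Frame]) -> Vec<i32> {
    frames.iter().filter_map(|frame| frame.code).collect()
}

/// Assumption: the reported "code" is the deepest code in the trace.
fn top_level_code(frames: &[Frame]) -> Option<i32> {
    code_trace(frames).last().copied()
}
```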

Artificially added error deeper in the stack:

[1001] method 'page_edit' failed, at src/api.rs:307:5
|
|-> [1009] failed to edit page, at src/endpoints/page.rs:207:10
|
|-> [1009] failed to edit page 'start' (ID 1) in site ID 3, performed by user ID 2, at src/services/page/service.rs:260:18
|
|-> [1010] failed to create new page revision on page ID 1 in category ID 1 on site ID 3 by user ID 2, at src/services/page_revision/service.rs:196:22
|
|-> [1300] failed to create new text entry, at src/services/text.rs:176:18
|
|-> invalid digit found in string, at src/services/text.rs:176:18
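The last frame in that trace has no code because it is a plain standard-library error rather than one of our own. A small std-only sketch of how such a frame can arise: parse_entry_id is a hypothetical function, the actual failing call in text.rs is not shown in this description, and only the code 1300 and the two messages are taken from the trace.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: wraps a standard-library parse failure in a layer
/// message, keeping the original error text as the innermost frame.
fn parse_entry_id(raw: &str) -> Result<i64, String> {
    raw.parse::<i64>().map_err(|err: ParseIntError| {
        format!("[1300] failed to create new text entry\n|-> {err}")
    })
}
```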

This PR is pretty long, but most of the changes involve adding make_error or .or_raise(...) to places where we run operations returning a Result, so it should be skimmable.

For future development, feel free to add additional context to error messages if you feel that it's necessary. For instance, if you cannot tell what the issue is from a glance at the trace, then there might be a parameter we should be logging, even if it means needing to clone some data. Ease of debuggability will make long-term maintenance on the project easier.

emmiegit commented Feb 8, 2026

Thanks!

@emmiegit emmiegit merged commit 1301532 into develop Feb 8, 2026
9 checks passed
@emmiegit emmiegit deleted the error branch February 8, 2026 01:23