feat: Best-effort responses #268

Open
wants to merge 41 commits into
base: master
Changes from 31 commits
Commits
138ccc5
Introduce timeouts for function call
oggy-dfin Jan 29, 2024
5d7edb2
Clarify that only 1 response is delivered + fix wording around multip…
oggy-dfin Jan 29, 2024
d64a3f6
Weaken the timeout guarantees
oggy-dfin Jan 29, 2024
524929a
Fix the error code
oggy-dfin Jan 29, 2024
128c8eb
Fix typo
oggy-dfin Jan 29, 2024
78b1b0e
Reword the early timeout condition
oggy-dfin Jan 30, 2024
ff8d96c
Rename the API call, change the error codes, and allow the callee to …
oggy-dfin Jan 31, 2024
132de81
Fix typos
oggy-dfin Jan 31, 2024
e8d3e64
Apply suggestions from code review
oggy-dfin Jan 31, 2024
bf2132b
Apply suggestions from code review
oggy-dfin Feb 2, 2024
d8a82df
Clarify rejection time some more
oggy-dfin Feb 2, 2024
e7b29a8
Clarify the reject code.
oggy-dfin Feb 2, 2024
395ac97
Fix the description of call_with_best_effort_response
oggy-dfin Feb 2, 2024
046eda8
Try and improve the wording some more
oggy-dfin Feb 5, 2024
ae2a7d9
More word shuffling
oggy-dfin Feb 9, 2024
ed422ff
A first attempt at an abstract spec of deadlines
oggy-dfin Feb 9, 2024
26f0947
A few fixes for deadlines
oggy-dfin Feb 12, 2024
d424166
Remove the TODO/DONE markers for deadlines
oggy-dfin Feb 12, 2024
7531920
Retain the message ordering in the spontaneous reject transition
oggy-dfin Feb 12, 2024
4ac97b8
Also allow dropping expired requests
oggy-dfin Feb 12, 2024
feed505
Move the deadline field to FromCanister origin
oggy-dfin Mar 14, 2024
dde3ccb
Bind the reject_msg var in the text of the message expiry transition
oggy-dfin Mar 14, 2024
5ce4c7b
Accept best-effort-response related system API calls in more contexts
oggy-dfin Mar 14, 2024
d9d6a46
Describe effects of new System API calls more formally
oggy-dfin Mar 14, 2024
b83f5b3
A couple of small fixes
oggy-dfin Mar 14, 2024
03c75c0
Fix call context timeout condition
oggy-dfin Mar 26, 2024
6b895de
Update spec/index.md
oggy-dfin Mar 26, 2024
00c5d64
Update spec/index.md
oggy-dfin Mar 26, 2024
cf649cd
Update spec/index.md
oggy-dfin Mar 26, 2024
7340b95
Update spec/index.md
oggy-dfin Mar 26, 2024
4068ae3
Apply suggestions from code review
oggy-dfin Apr 2, 2024
fb131ac
Keep the timestamp when expiring messages
oggy-dfin Apr 2, 2024
dad83e4
Rename reject_msg to <implementation-specific>
oggy-dfin Apr 2, 2024
e6c21ae
Weaken the invariants for expired origins
oggy-dfin Apr 3, 2024
702281e
Remove spontaneous request rejection from the PR
oggy-dfin Apr 3, 2024
48e5876
Update spec/index.md
oggy-dfin Oct 3, 2024
53f30df
Update spec/index.md
oggy-dfin Oct 3, 2024
68fe8ef
Fix the update_methods et al and add a note on deadlines in queries
oggy-dfin Oct 7, 2024
cea138b
Update spec/index.md
oggy-dfin Oct 8, 2024
1a61d1d
Fix the types
oggy-dfin Oct 8, 2024
4005aa8
Merge branch 'master' into response_timeouts
oggy-dfin Oct 8, 2024
155 changes: 144 additions & 11 deletions spec/index.md
@@ -127,6 +127,10 @@ This specification may refer to certain constants and limits without specifying

Maximum wall clock time spent on evaluation of a query call.

- `MAX_CALL_TIMEOUT`

The maximum timeout (in seconds) for an inter-canister call.

### Principals {#principal}

Principals are generic identifiers for canisters, users and possibly other concepts in the future. As far as most uses of the IC are concerned they are *opaque* binary blobs with a length between 0 and 29 bytes, and there is intentionally no mechanism to tell canister ids and user ids apart.
@@ -1008,6 +1012,8 @@ Rejection codes are member of the following enumeration:

- `CANISTER_ERROR` (5): Canister error (e.g., trap, no response)

- `SYS_UNKNOWN` (6): Response unknown; system stopped waiting for it (e.g., timed out, or system under high load).

The symbolic names of this enumeration are used throughout this specification, but on all interfaces (HTTPS API, System API), they are represented as positive numbers as given in the list above.

The error message is guaranteed to be a string, i.e. not arbitrary binary data.
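The numbering above can be mirrored in a small enumeration. The sketch below is illustrative only: codes 1 through 4 come from the part of the enumeration that this hunk does not show, and the helper name `is_response_unknown` is hypothetical, not part of any System API.

```python
from enum import IntEnum


class RejectCode(IntEnum):
    """Reject codes, numbered as in the enumeration of the specification."""
    SYS_FATAL = 1
    SYS_TRANSIENT = 2
    DESTINATION_INVALID = 3
    CANISTER_REJECT = 4
    CANISTER_ERROR = 5
    SYS_UNKNOWN = 6  # added by this change: the system stopped waiting


def is_response_unknown(code: int) -> bool:
    """Hypothetical helper: True if the system gave up waiting for the
    response, so the caller cannot tell whether the call was processed."""
    return RejectCode(code) is RejectCode.SYS_UNKNOWN
```

A caller would typically branch on this check before deciding whether a retry or an out-of-band lookup is needed.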
@@ -1284,6 +1290,7 @@ The following sections describe various System API functions, also referred to a
ic0.msg_reject_code : () -> i32; // Ry Rt CRy CRt
ic0.msg_reject_msg_size : () -> i32; // Rt CRt
ic0.msg_reject_msg_copy : (dst : i32, offset : i32, size : i32) -> (); // Rt CRt
ic0.msg_deadline : () -> i64; // U Q CQ Ry Rt CRy CRt

ic0.msg_reply_data_append : (src : i32, size : i32) -> (); // U Q CQ Ry Rt CRy CRt
ic0.msg_reply : () -> (); // U Q CQ Ry Rt CRy CRt
@@ -1322,6 +1329,7 @@ The following sections describe various System API functions, also referred to a
) -> ();
ic0.call_on_cleanup : (fun : i32, env : i32) -> (); // U CQ Ry Rt CRy CRt T
ic0.call_data_append : (src : i32, size : i32) -> (); // U CQ Ry Rt CRy CRt T
ic0.call_with_best_effort_response : (timeout_seconds : i32) -> (); // U CQ Ry Rt CRy CRt T
ic0.call_cycles_add : (amount : i64) -> (); // U Ry Rt T
ic0.call_cycles_add128 : (amount_high : i64, amount_low: i64) -> (); // U Ry Rt T
ic0.call_perform : () -> ( err_code : i32 ); // U CQ Ry Rt CRy CRt T
@@ -1424,6 +1432,12 @@ The canister can access an argument. For `canister_init`, `canister_post_upgrade

The reject message. Traps if there is no reject message (i.e. if `reject_code` is `0`).

- `ic0.msg_deadline : () -> i64`

The deadline, in nanoseconds since 1970-01-01, after which the caller might stop waiting for a response.

For calls with best-effort responses, the deadline is computed from the time the call was made and the `timeout_seconds` parameter provided by the caller. For all other calls, a deadline of 0 is returned.
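As a rough illustration of this arithmetic, the sketch below derives a deadline in nanoseconds from a call time and a timeout in seconds, and uses 0 for calls without a timeout. The function names `compute_deadline` and `remaining_ns` are assumptions made for this example; the callee-side helper only shows how the value returned by `ic0.msg_deadline` might be interpreted.

```python
from typing import Optional

NANOS_PER_SECOND = 10**9
NO_DEADLINE = 0  # value reported for calls without a best-effort timeout


def compute_deadline(call_time_ns: int, timeout_seconds: Optional[int]) -> int:
    """Deadline in nanoseconds since 1970-01-01, or 0 if no timeout was set."""
    if timeout_seconds is None:
        return NO_DEADLINE
    return call_time_ns + timeout_seconds * NANOS_PER_SECOND


def remaining_ns(deadline_ns: int, now_ns: int) -> Optional[int]:
    """Time left before the caller might stop waiting; None if no deadline."""
    if deadline_ns == NO_DEADLINE:
        return None
    return max(deadline_ns - now_ns, 0)
```

A callee could use the remaining time to skip work whose result the caller is unlikely to wait for.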

### Responding {#responding}

Eventually, the canister will want to respond to the original call, either by replying (indicating success) or rejecting (signalling an error):
@@ -1550,6 +1564,16 @@ If this traps (e.g. runs out of cycles), the state changes from the `cleanup` fu

There must be at most one call to `ic0.call_on_cleanup` between `ic0.call_new` and `ic0.call_perform`.

- `ic0.call_with_best_effort_response : (timeout_seconds : i32) -> ()`

Relaxes the response delivery guarantee to be best effort, asking the system to respond at the latest after `timeout_seconds` have elapsed. Best effort means the system may also respond with a `SYS_UNKNOWN` reject code, signifying that the call **may or may not** have been processed by the callee. In that case, even if the callee later produces a response, it will not be delivered to the caller.

Any value for `timeout_seconds` is permitted, but is silently bounded from above by the `MAX_CALL_TIMEOUT` system constant; i.e., larger timeouts are treated as equivalent to `MAX_CALL_TIMEOUT` and do not cause an error. The implementation may add a specific [error code](#error-codes) to a reject message to indicate the cause, in particular whether the timeout expired. Note that the reject callback may be executed (possibly significantly) later than the specified time (e.g., if the caller is under high load), or before timeout expiration (e.g., if the system is under load).

A caller that receives a `SYS_UNKNOWN` code, yet needs to learn the call outcome, must find an out-of-band way of doing so. For example, if the callee provides idempotent function calls, the caller can simply retry the call. Sample causes of `SYS_UNKNOWN` include the call not being delivered in time, call processing not completing in time, reply delivery taking too long, and the system shedding load.

This method can be called only in between `ic0.call_new` and `ic0.call_perform`, and at most once at that. Otherwise, it traps. A different timeout can be specified for each call.
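The rules in this description (the timeout is capped at `MAX_CALL_TIMEOUT`, the method is valid only between `ic0.call_new` and `ic0.call_perform`, and at most once per call) can be sketched as a small state machine. This is a toy model, not the System API: the class `CallBuilder`, the exception `Trap`, and the value 300 for `MAX_CALL_TIMEOUT` are all assumptions, since the specification leaves the constant unspecified.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CALL_TIMEOUT = 300  # hypothetical value; the spec does not fix this constant


class Trap(Exception):
    """Stands in for a canister trap in this model."""


@dataclass
class PendingCall:
    timeout_seconds: Optional[int] = None  # None models NoTimeout


class CallBuilder:
    """Toy model of the call_new .. call_perform window."""

    def __init__(self) -> None:
        self.pending_call: Optional[PendingCall] = None

    def call_new(self) -> None:
        self.pending_call = PendingCall()

    def call_with_best_effort_response(self, timeout_seconds: int) -> None:
        # Traps outside the window, or if a timeout was already set.
        if self.pending_call is None:
            raise Trap("no pending call")
        if self.pending_call.timeout_seconds is not None:
            raise Trap("timeout already set for this call")
        # Larger values are silently bounded from above, not rejected.
        self.pending_call.timeout_seconds = min(timeout_seconds, MAX_CALL_TIMEOUT)

    def call_perform(self) -> PendingCall:
        if self.pending_call is None:
            raise Trap("no pending call")
        call, self.pending_call = self.pending_call, None
        return call
```

Because each `call_new` starts a fresh pending call, a different timeout can be chosen for every call.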

- `ic0.call_data_append : (src : i32, size : i32) -> ()`

Appends the specified bytes to the argument of the call. Initially, the argument is empty. Traps if the total appended data exceeds the [maximum inter-canister call payload](https://internetcomputer.org/docs/current/developer-docs/backend/resource-limits#resource-constraints-and-limits).
@@ -2706,6 +2730,7 @@ The [WebAssembly System API](#system-api) is relatively low-level, and some of i
arg: Blob;
transferred_cycles: Nat;
callback: Callback;
timeout_seconds : NoTimeout | Nat;
}

UpdateFunc = WasmState -> Trap { cycles_used : Nat; } | Return {
@@ -2804,6 +2829,7 @@ To ensure that only one response is generated, and also to detect when no respon
| FromCanister {
calling_context : CallId;
callback: Callback;
deadline : NoDeadline | Timestamp | Expired
}
| FromSystemTask
CallCtxt = {
@@ -3608,6 +3634,10 @@ then
origin = FromCanister {
call_context = M.call_context;
callback = call.callback;
deadline = if call.timeout_seconds ≠ NoTimeout
then S.time[M.receiver] + call.timeout_seconds * 10^9
else NoDeadline

};
caller = M.receiver;
callee = call.callee;
@@ -3618,7 +3648,7 @@ then
}
| call ∈ res.new_calls ] ·
[ ResponseMessage {
- origin = S.call_contexts[M.call_context].origin
+ origin = S.call_contexts[M.call_context].origin;
response = res.response;
refunded_cycles = Available - res.cycles_accepted;
}
@@ -3702,6 +3732,73 @@ The functions `query_as_update` and `system_task_as_update` turns a query functi

Note that by construction, a query function will either trap or return with a response; it will never send calls, and it will never change the state of the canister.

#### Spontaneous request rejection {#request-rejection}

The system can reject a request at any point in time, e.g., because it is overloaded.

Condition:
```html
S.messages = Older_messages · CallMessage CM · Younger_messages
(CM.queue = Unordered) or (∀ msg ∈ Older_messages. msg.queue ≠ CM.queue)
reject_code ∈ { SYS_FATAL, SYS_TRANSIENT, DESTINATION_INVALID }
```

State after (given some `reject_msg`):
```html
S.messages =
Older_messages ·
ResponseMessage {
origin = CM.origin;
response = Reject (reject_code, reject_msg);
refunded_cycles = CM.transferred_cycles;
} ·
Younger_messages
```

#### Call expiry {#call-expiry}

These transitions expire calls with best-effort responses. Either transition can be taken before the specified call deadline (e.g., due to high system load), so the conditions deliberately ignore the caller's time. We define two variants of the transition: one that expires messages, and one that expires calls that are in progress (i.e., have open downstream call contexts).

The first transition defines the expiry of messages, where `reject_msg` is some textual message describing the rejection reason.

Condition

```html
S.messages = Older_messages · M · Younger_messages
M = CallMessage _ ∨ M = ResponseMessage _
M.origin = FromCanister O
O.deadline ∉ { NoDeadline, Expired }
```

State after
```html
S.messages = Older_messages · (M with origin = FromCanister O with deadline = Expired) · Younger_messages ·
ResponseMessage {
origin = FromCanister O with deadline = NoDeadline;
response = Reject (SYS_UNKNOWN, reject_msg);
refunded_cycles = 0;
}
```
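The message-expiry transition above can be read as a pure function on the message list. The following is a minimal sketch under simplifying assumptions: the `Origin` and `Message` classes are invented for this example, deadlines are modelled as either an integer timestamp or one of two marker strings, and only the fields used by the transition are represented.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple, Union

SYS_UNKNOWN = 6
NO_DEADLINE = "NoDeadline"
EXPIRED = "Expired"


@dataclass(frozen=True)
class Origin:
    calling_context: int
    deadline: Union[int, str]  # timestamp in ns, NO_DEADLINE, or EXPIRED


@dataclass(frozen=True)
class Message:
    kind: str  # "call" or "response"
    origin: Origin
    reject: Optional[Tuple[int, str]] = None
    refunded_cycles: int = 0


def expire_message(messages: List[Message], index: int, reject_msg: str) -> List[Message]:
    """Apply the message-expiry transition to messages[index]."""
    m = messages[index]
    # Condition: the deadline is neither NoDeadline nor Expired.
    if m.origin.deadline in (NO_DEADLINE, EXPIRED):
        raise ValueError("transition not enabled for this message")
    # The original message stays in place, marked as expired.
    marked = replace(m, origin=replace(m.origin, deadline=EXPIRED))
    # A SYS_UNKNOWN reject with no refund is appended for the caller.
    reject = Message(
        kind="response",
        origin=replace(m.origin, deadline=NO_DEADLINE),
        reject=(SYS_UNKNOWN, reject_msg),
        refunded_cycles=0,
    )
    return messages[:index] + [marked] + messages[index + 1:] + [reject]
```

Marking the original as `Expired` rather than removing it keeps the message ordering intact, while the synthesized reject carries `NoDeadline` so that it cannot itself expire.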

The next transition defines the expiry of calls that are being processed by the callee.

Condition
```html
ctxt_id ∈ S.call_contexts
S.call_contexts[ctxt_id].origin = FromCanister O
S.call_contexts[ctxt_id].needs_to_respond = true
O.deadline ∉ { NoDeadline, Expired }
```

State after

```html
S.call_contexts[ctxt_id].origin = FromCanister O with deadline = Expired
S.messages = S.messages · ResponseMessage {
origin = FromCanister O with deadline = NoDeadline;
response = Reject (SYS_UNKNOWN, reject_msg);
refunded_cycles = 0;
}
```
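The call-context variant follows the same pattern, except that the expiry mark lands on the call context instead of a queued message. The sketch below is a simplified model with invented types (`Origin`, `CallContext`, `Response`); it only illustrates the condition and the resulting state.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple, Union

SYS_UNKNOWN = 6
NO_DEADLINE = "NoDeadline"
EXPIRED = "Expired"


@dataclass(frozen=True)
class Origin:
    calling_context: int
    deadline: Union[int, str]  # timestamp in ns, NO_DEADLINE, or EXPIRED


@dataclass(frozen=True)
class CallContext:
    origin: Origin
    needs_to_respond: bool


@dataclass(frozen=True)
class Response:
    origin: Origin
    reject_code: int
    reject_msg: str
    refunded_cycles: int


def expire_call_context(
    call_contexts: Dict[int, CallContext],
    messages: List[Response],
    ctxt_id: int,
    reject_msg: str,
) -> Tuple[Dict[int, CallContext], List[Response]]:
    """Apply the in-progress call expiry transition to one call context."""
    ctxt = call_contexts[ctxt_id]
    enabled = ctxt.needs_to_respond and ctxt.origin.deadline not in (NO_DEADLINE, EXPIRED)
    if not enabled:
        raise ValueError("transition not enabled for this call context")
    new_contexts = dict(call_contexts)
    new_contexts[ctxt_id] = replace(ctxt, origin=replace(ctxt.origin, deadline=EXPIRED))
    reject = Response(
        origin=replace(ctxt.origin, deadline=NO_DEADLINE),
        reject_code=SYS_UNKNOWN,
        reject_msg=reject_msg,
        refunded_cycles=0,
    )
    return new_contexts, messages + [reject]
```

The callee keeps running after this transition; any response it eventually produces carries an `Expired` origin and is not delivered to the caller.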

#### Call context starvation {#rule-starvation}

If the call context needs to respond (in particular, if the call context is not for a system task) and there is no call, downstream call context, or response that references a call context, then a reject is synthesized. The error message below is *not* indicative. In particular, if the IC has an idea about *why* this starved, it can put that in there (e.g. the initial message handler trapped with an out-of-memory access).
@@ -3711,10 +3808,10 @@ Conditions
```html

S.call_contexts[Ctxt_id].needs_to_respond = true
- ∀ CallMessage {origin = FromCanister O, …} ∈ S.messages. O.calling_context ≠ Ctxt_id
- ∀ ResponseMessage {origin = FromCanister O, …} ∈ S.messages. O.calling_context ≠ Ctxt_id
- ∀ (_ ↦ {needs_to_respond = true, origin = FromCanister O, …}) ∈ S.call_contexts: O.calling_context ≠ Ctxt_id
- ∀ (_ ↦ Stopping Origins) ∈ S.canister_status: ∀(FromCanister O, _) ∈ Origins. O.calling_context ≠ Ctxt_id
+ ∀ CallMessage {origin = FromCanister O, …} ∈ S.messages. O.calling_context ≠ Ctxt_id ∨ O.deadline = Expired
+ ∀ ResponseMessage {origin = FromCanister O, …} ∈ S.messages. O.calling_context ≠ Ctxt_id ∨ O.deadline = Expired
+ ∀ (_ ↦ {needs_to_respond = true, origin = FromCanister O, …}) ∈ S.call_contexts: O.calling_context ≠ Ctxt_id ∨ O.deadline = Expired
+ ∀ (_ ↦ Stopping Origins) ∈ S.canister_status: ∀(FromCanister O, _) ∈ Origins. O.calling_context ≠ Ctxt_id ∨ O.deadline = Expired

```
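The weakened conditions say that an origin whose deadline is `Expired` no longer counts as a reference to the call context. A minimal sketch of that predicate, assuming an invented `Origin` type and a flat collection of all origins found in messages, call contexts that need to respond, and stopping canisters:

```python
from dataclasses import dataclass
from typing import Iterable, Union

NO_DEADLINE = "NoDeadline"
EXPIRED = "Expired"


@dataclass(frozen=True)
class Origin:
    calling_context: int
    deadline: Union[int, str]  # timestamp in ns, NO_DEADLINE, or EXPIRED


def still_referenced(ctxt_id: int, origins: Iterable[Origin]) -> bool:
    """True if some non-expired origin still points at the call context."""
    return any(
        o.calling_context == ctxt_id and o.deadline != EXPIRED
        for o in origins
    )
```

In this model the starvation rule is enabled for a context that needs to respond exactly when `still_referenced` returns `False`.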

@@ -3730,7 +3827,7 @@ S with
ResponseMessage {
origin = S.call_contexts[Ctxt_id].origin;
response = Reject (CANISTER_ERROR, <implementation-specific>);
- refunded_cycles = S.call_contexts[Ctxt_id].available_cycles
+ refunded_cycles = S.call_contexts[Ctxt_id].available_cycles;
}

```
@@ -3744,10 +3841,10 @@ Conditions
```html

S.call_contexts[Ctxt_id].needs_to_respond = false
- ∀ CallMessage {origin = FromCanister O, …} ∈ S.messages. O.calling_context ≠ Ctxt_id
- ∀ ResponseMessage {origin = FromCanister O, …} ∈ S.messages. O.calling_context ≠ Ctxt_id
- ∀ (_ ↦ {needs_to_respond = true, origin = FromCanister O, …}) ∈ S.call_contexts: O.calling_context ≠ Ctxt_id
- ∀ (_ ↦ Stopping Origins) ∈ S.canister_status: ∀(FromCanister O, _) ∈ Origins. O.calling_context ≠ Ctxt_id
+ ∀ CallMessage {origin = FromCanister O, …} ∈ S.messages. O.calling_context ≠ Ctxt_id ∨ O.deadline = Expired
+ ∀ ResponseMessage {origin = FromCanister O, …} ∈ S.messages. O.calling_context ≠ Ctxt_id ∨ O.deadline = Expired
+ ∀ (_ ↦ {needs_to_respond = true, origin = FromCanister O, …}) ∈ S.call_contexts: O.calling_context ≠ Ctxt_id ∨ O.deadline = Expired
+ ∀ (_ ↦ Stopping Origins) ∈ S.canister_status: ∀(FromCanister O, _) ∈ Origins. O.calling_context ≠ Ctxt_id ∨ O.deadline = Expired

```

@@ -5064,9 +5161,11 @@ S.messages = Older_messages · ResponseMessage RM · Younger_messages
RM.origin = FromCanister {
call_context = Ctxt_id
callback = Callback
deadline = D
}
not S.call_contexts[Ctxt_id].deleted
S.call_contexts[Ctxt_id].canister ∈ dom(S.balances)
D ≠ Expired

```

@@ -5089,7 +5188,7 @@ S with

```

- If the responded call context does not exist anymore, because the canister has been uninstalled since, the refunded cycles are still added to the canister balance, but no function invocation is enqueued:
+ If the responded call context does not exist anymore, because the canister has been uninstalled since, the refunded cycles are still added to the canister balance, but no function invocation is enqueued.

Conditions

@@ -5099,9 +5198,11 @@ S.messages = Older_messages · ResponseMessage RM · Younger_messages
RM.origin = FromCanister {
call_context = Ctxt_id
callback = Callback
deadline = D
}
S.call_contexts[Ctxt_id].deleted
S.call_contexts[Ctxt_id].canister ∈ dom(S.balances)
D ≠ Expired

```

@@ -5116,6 +5217,22 @@ S with

```

#### Dropping expired messages {#message-timeout}

Condition:
```html
S.messages = Older_messages · M · Younger_messages
M = ResponseMessage _ ∨ M = CallMessage _
M.origin = FromCanister O
O.deadline = Expired
```

State after

```html
S.messages = Older_messages · Younger_messages
```
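In a list-based model, this transition amounts to removing one expired message while leaving the order of the others unchanged. A brief sketch, with `Origin` and `Message` as invented stand-ins for the spec's records:

```python
from dataclasses import dataclass
from typing import List, Union

NO_DEADLINE = "NoDeadline"
EXPIRED = "Expired"


@dataclass(frozen=True)
class Origin:
    calling_context: int
    deadline: Union[int, str]  # timestamp in ns, NO_DEADLINE, or EXPIRED


@dataclass(frozen=True)
class Message:
    kind: str  # "call" or "response"
    origin: Origin


def drop_expired_message(messages: List[Message], index: int) -> List[Message]:
    """Remove messages[index] if, and only if, its origin is expired."""
    if messages[index].origin.deadline != EXPIRED:
        raise ValueError("transition not enabled: message is not expired")
    return messages[:index] + messages[index + 1:]
```

Since the caller has already been sent a `SYS_UNKNOWN` reject when the origin was marked `Expired`, dropping the message cannot leave the caller without a response.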

#### Respond to user request

When an ingress method call has been responded to, we can record the response in the list of queries.
@@ -5679,6 +5796,7 @@ We can model the execution of WebAssembly functions as stateful functions that h
sysenv : Env;
cycles_refunded : Nat;
method_name : NoText | Text;
deadline : NoDeadline | Timestamp;
}
ExecutionState = {
wasm_state : WasmState;
@@ -6154,6 +6272,12 @@ The pseudo-code below does *not* explicitly enforce the restrictions of which im
if es.context ∉ {Rt, CRt} then Trap {cycles_used = es.cycles_used;}
copy_to_canister<es>(dst, offset, size, es.params.reject_msg)

ic0.msg_deadline<es>() : i64 =
if es.context ∉ {U, Q, CQ, Ry, Rt, CRy, CRt} then Trap {cycles_used = es.cycles_used;}
if es.params.deadline = Timestamp t
then return t
else return 0

ic0.msg_reply_data_append<es>(src : i32, size : i32) =
if es.context ∉ {U, Q, CQ, Ry, Rt, CRy, CRt} then Trap {cycles_used = es.cycles_used;}
if es.response ≠ NoResponse then Trap {cycles_used = es.cycles_used;}
@@ -6306,8 +6430,17 @@ The pseudo-code below does *not* explicitly enforce the restrictions of which im
on_reject = Closure { fun = reject_fun; env = reject_env }
on_cleanup = NoClosure
};
timeout_seconds = NoTimeout
}

ic0.call_with_best_effort_response<es>(timeout_seconds : i32) =
if
es.context ∉ {U, CQ, Ry, Rt, CRy, CRt, T}
or es.pending_call = NoPendingCall
or es.pending_call.timeout_seconds ≠ NoTimeout
then Trap {cycles_used = es.cycles_used;}
es.pending_call.timeout_seconds := min(timeout_seconds, MAX_CALL_TIMEOUT)

ic0.call_on_cleanup<es> (fun : i32, env : i32) =
if es.context ∉ {U, CQ, Ry, Rt, CRy, CRt, T} then Trap {cycles_used = es.cycles_used;}
if fun > |es.wasm_state.store.table| then Trap {cycles_used = es.cycles_used;}