-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
automations
crate and various chores
#1696
Conversation
Which disablses ANSI color logging output.
These are frequently so large, they overflow our current 1MB ops log line limit.
Deeply nested schemas are not useful and, in practice, are always recursive data-structures. Fixes #1667
Minor adjustments to account for drift between migrations and the production DB.
Introduce a new crate `automations` which offers a programming model for distributed execution of arbitrary, stateful tasks modeled as coroutines. A supporting migration introduces table `internal.tasks` which tracks the states of tasks, including task inner states and pending recieved messages. Implement a Fibonacci executor which is a load test and test-bed for automations behaviors, with various tune-able parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment about the transaction in persist_outcome
got me thinking. Supposing we don't use a transaction for sending messages or spawning tasks, would there still be any advantage to doing so by returning the task/message as part of the PollOutcome
? We could instead pass in a helper object that tasks could call methods on to spawn children or send messages. I don't want to get too far ahead of myself, though, so hoping we can first talk though error handling
create role marketplace_integration; | ||
create role gatsby_reader; | ||
create role github_action_connector_refresh; | ||
create role wgd_automation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we need wgd_automation
for something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, it's granted some usages in the fetched / compacted schema, so it needs to exist.
|
||
async fn persist_outcome( | ||
outcome: PollOutcome<BoxedRaw>, | ||
txn: &mut sqlx::PgConnection, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started out thinking: txn
is a bit misleading. db
maybe better?
But then I got to wondering whether we should be using a transaction here. I'm trying to think through whether there's any scenarios that we wouldn't recover gracefully from, if we were to fail midway through this function. It seems like we might create a child task and then fail before updating the task state. So seems like we'd have at-least-once semantics on spawning child tasks, and also on sending messages to other tasks. That seems fine (I think?) for sending messages, but I wonder whether we'd want a stronger guarantee for creating child tasks.
It seems like maybe it could be good to just use a transaction here, or else try to cram all the updates into one giant query with CTEs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a transaction -- it's just been downcast to the PgConnection. I did that because in newer versions of sqlx, transaction is not an executor and can't be used with sqlx::query -- though admittedly that downcast coulda been done just as easily in this routine 🤷. I was also setting up for a test case I didn't write, where persist_outcome could be driven, inspected, and rolled back.
We definitely want full transaction semantics around task management!
(ready.task.id, ready.task.type_, ready.task.parent_id); | ||
|
||
if let Err(err) = executors::poll_task(ready, heartbeat_timeout).await { | ||
tracing::warn!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we might want to make these errors more visible, perhaps by persisting them on the tasks table? I'm supposing that the idea is we should do error handling within the poll
function, and this is really only for errors that we didn't otherwise catch or anticipate, which seems reasonable. Just want to check whether we're on the same page there, that the state
ought to expose error message if the task is doing its own error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly -- the place I got to was tasks are free to handled expected errors in any way they like, including sending them as messages or modeling them in task state, but they're not an "error" from this crate's perspective.
As discussed over VC, persisting outcomes is transactional. Anything else to cover before ✔️ ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I tried playing around with how controllers might be implemented as an Executor
, and immediately ran into the issue of needing some way to associate a particular catalog_name
with a tasks
row. I'm supposing that we'll need to add a column to live_specs
to relate each row to a tasks
row that corresponds to the controller. We'll need that in order to trigger controller runs after specs are published, or to trigger tasks in response to inferred schema updates, shard failures, etc.
The migration from controller_jobs
to tasks
is going to be a bit tricky also. I'm currently thinking that the first order of business for each of the new controller tasks
could be to migrate its state from controller_jobs
into the tasks
table.
I don't see any blockers to either of those, so I think we're good to merge here.
Yea, i think that's the right pattern. The entity table for a catalog name contains related task IDs.
Agreed, I think that's the right pattern. But it also raises a gap with this implementation which I think needs to be addressed: an Executor must be able to return "side-effects" which commit transactionally with a PollOutcome, such as cleaning up a I'm gonna merge but think this needs to be a next step. |
Description:
Chores:
Then, add an
automations
crate for durable, distributed execution of tasks modeled as coroutines.#1667
#1697
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
(anything that might help someone review this PR)
This change is