`automations` crate and various chores #1696

jgraettinger · 2024-10-12T12:50:49Z

Description:

Chores:

Support NO_COLOR so we can fix Cloud Run agent logs.
Truncate overly long Apply action_descriptions that we currently choke on.
Tighten schema inference to a maximum depth of 12.
Compact all migrations by refreshing from the production DB.

Then, add an automations crate for durable, distributed execution of tasks modeled as coroutines.

#1667
#1697

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)

This change is

Which disablses ANSI color logging output.

These are frequently so large, they overflow our current 1MB ops log line limit.

Deeply nested schemas are not useful and, in practice, are always recursive data-structures. Fixes #1667

Minor adjustments to account for drift between migrations and the production DB.

Introduce a new crate `automations` which offers a programming model for distributed execution of arbitrary, stateful tasks modeled as coroutines. A supporting migration introduces table `internal.tasks` which tracks the states of tasks, including task inner states and pending recieved messages. Implement a Fibonacci executor which is a load test and test-bed for automations behaviors, with various tune-able parameters.

psFried

The comment about the transaction in persist_outcome got me thinking. Supposing we don't use a transaction for sending messages or spawning tasks, would there still be any advantage to doing so by returning the task/message as part of the PollOutcome? We could instead pass in a helper object that tasks could call methods on to spawn children or send messages. I don't want to get too far ahead of myself, though, so hoping we can first talk though error handling

psFried · 2024-10-14T16:39:44Z

supabase/migrations/00_polyfill.sql

+create role marketplace_integration;
+create role gatsby_reader;
+create role github_action_connector_refresh;
+create role wgd_automation;


nit: do we need wgd_automation for something?

yea, it's granted some usages in the fetched / compacted schema, so it needs to exist.

psFried · 2024-10-14T20:30:18Z

crates/automations/src/executors.rs

+
+async fn persist_outcome(
+    outcome: PollOutcome<BoxedRaw>,
+    txn: &mut sqlx::PgConnection,


I started out thinking: txn is a bit misleading. db maybe better?

But then I got to wondering whether we should be using a transaction here. I'm trying to think through whether there's any scenarios that we wouldn't recover gracefully from, if we were to fail midway through this function. It seems like we might create a child task and then fail before updating the task state. So seems like we'd have at-least-once semantics on spawning child tasks, and also on sending messages to other tasks. That seems fine (I think?) for sending messages, but I wonder whether we'd want a stronger guarantee for creating child tasks.

It seems like maybe it could be good to just use a transaction here, or else try to cram all the updates into one giant query with CTEs.

This is a transaction -- it's just been downcast to the PgConnection. I did that because in newer versions of sqlx, transaction is not an executor and can't be used with sqlx::query -- though admittedly that downcast coulda been done just as easily in this routine 🤷. I was also setting up for a test case I didn't write, where persist_outcome could be driven, inspected, and rolled back.

We definitely want full transaction semantics around task management!

psFried · 2024-10-14T21:04:45Z

crates/automations/src/server.rs

+                    (ready.task.id, ready.task.type_, ready.task.parent_id);
+
+                if let Err(err) = executors::poll_task(ready, heartbeat_timeout).await {
+                    tracing::warn!(


I wonder if we might want to make these errors more visible, perhaps by persisting them on the tasks table? I'm supposing that the idea is we should do error handling within the poll function, and this is really only for errors that we didn't otherwise catch or anticipate, which seems reasonable. Just want to check whether we're on the same page there, that the state ought to expose error message if the task is doing its own error handling.

Exactly -- the place I got to was tasks are free to handled expected errors in any way they like, including sending them as messages or modeling them in task state, but they're not an "error" from this crate's perspective.

jgraettinger · 2024-10-14T22:14:09Z

As discussed over VC, persisting outcomes is transactional. Anything else to cover before ✔️ ?

psFried

LGTM
I tried playing around with how controllers might be implemented as an Executor, and immediately ran into the issue of needing some way to associate a particular catalog_name with a tasks row. I'm supposing that we'll need to add a column to live_specs to relate each row to a tasks row that corresponds to the controller. We'll need that in order to trigger controller runs after specs are published, or to trigger tasks in response to inferred schema updates, shard failures, etc.
The migration from controller_jobs to tasks is going to be a bit tricky also. I'm currently thinking that the first order of business for each of the new controller tasks could be to migrate its state from controller_jobs into the tasks table.
I don't see any blockers to either of those, so I think we're good to merge here.

jgraettinger · 2024-10-15T15:53:14Z

I'm supposing that we'll need to add a column to live_specs to relate each row to a tasks row that corresponds to the controller.

Yea, i think that's the right pattern. The entity table for a catalog name contains related task IDs.

I'm currently thinking that the first order of business for each of the new controller tasks could be to migrate its state from controller_jobs into the tasks table.

Agreed, I think that's the right pattern.

But it also raises a gap with this implementation which I think needs to be addressed: an Executor must be able to return "side-effects" which commit transactionally with a PollOutcome, such as cleaning up a controller_jobs row, so that we're guaranteed they commit together.

I'm gonna merge but think this needs to be a next step.

jgraettinger added 5 commits October 12, 2024 07:28

agent: support a NO_COLOR=1 environment variable

89b337e

Which disablses ANSI color logging output.

runtime: truncate Apply action descriptions which are very very long

24622ad

These are frequently so large, they overflow our current 1MB ops log line limit.

doc: tighten schema inference maximum depth to 12

8dd1c93

Deeply nested schemas are not useful and, in practice, are always recursive data-structures. Fixes #1667

supabase: compact all migrations, refreshed from production

0213cd7

Minor adjustments to account for drift between migrations and the production DB.

jgraettinger added chore Internal cleanups, refactoring, or improvements change:planned This is a planned change labels Oct 12, 2024

jgraettinger requested a review from psFried October 12, 2024 12:58

psFried reviewed Oct 14, 2024

View reviewed changes

psFried approved these changes Oct 15, 2024

View reviewed changes

jgraettinger merged commit 0aa1dd6 into master Oct 15, 2024
6 checks passed

jgraettinger deleted the johnny/automations branch October 15, 2024 15:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`automations` crate and various chores #1696

`automations` crate and various chores #1696

jgraettinger commented Oct 12, 2024 •

edited

Loading

psFried left a comment

psFried Oct 14, 2024

jgraettinger Oct 14, 2024

psFried Oct 14, 2024

jgraettinger Oct 14, 2024 •

edited

Loading

psFried Oct 14, 2024

jgraettinger Oct 14, 2024

jgraettinger commented Oct 14, 2024

psFried left a comment

jgraettinger commented Oct 15, 2024

automations crate and various chores #1696

automations crate and various chores #1696

Conversation

jgraettinger commented Oct 12, 2024 • edited Loading

psFried left a comment

Choose a reason for hiding this comment

psFried Oct 14, 2024

Choose a reason for hiding this comment

jgraettinger Oct 14, 2024

Choose a reason for hiding this comment

psFried Oct 14, 2024

Choose a reason for hiding this comment

jgraettinger Oct 14, 2024 • edited Loading

Choose a reason for hiding this comment

psFried Oct 14, 2024

Choose a reason for hiding this comment

jgraettinger Oct 14, 2024

Choose a reason for hiding this comment

jgraettinger commented Oct 14, 2024

psFried left a comment

Choose a reason for hiding this comment

jgraettinger commented Oct 15, 2024

`automations` crate and various chores #1696

`automations` crate and various chores #1696

jgraettinger commented Oct 12, 2024 •

edited

Loading

jgraettinger Oct 14, 2024 •

edited

Loading