This repository has been archived by the owner on Dec 13, 2023. It is now read-only.
Providing a schema language for workflows and tasks #2500
NickTomlin
started this conversation in
Ideas
Replies: 1 comment 2 replies
Hey Nick, this is super helpful. We would love to discuss more - if you are in the Bay Area - do you want to meet up for a coffee? cc: @v1r3n
I'd love to revisit the topic of providing types for Conductor tasks and workflows, which was previously explored in #1164 and attempted via JSON schema in #1959.
I wanted to start by talking at a higher level about the general use cases, drawbacks, and schema languages before diving into an implementation. If there's dissent, I'd love to hear it!
Opportunities
The dynamic nature of task and workflow input and output makes it easy to accidentally pass the wrong key/type to a workflow or task.
Having a way to define a schema for inputs and outputs could help in a few areas:
Fewer Runtime errors
Because users are free to pass any type of input to a workflow or task, it's easy to run into issues where a workflow has passed the wrong `inputParameters` to a task. This results in runtime failures that can be tricky to debug and potentially time-consuming to resolve.
One interesting thing to explore here may be a mix of "compile time" (or update time) checks with an additional runtime type check to ensure valid inputs are actually being passed. This would ensure errors are caught at a top-level boundary instead of triggering an exception inside a task.
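To make the boundary-check idea concrete, here is a minimal sketch. The schema format (a dict of field name to Python type), the `check_input` helper, and the `run_task` function are all hypothetical and are not part of Conductor. The same `check_input` call could run once when a definition is registered or updated, and again at runtime before the task body executes.

```python
from typing import Any

# Hypothetical schema format: field name -> expected Python type.
# This is an illustration only, not a Conductor feature.
HTTP_TASK_INPUT_SCHEMA = {"url": str, "retries": int}


def check_input(schema: dict[str, type], data: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, expected in schema.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(data[key]).__name__}"
            )
    # Keys the schema does not know about are usually typos.
    for key in sorted(data.keys() - schema.keys()):
        errors.append(f"unexpected key: {key}")
    return errors


def run_task(data: dict[str, Any]) -> str:
    # Runtime check at the task boundary, before any task logic runs.
    errors = check_input(HTTP_TASK_INPUT_SCHEMA, data)
    if errors:
        raise ValueError("invalid task input: " + "; ".join(errors))
    return f"GET {data['url']} (retries={data['retries']})"
```

With this shape, a misspelled key such as `uri` surfaces as a single readable error at the boundary rather than as a `KeyError` somewhere inside the task.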
Easier testing
Because there's no schema to enable auto-complete or validation during development, the only real way to test whether your workflow or task is correct is to run it.
This is a slow, error-prone feedback loop. Having a schema to validate workflow and task definitions would make this a much faster process.
Integration testing is still a great idea, but could be reserved for slower, higher-level tests where it is impactful.
Instant feedback with static analysis
Having a schema for workflows and tasks could enable features like auto-complete or static linting when developing workflows. This would be helpful for both system- and user-defined tasks.
Auto-generated language bindings
Right now the process of translating `inputData` to a language type is manual. For Java I currently maintain a separate POJO that I convert to the correct type with Jackson. This works, but it would be more convenient to define the schema once; I could see generating the POJO from the schema, or potentially generating the schema from the POJO.
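As a rough illustration of the "define it once" idea, the sketch below uses a Python dataclass in place of the Java POJO and derives both a simple schema and a typed binding from it. The `SendEmailInput` class, its fields, and the `schema_from_dataclass` and `bind` helpers are made up for this example; a real implementation would target whichever schema language is chosen.

```python
from dataclasses import dataclass, fields
from typing import Any, get_type_hints


@dataclass
class SendEmailInput:
    # Hypothetical task input; field names are invented for illustration.
    recipient: str
    attempts: int


def schema_from_dataclass(cls: type) -> dict[str, str]:
    """Derive a simple name -> type-name schema from the dataclass fields."""
    hints = get_type_hints(cls)
    return {f.name: hints[f.name].__name__ for f in fields(cls)}


def bind(cls: type, input_data: dict[str, Any]):
    """Convert untyped inputData into a typed object, failing on mismatches."""
    hints = get_type_hints(cls)
    for name, expected in hints.items():
        if not isinstance(input_data.get(name), expected):
            raise TypeError(f"{name}: expected {expected.__name__}")
    # Unknown keys are rejected by the dataclass constructor itself.
    return cls(**input_data)
```

Here the class is the single source of truth: the schema is generated from it, and the conversion step that is currently hand-written becomes one generic function.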
Challenges
Backwards compatibility
This change should not break any existing workflow or task definitions. Ideally, we provide:
`warn` first on-ramp to types.
Ergonomics
Ideally whatever schema format we settle on can strike a balance between being easy for engineers to use and useful for machines.
JSON schema is a very powerful format, but it can be extremely verbose.
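For a feel of the verbosity trade-off, here is the same small shape written in JSON Schema and in JSON Type Definition (RFC 8927), expressed as Python dicts so the sizes can be compared. The task input itself (a required string `url` and an optional integer `retries`) is a made-up example, and a shape this small understates the gap seen in larger definitions.

```python
import json

# The same hypothetical task input described in two schema languages.
json_schema = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "retries": {"type": "integer"},
    },
    "required": ["url"],
    "additionalProperties": False,
}

# JSON Type Definition form: required fields go under "properties",
# optional ones under "optionalProperties"; extra keys are rejected by default.
jtd_schema = {
    "properties": {"url": {"type": "string"}},
    "optionalProperties": {"retries": {"type": "int32"}},
}

# Print the serialized size of each schema in characters.
print(len(json.dumps(json_schema)), len(json.dumps(jtd_schema)))
```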
Preserving flexibility
Given that part of Conductor's job is being a coordinator, it's still important to preserve flexibility and allow for `Object`/`any` style types (e.g. an http `body`), but ideally this is used sparingly to indicate truly dynamic fields.
Additionally, some users may not want to have typing at all. We probably want to leave this as an option rather than enforce it as a default. This is similar to how dynamic languages like Python and Ruby allow optionally augmenting with types that are not enforced at runtime.
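One way to picture opt-in typing with an escape hatch is sketched below. The `mode` values (`off`, `warn`, `enforce`), the `validate` function, and the schema format are assumptions made for this example rather than an existing Conductor setting; `Any` stands in for a truly dynamic field.

```python
import warnings
from typing import Any

# Hypothetical schema: Any marks a truly dynamic field such as an HTTP body.
SCHEMA = {"url": str, "body": Any}


def validate(data: dict, schema: dict, mode: str = "off") -> list[str]:
    """Check data against schema; mode is 'off' (default), 'warn', or 'enforce'."""
    if mode not in ("off", "warn", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "off":
        return []  # typing is opt-in, so the default performs no checks
    problems = [
        f"{key}: expected {expected.__name__}"
        for key, expected in schema.items()
        if expected is not Any and not isinstance(data.get(key), expected)
    ]
    if problems and mode == "enforce":
        raise ValueError("; ".join(problems))
    # In 'warn' mode, report each problem but let execution continue.
    for problem in problems:
        warnings.warn(problem)
    return problems
```

Existing definitions keep working under the default, teams can move to `warn` to see what would break, and only then switch to `enforce`.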
A few potential schema implementations
JSON schema
JSON schema was part of #1959. It's robust, well adopted, and a good fit for schematizing a JSON API.
Pros
Cons
JSON Type Definition
A newcomer, JSON Type Definition is an attempt to make a JSON schema language more user-friendly.
Pros
Cons
Protocol Buffers
Protobufs are commonly used for back-end APIs, but it's entirely possible to use them as a schema language (as Conductor does internally).
Pros
Cons
`sfixed64`) aren't particularly useful in Conductor's JSON-centric world
Your suggestion here
These are just a few that I know of. Please share additional ones and we can add them to the list!
Your thoughts?
This is just a start; I'm interested in what the Conductor team and other users think. Please feel free to suggest any edits to this post to make it more helpful for generating discussion 😄.