Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add versioning-3 support including Deployment API, versioning overrides, and enhanced workflow describe #735

Open
wants to merge 3 commits into
base: versioning-3
Choose a base branch
from

Conversation

antlai-temporal
Copy link
Contributor

@antlai-temporal antlai-temporal commented Jan 6, 2025

What was changed

This PR adds support for the new deployment API, extends workflow describe to
provide versioning information, and adds batch and non-batch operators to modify
workflow execution properties, such as, versioning overrides.

temporal worker-deployment 
    temporal worker-deployment describe (--report-reachability)
    temporal worker-deployment get-current
    temporal worker-deployment set-current
    temporal worker-deployment list
temporal workflow update-options
temporal workflow describe

Why?

Support new Worker Versioning approach based on Deployments.

Checklist

  1. Closes
    [Feature Request] versioning override and move API #720 [Feature Request] DescribeWorkflow should show versioning info #719 [Feature Request] Support the deployment API #718
  2. How was this tested:

System and unit tests added

--deployment-series-name YourDeploymentSeriesName
```
options:
- name: deployment-series-name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: deployment-series-name
- name: series-name

Do you need to prefix many of these options with deployment-?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not, I'm on the fence... It is a bit verbose, but series-name is always part of a deployment, but the previous command scopes for deployment anyway...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with chad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, removing deployment-...

@@ -2498,6 +2653,95 @@ commands:
type: int
description: Maximum number of Workflow Executions to display.

- name: temporal workflow modify-options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: temporal workflow modify-options
- name: temporal workflow update-options

Like #729

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentional. Otherwise it gets listed under update, and it gets very confusing because people will assume it is a global option for updates. Activities don't take updates, so they don't have that issue...

Copy link
Member

@cretz cretz Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Activities may take updates in the future, but I think we should call this "update options" in both the API and the CLI. And we do for the former. We are ok saying "update" and "update options" are different I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also feel that it makes more sense to call this workflow update-options. We've discussed the modify vs update name choice considering the context of workflow update and activity update-options, and I think that update and update-options can be separate things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, they can be separate things, the issue is whether people will get confused with 'options of an update".
I think using UpdateOptions in the API was a mistake, but the train may have already left the station, particularly, if activities in the CLI are already using it... Any thoughts @drewhoskins-temporal ?

Copy link

@drewhoskins-temporal drewhoskins-temporal Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @carlydf's take. This should be consistent with activity update-options as well as the general developer expectation that such methods are called 'update'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, majority rules, changing to update-options...

Comment on lines 15 to 17
Name string `json:"name"`
Type string `json:"type"`
FirstPollerTime time.Time `json:"firstPollerTime"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file go fmt'd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check, it should do it before every save...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -2498,6 +2653,95 @@ commands:
type: int
description: Maximum number of Workflow Executions to display.

- name: temporal workflow modify-options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably, like #729, update-options should be the command and let the args determine which values are set on the UpdateWorkflowExecutionOptions call. That way, later when we have more options, users can set multiple options at once and it makes it more like the API we expose over gRPC/HTTP. Having separate commands to make the same API call in this case I think is confusing and limiting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using update as a verb should be banned in our APIs. I know CRUD, but we are making so much investment on "update" (as a noun or adjective) that it gets really confusing. For example:

temporal workflow
Available commands:
...
 terminate        Forcefully end a Workflow Execution
  trace            Workflow Execution live progress
  update           Updates (Experimental)
  update-options Change Workflow Execution Options

is update-options a global option setting for update?

Copy link
Member

@cretz cretz Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using update as a verb should be banned in our APIs.

But it's not banned, it's in use, including the API call this command is making. I don't think it's as confusing as using different terms for updating options and using different terms in the API vs CLI vs SDK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About whether to split in sub-commands or not. The trade-off is to make it easier to use, with a small number of relevant arguments, vs doing atomic changes to several fields. Not clear to me that you will need to, e.g., change the memo and also the versioning override atomically, they feel really different.
BTW, the PR for update activity options ended up with 11 arguments, 10 of them optional...
I guess we could always add a bulk update later if we need it, but not convinced...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely want to be able to change things like timeouts and priority atomically when we change the pinned version, that's part of why the gRPC API was designed the way that it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's helpful to have the "override" word in the flag name, so people know what they're setting.
Like:

# Pin a workflow to a specific Deployment
temporal workflow update-options --workflow-id MY_WF [--run-id="..."] \
     --versioning-behavior-override=pinned \
     --deployment-series-name-override=MY_SERIES \  # required for pinned override
     --deployment-build-id-override=MY_BLD                 # required for pinned override

or

# Let this workflow be executed by the current Deployment and automatically move to future current Deployments.
temporal workflow update-options --workflow-id MY_WF [--run-id="..."] \
     --versioning-behavior-override=auto_upgrade

or

# Unset any versioning override that may be set, and let this workflow be handled per its base versioning behavior.
temporal workflow update-options --workflow-id MY_WF [--run-id="..."] \
     --versioning-behavior-override=unspecified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of the subcommand "versioning-override" was to simplify the arguments, but if there is a legitimate goal for atomic updates of other properties, we will need to add the prefix to the options, and get rid of the subcommand. @drewhoskins-temporal @ShahabT any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, the other workflow execution options that we have thought about adding to this API are:

  • Task queue
  • Timeouts
    • WF execution timeout
    • WF run timeout
    • WF task timeout
  • Priority / fairness key
  • Workflow memo

It seems to me that if the workflow definition used by the workflow changes, it's very plausible that the priority or timeouts or task queue might need to change also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we change the task queue we probably want to pin a new code atomically. We will still have the issue of tens of optional arguments though. I'm getting rid of the sub command for now, use prefix versioning-override for the options, and we will revisit when the argument list gets unwieldy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will revisit when the argument list gets unwieldy

Thanks! Yes, same for activity update-options.

- deployment describe
- deployment list
- deployment get-current
- deployment update-current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why update-current instead of set-current as in the API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do "set-current" to be consistent with API. "update" or "modify" feels less implementation-oriented (setters/getters), but we already use "get" instead of "read"...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about this, but "set" feels like it describes the goal and impact of this call more directly than "update" which is more vague.
In general it seems less confusing to go with the same names as the gRPC API, and if there's a strong reason to use a different name, then the gRPC/HTTP APIs should probably also be renamed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to set-current

Comment on lines 438 to 439
Describes properties of a Deployment, such as whether it is current,
some custom metadata, or a list of its task queues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we always return all three of these things, the word "or" is misleading.

Suggested change
Describes properties of a Deployment, such as whether it is current,
some custom metadata, or a list of its task queues.
Describes properties of a Deployment, such as whether it is current,
the list of its task queues, custom metadata if present, and reachability
status if requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't print empty things, e.g., an empty list of task queues, but I guess the meaning is implicit. I prefer your wording though...

Comment on lines 469 to 471
Gets the current Deployment for a Deployment Series Name. Whether a
Deployment is current or not can affect which Workers will execute
Tasks of an existing or new Workflow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Gets the current Deployment for a Deployment Series Name. Whether a
Deployment is current or not can affect which Workers will execute
Tasks of an existing or new Workflow.
Gets the current Deployment for a Deployment Series Name. If a
Deployment is current, Workers of that Deployment will receive tasks
from new Pinned and AutoUpgrade Workflows and for existing
AutoUpgrade Workflows that are running on this Deployment Series.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not picky about how this is worded, but I think this is an opportunity to be specific with users about what "current" means for task routing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in two minds about how much context to give in cli docs. Clearly, you cannot expect people to learn Deployments concepts from cli commands, but a good reminder is sometimes useful, when it is not too repetitive...

Copy link
Contributor

@carlydf carlydf Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah. Maybe since Deployment are still in flux, better to be vague here so that it's still accurate if terminology changes? Feel free to ignore my suggestion here and just resolve the comment!

Tasks of an existing or new Workflow.

```
temporal deployment update-current [options]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, set-current seems more straightforward to me and matches the API

@@ -2498,6 +2653,95 @@ commands:
type: int
description: Maximum number of Workflow Executions to display.

- name: temporal workflow modify-options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's helpful to have the "override" word in the flag name, so people know what they're setting.
Like:

# Pin a workflow to a specific Deployment
temporal workflow update-options --workflow-id MY_WF [--run-id="..."] \
     --versioning-behavior-override=pinned \
     --deployment-series-name-override=MY_SERIES \  # required for pinned override
     --deployment-build-id-override=MY_BLD                 # required for pinned override

or

# Let this workflow be executed by the current Deployment and automatically move to future current Deployments.
temporal workflow update-options --workflow-id MY_WF [--run-id="..."] \
     --versioning-behavior-override=auto_upgrade

or

# Unset any versioning override that may be set, and let this workflow be handled per its base versioning behavior.
temporal workflow update-options --workflow-id MY_WF [--run-id="..."] \
     --versioning-behavior-override=unspecified

@@ -394,6 +394,160 @@ commands:
description: Reason for terminating the batch job.
required: true

- name: temporal deployment
Copy link

@drewhoskins-temporal drewhoskins-temporal Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be called worker-deployment in this context where we're not in a worker. (and per feedback from candace) Deployment is too ambiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, renaming...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth a discussion on whether we want this as temporal worker deployment instead of temporal worker-deployment. There is discussion of more server-side worker management features in the future, so I wonder if worker should be a category of commands instead of a prefix.


```
temporal deployment describe \
--deployment-series-name YourDeploymentSeriesName \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we reiterating "deployment" when it's already at the top level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion above

```
temporal deployment describe \
--deployment-series-name YourDeploymentSeriesName \
--deployment-build-id YourDeploymentBuildId \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be for the pre-release but I'd like to sure we're moving on/aligned on the details of https://www.notion.so/temporalio/Safe-Deploys-Deployment-IDs-Proposal-15b8fc567738809386fad0523f775d84 that we want to carry forward.

description: Series Name for the current Deployment.
required: true

- name: temporal deployment list

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you've been aware, but I've been talking to @ShahabT and Candace about this and am pushing against this design. I think users either want a list of series, or a list of deployments within a specific series. An API that intermingles them in a (presumably time-sorted) list doesn't serve a real use case AFAICT and is pretty weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also tempted to make the series-name attribute required, but the discussions on limits on number of Deployments, and lack of scavenger, may make it useful to enumerate them all, if anything just to count them...
Will revisit once we have a list series-name api...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there would be ways to list both series and deployments within a series. But this is what we have at the moment, hence I don't think we should block cli rn.

@antlai-temporal
Copy link
Contributor Author

antlai-temporal commented Jan 7, 2025

Adding most of the suggested changes in the last commit, thanks for the feedback!

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one minor discussion point

@@ -394,6 +394,162 @@ commands:
description: Reason for terminating the batch job.
required: true

- name: temporal worker-deployment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned at #735 (comment), but it may be worth a discussion on whether we want this as temporal worker deployment instead of temporal worker-deployment. There is discussion of more server-side worker management features in the future, so I wonder if worker should be a category of commands instead of a prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewhoskins-temporal you suggested worker-deployment. Are you aware of any other worker functionality coming that needs CLI support?
It may not be obvious to people that deployment is under worker, so I'm hesitant to hide it unless something is coming soon.
In any case worker-deployment commands will be experimental for a while, so we will have time to restructure if needed...

@@ -394,6 +394,162 @@ commands:
description: Reason for terminating the batch job.
required: true

- name: temporal worker-deployment
Copy link
Member

@cretz cretz Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewhoskins-temporal - I noticed in our gRPC and HTTP API we chose to just call these "deployments" without the term "worker" at all. I like the "worker" qualifying term, but the inconsistency of not doing so for our API is unfortunate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants