Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Infrequent Polling sample #150

Closed
wants to merge 4 commits into from
Closed

Conversation

drewhoskins-temporal
Copy link
Contributor

What was changed

Fix the polling/infrequent sample. It wasn't correctly calculating attempts and was therefore running forever.

Why?

Checklist

Test plan:

Verify that it works:
poetry run pytest tests/polling/infrequent/workflow_test.py --workflow-environment time-skipping

Verify that it skips in non-time-skipping mode, to avoid slow test:

poetry run pytest tests/polling/infrequent/workflow_test.py

  1. Any docs updates needed?

polling/test_service.py Outdated Show resolved Hide resolved
)
self.try_attempts += 1
if self.try_attempts % self.error_attempts == 0:
async def get_service_result(self, input, attempt: int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async def get_service_result(self, input, attempt: int):
async def get_service_result(self, input: ComposeGreetingInput, attempt: int):

Copy link
Contributor Author

@drewhoskins-temporal drewhoskins-temporal Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went a couple turns down this rabbit hole, but it turns out to require a shockingly large refactoring, which is not worth doing.

polling/test_service.py Outdated Show resolved Hide resolved
@drewhoskins-temporal drewhoskins-temporal force-pushed the drewhoskins_polling branch 2 times, most recently from 9441578 to b054fd8 Compare November 21, 2024 02:29
while True:
try:
try:
result = await test_service.get_service_result(input)
result = await test_service.get_service_result(input, attempt)
Copy link
Contributor

@dandavison dandavison Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the way the sample previously did not pass the attempt number to the test service is desirable to maintain, since in real world examples the service in question won't want to know about our attempt counting.

In other words, the way it is now will make some readers think that the solution to the polling-from-activity problem involves passing an attempt count to a service.

Is there a way to fix the algorithm without explicitly passing the attempt number to the service?

Copy link
Contributor Author

@drewhoskins-temporal drewhoskins-temporal Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand and like the meta-point of wanting samples to look real-world, though this one feels shippable anyway. I could use a global, but it wouldn't work out-of-process, and anyway I'm not inspired to spend the time to change this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I do think it's worth finding a way to make your test pass without making the sample misleading. Here's a global variable version: #152 (I'm starting to see why the code had that modulo-arithmetic return condition!)

Copy link
Contributor Author

@drewhoskins-temporal drewhoskins-temporal Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly, this doesn't work predictably when it's run multiple times on the same worker. I really don't think anybody's going to be misled by it as it is, and I would greatly prefer predictability and test isolation as a user.
That said, I wouldn't block your approach if you feel strongly enough.
Can you either accept my PR and then add on top of it, or just leave it as I have it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub was having problems with this PR after I brought it up-to-date with main and it's closed currently. Are you able to reopen it?

The code in this sample falls into two categories:

  1. Code that we are providing to users, essentially saying "follow this closely; this is what your code should look like". Note that users may well copy and paste code in this category. In particular: workflows.py, activities.py

  2. Private implementation detail of the sample, which exists only to make the sample work. Users will not copy and paste this code because, in their use cases, its role will be played by something specific to their domain. In particular: test_service.py

For (2) the basic idea of the sample is to create a toy service that is stateful, only responding after a few attempts. But that statefulness needs to be private to the test_service implementation, since it's modeling real-world slowness-to-come-up. The one thing we don't want to do is give users the impression that they need to count their attempts in their activity code and send them to a service. The sample in its current form takes care not to do that, and we don't need to make the same worse in that respect.

There are a few options I think:

  1. We could move the counter in the test_service to the module level and use modulo arithmetic to address the issue that it might be used by multiple workflows.
  2. We could move the line test_service = TestService() in activities.py to the top level and use modulo arithmetic to address the issue that it might be used by multiple workflows.
  3. We could use a dict keyed by activity.info().workflow_id for the counter in test_service

All those would teach the lesson we're trying to teach correctly. (1) and (2) are suboptimal in that they'd result in some potentially confusing shared state if someone were to run multiple instances of the sample concurrently. So (3) is perhaps the best choice, seeing as it's hardly more complex than (1) and (2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented option (3) in #152, which builds upon this PR (i.e. using the test you added).

@dandavison dandavison mentioned this pull request Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants