Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement basic watermarking #24

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

iasoon
Copy link
Member

@iasoon iasoon commented Jun 15, 2022

Re-enabled the watermarking code so that the watermark assertions pass again

@iasoon
Copy link
Member Author

iasoon commented Jun 15, 2022

R: @pabloem

'wrong input watermark for %s. Expected %s, but got %s.' % (
ray.get(runner_execution_context.watermark_manager.get_stage_node.remote(
bundle_context_manager.stage.name)),
stage_node.output_watermark() == timestamp.MAX_TIMESTAMP), (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was indeed supposed to be output_watermark instead of input_watermark, right?

@pabloem
Copy link
Collaborator

pabloem commented Jun 15, 2022

TODO @pabloem - what is the produced watermark and why do stages differentiate

@iasoon iasoon force-pushed the watermarks branch 3 times, most recently from 4f0f971 to 015df34 Compare June 16, 2022 22:17
@iasoon
Copy link
Member Author

iasoon commented Jun 16, 2022

Ran into an issue when implementing this:

transform.spec.payload = data_spec.SerializeToString()

Overwriting the payload here also changes the transform in the original pipeline definition. This new format is different from what the watermark manager __init__ expects. I fixed it for now by making sure that the watermark manager initializes before the RayBundleContextManager runs, How is this expected to work? Should we copy the transform before we overwrite the payload?

@pabloem
Copy link
Collaborator

pabloem commented Jun 21, 2022

once we start parallelizing the execution of bundles, our watermarking will need to change (we need to wait for all parallel tasks of a given stage before advancing its watermark and executing the downstream stages).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants