snowpipe: exactly once semantics #3060
Open
rockwotj wants to merge 29 commits into main from snow-once
rockwotj force-pushed the snow-once branch 4 times, most recently from e5423ce to 10a350f, December 9, 2024 21:38
rockwotj force-pushed the snow-once branch 4 times, most recently from c189654 to f4c4952, December 17, 2024 05:01
This is what is required for exactly once. We don't use it yet.
NOTE: since we can't yet ensure that certain messages go to a specific channel, this only really works with max_in_flight=1. That is probably fine for postgres, and another commit will support channel_name properly so that one can explicitly specify the mapping from data to channel.
This will help to re-use all this logic when we create the new output that specifies channel names explicitly.
To a separate function so it can be shared between different outputs.
To clarify it, instead of spreading it out all over. This also means the schema migration function can now be a free function.
One that is responsible for coordination of schema evolution and other small pieces (like custom mappings). The purpose of this is to allow for another kind of inner output that can allow for a user to specifically set the channel name (instead of using a pool).
I'm not sure if this is 100% correct, but it will work for most cases.
See the examples on what this enables with a Redpanda/Kafka input (but not kafka_franz!).
This seems a bit clearer and has a nice duality with the indexed pool.
Hold a lock while doing this during WriteBatch and handle it internally, rather than having the framework call Connect outside of pipeline creation.
I think this is what was missing...
We should avoid running a SQL query every time we start up, for cost reasons. Instead of running a query (which is likely flaky because of identifier normalization anyway), just open the channel lazily and catch the specific error for the table not existing, then create the table and retry.
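The open-lazily-then-create-and-retry flow described above could look roughly like the sketch below. It is only an illustration: `channelOpener`, `errTableNotFound`, `openLazily`, and `fakeOpener` are hypothetical names invented here, not the connector's actual types or the error the Snowflake client really returns.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errTableNotFound is a hypothetical sentinel standing in for whatever
// specific error the real client reports when the table does not exist.
var errTableNotFound = errors.New("table does not exist")

// channelOpener is a hypothetical abstraction over the two operations needed.
type channelOpener interface {
	OpenChannel(ctx context.Context, table string) error
	CreateTable(ctx context.Context, table string) error
}

// openLazily opens the channel without querying for the table first. Only
// when the open fails because the table is missing does it create the table
// and retry the open once.
func openLazily(ctx context.Context, o channelOpener, table string) error {
	err := o.OpenChannel(ctx, table)
	if !errors.Is(err, errTableNotFound) {
		return err // nil on success, or an unrelated error
	}
	if err := o.CreateTable(ctx, table); err != nil {
		return fmt.Errorf("creating table %q: %w", table, err)
	}
	return o.OpenChannel(ctx, table)
}

// fakeOpener is an in-memory stand-in used to exercise openLazily.
type fakeOpener struct {
	tables  map[string]bool
	creates int
}

func (f *fakeOpener) OpenChannel(_ context.Context, table string) error {
	if !f.tables[table] {
		return errTableNotFound
	}
	return nil
}

func (f *fakeOpener) CreateTable(_ context.Context, table string) error {
	f.tables[table] = true
	f.creates++
	return nil
}

// runDemo opens the same table twice and reports how many times the table
// was created along the way.
func runDemo() (int, error) {
	f := &fakeOpener{tables: map[string]bool{}}
	ctx := context.Background()
	if err := openLazily(ctx, f, "events"); err != nil {
		return f.creates, err
	}
	err := openLazily(ctx, f, "events")
	return f.creates, err
}

func main() {
	creates, err := runDemo()
	fmt.Println("table creations:", creates, "error:", err)
}
```

In this sketch the common path (table already exists) costs a single open call and no query, which is the cost argument made in the commit message.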
Support 2 new properties in snowflake_streaming:
- offset_token: A new property to support exactly once delivery: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview#offset-tokens
- channel_name: The ability to explicitly assign a batch to a channel. The current channel_prefix option doesn't support explicitly picking a channel; this allows exactly once from Kafka.
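To illustrate why an explicit channel plus an offset token can give exactly once delivery, here is a minimal in-memory sketch. It assumes (my assumption, not something stated in this PR) that each Kafka partition maps to its own channel name and that the offset token is the offset of the last record in the batch. The `channel`, `batch`, and `sink` types are hypothetical and do not reflect the connector's or Snowflake's actual API.

```go
package main

import "fmt"

// channel is a hypothetical model of a streaming channel that remembers the
// offset token of the last batch it committed.
type channel struct {
	committed int64 // -1 means nothing has been committed yet
	rows      []string
}

// batch carries an explicit channel name and an offset token, mirroring the
// two new properties at a conceptual level only.
type batch struct {
	channelName string
	offsetToken int64 // assumed: offset of the last record in the batch
	rows        []string
}

// sink holds one channel per explicit channel name.
type sink struct {
	channels map[string]*channel
}

// write appends the batch unless its offset token shows that it was already
// committed on that channel. It reports whether the rows were written.
func (s *sink) write(b batch) bool {
	ch, ok := s.channels[b.channelName]
	if !ok {
		ch = &channel{committed: -1}
		s.channels[b.channelName] = ch
	}
	if b.offsetToken <= ch.committed {
		return false // redelivery of an already committed batch: skip it
	}
	ch.rows = append(ch.rows, b.rows...)
	ch.committed = b.offsetToken
	return true
}

// runDemo delivers one batch twice (simulating a retry after a crash) and
// then a newer batch, returning the number of rows stored in the channel.
func runDemo() int {
	s := &sink{channels: map[string]*channel{}}
	first := batch{channelName: "partition-0", offsetToken: 9, rows: []string{"a", "b"}}
	s.write(first)
	s.write(first) // duplicate delivery is ignored
	s.write(batch{channelName: "partition-0", offsetToken: 19, rows: []string{"c"}})
	return len(s.channels["partition-0"].rows)
}

func main() {
	fmt.Println("rows stored:", runDemo())
}
```

The deduplication only works if every batch for a given source partition always lands on the same channel, which is why picking the channel explicitly matters and a pooled channel does not suffice.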