The SMB transform asserts that the iterable inputs to the `via` function are not re-iterated (https://github.com/spotify/scio/blob/v0.14.12/scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/SortedBucketSource.java#L322-L342). For performance reasons, the iterable is materialized lazily, which avoids holding the entire group in memory when a key group is very large, so it can only be consumed once.
Can we wrap SMB JobTest input in an exactly-once iterable as well so that any re-iteration is caught in JobTest? I believe this can be done by wrapping the test input here.
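
For illustration, here is a minimal sketch of what such a wrapper might look like. The class name `ExactlyOnceIterable` and the exception type are hypothetical, chosen for this example; this is not the implementation at the linked lines, and how it would be wired into JobTest is left open.

```java
import java.util.Iterator;

// Hypothetical helper (not part of scio or Beam): an Iterable that
// fails fast if iterator() is requested more than once.
final class ExactlyOnceIterable<T> implements Iterable<T> {
  private final Iterable<T> underlying;
  private boolean consumed = false;

  ExactlyOnceIterable(Iterable<T> underlying) {
    this.underlying = underlying;
  }

  @Override
  public Iterator<T> iterator() {
    if (consumed) {
      // Assumption: an IllegalStateException is an acceptable way to
      // surface re-iteration in a test.
      throw new IllegalStateException(
          "Iterable was re-iterated; SMB inputs may only be consumed once");
    }
    consumed = true;
    return underlying.iterator();
  }
}
```

With test input wrapped this way, a `via` function that iterates a group twice would fail in JobTest rather than only at runtime against real SMB sources.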