Awesome project - thanks for maintaining it!
Do you have any suggestions or design patterns for processing large datasets? From what I can glean, I could shard a large dataset, run the pipeline on each shard in parallel, and then write a separate pipeline to combine the per-shard results.
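To make the question concrete, here is a rough sketch of the shape I have in mind. It is plain Python, not this project's API: `shard`, `run_pipeline`, `combine`, and `process` are all made-up names, and the per-shard work (a sum of squares) is just a stand-in for a real pipeline.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def shard(records, shard_size):
    """Yield successive lists of at most shard_size records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, shard_size))
        if not chunk:
            return
        yield chunk


def run_pipeline(chunk):
    """Hypothetical per-shard pipeline; a sum of squares stands in for real work."""
    return sum(x * x for x in chunk)


def combine(partials):
    """Hypothetical combine step that merges the per-shard results."""
    return sum(partials)


def process(records, shard_size=1000, workers=4):
    # Threads keep the sketch simple; a real setup might use processes or
    # separate machines. Note that pool.map submits every shard up front,
    # so all shards are held in memory at once here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_pipeline, shard(records, shard_size)))
    return combine(partials)
```

Calling `process(range(10), shard_size=3)` would split the input into four shards, run the stand-in pipeline on each, and then merge the four partial results.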
Anything I'm missing?
Thanks!