Extend synchronization primitives to allow cross-chunk sync #11

@Drvi

Description

Imagine we want to do type inference on the columns of a CSV file. We'd start with an initial guess for each column and then process the file in chunks, as we always do. Now one of the workers discovers that a column that was considered an Int until now has to be a Float64. How should we propagate this information to the other chunks? Should this be the responsibility of the consume context? Or should we define a specific callback that works as a barrier: we'd stop parsing, wait for all results to enter the barrier, sync their schemas, and then release them? Doing this in sync_tasks is problematic, because it only synchronizes chunks belonging to one of the two buffers.
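To make the barrier option concrete, here is a minimal sketch of the idea, written in Python purely for illustration. Nothing in it is this package's API: `SchemaBarrier`, `promote`, `run_demo`, and the three-type `RANK` ordering are hypothetical names and assumptions. Each worker submits its local schema, blocks until every worker has arrived, and then receives the merged (widened) schema. It does not address the two-buffer limitation of sync_tasks; it only shows what "enter the barrier, sync schemas, release" could look like.

```python
import threading

# Hypothetical widening order: a higher rank means a wider type.
RANK = {"Int": 0, "Float64": 1, "String": 2}


def promote(a, b):
    """Return the wider of two column types under the assumed RANK order."""
    return a if RANK[a] >= RANK[b] else b


class SchemaBarrier:
    """Collects per-worker schemas and merges them once all workers arrive."""

    def __init__(self, n_workers, initial_schema):
        self.schema = list(initial_schema)
        self._proposals = []
        self._lock = threading.Lock()
        # The action runs in exactly one thread, before any waiter is released.
        self._barrier = threading.Barrier(n_workers, action=self._merge)

    def _merge(self):
        # Widen each column to the widest type any worker has proposed.
        for proposal in self._proposals:
            self.schema = [promote(a, b) for a, b in zip(self.schema, proposal)]
        self._proposals.clear()

    def sync(self, local_schema):
        """Submit a local schema, wait for all workers, return the merged one."""
        with self._lock:
            self._proposals.append(list(local_schema))
        self._barrier.wait()
        return list(self.schema)


def run_demo():
    # Worker 1 has discovered that the second column must be Float64.
    local_guesses = [["Int", "Int"], ["Int", "Float64"], ["Int", "Int"]]
    barrier = SchemaBarrier(len(local_guesses), ["Int", "Int"])
    results = [None] * len(local_guesses)

    def worker(i):
        results[i] = barrier.sync(local_guesses[i])

    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in range(len(local_guesses))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch every worker leaves the barrier with the same widened schema. The cost is that all workers must stop parsing until the slowest one arrives, which is part of the trade-off the questions above are asking about.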

More design work is needed on this issue.
