
Ensure committer queue uniqueness to avoid queue collisions #9

@essiembre

Description


Most Committers extend AbstractFileQueueCommitter. When multiple committers run in multiple processes that share the same working directory, they can end up with the same default queue directory. As a result, two committers may process the same files. That's not ideal.

We should find a way to enforce uniqueness of committer queues while keeping their location predictable (so the same committer instance always points to the same location).
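One way to get both properties is to derive the queue directory deterministically from a stable identity string, for instance by hashing it. The sketch below uses a hypothetical `QueueDirs` helper (not part of the Norconex API) and assumes each committer can supply such an identity, for example its class name plus a user-configured ID; how that identity would be obtained is left open. The same identity always yields the same directory, and different identities yield different directories barring a SHA-256 collision.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper, NOT part of the Norconex Committer API.
 * Maps a committer "identity" string to a queue directory that is
 * stable for that identity and distinct from other identities.
 */
final class QueueDirs {

    private QueueDirs() {}

    /** Resolves workDir/queue/<sha256-hex-of-identity>. */
    static Path resolve(Path workDir, String identity) {
        return workDir.resolve("queue").resolve(sha256Hex(identity));
    }

    /** Lowercase hex SHA-256 of the UTF-8 bytes of the given string. */
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x treats negative bytes as unsigned (e.g. -1 -> "ff").
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path work = Paths.get("workdir");
        // Same identity -> same directory, in any process and on any run.
        System.out.println(resolve(work, "com.example.MyCommitter|crawlerA"));
        // Different identity -> different directory.
        System.out.println(resolve(work, "com.example.MyCommitter|crawlerB"));
    }
}
```

Hashing keeps directory names filesystem-safe whatever characters the identity contains, at the cost of human-readable directory names.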

A real-world example of this issue is best described in Norconex/crawlers#67.

When Committers are used with Norconex Collectors, implicitly passing them the collector ID and crawler ID (already a unique combination) and using those to build a unique directory would do it. However, Committers are not tied to Collectors right now, so we can't assume these IDs will always be available.
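The Collector-based idea, with a fallback for when no Collector context exists, might be sketched as follows. `QueueDirPolicy`, its parameters, the `queue` default location, and the fallback order (explicit directory, then collector/crawler IDs, then a shared default) are all assumptions for illustration, not existing Norconex behavior. `java.util.UUID.nameUUIDFromBytes` produces a deterministic, filesystem-safe name for each collector/crawler pair.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

/**
 * Hypothetical resolution policy, NOT part of the Norconex API.
 * A null argument means "not available".
 */
final class QueueDirPolicy {

    private QueueDirPolicy() {}

    static Path resolve(Path workDir, Path explicitQueueDir,
                        String collectorId, String crawlerId) {
        // 1. An explicitly configured directory always wins.
        if (explicitQueueDir != null) {
            return explicitQueueDir;
        }
        // 2. Collector ID + crawler ID are a unique combination, so derive
        //    a deterministic directory name from them.
        if (collectorId != null && crawlerId != null) {
            // Length-prefix the first ID so ("a/b", "c") and ("a", "b/c")
            // produce different keys even though both contain "a/b/c".
            String key = collectorId.length() + ":" + collectorId
                    + "/" + crawlerId;
            UUID id = UUID.nameUUIDFromBytes(
                    key.getBytes(StandardCharsets.UTF_8));
            return workDir.resolve("queue").resolve(id.toString());
        }
        // 3. No Collector context (committer used standalone): fall back to
        //    a hypothetical shared default, which is still collision-prone.
        return workDir.resolve("queue");
    }

    public static void main(String[] args) {
        Path work = Paths.get("workdir");
        System.out.println(resolve(work, null, "myCollector", "crawlerA"));
        System.out.println(resolve(work, null, null, null));
    }
}
```

Since the last branch does not guarantee uniqueness, an implementation along these lines might log a warning there or require an explicit directory instead.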
