
refactor: Make SimpleRetriever thread-safe so that different partitions can share the same SimpleRetriever #185

Draft
wants to merge 1 commit into
base: main
from

Conversation

brianjlai
Contributor

What

Right now, because the SimpleRetriever holds internally managed, mutable state, multiple partitions running at the same time cannot share the same SimpleRetriever. If they do, we can lose data, because the retriever's state (page number, etc.) is modified by different partitions concurrently. We've seen this problem arise in some connections (link issue) and have temporarily solved it by having every Partition instantiate its own SimpleRetriever.

This, however, is very inefficient for a couple of reasons, such as needing to perform auth for every partition (link issue). It is also a hard blocker for AsyncRetriever, which must be shared across partitions in order to manage the shared job repository.

This PR removes the internal state of SimpleRetriever and all of its dependencies, which is managed via a token field, and makes all methods stateless by passing the next_page_token as a parameter instead.

How

Refactor retrievers, paginators, and pagination strategies to be stateless
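To illustrate the idea, here is a minimal sketch of a stateless pagination strategy. It is not the code in this PR: `StatelessPageIncrement`, `read_partition`, and `fetch_page` are hypothetical names, and the token shape (`{"page": n}`) is an assumption. The point is only that the next token is derived from the previous token passed in by the caller, so nothing is stored on the shared object.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Mapping, Optional

Record = Mapping[str, Any]
Token = Mapping[str, Any]


@dataclass(frozen=True)
class StatelessPageIncrement:
    """Hypothetical strategy: frozen, so it cannot hold a mutable page counter."""

    page_size: int
    start_page: int = 0

    def next_page_token(
        self, last_page: List[Record], last_token: Optional[Token]
    ) -> Optional[Token]:
        # A short (or empty) page means there is nothing left to fetch.
        if len(last_page) < self.page_size:
            return None
        # The current page comes from the caller's token, not from self.
        current = last_token["page"] if last_token else self.start_page
        return {"page": current + 1}


def read_partition(
    fetch_page: Callable[[str, Optional[Token]], List[Record]],
    strategy: StatelessPageIncrement,
    partition: str,
) -> Iterator[Record]:
    """Hypothetical read loop: the token lives in a local variable per call."""
    token: Optional[Token] = None
    while True:
        page = fetch_page(partition, token)
        yield from page
        token = strategy.next_page_token(page, token)
        if token is None:
            return
```

Because each call to `read_partition` keeps its own `token`, two partitions can interleave reads against the same strategy instance without overwriting each other's page position.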
