Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(project-cache): Split project cache into shared and private parts #4147

Merged
merged 16 commits into from
Oct 30, 2024

Conversation

Dav1dde
Copy link
Member

@Dav1dde Dav1dde commented Oct 16, 2024

Makes the project cache concurrently accessible using a lockfree shared map.

  • Removes support for no_cache queries, they were not used for a long time and simplify the design (no need to make project accesses awaitable).
  • Removes support for the v2 project cache endpoint, v3 has been out for > 2 years.

The PR also removed all project cache handling from the old (now called legacy) project cache and moved it to a newer, simplified project cache which only does the book-keeping for the actual project cache. The old Spool V1 implementation and some message handling remains in the legacy project cache. The spool v1 will eventually be removed, the other messages can now be replaced and no longer need to be messages and can be removed in follow up PRs. Some messages likes ProcessMetrics have already been removed.

The garbage disposal has been removed, with the new organization we generate a lot less overall 'trash' which needs to be disposed as well as it is no longer necessary to offload the disposal to another thread, since the service itself has a lot less volume/work to do and service latency is no longer a major concern for stability.

It is now possible to subscribe to project cache events, this is done using a tokio broadcast channel, the channel is inherently size limited, which is problematic, since currently subscribers cannot deal with lag and cannot miss any messages. To prevent this (until we have a better way), the channel is set to a maximum size which should theoretically be impossible to reach. This is more described in code itself.

Functional changes:

  • Projects are only expired through the eviction mechanism, they are no longer expired on access, it's possible to configure Relay in a way where expired projects are used if the eviction interval is too long. To compensate for this, the config has been tuned. Ideally the eviction interval is shorter than the grace period.

@Dav1dde Dav1dde changed the base branch from master to dav1d/remove-metric-meta October 18, 2024 09:57
@Dav1dde Dav1dde self-assigned this Oct 18, 2024
@Dav1dde Dav1dde force-pushed the dav1d/sypc branch 5 times, most recently from f52acfc to 7cba95b Compare October 24, 2024 08:31
@Dav1dde Dav1dde marked this pull request as ready for review October 24, 2024 11:37
@Dav1dde Dav1dde requested a review from a team as a code owner October 24, 2024 11:37
Copy link
Contributor

@loewenheim loewenheim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't entirely wrapped my mind around this yet. Still, I have some questions, comments, and nitpicks.

relay-dynamic-config/src/error_boundary.rs Show resolved Hide resolved
relay-quotas/src/rate_limit.rs Outdated Show resolved Hide resolved
relay-server/src/services/projects/cache/state.rs Outdated Show resolved Hide resolved
relay-server/src/services/projects/cache/state.rs Outdated Show resolved Hide resolved
relay-server/src/services/projects/cache/state.rs Outdated Show resolved Hide resolved
relay-server/src/services/projects/project/info.rs Outdated Show resolved Hide resolved
relay-server/src/services/buffer/mod.rs Show resolved Hide resolved
relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
relay-server/src/services/projects/cache/service.rs Outdated Show resolved Hide resolved
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just random comments for now. I love the concept of subscribable project events.

Cargo.toml Show resolved Hide resolved
relay-quotas/src/rate_limit.rs Outdated Show resolved Hide resolved
relay-server/src/services/projects/cache/handle.rs Outdated Show resolved Hide resolved
relay-server/src/services/projects/cache/project.rs Outdated Show resolved Hide resolved
/// - `message`: The type of message that was processed.
ProjectCacheMessageDuration,
/// Timing in milliseconds for processing a message in the buffer service.
/// - `task`: The type of the task the project cache does.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this now task?

Copy link
Member Author

@Dav1dde Dav1dde Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message is called ProjectCacheTaskDuration, it's a task, because the timer measures more than just the message, it also measures eviction (the entire tokio::select).

Also note this is a metric for the new service, the old one has new metrics.

relay-server/src/services/projects/cache/service.rs Outdated Show resolved Hide resolved
relay-server/src/services/projects/cache/service.rs Outdated Show resolved Hide resolved
.send(ProjectEvent::Evicted(project_key));
};

self.store.evict_stale_projects(&self.config, on_evict);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned in the PR description that there is less garbage in this version, why is that? Isn't the amount of project configs that need deconstruction still the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does a slightly better handling of revisions which re-creates less projects. Where we before did always send back a project from the source, the source itself can now signal that there was no change. This is relevant for the sources where we peek into the deserialized contents and compare revisions (Redis).


let _ = self
.project_events_tx
.send(ProjectEvent::Ready(project_key));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fail if the queue is full? If so, should we log an error just so we know we hit a "should not happen" situation?

Copy link
Member Author

@Dav1dde Dav1dde Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also fails when there are no receivers, which may happen during shutdown or in testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we differentiate these two cases? It would be good to notice somehow if the queue is full.

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's deploy this very gradually, as discussed.

Base automatically changed from dav1d/remove-metric-meta to master October 30, 2024 14:18
@Dav1dde Dav1dde merged commit 857c2b1 into master Oct 30, 2024
23 checks passed
@Dav1dde Dav1dde deleted the dav1d/sypc branch October 30, 2024 16:11
Dav1dde added a commit that referenced this pull request Oct 31, 2024
Dav1dde added a commit that referenced this pull request Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants