Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve celery performance #3

Closed
wants to merge 3 commits into from

Conversation

Ian2012
Copy link
Member

@Ian2012 Ian2012 commented Nov 21, 2024

Description

This removes the gossip and mingle features to improve Celery's performance and reduce the number of messages sent to Redis. It also increases the heartbeat interval to 60s from 2s.

References:

https://www.cloudamqp.com/blog/python-celery-and-rabbitmq.html
https://ankurdhuriya.medium.com/understanding-celery-workers-concurrency-prefetching-and-heartbeats-85707f28c506

Without this change, when building images with --cache-to-registry,
BuildKit uses a proprietary cache artifact format, which breaks
when using third-party registries such as Harbor or ECR.

By adding the image-manifest=true option, BuildKit uses an
OCI-compliant cache artifact format that should be compatible with all
registries. This option requires BuildKit 0.12 or later (check with
"docker buildx ls").

See goharbor/harbor#18941 and
moby/buildkit#2251 for background
information.

Co-authored-by: Andrés González <[email protected]>

Fixes overhangio#1118.
@regisb
Copy link

regisb commented Nov 22, 2024

Can you explain, in a nutshell, the impact of these three parameters? The pages you link to do a very poor job at explaining their impact.

Also, why heartbeat=60s instead of 2s? That's a x30 increase, which seems rather drastic. Why not start with x5 or x10? What's the impact in case of a lost worker?

@Ian2012
Copy link
Member Author

Ian2012 commented Nov 22, 2024

  • without-gossip: Based on the official celery docs, workers are subscribing to worker-related events like heartbeats, and can detect if any worker goes offline, Celery by default does nothing with this information, but this opens the door for task rerouting or restarting workers when they crash (not implemented). In the context of Open edX, there isn't any implementation that depends on this feature, and the Celery changelog only shows fixes for this flag, also this scales to n^2 increasing dramatically network traffic which isn't needed.
  • without-mingle: It syncs the revoked tasks when a worker in the same cluster starts or restarts, avoiding rerunning revoked tasks. I don't have strong arguments against this flag, however, it's highly recommended by different operators to remove it.

For the heartbeat, maybe 20s is a better default in case of network issues or Redis downtimes. However, keep in mind that this heartbeat does nothing. If you disable the heartbeat the workers still will detect if the broker goes down and start the reconnect process, so we should consider removing it completely.

@Ian2012
Copy link
Member Author

Ian2012 commented Nov 22, 2024

Let's continue the discussion on overhangio#1165

@Ian2012 Ian2012 closed this Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants