
Planned batch support? #87

Open
jpcamara opened this issue Dec 23, 2023 · 12 comments

Comments

@jpcamara
Contributor

First off, congrats on the initial release! I'm really excited to start giving this a try - it includes a very robust feature set out of the gate which is amazing!

The docs mention features coming very soon:

Proper support for perform_all_later, improvements to logging and instrumentation, a better CLI tool, a way to run within an existing process in "async" mode, unique jobs and recurring, cron-like tasks are coming very soon.

Have you had any internal discussions or thoughts on batch support? By batch support, I mean something like GoodJob batches (https://github.com/bensheldon/good_job?tab=readme-ov-file#batches) and Sidekiq batches (https://github.com/sidekiq/sidekiq/wiki/Batches). They're a huge benefit for job coordination.

I'd be very interested in contributing something like this, if there were no plans for it!

@rosa
Member

rosa commented Dec 26, 2023

Thanks for your kind words, @jpcamara! I really appreciate it 😊

We have batch support on our list of possible features to add, but it's not in the immediate plans because it's a bit at odds with the simplicity we're aiming for with Solid Queue. If you'd like to try implementing it, you're, of course, welcome to do so! 😊 I'm not quite sure yet what it could look like in a way that maintains the overall simplicity of the gem, so I'm open to ideas!

Thanks again!

@mbajur

mbajur commented Dec 30, 2023

It would be amazing to have this feature implemented in Solid Queue. Keeping my fingers crossed for it 🤞

@McRip

McRip commented Jan 2, 2024

This is one of the features that currently keeps us on Sidekiq. I would also like to see it implemented in Solid Queue.

@matteeyah

@jpcamara You could take a look at https://github.com/cdale77/active_job_status. It's fairly simple; if it doesn't work with SolidQueue, it shouldn't be too hard to update it so it does.

@matteeyah

I've been working on inkstak/activejob-status#32 and I just noticed 5ad8727 landed in main.

It could be a relatively simple way to approach batches - a batch could just be a collection of job statuses, with different combinations of statuses adding up to different final batch statuses.

Here's my idea: the batch status can be queued, failed, completed, or working.

  1. The batch is considered queued if all of the jobs are queued
  2. The batch is considered failed if any of the jobs has failed
  3. The batch is considered completed if all of the jobs are completed
  4. The batch is considered working in all other circumstances
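The four rules above can be sketched as a tiny pure function. This is illustrative only - the status symbols and the idea of handing in a flat collection of per-job statuses are assumptions, not a SolidQueue API:

```ruby
# Hypothetical sketch: derive an overall batch status from a
# collection of per-job statuses, following the four rules above.
def batch_status(job_statuses)
  return :queued    if job_statuses.all? { |s| s == :queued }
  return :failed    if job_statuses.any? { |s| s == :failed }
  return :completed if job_statuses.all? { |s| s == :completed }
  :working # some jobs still in flight
end
```

Note the `:failed` check runs before falling through to `:working`, so a single failure surfaces even while other jobs are still running.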

On one hand, I dislike the solution we're currently using (with https://github.com/inkstak/activejob-status), because it relies on the cache for functionality - without the job status being stored in the cache, the application would simply break.


On the other hand, SolidQueue::Job#status is implementation-specific, and as @rosa said:

This is prone to race conditions and stale reads, and it's just intended
for use in Mission Control or other reporting tools, not for regular
operation.

Even though using the collection of job statuses is a simple and effective approach, I'm still not sure what would be the ideal way forward here.

jpcamara mentioned this issue Feb 2, 2024
@jpcamara
Contributor Author

jpcamara commented Feb 2, 2024

Hi @rosa! I've opened a draft PR to introduce how I think batches could be implemented in SolidQueue: #142

Looking forward to feedback on whether it's the right direction or not, when you get some time!

@kanejamison

Following this issue as well as #142, and just wanted to throw another vote on here that batches would be awesome to support natively.

In case it's helpful to know use cases:

We currently do a lot of heavy orchestration using Doximity's Simplekiq gem, which piggybacks on Sidekiq Pro's batches. It has been awesome for handling race conditions and other orchestration headaches that arise from doing ~100+ API calls and post-processing when users run detailed reports in our system.

Example orchestration might be "fetch 20 API calls, now do an aggregate analysis across that batch but only once all items in the batch are completed". Repeat that across 5-10 different data types that go into a report.

Before batches, we had a pattern where all 20 API calls would report up to the parent, which would check the ~batch for completion, and it was a giant mess of race conditions when the last couple of API calls would repeat.
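For what it's worth, the core fan-in primitive that batches provide (and that the polling-parent pattern lacks) can be sketched in a few lines. This is an in-memory toy, not anything from SolidQueue or Sidekiq; a real implementation would persist the counter in a database row rather than use a Mutex:

```ruby
# Hypothetical sketch of fan-out/fan-in: run N jobs, and fire an
# on_finish callback exactly once when the last one completes, with
# no parent polling involved.
class MiniBatch
  def initialize(total, &on_finish)
    @pending = total
    @on_finish = on_finish
    @mutex = Mutex.new
  end

  # Called by each job as it completes; decrements atomically so that
  # only the final completion triggers the callback.
  def job_completed!
    fire = @mutex.synchronize { (@pending -= 1).zero? }
    @on_finish.call if fire
  end
end
```

Because the decrement-and-check happens under one lock, two jobs finishing simultaneously can't both conclude "I'm the last one" - which is exactly the race the parent-checking pattern runs into.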

@jiri1337

jiri1337 commented Jan 6, 2025

Very sad about not supporting batches. Batch support should be the de facto standard in 2025. I cannot switch from sidekiq/good_job until batches are added.

@vicentereig

vicentereig commented Jan 7, 2025

@jpcamara @rosa Hey hi! I'd like to test this in production safely. I'm not sure how to proceed. I'd appreciate some feedback on which approach would be best. Here's what I've got in mind:

  • JP I could point my Gemfile to your branch, but I'm not sure if you are in the business of keeping your fork up to date! 😊
  • It'd be worth extracting this PR temporarily to a separate gem. That way we could test it out before committing to incorporating it as a core feature!

Not using the Royal We™ here: I am writing some workflows that would benefit from this feature as well, and I would love to help move this forward.

@rosa @jpcamara Let me know what you think. Happy to grab a coffee as well and 🌮 about it!

@rosa
Member

rosa commented Jan 7, 2025

Ohhh, @vicentereig, that'd be awesome! Thank you so much! 🙏

Either of those alternatives sounds good to me. I'm also happy to keep the fork up-to-date, assuming I can do that. I think GitHub allows pushing to the fork's branch in PRs to a repo of which you're a maintainer under some circumstances 🤔

@jpcamara
Contributor Author

jpcamara commented Jan 8, 2025

Hey @vicentereig! Thanks for the response here. A few people have voiced interest in seeing this feature land, which has been a good kickstart for me to dust it off.

I just rebased and pushed based on the latest solid_queue (as of earlier today). I've also started testing it on some of my own code to see what might be missing, if anything, for my own uses as well.

As an interface, I think it's getting into pretty good shape. There are a few things on my mind about the overall implementation though:

  • After watching @rosa's RailsWorld talk (recommended btw), I'm not sure the internals match the spirit of SolidQueue. By that I mean I was wondering whether it should better match the "execution" tables style everything else uses. Basically every feature creates a new executions table - there should probably still be a JobBatch table, but possibly a "batch_execution" model to lighten the load on the jobs table (would love your thoughts on this @rosa)
  • The dispatcher logic for checking jobs in a batch feels expensive. I haven't done much in the way of optimization, and it needs to iterate jobs heavily. I think it will hit scaling limits if people create huge batches of jobs
  • Sidekiq comes with a "status" concept where you can get counts of things like failures, total jobs, and pending jobs. Primarily I've used that in the past to check for failures in a completion job (on_finish in our case). It'd be nice to track those as integers that can be efficiently updated on the record, so you don't have to query all the jobs to find that info
  • I hand the batch as the first argument of any "completion" job (success, failure, finish). GoodJob uses a wrapper class in this case, but I hand in the direct batch model. This may be ok, but I'm not sure whether people could become dependent on the implementation vs having a more specific interface
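As a rough illustration of the counters idea in the third bullet (all names here are made up, not SolidQueue API): with a few integer columns on the batch record, the aggregate status becomes readable without loading any job rows.

```ruby
# Toy stand-in for a batch row carrying integer tallies. In a real
# implementation these would be database columns bumped atomically
# (e.g. via ActiveRecord's update_counters) as each job finishes.
BatchCounts = Struct.new(:total_jobs, :completed_jobs, :failed_jobs) do
  def pending_jobs
    total_jobs - completed_jobs - failed_jobs
  end

  # Aggregate status derived purely from the counters - O(1) instead
  # of iterating every job in the batch.
  def status
    return :failed    if failed_jobs.positive?
    return :completed if completed_jobs == total_jobs
    :working
  end
end
```

The trade-off is that the counters must be updated atomically on every job transition, but that is one indexed UPDATE per job rather than a scan of the whole batch on every dispatcher pass.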

That said, it is functioning pretty well for me so far in my testing.

Would you be interested in collaborating a bit on these - or discussing my points above? I am open to the gem idea to make collaboration easier, or to giving access to my fork as well. Some of the smaller changes I've made do touch a couple of Solid Queue internals (like not immediately deleting a job if it's part of a batch), so a gem may be harder to manage for that.

Also would love any feedback on my points above you'd have time to consider @rosa!

@vicentereig

vicentereig commented Jan 8, 2025

Good stuff! Thank you both for the swift response. I'm also taking some homework with me: watching the talks you both gave recently.

Would you be interested in collaborating a bit on these - or discussing my points above? I am open to the gem idea to be able to more easily collaborate, or give access to my fork as well.

Yes to all. Looks like a (no pun intended) solid start. I need to get myself acquainted with SolidQueue's internals to put together a meaningful contribution. We can start small, collaborating in your repo, and then decide whether it'd be worth moving it to a separate gem.

What I am going to do next is migrate one of my simplest workflows to the batched implementation and let it run with some guardrails!
