
Retry failed jobs #21

Open
gravis opened this issue Sep 3, 2014 · 5 comments


gravis commented Sep 3, 2014

It would be nice to have the features that Sidekiq provides (https://github.com/mperham/sidekiq/wiki/Error-Handling), especially retrying failed jobs.
Something like:

"If you don't fix the bug within 25 retries (about 21 days), Sidekiq will stop retrying and move your job to the Dead Job Queue. You can fix the bug and retry the job manually anytime within the next 6 months using the Web UI."


cdrage commented Mar 9, 2015

Doesn't Resque do this though?

Or do a rescue in Go?

@rohit4813

Hi,
I am experimenting with the goworker library.
I have a requirement of stopping and starting jobs.

Is it possible with the current version? Can anyone tell any workaround for it?


mingan commented Nov 6, 2017

@rohit4813 The current implementation listens for a few signals; when it receives one, it stops picking up new jobs but lets the running jobs finish.

I'm not sure I understand exactly what you're trying to do, but here's our use: we have a scenario where each job is potentially quite long but has natural stopping points. For this, we create a channel in the main function that gets written into when a signal is received (basically the same code that is in goworker already). Then we create another channel, this time buffered (capacity = number of workers), and pass that channel to each worker. Workers then select from that channel at their natural stopping points. In a separate goroutine (kicked off from the main function), we read from the signals channel and write N times (= number of workers) to the workers' channel.

The whole flow looks like:

  1. The process receives a signal
  2. Both our signals channel and goworker's signals channels are written into
  3. Goworker stops enqueuing new jobs
  4. We copy the event N times
  5. When any worker finishes, it's handled normally
  6. When a worker gets to a checkpoint where it checks the channel, it returns and goworker takes care of it

@rohit4813

@mingan Thanks for the great explanation.

If I understand correctly, all the workers will read from the workers' channel (which gets populated from the signals channel)?

My use case is to stop one or more specific workers, say ones that have been running for a very long time; but with this approach, all the workers will stop once the signal is passed to the channel.

I can identify the worker on which the job is running for a very long time. How can I send the signal to this particular worker?

I hope this is not confusing; or am I missing something?


mingan commented Nov 6, 2017

@rohit4813 Our use case just creates breakpoints in long-running jobs so that when we need to restart the process, we don't have to wait (tens of) minutes for the whole job to finish.

If you needed to discriminate between workers, I guess you could do that by sending some meaningful value through the channel and then the worker would decide "this msg is meant for me, I'll stop" or "this is meant for the slow one over there, I can keep running". Though, I can't imagine the use case for such behaviour.
