Stop consuming messages after failure #30
@prabello have you tried using …?
I don't think it's possible with the current implementation as the producer will always acknowledge the messages, even when they fail. See Handling failed messages. Maybe implementing a custom acknowledger could work somehow, but I think it can be tricky to make it play nicely with the current producer.
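For context, a minimal sketch of marking a message as failed so it reaches handle_failed/2; this lives in a module that calls `use Broadway`, and MyApp.Handler.process/1 is a hypothetical helper, not part of broadway_kafka:

```elixir
@impl true
def handle_message(_processor, message, _context) do
  case MyApp.Handler.process(message.data) do
    :ok ->
      message

    {:error, reason} ->
      # The message is routed to handle_failed/2 (if defined), but
      # broadway_kafka still commits its offset afterwards.
      Broadway.Message.failed(message, reason)
  end
end
```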
@josevalim no, I may have missed it; is there any doc that explains it? @msaraiva thanks for pointing it out. In some cases I just want to stop everything and fix the upstream, and having to re-read things from the dead-letter topic can be cumbersome.
@prabello it is the name you give to …
I am also curious about this. According to the docs, Broadway Kafka always acks messages. I don't think that is always the desired behaviour. In some cases, such as a downstream dependency being unavailable, we may choose to completely stop this consumer and fail over. In other cases, such as bad serialization of this particular message, the current solution is ideal. What does …?
For things such as downstream being unavailable, I would rather consider things like retries with back-offs (say 1s, 2s, 5s, 10s). This means that some messages may still go to the dead-letter queue but the processing speed will be quite reduced.
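As an illustration of the retry-with-back-off suggestion, a rough sketch; the back-off values and MyApp.Handler.process/1 are assumptions, not an API of Broadway or broadway_kafka, and the snippet belongs in a module that calls `use Broadway`:

```elixir
@backoffs [1_000, 2_000, 5_000, 10_000]

@impl true
def handle_message(_processor, message, _context) do
  case process_with_retries(message.data, @backoffs) do
    :ok -> message
    {:error, reason} -> Broadway.Message.failed(message, reason)
  end
end

# Last attempt: whatever the handler returns decides the message's fate.
defp process_with_retries(data, []), do: MyApp.Handler.process(data)

defp process_with_retries(data, [backoff | rest]) do
  case MyApp.Handler.process(data) do
    :ok ->
      :ok

    {:error, _reason} ->
      # Simple in-process back-off before retrying.
      Process.sleep(backoff)
      process_with_retries(data, rest)
  end
end
```

As noted in the comment above, messages that exhaust all retries still end up failed, and throughput drops while the retries run.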
Thank you for the quick reply on that.
Does this mean that broadway_kafka will ack before sending messages to …?
No! It is always at the end. I have updated the previous comment for clarity. :)
Whew - scared me for a second there! It's easy for Jared or me to say the right answer is to "stop consuming", but harder to translate that into something actionable. I have 3 thoughts in my head:
It looks like backoff is on the todo list for broadway - are there previous discussions I could review to see if I might be able to contribute?
The proposed back-off in Broadway is really just a safety net. For example, imagine that in your …

If the back-off is in Broadway, then the best we can do is to assume it has failed altogether and just slow things down. So it is a safety net. My suggestion is to do #3 and potentially add #2 as a safety net.
Perhaps we should add to the docs one very important piece of information: even if …

As I understand it (and I might be saying something dumb here), this is complicated. A common practice is to publish failed messages to another topic/queue so that you can handle them separately. Though, if by any means publishing fails but committing the offset succeeds, we lose the ability to handle that failed message. Considering Murphy's law: if anything CAN go wrong, it WILL go wrong. I think the only way to go about this in consumer code is to always wrap whatever code we put in …
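A rough sketch of that kind of defensive wrapping; publish_to_dlq/1 is a hypothetical helper (not part of broadway_kafka), and the callback sits in a module that calls `use Broadway`:

```elixir
require Logger

@impl true
def handle_failed(messages, _context) do
  Enum.each(messages, fn message ->
    try do
      publish_to_dlq(message)
    rescue
      error ->
        # The offset will still be committed, so at minimum leave a trace
        # of the message we could not forward to the dead-letter topic.
        Logger.error(
          "Could not publish to DLQ: #{inspect(error)}, data: #{inspect(message.data)}"
        )
    end
  end)

  messages
end
```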
@victorolinasc the docs are already explicit about this:
However, if you think there are other places we can add this note to make it clearer, pull requests are welcome!
I've read that part but somehow considered that … Thanks!
I am using the latest version of Broadway Kafka and I do not see any method called …
@josevalim as @amacciola says, there seems to be no …
You should be able to call …
@lucacorti I also found that I was shutting down Broadway pipelines when what I really wanted to do was suspend them. You can see our conversation here. If the goal is just to pause and resume pipelines, use the solution in that discussion. If the goal is to completely shut down the pipeline, then use GenServer.stop(pipeline_name).
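For reference, stopping a pipeline by name can look like this; the name MyApp.Pipeline assumes the pipeline was started with that `name:` option:

```elixir
# Stop through Broadway's own API (reason defaults to :normal):
Broadway.stop(MyApp.Pipeline, :shutdown)

# Plain OTP alternative, as mentioned in the comment above:
GenServer.stop(MyApp.Pipeline)
```

Keep in mind that if the pipeline sits under a supervisor with a permanent restart strategy, the supervisor will start it again, which is what the next comment runs into.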
I'm using Broadway.stop(__MODULE__, :shutdown) to stop the pipeline, but the underlying :brod_sup restarts it after a minute. In the logs, I could see that the restart_type in the child spec is set to permanent; I'm guessing that's the cause. Could you tell me if you see the same behavior as well, or am I missing something? Logs here: #86
Btw, this discussion made it clear there are different desired approaches to handling failure. If anyone wants to submit pull requests documenting individual techniques, they will be very welcome!
Full credit to @slashmili, based on our conversation on Slack. Just collecting more ideas here. There are a few cases where an error can occur, which I already mentioned in the issue.

- If you manage to tag a message as failed, your …
- When pushing the failed messages to the DLQ topic, add a …
- Set up another Broadway module that consumes the DLQ topic and does the same action as the original Broadway. If it fails, bump the …
- We are using Kafka for two use cases. One is for high throughput, where we can't add Oban and a database to the flow, so we deal with the DLQ by putting the failed messages into another topic.
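A sketch of the pattern described above, with several assumptions: the retry counter travels in a "retry_count" Kafka header, the topic names are placeholders, MyApp.Handler.process/1 and MyApp.Kafka.publish/3 are hypothetical helpers (the latter wrapping whatever Kafka producer you use), and headers being available under message.metadata[:headers] depends on your broadway_kafka version:

```elixir
defmodule MyApp.DLQPipeline do
  use Broadway

  @max_attempts 5

  # start_link/1 with the BroadwayKafka.Producer options for the retry topic omitted.

  @impl true
  def handle_message(_processor, message, _context) do
    case MyApp.Handler.process(message.data) do
      :ok -> message
      {:error, reason} -> Broadway.Message.failed(message, reason)
    end
  end

  @impl true
  def handle_failed(messages, _context) do
    Enum.each(messages, fn message ->
      attempts = retry_count(message) + 1

      # Give up after @max_attempts and park the message for manual inspection;
      # otherwise push it back onto the retry topic with the bumped counter.
      topic = if attempts >= @max_attempts, do: "my_dead_letters", else: "my_retry_topic"
      MyApp.Kafka.publish(topic, message.data, retry_count: attempts)
    end)

    messages
  end

  # Assumes Kafka headers are exposed as {key, value} tuples in the metadata.
  defp retry_count(%Broadway.Message{metadata: metadata}) do
    case List.keyfind(metadata[:headers] || [], "retry_count", 0) do
      {"retry_count", value} -> String.to_integer(value)
      nil -> 0
    end
  end
end
```

The original pipeline's handle_failed/2 would publish to the retry topic the same way, starting the counter at 1.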
@yordis I am just double checking, but all of this you mentioned can already be done right now, correct? This would not need any new PR against the Broadway Kafka lib to support it? Because I am already doing something very similar to what you mentioned, just using RabbitMQ and the RabbitMQ Broadway lib for the RetryQueue instead of Kafka, since I don't need the failed messages to persist unless they reach max failed attempts and I push them into the DLQ.
Yes
I am not sure what you mean by "supporting". I am a big fan of the https://diataxis.fr/ documentation framework, so if you leave it to me, I would love the Elixir ecosystem to start adding "How-To Guides" to showcase potential implementations like your situation using RabbitMQ, or my situation where we may use Oban for it ... I am not sure what the right call is here. For this particular case: showcase the way we solve the problems.
Correct. This is a specific request for more documentation to be added via PRs. :)
How can I configure broadway_kafka to stop once an error occurs?
Once an error occurs, I would like to stop consuming messages and keep the same offset; this may be due to the publisher sending wrong information or something that needs to be changed in my application pipeline.
Right now I'm using handle_failed to publish into a dead-letter topic, but it's not the ideal behavior for my use case.
Is it possible to change the offset of the Broadway consumer group to skip some messages or even replay them?