-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run solid_queue with SQLite #29
Conversation
@@ -5,7 +5,7 @@ default: &default | |||
default: | |||
pool_size: 5 | |||
scheduler: | |||
polling_interval: 300 | |||
polling_interval: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a test that SIGKILLs the supervisor
Without this the scheduler process hangs round for a long time, as it only notices that the supervisor is gone after the polling interval elapses.
30ad653
to
de38dd5
Compare
Update the tests to run solid_queue on SQLite. They can be triggered by: ``` TARGET_DB=sqlite rails test ``` The adapter is prone to raising `SQLite3::BusyException`s when concurrent transactions occur. Preventing this requires a couple of patches to the adapter. 1. Implement retry with backoff - add a `retries` config setting to the adapter and sleep progressively longer for each retry. This setting is currently in the Rails main branch, but that implementation has no backoff. That doesn't work well in our case. 1. Always create immediate transactions - SQLite by default creates deferred transactions, which don't take a write lock. Then later if there is a write it tries to upgrade the lock. This won't work if the transaction has a stale read so retrying the write by itself is not possible. Starting with an immediate transaction moves the write lock to that point and ensures that we only get blocked on a retryable transaction.
@@ -47,6 +47,7 @@ class SupervisorTest < ActiveSupport::TestCase | |||
end | |||
|
|||
test "abort if there's already a pidfile for a supervisor" do | |||
FileUtils.mkdir_p(File.dirname(@pidfile)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This failed a couple of times, I think it was relying on the test order to ensure that the directory was already there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh yes! My bad, thanks for catching it 🙏
def begin_db_transaction | ||
log("begin immediate transaction", "TRANSACTION") do | ||
with_raw_connection(allow_retry: true, materialize_transactions: false) do |conn| | ||
conn.transaction(:immediate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if Solid Queue could do this for its own operations without the host application having to patch the adapter. If I'm understanding it correctly, we need this to avoid it doing stale reads when fetching items from the queue, so a SQLite app that wants to use Solid Queue will have to do this, is that right? And having Solid Queue patch it globally might be too heavy-handed.
Perhaps we'll find it's a general enough problem that it should go into Rails anyway. I'm not sure whether that's the case or not.
An alternative would be to use a write operation to fetch from the queue (that's what we're doing in Campfire at the moment). That means having different statements for the different adapters, but given that the SQLite adapter already ignores part of the intended statement (the lock part), the behaviour differences do exist, so maybe not a bad idea for it to be explicit?
I could see it working either way, so, food for thought I guess :) What do you think?
If we do stick with the patch, we'll probably need to document this requirement, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQLite will prevent you from reading stale data, even if you use a deferred transaction (the default for Rails). But when it realises that it has stale data we will get a busy exception without any retries. So without things will be correct, but with lots of exceptions.
If you know you are going to write in the transaction I think you's always want an immediate one, which means that for a lot of apps all transactions probably be immediate, but that's not something we should be deciding, so I think a way to choose per transaction would be useful (or maybe mark a model as using immediate transactions?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, yeah that's what I meant 👍 It's not that it would use stale data, it's that reading it will send it into that error state where the transaction is effectively broken.
I agree that having some way to mark which transactions should be immediate would be nice, and then Solid Queue could get the desired behaviour regardless of what the rest of the app is doing.
I think that's why I haven't run into this with my other queue code too, since it uses a write operation to fetch from the queue, so this particular situation doesn't come up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on why you haven't had this issue, begin deferred transaction + a write is equivalent to calling begin immediate just before the write occurs (see https://www.sqlite.org/lang_transaction.html#deferred_immediate_and_exclusive_transactions).
A big reason why I think we'd want to default to immediate transactions is the transactions created internally by ActiveRecord
updates (calling save!
, create
etc).
There could be reads before writes with these depending on what the before_
callbacks do, if so we'd hit the stale read issue with these, unless we wrapped the call in our own immediate transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, totally. It does seem like it'd make sense for immediate transactions to be the Rails default when using SQLite, because of this behaviour. We could always provide a way to start a deferred transaction manually, and/or override the default. But having a safer default would be useful for making SQLite apps work better out of the box.
It feels like overreaching for an individual library to patch that, but that's not the case for Rails itself. It already provides good defaults for lots of things, and this seems similarly appropriate.
In the short term we can patch it in our app code anyway, that's fine. My original point was just that for Solid Queue to be easily used with SQLite by other folks, it'd be nice if the problem is solved elsewhere so apps don't have to be concerned with it. But maybe that just means upstreaming the patch.
if @config[:retries] | ||
retries = self.class.type_cast_config_to_integer(@config[:retries]) | ||
raw_connection.busy_handler do |count| | ||
(count <= retries).tap { |result| sleep count * 0.001 if result } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's quite a clever way of expressing this logic. Nice! 👍
I have to imagine we'll need a backoff like this in Rails. This is how I was handling it too, and it was working reliably. But since the new default removes the sleep between retries, it seems like it'd be easy to blow through all the retries very quickly while waiting for a lock to resolve. Is that what you were seeing when you tried it with the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's exactly what I was seeing 👍
Update the tests to run solid_queue on SQLite.
They can be triggered by:
The adapter is prone to raising
SQLite3::BusyException
s when concurrent transactions occur. Preventing this requires a couple of patches to the adapter.retries
config setting to the adapter and sleep progressively longer for each retry. This setting is currently in the Rails main branch, but that implementation has no backoff. That doesn't work well in our case.cc @rosa, @kevinmcconnell