Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent::Hash default initialization is not fully thread-safe #970

Open
mensfeld opened this issue Nov 17, 2022 · 8 comments
Open

Concurrent::Hash default initialization is not fully thread-safe #970

mensfeld opened this issue Nov 17, 2022 · 8 comments
Assignees
Labels
high-priority Should be done ASAP.

Comments

@mensfeld
Copy link

mensfeld commented Nov 17, 2022

Based on the docs:

A thread-safe subclass of Hash. This version locks against the object itself for every method call, ensuring only one thread can be reading or writing at a time. This includes iteration methods like #each, which takes the lock repeatedly when reading an item.

Given this code:

h = Concurrent::Hash.new do |hash, key|
  hash[key] = Concurrent::Array.new
end

the initialization is not thread-safe.

Note from @eregon, the thread-safe variant of this code is:

h = Concurrent::Map.new do |hash, key|
  hash.compute_if_absent(key) { Concurrent::Array.new }
end

Obviously the latter part of the doc indicates that:

ensuring only one thread can be reading or writing at a time

but the initial part makes it confusing:

This version locks against the object itself for every method call

It can be demoed by running this code:

require 'concurrent-ruby'

1000.times do
  h = Concurrent::Hash.new do |hash, key|
    hash[key] = Concurrent::Array.new
  end

  100.times.map do
    Thread.new do
      h[:na] << true
    end
  end.each(&:join)

  raise if h[:na].count != 100
end

I would expect to either:

  1. Have the initialization block behind a mutex - so there is no conflict
  2. Have the docs updated (I can do that)

Works like so:

require 'concurrent-ruby'

m = Mutex.new

1000.times do
  h = Concurrent::Hash.new do |hash, key|
    m.synchronize do
      break hash[key] if hash.key?(key)

      hash[key] = Concurrent::Array.new
    end
  end

  100.times.map do
    Thread.new do
      h[:na] << true
    end
  end.each(&:join)

  raise if h[:na].count != 100
end
@mensfeld mensfeld changed the title Concurrent::Hash default initialization is not thread-sae Concurrent::Hash default initialization is not fully thread-safe Nov 17, 2022
@nijikon
Copy link

nijikon commented Nov 17, 2022

Oh, wow. This is interesting.

@mensfeld
Copy link
Author

Same applies to the Concurrent::Map:

require 'concurrent-ruby'

1000.times do
  h = Concurrent::Map.new do |hash, key|
    hash[key] = Concurrent::Array.new
  end

  100.times.map do
    Thread.new do
      h[:na] << true
    end
  end.each(&:join)

  raise if h[:na].count != 100
end

mensfeld added a commit to mensfeld/rails that referenced this issue Nov 20, 2022
This change makes the initialization of the hash upon missing key fully thread-safe. Before this change, initialization that would occur in two threads could overwrite each other, as illustrated here:

ruby-concurrency/concurrent-ruby#970
mensfeld added a commit to mensfeld/rails that referenced this issue Nov 20, 2022
Behavior upon missing prefix partial name may cause a key to overwrite when executed in multiple threads at the same time.

ref ruby-concurrency/concurrent-ruby#970
mensfeld added a commit to mensfeld/finite_machine that referenced this issue Nov 20, 2022
mensfeld added a commit to mensfeld/graphql-ruby that referenced this issue Nov 20, 2022
Initialization upon miss can lead to hard to debug scenarios where potentially a concurrent array will leak out but the value in the `@events` will be different.

as described here: ruby-concurrency/concurrent-ruby#970
mensfeld added a commit to mensfeld/rom-factory that referenced this issue Nov 20, 2022
Without this change potential incrementation can "go away" and can actually mismatch by 1 if run in multiple threads the same time. This can lead to super weird errors where counter is not as expected (been there, took me ages to debug).

ref: ruby-concurrency/concurrent-ruby#970
mensfeld added a commit to mensfeld/whimsy that referenced this issue Nov 21, 2022
Under intense threading usage, the `@@users` buffer will not be thread-safe fully. Details here:

ref ruby-concurrency/concurrent-ruby#970
mensfeld added a commit to mensfeld/krane that referenced this issue Nov 21, 2022
In case of extensive concurrent usage, the mutex handed over to two threads under same key may differ as illustrated below:

```ruby
require 'concurrent'

10000.times do
  kind_fetcher_locks = Concurrent::Hash.new { |hash, key| hash[key] = Mutex.new }

  refs = Set.new
  100.times.map do |i|
    Thread.new { refs << kind_fetcher_locks[i % 50].object_id }
  end.each(&:join)

  raise "Not 50 but #{refs.count}" unless refs.size == 50
end
```

this can lead to really weird issues.

Works when fixed as above:

```ruby
require 'concurrent'

10000.times do
  mutex = Mutex.new
  # kind_fetcher_locks = Concurrent::Hash.new { |hash, key| hash[key] = Mutex.new }

  kind_fetcher_locks = Concurrent::Hash.new do |hash, key|
    mutex.synchronize do
      break hash[key] if hash.key?(key)
      hash[key] = Mutex.new
    end
  end

  refs = Set.new
  100.times.map do |i|
    Thread.new { refs << kind_fetcher_locks[i % 50].object_id }
  end.each(&:join)

  raise "Not 50 but #{refs.count}" unless refs.size == 50
end
```

ref ruby-concurrency/concurrent-ruby#970
mensfeld added a commit to mensfeld/puppet that referenced this issue Nov 21, 2022
Not locking the default initialization can lead to race-conditions.

Note: not sure if I should use one or two mutexes as I am not familiar with this code enough to make the judgment.

ref: ruby-concurrency/concurrent-ruby#970
sebbASF pushed a commit to apache/whimsy that referenced this issue Nov 22, 2022
Under intense threading usage, the `@@users` buffer will not be thread-safe fully. Details here:

ref ruby-concurrency/concurrent-ruby#970
@nightpool
Copy link

This pattern is widely prevalent in open source code and it's very very clear that developers assume that this works. I think it's very important to wrap this initializer block in a mutex and not just update the docs

@granthusbands
Copy link

It's now slightly complicated, as some (as above) have a fix that assumes the initializer is not in the mutex and so call compute_if_absent or such. Unless the mutex is reentrant or there's a test for this, it would then deadlock.

Also, it's suboptimal to use a mutex that's separate from the hash, as above; it would help for Concurrent::Hash to have some of the quality-of-life improvements of Concurrent::Map or at least a way to use the same lock.

@mensfeld
Copy link
Author

@granthusbands if not fixable or brings weird problems to the table, maybe we could expand rubocop to notify on common mistakes, etc.

@eregon
Copy link
Collaborator

eregon commented Dec 12, 2022

Thank you for the issue report. I generally agree we should fix this if we can. The question is how.

(1) We could (try to) use a lock around the whole initializer, but that is also a typical anti-pattern to hold a lock so long, and that can lead to deadlock (e.g., if 2 Concurrent::Hash initializer blocks refer to one another, like #627 which uses the block of each but for Hash/Map).
This seems quite difficult given the various backends. Not all backends use a Mutex for instance or even a lock for all operations on a Concurrent::Hash. We'd need to somehow make it work for each of them independently.
As a note, these are the semantics of ConcurrentHashMap#computeIfAbsent in Java. That also says: The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map. that latter part which we cannot guarantee for an arbitrary initializer block, so it feels a bit wrong at least.
Yet another challenge is Concurrent::Hash is currently just ::Hash on CRuby, but can't anymore if we fix this. From that view 1) this might be considered caused by CRuby Hash and 2) it may make sense to actually try to fix this in core Hash.

(2) We could do what I suggested in my PhD thesis to solve basically the same issue but on Hash itself (BTW, CRuby Hash does not guarantee this): https://eregon.me/blog/assets/research/thesis-thread-safe-data-representations-in-dynamic-languages.pdf page 83 Idiomatic Concurrent Hash Operations. In short, it replaces []= calls in the initializer block with put_if_absent by passing a different object than the Concurrent::Hash itself, which overrides []= and delegates the rest.

It's a classic "pick 1 or 2 but all 3 seems impossible":

  • Write into the Concurrent::Hash only once, required to fix for this issue
  • Execute the block only once, nice-to-have
  • No deadlocks caused by this

@eregon eregon self-assigned this Dec 12, 2022
@eregon
Copy link
Collaborator

eregon commented Dec 15, 2022

I've filed an issue on the CRuby tracker to see what they think about the same problem for core Hash: https://bugs.ruby-lang.org/issues/19237

@eregon
Copy link
Collaborator

eregon commented Mar 9, 2023

FWIW CRuby closed that ticket and added documentation that Hash is not thread-safe for that case: https://bugs.ruby-lang.org/issues/19237#note-2 and ruby/ruby@ffd5241.
I think it makes sense to solve this for Concurrent::Hash and Concurrent::Map.
We'll need to pick one of the two approaches above.

Using the lock approach would also fix #929, but makes it prone to deadlocks. For example we'll already need Monitor and not Mutex to let existing usages of compute_if_absent inside the block work fine and not error due to trying to lock the same Mutex again.

Using an object forwarding []= differently might be surprising.

Another option would be to pass a special object to the block, which warns on []= inside the block as that's not atomic, to let people know they should use compute_if_absent instead.

@eregon eregon added the high-priority Should be done ASAP. label Mar 10, 2023
piotrmurach pushed a commit to mensfeld/finite_machine that referenced this issue Oct 8, 2023
piotrmurach pushed a commit to mensfeld/finite_machine that referenced this issue Oct 8, 2023
piotrmurach pushed a commit to piotrmurach/finite_machine that referenced this issue Oct 8, 2023
piotrmurach pushed a commit to piotrmurach/finite_machine that referenced this issue Oct 8, 2023
joshcooper added a commit to joshcooper/puppet that referenced this issue Nov 16, 2023
Not locking the default initialization can lead to race-conditions.

ref: ruby-concurrency/concurrent-ruby#970
Co-authored-by: Maciej Mensfeld <[email protected]>
resolves puppetlabs#8951
joshcooper added a commit to joshcooper/puppet that referenced this issue Nov 16, 2023
Not locking the default initialization can lead to race-conditions.

I don't think we can switch to Concurrent::Map, and it's compute_if_absent
method, because insertion order won't be maintained. So synchronize the long
way.

ref: ruby-concurrency/concurrent-ruby#970
Co-authored-by: Maciej Mensfeld <[email protected]>
resolves puppetlabs#8951
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high-priority Should be done ASAP.
Projects
None yet
Development

No branches or pull requests

5 participants