chore(deps): update dependency org.typelevel:cats-effect to v3 #2067

renovate · 2023-05-28T15:26:28Z

This PR contains the following updates:

Package	Update	Change
org.typelevel:cats-effect	major	`2.5.5` -> `3.6-1f95fd7`

Release Notes

typelevel/cats-effect

`v3.5.0`

Compare Source

This is the forty-fifth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release.

⚠️ Important note

This release contains some changes that may be semantically breaking. If you're using fs2, http4s, or other libraries from the ecosystem, make sure you've upgraded to versions of these libraries that are compatible with this release (for fs2, that's 3.7.0, for http4s it's 0.23.19)!

Additionally, if you're using methods like fromFuture, make sure you're aware of the major changes to async, described in these release notes.

This is an incredibly exciting release! 3.5.0 represents the very first steps towards a fully integrated runtime, with support for timers (IO.sleep) built directly into the Cats Effect fiber runtime. This considerably increases performance for existing Cats Effect applications, but particularly those which rely more heavily on native IO concurrency (e.g. Http4s Ember will see more benefits than Http4s Blaze).

Additionally, we've taken the opportunity presented by a minor release to fix some breaking semantic issues within some of the core IO functionality, particularly related to async. For most applications this should be essentially invisible, but it closes a long-standing loophole in the cancelation and backpressure model, ensuring a greater degree of safety in Cats Effect's guarantees.

Major Changes

Despite the deceptively short list of merged pull requests, this release contains an unusually large number of significant changes in runtime semantics. The changes in async cancelation (and particularly the implications on async_) are definitely expected to have user-facing impact, potentially breaking existing code in subtle ways. If you have any code which uses async_ (or async) directly, you should read this section very carefully and potentially make the corresponding changes.

`async` Cancelation Semantics

The IO.async (and correspondingly, Async#async) constructor takes a function which returns a value of type IO[Option[IO[Unit]]], with the Some case indicating the finalizer which should be invoked if the fiber is canceled while asynchronously suspended at this precise point, and None indicating that there is no finalizer for the current asynchronous suspension. This mechanism is most commonly used for "unregister" functions. For example, consider the following reimplementation of the sleep constructor:

def sleep(time: FiniteDuration, executor: ScheduledExecutorService): IO[Unit] =
  IO.async[Unit] { cb =>
    IO {
      val f = executor.schedule(() => cb(Right(())), time.toNanos, TimeUnit.NANOSECONDS)
      Some(IO(f.cancel()))
    }
  }

In the above, the IO returned from sleep will suspend for time. If its fiber is canceled, the f.cancel() function will be invoked (on ScheduledFuture), which in turn removes the Runnable from the ScheduledExecutorService, avoiding memory leaks and such. If we had instead returned None from the registration effect, there would have been no finalizer and no way for fiber cancelation to clean up the stray ScheduledFuture.

The entirety of Cats Effect's design is prescriptively oriented around safe cancelation. If Cats Effect cannot guarantee that a resource is safely released, it will prevent cancelation from short-circuiting until execution proceeds to a point at which all finalization is safe. This design does have some tradeoffs (it can lead to deadlocks in poorly behaved programs), but it has the helpful outcome of strictly avoiding resource leaks, either due to incorrect finalization or circumvented backpressure.

...except in IO.async. Prior to 3.5.0, defining an async effect without a finalizer (i.e. producing None) resulted in an effect which could be canceled unconditionally, without the invocation of any finalizer. This was most seriously felt in the async_ convenience constructor, which always returns None. Unfortunately, this semantic is very much the wrong default. It makes the assumption that the normal case for async is that the callback just cleans itself up (somehow) and no unregistration is possible or necessary. In almost all cases, the opposite is true.

It is exceptionally rare, in fact, for an async effect to not have an obvious finalizer. By defining the default in this fashion, Cats Effect made it very easy to engineer resource leaks and backpressure loss. This loophole is now closed, both in the IO implementation and in the laws which govern its behavior.

As of 3.5.0, the following is now considered to be uncancelable:

IO.async[A] { cb =>
  IO {
    // ...
    None    // we aren't returning a finalizer
  }
}

Previously, the above was cancelable without any caveats. Notably, this applies to all uses of the async_ constructor!

In practice, we expect that usage of the async constructor which was already well behaved will be unaffected by this change. However, any use which is (possibly unintentionally) relying on the old semantic will break, potentially resulting in deadlock as a cancelation which was previously observed will now be suppressed until the async completes. For this reason, users are advised to carefully audit their use of async to ensure that they always return Some(...) with the appropriate finalizer that unregisters their callback.

In the event that you need to restore the previous semantics, they can be approximated by producing Some(IO.unit) from the registration. This is a very rare situation, but it does arise in some cases. For example, the definition of IO.never had to be adjusted to the following:

def never: IO[Nothing] =
  IO.async(_ => IO.pure(Some(IO.unit)))  // was previously IO.pure(None)

This change can result in some very subtle consequences. If you find unexpected effects in your application after upgrading to 3.5.0, you should start your investigation with this change! (note that this change also affects third-party libraries using async, even if they have themselves not yet updated to 3.5.0 or higher!)

Integrated Timers

From the very beginning, Cats Effect and applications built on top of it have managed timers (i.e. IO.sleep and everything built on top of it) on the JVM by using a separate thread pool. In particular, ScheduledExecutorService. This is an extremely standard approach used prolifically by almost all JVM applications. Unfortunately, it is also fundamentally suboptimal.

The problem stems from the fact that ScheduledExecutorService isn't magic. It works by maintaining one or more event dispatch threads which interrogate a data structure containing all active timers. If any timers have passed their expiry, the thread invokes their Runnable. If no timers are expired, the thread blocks for the minimum time until the next timer becomes available. In its default configuration, the Cats Effect runtime provisions exactly one event dispatch thread for this purpose.

This isn't so bad when an application makes very little use of timers, since the thread in question will spend almost all of its time blocked, doing nothing. This affects timeslice granularity within the OS kernel and adds an additional GC root, but both effects are small enough that they are usually unnoticed. The bigger problem comes when an application is using a lot of timers and the thread is constantly busy reading that data structure and dispatching the next set of Runnable(s) (all of which complete asyncs and immediately shift back into the Cats Effect compute pool).

Unfortunately, this situation where a lot of timers are in use is exactly what happens in every network application, since each and every active socket must have at least one IO.sleep associated with it to time out handling if the remote side stops responding (in most cases, such as HTTP, even more than one timer is needed). In other words, the fact that IO.sleep is relatively inefficient when a lot of concurrent sleeps are scheduled is particularly egregiously bad, since this is precisely the situation that describes most real-world usage of Cats Effect.

So we made this better! Cats Effect 3.5.0 introduces a new implementation of timers based on cooperative polling, which is basically the idea that timers can be dispatched and handled entirely by the same threads which handle compute work. Every time a compute worker thread runs out of work to do (and has nothing to steal), rather than just parking and waiting for more work, it first checks to see if there are any outstanding timers. If there are some which are ready to run, it runs them. Otherwise, if there are timers which aren't yet completed, the worker parks for that period of time (or until awakened by new work), ensuring the timer fires on schedule. In the event that a worker has not had the opportunity to park in some number of iterations, it proactively checks on its timers just to see if any have expired while it has been busy doing CPU-bound work.

This technique works extremely well in Cats Effect precisely because every timer had to shift back to the compute pool anyway, meaning that it was already impossible for any timer to have a granularity which was finer than that of the compute worker thread task queue. Thus, having that same task queue manage the dispatching of the timers themselves ensures that at worst those timers run with the same precision as previously, and at best we are able to avoid a considerable amount of overhead both in the form of OS kernel scheduler contention (since we are removing a whole thread from the application!) and the expense of a round-trip context shift and passage through the external work queue.

And, as mentioned, this optimization applies specifically to a scenario which is present in almost all real-world Cats Effect applications! To that end, we tested the performance of a relatively simple Http4s Ember server while under heavy load generated using the hey benchmark tool. The result was a roughly 15-25% improvement in sustained maximum requests per second, and a roughly 15% improvement in the 99th percentile latencies (P99). In practical terms, this means that this one change makes standard microservice applications around 15% more efficient with no other adjustments.

Obviously, you should do your own benchmarking to measure the impact of this optimization, but we expect the results to be very visible in production top-line metrics.

User-Facing Pull Requests

#3615 – Fixed issue in which failing uncancelable would remain masked for one stage (@djspiewak)
#3611 – Treat non-positive sleep durations as cedes (@armanbilge)
#3610 – Catch stray exceptions in uncancelable body (@armanbilge)
#3606 – Adjusted Queue.synchronous to include a two-phase commit (@djspiewak)
#3604 – Reset the global runtime when it is shutdown (@armanbilge)
#3599 – Revised Queue.synchronous internals to simplify concurrent hand-off (@djspiewak)
#3596 – Fix Mutex memory leak (@BalmungSan)
#3496 – Add console as config in ioRuntimeConfig, pass it to CPUStarvation (@manuelcueto)
#3586 – Try to fix #3568 (@durban)
#3579 – dispatcher releasing itself rejects new tasks (@samspills)
#3562 – New AsyncMutex implementation (@BalmungSan)
#3567 – Make blockedThreadDetectionEnabled configurable via a system property (@chunjef)
#3555 – Fix mutex cancelled acquire even more (@durban)
#3556 – Fix problem with nextGaussian test (@antoniojimeneznieto)
#3549 – Fix mutex cancelled acquire (@durban)
#3546 – Cpu-starvation warnings one line (@mox692)
#3428 – Parallel map2 optimization (@durban)
#3499 – Shared timers (@durban)
#3518 – AtomicCell#get should not semantically block (@armanbilge)
#3465 – Make Console#readLine cancelable (@armanbilge)
#3435 – Further optimize IODeferred (@armanbilge)
#3480 – Make HotSwap safe to concurrent access (@armanbilge)
#3516 – Add basic tests for RandomSpec (@antoniojimeneznieto)
#3490 – Fix "support re-enablement via cancelable" test (@armanbilge)
#3484 – Allow that the renamed blocker thread is terminated (@aeons)
#3478 – Fix IORuntimeBuilder failureReporter config on JS (@armanbilge)
#3460 – Added cancelable (@djspiewak)
#3453 – Corrected handling of self-cancelation within timeout (@djspiewak)
#3432 – Fixed issues in the timer handling state machine integration (@djspiewak)
#3434 – Fix NPE in blocked thread detection (@djspiewak)
#3409 – Even faster async mutex (@armanbilge)
#3408 – Add 'flatModify', 'flatModifyFull' and corresponding 'State' methods (@seigert)
#3387 – Thread blocking detection (@TimWSpence)
#3374 – Add fromFutureCancelable and friends (@armanbilge)
#3346 – Optimize Mutex & AtomicCell (@BalmungSan)
#3405 – Remove IOLocal#scope, revert #3214 (@armanbilge)
#3347 – ConcurrentAtomicCell (@BalmungSan)
#3302 – Add 'IOLocal.lens' method to produce lens 'A <=> B' (@seigert)
#3360 – IOLocal - generalize scope function (@iRevive)
#3388 – Protect timers against Long overflow (@durban)
#3219 – Integrated timers (@djspiewak)
#3225 – Introduce a BatchingMacrotaskExecutor (@armanbilge)
#3311 – Remove Ref's flatModify (@mn98)
#3328 – Use asyncCheckAttempt in IODeferred#get (@armanbilge)
#3304 – Add IO#supervise, IO#toResource, IO#metered (@kamilkloch)
#3299 – Add IO#voidError (@armanbilge)
#3205 – Change async_ to be uncancelable (@djspiewak)
#3273 – A new combinator, flatModify, on Ref (@mn98)
#3264 – Defer instance for Resource without Sync requirement (@Odomontois)
#3091 – Add Async#asyncCheckAttempt for #3087 (@seigert)
#3214 – Add IOLocal#scope (@iRevive)
#3002, #3390, #3416, #3455, #3477 – Documentation fixes and improvements (@danicheg, @davidabrahams, @TimWSpence, @bplommer, @amast09)

A very special and heartfelt thanks to all of you!

`v3.4.11`

Compare Source

This is the forty-fourth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

User-Facing Pull Requests

#3591 – Fix worker thread name index (@durban)

Thank you, Daniel!

`v3.4.10`

Compare Source

This is the forty-second release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

User-Facing Pull Requests

#3546 – Cpu-starvation warnings one line (@mox692)
#3428 – Parallel map2 optimization (@durban)

Very special thanks to all!

`v3.4.9`

Compare Source

This is the fortieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

User-Facing Pull Requests

#3510 – Dispatcher: check for outstanding actions before release (@samspills)
#3530 – forward raceOutcome to correct implementation (@Jasper-M)
#3487 – Dispatcher error reporting (@samspills)
#3492 – Removed unneeded prefetch in IOFiber (@djspiewak)
#3498 – Add flush to print functions in std.Console (@zetashift)
#3489 – Actually restore IODeferred specialization (@armanbilge)
#3475 – report failures during unsafeRunAndForget (@samspills)
#3467 – IOFiber#toString (@durban)
#3454 – Ensure that deadlock detection considers results availability (@djspiewak)
#3479, #3462, #3485, #3451 – Documentation fixes and improvements (@durban, @djspiewak, @matelaszlo, @Wosin)

Special thanks to each and every one of you!

`v3.4.8`

This is the thirty-ninth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

This release fixes a very rare runtime bug which manifests in applications with a high degree of contention on blocking/interruptible operations. In some rare circumstances, a fiber could be lost during the scheduling process, which could result in application-level deadlocks.

User-Facing Pull Requests

#3444 – Demonstrate and try to fix a possible bug with a "lost" fiber (@durban)
#3446 – Replace ArrayBlockingQueue with LinkedTransferQueue (@vasilmkd)
#3425 – Fix fromCompletableFuture cancelation leak (@TimWSpence, @armanbilge)
#3424 – Fix corking when writing large strings to stdout/stderr on Node.js (@armanbilge)
#3412, #3421 – Documentation fixes and improvements (@mtomko, @durban)

Thank you, everyone!

`v3.4.7`

Compare Source

This is the thirty-sixth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

User-Facing Pull Requests

#3403 – Fix CallbackStack leak, restore specialized IODeferred (@armanbilge)

Thanks, Arman! <3

`v3.4.6`

Compare Source

This is the thirty-sixth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

User-Facing Pull Requests

#3384 – Starvation clock timestamp (@mtomko)
#3393 – Simplify ContState (@durban)
#3389 – Handle if async callback receives null (@durban)
#3381 – Feature test for clearTimeout (@armanbilge)
#3375 – Refactor fromCompletableFuture to use cont (@armanbilge)
#3372 – Move feature test for JS nowMicros() into Try (@armanbilge)
#3380, #3378 – Documentation fixes and improvements (@baber, @armanbilge)

Special thanks to each and every one of you!

`v3.4.5`

Compare Source

This is the thirty-fifth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

This release rolls back the Deferred[IO, A] optimizations for the time being due to a memory leak in certain common scenarios. In particular, any use of Fs2's interruptWhen where the stream in question naturally completes quickly would hit this case relatively hard. Like, for example, Http4s Ember. We have a fix for the memory leak which needs a bit more testing before release, and we felt that, out of an abundance of caution, it is better to revert the changes immediately rather than waiting for the hardening.

User-Facing Pull Requests

#3365 – Bypass IO-specialized Deferred for now (@djspiewak)
#3351 – Fix cancelation handling in memoize (@armanbilge)
#3364 – Fix hang in Dispatcher.sequential(await = true) release (@armanbilge)
#3342 – Move the mbean initialization out of blocking and into delay (@mox692)
#3361 – Fix CallbackStack leak on JS (@armanbilge)
#3357 – Fix null pointer in tracing on javascript (@Jasper-M)
#3355, #3354 – Documentation fixes and improvements (@armanbilge, @samspills)

Thank you so very much!

`v3.4.4`

Compare Source

This is the thirty-fourth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

This release fixes a memory leak in Deferred. The memory leak in question is relatively small, but can accumulate over a long period of time in certain common applications. Additionally, this leak regresses GC performance slightly for almost all Cats Effect applications. For this reason, it is highly recommended that users upgrade to this release as soon as possible if currently using version 3.4.3.

User-Facing Pull Requests

#3336 – Update scala-js-macrotask-executor to 1.1.1 (@armanbilge)
#3334 – Fix compile using CallbackStack#clear method (@armanbilge)
#3324 – Specialize CallbackStack on JS (@armanbilge)
#3333 – Avoid leaks in IODeferred (@durban)
#3307 – Fix propagation of ExitCase in Resource#{both,combineK} (@armanbilge)

Thank you so very much!

`v3.4.3`

Compare Source

This is the thirty-third release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

Despite being a patch release, this update contains two major notable feature additions: full tracing support for Scala Native applications (including enhanced exceptions!), and significantly improved performance for Deferred when IO is the base monad. Regarding the latter, since Deferred is at the core of most concurrent logic written against Cats Effect, it is expected that this change will result in some noticeable performance improvements in most applications, though it is hard to predict exactly how pronounced this effect will be.

User-Facing Pull Requests

#3284 – Added a specialized version of Deferred based on IOFiber's machinery (@djspiewak)
#3226 – Release loser eagerly in Resource.race (@armanbilge)
#3315 – Use configurable reportFailure for MainThread (@armanbilge)
#3305 – More detailed warning for starvation checker (@armanbilge)
#3310 – IOLocal micro-optimizations (@armanbilge)
#3195 – Tracing for Scala Native (@armanbilge)
#3322 – Documentation fixes and improvements (@djspiewak)

Very special thanks to all of you!

`v3.4.2`

Compare Source

This is the thirty-second release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

User-Facing Pull Requests

#3290 – make Deferred#complete uncancelable (@durban)
#3281 – Implement Ref without wrapping AtomicReference on JS/Native (@armanbilge)
#3280 – Suspend cell read in AtomicCell#evalModify (@armanbilge)
#3275 – Fix 'Dispatcher' deprecaton annotation 'message' and 'since' (@seigert)
#3269, #3267, #3257 – Documentation fixes and improvements (@iRevive)

Thank you so much!

`v3.4.1`

This is the thirty-first release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. The primary purpose of this release is to address a minor link-time regression which manifested when extending IOApp with a class (not a trait) which was in turn extended by another class. In this scenario, the resulting main class would hang on exit if the intervening extension class had not been recompiled against Cats Effect 3.4.0. Note that this issue with separate compilation and IOApp does remain in a limited form: the MainThread executor is inaccessible when linked in this fashion. The solution is to ensure that all compilation units which extend IOApp (directly or indirectly) are compiled against Cats Effect 3.4.0 or later.

User-Facing Pull Requests

#3254 – Workaround for IOApp deadlock (@armanbilge)
#3255, #3253 – Documentation fixes and improvements (@iRevive)

Thank you, everyone!

`v3.4.0`

This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

A Note on Release Cadence

While Cats Effect minor releases are always guaranteed to be fully backwards compatible with prior releases, they are not forwards compatible with prior releases, and partially as a consequence of this, can (and often do) break source compatibility. In other words, sources which compiled and linked successfully against prior Cats Effect releases will continue to do so, but recompiling those same sources may fail against a subsequent minor release.

For this reason, we seek to balance the inconvenience this imposes on downstream users against the need to continually improve and advance the ecosystem. Our target cadence for minor releases is somewhere between once every three months and once every six months, with frequent patch releases shipping forwards compatible improvements and fixes in the interim.

Unfortunately, Cats Effect 3.3.0 was released over ten months ago, meaning that the 3.4.0 cycle has required considerably more time than usual to come to fruition. There are several reasons for this, but long and short is that this is expected to be an unusual occurrence. We currently expect to release Cats Effect 3.5.0 sometime in Spring 2023, in line with our target cadence.

Major Changes

As this has been a longer than usual development stretch (between 3.3.0 and 3.4.0), this release contains a large number of significant changes and improvements. Additionally, several improvements that we're very excited about didn't quite make the cutoff and have been pushed to 3.5.0. This section details some of the more impactful changes in this release.

High Performance `Queue`

One of the core concurrency utilities in Cats Effect is Queue. Despite its ubiquity in modern applications, the implementation of Queue has always been relatively naive, based entirely on immutable data structures, Ref, and Deferred. In particular, the core of the bounded Queue implementation since 3.0 looks like the following:

final class BoundedQueue[F[_]: Concurrent, A](capacity: Int, state: Ref[F, State[F, A]])

final case class State[F[_], A](
    queue: ScalaQueue[A],
    size: Int,
    takers: ScalaQueue[Deferred[F, Unit]],
    offerers: ScalaQueue[Deferred[F, Unit]])

The ScalaQueue type refers to scala.collection.immutable.Queue, which is a relatively simple Bankers Queue implementation within the Scala standard library. All end-user operations (e.g. take) within this implementation rely on Ref#modify to update internal state, with Deferred functioning as a signalling mechanism when take or offer need to semantically block (because the queue is empty or full, respectively).

This implementation has several advantages. Notably, it is quite simple and easy to reason about. This is actually an important property since lock-free queues, particularly multi-producer multi-consumer queues, are extremely complex to implement correctly. Additionally, as it is built entirely in terms of Ref and Deferred, it is usable in any context which has a Concurrent constraint on F[_], allowing for a significant amount of generality and abstraction within downstream frameworks.

Despite its simplicity, this implementation also does surprisingly well on performance metrics. Anecdotal use of Queue within extremely hot I/O processing loops shows that it is rarely, if ever, the bottleneck on performance. This is somewhat surprising precisely because it's implemented in terms of these purely functional abstractions, meaning that it is relatively representative of the kind of performance you can expect out of Cats Effect as an end user when writing complex concurrent logic in terms of the Concurrent abstraction.

Despite all this though, we always knew we could do better. Persistent, immutable data structures are not known for getting the absolute top end of performance out of the underlying hardware. Lock-free queues in particular have a very rich legacy of study and optimization, due to their central position in most practical applications, and it would be unquestionably beneficial to take advantage of this mountain of knowledge within Cats Effect. The problem has always been two fold: first, the monumental effort of implementing an optimized lock-free async queue essentially from scratch, and second, how to achieve this kind of implementation without leaking into the abstraction and forcing an Async constraint in place of the Concurrent one.

The constraint problem is particularly thorny, since numerous downstream frameworks have built around the fact that the naive Queue implementation only requires Concurrent, and it would not make much sense to force an Async constraint when no surface functionality is being changed or added (only performance improvements). However, any high-performance implementation would require access to Async, both to directly implement asynchronous suspension (rather than redirecting through Deferred) and to safely suspend the side-effects required to manipulate mutable data structures.

This problem has been solved by using runtime casing on the Concurrent instance behind the scenes. In particular, whenever you construct a Queue.bounded, the runtime type of that instance is checked to see if it is secretly an Async. If it is, the higher performance implementation is transparently used instead of the naive one. In practice, this should apply at almost all possible call sites, meaning that the new implementation represents an entirely automatic and behind the scenes performance improvement.

As for the implementation, we chose to start from the foundation of the industry-standard JCTools Project. In particular, we ported the MpmcArrayQueue implementation from Java to Scala, making slight adjustments along the way. In particular:

The pure Scala implementation can be cross-compiled to Scala.js (and Scala Native), avoiding the need for extra special casing
Several minor optimizations have been elided, most notably those which rely on sun.misc.Unsafe for manipulation of directional memory fences
Through the use of a statically allocated exception as a signalling mechanism, we were able to add support for null values without introducing extra boxing
Sizes are not quantized to powers of 2. This imposes a small but measurable cost on all operations, which must use modular arithmetic rather than bit masking to map around the ring buffer

All credit goes to Nitsan Wakart (and other JCTools contributors) for this data structure.

This implementation is used to contain the fundamental data within the queue, and it handles an enormous number of very subtle corner cases involving numerous producers and consumers all racing against each other to read from and write to the same underlying data, but it is insufficient on its own to implement the Cats Effect Queue. In particular, when offer fails on MpmcArrayQueue (because the queue is full), it simply rejects the value. When offer fails on Cats Effect's Queue, the calling fiber is blocked until space is available, encoding a form of backpressure that sits at the heart of many systems.

In order to achieve this semantic, we had to not only implement a fast bounded queue for the data, but also a fast unbounded queue to contain any suspended fibers which are waiting a condition on the queue. We could have used ConcurrentLinkedQueue (from the Java standard library) for this, but we can do even better on performance with a bit of specialization. Additionally, due to cancelation, each listener needs to be able to efficiently remove itself from the queue, regardless of how far along it is in line. To resolve these issues, Viktor Klang and myself have built a more optimized implementation based on atomic pointer chaining. It's actually possible to improve on this implementation even further (among other things, by removing branching), which should arrive in a future release.

Congratulations on traversing this entire wall of text! Have a pretty performance chart as a reward:

This has been projected onto a linear relative scale. You can find the raw numbers here. In summary, the new queues are between 2x and 4x faster than the old ones.

The bottom line on all of this is that any application which relies on queues (which is to say, most applications) should see an automatic improvement in performance of some magnitude. As mentioned at the top, the queue data structure itself does not appear to be the performance bottleneck in any practical application, but every bit helps, and free performance is still free performance!

Hardened `Queue` Semantics

As a part of the rework of the core data structures, it was decided to make a very subtle change to the semantics of the Queue data structure while under heavy load, particularly in true multi-producer, multi-consumer (MPMC) scenarios. Under certain circumstances, the previous implementation of Queue could actually lose data. This manifested when one fiber enqueued a value, while another fiber dequeued that value and was canceled during the dequeue. When this happened, it was possible for the value to have been removed from the underlying data structure but not fully returned from the poll effect, meaning that it could be lost without user-land code having any chance to access it within a finalizer.

This sounds like a relatively serious issue, though it's important to understand that the race condition which gives rise to this was vanishingly rare (to the point where no one has ever, to our knowledge, encountered this in the wild). However, fixing this semantic required reworking a lot of the core guarantees offered by the data structure. In particular, it is now no longer strictly guaranteed in all cases while under contention that elements read from a queue by multiple concurrent consumers will be read in exactly insertion order.

More specifically, imagine a situation where you have two consumers and two producers on an empty queue. Consumer A attaches first (using poll), followed by consumer B. Immediately after this, the first producer writes value 1, followed by the second producer writing value 2. Critically, both the first and second producer need to write to the queue at nearly exactly the same moment.

With the previous implementation of Queue, users could rely on an ironclad guarantee that consumer A would get value 1, while consumer B would get value 2. Now, this is no longer strictly guaranteed. It is possible for B to get 1 while A gets 2. In fact, there is an even stranger version of this race condition which only involves a single producer but still generates a similar outcome: consumer A calls poll, and sometime later consumer B calls poll at the same moment that the single producer offers item 1. When this scenario arises, it is possible for B to get item 1 and A to get nothing at all, despite the fact that A has been waiting patiently for some significant length of time.

More precisely, the new Queue no longer strictly guarantees fairness across multiple consumers when under concurrent contention. This loss of fairness can, under certain circumstances, manifest as a corruption of ordering, though one which is unobservable except if the user were to somehow coordinate precise timestamps across multiple consuming fibers. And, as it turns out, the weakening of these guarantees are directly connected to the fix for the (rare) loss of data during fiber cancelation.

To be clear, multi-consumer scenarios are rather rare to begin with, and I cannot think of a single circumstance under which someone would have a multi-consumer Queue and have any expectation of strong ordering or fairness between their consumers. As an appeal to authority, this kind of loss of fairness is extremely standard across all MPMC queue implementations in other languages and runtimes, specifically because data loss is a much more dangerous and impactful outcome and must be avoided at all costs.

To that end, it is considered very unlikely that users will even notice this change, but it is still a significant and subtle adjustment in the core semantics of Queue. The upside of all of this is users can now rely on the guarantee that, if an effect offer(a) completes successfully, then the value a will be "in the queue" and will be later readable by a poll effect. Additionally, if and only if poll removes the element, a, from the queue, it will complete successfully even if externally canceled; conversely, if poll is canceled before it removes a from the queue, then a will remain available for subsequent polls. Thus, data loss is avoided in all cases.

More Robust `Dispatcher` (and `Supervisor`!)

Dispatcher was one of the most significant changes from Cats Effect 2 to 3. In particular, it addresses a long-standing annoyance when working with effect types: the tongue-and-cheek termed "Soviet interop" case, where unsafe code calls you. In previous versions of Cats Effect, this scenario was handled by the ConcurrentEffect typeclass and the universally confusing runAsync method.

The way in which Dispatcher works is effectively as a fiber-level event dispatch pattern: a single fiber (the dispatcher) polls an asynchronous queue which contains IO[Any] values (the units of work), and when a new work unit is acquired, the dispatcher spawns a fiber for that unit and continues polling. This type of pattern is extremely general: it doesn't matter how long the work units need to complete, they cannot interfere with each other because each is proactively relocated to its own fiber.

Additionally, when CE3 was released, we weren't entirely certain how users wanted to use Dispatcher in practical applications. It was believed likely that most users would create a single top-level Dispatcher for their entire application, and thus the implementation of the event dispatch fibers was optimized with the assumption that a single Dispatcher instance would be under heavy concurrent load. These optimizations are fairly robust, but they do come with pair of costs: there is no guarantee of ordering between two sequentially-submitted work units (IO[Any] values), and every unit of work must pay the price of spawning a new fiber regardless of how long that work unit needs to execute. The former issue is well-exemplified by the following:

Dispatcher[IO] use { disp =>
  for {
    _ <- IO(disp.unsafeRunAndForget(ioa))
    _ <- IO(disp.unsafeRunAndForget(iob))
    // ... wait around for stuff…
  } yield ()
}

In the above, we submit ioa strictly before we submit iob, but iob may actually execute first! This creates a whole series of strange issues that users must account for in common Dispatcher scenarios, particularly when using it as a mechanism for inserting ordered items into Queue from impure event handlers. Accounting for this ordering issue often imposes significant overhead on user code, more than undoing the benefits of Dispatcher's own optimizations. Additionally, if ioa and iob are extremely cheap (e.g. q.offer(a)), the overhead of calling .start to create a wrapping fiber for each will exceed the total runtime of the operation itself. Fiber spawning is extremely cheap, but it's not as cheap as inserting into a queue!

For all of these reasons, Dispatcher has been adjusted to have two major modes: parallel and sequential. The previous default mode of operation corresponds to the parallel mode. When you aren't sure which to pick, select this one. The sequential mode adjusts Dispatcher's optimization mode for more localized usage (e.g. one per request, which is a common paradigm in practice), offers strong ordering guarantees (in the above example, ioa will run before iob, guaranteed), and much more efficient work unit execution (by removing the fiber wrapping). The danger is that units of work can interfere with each other, and thus sequential is not an appropriate mode for Dispatchers which are shared across an entire application.

If that weren't enough, Dispatcher has also received a new configuration option that applies to both parallel and sequential modes: await = true. In the above example, there is a deceptively annoying comment: // wait around for stuff…. Most people who have used Dispatcher in anger have received the dreaded dispatcher already shutdown error message. This happens when the use scope for the Dispatcher resource is closed before the work unit finishes. When this happens, Dispatcher invalidates its internal state, cancels all current work fibers, and shuts down. This is a very safe default, but as it turns out, this is often not what people want.

The general expectation is often that Dispatcher will simply wait for all outstanding work to finish before allowing the use block to terminate, rather than aggressively canceling all outstanding tasks. With the addition of the new await = true parameter, this is now possible. In 3.4.0, we can rewrite the above example in a more natural fashion, such that it has the guarantees we expect:

Dispatcher.sequential[IO](await = tru

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://app.renovatebot.com/dashboard#github/GiganticMinecraft/SeichiAssist).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS4xMDIuMTAiLCJ1cGRhdGVkSW5WZXIiOiIzNS4xMDIuMTAiLCJ0YXJnZXRCcmFuY2giOiJkZXZlbG9wIn0=-->

Lucky3028 · 2023-05-28T15:27:25Z

Catsをv3にするには、とてつもない破壊的変更を乗り越える必要があるので、それに対応する労力ができるまでは一旦クローズします

renovate · 2023-05-28T15:27:47Z

Renovate Ignore Notification

Because you closed this PR without merging, Renovate will ignore this update. You will not get PRs for any future 3.x releases. But if you manually upgrade to 3.x then Renovate will re-enable minor and patch updates automatically.

If you accidentally closed this PR, or if you changed your mind: rename this PR to get a fresh replacement PR.

chore(deps): update dependency org.typelevel:cats-effect to v3

68a76a0

renovate bot requested review from KisaragiEffective, kory33, Lucky3028 and rito528 May 28, 2023 15:26

Lucky3028 closed this May 28, 2023

renovate bot deleted the renovate/org.typelevel-cats-effect-3.x branch May 28, 2023 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(deps): update dependency org.typelevel:cats-effect to v3 #2067

chore(deps): update dependency org.typelevel:cats-effect to v3 #2067

renovate bot commented May 28, 2023

Lucky3028 commented May 28, 2023

renovate bot commented May 28, 2023 •

edited

Loading

chore(deps): update dependency org.typelevel:cats-effect to v3 #2067

chore(deps): update dependency org.typelevel:cats-effect to v3 #2067

Conversation

renovate bot commented May 28, 2023

Release Notes

⚠️ Important note

Major Changes

async Cancelation Semantics

Integrated Timers

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

User-Facing Pull Requests

A Note on Release Cadence

Major Changes

High Performance Queue

Hardened Queue Semantics

More Robust Dispatcher (and Supervisor!)

Lucky3028 commented May 28, 2023

renovate bot commented May 28, 2023 • edited Loading

Renovate Ignore Notification

`async` Cancelation Semantics

High Performance `Queue`

Hardened `Queue` Semantics

More Robust `Dispatcher` (and `Supervisor`!)

renovate bot commented May 28, 2023 •

edited

Loading