What do you do when your quest for power leads you to implementations which are not just platform- but processor- and even architecture-version-specific in nature? How do you even start tracking down a bug in a Scala-based implementation which is not only non-deterministic but only manifests on certain hardware? In this talk, we will dive into the wild and ill-understood world of CPU architecture, memory models, and JVM intrinsics (all through the lens of very high-level purely functional abstractions!) as we examine the story of the most convoluted and mind-bending bug hunt of my entire career.
Q&A
Question: Now I'm wondering how Rosetta on the M1 solves this problem.
Answer: They did something really clever! They basically embedded a version of x86's memory semantics within the M1 silicon. It's a separate CPU mode that Rosetta just… turns on. This is how the M1 can still be really fast even when emulating x86.
Also, I can confirm that, when running the Cats Effect 3 test suite on the Apple M1 using the Azul VM, the bug does not manifest. So M1 in general is just very impressive work; they put a lot of effort into making it as easy as they could.
Question: To be clear: Java 9 fixes this issue by giving better documentation, but the bug is on both 8 and 9? Or is the bug reproducible on Java 9?
Answer: Yes! In fact, the bug even manifests on Java 16. That was one of our very first thoughts: maybe it was just Java 8 specific! But sadly no.
Question: I encountered a situation where cancelling and then joining a Fiber results in the program hanging:
import cats.effect.{ExitCode, IO, IOApp}
import scala.concurrent.duration._

object Main extends IOApp {
  override def run(args: List[String]): IO[ExitCode] = {
    val io = IO.sleep(2.seconds)
    for {
      _     <- IO(println("Start a fiber"))
      fiber <- io.start
      _     <- IO(println("Cancel the fiber"))
      _     <- fiber.cancel
      _     <- IO(println("Join the fiber"))
      _     <- fiber.join
      _     <- IO(println("Return success")) // <---- never gets printed
    } yield ExitCode.Success
  }
}

This program never outputs "Return success" and runs forever. This is on Cats Effect 2.1.3. Do you have any insight into this?
Answer: Yes, so this is a very subtle issue with Cats Effect 2. The problem is that join in CE2 promises something of type IO[A]. But then… what can it give you if the fiber was canceled? It never got an A. The answer is that it just deadlocks, which is… exactly not what you want. In CE3, we changed join to produce an Outcome, which lets it signal back to you when cancelation took place.
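To make the CE3 behaviour concrete, here is a minimal sketch of the same cancel-then-join scenario, assuming Cats Effect 3 is on the classpath. The object name JoinOutcome and the value cancelThenJoin are made up for illustration; join, cancel, and Outcome are the CE3 API described above.

```scala
import cats.effect.{IO, IOApp, Outcome}
import scala.concurrent.duration._

// Hypothetical example object; assumes Cats Effect 3.
object JoinOutcome extends IOApp.Simple {

  // Start a fiber, cancel it, then join and describe what happened.
  val cancelThenJoin: IO[String] =
    for {
      fiber   <- IO.sleep(2.seconds).start
      _       <- fiber.cancel
      outcome <- fiber.join // an Outcome, not a bare result
      message <- outcome match {
        case Outcome.Succeeded(fa) => fa.map(_ => "succeeded")
        case Outcome.Errored(e)    => IO.pure(s"failed: ${e.getMessage}")
        case Outcome.Canceled()    => IO.pure("canceled")
      }
    } yield message

  def run: IO[Unit] = cancelThenJoin.flatMap(msg => IO.println(msg))
}
```

Because join returns as soon as the fiber has finished in any way, the program can reach the Canceled case and carry on rather than waiting for a result that will never arrive.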
Question: Will this change be introduced to CE2 also?
Answer: Unfortunately we can't without breaking binary compatibility. So in general, I would recommend using Deferred explicitly rather than relying on join in CE2, in part for this reason. It gives you more control over what happens.
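One way this could look on CE2 is sketched below, assuming Cats Effect 2.x. This is only an illustration of the idea, not a recommended recipe from the talk: the object name DeferredJoin, the Option encoding (None meaning "canceled"), and the choice to record cancelation from the canceling side are all assumptions made for the example.

```scala
import cats.effect.{ExitCode, IO, IOApp}
import cats.effect.concurrent.Deferred
import scala.concurrent.duration._

// Hypothetical example object; assumes Cats Effect 2.x.
object DeferredJoin extends IOApp {

  // None stands for "canceled"; Some(...) carries the fiber's own result.
  val program: IO[Option[Either[Throwable, String]]] =
    for {
      done   <- Deferred[IO, Option[Either[Throwable, String]]]
      task    = IO.sleep(2.seconds).map(_ => "finished")
      // The fiber publishes its result itself. complete fails if the
      // Deferred is already set, so that failure is swallowed with attempt.
      fiber  <- task.attempt.flatMap(r => done.complete(Some(r)).attempt).start
      _      <- fiber.cancel
      // Record the cancelation; this is a no-op if the fiber got there first.
      _      <- done.complete(None).attempt
      // Wait on the Deferred instead of calling fiber.join.
      result <- done.get
    } yield result

  override def run(args: List[String]): IO[ExitCode] =
    program
      .flatMap(r => IO(println(s"Result: $r")))
      .map(_ => ExitCode.Success)
}
```

The point of the design is that done.get always has something to return, whichever side completes the Deferred first, so the waiting code decides for itself how to treat a canceled fiber.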
Question: You mentioned that someone from Cats Effect identified the bug, how did they/you go about diving into it to figure out where the bug was and what tools did you use for it?
Answer: It was a pretty painful process. The tools used were a unit test suite where we had set the iteration count so that we could reproduce it relatively consistently, sbt, tmux, an EC2 Graviton instance, and a lot of patience.
Raas was the one who did the bulk of the work. Here's the reproduction: https://gist.github.com/RaasAhsan/8e3554a41e07068536425ca0de46c9e8
Question: I have a question which is somewhat related. I would still be interested to know if there are any performance implications of putting values only accessed from within an IO into the same IO or the surrounding scope. I.e.
val a = SomeBigArray(...)
IO { doNetworkCall(a) }

vs

IO {
  val a = SomeBigArray(...)
  doNetworkCall(a)
}
Answer: In general, two IOs will be slower than one. Whether or not that matters is kind of an interesting question that depends on what doNetworkCall is doing.
The advantage to pulling it apart into two is twofold:
- You get a cancelation check between each one
- You get better composability (so you can refactor a little more easily and break it apart)
It's kind of a judgment call as to which you do. I always bias towards smaller IO { ... } blocks until I measure performance costs that make me roll it back, and that's pretty rarely something that has to happen.
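As a rough sketch of the two shapes being compared, assuming Cats Effect 3: buildPayload and doNetworkCall below are hypothetical stand-ins invented for the example (the real network call is not shown anywhere in this discussion), and SplitIO is just a container name.

```scala
import cats.effect.IO

// Hypothetical example object; assumes Cats Effect 3.
object SplitIO {
  // Stand-ins for the real work, made up for illustration.
  def buildPayload(): Array[Byte] = Array.fill(1024)(1.toByte)
  def doNetworkCall(payload: Array[Byte]): Int = payload.length

  // One block: both steps run inside a single suspended effect.
  val combined: IO[Int] = IO {
    val a = buildPayload()
    doNetworkCall(a)
  }

  // Two blocks: each step is its own IO, joined with flatMap, so the
  // runtime sees a boundary between them and each piece can be reused.
  val split: IO[Int] =
    IO(buildPayload()).flatMap(a => IO(doNetworkCall(a)))
}
```

Both values compute the same result; the split version pays for one extra IO node and a flatMap in exchange for the boundary between the two steps.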
The PR review was hilarious.
I think we added some comments, so it was more like a +20/-1 PR or something.
Ref: https://github.com/typelevel/cats-effect/pull/1416
Daniel Spiewak
Principal Engineer
Disney Streaming Services