What went wrong with the libdispatch. A tale of caution for the future of concurrency.

Background

Back in the mid-2000s, processor performance started to plateau, and chip makers like Intel told the world that ever-increasing CPU clock speeds were not going to be enough anymore. They would not be able to keep up with Moore's law that way, but they had another way: pack more cores onto the same chip. There was a catch of course: developers would need to update their software if they wanted to take advantage of these many cores. Back then some believed that in 10 years' time consumer machines with 80 cores (or even more) would be common. Fast-forward to 2020 and most consumer machines have about 4 cores, while pro machines have about 8 to 12 cores. Something must have gone wrong along the way. Spoiler: multithreading is hard.

The good times

Apple responded in 2008 with the announcement of Mac OS X 10.6 Snow Leopard (considered by some to be the best Mac OS version ever released), which included the libdispatch (a.k.a. Grand Central Dispatch). I was there at WWDC 2008 when it was announced and we were all ecstatic. It was probably the most exciting WWDC I've attended (I would attend 5 other times after that). The libdispatch and the new inline block syntax were nothing short of amazing and promised to finally make the power of multicore machines easy to access. Multicore machines had been available for a long time before that (dual processors really), but they were mainly used by pro apps such as Photoshop. In the 2000-2008 era, developers would generally only start multithreading their app when they had to, for example because a piece of work was long-running and would block the user-events run loop of the app for too long (causing the infamous spinning beach ball to appear).

Apple demonstrated the libdispatch and the promise seemed great. They introduced the notion of serial queues and told us that we should stop thinking in terms of threads and start thinking in terms of queues. We would submit various program tasks to be executed serially or concurrently and the libdispatch would do the rest, automatically scaling based on the available hardware. Queues were cheap, so we could have a lot of them. I remember very vividly a Q&A at the end of one of the WWDC sessions: a developer got to the mic and asked how many queues we could have in a program, and how cheap they really were. The Apple engineer on stage answered that most of a queue's size was basically the debug label that the developer would pass to it at creation time. We could have thousands of them without a problem.

How would serial queues help us with concurrency? Well, various program components would each have their own private queue, which would be used to ensure thread-safety (locks would not even be needed anymore), and those components would run concurrently with one another. They told us these were "islands of serialization in a sea of concurrency".
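To make the idea concrete, here is a minimal sketch of what such an "island" might look like in Swift. The Counter class and the queue label are made up for illustration and do not come from any Apple sample. The sketch assumes the Swift 5 language mode, where capturing self in a dispatched closure raises no strict-concurrency errors.

```swift
import Dispatch

// Hypothetical component: its state is only ever touched on its own
// private serial queue, so no explicit lock is needed.
final class Counter {
    private let queue = DispatchQueue(label: "com.example.counter")
    private var value = 0

    // Writes are submitted to the queue and run one at a time, in order.
    func increment() {
        queue.async { self.value += 1 }
    }

    // Reads wait for everything submitted before them on the same queue.
    var current: Int {
        queue.sync { value }
    }
}

let counter = Counter()

// Hammer the component from many threads at once.
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    counter.increment()
}

print(counter.current) // prints 1000
```

All 1000 increments are enqueued before concurrentPerform returns, and the serial queue runs work in FIFO order, so the final read sees every one of them.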

The bad times

The future was multithreading and we had to use the libdispatch to get there. So we did.

Then the problems started. We ran into thread explosion, which was really surprising because we had been told that the libdispatch would automatically scale based on the available hardware, so we expected the number of threads to more or less match the number of cores in the machine. A younger me in 2010 asked for help on the libdispatch mailing list, and the response from Apple at the time was to remove synchronization points and go async all the way.

As we went down that rabbit hole, things got progressively worse. Async functions have the bad habit of contaminating other functions: because a function can't call an async function and return its result without being async itself, entire call chains had to be turned async. We ended up with many async functions that made no sense being async: they were not executing long-running background tasks and they were not inherently async (like network requests are, for example). We had to deal with the complexity of heavy callback designs, which made everything harder to read. More worryingly, async made our program a lot more unpredictable and hard to reason about: because every time we dispatched async we released the execution context until the work item completed, it was now possible for the program to execute new calls in an interleaved fashion in the middle of our methods. This led to all sorts of very subtle and hard-to-debug ordering bugs. Worse, they were really hard to fix too and caused countless days of debugging, implementation tricks and hair pulling. Worst of all, we eventually realized we had terrible performance problems: it turns out it's really wasteful to async small tasks and constantly dispatch on many queues. This is a bit insane, because the whole reason we started doing all this in the first place was to get better performance out of the cores, yet we were actually worse off. Despite our most sincere efforts and having an extremely async program, we could still easily see between 30 and 60 threads running on a 4-core machine during normal operation.
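The contamination is easier to see in code. This is a small sketch, not our actual program: Inventory, totalCount and the queue label are hypothetical names, and it assumes the Swift 5 language mode. Note how totalCount does nothing long-running, yet it is forced to take a callback simply because the thing it calls is async.

```swift
import Dispatch

// Hypothetical component with an async-only interface guarded by a
// private serial queue.
final class Inventory {
    private let queue = DispatchQueue(label: "com.example.inventory")
    private var stock = ["apple": 3, "pear": 5]

    // The result is only available through a callback.
    func count(of item: String, completion: @escaping (Int) -> Void) {
        queue.async { completion(self.stock[item] ?? 0) }
    }
}

// This function just adds two numbers, but it cannot return an Int:
// it depends on an async call, so it has to become async as well.
func totalCount(in inventory: Inventory, completion: @escaping (Int) -> Void) {
    inventory.count(of: "apple") { apples in
        // Other work submitted to the inventory's queue may run between
        // these two callbacks: this is where interleaving creeps in.
        inventory.count(of: "pear") { pears in
            completion(apples + pears)
        }
    }
}

// Only for this demo: block the main thread until the chain completes.
let done = DispatchSemaphore(value: 0)
var total = 0
totalCount(in: Inventory()) { result in
    total = result
    done.signal()
}
done.wait()
print(total) // prints 8
```

Every caller of totalCount now faces the same choice, and so on up the call chain.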

Turns out Apple engineers are developers just like us and met the exact same problems that we did. In Mac OS X 10.7 Lion they introduced the Security Transforms, a brand new async API to perform security operations (hashing, encryption, etc...). We were using it to decrypt files and it caused thread explosion in our program. It turns out that every security transform is backed by its own private queue causing way too many threads to be spawned. The API is now abandoned in favor of synchronous libraries such as CommonCrypto and the new CryptoKit.

An Apple engineer also revealed that a lot of the perf wins in iOS 12 came from daemons going single-threaded. This means multithreaded code was written, maintained and shipped for a number of years in the OS itself until the engineers eventually realized that it was not working so well. The same engineer also recommended that developers "strongly consider not writing async/concurrent code". Yep, I know the feeling.

Interlude

I was half-joking on Twitter that everything was better back when we had to use +[NSThread detachNewThreadSelector...] to do multithreading. This is not really true of course: the libdispatch has useful features and I do use it (with care). The reason I said it is that back then the upfront cost of doing multithreading was higher for the programmer (things like starting new threads, communicating across threads, etc...). As a consequence, developers would stop and think hard about whether it made sense to create threads, and they would think carefully about their program design. The libdispatch kind of made it too easy: developers started dispatching left and right without really thinking anymore about what was actually going on in their software and hardware. It pushed developers away from careful design.

Up to now

It was not until 2017 that I eventually stumbled on a discussion (page 1, page 2) on the Swift mailing list. Pierre Habouzit, the libdispatch maintainer at Apple at the time, attempted to explain to the Swift compiler engineers things I wish Apple had told us many years before (and to be fair, Apple started backtracking and explained some of it in recent WWDC sessions, but it didn't hit me at the time like this discussion did). I collected all the information I could find and published it here (go read that now if you haven't already). It turns out the solution was to think of queues a lot more as if they were threads and to use async sparingly. For some reason Apple never updated the libdispatch API to make it harder to misuse and never updated the documentation to explain all this.

I applied the recommendations and it took some time but it was great. A lot of the code that never really needed to be async in the first place went back to being synchronous and it made all the difference. Things got much simpler, I could remove a ton of code that was just there to protect against the out-of-order interleaved calls that I mentioned earlier. Things were now async only when it actually made sense (like backgrounding long-running tasks or doing network requests). The program was more predictable and easier to read and reason about. The number of threads went down to a reasonable amount. It was faster too, a lot faster (this program received and handled various system events from a kernel extension so a lot of things were going on there).
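As an illustration of what "going back to synchronous" can look like, here is a minimal sketch with made-up names (Inventory, totalCount, the queue label). It is one possible shape, not a transcript of our code; a plain lock around the state would serve the same purpose as the private queue used here.

```swift
import Dispatch

// Hypothetical component: state is still protected, but the interface
// is synchronous because the work is tiny and not inherently async.
final class Inventory {
    private let queue = DispatchQueue(label: "com.example.inventory")
    private var stock = ["apple": 3, "pear": 5]

    // Short critical section that returns its result directly.
    func count(of item: String) -> Int {
        queue.sync { stock[item] ?? 0 }
    }
}

// Callers are ordinary functions again: no callbacks and no chance of
// other calls being interleaved in the middle of this one.
func totalCount(in inventory: Inventory) -> Int {
    inventory.count(of: "apple") + inventory.count(of: "pear")
}

print(totalCount(in: Inventory())) // prints 8
```

Async stays available for the cases where it earns its keep, such as moving a genuinely long-running task off the main thread.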

It is now obvious that the original intent of the libdispatch failed. Developers do need to think hard about multithreading and need to carefully consider their program design. Every other OS and language has tried its own variant of the libdispatch, and from what I have read they all failed to some extent. It seems that multithreading is a hard problem after all, one that resists being made easy.

The future of concurrency in Swift

Now I'm a bit worried because I see all those shiny new things that Apple is planning to add into the Swift language and I wonder what might happen this time.

An "actor" is a new type of class which has its own internal private queue on which its functions execute to ensure thread-safety; an actor's exposed functions can only be async. How does that help concurrency, you may ask? Well you see, you can have many actors in your program and they execute concurrently with one another: they are islands of serialization in a sea of concurrency. Now I don't know about you, but to me it seems a hell of a lot like what we did all those years ago and that failed miserably. I'm not even sure actors can be made right, because this very idea of using async everywhere to protect shared state is very problematic. If there is one thing that this whole story taught me, it is that async should be used with extreme caution and only when it really makes sense (and protecting shared state isn't one of those cases). If you see an async interface and can't understand why it needs to be async, then it probably shouldn't be. I know there are discussions about suspending the actor's internal queue while awaiting in order to avoid the interleaved calls issue, but to me this is a clear indication that the whole idea is misguided (I've been there already, we've done that, and it is an unfortunate trick that makes deadlocks possible). As for the rest (async/await, structured concurrency), it's probably ok, but it lowers the cost of writing async code even more, which is good if we want people to write more async code.

The most alarming thing to me is the long-lasting impact that the libdispatch has on our entire software ecosystem. 12 years after the introduction of the libdispatch I still see it being misused almost everywhere I look. I still see people tweeting and writing blog posts recommending practices that are now known to be problematic. I still see async code that shouldn't be and programs spawning way too many threads. I don't think this is ever going away. Now is probably the time to be extra-careful because of the long-lasting impact these APIs have on our software (including operating systems). Once things like actors are out, they are going to be used and there will be no stopping it. We should certainly wonder about the long-term impact and consequences.

2020, November 23rd

@tclementdev