The Haskell Concurrency Primitive Shootout
Recently I saw this tweet:
I took a look at System.Timeout and it doesn’t appear to utilize a global lock, but it does use two global IORefs and atomicModifyIORef. If you dig down deep enough into atomicModifyIORef’s implementation, it does lead to some locking. Maybe that is causing contention (spoiler: probably not)?
To see if I could cause contention I wrote the following benchmark:
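    {-# LANGUAGE BangPatterns #-}
    import Control.Concurrent (getNumCapabilities)
    import Control.Concurrent.Async (asyncOn, wait)
    import Control.Monad (forM, replicateM_)
    import Criterion.Main (bench, bgroup, defaultMain, whnfIO)
    import Data.IORef (atomicModifyIORef, newIORef)

    main :: IO ()
    main = do
      ref <- newIORef (0 :: Int)  -- shared counter; starting value of 0 is assumed
      maxCap <- getNumCapabilities
      defaultMain $ flip map [0 .. maxCap - 1] $ \n ->
        bgroup (show (n + 1) ++ " threads")
          [ bench "IORef" $ whnfIO $ do
              xs <- forM [0 .. n] $ \i -> asyncOn i $
                replicateM_ 10000 $ do
                  b <- atomicModifyIORef ref $ \x ->
                    let !x' = x + 1 in (x', ())
                  return $! b
              mapM_ wait xs
          ]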
Running it with stack run benchmarks -oIORef.html I get:

So IORef does slow down considerably as one increases the number of concurrent modifications.
Is this slow? How does it compare to the other Haskell concurrency primitives?
I extended the benchmark to include MVar and TVar. I also compared an alternative implementation of atomic IORef modification, atomic-primops’s atomicModifyIORefCAS, and threw in the AtomicCounter just for fun. Here are the results:
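The extended benchmark code is not reproduced here. As a rough sketch of what a contended increment might look like for each primitive, here is a minimal version that assumes only base and the stm package. The names hammer, incIORef, incMVar and incTVar are hypothetical helpers for illustration, not the benchmark's actual code, and the sketch counts rather than times.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import Control.Monad (forM, replicateM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper: fork `threads` workers that each run `step`
-- `iters` times, then block until every worker has finished.
hammer :: Int -> Int -> IO () -> IO ()
hammer threads iters step = do
  dones <- forM [1 .. threads] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO (replicateM_ iters step >> putMVar done ())
    return done
  mapM_ takeMVar dones

-- Strict atomic increment on an IORef.
incIORef :: IORef Int -> IO ()
incIORef ref = atomicModifyIORef' ref (\x -> (x + 1, ()))

-- Increment under the MVar's take/put protocol.
incMVar :: MVar Int -> IO ()
incMVar mv = modifyMVar_ mv (\x -> return $! x + 1)

-- Increment inside an STM transaction.
incTVar :: TVar Int -> IO ()
incTVar tv = atomically (modifyTVar' tv (+ 1))

main :: IO ()
main = do
  ref <- newIORef 0
  mv <- newMVar 0
  tv <- newTVarIO 0
  hammer 4 10000 (incIORef ref)
  hammer 4 10000 (incMVar mv)
  hammer 4 10000 (incTVar tv)
  a <- readIORef ref
  b <- readMVar mv
  c <- readTVarIO tv
  print (a, b, c)
```

Each counter should end at 40000, since no increments are lost. To see real contention the program would need to be built with -threaded and run with +RTS -N.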

So what does it mean? Well, MVar is the slowest. MVar is also always slower than TVar. This is surprising, since I would have assumed the overhead of TVar's transactional guarantees would make it slower. Here is a good theory on why this might be the case.
The next surprising thing is that atomicModifyIORefCAS is much faster than atomicModifyIORef. I don’t get this. Both atomicModifyIORefCAS and atomicModifyIORef are doing the same thing. They both call a compare-and-swap primitive and loop if the swap fails because the stored value has changed in the meantime. Actually, I assumed atomicModifyIORef would be faster because its CAS loop is implemented in C. Please look over my benchmarks here to ensure I’m not doing anything stupid.
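For reference, the kind of compare-and-swap retry loop described above can be sketched with the ticket API from atomic-primops's Data.Atomics (readForCAS, peekTicket, casIORef). This is a simplified illustration that requires the atomic-primops package; casIncrement is a hypothetical helper, not the library's atomicModifyIORefCAS and not the C loop behind atomicModifyIORef.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM, replicateM_)
import Data.Atomics (casIORef, peekTicket, readForCAS) -- from atomic-primops
import Data.IORef (IORef, newIORef, readIORef)

-- Hypothetical helper: increment via an explicit CAS retry loop.
casIncrement :: IORef Int -> IO ()
casIncrement ref = readForCAS ref >>= loop
  where
    loop ticket = do
      -- Compute the new value from the value we last observed.
      new <- evaluate (peekTicket ticket + 1)
      -- The swap succeeds only if the IORef still holds that observed value.
      (ok, current) <- casIORef ref ticket new
      if ok
        then return ()
        else loop current -- another thread won the race; retry with the fresh ticket

main :: IO ()
main = do
  ref <- newIORef 0
  -- Four workers, each performing 10000 CAS-based increments.
  dones <- forM [1 .. 4 :: Int] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO (replicateM_ 10000 (casIncrement ref) >> putMVar done ())
    return done
  mapM_ takeMVar dones
  readIORef ref >>= print
```

Because a failed swap always retries, the final count should be 40000 regardless of how the threads interleave.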
Conclusion
1. Prefer TVar to MVar. STM is easier to use and it’s not clear there is necessarily a performance benefit to MVar (there was zero benefit in this test).
2. atomicModifyIORefCAS is considerably faster than atomicModifyIORef.
3. AtomicCounter is very fast and performs well under contention.