The Haskell Concurrency Primitive Shootout

Published in codeburst · 2 min read · Jun 22, 2017

Recently I saw this tweet:

I took a look at System.Timeout and it doesn't appear to use a global lock, but it does use two global IORefs and atomicModifyIORef. If you dig deep enough into atomicModifyIORef's implementation, it does lead to some locking. Maybe that is causing contention? (Spoiler: probably not.)

To see if I could cause contention I wrote the following benchmark:

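{-# LANGUAGE BangPatterns #-}
import Control.Concurrent (getNumCapabilities)
import Control.Concurrent.Async (asyncOn, wait)
import Control.Monad (forM, replicateM_)
import Criterion.Main (bench, bgroup, defaultMain, whnfIO)
import Data.IORef (atomicModifyIORef, newIORef)

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)  -- shared counter; the initial value 0 is assumed
  maxCap <- getNumCapabilities
  -- one benchmark group per thread count, from 1 up to the number of capabilities
  defaultMain $ flip map [0 .. maxCap - 1] $ \n ->
    bgroup (show (n + 1) ++ " threads")
      [ bench "IORef" $ whnfIO $ do
          -- pin one worker to each capability 0..n; each does 10000 increments
          xs <- forM [0 .. n] $ \i -> asyncOn i $
            replicateM_ 10000 $ do
              b <- atomicModifyIORef ref $ \x ->
                     let !x' = x + 1 in x' `seq` (x', ())
              return $! b
          mapM_ wait xs
      ]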

Running it with stack run benchmarks -oIORef.html I get:

So IORef does slow down considerably as the number of concurrent modifications increases.

Is this slow? How does it compare to the other Haskell concurrency primitives?

I extended the benchmark to include MVar and TVar. I also compared an alternative implementation of atomic IORef modification, atomic-primops's atomicModifyIORefCAS, and threw in AtomicCounter just for fun. Here are the results:
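For readers who have not used all of these primitives, here is a minimal sketch of what "increment a shared counter once" looks like with each of them. It is not the benchmark code itself: the helper name incrementEach is made up for illustration, the strict atomicModifyIORef' stands in for the bang-pattern version used above, and it assumes the stm and atomic-primops packages are available.

```haskell
import Control.Concurrent.MVar (modifyMVar_, newMVar, readMVar)
import Control.Concurrent.STM (atomically, modifyTVar', newTVarIO, readTVarIO)
import Data.Atomics (atomicModifyIORefCAS)
import Data.Atomics.Counter (incrCounter_, newCounter, readCounter)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper: performs one increment with each primitive and
-- returns the five final values in the order IORef, CAS IORef, MVar,
-- TVar, AtomicCounter.
incrementEach :: IO [Int]
incrementEach = do
  ioRef   <- newIORef (0 :: Int)
  casRef  <- newIORef (0 :: Int)
  mvar    <- newMVar (0 :: Int)
  tvar    <- newTVarIO (0 :: Int)
  counter <- newCounter 0

  -- IORef via base's atomic modify (strict variant)
  atomicModifyIORef' ioRef (\x -> (x + 1, ()))
  -- IORef via atomic-primops' CAS-based modify
  atomicModifyIORefCAS casRef (\x -> (x + 1, ()))
  -- MVar: take the value, apply the function, put the result back
  modifyMVar_ mvar (\x -> return $! x + 1)
  -- TVar: a one-step STM transaction
  atomically (modifyTVar' tvar (+ 1))
  -- AtomicCounter: add 1 to an unboxed counter
  incrCounter_ 1 counter

  sequence
    [ readIORef ioRef
    , readIORef casRef
    , readMVar mvar
    , readTVarIO tvar
    , readCounter counter
    ]

main :: IO ()
main = incrementEach >>= print
```

In a benchmark like the one above, each of these one-line increments would replace the atomicModifyIORef call inside replicateM_.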

So what does it mean? Well, MVar is the slowest, and it is always slower than TVar. This is surprising, since I would have assumed the overhead of TVar's transactional guarantees would make it slower. Here is a good theory on why this might be the case.

The next surprising thing is that atomicModifyIORefCAS is much faster than atomicModifyIORef. I don't get this. Both atomicModifyIORefCAS and atomicModifyIORef are doing the same thing: they call a compare-and-swap primitive and retry if the swap fails because another thread changed the value first. Actually, I assumed atomicModifyIORef would be faster because its CAS loop is implemented in C. Please look over my benchmarks here to ensure I'm not doing anything stupid.
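To make the compare-and-swap loop concrete, here is a rough sketch of one written against atomic-primops's ticket API (readForCAS, peekTicket, casIORef). It illustrates the technique only and is not the library's actual implementation of atomicModifyIORefCAS; the names casModify and countWith are invented for this example.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, replicateM_)
import Data.Atomics (casIORef, peekTicket, readForCAS)
import Data.IORef (IORef, newIORef, readIORef)

-- Hypothetical CAS loop: read the current value, compute the new one,
-- and try to swap it in. If another thread got there first, the swap
-- fails and we retry with the ticket for the value that is there now.
casModify :: IORef a -> (a -> (a, b)) -> IO b
casModify ref f = readForCAS ref >>= loop
  where
    loop ticket =
      case f (peekTicket ticket) of
        (new, result) -> do
          -- force the new value first so no thunk is stored in the IORef
          (ok, current) <- new `seq` casIORef ref ticket new
          if ok then return result else loop current

-- Hypothetical driver: several threads increment one shared counter,
-- and the final value is returned once all of them have finished.
countWith :: Int -> Int -> IO Int
countWith threads perThread = do
  ref <- newIORef (0 :: Int)
  dones <- forM [1 .. threads] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ perThread (casModify ref (\x -> (x + 1, ())))
      putMVar done ()
    return done
  mapM_ takeMVar dones
  readIORef ref

main :: IO ()
main = countWith 4 10000 >>= print
```

If no updates are lost, the final count equals threads times perThread. Under contention, the cost shows up as extra trips around the loop rather than as time spent blocked on a lock.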

Conclusion

1. Prefer TVar to MVar. STM is easier to use, and it is not clear that MVar offers any performance benefit (there was none in this test).

2. atomicModifyIORefCAS is considerably faster than atomicModifyIORef.

3. AtomicCounter is very fast and performs well under contention.

Written by Jonathan Fischoff