Saturday, April 18, 2009

Synchronisation... is... slooooow...

a.k.a an argument for Shared Nothing. Continuing my !random post, I decided to see how long it would take to build up a cumulative normal distribution using the values generated by System.Random.Next(). I used and timed the following configurations:
1) Four threads on four cores sharing a Random (with thread synchronisation): 180 seconds.
2) One thread on one core with its own Random: 63 seconds
3) Four threads on four cores, each with their own Random (sharing nothing): 17 seconds.
The lesson here is that synchronisation is slow. So slow in this case, that it was actually faster to single-thread the application. However, when each thread was given its own object that it didn't have to share with the others, the speed increase was dramatic. Not only was it massively faster, but I'd put my money on it scaling pretty well with an 8, 16, 32 or even 64 way server.

Out of curiosity, I also tried this configuration:
4) Four threads on four cores, each with their own Random, locking it unnecessarily: 39 seconds.
I found it interesting that even uncontended locks could halve the speed of my algorithm.

The moral of the story is to think about your locking strategy (hint: avoid putting yourself in the position where you need one) when looking to parallelize tasks.

!random

System.Random's Next() method isn't guaranteed to be thread-safe, so if you start calling it from multiple threads at the same time, you're likely to end up with very little entropy indeed! I cooked up a static class, with a static System.Random field, and set 4 threads in motion calling Next() continuously. Each thread got its own core to run on, and very soon the only "random" number returned from Next() was 0.0 - the object had been well and truly corrupted. At this point I needed to choose: a single System.Random protected by lock() statements, or multiple System.Random objects. If I chose the single route (why?) all the synchronization would slow me down, and I'd end up not using each core to its fullest potential. If I chose the multiple route (only as many System.Random objects as there were threads), I would need to seed each one with a different value, otherwise they could - if created at the same time - return the same series across more than one System.Random object.

Interestingly, if I called the overloaded version of Next(int min, int max), soon the only return values would be min.