Friday, March 06, 2009

Goodput - Season Finale

I thought I'd take a look at a couple of different options (by no means an exhaustive list) for transferring a reasonably large file across a network. Over the past couple of days I tried sending a 700MB DivX using the .NET Remoting API (over both TCP and HTTP), the .NET Sockets API (over TCP), and finally using a mounted network share, reading the file as if it were local.

The table that follows shows the results of these tests:

I'd advocate taking a pinch of salt when interpreting the numbers.

In general, remoting doesn't support the closures the C# compiler emits when it compiles an iterator block (i.e. the yield return keyword). This was quickly remedied by exposing the IEnumerator<T> as a remote MarshalByRefObject itself, wrapping the call to the iterator block. That gave a nice-looking (easy to read) interface, but it will have increased the chattiness of the application, as every call to MoveNext() and Current requires a round trip over the network. Further to this, the default SOAP serialization used with HTTP remoting doesn't support generic classes, so I had to write a non-generic version of my Streamable<T> class.
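The wrapper looked something like this. It's a simplified sketch rather than the real code (the RemoteEnumerator name and exact shape here are for illustration only): deriving from MarshalByRefObject means the client gets a by-reference proxy to the enumerator, instead of remoting trying to serialise the compiler-generated iterator state machine by value.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Sketch: remote the enumerator by reference rather than serialising the
// compiler-generated iterator closure.
public class RemoteEnumerator<T> : MarshalByRefObject, IEnumerator<T>
{
    private readonly IEnumerator<T> inner;

    public RemoteEnumerator(IEnumerator<T> inner)
    {
        this.inner = inner;
    }

    // Every one of these members becomes a network call when invoked
    // through the client-side proxy - hence the chattiness.
    public bool MoveNext() { return inner.MoveNext(); }
    public T Current { get { return inner.Current; } }
    object IEnumerator.Current { get { return inner.Current; } }
    public void Reset() { inner.Reset(); }
    public void Dispose() { inner.Dispose(); }
}
```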

The performance of HTTP/SOAP remoting was abysmal, and there was very little gain from switching to a faster network. Even with what I suspect to be a massively chatty protocol (mine, not theirs), the bottleneck was probably somewhere else.

TCP remoting was next up. Under the covers it will have done all the marshalling/unmarshalling over a single TCP socket, but the chatty protocol (Current, MoveNext(), Current, MoveNext(), and so on) probably let it down. TCP/Binary remoting's throughput jumped 2.5x when given a 10x faster network, yet it still used just 16% of the advertised bandwidth, which points to some other bottleneck.

CIFS was pretty quick, but not as quick as the System.Net.Sockets approach. Both used around 30% of the available bandwidth on the Gigabit tests, indicating that some kind of parallelism might increase the utilization of the network link. An inverse-multiplexer could distribute the chunks evenly (round-robin) over 3 sockets sharing the same Ethernet link, and a de-inverse-multiplexer (try saying that 10 times fast, after a couple of beers) could reassemble them at the far end.
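I haven't tried this, but the sending half of the idea might look something like the sketch below (names and framing invented for illustration): each chunk gets a sequence number and length header, and is handed round-robin to one of the pre-connected sockets, so the receiver can reorder and reassemble.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

// Untested sketch of an inverse-multiplexer: spray chunks round-robin
// across several sockets that share the same link.
public static class InverseMux
{
    public static void Send(IEnumerable<byte[]> chunks, IList<Socket> sockets)
    {
        long sequence = 0;
        foreach (byte[] chunk in chunks)
        {
            Socket target = sockets[(int)(sequence % sockets.Count)];
            target.Send(BitConverter.GetBytes(sequence));      // 8-byte sequence header
            target.Send(BitConverter.GetBytes(chunk.Length));  // 4-byte length header
            target.Send(chunk);                                // payload
            sequence++;
        }
    }
}
```

The de-inverse-multiplexer on the other side would read the headers from each socket and slot the payloads back into sequence order before writing them out.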

Back on track...

Seeing as TCP/Binary remoting was the problem area that drove me to this research in the first place, I thought I'd spend a little more time trying to optimise it (without changing the algorithm/protocol/interface) by parameterizing the block size. The bigger the block size, the fewer remote calls to MoveNext() and get_Current have to be made, but the trade-off is that we have to deal with successively larger blocks of data.
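In essence, the server-side iterator block looks something like this (a simplified sketch, not the actual Streamable<T> internals; the FileChunker name is mine):

```csharp
using System.Collections.Generic;
using System.IO;

public static class FileChunker
{
    // Read the file in blockSize-byte chunks; a bigger blockSize means fewer
    // MoveNext()/get_Current round trips across the wire.
    public static IEnumerable<byte[]> Read(string path, int blockSize)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            while (true)
            {
                byte[] buffer = new byte[blockSize];
                int read = stream.Read(buffer, 0, blockSize);
                if (read == 0)
                    yield break;                      // end of file
                if (read < blockSize)
                {
                    // Last block: trim the buffer so we don't send trailing zeros.
                    byte[] tail = new byte[read];
                    System.Array.Copy(buffer, tail, read);
                    yield return tail;
                }
                else
                {
                    yield return buffer;
                }
            }
        }
    }
}
```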

What the numbers say: plotted against block size, the transmission rate traces a bit of an upside-down smile; at very low block sizes the algorithm is too chatty, at 4MB it peaks, and beyond that something else becomes the bottleneck. At the 4MB peak the remote iteration members would only have been invoked 175 times (700MB / 4MB), and the data transfer rate was 263Mb/s, roughly 89% of the 296Mb/s observed over CIFS.
