Wednesday, June 29, 2011

Version Tolerant Serialization

Somewhere in the vast gap between version 1.1 and version 4 of the .NET Framework, Microsoft came up with a solution to the version intolerance problem of serialization. I may have been living under a rock for several years, because I hear it was actually new in version 2.0.

In the object-oriented .NET Framework, memory to represent the state of an object instance is only ever allocated for fields. Not properties. Properties are syntactic sugar over the methods invoked to access the state of the object, which is always stored in fields. If you want to serialize an instance's state, it's the fields that must be written to the wire. To deserialize something off the wire (you guessed it) the fields are the destinations of the wire values.
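To see the fields-not-properties point for yourself, here is a minimal sketch that asks the framework which members a formatter would write for a type. The Counter class is made up for this illustration, and the sketch assumes the classic .NET Framework, where FormatterServices.GetSerializableMembers is available without obsolescence warnings.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical type, used only for this illustration.
[Serializable]
public class Counter {
    private int count;          // the state lives here
    public int Count {          // accessor methods only, no storage of their own
        get { return count; }
        set { count = value; }
    }
}

public static class Program {
    public static void Main() {
        // Ask which members of Counter are candidates for serialization.
        MemberInfo[] members = FormatterServices.GetSerializableMembers(typeof(Counter));
        foreach (MemberInfo member in members) {
            Console.WriteLine("{0}: {1}", member.MemberType, member.Name);
        }
    }
}
```

The private field should show up in the list and the public property should not, which is exactly why renaming or adding a field changes the wire format while renaming a property does not.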

Consider: an assembly A that exposes one type T. Initially (going against my natural desire to start counting at 0) we label them A1 and T1. And T1 looks like this:
using System;

namespace A {
    [Serializable]
    public class T {
        private int f;
        public int F {
            get { return f; }
            set { f = value; }
        }
    }
}


Another developer, D1, takes a copy of the A1 assembly and writes a fantastic application with it, connecting via (*cough*) .NET Remoting to a server that also has a copy of A1. The developer's job is done, and he retires comfortably in the Bahamas, but not before losing all the source code (and forgetting where it was even deployed).

Meanwhile, somebody working on the server team realizes that two ints are better than one, and that he can make the server even better if only he could add another int field G to type T.

Here's where the fun starts.

Prior to .NET 2.0, changing the fields of T would introduce a breaking change. Clients who only had access to A1's T1 would be unable to deserialize an instance of A2's T2, nor would they be able to serialize A1's T1 into the format required by the server (A2's T2). What they wished for (and Microsoft gave them) was:
using System;
using System.Runtime.Serialization;

namespace A {
    [Serializable]
    public class T {
        private int f;
        [OptionalField(VersionAdded = 2)]
        private int g;
        public int F {
            get { return f; }
            set { f = value; }
        }
        public int G {
            get { return g; }
            set { g = value; }
        }
    }
}

This allows the server to load A2 and serialize T2 down to the wire (and deserialize T1 off the wire).
It also allows the client to load A1 and serialize T1 down to the wire (and deserialize T2 off the wire).
Unfortunately for the fictional company stuck using .NET 1.1 with no source code, they'd have to get someone to bring them up to version 2.0 of .NET before they could appreciate the benefit.
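One wrinkle for the server team: when T2 deserializes a stream written by T1, there is no value for g on the wire, and because deserialization does not run constructors, g is simply left at 0. If 0 is not a sensible default, the serialization callbacks that also arrived in .NET 2.0 can help. What follows is only a sketch: the SetDefaults method name and the -1 sentinel are my own assumptions, and the round trip uses BinaryFormatter on the classic .NET Framework (it is obsolete in modern .NET) within a single assembly, so it shows the callback ordering rather than a real A1-to-A2 exchange.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

namespace A {
    [Serializable]
    public class T {
        private int f;

        [OptionalField(VersionAdded = 2)]
        private int g;

        // Runs before any field is populated from the stream. If the stream
        // has no value for g (a T1 sender), this default is what survives.
        // The -1 sentinel is an assumption made for this sketch.
        [OnDeserializing]
        private void SetDefaults(StreamingContext context) {
            g = -1;
        }

        public int F { get { return f; } set { f = value; } }
        public int G { get { return g; } set { g = value; } }
    }

    public static class Demo {
        public static void Main() {
            T original = new T();
            original.F = 1;
            original.G = 2;

            BinaryFormatter formatter = new BinaryFormatter();
            using (MemoryStream stream = new MemoryStream()) {
                formatter.Serialize(stream, original);
                stream.Position = 0;
                T copy = (T)formatter.Deserialize(stream);
                // The stream carried a value for g, so it overwrites the -1 default.
                Console.WriteLine("F={0}, G={1}", copy.F, copy.G);
            }
        }
    }
}
```

The callback has to be [OnDeserializing] rather than [OnDeserialized] for this to work: it runs before the fields are filled in, so a value that is present on the wire still wins over the default.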
