Thursday, October 27, 2011

Anaglyph

Red/cyan 3D anaglyphs are a mix of low and high technology. Let me explain: red objects are red because that's the color of light they reflect; all other colors are absorbed. There are three "primary colors" of light: red, green and blue. So when a transparent sheet of red film absorbs all "other" colors of light, we really just mean it absorbs green and blue.

When we mix 100% green with 100% blue we get cyan. A transparent sheet of cyan film therefore passes green and blue light, yet absorbs red light. If we take a digital photograph (composed of red, green and blue light) and view it through a transparent cyan film, the film absorbs the red light, allowing only green and blue to pass. In theory, if I have two photographs and I remove all the red from one, then view both through transparent cyan film, I won't be able to tell the images apart. One image emits no red light; the other's red light is blocked by the "filter" - in neither case does any red light reach the viewer.

What if we view an image with only red through our cyan filter? There is no green and no blue light in the image, and all red light gets absorbed by the filter. The image will appear black.


          LEFT         RIGHT
IMAGE     red only     green/blue only
FILTER    red          cyan
EYE       left only    right only



Now, put on an imaginary pair of red/cyan glasses and look at two digital photographs side by side. Close your left eye (red filter) and look with your right eye (cyan filter). Remember, an image with no green or blue appears black through cyan, so we remove these colors from the left image. An image with only green and blue appears in green and blue (there's no point in having red in the image as it would be absorbed by the filter anyway - plus it would destroy what comes next). With only your right eye, the left image is black and right image is visible. Now, close your right eye (cyan filter) and look with your left eye. The right image has gone black because it doesn't have any red - the green and blue have been absorbed by the red filter. The left image springs to life now, because it has red in it. We have found a way to show each eye a different image - and our brains do the rest.
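
If you want to try this with your own stereo pair, here's a minimal sketch in C# (using System.Drawing; the file names are hypothetical) that keeps the red channel of the left image and the green/blue channels of the right image:

using System.Drawing;
using System.Drawing.Imaging;

class AnaglyphMaker
{
    static void Main()
    {
        // left.png and right.png are a hypothetical stereo pair of equal dimensions
        using (Bitmap left = new Bitmap("left.png"))
        using (Bitmap right = new Bitmap("right.png"))
        using (Bitmap anaglyph = new Bitmap(left.Width, left.Height))
        {
            for (int y = 0; y < left.Height; y++)
            {
                for (int x = 0; x < left.Width; x++)
                {
                    Color l = left.GetPixel(x, y);
                    Color r = right.GetPixel(x, y);
                    // red for the left eye's filter, green/blue for the right's
                    anaglyph.SetPixel(x, y, Color.FromArgb(l.R, r.G, r.B));
                }
            }
            anaglyph.Save("anaglyph.png", ImageFormat.Png);
        }
    }
}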

Sunday, September 18, 2011

Diskette

I was emptying out a bunch of junk from my apartment and came across an old 3.5" floppy disk. My mind - being as it is - immediately wondered what the unit tests would look like if I were writing a virtual floppy disk (because nobody's got real floppy drives these days).

On my Diskette class, I would expose the following methods to modify/access state:
  • bool ToggleDiskAccessSlot()
  • bool ToggleWriteProtect()
  • void RotateDisk(int numSectors)
  • int Read(int cylinder, int side)
  • void Write(int cylinder, int side, int value)

Using Roy Osherove's naming strategy (a la the accepted answer to this question), I'd add a unit test class named DisketteTests. Unit tests - IMHO - are intended to ensure that public methods modify the internal state of an object as expected. A test first sets up the object under test to a known state, then invokes a method, and finally makes an assertion about the state.

I might want to test the ability to read or write while the disk access slot is closed (I'd expect some exception to be thrown, so there are no explicit calls to Assert in the method body; the assertion is made by the test framework based on the test's custom attributes):
[ExpectedException(typeof(InvalidOperationException))]
[TestMethod]
public void Read_DiskAccessSlotIsClosed_ThrowsException()
{
    Diskette d = new Diskette();
    // intentionally missing the step to ToggleDiskAccessSlot();
    d.Read(1, 1); // result discarded - this call is expected to throw
}

A more standard test might look like this:
[TestMethod]
public void Read_DiskAccessSlotIsOpen_GetsCurrentValue()
{
    Diskette d = new Diskette();
    if (!d.ToggleDiskAccessSlot())
    {
        Assert.Fail("Unable to open disk access slot");
    }
    int value = d.Read(1, 1);
    Assert.AreEqual(0, value); // Assert.AreEqual expects (expected, actual)
}
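
A round-trip test for Write would follow the same arrange/act/assert shape (again, a sketch against my imaginary API):

[TestMethod]
public void Write_DiskAccessSlotIsOpen_ReadGetsWrittenValue()
{
    Diskette d = new Diskette();
    if (!d.ToggleDiskAccessSlot())
    {
        Assert.Fail("Unable to open disk access slot");
    }
    d.Write(1, 1, 42);             // act: write a value to cylinder 1, side 1
    Assert.AreEqual(42, d.Read(1, 1)); // assert: reading it back returns the same value
}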

Saturday, September 10, 2011

Memoize

The first time I saw this word I thought someone had misspelled it. Instead - as I later found out - it's the name for a common pattern in functional programming. Memoize caches the result of a function based on its input parameters. That may not sound particularly special, but in cases where a solution is recursive and frequently calls itself with the same parameters, memoization can turn an algorithm that takes billions of years to run[1] into one that takes a fraction of a second.

Consider this problem outlined on ROT13(Cebwrpg Rhyre). Every node in the triangle has an associated maximum cost. For the elements at the base level of the triangle, that value is the value of the element itself. For an element at any higher level, the maximum cost is the value of the element itself, plus the greater of the maximum costs of the two elements below it. This property holds recursively throughout the entire graph, and we can use the memoize pattern to our advantage: instead of working out the cost of an element every time our algorithm requires it, we simply fetch it from the cache (if it isn't there yet, we compute it, add it to the cache, and use the cached value on all subsequent calls).

The original brute force C# code looked like this:
static int MaxCost(Triangle triangle, int x, int y)
{
    if (y == triangle.Depth)
    {
        return 0;
    }
    else
    {
        int left = MaxCost(triangle, x, y + 1);
        int right = MaxCost(triangle, x + 1, y + 1);
        return Math.Max(left, right) + triangle[x, y];
    }
}

We need to rewrite the method body slightly because it's self-recursive, and we can't take advantage of the memoize cache if our function just calls back into itself directly.

If we're pressed for time, we might simply code the cache into the method body:
static readonly Dictionary<Tuple<Triangle, int, int>, int> cache =
    new Dictionary<Tuple<Triangle, int, int>, int>();

static int MaxCost(Triangle triangle, int x, int y)
{
    var key = new Tuple<Triangle, int, int>(triangle, x, y);
    if (cache.ContainsKey(key))
    {
        return cache[key];
    }

    if (y == triangle.Depth)
    {
        return 0;
    }
    else
    {
        int left = MaxCost(triangle, x, y + 1);
        int right = MaxCost(triangle, x + 1, y + 1);
        int result = Math.Max(left, right) + triangle[x, y];
        cache.Add(key, result);
        return result;
    }
}

With more time on our hands, we might separate the function into two distinct logic functions, one that deals with memoization, and the other that deals with the problem at hand:
static int MemoizedMaxCost(Triangle triangle, int x, int y)
{
    var key = new Tuple<Triangle, int, int>(triangle, x, y);
    if (cache.ContainsKey(key))
    {
        return cache[key];
    }
    int result = InternalMaxCost(triangle, x, y);
    cache.Add(key, result);
    return result;
}

static int InternalMaxCost(Triangle triangle, int x, int y)
{
    if (y == triangle.Depth)
    {
        return 0;
    }
    else
    {
        int left = MemoizedMaxCost(triangle, x, y + 1);
        int right = MemoizedMaxCost(triangle, x + 1, y + 1);
        return Math.Max(left, right) + triangle[x, y];
    }
}

Note that we now have a pair of mutually recursive functions: the memoized function calls the internal one to evaluate any uncached values, and the internal one calls the memoized one to gain any of the performance benefits from memoizing. A match made in heaven?
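
As a footnote, the caching half of that pair isn't specific to triangles at all. Here's a minimal sketch of a general-purpose memoizer (the Memoize helper below is my own illustration, not a framework method):

static Func<T, TResult> Memoize<T, TResult>(Func<T, TResult> f)
{
    var cache = new Dictionary<T, TResult>();
    return arg =>
    {
        TResult result;
        if (!cache.TryGetValue(arg, out result))
        {
            result = f(arg);        // compute once...
            cache.Add(arg, result); // ...and remember the answer
        }
        return result;
    };
}

Recursive problems still need the restructuring shown above, because the recursive call sites must go through the memoized delegate rather than straight back into the raw function.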

Monday, September 05, 2011

Covariance / Contravariance

This is a little experiment to get your head around co- and contravariance, as applied to generic interfaces in C#. Covariance allows the type of a computation's result to vary; contravariance allows the type of its input to vary. There are three interfaces in the code sample. The first demonstrates covariance by emulating a function that only returns a result. The second demonstrates contravariance by emulating an action that only takes an input. The third demonstrates both co- and contravariance. Let's look at the code:
#region Covariance
interface IFunction<out TResult>
{
    TResult Invoke();
}

class Function<TResult> : IFunction<TResult>
{
    public TResult Invoke()
    {
        throw new NotImplementedException();
    }
}
#endregion

#region Contravariance
interface IAction<in T>
{
    void Invoke(T arg);
}

class Action<T> : IAction<T>
{
    public void Invoke(T arg)
    {
        throw new NotImplementedException();
    }
}
#endregion

#region Both
interface IFunction<in T, out TResult>
{
    TResult Invoke(T arg);
}

class Function<T, TResult> : IFunction<T, TResult>
{
    public TResult Invoke(T arg)
    {
        throw new NotImplementedException();
    }
}
#endregion

None of the methods are implemented, because a) I'm lazy and b) you don't need to see an implementation to witness the co- or contravariance first hand. The next code block shows the phenomena at work:
IFunction<object> f1 = new Function<string>();   // covariant: a producer of string stands in for a producer of object
IAction<string> a1 = new Action<object>();       // contravariant: a consumer of object stands in for a consumer of string
IFunction<string, object> f2 = new Function<object, string>(); // both at once

Notes:

  1. In the case of covariance, the generic argument type of the class is more specific than the interface (covariance: we can always implicitly cast from string to object when reading from the container).
  2. In the case of contravariance, the generic argument type of the interface is more specific than the class (contravariance: we can always implicitly cast from string to object when writing to the container).
  3. In the mixed case, we can see covariance for the output and contravariance for the input.

Monday, August 15, 2011

Real World Scalability

Armed with [edit: never quite] all there is to know about solving scalability problems, it's just short of impossible to walk onto a client site and fix their poorly performing application. Changes to the system require an understanding of what the system does and its intended interactions with other systems. This complex hierarchy of dependencies starts with lines of code (hopefully encapsulated in object-oriented classes if it's C#) and quickly jumps from other processes on the same machine to distributed processes owned by other teams (especially if SOA has been employed). How do we know when we've broken some undocumented feature? Unit tests? Hardly anybody uses them. Integration tests? Does your team even know what they are? Who writes these tests, and when?

If the tests don't already exist, it will be much more difficult to determine that a scalability fix hasn't negatively altered the required behaviour of a process. But in a time when the bottom line is the final word and employers are encouraged to differentiate developers by price (for example: choosing a large offshore team), it's less common to see the tests written up front.

Friday, July 29, 2011

Map Reduce Example in F#

A while back I blogged about the canonical map reduce example (as seen in the Hadoop user manual) of counting words. Today I noticed that F#'s List.fold has a near-alias in List.reduce (reduce simply seeds the accumulator with the first list element). I had already seen List.map, so I put two and two together, and pretty soon I had a tiny program up and running in F# Interactive. Enjoy:
/// maps an input into a tuple of input and count
let mapper x =
    (x, 1)

/// reduces a list of input and count tuples, summing the counts
let reducer (a:(string * int) list) (x:string * int) =
    if List.length a > 0 && fst (List.head a) = fst x then
        (fst (List.head a), snd (List.head a) + snd x)::List.tail a
    else
        x::a

/// maps the mapper function over a list of input
let map xs =
    List.map (mapper) xs

/// folds the reducer function over a list of input and count tuples
let reduce (xs:(string * int) list) =
    List.fold (reducer) [] xs

/// maps the input, sorts the intermediate data, and reduces the results
/// e.g. mapReduce ["the"; "quick"; "the"] evaluates to [("the", 2); ("quick", 1)]
let mapReduce xs =
    map xs
    |> List.sort
    |> reduce

It turns out that functional programming appears to be made for this pattern!

Monday, July 25, 2011

FaultException<T>

If you're stuck trying to get WCF to play nicely with a FaultException<T> where T is a custom System.Exception-derived type, you may be interested to know that the exception needs to have this protected constructor in order to work as expected:
[Serializable]
public class NotFoundException : ApplicationException {
    public NotFoundException() {
    }

    protected NotFoundException(SerializationInfo info, StreamingContext context)
        : base(info, context) {
    }
}
Without it, WCF will not deserialize your custom exception properly, and you'll (at best) be able to catch a non-generic FaultException (with the rather unhelpful reason of "The creator of this fault did not specify a Reason.").
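
For completeness, here's a hedged sketch of the service side: the operation declares the fault contract, then throws the typed fault (IWidgetService, Widget and the repository lookup are hypothetical names for illustration):

using System.ServiceModel;

[ServiceContract]
public interface IWidgetService {
    [OperationContract]
    [FaultContract(typeof(NotFoundException))] // tells WCF this fault may flow back to clients
    Widget GetWidget(int id);
}

// somewhere in the service implementation:
public Widget GetWidget(int id) {
    Widget widget = repository.Find(id); // hypothetical lookup
    if (widget == null) {
        throw new FaultException<NotFoundException>(
            new NotFoundException(),
            new FaultReason("No widget with id " + id));
    }
    return widget;
}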

Sunday, July 24, 2011

Monad

Don't expect a great answer from me as to what constitutes a monad; there's already a good question/answer on Stack Overflow.

Basically, it's a bit of glue that joins two functions together. I decided to write one for myself in C# to see what it would look like without any syntactic sugar that other languages might provide:

public static class Binding {
    public static Func<TInput, TOutput> Bind<TInput, TIntermediate, TOutput>(
        Func<TInput, TIntermediate> left,
        Func<TIntermediate, TOutput> right) {
        return arg => right(left(arg));
    }
}

It's designed so that the output of left "flows" into the input of right. The output of the rightmost function is returned to the caller as the result of the expression. Here are two functionally equivalent examples (one written normally, and the other with monads):
Func<string, IEnumerable<char>> js = s => s
    .Select(c => c - '0')
    .Where(i => i % 2 == 0)
    .Select(i => (char)(i + 'a'))
    .Select(c => Char.ToUpper(c));

Func<string, IEnumerable<char>> ks = Binding.Bind<IEnumerable<char>, IEnumerable<int>, IEnumerable<char>>(
    k1 => k1.Select(c => c - '0'),
    Binding.Bind<IEnumerable<int>, IEnumerable<int>, IEnumerable<char>>(
        k2 => k2.Where(i => i % 2 == 0),
        Binding.Bind<IEnumerable<int>, IEnumerable<char>, IEnumerable<char>>(
            k3 => k3.Select(i => (char)(i + 'a')),
            k4 => k4.Select(c => Char.ToUpper(c)))));

Check that they work as expected:
Debug.Assert(ks("0123456789").SequenceEqual(new char[] {'A','C','E','G','I'}));

Wednesday, June 29, 2011

Version Tolerant Serialization

Somewhere in the vast gap between version 1.1 and version 4 of the .NET Framework, Microsoft came up with a solution to the version intolerance problem of serialization. I may have been living under a rock for several years, because I hear it was actually new in version 2.0.

In the object-oriented .NET Framework, memory to represent the state of an object instance is only ever allocated for fields*. Not properties. Properties are syntactic sugar applied to methods invoked to access the state of the object, which is always stored in fields. If you want to serialize an instance's state - it's the fields that must be written to the wire. To deserialize something off the wire - you guessed it - the fields are the destinations of the wire values.

Consider: an assembly A that exposes one type T. Initially (going against my natural desire to start counting at 0) we label them A1 and T1. And T1 looks like this:
namespace A {
    [Serializable]
    public class T {
        private int f;
        public int F {
            get { return f; }
            set { f = value; }
        }
    }
}


Another developer, D1, takes a copy of the A1 assembly and writes a fantastic application with it, connecting via (*cough*) .NET Remoting to a server that also has a copy of A1. The developer's job is done, and he retires comfortably in the Bahamas, but not before losing all the source code (and forgetting where it was even deployed).

Meanwhile, somebody working on the server team realizes that two ints are better than one, and that he can make the server even better if only he could add another int field G to type T.

Here's where the fun starts.

Prior to .NET 2.0, changing the fields of T would introduce a breaking change. Clients who only had access to A1's T1 would be unable to deserialize an instance of A2's T2, nor would they be able to serialize A1's T1 into the format required by the server (A2's T2). What they wished for (and Microsoft gave them) was:
namespace A {
    [Serializable]
    public class T {
        private int f;
        [OptionalField(VersionAdded = 2)]
        private int g;
        public int F {
            get { return f; }
            set { f = value; }
        }
        public int G {
            get { return g; }
            set { g = value; }
        }
    }
}

This allows the server to load A2 and serialize T2 down to the wire (and deserialize T1 off the wire).
It also allows the client to load A1 and serialize T1 down to the wire (and deserialize T2 off the wire).
Unfortunately for the fictional company stuck using .NET 1.1 with no source code, they'd have to get someone to bring them up to version 2.0 of .NET before they could appreciate the benefit.
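
Here's a minimal sketch of the effect, simulating the two versions with two namespaces and a SerializationBinder (all names are hypothetical, and public fields keep it short); the missing optional field simply falls back to its default when the old payload is deserialized into the new type:

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

namespace V1 { [Serializable] public class T { public int F; } }
namespace V2 { [Serializable] public class T { public int F; [OptionalField(VersionAdded = 2)] public int G; } }

class Version2Binder : SerializationBinder {
    public override Type BindToType(string assemblyName, string typeName) {
        // pretend every T on the wire is the new version
        return typeof(V2.T);
    }
}

class Program {
    static void Main() {
        var formatter = new BinaryFormatter();
        var stream = new MemoryStream();
        formatter.Serialize(stream, new V1.T { F = 42 }); // the "old client" writes T1

        stream.Position = 0;
        formatter.Binder = new Version2Binder();
        var t2 = (V2.T)formatter.Deserialize(stream);     // the "new server" reads T2
        Console.WriteLine("{0} {1}", t2.F, t2.G);         // prints "42 0" - G defaults
    }
}

Remove the [OptionalField] attribute and the deserialization throws, which is exactly the pre-2.0 breaking change described above.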

Tuesday, June 21, 2011

Ambient Transactions

With a title like that, you'd be forgiven for thinking that the post was going to be about purchasing a beer at Cafe del Mar while watching the sun set. Unfortunately not.

First, consider this sample block of code:
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(commandText, connection))
{
    // do something
}


There is no mention of any transactions; if you're like me, you'd think that two things hold by the end of the block:
  • SQL Server no longer holds any locks for the data accessed in the block, and
  • the connection was returned to the pool for somebody else to use.
Wrong on both counts.
See what happens if the block is wrapped in another block (even a couple of frames higher on the stack):
using (TransactionScope transactionScope = new TransactionScope())
{
    // substitute original block here
}

Although nothing in our inner block explicitly references any transactions, an ambient transaction (i.e. one on the current thread) has been set up by our outer block, and SQL Server enlists in it. At the point the inner block completes, the transaction is incomplete; although the connection is returned to the connection pool, it sits in a cordoned-off section of the pool where it cannot be used by any thread that isn't sharing the same transaction.

Let's imagine we set Max Pool Size=1 in our connection string. This means we have one connection in the pool, but it's only available to the ambient transaction. If we try to obtain another connection from the pool on a different thread with no transaction or a different transaction (even a different ambient transaction), we would time out waiting. If, instead, we repeated the inner block twice within the outer block, it would be fine: the second acquisition of a connection from the connection pool would grab the one with the (still open) ambient transaction. If we shared our transaction with another thread, we'd be able to acquire that same connection from the connection pool too.

Here's a fun exercise for the imagination: Set Max Pool Size=2; then open two connections concurrently within the same TransactionScope. You'll automatically have enlisted in a distributed transaction (not just the lightweight transaction outlined in the first part of the post). Hey presto, you're sharing a transaction across more than one SQL connection into the same server!

There are several points to take away from this post:
  • You can force connections not to enlist in ambient transactions by using the Enlist=false connection string property (see the sketch below).
  • You can implicitly participate in transactions (whether lightweight or distributed) even when you're not expecting to.
  • The connection pool is slightly more complex than it first seemed - even returning a connection to the pool doesn't guarantee its availability for other operations.
  • Locks can remain held long after the connection has been disposed (returned to the pool).
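
To make the first point concrete, here's a minimal sketch (the connection strings are hypothetical): the first connection enlists in the ambient transaction and its work rolls back unless Complete() is called; the second opts out with Enlist=false and commits immediately:

using (TransactionScope transactionScope = new TransactionScope())
{
    using (SqlConnection enlisted = new SqlConnection(
        @"Server=.;Trusted_Connection=true;"))
    {
        enlisted.Open(); // automatically enlists in the ambient transaction
        // commands on this connection are transactional
    }

    using (SqlConnection independent = new SqlConnection(
        @"Server=.;Trusted_Connection=true;Enlist=false;"))
    {
        independent.Open(); // ignores the ambient transaction
        // commands on this connection commit immediately
    }

    transactionScope.Complete(); // omit this and the enlisted work rolls back
}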


Sunday, June 12, 2011

External Sort

If you're ever tasked with sorting a data set larger than can fit into the memory of a single machine, you shouldn't need to panic. Put on your outside-the-box hat (pun intended) and get to work.

First of all, divide the data into blocks that are small enough to be sorted in memory, then sort them and write the results to disk (or network, or anywhere external to your process). If memory size is M, and the total data to be sorted is M×N, then you should now have N blocks of locally sorted data.

Next, do an N-way merge. I did it by getting N buffered readers over the N blocks. By continually taking the next lowest value from the pool of N readers (it's easy if you keep the readers sorted by their last obtained value in ascending order) and writing the obtained values into an output file, you end up with a single, globally sorted output.
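
Here's a minimal sketch of that merge in C#, assuming each block is a text file containing one sorted value per line (the file names come from the caller; for very large N a heap would serve better than re-sorting the pool):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ExternalMerge
{
    static void Merge(string[] blockPaths, string outputPath)
    {
        // one buffered reader per locally sorted block
        var readers = blockPaths.Select(p => new StreamReader(p)).ToList();
        var pool = new List<Tuple<string, StreamReader>>();

        // prime the pool with the first value from each block
        foreach (var reader in readers)
        {
            string line = reader.ReadLine();
            if (line != null) pool.Add(Tuple.Create(line, reader));
            else reader.Dispose(); // empty block
        }

        using (var writer = new StreamWriter(outputPath))
        {
            while (pool.Count > 0)
            {
                // take the reader holding the lowest value
                pool.Sort((a, b) => string.CompareOrdinal(a.Item1, b.Item1));
                var lowest = pool[0];
                pool.RemoveAt(0);
                writer.WriteLine(lowest.Item1);

                // advance that reader, dropping it once its block is exhausted
                string next = lowest.Item2.ReadLine();
                if (next != null) pool.Add(Tuple.Create(next, lowest.Item2));
                else lowest.Item2.Dispose();
            }
        }
    }
}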

For most people attempting a very large sort, this is usually the end result (I am assuming not a lot of people have this requirement very often, and even less frequently is it their first time at seeing it.) If you're left wanting, however, you must continue...

For large values of N it first becomes prohibitive to buffer the input, then even to access one value from every block at the same time. In this case, you will need to perform a second (or higher) pass, working on fewer than N blocks at a time. A 32-bit Windows machine should only begin to approach this next hurdle somewhere after the tens-of-terabytes mark (depending, of course, on the size of the objects being sorted)...

Asynchronous ASP.NET MVC

Since ASP.NET MVC 2, Microsoft has shipped the AsyncController class in the framework, enabling asynchronous ASP.NET MVC applications without forcing developers to hand-craft their own weird and wonderful solutions. The AsyncController exposes an AsyncManager property, which allows you to increment/decrement the number of outstanding operations, and to collect arguments to pass through to the XxxCompleted method when all operations are complete. To use said controller, do this:

Derive your controller from System.Web.Mvc.AsyncController, which is treated differently by ASP.NET, and allows you access to the AsyncManager.

For each logical asynchronous action you need, provide a pair of methods that follow a set naming convention, where the method name prefix matches the action name:
1) the first method's name is suffixed with Async and its return type is void
2) the second method's name is suffixed with Completed, its parameters match those set up in the AsyncManager, and its return type is an ActionResult

For example:
public void IndexAsync()
{
    ViewData["Message"] = "Welcome to Asynchronous ASP.NET MVC!";
    AsyncManager.OutstandingOperations.Increment();
    WebService.WebService webService = new MvcApplication1.WebService.WebService();
    webService.HelloWorldCompleted += (sender, e) =>
    {
        AsyncManager.Parameters["greeting"] = e.Result;
        AsyncManager.OutstandingOperations.Decrement();
    };
    webService.HelloWorldAsync();
}

public ActionResult IndexCompleted(string greeting)
{
    ViewData["AsyncMessage"] = greeting;
    return View();
}


In case the execution flow doesn't appear obvious: on receipt of a request for the Index action, ASP.NET uses reflection to find the method pair with the prefix Index and invokes the first half of the method pair (IndexAsync) on its thread pool. The method implementation declares one asynchronous operation to the AsyncManager. We use the Event-based Asynchronous Pattern to call a demo ASMX web service asynchronously from this client - the Completed event handler sets a parameter value and decrements the number of outstanding operations. ASP.NET waits for the number of outstanding operations to reach zero, then looks for an IndexCompleted method with a string parameter named "greeting" (because this is what we called the parameter when we assigned the result on the AsyncManager, during the web service completed event handler). It invokes it (the second half of the method pair) and the rest - they say - is history.

Friday, June 10, 2011

Asynchronous ASMX Web Services

To keep threads from blocking in an ASMX web service, Microsoft have given us a nifty pattern to employ: instead of declaring your method with a signature like this
[WebMethod]
public ReturnType Test(ArgumentType arg);
you declare a pair of methods. The first is prefixed with Begin, returns an IAsyncResult, and takes an additional AsyncCallback and some state.
The second is prefixed with End and takes an IAsyncResult. When you provide these methods, ASP.NET treats them as a pair and ensures they are called at the right times.
The idea is that you can spawn as much asynchronous work as you like in the Begin method, then when you're ready, invoke the asyncCallback passed into the Begin method. This signals ASP.NET to call your End method, which is responsible for returning the final result of the function pair.

[WebMethod]
public IAsyncResult BeginTest(string user, AsyncCallback asyncCallback, object state)
{
    SqlConnection connection = new SqlConnection(@"Server=???;Async=true");
    SqlCommand command = new SqlCommand(string.Format(@"WAITFOR DELAY '00:00:04' SELECT '{0}'", user), connection);
    connection.Open();
    return command.BeginExecuteReader(asyncCallback, command);
}


[WebMethod]
public string EndTest(IAsyncResult asyncResult)
{
    SqlDataReader reader = null;
    SqlCommand command = (SqlCommand)asyncResult.AsyncState;
    try
    {
        reader = command.EndExecuteReader(asyncResult);
        do
        {
            while (reader.Read())
            {
                return reader.GetString(0);
            }
        } while (reader.NextResult());
        throw new InvalidOperationException("No results returned from reader.");
    }
    finally
    {
        if (reader != null)
            reader.Dispose();
        command.Connection.Close();
        command.Dispose();
        command.Connection.Dispose();
    }
}


In a service implemented asynchronously like this, 100 different clients could concurrently (and synchronously) execute calls to Test with the server efficiently allocating just 1 thread to service all the requests.
But there's a distinction: the server is asynchronous, yet the client calls are still synchronous by default (unless you skipped ahead and read the next bit).
That is to say, if a client with just one CPU core attempted to make 100 simultaneous calls simply by threading the requests, the throughput wouldn't be great, and there would be the overhead of having 100 threads context switching, garbage collecting etc.
It would be a far better option to make the calls asynchronously from the client too. After all, a web service call is I/O.

Visual Studio 2010 gives us an option when we generate the Web Service Reference - it's a check box titled "Generate asynchronous operations".
The proxy generated when this option is checked conforms to Microsoft's Event-based Asynchronous Pattern (EAP).
You subscribe to a completion event, and invoke the proxy method (which returns void). Once the response is ready, the event is raised and your callback invoked, at which point you get the result (or error).
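
For illustration, the client code ends up looking something like this (a hedged sketch: the proxy and member names below assume a web reference named TestService wrapping the Test method above; the names generated in your project may differ):

TestService.Service proxy = new TestService.Service();
proxy.TestCompleted += (sender, e) =>
{
    // invoked when the response arrives; no client thread is blocked in the meantime
    Console.WriteLine(e.Result);
};
proxy.TestAsync("user1"); // returns immediately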

This gives us an incredibly efficient way of working. With just one thread on the client, and one thread on our web server, (and potentially just one thread on the SQL Server in our simple "WAITFOR" example) we can make 100 calls in the same 4* seconds it takes to make just 1 call. We could probably stretch it to 1000 calls even! The point is that a blocked thread (usually) harms a system's performance, whether it's the end client, an intermediate server, or the end server crunching the numbers.

Wednesday, June 08, 2011

Asynchronous ASP.NET / SQL Server

We all agree that waiting is evil, right? Well, threads are evil too! Ok, we need a few threads to get the job done, but they (ideally) shouldn't ever block.

Achieving high concurrency has never been only about running operations on multiple threads. In fact, the best performance is when the thread count is a number very close to the hardware thread count[1].

.NET provides us with the Asynchronous Programming Model (APM) that gives us the ability to use non-blocking operations that might otherwise have blocked up a thread waiting for them to complete. It allows us to create as few threads as are absolutely necessary and to use them efficiently.

Let's say we have an ASP.NET web application (ignoring MVC until Part Two). The page will display a piece of data that takes 30 seconds to calculate, based on some input data. The calculation doesn't happen locally; in fact it's running on a cluster of big SQL Server boxes. We have 1000 users, and the database query that takes 30 seconds scales like magic (even at 2000 users, it still takes less than 45 seconds on average). Can we put the ASP.NET part of this application onto just one box? The answer is yes - I'll show you.

First of all, we want ASP.NET to treat our page differently to a regular ASP.NET page; when we begin our asynchronous operation, we want to signal ASP.NET that the thread we started with is now free for the next request. We also want ASP.NET to call us back when our result is ready. This is done by setting the following page attribute:
<%@ Page Async="true" ...

Our request will now be handled by an IHttpAsyncHandler and will be processed in four phases instead of one.

Synchronously:

  1. PreInit, Init, InitComplete, PreLoad, LoadComplete, PreRender, PreRenderComplete, SaveState, SaveStateComplete, Render

Asynchronously:

  1. PreInit, Init, InitComplete, PreLoad, LoadComplete, PreRender

  2. Begin

  3. End

  4. PreRenderComplete, SaveState, SaveStateComplete, Render

See what's happened here? Instead of synchronously performing phases 1 and 4 as a single operation (on the same thread) like we would do in a regular ASP.NET page, we break the long running task into 4 chunks, running each independently, and allowing .NET to efficiently allocate tasks to physical threads. That's correct: now we have no guarantee that Render will be called on the same thread as Init. And why should it? No reason, that's why! (However, there's every possibility it *might* be more efficient to use the same physical thread - yes, just one - if the server in question had only a single core CPU. The point is we shouldn't write code that expects this to have been the case). Also note that two new phases have been sandwiched in between 1 and 4: Begin (2) and End (3).
To use Begin and End, we need only to (write and) register our callbacks, and ASP.NET will ensure they're called.
protected void Page_Load(object sender, EventArgs e)
{
    AddOnPreRenderCompleteAsync(
        command_BeginExecuteReader,
        command_BeginExecuteReader_AsyncCallback
    );
}

We register (as many as we need) pairs of Begin and End methods that match the following signatures (both the Begin and End methods are called asynchronously, hence the async look and feel of the parameter lists):
public delegate IAsyncResult BeginEventHandler(object sender, EventArgs e, AsyncCallback cb, object extraData);

public delegate void EndEventHandler(IAsyncResult ar);


So far so good (hopefully), but all this would be pointless if SQL Server didn't also give us asynchronous connections that can call back into the thread pool when the command completes. To use it, set
Asynchronous Processing=true;
in the connection string and code up your operation to the APM pattern:
void command_BeginExecuteReader_AsyncCallback(IAsyncResult asyncResult)
{
    SqlCommand command = (SqlCommand)asyncResult.AsyncState;
    SqlDataReader reader = command.EndExecuteReader(asyncResult);
    do
    {
        while (reader.Read())
        {
            Response.Write(/*do-something-with-the-reader*/);
        }
    } while (reader.NextResult());
    reader.Dispose();
    command.Connection.Close();
    command.Dispose();
    command.Connection.Dispose();
}

IAsyncResult command_BeginExecuteReader(object sender, EventArgs e, AsyncCallback cb, object extraData)
{
    SqlConnection connection = new SqlConnection(@"Server=Scandium;Trusted_Connection=true;Asynchronous Processing=true;");
    SqlCommand command = new SqlCommand(@"/*hectique-code*/", connection);
    connection.Open();
    return command.BeginExecuteReader(cb, command);
}


Did you spot the ASP.NET trick that allows this magic to work? See how our Begin method takes an AsyncCallback from the caller (ASP.NET)... well, we pass that callback to SQL Server instead of directly wiring up our End method as we might in a Windows application. That way, as soon as SQL Server is ready, ASP.NET gets notified (not us directly), which gives control to our page for us to handle the response; then, when we're done (i.e. the End method call stack is unwound), ASP.NET moves onto the final phase to do its PreRenderComplete etc.

This is how it's done. Simples. For the sake of clarity I've not bothered with error handling in the example. Resource management across threads rules out C#'s using statement, so implementers of IDisposable are dealt with in a slightly tricky way. Thrown exceptions would need special care too.

[1] I made this figure up, but it does seem to prove itself anecdotally quite a lot.

Sunday, February 27, 2011

Great Zero Challenge

Don't just throw your old hard disks out (especially if they're still inside a laptop, as that only makes casual analysis easier) without first wiping them clean. A long time ago, according to Internet folklore, somebody began the Great Zero Challenge: any professional firm that could recover data from a disk written full of zeros would win $40. Not a princely sum, but the challenge was never taken up. To protect your own disk, All You Need To Do(tm) is run this command to copy zeros from the /dev/zero device onto the HDD that's going in the skip.

sudo ddrescue -n /dev/zero /dev/sdb zero_log.txt

Saturday, February 26, 2011

Supersize Me

The Mac mini is not a great beast; its 120GB hard drive is very easy to fill. Seeing as it's smaller than my iPod Classic and there's no more space on the computer, I needed to upgrade - I needed to supersize my Mac.

Ingredients:

  • 1x Mac mini
  • 1x instructional video
  • 1x 9.5mm x 2.5" SATA hard drive of suitable proportions (see mine)
  • 2x regular dinner knives (if you're not an art-school dropout with putty knife to hand)
  • 1x spare PC running ubuntu with 2x free SATA ports (you'll watch the video off this too!)


The bit in the middle of the video that's not covered:
  • Stop the video and shut down.
  • Take the incumbent drive and your new drive and plug them into the ubuntu box.
  • Work out the device names of each drive and don't confuse their order in the next command.
  • Run sudo ddrescue /dev/sdb /dev/sdc ./sdb_sdc_log.txt (substituting in the correct device names of your drives - you don't want to overwrite the good one with the blank one's zeroes) to copy the incumbent drive's image onto the new drive.
  • Shut down when complete. You may want to disconnect the incumbent drive and put it somewhere safe. Boot again. Note that the new drive may have a different device name.
  • The new drive will be confused. Is it a tiny 120GB or a whopping 640GB? The GPT thinks it's small, which is incorrect. Trying to "print" the partition table in parted will present you with the option of fixing this. Do it.
Run sudo parted /dev/sdb
GNU Parted 2.3
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Warning: Not all of the space available to /dev/sdb appears to be used, you can
fix the GPT to use all of the space (an extra 1015822080 blocks) or continue
with the current setting?
Fix/Ignore? Fix
Model: ATA SAMSUNG HM641JI (scsi)
Disk /dev/sdb: 640GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End    Size   File system  Name                  Flags
 1      20.5kB  210MB  210MB  fat32        EFI system partition  boot
 2      210MB   120GB  120GB  hfs+         Customer

Sweet. Now fire up the GUI that's found under System > Administration > Disk Utility.
  • Create a new partition from the remaining space.
  • Shut down.
  • Put the Mac mini back together following the steps in the video.
  • Boot your Mac mini.
  • Resist the urge to open iTunes. Don't open it. Seriously. It will **** things up.
  • Use the system utilities to format the new partition identically to the existing 120GB partition. Name the old partition CODE and the new one DATA.
  • Move the contents of your Music folder to a new folder on DATA.
  • When the move is complete, open a terminal window, ensure your music is safely on DATA, and move to your user's home directory. Then run:

sudo rmdir Music
sudo ln -s /Volumes/DATA/Music Music

You will now have a supersized hard drive, and a symbolic link that allows iTunes to access your music as if nothing had changed.
For the grand finale, open iTunes and play your favourite song.

Sunday, January 23, 2011

Mercator

There's a ship anchored in Oostende harbour named Mercator, after the Flemish cartographer Gerardus Mercator. I've seen it a few times and registered that there must be something vaguely interesting about it, but the light bulb in my brain never flicked on. It turns out that it did a lot of boring stuff (with the exception of bringing back two Moai from Easter Island), but the cartographer after whom it was named is of much more interest to me lately: Google Maps uses a variant of the Mercator projection to produce a flat rectangular map of the Earth (an ellipsoid). The mathematics behind this projection can be seen at Wolfram.

I set myself a task. I have a Garmin GPSMap 62st: a great little device (if a little limited), but the utility I can derive from it is tightly coupled to the availability of maps. What's the point in knowing you're at 13°09′47″S 72°32′44″W if the map around is just empty black space? Armed with a 1024 x 1024 pixel map (the maximum tile size for the 62st) in Mercator projection, I isolated two points, in diagonally opposite corners, to get the most distance between them. Google Earth allowed me to drop markers and get the corresponding latitude and longitude co-ordinates for each pair of (x,y) pixels.

Degrees are easily converted into radians (and back again), and radians of latitude can easily be converted into the unitless measures of height used in the Mercator projection, while degrees of longitude don't need any conversion at all. If you look at a complete map of the earth, you'll see the areas surrounding the poles are most distorted. Anyway, long story short:

Given a pair of (N,E) coordinates and their respective (X,Y) positions on the raster map, it's pretty simple to work out what the left, top, right and bottom boundaries of the entire map are, and hence easy to create the KML and KMZ file required to correctly position a raster map (of the Mercator projection) on the 62st.

Longitude first because it's easiest: X2-X1 gives the pixel width between the two selected points on the raster map; X1-X0 gives the pixel padding to the left edge, and X3-X2 gives the pixel padding to the right. We know the angle of longitude at E1 and E2, so we can work out a ratio of pixels to degrees using (X2-X1)/(E2-E1). Dividing X1-X0 and X3-X2 by that ratio, and adding the correct starting point, we get the longitudinal boundaries.

Latitude has some tricky hyperbolic and trigonometric functions (as seen on Wolfram): an angle of latitude N (in radians) is represented at height H (no units) above the equator, where H = log(tan(N) + 1/cos(N)). So N1's height is H1 = log(tan(N1) + 1/cos(N1)) and N2's height is H2 = log(tan(N2) + 1/cos(N2)). The formula (Y2-Y1)/(H2-H1) gives us the ratio of pixels to "heights". It's simple to get to Y0 and Y3 from Y2 (and/or Y1) using just addition and subtraction, so with the ratio it's equally simple to get the "heights" H0 and H3 at Y0 and Y3. Getting back to radians from a "height" is as simple as N0 = atan(sinh(H0)). All that's left is to convert the answer from radians back into degrees. Simples.
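
Here's a minimal sketch of the whole calculation in C#, assuming two reference points on a 1024 x 1024 tile (the pixel positions and coordinates below are hypothetical placeholders for the values read off Google Earth):

using System;

class MercatorBounds
{
    // forward Mercator: latitude (radians) -> unitless height
    static double LatToHeight(double lat) { return Math.Log(Math.Tan(lat) + 1.0 / Math.Cos(lat)); }

    // inverse Mercator: unitless height -> latitude (radians)
    static double HeightToLat(double h) { return Math.Atan(Math.Sinh(h)); }

    static double ToRadians(double d) { return d * Math.PI / 180.0; }
    static double ToDegrees(double r) { return r * 180.0 / Math.PI; }

    static void Main()
    {
        const int width = 1024, height = 1024;

        // hypothetical reference points: (pixel x, pixel y) -> (lat, lon) in degrees
        double x1 = 40, y1 = 30, lat1 = -13.05, lon1 = -72.70;
        double x2 = 990, y2 = 1000, lat2 = -13.25, lon2 = -72.40;

        // longitude is linear in x
        double degPerPx = (lon2 - lon1) / (x2 - x1);
        double west = lon1 - x1 * degPerPx;
        double east = west + width * degPerPx;

        // latitude is linear in Mercator height, with pixel y growing downwards
        double h1 = LatToHeight(ToRadians(lat1));
        double h2 = LatToHeight(ToRadians(lat2));
        double heightPerPx = (h2 - h1) / (y2 - y1);
        double north = ToDegrees(HeightToLat(h1 - y1 * heightPerPx));
        double south = ToDegrees(HeightToLat(h1 + (height - y1) * heightPerPx));

        Console.WriteLine("N {0} S {1} W {2} E {3}", north, south, west, east);
    }
}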