Remember that we called
Map()
once* for every book. We now need to call Reduce once* for every distinct word. public void Reduce(string key, IEnumerable<byte> values)
{
long total = 0;
foreach (byte value in values)
total += value;
Emit(key, total);
}
Take a look at the intermediate results:
SLICE:0
WORDS:[("", ),("", )]
SLICE:1
WORDS:[("", ),("", )]
We're only going to call
Reduce()
once* per distinct word, but the key "the" exists in two of our slices. Both of those slices need to be logically joined before Reduce()
can be called, with the combined intermediate values from both slices. Seeing as we're not using a framework, we need to manually set up some sort of routing that will ensure matching keys are joined. This is why Reduce()
takes a key and a list of values as its input.We used 5
Mapper
objects, but there's no requirement for the same number of Reducer
s. Because we don't care about optimum performance yet, I've chosen to use 3. In real life, we would tune the number based on the hardware that's available, and the amount of data that needs to be pushed around to complete the process.Here's a simple routing function that partitions words based on their initial letter, always routing matching words to the same reducer:
if (word[0] <= 'e')
// route to reducer 0
else if (word[0] <= 'r')
// route to reducer 1
else
// route to reducer 2
As it turns out, the naive routing function will split our data like this:
Reducer 0 | Reducer 1 | Reducer 2 |
---|---|---|
a | feathers | sticking |
and | first | the |
beginning | god | till |
butt | heaven | up |
cared | in | which |
chicken | make | who |
created | not | would |
crossed | or | years |
does | reported | you |
down | road | your |
earth | ||
egg |
The program continues to execute, this time calling
Emit()
once for every word, and its final (total) count. The behaviour of Emit()
is not concrete, but remember that it's (probably) being called on different physical machines, and the final results - once emitted - need to be routed back to the start.
No comments:
Post a Comment