Saturday, April 25, 2020

Binary vs Decimal, Part 2

Microsoft's .NET framework gives developers options for representing floating point numbers as well as integers. The same is true of Microsoft's SQL Server, though the options differ subtly.

Remembering from my earlier post that we can convert between base 10 (decimal) and base 2 (binary) integers with no loss of information, it shouldn't surprise you that .NET and SQL Server offer similar integer types with varying number of bits. Even though we're storing numbers that we entered as a sequence of digits (i.e. literals) in C# source or SQL script, they are converted into base 2 integers. However, this post is not about integers.

If, instead of integers, we wish to store fractions (e.g. negative powers of our chosen bases) we must ask a few questions, the first being:

Are we trying to represent a fractional number that has a base 10 representation? If so, we already know that the conversion from base 10 to base 2 cannot necessarily be done without losing information. We can very easily represent the decimal number 0.625 in binary because it's simply the sum of 0.500 (or 2-1) and 0.125 (or 2-3), but we cannot exactly represent the decimal number 0.99 because the number of base 2 powers required to add up to 0.99 would quickly exceed our chosen precision (the number of bits available to represent the number.) Instead, if wanted to store our decimal number in a binary field with a fixed number of bits, we would need to approximate the decimal.

0.989999949932098388671875
0.9900000095367431640625
0.990000069141387939453125

The three numbers above are actually sums of powers of 2 and can be stored in a 24 bit binary field without losing information[1]. Notice that the middle number is closest to 0.99.

But what if we wanted to represent the price of our chocolate: 34.99? In order to fit into a field of the same width we would need to adjust the largest exponent and make a corresponding adjustment to
smallest exponent. Our closest "sum of powers of 2" approximation might be:

34.990001678466796875

In binary this same number would look like

100010.111111010111000011

2524232221202-12-22-32-42-52-62-72-82-92-102-112-122-132-142-152-162-172-18
100010111111010111000011

We're ready to ask ourselves the second question: can our application tolerate the approximation in converting from base 10 to base 2? In applications where the inputs, intermediate values and outputs are not exact numbers to start with, (e.g. in the fields of science and statistics, to name a few) the answer is probably yes, but certain applications with exact inputs (e.g. in the field of finance) might strictly require us not to approximate the values. The next section presents a way to circumvent the approximation.

In this series' first post, we learned that we can convert numbers between all bases without loss of information, but only when the lowest power for both bases is b^0 or 1. Some smart engineers realised that if we pre-scale our number by some chosen scale so that it becomes an integer, then we can post-scale it when we're done operating on it. Going back to our example number of 0.99, we could pre-scale it by 10^2 (or 100) giving us 99, which converts exactly into the binary number 0110 0011. Now we're able to store the exact value in 7 bits when previously we could only store the approximation in 24 bits.

At this point SQL Server and .NET diverge:

SQL Server offers the DECIMAL data type, which has a fixed precision and scale. In other words, when you define a value of this type you explicitly provide the precision and scale (e.g. DECIMAL(26, 2). If we wanted to store the number 31.42 in this field, SQL Server would pre-scale it by 10^2, convert the number 3142 to binary, and store the binary representation.

.NET, on the other hand, offers the System.Decimal struct type which is a floating point number. The scale that's chosen to pre-scale any given decimal number is packed and stored along with the binary integer representation, and can be accessed to post-scale the integer back into a fraction when needed. The struct is 128 bits wide, and stores the integer part in 96 of those bits. 2^96 is approximately 7.92282E+28. We can create a System.Decimal with anywhere between 0 and 27 (or 28) digits and we choose where the floating decimal point will go.

It might be worth emphasizing that SQL Server's DECIMAL is not floating point. The distinction between fixed and floating point is probably best made by looking at where the scale value is stored.

Converting between SQL Server's DECIMAL and .NET's System.Decimal can take a little forethought.

[1] https://www.h-schmidt.net/FloatConverter/IEEE754.html
[2] https://docs.microsoft.com/en-us/sql/t-sql/data-types/decimal-and-numeric-transact-sql

No comments: