Sunday, January 13, 2019

Wake On Lan (WOL)

Wake On Lan (WOL) is possible on the ASUS M3A78-CM board, but it requires a level of finesse to get it working. First, the cryptically named BIOS option "Power > APM Configuration > Power on From S5 by PMEs" must be enabled. Then, it's important to note that WOL won't work when the power switch is off (i.e. SB_PWR LED is still lit, but the computer has been shut down). Instead, the machine needs to be merely suspended in order to pick up the "magic" WOL ethernet packet. You can suspend it with the command:

sudo systemctl suspend

You'll also need to decide what to do with the BIOS setting "Power > APM Configuration > Restore on AC Power Loss"... [Last State] won't bring it back to suspended after an AC power loss, but it will bring it back to running if the server was running when the power went off. If the power goes out when you're suspended, then you're up a certain creek without a paddle.

On waking, some network drivers need to be reloaded. Check my github's private ubuntu/WOL repository for the script that does this, or follow the steps in this accepted answer: ubuntu 18.04 - ethernet disconnected after suspend

To send a WOL packet, install and run etherwake from some other machine in the same subnet:

sudo apt install etherwake
sudo etherwake -i enp1s0 00:22:15:DE:31:42

Monday, November 19, 2018

Hypothesis Testing Errors

Hypothesis testing allows us to quantify relationships between known samples and the unknown population from which they were taken. What do I mean by that? Let's say I am interested in the returns from investing in a particular stock. The daily return might be calculated as the ratio of today's closing price over yesterday's closing price. Whether I was to take a sample of 30 returns (over 31 days) or 60 returns (over 61 days), I still couldn't know the population mean return, but I could hypothesize about it... so I do.

I choose a "null hypothesis" that the population mean is 4.5%. Given that my sample mean was 5% and there was a 2% standard deviation, 30 observations would produce a test statistic of \(\frac{0.05 - 0.045}{\frac{0.02}{\sqrt{30}}} = 1.37\) standard deviations. In other words, the p-values would be 8.5% and 91.5%. For an 80% confidence two-tailed test (10% in the left tail and 10% in the right tail) we would reject the hypothesis, but at 90% confidence we would accept it. Note how we've already accepted or rejected the hypothesis regardless of its truth.

Now, imagine an all-seeing and all-knowing supernatural bystander watching the same events unfurl... they could know the population mean... and even though they wouldn't be obliged to share the exact value with me, let's say that they'd at least tell me if my hypothesis was true or false; that is to say: if I hypothesized that the population mean was 4.5% and it actually was 4.5% then the hypothesis would be true, otherwise if would be false (it could be 4.2% or 4.3% or even -5% or 63%; the point is we don't know).

If we take our two test results and combine them with the two possible truth values, it produces this 2X2 matrix of outcomes.

TrueCorrectType 1 Error
FalseType 2 ErrorCorrect

  • True/False: does the actual population mean match the hypothesized mean?
  • Reject/Accept: does our statistic fall outside/inside the confidence interval?
  • Correct/Incorrect: did we accept a true (or reject a false) hypothesis or did we commit an error?

Let's ask the bystander if our hypothesis was indeed true or if it was false.

  • Yes, it's true:
    • At 90% we accepted it
    • At 80% we rejected it (Type 1 Error)
  • No, it's false:
    • At 90% we accepted it (Type 2 Error)
    • At 80% we rejected it.
This last pair of possibilities deserves more analysis: when the bystander tells us our hypothesis was false, it doesn't seem to matter why we were correct to reject the hypothesis; all that matters is that we did.

This video tries to explain it but I am not confident the author is correct. I'd prefer to side with authors of the CFA curriculum who say - in short - "it's complicated".

Wednesday, November 07, 2018

Fourier Transform in C#

I am fascinated by the Fourier transform; specifically: its ability to decompose a wave into a series of fundamental waves. Mathematicians might prefer to see it described it more formally, but here I'll be trying to make it understandable.

Obviously, for a wave to have motion, time needs to pass. If we freeze time then our wave cannot oscillate. To say a wave has frequency, we need first to observe a period in which the wave has motion. The longer we observe its movement the more we can mathematically extract from it, but for now let's simply say that we need to acquire a range of samples measured as the wave oscillates. From this point, armed with our array of floating point numerical samples, we menacingly approach the rabbit hole.

The remainder of this post will focus on a single practical aspect of the Fourier transform: how to interpret the result, especially its magnitude. I'm using the MathNet.Numerics package from Nuget, which provides the handy method Fourier.Forward(samples, FourierOptions.NoScaling).
Note: you need to specifically pass FourierOptions.NoScaling if you want control of the output.

Since I grew up in an age when Compact Discs were seen as technological marvels, I am going to use the standard 44.1 kHz as my sample rate for the following example. A full second's worth of samples (44100) might seem a reasonable place to start, but as we'll see later, it's by no means a requirement. The chosen sample rate theoretically limits the upper frequency to 22050 Hz, and even though we pass in 44100 samples to Fourier.Forward we get back a symmetrical result with the first 22050 data points carrying the frequencies up to 22050 Hz, and the second 22050 carrying them all in again reverse. We can safely ignore the second half of the output for this post. If our input signal (encoded in the 44100 samples) was a pure sinusoidal signal (of any frequency up to 22050 Hz) oscillating between +1 and -1, it will contribute 22050 units to the frequency bin associated with the signal. Those units can be found in the Complex32.Magnitude property. Multiplying the amplitude of the input signal by any constant A will scale the magnitude to A x 22050. However, if we pass in a fraction of the full second's worth of samples - let's say \(\frac{4}{5}\), which is 35280 instead of 44100 - then we see the number of units fall to \(\sqrt{\frac{4}{5}}A\frac{samplerate}{2}\). And yes, the number would rise if the fraction was greater than one.

It's probably important to note that we haven't changed the sample rate: that's still 44100 and it still means we cannot get any higher frequencies than 22050 Hz. What we've done is alter the resolution. With \(\frac{4}{5}\) of the original resolution each frequency bin would be \(\frac{5}{4}\) as wide so that we could still accommodate all 22050 Hz, but in fewer distinct bins. It's not a net loss, though; it's a trade off: we've lost resolution in the frequency domain but we've gained resolution in the time domain (there will be \(\frac{5}{4}\) as many Fourier transforms in any given time period). Of course, this now means we cannot simply read off the frequency from the bin number; now we need to scale the bin number by our resolution factor. Staying with the \(\frac{4}{5}\) example: 800 Hz would now be found in bin 640, which is \(\frac{4}{5}\) of 800.

So... why did we choose to use the Complex32.Magnitude property, and what does it mean? Each of the data points we get as output from Fourier.Forward is a complex number, having both a Complex32.Real and an Complex32.Imaginary part. Besides these two orthogonal vectors pointing in the real and imaginary axes, another way to interpret them is as a single vector that has two properties: Complex32.Magnitude and Complex32.Phase. Together, these allow us to imagine the frequency bin as a clock with a single spinning hand of variable length. If the hand is long: there's more signal; if it's short: there is less. And the angle of the hand shows us the offset of the signal (relative to convention).

Input Shift Re Im Mag Phase
+0 0 -22050 22050 \(-\frac{π}{2}\)
+\(\frac{π}{2}\) 22050 0 22050 0
0 22050 22050 \(\frac{π}{2}\)
+\(\frac{3π}{2}\) -22050 0 22050 π

Finally, how did we get 22050 for the maximum Complex32.Magnitude given a maximum amplitude of 1 (and minimum amplitude of -1) for our input signal? Well, that's how the Fourier transform works: it essentially adds up the 22050 vector results of pairing the input signal with the known pure sinusoidal signal.

Sunday, October 28, 2018

Demand and Supply: Part 2

In the previous post, there was a table of coefficients for average cost (AC), total cost (TC) and marginal cost (MC). Let's decompose TC into two components: one that doesn't vary as quantity (Q) varies and another that does. We'll call the components total fixed cost (TFC) and total variable cost (TVC). We can also decompose AC into average fixed cost (AFC) and average variable cost (AVC)

Q-1 a a 0       0
Q0 b 0 b a a 0 b
Q1 c 0 c b 0 b 2c
Q2 d 0 d c 0 c 3d
Q3       d 0 d

AFC: \(p = \frac{a}{q}\)
AVC: \(p = b + cq + dq^2\)
TFC: \(p = a\)
TVC: \(p = bq + cq^2 + dq^3\)

Note how \(AC = AFC + AVC\) and \(TC = TFC + TVC\)
Note too how MR is unrelated to TFC, but has the same relation to both TC and TVC.

Demand and Supply: Part 1

Let's use a little bit of mathematics to assist our understanding of demand and supply, focusing here on the slightly more complicated concept of supply (demand functions identically, but we don't often decompose revenue into fixed and variable components). In the study of economics we often encounter three "curves" that describe how the price of an item is related to the quantity (Q) a firm is willing to supply: average cost (AC), total cost (TC) and marginal cost (MC). These three functions are mathematically related by two equations: \(AC = \frac{TC}{Q}\) (which also means that \(TC = ACxQ\)) and \(\frac{dTC}{dQ} = MC\) (which also means that \(TC_q = \int_0^q MC dQ\)). It's too soon to explain, but these are not quantities the firm is willing to supply, but rather the quantities the firm is able to supply.

A third degree polynomial function should be sufficient for this demonstration. In the table below, I've displayed the coefficients of each power of Q, highlighting the "fixed" component of TC (at Q0) and AC (at Q-1) noting that MC has no such "fixed" component.

Q-1 a 0
Q0 b a b
Q1 c b 2c
Q2 d c 3d
Q3 d

AC: \(p = \frac{a}{q} + b + cq + dq^2\)
TC: \(p = a + bq + cq^2 + dq^3\)
MC: \(p = b + 2cq + 3dq^2\)

If the firm has any fixed costs (e.g. monthly rental payments on commercial property), they would be represented by the coefficient \(a\); if it has variable costs (e.g. monthly wages payable to labour) they would be represented by the coefficients \(b\),\(c\) and \(d\) (these are the costs that vary as Q is varied.

Thursday, September 13, 2018


There are some nice instructional videos on YouTube that explain eigenvectors and eigenvalues. What follows is a brief summary as well as a caveat or two.

We start with a square transformation matrix M, which can have one or more eigenvectors. If you restrict eigenvalues to real numbers (i.e. if you exclude imaginary numbers) then it's possible to have a transformation matrix without any real eigenvalues (e.g. in a pure rotation). It's also possible to have just one eigenvalue but an infinite number of eigenvectors (e.g. common scale in all axes) ... but I've probably distracted you too much already...

What's an eigenvector? It's a vector that only changes (stretches/squeezes) in scale along its span when transformed by M.
What's a span? That's a radial line to any vector from the origin of the coordinate system.
Each eigenvector has a corresponding scalar eigenvalue λ that, when the two are multiplied, has the same result as M being multiplied by that eigenvector.
It's a rather circular definition so far:
\(\matrix{M}\times\vec{v} = λ\times\vec{v}\)
We already know about a transformation matrix that represents a uniform scale of λ, and that's \(λ\times\matrix{I}\) (it's the identity matrix, but instead of 1s running down the diagonal, it's got λs). Now we have a starting point for our calculation:
\(\matrix{M}\times\vec{v} = (λ\times\matrix{I})\times\vec{v}\)
As tempting as it looks, we should never divide by a vector. Instead, what we want next is the value of λ that results in:
\((\matrix{M} - (λ\times\matrix{I}))\times\vec{v} = \vec{0}\)
Let's let:
\(\matrix{T} = \matrix{M} - (λ\times\matrix{I})\)
Ordinarily, it would now be trivial to solve for the eigenvector using:
\(\matrix{T}\times\vec{v} = \vec{0}\)
\(=> \matrix{T}^{-1}\times\vec{0} = \vec{v}\)
But we cannot do that in this instance, because:
\(det(\matrix{T}) = 0\)
And that means that T is not invertible. But surely that means we've missed a step? Nobody said we need the determinant to be zero. What's happening? I'll explain: if \(\vec{v} = \vec{0}\) then anything multiplied by \(\vec{v}\) will also equal \(\vec{0}\).
If we restrict our eigenvector to non-zero vectors then the only way we can get a zero vector after transformation is if the transformation matrix collapses a dimension: i.e. the determinant (or scale) of the transform must be zero. That's how we get our constraint.

Note: Matrices are either invertible or singular. A matrix is invertible iff the determinant is non-zero, and it is singular iff the determinant is zero. Our transform T must be singular. That means: it collapses down by a dimension.

So, if the determinant of our 2X2 transformation matrix is zero:
\(det\begin{bmatrix}a-λ & b \\ c & d-λ \\\end{bmatrix} = 0\)
Then we know (from the Leibniz formula of a determinant) \((a-λ) \times (d-λ) - c \times b = 0\)
And we can expand to:
\(λ^2 - λ(a+d) + (a \times d) - (c \times b) = 0\)
That's just a quadratic, easily solved (but take care with complex numbers, i does appear in rotation matrices). So we have our eigenvalue(s) λ and (by substitution) also T. But how do we find the eigenvectors if we cannot take the inverse of T? Well, you can try a technique like row reduction. But don't expect a unique solution... in fact, because the transform has a determinant of zero, the eigenvector is strictly a ratio of X:Y in two dimensions (or X:Y:Z in three dimensions, etc.) and you're going to find the lines overlap; in jargon: they are linearly dependent.

Friday, August 31, 2018

Vinyl LP Record

In this post I will detail one 12" stereophonic 33⅓ vinyl LP record. Initially I was hoping to be able to image the grooves cut into the surface and recover the audio track visually, but the resolution of a borrowed microscope demonstrated that the task was beyond its capabilities. Unfortunately that's left me to issue a caveat from the outset: reader beware; no experimental fact checking has taken place...

First, some mathematical properties:
Rotations per minute: 33⅓ = 100/3
Rotations per second: => 5/9ths
Outermost groove radius: 146.05mm
Innermost groove radius: 60.3mm
This gives us about 85.75mm of playable radius.
We're told that each side can contain up to 23 minutes (1380 seconds) of audio, the playback of which would require the disc to rotate 767 times (5/9 x 1380).
Fitting 767 rotations into 85.75mm gives us roughly 0.112mm separation between adjacent edges of the groove.
Finally, we define the spiral travelled by the stylus at time t as some initial offset less 0.112mm for each 2πθ of rotation as it moves from the outermost to the innermost radius.
let r(t) = 146.05 - 0.112 * (5 / 9) * t;
Let's switch to using µm, as things are starting to get tiny!

Note: the groove on a stereophonic record is a 90 degree cut rotated 45 degrees so that it looks like a V cut into the vinyl. The inner surface (closest to the center of the record) modulates one of the stereo audio tracks, while the outer surface (closest to the edge of the record) modulates the other. To increase likelihood of the stylus tracking the groove the V should never be shallower (or, equivalently, narrower) than 25µm.

We'll divide the 112µm groove into a handful of regions:
  1. First, our mandatory 25µm notch, which we later split into an inner 12.5µm and an outer 12.5µm.
  2. An optional (but recommended) gap
  3. Inner and outer channels for modulating the left and right stereo data
Mathematically we say the groove width is made up of these components: 112µm - 25µm - gap = 2 * audio;
This tells us: once we've picked a size for the gap, we will know the maximum width for each of our audio channels; we don't need to refer to the gap size again once we've worked out the audio channel widths.

Now it's time to scale our input audio signals... For simplicity, let's ignore the bitness of a digital audio signal and assume that the signal amplitude fits into the range from +1 to -1. Note that the 0 point maps exactly onto the half-channel.
*Another assumption: positive samples have shallower grooves*
Remember: at time t, the stylus can be found at radius = r(t) [defined above].
let inner and outer be the left and right audio samples (numbers between -1 and +1) at time t. Combined with the rotation of the disc (giving us θ at time t) these four equations below define the groove geometry...
Surface: inner radial offset = (audio / 2) * (inner - 1) - 12.5µm
Surface: outer radial offset = (audio / 2) * (1 - outer) + 12.5µm
Groove: radial offset = (audio / 2) * (inner - outer) / 2
Groove: vertical offset = (audio / 2) * ((inner + outer) / 2 - 1) - 12.5µm

The image below is a stylized aid for identifying the described parts of a groove, as seen looking down the groove from head on. The image shows the inner channel modulated to a positive audio sample and the outer channel modulated to a negative audio sample. The stylus is offset to the right as a result of this stereo modulation.
  • Green: maximum and minimum audio modulation points (where audio signal is +1 or -1)
  • Orange: the imaginary "zero" line
  • Light blue: surface of the inner modulation area
  • Red: surface of the outer modulation area
  • Black: stylus position
  • Dark gray: uncut portion of vinyl LP record
  • Light gray: mandatory unusable 25µm notch
  • do we map left to inner and right to outer?
  • do we map positive to high (shallower V) and negative to low (deeper V)
  • what's an appropriate audio channel width? if we know the maximum modulation in the 45 degree vector, we can scale it by SQRT(2) to find the channel width.
  • is it appropriate to map the zero sample point to half of the audio channel width? it might

Thursday, May 03, 2018

Numerical Integration Basics

Consider the equation \(y = 4\). If we plot this on two axes, x and y, it has no slope; it's just a horizontal line. We could also write the equation as \(y = 0x + 4\) because the value of y is unaffected by the value of x. We could also write it as \(y = f(x)\).

The integral of a function can be used to find the area between the function and the x-axis and is written \(I = \int_a^b f(x) dx\). It's essentially the sum of tiny changes in x (\(dx\)) multiplied by their corresponding y values (\(f(x)\)). The equation reminds us that we need to restrict ourselves to a lower and an upper value of x (otherwise the area would be infinite).

Let's choose a = x = 0 (the y-axis) and b = x = 3 as those two bounds.

\(I = \int_0^3 4 dx\)
\(I = \left[4 x\right]_0^3\)
\(I = \left(4\times3\right) - \left(4\times0\right)\)
\(I = 12\)

We're not restricted to 0 as the lower bound, but it exposes an interesting property: the area under a-to-b is equal to the area under 0-to-b less the area under 0-to-a. Let's try the same integral as above, but from 0-to-1.

\(I = \int_0^1 4 dx\)
\(I = \left[4 x\right]_0^1\)
\(I = \left(4\times1\right) - \left(4\times0\right)\)
\(I = 4\)

We might then subtract 4 from 12 to arrive at 8, but why not do this in one go?

\(I = \int_1^3 4 dx\)
\(I = \left[4 x\right]_1^3\)
\(I = \left(4\times3\right) - \left(4\times1\right)\)
\(I = 8\)