Gaussian Moments 20/08/20


Normal Distributions

The Gaussian distribution (or normal distribution) is ubiquitous in mathematics and its applications. Most people outside of science will have encountered it as the "bell curve" when taking or talking about IQ tests and other standardised measurements, or perhaps in an introductory statistics course.

In the wild, normal distributions crop up all the time, thanks to a result known as the central limit theorem. It states that if we take a sample \(\{X_{i}\}_{i=1}^{N}\) of independent random variables with some (finite) mean \(\mu\) and (finite) variance \(\sigma^{2}\), the variable \(Z = \frac{\frac{1}{N}\sum_{i}X_{i}-\mu}{\sigma/\sqrt{N}}\) will be approximately normally distributed when \(N\) is sufficiently large. This is useful for statistical models of real-world processes: even if data is not "normal" we can "standardise" it by transforming to \(Z(\{X_{i}\})\), and this standardised data can then be put through the various statistical tests statisticians have developed. This was particularly useful before computers, when any (sufficiently large!) sample could be standardised and compared against the same book of "statistical tables".
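
As a quick illustration, here is a minimal sketch (Python, assuming numpy is available; the sample sizes are my own choices) that standardises the means of uniform samples and checks that the result looks standard normal:

```python
# Sketch: empirical check of the central limit theorem.
# Draw N uniform samples (mean 0.5, variance 1/12), standardise the sample
# mean, and repeat; the collected Z values should look standard normal.
import numpy as np

rng = np.random.default_rng(0)

N = 500          # sample size
trials = 5_000   # number of standardised means to collect
mu, sigma = 0.5, np.sqrt(1 / 12)  # mean and std of Uniform(0, 1)

samples = rng.uniform(0, 1, size=(trials, N))
Z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(N))

print(Z.mean(), Z.std())  # should be close to 0 and 1
```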

A great example of a wild normal distribution is the Galton board, shown as a gif below. What we see here are pebbles falling through a set of pegs. On hitting a peg, we can assume a pebble has two options: fall left or right with some probability \(p\). Given enough pebbles, we hope this probability stays roughly consistent (despite pebble-pebble collisions). Thankfully it does work, and we end up with a binomial distribution in the final slots. What's more, given enough rows of pegs this binomial approximates a normal distribution, by the central limit theorem! A small simulation of the board is sketched after the gif.

[gif: Galton board; the binomial distribution converging to a normal]
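
Here is the simulation sketch (Python with numpy; the board dimensions are my own choices). Each pebble's final slot is just its number of rightward bounces, which is binomial by construction:

```python
# Sketch of a Galton board: each pebble makes `rows` independent left/right
# choices with probability p, so the final slot index is Binomial(rows, p).
import numpy as np

rng = np.random.default_rng(1)

rows, pebbles, p = 12, 5_000, 0.5
slots = rng.binomial(rows, p, size=pebbles)  # slot = number of rightward bounces

counts = np.bincount(slots, minlength=rows + 1)
for slot, count in enumerate(counts):
    print(f"slot {slot:2d}: {'#' * (count // 50)}")  # crude text histogram
```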

The Maths

Let's look at the functional form of a normal distribution. $$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \enspace -\infty < x < \infty $$ The exponential gives us a strong concentration of density around the mean \(\mu\), which decays at a rate set by the magnitude of the variance \(\sigma^{2}\). A high variance means more spread, and a low variance means a stronger peak and a thinner spread, with the special case \(\sigma \rightarrow 0\) being a Dirac delta function picking out the mean.
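
To make that concrete, here is a tiny sketch (Python, assuming numpy) evaluating the density at the mean for a few variances; the peak height \(1/\sqrt{2\pi\sigma^{2}}\) grows as \(\sigma\) shrinks:

```python
# Sketch: the normal density as written above, evaluated at the mean
# for shrinking sigma to show the peak sharpening.
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

for sigma in (2.0, 1.0, 0.5, 0.1):
    print(f"sigma={sigma}: p(mu)={normal_pdf(0.0, sigma=sigma):.3f}")
```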

Shape

These ideas of location and shape are not just applicable to the Gaussian distribution; in fact they form an integral part of how distributions are categorised. Let's remind ourselves what the mean and variance actually are in terms of \(p(x)\). If the random variable \(X\) is defined by the probability density function \(p(x)\) on the range \(\Omega\), then the mean and variance are $$ \mathbb{E}[X] = \int_{x\in\Omega} x p(x)dx$$ $$ \text{Var}[X] = \int_{x\in\Omega} (x-\mathbb{E}[X])^{2}p(x)dx$$ For the mean, the equation translates as "the sum of all possible values \(x\) of \(X\) multiplied by their probabilities of occurring"; for the variance, it means "the sum of the squared deviations of all possible values \(x\) of \(X\) from the mean of \(X\), multiplied by their probabilities of occurring". It's a bit of a mouthful, but essentially it tells us the average squared distance between realisations of the random variable \(X\) and its mean value.
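
As a sanity check, here is a sketch (Python, assuming scipy is available) evaluating both integrals numerically for the standard normal:

```python
# Sketch: the mean and variance integrals above, computed numerically
# for the standard normal with scipy.integrate.quad.
import numpy as np
from scipy.integrate import quad

def p(x):  # standard normal density: mu = 0, sigma = 1
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

mean, _ = quad(lambda x: x * p(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mean) ** 2 * p(x), -np.inf, np.inf)

print(mean, var)  # ~0.0 and ~1.0
```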

Notice that both of these integrals actually come from the same family $$ \biggl\{\int_{x\in\Omega}(x-c)^{k}p(x)dx : \enspace c \in\Omega, \enspace k \in \mathbb{N}\biggr\} $$ where \(\mathbb{N} = \{1,2,3,\ldots\}\). In particular, the mean is the element with \(k=1\) and \(c=0\), and the variance is the element with \(k=2\) and \(c=\mathbb{E}[X]\). Each element of this family is known as the \(k\)-th moment centred on \(c\). Higher moments are also useful (but more complicated!), such as \(k=3\), which is related to the skewness of a distribution, and \(k=4\), which is related to its peakedness (kurtosis). The next figure shows a few of these values for the Normal \(\mathcal{N}\), Exponential \(Exp\) and Binomial \(Bin\) distributions.

[Figure: low-order moments of the Normal, Exponential and Binomial distributions]
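
For a feel of the numbers, here is a sketch (Python with scipy; the helper names are my own) computing the first few moments centred on the mean for the standard normal. Odd central moments vanish by symmetry:

```python
# Sketch: the k-th moment centred on c, as defined above, for the
# standard normal. Expected values for k = 1..4 are 0, 1, 0, 3.
import numpy as np
from scipy.integrate import quad

def p(x):  # standard normal density
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def moment(k, c=0.0):
    val, _ = quad(lambda x: (x - c) ** k * p(x), -np.inf, np.inf)
    return val

for k in (1, 2, 3, 4):
    print(f"k={k}: {moment(k):+.3f}")
```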

Fractional Moments

So far so good, but what about the restriction on \(k\) that we slipped in, namely \(k\in\mathbb{N}\)? Well, the "problem" of course is that when \(k=1/2\), say, some distributions, like the normal distribution, are defined over negative values. Since the square root of a negative number is complex, some of the moments are suddenly not so easy to interpret. Fractional moments may not be quite as intuitive as their integer cousins, but nothing is gained without exploring. For the standard normal distribution the fractional moments are defined by the function \(z(k):\mathbb{R}\rightarrow\mathbb{C}\) $$ z(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}x^{k}e^{-\frac{x^{2}}{2}}dx $$ To get an idea of what this thing does, we can approximate this integral numerically. The plots below are valid for steps of \(dx=0.01, dk=0.005\) over the range \((-100,100)\). So not quite \((-\infty,\infty)\), but still beautiful!

[Figure: real and imaginary parts of the fractional moments \(z(k)\) of the standard normal]
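
For reference, a minimal sketch (Python with numpy; my own naming) of how the approximation above might be computed. Casting \(x\) to complex lets numpy take the principal value of \(x^{k}\) for negative \(x\):

```python
# Sketch: Riemann-sum approximation of z(k) over (-100, 100) with dx = 0.01,
# matching the discretisation described in the text.
import numpy as np

dx = 0.01
x = np.arange(-100, 100, dx).astype(complex)  # complex so x**k is defined for x < 0
weights = np.exp(-x.real ** 2 / 2) / np.sqrt(2 * np.pi)

def z(k):
    return np.sum(x ** k * weights) * dx

for k in (0.5, 1.0, 1.5, 2.0):
    val = z(k)
    print(f"k={k}: {val.real:+.3f} {val.imag:+.3f}i")  # z(1) ~ 0, z(2) ~ 1
```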