Gaussian Moments 20/08/20


Normal Distributions

The Gaussian distribution (or normal distribution) is ubiquitous in mathematics and its applications. Most people outside of science will have encountered it as the "Bell curve" when taking or talking about IQ tests and other standardised measurements, or perhaps in an introductory statistics course.

In the wild, normal distributions crop up all the time, due to a result known as the central limit theorem. This states that when we take a sample \(\{X_{i}\}_{i=1}^{N}\) of random numbers with some (finite) mean \(\mu\) and (finite) variance \(\sigma^{2}\), the variable \(Z = \frac{\frac{1}{N}\sum_{i}X_{i}-\mu}{\sigma/\sqrt{N}}\) will be normally distributed when \(N\) is sufficiently large. This is useful for statistical models of real world processes: even if data is not "normal" we can "standardise" it by transforming to \(Z(\{X_{i}\})\), and this standardised data can then be put through the various statistical tests statisticians have developed. This was particularly useful before computers, when any (sufficiently large!) sample could be standardised and compared to the same book of "statistical tables".
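To see this concretely, here is a minimal sketch (not from the original post) that standardises the means of exponential samples, which are decidedly non-normal, and checks that the result behaves like a standard normal; the sample sizes and the choice of \(Exp(1)\) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exp(1) has mean 1 and standard deviation 1, but is very skewed.
mu, sigma = 1.0, 1.0
N, trials = 1_000, 10_000  # sample size, number of repeated samples

samples = rng.exponential(scale=1.0, size=(trials, N))
# Standardise each sample mean: Z = (mean - mu) / (sigma / sqrt(N))
Z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(N))

# By the central limit theorem Z should be roughly standard normal.
print(Z.mean(), Z.std())  # close to 0 and 1
```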

A great example of a wild normal distribution is the Galton board, shown as a gif below. What we see here are pebbles falling through a set of pegs. On hitting a peg we can assume a pebble has two options, fall left or right with some probability \(p\), and given enough pebbles we hope this probability is roughly consistent (despite pebble-pebble collisions). Thankfully it does work, and we end up with a Binomial distribution in the final slots. What's more, given enough pebbles this approximates a normal distribution by the central limit theorem! A simple simulation of this process is sketched after the figures.

[Figure: animation of a Galton board]
[Figure: binomial distribution converging to a normal distribution]
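As a rough sketch of the board itself (assuming each pebble turns left or right independently with \(p=0.5\), and ignoring pebble-pebble collisions), the slot a pebble lands in is just the number of right turns it makes, which is Binomial:

```python
import numpy as np

rng = np.random.default_rng(1)

rows, pebbles, p = 20, 100_000, 0.5
# Each right turn shifts the pebble one slot right, so the final
# slot index is Binomial(rows, p).
slots = rng.binomial(rows, p, size=pebbles)

# Crude ASCII histogram of the slot counts: a discrete bell curve.
counts = np.bincount(slots, minlength=rows + 1)
for slot, n in enumerate(counts):
    print(f"{slot:2d} {'#' * (n // 500)}")
```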


The Maths

Let's look at the functional form of a normal distribution.
$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \enspace -\infty < x < \infty $$
The exponential gives us a strong concentration of density around the mean \(\mu\), which decays at a rate depending on the magnitude of the variance \(\sigma^{2}\). A high variance means more spread, and a low variance means a stronger peak and a thinner spread, with the special case of \(\sigma \rightarrow 0\) being a Dirac delta function picking out the mean.
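As a quick sanity check of the formula (a sketch, with the grid and the values of \(\sigma\) chosen purely for illustration), we can evaluate the density directly and watch the peak flatten as the variance grows:

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

x = np.linspace(-4, 4, 9)
for sigma in (0.5, 1.0, 2.0):
    # Larger sigma: lower peak at the mean, fatter tails.
    print(sigma, np.round(normal_pdf(x, sigma=sigma), 3))
```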

Shape

These ideas of location and shape are not just applicable to the Gaussian distribution; in fact they form an integral part of how distributions are categorised. Let's remind ourselves what the mean and variance actually are in terms of \(p(x)\). If the random variable \(X\) is defined by the probability density function \(p(x)\) on the range \(\Omega\), then the mean and variance are
$$ \mathbb{E}[X] = \int_{x\in\Omega} x p(x)dx $$
$$ \text{Var}[X] = \int_{x\in\Omega} (x-\mathbb{E}[X])^{2}p(x)dx $$
For the mean, the equation translates as "the sum of all possible values \(x\) of \(X\) multiplied by their probabilities of occurring"; for the variance, it means "the sum of all squared deviations of the possible values \(x\) of \(X\) from the mean of \(X\), multiplied by their probability of occurring". It's a bit of a mouthful, but essentially it tells us the average squared distance between realisations of the random variable \(X\) and its mean value.
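We can check both integrals numerically for the standard normal (a sketch using scipy.integrate.quad, which handles the infinite range; the standard normal is an assumed example):

```python
import numpy as np
from scipy.integrate import quad

def p(x):
    """Standard normal density."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# E[X] = integral of x p(x) dx over the whole real line.
mean, _ = quad(lambda x: x * p(x), -np.inf, np.inf)
# Var[X] = integral of (x - E[X])^2 p(x) dx.
var, _ = quad(lambda x: (x - mean) ** 2 * p(x), -np.inf, np.inf)

print(mean, var)  # close to 0 and 1
```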

Notice that both these integrals actually come from the same family
$$ \biggl\{\int_{x\in\Omega}(x-c)^{k}p(x)dx : \enspace c \in\Omega, \enspace k \in \mathbb{N}\biggr\} $$
where \(\mathbb{N} = \{1,2,3,\ldots\}\). In particular the mean is the element with \(k=1\) and \(c=0\), and the variance is the element with \(k=2\) and \(c=\mathbb{E}[X]\). This family is known as the \(k\)-th moments centered on \(c\). Higher moments are also useful (but more complicated!), such as \(k=3\), which is related to the skewness of a distribution, and \(k=4\), which is related to the peakedness of the distribution (kurtosis). The next figure shows off a few of these values for the Normal \(\mathcal{N}\), Exponential \(Exp\) and Binomial \(Bin\) distributions.

[Figure: moments of the Normal, Exponential and Binomial distributions]
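The general family is just as easy to approximate numerically. The sketch below (again assuming the standard normal as the example density) computes the first four central moments; for the standard normal they come out as \(0, 1, 0, 3\), the last being the kurtosis:

```python
import numpy as np
from scipy.integrate import quad

def p(x):
    """Standard normal density."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def moment(k, c=0.0):
    """The k-th moment of p centered on c."""
    val, _ = quad(lambda x: (x - c) ** k * p(x), -np.inf, np.inf)
    return val

mean = moment(1)  # c = 0 gives the mean
for k in range(1, 5):
    print(k, round(moment(k, c=mean), 4))  # 0, 1, 0, 3
```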


Fractional Moments

So far so good, but what about the restriction on \(k\) that we slipped in? Namely \(k\in\mathbb{N}\). Well, the "problem" of course is that some distributions, like the normal distribution, are defined over negative values, and when \(k=1/2\), say, \(\sqrt{-1}\) is a complex number, so suddenly some of the moments are not so easy to interpret. Fractional moments may not be quite as intuitive as their integer cousins, but nothing is gained without exploring. For the standard normal distribution the fractional moments are defined by the function \(z(k):\mathbb{R}\rightarrow\mathbb{C}\)
$$ z(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}x^{k}e^{-\frac{x^{2}}{2}}dx $$
To get an idea of what this thing does, we can approximate this integral numerically. The plots below are valid for steps of \(dx=0.01, dk=0.005\) over the range \((-100,100)\). So not quite \((-\infty,\infty)\), but still beautiful!
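A direct translation of that approximation might look like the following (a sketch using the grid the post describes, \(dx=0.01\) over \((-100,100)\); casting \(x\) to complex lets \(x^{k}\) take its principal complex value for negative \(x\)):

```python
import numpy as np

def z(k, dx=0.01, L=100.0):
    """Riemann-sum approximation of the k-th fractional moment
    of the standard normal, as a complex number."""
    x = np.arange(-L, L, dx).astype(complex)
    integrand = x**k * np.exp(-(x**2) / 2)
    return np.sum(integrand) * dx / np.sqrt(2 * np.pi)

for k in (0.5, 1.0, 1.5, 2.0):
    print(k, np.round(z(k), 4))  # z(1) ~ 0, z(2) ~ 1
```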

[Figure: numerical approximation of \(z(k)\)]