Gaussian Moments 20/08/20
Normal Distributions
The Gaussian distribution (or normal distribution) is ubiquitous in mathematics and its
applications. Most people outside of science will have encountered it as the "bell curve" when taking
or talking about IQ tests and other standardised measurements, or perhaps in an introductory statistics course.
In the wild, normal distributions crop up all the time due to a result known as the
central limit theorem, which states that when we take a sample \(\{X_{i}\}_{i=1}^{N}\) of
independent random numbers with some (finite) mean \(\mu\) and (finite) variance \(\sigma^{2}\), the variable \(Z = \frac{\frac{1}{N}\sum_{i}X_{i}-\mu}{\sigma/\sqrt{N}}\)
will be approximately normally distributed when \(N\) is sufficiently large. This is useful for statistical models
of real-world processes: even if data is not "normal" we can "standardise" it by
transforming to \(Z(\{X_{i}\})\), and this standardised data can then be put through the
various statistical tests statisticians have developed. This was particularly useful
before computers, when any (sufficiently large!) sample could be standardised and compared
against the same book of "statistical tables".
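To make the standardisation concrete, here is a minimal Python sketch (the exponential source distribution and the sample sizes are just example choices) that draws many samples, forms \(Z\) for each, and checks that the results look standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: the raw data come from an exponential distribution with
# rate 1, so the true mean and standard deviation are both 1.
mu, sigma = 1.0, 1.0
N = 1_000          # size of each sample (assumed)
repeats = 10_000   # number of independent samples (assumed)

# For each repeated sample, form Z = (sample mean - mu) / (sigma / sqrt(N)).
samples = rng.exponential(scale=1.0, size=(repeats, N))
Z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(N))

# If the central limit theorem is doing its job, Z should have mean ~ 0,
# variance ~ 1, and a roughly bell-shaped histogram.
print(Z.mean(), Z.var())
```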
A great example of a wild normal distribution is
the Galton board, shown as a gif below. What we see here are pebbles falling through
a set of pegs. On hitting a peg we can assume a pebble has two options: fall left
with some probability \(p\) or right with probability \(1-p\), and given enough pebbles we hope this probability
is roughly consistent (despite pebble-pebble collisions). Thankfully it does work,
and we end up with a binomial distribution in the final slots. What's more, given
enough rows of pegs (and enough pebbles) this approximates a normal distribution by the central limit
theorem!
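As a quick check, here is a small Python sketch (the number of peg rows and pebbles are made-up values) that simulates the board by treating each pebble's path as a sequence of left/right coin flips:

```python
import numpy as np

rng = np.random.default_rng(0)

rows = 12          # number of peg rows (assumed)
pebbles = 100_000  # number of pebbles dropped (assumed)
p = 0.5            # probability of bouncing right at each peg

# Each pebble makes `rows` independent left/right decisions; its final slot
# is the number of rightward bounces, i.e. a Binomial(rows, p) variable.
slots = rng.binomial(rows, p, size=pebbles)

# Counting pebbles per slot gives a histogram that is already close to a bell curve.
counts = np.bincount(slots, minlength=rows + 1)
print(counts)
```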
The Maths
Let's look at the functional form of a normal distribution.
$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \enspace -\infty < x < \infty $$
The exponential gives us a strong concentration of density around the mean \(\mu\), which
decays away at a rate depending on the magnitude of the variance \(\sigma^{2}\). A high variance
means more spread, and a low variance means a stronger peak and a thinner spread,
with the special case of \(\sigma \rightarrow 0\) being a Dirac delta function
picking out the mean.
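As a small illustration (the choices of \(\mu\) and \(\sigma\) here are arbitrary), the density is easy to write down in code; evaluating it at the mean shows the peak sharpening as \(\sigma\) shrinks:

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# The peak height at x = mu is 1 / sqrt(2*pi*sigma^2), so it grows as sigma shrinks.
for sigma in (2.0, 1.0, 0.5, 0.1):
    print(sigma, normal_pdf(0.0, mu=0.0, sigma=sigma))
```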
Shape
These ideas of location and shape are not just applicable to the Gaussian distribution;
in fact they form an integral part of how distributions are categorised. Let's
remind ourselves what the mean and variance actually are in terms of \(p(x)\). If the random
variable \(X\) is defined by the probability density function \(p(x)\) on the
range \(\Omega\), then the mean and variance are
$$ \mathbb{E}[X] = \int_{x\in\Omega} x p(x)dx$$
$$ \text{Var}[X] = \int_{x\in\Omega} (x-\mathbb{E}[X])^{2}p(x)dx$$
For the mean, the equation translates as "the sum of all possible values \(x\) of \(X\) multiplied by their
probabilities of occurring"; for the variance, it means "the sum of the squared deviations of
all possible values \(x\) of \(X\) from the mean of \(X\), multiplied by their
probabilities of occurring". It's a bit of a mouthful, but essentially it tells us
the average squared distance of realisations of the random variable \(X\) from
the mean value.
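Both integrals can be checked numerically for the standard normal; a rough sketch, using an arbitrary truncated grid in place of \((-\infty,\infty)\):

```python
import numpy as np

# Truncated grid standing in for (-inf, inf); limits and step are arbitrary choices.
dx = 0.01
x = np.arange(-10, 10, dx)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

mean = np.sum(x * p) * dx                    # E[X], should be ~ 0
var = np.sum((x - mean) ** 2 * p) * dx       # Var[X], should be ~ 1
print(mean, var)
```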
Notice that both of these integrals actually come from the same family
$$ \biggl\{\int_{x\in\Omega}(x-c)^{k}p(x)dx : \enspace c \in\Omega, \enspace k \in \mathbb{N}\biggr\} $$
where \(\mathbb{N} = \{1,2,3,\ldots\}\). In particular, the mean is the element with \(k=1\) and \(c=0\), and the variance is the element
with \(k=2\) and \(c=\mathbb{E}[X]\). This family is known as the \(k\)-th moments centred
on \(c\). Higher moments are also useful (but more complicated!), such as \(k=3\),
which is related to the skewness of a distribution, and \(k=4\), which is related to the
peakedness of the distribution (kurtosis). The next figure shows these values
for the Normal \(\mathcal{N}\), Exponential \(Exp\) and Binomial \(Bin\) distributions.
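To get a feel for the higher moments, the same grid-based approximation extends directly to any \(k\); here is a sketch for the standard normal (grid limits again arbitrary):

```python
import numpy as np

dx = 0.01
x = np.arange(-10, 10, dx)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
mean = np.sum(x * p) * dx

def central_moment(k):
    """k-th moment of the standard normal centred on its mean, via a Riemann sum."""
    return np.sum((x - mean) ** k * p) * dx

# k = 2 is the variance; k = 3 relates to skewness and k = 4 to kurtosis.
for k in (2, 3, 4):
    print(k, central_moment(k))   # roughly 1, 0 and 3 for the standard normal
```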
Fractional Moments
So far so good, but what about the restriction on \(k\) that we slipped in? Namely
\(k\in\mathbb{N}\). Well, the "problem" of course is that, when \(k=1/2\) say, some
distributions, like the normal distribution, are defined over negative values. Since
\(\sqrt{-1}\) is a complex number, suddenly some of the moments are not
so easy to interpret. Fractional moments may not be quite as intuitive as
their integer cousins, but nothing is gained without exploring. For the standard
normal distribution the fractional moments are defined by the function \(z(k):\mathbb{R}\rightarrow\mathbb{C}\)
$$ z(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}x^{k}e^{-\frac{x^{2}}{2}}dx $$
To get an idea of what this thing does, we can approximate this integral
numerically. The plots below are computed with steps of \(dx=0.01\), \(dk=0.005\) over the range
\((-100,100)\). So not quite \((-\infty,\infty)\), but still beautiful!
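A minimal sketch of that approximation (using the step sizes and truncated range quoted above; the range of \(k\) values is an assumption), treating \(x^{k}\) as a complex power for negative \(x\):

```python
import numpy as np

dx, dk = 0.01, 0.005
x = np.arange(-100, 100, dx)                 # truncated stand-in for (-inf, inf)
ks = np.arange(0, 5, dk)                     # range of k values to evaluate (assumed)

weights = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
xc = x.astype(complex)                       # complex, so x**k is defined for x < 0

# z(k) ~ sum over the grid of x^k * density * dx; for negative x and fractional k,
# x**k takes its principal complex value, so z(k) is complex in general.
z = np.array([np.sum(xc**k * weights) * dx for k in ks])

# Sanity check: at k = 2 the result should be ~ 1 (the variance of the standard normal).
idx = np.argmin(np.abs(ks - 2.0))
print(ks[idx], z[idx])
```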