Also called Gaussian distribution or bell curve, this type of distribution is ubiquitous in nature. It is therefore the most important and the most widely used distribution in statistics (see the Central Limit Theorem below for details).

But before talking about the Gaussian distribution itself, we need to understand what a statistical distribution is. Simply put, it is a representation of how frequently each potential outcome of an experiment occurs.
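To make the idea concrete, here is a minimal sketch that builds an empirical distribution by simulating an experiment. The experiment (summing two six-sided dice), the function name `empirical_distribution`, and the fixed seed are illustrative assumptions, not something taken from the original article.

```python
import random
from collections import Counter

def empirical_distribution(n_rolls, seed=0):
    """Relative frequency of each possible sum of two six-sided dice."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
    return {total: counts[total] / n_rolls for total in sorted(counts)}

dist = empirical_distribution(100_000)
for total, freq in dist.items():
    # One '#' per percentage point gives a rough text histogram.
    print(f"{total:2d}  {freq:.3f}  {'#' * round(freq * 100)}")
```

The frequencies sum to 1, and the middle outcome (7) is the most frequent one, which already hints at the bell shape discussed in the rest of this post.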

For reference, the following image shows the most common probability distributions (full list here). Some apply to discrete data (spaced bars), others represent continuous data (close-spaced bars).

[Image: the most common probability distributions]

The normal distribution represents continuous data.

Central Limit Theorem

This theorem establishes that when independent random variables are added, their normalised sum tends toward a normal distribution, regardless of the distribution the samples are drawn from (provided that distribution has a finite variance).
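The theorem can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it sums 30 draws from a Uniform(0, 1) distribution (which looks nothing like a bell curve), normalises the sum, and repeats this 20,000 times. The function name `normalised_sums` and the chosen sizes are hypothetical.

```python
import math
import random
import statistics

def normalised_sums(n_terms, n_trials, seed=0):
    """Return n_trials normalised sums of n_terms Uniform(0, 1) draws."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and std of Uniform(0, 1)
    results = []
    for _ in range(n_trials):
        total = sum(rng.random() for _ in range(n_terms))
        # Subtract the expected sum and divide by the std of the sum.
        results.append((total - n_terms * mu) / (sigma * math.sqrt(n_terms)))
    return results

z = normalised_sums(n_terms=30, n_trials=20_000)
within_one = sum(abs(v) <= 1 for v in z) / len(z)
print(f"mean={statistics.fmean(z):.3f}  std={statistics.pstdev(z):.3f}  "
      f"share within 1 std={within_one:.3f}")
```

If the normalised sums behave like a standard normal variable, the mean should be close to 0, the standard deviation close to 1, and roughly 68% of the values should fall within one standard deviation of the mean.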


This concept is key! Not only does it explain why many natural phenomena follow a normal distribution, it also makes generalisation possible: because sample means are approximately normally distributed, a population mean can be inferred from a sample mean.

Keep in mind that a few assumptions must hold for the CLT to work. The sample must be:

  • randomly selected.
  • independent. Samples can’t influence each other.
  • large enough. A sample size of at least 30 is a common rule of thumb (a smaller sample can suffice when the population is itself close to normal). When sampling without replacement, the sample should also be no more than 10% of the whole population.
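As an illustration of inferring a population mean from a sample mean, the sketch below draws random, independent samples of size 50 from a deliberately skewed population and builds an approximate 95% interval around each sample mean. Everything specific here is an assumption made for the example: the exponential population with a true mean of 2.0, the helper names `mean_confidence_interval` and `draw_sample`, and the use of the 1.96 normal quantile.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the population mean, relying on the CLT."""
    m = statistics.fmean(sample)
    # Standard error of the mean: sample std divided by sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

TRUE_MEAN = 2.0  # known here only because the population is simulated
rng = random.Random(1)

def draw_sample(n):
    """Random, independent draws from a skewed (exponential) population."""
    return [rng.expovariate(1 / TRUE_MEAN) for _ in range(n)]

low, high = mean_confidence_interval(draw_sample(50))
print(f"one sample: interval = ({low:.2f}, {high:.2f})")

# Repeat the experiment to see how often the interval contains the true mean.
n_repeats = 2000
hits = 0
for _ in range(n_repeats):
    lo, hi = mean_confidence_interval(draw_sample(50))
    hits += lo <= TRUE_MEAN <= hi
coverage = hits / n_repeats
print(f"coverage over {n_repeats} samples: {coverage:.3f}")
```

The coverage should land near, though usually a little below, 95%, because the population is skewed and the normal approximation is not exact at this sample size.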

Characteristics of a Normal Distribution

The equation of the normal distribution curve is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
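The density of a normal distribution with mean μ and standard deviation σ translates directly into code. The sketch below is a hand-written version for illustration (the name `normal_pdf` is an assumption), cross-checked against `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Height of the standard normal curve at its centre (about 0.3989).
print(normal_pdf(0.0))

# The hand-written formula should agree with the standard library.
print(normal_pdf(1.3, mu=2.0, sigma=0.5), NormalDist(2.0, 0.5).pdf(1.3))
```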

Where μ is the mean of the distribution and is defined as:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

The normal distribution is symmetric about its mean, so the mean is also the median and the mode of the population, and the vertical line through the mean is the curve's axis of symmetry.

And where σ is the standard deviation, defined as:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

It defines the spread of the distribution.
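Both parameters are simple to compute from data. The sketch below assumes the population form of the standard deviation (dividing by N); when working from a sample, dividing by N - 1 is the usual alternative. The helper names and the small data set are illustrative assumptions.

```python
import math

def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def population_std(values):
    """Population standard deviation: root of the mean squared deviation."""
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(data), population_std(data))  # prints: 5.0 2.0
```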

As the curve's equation shows, these two parameters fully define a normal distribution. The following image shows the impact of their variations on the shape of the distribution:


Along with these two main characteristics, normal distributions also have some interesting properties:

  • The area under the curve is equal to 1.0.
  • They are denser in the centre and less dense in the tails.
  • Around 68% of the area of a normal distribution is within one standard deviation of the mean (μ - σ) to (μ + σ).
  • Approximately 95% of the area of a normal distribution is within two standard deviations of the mean (μ - 2σ) to (μ + 2σ).
  • Approximately 99.7% of the area of a normal distribution is within three standard deviations of the mean (μ - 3σ) to (μ + 3σ).
  • Values outside of 3σ tend to be considered extreme, with a very low probability of occurrence.
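The percentages in the list above can be verified numerically from the cumulative distribution function. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_within` is an assumption made for the example.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")

# The result does not depend on the particular mean or spread.
print(f"{area_within(2, mu=10.0, sigma=3.0):.4f}")
```

The three values come out at roughly 0.6827, 0.9545 and 0.9973, whatever μ and σ are used.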

Skewness and Kurtosis

Skewness measures how far a distribution deviates from symmetry. There are various ways to calculate it (see Useful links below), but in a nutshell:

  • When negative, the left tail is stretched.
  • When zero, the distribution is symmetrical.
  • When positive, the right tail is stretched.
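One of the common ways to compute skewness is the moment coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sketch below uses that definition as an assumption (other definitions exist) and applies it to three simulated samples; the function name and the sample choices are illustrative.

```python
import random
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mu = statistics.fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

rng = random.Random(0)
right_tailed = [rng.expovariate(1.0) for _ in range(50_000)]  # long right tail
left_tailed = [-v for v in right_tailed]                      # mirrored: long left tail
symmetric = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

print(f"right-tailed: {skewness(right_tailed):+.2f}")
print(f"left-tailed:  {skewness(left_tailed):+.2f}")
print(f"symmetric:    {skewness(symmetric):+.2f}")
```

The sign of each result matches the three cases listed above: positive for the right-tailed sample, negative for its mirror image, and close to zero for the symmetric one.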

Kurtosis characterises the tails of a distribution. Again, there are various ways to measure it (see Useful links below).


It is worth noting:

  • Platykurtic means the tails are thinner than those of a Gaussian distribution, so extreme values are rarer; the peak is lower and broader, and the distribution appears flatter. (in Blue)
  • Mesokurtic means the distribution is what we’d expect from a Gaussian distribution. (in Red)
  • Leptokurtic means the tails are heavier than those of a Gaussian distribution, so extreme values are more frequent; the centre is more sharply peaked, and the distribution appears thinner. (in Green)
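A common way to quantify this is excess kurtosis: the fourth central moment divided by the squared second central moment, minus 3, so that a Gaussian scores 0. The sketch below assumes that definition and compares three simulated samples: a uniform sample (usually classed as platykurtic), a Gaussian sample (mesokurtic) and a Laplace sample (usually classed as leptokurtic). The function name and the choice of distributions are illustrative assumptions.

```python
import random
import statistics

def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2 ** 2 - 3 (zero for a normal distribution)."""
    n = len(values)
    mu = statistics.fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

rng = random.Random(0)
n = 100_000
uniform = [rng.random() for _ in range(n)]
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The difference of two independent exponential draws follows a Laplace distribution.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

print(f"uniform:  {excess_kurtosis(uniform):+.2f}")
print(f"gaussian: {excess_kurtosis(gaussian):+.2f}")
print(f"laplace:  {excess_kurtosis(laplace):+.2f}")
```

The uniform sample should score clearly below zero, the Gaussian sample close to zero, and the Laplace sample clearly above zero.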


And this concludes our quick introduction to normal distributions.

Its properties, and especially the CLT, make it very useful when dealing with sampling and inference. It is also mathematically simple, which is a great advantage: a lot can be achieved once the mean and standard deviation are known (such as least-squares fitting), and little computational power is needed.

Useful links
