QQ Plots

Quantile-Quantile (QQ) plots are a way of visually comparing your data to a reference distribution. In occupational hygiene, we typically assume that our data is either normally or log-normally distributed. A QQ plot can help provide evidence (not proof!) that our assumption is reasonable, or not.

If you’ve used tools like IHStat, you may be familiar with these plots. You may already know that if your data follows a straight line it’s normal (or log-normal). But understanding how these plots work a little more clearly can give you quite useful information. Besides, learning stats is fun… right, guys?

We will first look at how QQ plots work conceptually, then work through a real-life example with full instructions in case you ever want to do it yourself.

One definition before we start that seems important: Quantile. You probably know percentile is a percent of the distribution (e.g. 50th percentile = half the distribution is below this point). Percentile breaks it down into 100 sections. Quantile is just the generic term for these sections. In this post, you can think of them as interchangeable*.

Concept

Step 1 - Make a Reference

Take a normal distribution. Any normal distribution will do. This will be the reference we compare our data to. We then divide the curve 9 times into 10 equally sized groups.

A normal distribution being divided into equal size sections

The area under the curve is probability. Each of the 10 coloured sections has the same area, and therefore the same probability - 10% each. In other words, if we pick a random number from this distribution there is an equal chance (10%) it will fall into any of these sections. Notice how there are more sections packed into the centre. Their thinness is balanced by their height so that they still have the same area as the fatter side sections. This is the defining characteristic of a normal distribution - a higher density of numbers in the centre and symmetrically fewer the further from the centre you get. The values at the bottom are our percentiles - the percentage of the area to the left. This is the pattern we are looking for in our data when we make a QQ plot.
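If you would like to see those dividing points as numbers, here is a minimal Python sketch. It assumes a standard normal distribution (mean 0, standard deviation 1) as the reference, which is my choice for illustration since any normal distribution will do. The names `reference`, `cut_points` and `gaps` are made up for this example.

```python
from statistics import NormalDist

# Assumption: a standard normal (mean 0, sd 1) as the reference distribution.
reference = NormalDist(mu=0.0, sigma=1.0)

# The 9 percentiles that split the curve into 10 equal-area sections.
percentiles = [k / 10 for k in range(1, 10)]

# inv_cdf gives the x-value with that much of the area to its left.
cut_points = [reference.inv_cdf(p) for p in percentiles]

for p, x in zip(percentiles, cut_points):
    print(f"{p:.0%} of the area lies to the left of {x:+.2f}")

# Distance between neighbouring cut points: small in the centre, large at the sides.
gaps = [b - a for a, b in zip(cut_points, cut_points[1:])]
print([round(g, 2) for g in gaps])
```

The gaps are narrowest around the middle and widest at the edges, which is the "thin in the centre, fat at the sides" pattern described above.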

Step 2 - Plot the Data

Take the data you collected in the field and plot it on the graph. The x-axis will be the reference points created in Step 1, and the y-axis will be the value of the results. Draw a line-of-best-fit and there's your QQ plot!

We said before that a “straight line” = “normal”. Let’s see that in action. In the animation below, we have taken our reference distribution and used it as our data.

Our reference distribution (white dots) being copied as our data (yellow dots).

When we plot this with a line-of-best-fit we can see it makes a perfect straight line. This is expected. But what if our real data is normal but not exactly the same normal distribution? Well, the only thing that matters is that the ‘pattern’ stays the same. If we increase the distance between data (i.e. more variance), we still get a straight line, just with a different angle:

Slope ~ Variance, Y-intercept ~ Magnitude
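To check this idea numerically, here is a small sketch. It uses two made-up normal distributions (mean 10 with standard deviation 2, and mean 10 with standard deviation 5) that are purely illustrative, and a hand-written least-squares helper called `fit_line`. With z-scores on the x-axis, the slope comes out as the standard deviation (the square root of the variance) and the y-intercept as the mean.

```python
from statistics import NormalDist

def fit_line(xs, ys):
    """Ordinary least-squares line of best fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Reference points: z-scores at the 10th, 20th, ... 90th percentiles.
z = [NormalDist().inv_cdf(k / 10) for k in range(1, 10)]

# Hypothetical "data": the same percentiles from two other normal distributions.
narrow = [NormalDist(mu=10, sigma=2).inv_cdf(k / 10) for k in range(1, 10)]
wide = [NormalDist(mu=10, sigma=5).inv_cdf(k / 10) for k in range(1, 10)]

slope_narrow, intercept_narrow = fit_line(z, narrow)
slope_wide, intercept_wide = fit_line(z, wide)

print(f"narrow: slope={slope_narrow:.2f}, intercept={intercept_narrow:.2f}")
print(f"wide:   slope={slope_wide:.2f}, intercept={intercept_wide:.2f}")
```

Both sets of points sit on a straight line. Only the steepness changes with the spread of the data.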

Extra Notes

  1. The number of quantiles (white dots on horizontal axis) will vary to match the number of samples you have.

  2. Often “z-scores” are used instead of percentiles like we did above. It’s the same thing, it just standardises the x-axis. It’s convention.

Worked Example

You’ve just collected 9 inhalable dust samples and want to know if it’s OK to assume the results are normally distributed. Our results (in mg/m³), ranked low to high, are:

2.9, 4.0, 7.2, 8.2, 11.0, 12.0, 20.0, 24.1, 33.0

Step 1 - Make a Reference

We need to create our reference just like before. We will mark 9 points on our reference normal distribution (one for each of our nine samples), which splits it into 10 equal-probability groups as in the concept section. This time we will use z-scores, as per convention. To calculate the z-scores we use a formula**:

z-Score formula

Don’t be intimidated by this equation. The maths doesn’t really matter for our sake, and Excel can do it for you. Here is an Excel sheet that you can use.

So now we have our results ranked from low to high, and calculated our z-scores. We are ready to plot!

Ranked results and z-Scores

Result (mg/m³)   Rank   z-Score
2.9              1      -1.28
4.0              2      -0.84
7.2              3      -0.52
8.2              4      -0.25
11.0             5       0
12.0             6       0.25
20.0             7       0.52
24.1             8       0.84
33.0             9       1.28
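If you would rather use Python than a spreadsheet, the z-scores in the table can be reproduced with the sketch below. One caveat: the formula itself is not written out in the text here, so the `rank / (n + 1)` percentile used in the code is my assumption. It was chosen because it gives the same z-scores as the table, and other plotting-position formulas would give slightly different values.

```python
from statistics import NormalDist

# Inhalable dust results in mg/m³ from the worked example.
results = [2.9, 4.0, 7.2, 8.2, 11.0, 12.0, 20.0, 24.1, 33.0]

ranked = sorted(results)
n = len(ranked)

# Assumption: percentile for each rank is rank / (n + 1).
# The z-score is the standard normal value with that much area to its left.
z_scores = [NormalDist().inv_cdf(rank / (n + 1)) for rank in range(1, n + 1)]

for rank, (value, z) in enumerate(zip(ranked, z_scores), start=1):
    print(f"{value:5.1f}  {rank}  {z:+.2f}")
```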

Step 2 - Plot the Data

Now we make a plot where {x-axis = z-score} and {y-axis = Sample Results}. It looks like this:

QQ plot testing for normality

Bummer. Doesn’t really follow a straight line. So probably not normally distributed… What could it mean instead?

Interpretation for Normal QQ Plots

With a “U” shape it seems like our data might be right skewed. HEY! Log-normal distributions are right skewed. Maybe our data is log-normally distributed. We can make a log-normal QQ plot by keeping the same z-scores and taking the log of our data. That looks like this:

QQ plot testing for log-normality

BAM!

Our logged data fits a straight line really well! This suggests our original data probably fits a log-normal distribution.
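If you want a number to go with the eyeball test, one rough option is to measure how straight each plot is using the correlation between the z-scores and the results. This check is my addition rather than part of the method above, so treat it as a sketch. It assumes `rank / (n + 1)` percentiles for the z-scores and uses the natural log, although any log base gives the same correlation. The helper `pearson_r` is hand-written for the example.

```python
import math
from statistics import NormalDist

def pearson_r(xs, ys):
    """Pearson correlation coefficient: 1.0 means a perfectly straight rising line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Ranked inhalable dust results in mg/m³ from the worked example.
results = [2.9, 4.0, 7.2, 8.2, 11.0, 12.0, 20.0, 24.1, 33.0]
n = len(results)

# Assumption: z-scores from rank / (n + 1) percentiles.
z_scores = [NormalDist().inv_cdf(rank / (n + 1)) for rank in range(1, n + 1)]

# Same z-scores, but with the natural log of each result on the y-axis.
logged = [math.log(value) for value in results]

r_raw = pearson_r(z_scores, results)
r_log = pearson_r(z_scores, logged)

print(f"normal QQ plot:     r = {r_raw:.3f}")
print(f"log-normal QQ plot: r = {r_log:.3f}")
```

For these nine results the logged data gives the higher correlation, which matches what the two plots show. Like the plots, it is evidence rather than proof, especially with so few samples.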


*All percentiles are quantiles. Not all quantiles are percentiles (e.g. quartiles = 4 quantiles).

**There are lots of alternatives to this equation but they are all quite similar.
