Confidence Interval Fallacies
This post is entirely based on Morey et al. (2016), The fallacy of placing confidence in confidence intervals, Psychon Bull Rev 23:103-123, unless specified otherwise. I highly encourage you to read this paper, and others, to come to a better understanding.
Referencing correction thanks to Peter Knott.
Correction to the Land’s Exact section: an error in my code gave erroneous results. Thank you to Jerome Levoue for checking my code.
In a post about upper confidence limits (UCL), I described how they are commonly misunderstood. Admittedly, I then used the same misunderstood logic to explain how to interpret these UCLs. This post will aim to correct some of those points and more clearly explain why UCLs aren’t what people generally think they are.
One small terminology point before starting:
A confidence interval procedure is an equation that generates confidence intervals for any set of results (e.g. Land’s Exact in IHStats)
A confidence interval is an interval calculated for a specific set of results
Fallacy 1
An X% confidence interval contains the true parameter value with X% probability
Previously, I explained that the confidence level is the long-term / overall probability that the procedure’s intervals contain the true parameter when the sampling program is repeated. It seems reasonable to think that if the ‘overall’ probability of containing the true parameter is X%, then each individual interval also has an X% probability of containing it. But this is not true.
To demonstrate this, consider the following example:
You are going to collect 2 airborne particulate measurements and plan to calculate a 50% confidence interval on the mean. You decide to use an unconventional confidence interval procedure: if the first measurement is larger than the second, the confidence interval is zero to infinity (a 100% chance of containing the true mean); if the second measurement is larger, the confidence interval is zero to zero (a 0% chance of containing the true mean).
Before you collect your data, both scenarios are equally likely. This means that in the long run 50% of the intervals will contain the mean - a true 50% confidence interval procedure.
Once you have collected your results, however, it should be immediately obvious whether the mean is contained within the interval. If your interval is from zero to infinity, it doesn’t make sense to say there’s a 50% chance of containing the mean. It’s a sure thing.
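A quick simulation makes this concrete. The sketch below is my own illustration, not from the paper; the exponential distribution, true mean, and seed are arbitrary, since the coverage is 50% for any continuous exposure distribution (the first measurement beats the second exactly half the time):

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = 10.0        # arbitrary; any positive true mean gives the same result
n_trials = 100_000
covered = 0

for _ in range(n_trials):
    x1, x2 = rng.exponential(true_mean, size=2)
    if x1 > x2:
        lo, hi = 0.0, np.inf   # this interval is certain to contain the true mean
    else:
        lo, hi = 0.0, 0.0      # this interval is certain to miss it
    covered += lo <= true_mean <= hi

print(f"Long-run coverage: {covered / n_trials:.3f}")  # ~0.500
```

Every individual interval contains the mean with probability 0 or 1, yet the procedure as a whole is a perfectly valid 50% confidence procedure.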
While extreme, this example shows that the coverage of a confidence procedure (in this case 50%) is a pre-data property: it is fixed before any data are collected. Once the data are in, the confidence coefficient (50%, 95%, etc.) no longer applies to the specific interval in hand. Neyman, the inventor of the confidence interval, made this very clear (Neyman, 1941).
Fallacy 2
Smaller confidence intervals mean more precise knowledge of the parameter
The width of a confidence interval narrows as the sample size grows (holding everything else constant). A larger sample means a better representation of the population, so it seems natural to conclude that a narrower confidence interval means a better estimate of the parameter. This is not true.
Alex the hygienist and her assistant Burt conduct exposure monitoring on the same similar exposure group. Alex collects 20 samples; Burt takes only 3.
Alex then calculates her mean and its 95% UCL as 24 ppm and 35 ppm. Burt calculates his as 27 ppm and 29 ppm.
Burt claims that his estimate of the mean (27 ppm) is better because his UCL is narrower. Alex points out that with Burt’s small sample size, the narrow UCL reflects a sample that happened to show little variance, not a better estimate. In fact, her larger sample size gives a better picture of how variable the exposures really are. To get the best estimate, she says, they should combine their data.
A narrower confidence interval does not mean a more accurate estimate of the population parameter.
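To see Alex’s point in simulation, here is a hedged sketch (my construction, not from the post’s original analysis). It repeatedly draws n = 3 and n = 20 measurements from the same assumed lognormal exposure profile and records how far the UCL sits above the sample mean; a simple one-sided t-based UCL stands in for whatever procedure Alex and Burt actually used:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma = np.log(20), 0.8        # assumed lognormal exposure profile
n_sims = 20_000

def t_ucl95(x):
    # One-sided 95% t-based UCL on the sample mean (a simple stand-in,
    # not Land's Exact).
    n = len(x)
    return x.mean() + stats.t.ppf(0.95, n - 1) * x.std(ddof=1) / np.sqrt(n)

for n in (3, 20):
    samples = rng.lognormal(mu, sigma, size=(n_sims, n))
    gaps = np.array([t_ucl95(s) - s.mean() for s in samples])
    lo, med, hi = np.percentile(gaps, [5, 50, 95])
    print(f"n={n:2d}: UCL-mean gap 5th pct {lo:6.1f}, "
          f"median {med:6.1f}, 95th pct {hi:6.1f}")
```

The n = 3 gaps swing from tiny to enormous: a narrow interval from a small sample mostly reflects a sample that happened, by chance, to show little variance.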
Land’s Exact 95% UCL
I tested whether Land’s Exact does what it claims: contain the parameter 95% of the time.
Using this code, I generated log-normally distributed data from known parameters, then calculated UCLs with Hewett’s approximation of Land’s Exact (as used in IHstats).
I found that no matter what parameter values I entered, the coverage was consistently above 99.5%, far higher than the intended 95%. Furthermore, once the sample size was increased above 30, the coverage was 100% every time.
I’m happy to be shown that I’m mistaken, but if I’m correct, this suggests that the confidence intervals we regularly use in hygiene aren’t even doing what they are intrinsically designed to do.
I have been shown to be mistaken! My code calculated the geometric mean instead of the arithmetic mean. With that fixed, the 95% UCL performs as intended!
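For anyone who wants to repeat the corrected check, here is a minimal sketch of the coverage simulation. The key line is the target parameter: for a lognormal distribution the arithmetic mean is exp(μ + σ²/2), not the geometric mean exp(μ) that my original code tested against. The `ucl95` function below is a simple t-based placeholder, not Land’s method; swap in Hewett’s approximation of Land’s Exact to reproduce the actual test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n = np.log(10), 0.7, 12     # assumed lognormal parameters
true_mean = np.exp(mu + sigma**2 / 2)  # arithmetic mean; the bug was checking
                                       # exp(mu), the geometric mean, instead

def ucl95(x):
    # Placeholder one-sided 95% t-based UCL. Replace with Hewett's
    # approximation of Land's Exact to reproduce the test in this section.
    return (x.mean()
            + stats.t.ppf(0.95, len(x) - 1) * x.std(ddof=1) / np.sqrt(len(x)))

n_sims = 50_000
samples = rng.lognormal(mu, sigma, size=(n_sims, n))
coverage = np.mean([ucl95(s) >= true_mean for s in samples])
print(f"Coverage: {coverage:.3f}")     # compare against the nominal 0.95
```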
Correct Interpretation
Given all this information, how should you interpret a specific UCL (or any confidence interval)? You don’t. There is no meaningful interpretation of a specific UCL after the collection of data.
A confidence interval procedure does only what it is designed to do: generate intervals that contain the parameter value X% of the time over repeated sampling.
Any other post-data interpretations of individual intervals (without some additional verification) are arbitrary and done by convention.
The Alternative
A Bayesian credible interval provides exactly the information we have been wanting from our confidence intervals: an interval within which there is an X% probability of the parameter being contained, given the data and the prior. The two interval types are fundamentally different and not interchangeable.
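To make the contrast concrete, here is a minimal Bayesian sketch. It uses a normal model with a known measurement SD and a conjugate normal prior purely for simplicity (hygiene data would usually call for a lognormal model), and the data and prior values are hypothetical:

```python
import numpy as np
from scipy import stats

x = np.array([22.0, 26.0, 31.0, 19.0, 27.0])  # hypothetical exposure data (ppm)
sigma = 5.0                                   # assumed known measurement SD
prior_mu, prior_sd = 25.0, 10.0               # assumed normal prior on the mean

# Conjugate normal update: posterior precision is the sum of the prior
# precision and the data precision; the posterior mean is their weighted blend.
n = len(x)
post_var = 1 / (1 / prior_sd**2 + n / sigma**2)
post_mu = post_var * (prior_mu / prior_sd**2 + x.sum() / sigma**2)

lo, hi = stats.norm.ppf([0.025, 0.975], loc=post_mu, scale=np.sqrt(post_var))
print(f"95% credible interval: ({lo:.1f}, {hi:.1f}) ppm")
# Given the data and this prior, there IS a 95% probability that the mean
# lies in this interval -- the statement people want a UCL to make.
```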
References
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23(1), 103-123. doi:10.3758/s13423-015-0947-8
Neyman, J. (1941). Fiducial argument and the theory of confidence intervals. Biometrika, 32(2), 128-150. doi:10.2307/2332207
Hewett, P., & Ganser, G. H. (1997). Simple procedures for calculating confidence intervals around the sample mean and exceedance fraction derived from lognormally distributed data. Applied Occupational and Environmental Hygiene, 12(2), 132-142. doi:10.1080/1047322X.1997.10389473