Substitution is Fabrication
Summary
Don’t use substitution - i.e. don’t replace <LOQ with LOQ/2
Using a bad method of handling censored data not only leads to poor estimates of descriptive statistics, it can also distort apparent trends in the data.
In my opinion, the best “one-size-fits-all” method for occupational hygiene is Robust Regression on Order Statistics (e.g. NDExpo)
There is no objective best method. “Best” depends on how you measure it and the nature of your data.
Bayesian methods are also cool. They deserve their own post and aren’t touched on much here.
Importance
The dangers of many hazardous chemicals are becoming apparent faster than the techniques to analyse them are improving. The treatment of censored data is becoming ever more important as workplace exposure standards approach the limits of quantification. This is exacerbated by the high variability and small sample sizes often found in occupational hygiene sampling.
This means that being able to properly work with censored data is critical! But there’s not a clear consensus on what the best method is. Let’s take a look at what censored data is, and the options for dealing with it.
Censored data comes in three forms: right censored (greater than [>10]), interval censored (between [5-10]), or left censored (less than [<10]). Given its prevalence in occupational hygiene, this post focuses on left censored data.
Censoring happens when a result is below the level the lab can report with confidence. Analysis tends to become less accurate at lower levels. Below a certain point, the analysis cannot distinguish between a tiny amount of contaminant and nothing at all. This is the limit of detection (LOD). Concentrations above the LOD, up to a certain point, are still not very reliable. The lab won’t feel confident in its reading. It knows something is present, but not how much. Instead it will report the result as being somewhere below the limit of quantification (LOQ). If a result is reported as <LOQ, it is censored. You may also come across the term non-detect, which is used interchangeably with censored.*
Working with censored data can be frustrating. Firstly, there is little information to tell you where below the LOQ a result lies: <10 could mean 9.9, 5, or 1. This also means calculating simple statistics is not so easy. The average of <5 and 15 could be anywhere between 7.5 and 10. Even a simple comparison, say <5 versus 4, is challenging, as there is no way of knowing which is larger. This can be a particular problem if that 4 is the exposure standard. Is <5 an exceedance of 4? Could be. But we can’t say that it is.
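To make that interval logic concrete, here is a tiny sketch in plain Python of the bounds on the mean of <5 and 15 (a censored result could be anything from zero up to its LOQ):

```python
# Bounds on the mean of {"<5", 15}: the censored result could be anything
# from 0 up to its LOQ of 5, so the mean is only known as an interval.
detects = [15.0]
censored_loqs = [5.0]  # results reported as "<5"

n = len(detects) + len(censored_loqs)
lower = sum(detects) / n                         # censored results -> 0
upper = (sum(detects) + sum(censored_loqs)) / n  # censored results -> LOQ
print(f"the mean is somewhere in [{lower}, {upper}]")  # [7.5, 10.0]
```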
There is a range of methods for treating censored data to incorporate these results and estimate descriptive statistics, as well as other measures used for compliance testing (95% UCL of the AM, 95th percentile, etc.).
Substitution
Substitution treatment means replacing each censored result with a fraction of the LOQ - typically 50% (LOQ/2), 71% (LOQ/√2), or 100% (i.e. <10 = 10). It has been used a lot because it is easy, quick, and requires no statistical knowledge. However, you may have guessed from the title that this is a dreadful approach. It has been well established since the 1980s (and suggested since the 1960s) that substitution introduces unnecessary error into statistical analysis. Substitution has been referred to as “fabrication” and should be considered “invasive data”. This is because it tends to do one of two things: 1) obscure real patterns in the data, or 2) create false patterns in the data. This means that not only are descriptive and “decision-making” statistics unreliable, any trending or comparisons are too.
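In code, substitution is as trivial as it sounds - a minimal Python sketch with made-up numbers (NaN marks a censored result), shown only to illustrate the approach this post argues against:

```python
import numpy as np

# Hypothetical dataset: three results censored at an LOQ of 10 (marked NaN)
results = np.array([np.nan, np.nan, np.nan, 12.0, 18.0, 25.0])
loq = 10.0

def substitute(values, loq, fraction):
    """Replace censored results (NaN) with fraction * LOQ."""
    return np.where(np.isnan(values), fraction * loq, values)

for fraction in (0.5, 1 / np.sqrt(2), 1.0):  # LOQ/2, LOQ/sqrt(2), LOQ
    mean = substitute(results, loq, fraction).mean()
    print(f"{fraction:.2f} * LOQ -> mean = {mean:.1f}")
```

Note how the “answer” moves with the arbitrary choice of fraction before any real statistics have even begun.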
Further, the selection of what fraction of the LOQ to use in substitution is mostly arbitrary, yet the impact on any given dataset can be significant. Take a look at the figures below. The top left shows 55 results. Perhaps the y-axis is worker exposure concentration, and the x-axis the distance from a local exhaust ventilation (LEV) system. If the square results happened to be censored (66% of the total), the next four charts show the impact the LOQ fraction used has on different statistics (i.e. 0.5 = LOQ/2). Depending on the fraction used, the estimate of the mean ranges from 72 to 258! The estimates of the correlation coefficient and regression slope get nowhere near the true values, in this case significantly underestimating the importance of the LEV.
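If you want to reproduce this kind of effect yourself, the sketch below (my own simulation with invented numbers, not the data behind the figures) censors a synthetic exposure-vs-distance dataset and shows how the mean and regression slope shift with the substitution fraction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic scenario loosely mirroring the figures: concentration falls with
# distance from the LEV, with log-normal noise (all values are made up)
distance = np.linspace(1, 10, 55)
true_conc = 300 * np.exp(-0.3 * distance) * rng.lognormal(0, 0.4, 55)

loq = np.percentile(true_conc, 66)   # censor the lowest ~66% of results
is_censored = true_conc < loq

for fraction in (0.5, 1 / np.sqrt(2), 1.0):
    conc = np.where(is_censored, fraction * loq, true_conc)
    slope = stats.linregress(distance, conc).slope
    print(f"{fraction:.2f} * LOQ: mean = {conc.mean():.0f}, slope = {slope:.1f}")

true_slope = stats.linregress(distance, true_conc).slope
print(f"true values:  mean = {true_conc.mean():.0f}, slope = {true_slope:.1f}")
```

Substituting a constant for every censored result flattens the low end of the data, which is exactly how the apparent effect of the LEV gets washed out.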
It was previously believed that, for low levels of censoring, substitution is no big deal. However, with censoring as low as 10%, using LOQ/2 leads to much poorer estimates than the more sophisticated methods. Given the typical sample sizes used in OH, this means any number of censored results will be too many to get away with substitution.
Even worse than substitution is ignoring or deleting the censored data entirely and using only the remaining results for the statistical calculations. A censored result is still a real result and cannot be discarded.
Alternative Options
There are lots of alternatives to substitution; just a few will be mentioned here - Kaplan-Meier, Maximum Likelihood Estimation, and Robust Regression on Order Statistics.
Kaplan-Meier
The Kaplan-Meier method comes from survival analysis and is standard in many industries (especially medical research). It was originally designed for right censored data. Imagine you are running a drug trial and observing the increase in life expectancy. Not everyone will have died by the end of the trial, so their age at death would be, for example, “>76 years old”. The method can be flipped to work for left censored data as well.
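A bare-bones sketch of the flipping trick, in plain NumPy with hypothetical numbers (and assuming no tied values, which a real implementation must handle):

```python
import numpy as np

# Hypothetical left-censored dataset: NaN marks results reported as "<LOQ"
values = np.array([np.nan, np.nan, 4.2, 6.5, 8.1, 12.3])
loqs   = np.array([5.0,    4.0,    np.nan, np.nan, np.nan, np.nan])

# Flip: subtract everything from a constant larger than any value, which
# turns left-censored results into right-censored "survival times"
flip = 20.0
t = flip - np.where(np.isnan(values), loqs, values)
observed = ~np.isnan(values)  # censored results are the "still alive" cases

# Product-limit (Kaplan-Meier) estimate of the survival curve on the
# flipped scale: at each observed time, S *= 1 - 1/(number still at risk)
order = np.argsort(t)
t, observed = t[order], observed[order]
at_risk = len(t) - np.arange(len(t))
survival = np.cumprod(np.where(observed, 1 - 1 / at_risk, 1.0))
print(list(zip(flip - t, survival)))  # unflip the axis for reporting
```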
Kaplan-Meier has been studied extensively and is often found to be among the best performing treatment methods for censored data. It is non-parametric, which means it makes no assumptions about the underlying distribution (OHs often assume log-normality). Its major shortcoming for OH is that its performance drops when there is more than one censoring point (e.g. <4 and <5) in the same dataset. This situation is quite likely in OH, given that the LOQ is affected by both the lab’s analytical limit and the sample duration. So while Kaplan-Meier can be very effective, it’s probably not suitable as a universal approach in OH.
Maximum Likelihood Estimation
Maximum Likelihood Estimation is another method that has been found to perform very well at treating censored data. However, it too has a few shortcomings that often make it inappropriate in an OH setting. Firstly, it performs best with sample sizes greater than 50-70, which is not the norm in OH. Secondly, it relies heavily on assumptions about the underlying distribution. Again, hygienists often assume log-normality, but with our small sample sizes it’s difficult to be confident in this assumption for any particular dataset.
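For illustration, the censored log-normal likelihood is straightforward to write down: detected results contribute their density, while censored results contribute the probability of falling below their LOQ. A minimal SciPy sketch with hypothetical numbers:

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical data: five detected results and three results below an LOQ
detects = np.array([12.0, 8.5, 23.1, 5.2, 15.8])
loqs = np.array([5.0, 5.0, 4.0])  # reported as "<5", "<5", "<4"

def neg_log_likelihood(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimise log(sigma) to keep sigma positive
    # Detected results: log-density of the fitted log-normal
    ll = stats.norm.logpdf(np.log(detects), mu, sigma).sum()
    # Censored results: log-probability of falling below the LOQ
    ll += stats.norm.logcdf(np.log(loqs), mu, sigma).sum()
    return -ll

result = optimize.minimize(neg_log_likelihood, x0=[np.log(detects).mean(), 0.0])
mu, sigma = result.x[0], np.exp(result.x[1])
print("GM =", np.exp(mu), "AM =", np.exp(mu + sigma**2 / 2))  # log-normal mean
```

The distributional assumption sits right there in the `stats.norm` calls on the logged data, which is exactly why MLE struggles when log-normality can’t be trusted.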
Regression on Order Statistics
The robust regression on order statistics (rROS)** method uses the line of best fit on a Q-Q plot made from the uncensored results. In other words, it predicts where the censored values should sit if they followed the log-normal (or normal) distribution fitted to the uncensored values.
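To make the mechanics concrete, here is a simplified sketch that assumes a single LOQ, so all censored results occupy the lowest ranks (full implementations such as Helsel’s handle multiple LOQs with extra bookkeeping); all numbers are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical data: five detects and three results censored at an LOQ of 5
detects = np.sort(np.array([12.0, 8.5, 23.1, 5.2, 15.8]))
n_censored, loq = 3, 5.0

n = len(detects) + n_censored
pp = np.arange(1, n + 1) / (n + 1)  # Weibull plotting positions for all ranks
z = stats.norm.ppf(pp)              # corresponding normal quantiles

# Regression step: fit log(detects) against the quantiles of the upper ranks
fit = stats.linregress(z[n_censored:], np.log(detects))
# Impute the censored values from the fitted line at the lower ranks
imputed = np.exp(fit.intercept + fit.slope * z[:n_censored])

# "Robust": keep the real detects as-is; imputed values are used only
# where results were censored
combined = np.concatenate([imputed, detects])
print("rROS mean:", combined.mean().round(1), "imputed:", imputed.round(2))
```

The “robust” part is that only the censored slots get model-predicted values; the detects are never replaced by the fitted line.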
rROS combines flexibility and relative reliability in a way that makes it ideal for general use in OH. There are distributional assumptions, but strict adherence is not critical; it can handle multiple censoring points and it performs relatively well with small sample sizes.
It is the most widely used method in occupational hygiene practice. It’s readily available through software like NDExpo, is briefly described in EN 689, and is recommended by the AIHA.
Even more methods
Bayesian Treatment
There are, of course, also Bayesian methods for treating censored data. As with all Bayesian maths, you don’t get a single number but a distribution that covers the censored range. This could be its own post entirely, so we will leave it at that for now.
Beta-substitution
This is a form of substitution, but instead of LOQ/2 or LOQ/√2, special equations are used to find the substitution ratio depending on the statistic you want to calculate. Researchers have shown that it performs better than traditional substitution and is often comparable to MLE. Its claimed greatest advantage is that it’s easier than, say, ROS or MLE. I find the opposite to be true. And with calculators available, “ease” shouldn’t be a major consideration.
Recommendation
There is no one objective, universally best method for handling censored data. It depends on the characteristics of the dataset being analysed. Any real dataset with censoring is, by definition, incomplete, so you can’t know which method would have been best. Also, “best” depends on how you measure it and which statistic you are interested in calculating.
Given that most people probably want a single method to apply to everything, I believe rROS offers the best combination of flexibility, intuitiveness, and performance. It is just as easy as substitution, as there are several tools that can do the maths for you.
*Unfortunately this terminology can be confusing. Although laboratory reports don’t often make the distinction, there is a difference between a true “non-detect” below the LOD and a result that is detected but falls below the LOQ.
**There is regression on order statistics (ROS), robust ROS, log-probit regression, Helsel’s rROS… Some of these are different terms for the same thing; some are slightly different methods. The differences don’t matter much for this introduction, but be careful if you decide to read further into the topic!
I stole the title of this post from Helsel (2010).
References
Helsel (2014). Statistics for Censored Environmental Data Using Minitab and R. Textbook.
Helsel (2010). Much Ado About Next to Nothing: Incorporating Nondetects in Science. The Annals of Occupational Hygiene, Volume 54, Issue 3, Pages 257–262.
Hewett & Ganser (2007). A Comparison of Several Methods for Analyzing Censored Data. The Annals of Occupational Hygiene, Volume 51, Issue 7, Pages 611–632.
Hewett (2015). A Strategy for Assessing and Managing Occupational Exposures: Appendix 8 - Analysis of Censored Data. AIHA Textbook.
Huynh et al. (2014). Comparison of Methods for Analyzing Left-Censored Occupational Exposure Data. The Annals of Occupational Hygiene, Volume 58, Issue 9, Pages 1126–1142.
Lavoué (2013). NDExpo: Treatment of Censored Data with Multiple Limits of Detection - Documentation.
Singh et al. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. U.S. EPA Report EPA/600/R-06/022.
EN 689:2018. Workplace Exposure - Measurement of Exposure by Inhalation to Chemical Agents - Strategy for Testing Compliance with Occupational Exposure Limit Values. Brussels, Belgium: European Committee for Standardization.