The NHS ‘Choose Wisely’ campaign places greater emphasis on the clinician-patient dialogue. Patients are often in receipt of their laboratory data and want to know whether their results are normal. But what is meant by normal? The comparator data for a measured value are colloquially known as the ‘normal range’. It is often assumed that a result outside this range signals disease and a result within it signals health. However, this range is correctly termed the ‘reference interval’. The clinical risk from a measured value is continuous, not binary. The reference interval provides a point of reference against which to interpret an individual’s results, rather than defining normality itself. This article discusses the theory of normality and describes how it is relative and situational. The recognition that normality is not an absolute state influenced the development of the reference interval. We conclude with suggestions to optimise the use and interpretation of the reference interval, thereby facilitating greater patient understanding.

The ‘Choose Wisely’ campaign was introduced by the Academy of Medical Royal Colleges in 2016 with a view to encouraging a dialogue between clinician and patient regarding the practice of evidence-based treatment regimens.

With every test result, the clinical laboratory will provide comparator value(s) to help the clinician place the result in context. The comparator values are often referred to as the normal range. Results are frequently colour coded against this interval, for instance black if the result is within the range and red if it is outside. This reinforces the concept of a result having a binary quality: normal or abnormal.

If we say that a blood result is normal, several quite different inferences could be drawn from this. The difficulty was neatly captured by the philosopher Edmond Murphy in the 1960s (

Interpretations of ‘normal’ (modified from Murphy, 1966)

| Conceptions of normal | Suggested alternatives |
| --- | --- |
| Determined statistically | Gaussian |
| Most representative of its class | Average, median, modal |
| Most commonly encountered | Habitual |
| Wild-type: most suited to survival & reproduction | Fittest |
| Harmless ‘carrying no penalty’ | Innocuous/harmless |
| Most often aspired to | Conventional |
| The most perfect of its class | Ideal |

If it is assumed that a ‘normal’ result reflects no pathophysiological derangement, the corollary would be that a result outside this limit signals a disease state. This seems an arbitrary, dichotomous interpretation. As the American psychiatrist Theodore Rubin put it, ‘health may be considered a relative and not an absolute state’. Health may be conceived differently in different countries, or in the same country at different times, or even in the same individual at different ages.

How did the ‘normal range’ develop? First, some semantics; the

Until the 1960s, laboratories often worked in isolation and developed their own comparator values to define normal limits. It became apparent that multiple ‘normal ranges’ were required for different patient populations and for individual laboratories, to account for methodological variation. The clinical practice at the time was to compare a patient’s results with an ill-defined, or at least inconsistently defined, range of values called the ‘normal range’. This was derived from a population of supposedly ‘normal’ (meaning healthy) individuals. The concept of the reference interval was then introduced by Grasbeck and Saris in 1969.

Underpinning both these methods is often the assumption that using a Gaussian distribution to identify the middle 95% of individuals will identify those who are healthy. There are three criticisms of the use and terminology applied here. First, values may not fall into a bell-shaped distribution but can be skewed. Fasting triglyceride is a good example of this. The most common triglyceride value (the mode) is not found at the midpoint of the population density curve but to one side: the distribution is skewed to the right (

Fasting triglyceride as an example of a skewed (non-Gaussian) distribution.

Second, no underlying theory assumes that the central 95% is physiologically normal. The 95% interval is based on pragmatism: two SDs was considered suitably distant from the mean, a convention taken from Fisher’s development of the hypothesis-testing technique of Neyman and Pearson.

Third, the bell-shaped distribution that we term the ‘normal distribution’ is something of a misnomer. It was commonly referred to as ‘Gaussian’ until another mathematician, Karl Pearson, adopted the term ‘normal distribution’, referring to the fact that the distribution pattern was ubiquitous in life. The term was not introduced for its propensity to identify ‘normal’ individuals.
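The first criticism above can be illustrated with a small simulation. The sketch below is illustrative only: the data are simulated from a log-normal distribution chosen to be right-skewed (they are not real triglyceride measurements), and the function names and parameters are assumptions made for this example. It compares a Gaussian-style interval (mean ± 1.96 SD) with a simple percentile interval (2.5th to 97.5th centile), which makes no assumption about the shape of the distribution.

```python
import random
import statistics


def parametric_interval(values):
    """Mean +/- 1.96 SD: only appropriate if the data are roughly Gaussian."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - 1.96 * sd, mean + 1.96 * sd


def percentile_interval(values):
    """2.5th to 97.5th centile: no assumption about the distribution's shape."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical right-skewed data (log-normal); parameters are arbitrary.
rng = random.Random(0)
values = [rng.lognormvariate(0.2, 0.6) for _ in range(10_000)]

para_low, para_high = parametric_interval(values)
pct_low, pct_high = percentile_interval(values)

print(f"mean = {statistics.fmean(values):.2f}, median = {statistics.median(values):.2f}")
print(f"Gaussian-style interval: {para_low:.2f} to {para_high:.2f}")
print(f"Percentile interval:     {pct_low:.2f} to {pct_high:.2f}")
```

With skewed data such as these, the mean sits above the median and the Gaussian-style lower limit can fall below zero, which is not a possible value for a concentration. The percentile interval stays within the range of the observed data.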

Just as a result within the reference interval may be ‘abnormal’, so might a result outside the interval be ‘normal’ (seen in the presence of health). For instance, mild hyponatraemia in the elderly may not necessarily represent disease. Physiological changes associated with ageing can include elevated antidiuretic hormone and atrial natriuretic hormone levels as well as an increased responsiveness to osmotic stimulation.

Two Gaussian distributions with no overlap.

Overlapping Gaussian distributions.

Thus, it can be seen that a patient may be classed as healthy at an individual level but diseased at a population level and vice versa (

Detection of an outlier.

The laboratory report seldom features one value in isolation. Rather, a whole panel of tests is requested, analysed and reported, such as U&Es, LFTs and bone profile; cumulatively, these are sometimes referred to as a comprehensive metabolic panel. What are the chances that, out of a panel of tests, one will be abnormal? We can use the binomial distribution to estimate this. In statistics, the binomial distribution describes trials with only two outcomes: ‘success or failure’, ‘positive or negative’, ‘yes or no’. The binomial equation to determine the probability ($P$) of exactly $k$ successes in $n$ independent trials, each with probability of success $p$, is:

$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$$

The chance of a positive blood test result (‘success’) is 0.05 (because 5% of population values lie outside the central 95% of the distribution and, as described above, in this instance are considered as abnormal). Therefore, the probability that exactly one result is abnormal in a panel of 20 tests is:

$$P(X = 1) = \binom{20}{1} (0.05)^{1} (0.95)^{19} \approx 0.38$$

There is therefore a 38% chance that exactly 1 of the 20 tests will be abnormal, and a 64% chance that at least one will be. These values are indicative only, as they assume independence of the analytes being tested, whereas often they are related; for example, alkaline phosphatase may change in tandem with gamma-glutamyl transferase.
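The panel calculation can be checked with a few lines of code. This is a minimal sketch that assumes, as the text does, 20 independent tests, each with a 5% chance of falling outside its reference interval; the function and variable names are chosen for this example only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_tests = 20       # size of the panel
p_outside = 0.05   # chance that a single result lies outside its reference interval

# Exactly one result outside its interval.
p_exactly_one = binomial_pmf(1, n_tests, p_outside)

# At least one result outside its interval = 1 - P(none outside).
p_at_least_one = 1 - binomial_pmf(0, n_tests, p_outside)

print(f"P(exactly one outside)  = {p_exactly_one:.3f}")
print(f"P(at least one outside) = {p_at_least_one:.3f}")
```

The first figure is about 0.38; the second, about 0.64, is the chance that a healthy individual has at least one flagged result somewhere in a 20-test panel. Both depend on the independence assumption, which real analytes often violate.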

We have described that the clinical risk from an observed value is continuous, and so we must take care that the reference interval is not mistaken for a decision limit. Consider glycated haemoglobin (HbA1c): a value of 42 mmol/mol is considered ‘normal’ and 48 mmol/mol is, with specific preanalytic criteria met, consistent with a diagnosis of diabetes mellitus. If, instead of a decision limit to diagnose diabetes, a reference interval was used (where only the outlying 5% of the population are considered ‘abnormal’), a large number of individuals who currently have the diagnosis of diabetes would be reclassified as ‘normal’. This is because, using a decision limit, the prevalence of diabetes in England is estimated at 9% of the adult population, a far larger proportion than would be identified if an HbA1c >2 SD from the mean (ie, the tail of a Gaussian distribution) were used instead to determine diabetes.
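The size of this mismatch can be estimated with a short calculation. The sketch below assumes, purely for illustration, that the measurement follows a Gaussian distribution in the population; the 9% prevalence is the figure quoted in the text, and the variable names are chosen for this example only.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Fraction of a Gaussian population lying more than 2 SD above the mean.
fraction_above_2sd = 1 - standard_normal.cdf(2.0)

# Prevalence of diabetes in adults in England using a decision limit (from the text).
prevalence_decision_limit = 0.09

print(f"Flagged by '>2 SD above the mean': {fraction_above_2sd:.1%}")
print(f"Identified by the decision limit:  {prevalence_decision_limit:.1%}")
print(f"Ratio: {prevalence_decision_limit / fraction_above_2sd:.1f}x")
```

Under this assumption, a rule based on the upper tail of the distribution would flag only a little over 2% of the population, a fraction several times smaller than the proportion identified by the decision limit.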

The differences between reference intervals and decision limits are summarised in

Features of a decision limit and a reference interval (modified from Ceriotti and Henny, 2008)

| | Reference intervals | Decision limits |
| --- | --- | --- |
| Definition | The interval between, and including, two reference limits, which are values derived from the distribution of the results obtained from a sample of the reference population. | The best dividing lines between the diseased and the not diseased or between ‘those who need not be investigated further’ and ‘those who do’. |
| Conditions influencing them | Population; age group; gender | Clinical question; patient category |
| Information gathered | Whether or not the patient is part of the reference population | Whether or not the patient is eligible for a certain procedure (‘treatment’) |
| Statistics | 95% central range of the distribution curve | None (consensus values); ROC curves; predictive values |
| Data number | Two (lower and upper limits) | One, without any CI |

ROC, receiver operating characteristic.

Language is important, as we see from

Question why the information is being gathered. Is it for benchmarking for the future; screening; completing a panel or performing a diagnostic investigation?

Will the investigation change the odds of something? Consider limiting indiscriminate testing.

Relate the observed value to preceding values whenever possible. The intraindividual variation in laboratory values is usually much smaller than the interindividual variability (ie, the variation in the population;

Interindividual variation greater than intraindividual variation.

The graphical representation of preceding data (if available) can be extremely effective in identifying trends (

Preanalytical error (how the sample was taken or transported to the laboratory).

Analytical error (how the sample was processed in the laboratory).

Intraindividual fluctuations of the variable measured (unlikely given the pattern of variation up to that point).

A real pathology. Attempts have been made to capture this with a ‘critical difference calculation’. The critical difference is defined as ‘the smallest difference between sequential laboratory results in a patient which is likely to indicate a true change in the patient’, and the calculation requires the laboratory (analytical) variation as well as the within-subject biological variation.
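The text does not give the formula for the critical difference. One commonly used form, often called the reference change value, combines the analytical and within-subject coefficients of variation as √2 × Z × √(CVa² + CVi²). The sketch below assumes that form with Z = 1.96 (95% probability, two-sided); the function names and the example CV values are hypothetical and are not taken from any particular assay.

```python
from math import sqrt


def critical_difference_percent(cv_analytical, cv_within_subject, z=1.96):
    """Reference change value (%) for two sequential results.

    cv_analytical:     analytical (laboratory) coefficient of variation, in %
    cv_within_subject: within-subject biological coefficient of variation, in %
    z:                 1.96 corresponds to 95% probability, two-sided
    """
    return sqrt(2) * z * sqrt(cv_analytical**2 + cv_within_subject**2)


def exceeds_critical_difference(previous, current, cv_analytical, cv_within_subject):
    """True if the percentage change between two results exceeds the critical difference."""
    change_percent = abs(current - previous) / previous * 100
    return change_percent > critical_difference_percent(cv_analytical, cv_within_subject)


# Hypothetical values: analytical CV 3%, within-subject CV 5%.
rcv = critical_difference_percent(3.0, 5.0)
print(f"Critical difference: {rcv:.1f}%")
print(exceeds_critical_difference(100.0, 110.0, 3.0, 5.0))  # a 10% change
print(exceeds_critical_difference(100.0, 120.0, 3.0, 5.0))  # a 20% change
```

With these illustrative CVs, a change of 10% between sequential results falls within expected variation, whereas a change of 20% exceeds the critical difference and is more likely to reflect a true change in the patient.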

In this situation, assuming that no pretest probability had been estimated (ie, that the sample was not specifically requested to test a theory), the result may be: (1) supplemented by historic data and further findings, as with the example above (raising or lowering the probability that it forms a component of disease).

(2) Repeated (reducing the chance that the result is secondary to a preanalytic or analytic error or to intraindividual fluctuations). Repetition, if chosen, should be made at intervals appropriate to the expected rate of development of possible disease. The observed value on repetition may well have shifted closer to the centre of the reference interval, a phenomenon known as ‘regression to the mean’.

This is a statistical tendency where unusually large or small measurements tend to be followed by measurements that are closer to the mean.

Regression to the mean. On repetition, values furthest from the mean tend to have greater change than values starting close to the mean.
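Regression to the mean can be demonstrated with a simple simulation. The sketch below is illustrative only: each simulated individual has a stable underlying value, and each measurement adds independent random variation. All numbers (population mean, SDs, sample size) are hypothetical and in arbitrary units.

```python
import random
import statistics

rng = random.Random(1)

POPULATION_MEAN = 100.0  # hypothetical, arbitrary units
BETWEEN_PERSON_SD = 10.0  # spread of individuals' underlying values
WITHIN_PERSON_SD = 10.0   # measurement-to-measurement variation

# Each person's stable underlying value.
underlying = [rng.gauss(POPULATION_MEAN, BETWEEN_PERSON_SD) for _ in range(20_000)]

# Two independent measurements per person.
first = [u + rng.gauss(0, WITHIN_PERSON_SD) for u in underlying]
second = [u + rng.gauss(0, WITHIN_PERSON_SD) for u in underlying]

# Select people whose FIRST result was in the top 5%.
cutoff = statistics.quantiles(first, n=20)[-1]  # 95th centile of first results
high_first = [f for f in first if f > cutoff]
high_second = [s for f, s in zip(first, second) if f > cutoff]

mean_first_high = statistics.fmean(high_first)
mean_second_high = statistics.fmean(high_second)

print(f"Mean first result (top 5% on first test): {mean_first_high:.1f}")
print(f"Mean repeat result in the same people:    {mean_second_high:.1f}")
```

On repetition, the group selected for an extreme first result has a mean that is still above the population mean but closer to it, even though nothing about the individuals has changed between the two measurements.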

The reference interval is an extremely useful means of contextualising a patient’s result but it is wrong to automatically assume ‘normality’ of a result within that interval, just as it is wrong to assume abnormality outside of the interval. Normality is relative and situational. With understanding of the nature of the reference interval, logical decisions can be made that will improve the effectiveness of the clinical consultation.

Health is a relative and not an absolute state.

The reference interval acts as a comparator for the patient’s blood result. It is not the arbiter of whether disease is present or not.

Natural fluctuations in a blood result can occur.

Comparison of a result against the reference interval should be informed by the clinical suspicion formed beforehand.

Contributors: MBW and PK contributed jointly to the manuscript.

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests: None declared.

Patient consent: Not required.

Provenance and peer review: Not commissioned; externally peer reviewed.