Regression to the mean (RTM) is a widespread statistical phenomenon that occurs when a non-random sample is selected from a group and the two variables of interest that are measured are imperfectly correlated. The smaller the correlation between these two variables and the more extreme the obtained value is from the population mean, the larger the effect of RTM (that is, there is more opportunity or room for RTM). If variables X and Y have standard deviations SDx and SDy and correlation r, the slope of the familiar least-squares regression line can be written r × SDy/SDx. Thus, a change of one standard deviation in X is associated with a change of r standard deviations in Y. Unless X and Y are perfectly linearly related, so that all the points lie along a straight line, the absolute value of r is less than 1. For a given value of X, the predicted value of Y is then always fewer standard deviations from its mean than X is from its mean. Because RTM will be in effect to some extent unless the correlation is perfect, it almost always occurs in practice.
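The relationship between the slope and the correlation can be checked numerically. The following is a minimal sketch on synthetic data; the means, standard deviations, the target correlation of 0.6 and all variable names are assumptions chosen for illustration, not values from the text.

```python
import random
import statistics

rng = random.Random(0)
TARGET_R = 0.6  # assumed correlation for the synthetic data

# Synthetic X with mean 100 and SD 15; Y with mean 50 and SD 10,
# constructed so that its correlation with X is close to TARGET_R.
xs = [rng.gauss(100, 15) for _ in range(20_000)]
ys = [
    50 + 10 * (TARGET_R * (x - 100) / 15
               + (1 - TARGET_R ** 2) ** 0.5 * rng.gauss(0, 1))
    for x in xs
]

mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / (sxx * syy) ** 0.5      # Pearson correlation
slope = sxy / sxx                 # least-squares slope of Y on X
slope_from_r = r * sd_y / sd_x    # the same slope written as r * SDy / SDx

# Predict Y for an X that lies two standard deviations above its mean.
x0 = mean_x + 2 * sd_x
y_pred = mean_y + slope * (x0 - mean_x)
z_pred = (y_pred - mean_y) / sd_y  # equals 2 * r, so it is closer to the mean

print(f"r = {r:.3f}, slope = {slope:.4f}, r*SDy/SDx = {slope_from_r:.4f}")
print(f"X is 2 SDs above its mean; predicted Y is {z_pred:.3f} SDs above its mean")
```

The two slope values agree up to rounding, and the predicted Y sits r times as many standard deviations from its mean as X does from its own.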
RTM does not depend on the assumption of linearity, the level of measurement of the variable (for example, the variable can be dichotomous), or measurement error. Given a less than perfect correlation between X and Y, RTM is a mathematical necessity. Although it is not inherent in either biological or psychological data, RTM has important predictive implications for both. In situations in which one has little information to make a judgment, often the best advice is to use the mean value as the prediction.
How to deal with RTM
If subjects are randomly allocated to comparison groups, the responses from all groups should be equally affected by RTM. With placebo and treatment groups, the mean change in the placebo group provides an estimate of the change caused by RTM (plus any other placebo effect). The difference between the mean change in the treatment group and the mean change in the placebo group is then the estimate of the treatment effect after adjusting for RTM. RTM can be reduced by basing the selection of individuals on the average of several measurements instead of a single measurement. It has also been suggested to select patients on the basis of one measurement but to use a second pretreatment measurement as the baseline from which to compute the change. If the correlation coefficient between the posttreatment and the first pretreatment measurement is the same as that between the first and the second pretreatment measurement, then there will be no expected mean change due to RTM.
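The placebo adjustment and the second-baseline idea can be illustrated with a small simulation. This is a sketch under assumed conditions, not a reproduction of any real trial: each subject has a stable true value, every measurement adds independent error, subjects are selected because their first measurement exceeds a cutoff, and the treatment shifts the true value by a fixed amount. All numbers and names (`TRUE_EFFECT`, `CUTOFF`, `measure`) are hypothetical.

```python
import random
import statistics

rng = random.Random(42)
TRUE_EFFECT = -5.0   # assumed treatment effect on the true value
CUTOFF = 115.0       # assumed selection threshold on the first measurement


def measure(true_value):
    """One noisy measurement: true value plus independent error (SD 10)."""
    return true_value + rng.gauss(0, 10)


placebo, treated = [], []
for _ in range(100_000):
    true_value = rng.gauss(100, 10)
    first = measure(true_value)
    if first <= CUTOFF:
        continue  # only subjects with an extreme first measurement are enrolled
    second = measure(true_value)  # second pretreatment measurement
    # Random allocation to placebo or treatment.
    if rng.random() < 0.5:
        placebo.append((first, second, measure(true_value)))
    else:
        treated.append((first, second, measure(true_value + TRUE_EFFECT)))


def mean_change(group, baseline_index):
    """Mean of (posttreatment - chosen baseline) within a group."""
    return statistics.fmean(row[2] - row[baseline_index] for row in group)


# Change from the selection measurement: the placebo group shows pure RTM.
placebo_change = mean_change(placebo, 0)
treated_change = mean_change(treated, 0)
adjusted_effect = treated_change - placebo_change

# Change from the second pretreatment measurement in the placebo group.
placebo_change_second_baseline = mean_change(placebo, 1)

print(f"placebo change from first measurement:  {placebo_change:.2f}")
print(f"treated change from first measurement:  {treated_change:.2f}")
print(f"treatment effect adjusted for RTM:      {adjusted_effect:.2f}")
print(f"placebo change from second measurement: {placebo_change_second_baseline:.2f}")
```

In this setup the placebo group changes substantially although nothing was done to it, the difference between groups recovers a value close to the assumed effect, and the change computed from the second pretreatment measurement is close to zero because the equal-correlation condition holds by construction.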
Regression to the mean describes the feature that “extreme” outcomes tend to be followed by more “normal” ones. It’s a statistical concept that is both easy to understand and easy to forget. When we witness “extreme” events such as unlikely successes or failures, we forget how rare such events are. When these events are followed by more “normal” events, we try to explain why these “normal” events happened — we forget that these “normal” events are…normal and that we should expect them to happen. This often leads us to attribute causal powers to people, events, and interventions that may have played no role in bringing about the “normal” event.
Identifying and dealing with RTM
Example: the Nambour Skin Cancer Prevention Trial
To illustrate the statistical methods used to detect and control for RTM we used a random subset of measurements of serum betacarotene from the Nambour Skin Cancer Prevention Trial [13]. This community-based randomized trial investigated the effect of a daily betacarotene supplement and daily application of sunscreen on skin cancer. The effect of the betacarotene supplement on serum levels was investigated in a random sub-sample of trial participants, who provided a blood sample at the start of the trial, in February 1992, and another blood sample at the end of the supplementation period in July 1996 (unpublished). The betacarotene measurements (μM/l) in this study were strongly positively skewed. For our purpose we therefore log-transformed the data to make them approximately Normally distributed. The data consist of n = 96 paired measurements, n = 52 from the treatment group (betacarotene supplement) and n = 44 from the placebo group. In the analyses presented here we are interested in whether the supplements increased betacarotene levels (i.e. a genuine treatment effect).
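The effect of log-transforming strongly positively skewed measurements can be sketched as follows. The data here are synthetic log-normal draws, not the trial's betacarotene values, and the `skewness` helper and the distribution parameters are assumptions made for illustration.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


rng = random.Random(1)

# Synthetic, positively skewed measurements (log-normal by construction).
raw = [rng.lognormvariate(0.0, 0.8) for _ in range(5_000)]

# Natural-log transform, as one would apply before a Normal-theory analysis.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)

print(f"skewness before transform: {raw_skew:.2f}")
print(f"skewness after transform:  {log_skew:.2f}")
```

After the transform the skewness is close to zero, which is what makes methods that assume approximate Normality more reasonable on the log scale.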
I have illustrated the problem of regression to the mean (RTM) using some simple biological examples where the variable was approximately Normally distributed. However, RTM is not restricted to biological variables. It will occur in any measurement (biological, psychometric, anthropometric, etc.) that is observed with error. Nor is it restricted to distributions that are Normal, or even to distributions that are continuous. RTM can occur in binary data, where it would cause subjects to change categories without any true change in their underlying response.
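The binary case can be sketched with a simulation. The setup below is an assumption for illustration: each subject has a fixed underlying value, each measurement adds independent error, and a subject is classified "high" when the measured value exceeds a threshold. The threshold, error size and names are hypothetical.

```python
import random

rng = random.Random(7)
THRESHOLD = 1.0   # assumed cutoff for the "high" category
ERROR_SD = 0.5    # assumed measurement-error standard deviation

first_high = 0    # subjects classified "high" at the first measurement
still_high = 0    # of those, subjects classified "high" again at the second
second_high = 0   # all subjects classified "high" at the second measurement
n_subjects = 50_000

for _ in range(n_subjects):
    true_value = rng.gauss(0, 1)  # underlying response, identical at both times
    first = true_value + rng.gauss(0, ERROR_SD)
    second = true_value + rng.gauss(0, ERROR_SD)
    if second > THRESHOLD:
        second_high += 1
    if first > THRESHOLD:
        first_high += 1
        if second > THRESHOLD:
            still_high += 1

share_still_high = still_high / first_high
overall_share_high = second_high / n_subjects

print(f"classified high at first measurement: {first_high}")
print(f"share of those still high at second:  {share_still_high:.2f}")
print(f"overall share high at second:         {overall_share_high:.2f}")
```

A sizeable share of the subjects first classified as "high" fall into the other category on remeasurement even though no underlying value changed, which is RTM expressed as category switching.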