How does restricting the range of a variable affect the correlation coefficient?

Geoffrey Gitau

Research and Statistical Analysis

Published Dec 19, 2016

Range restriction occurs when the sample under investigation contains a restricted range of scores: the observations in the sample are limited by some criterion, so the researcher is using only a subset of the data to determine whether two variables are correlated. Whenever a sample has a restricted range of scores, the correlation will be reduced, since the sample scores do not represent the full range of possible values.

An example of a restricted range would be selecting only students with scores within a certain band, instead of taking a random sample, to study the relationship between attending class and passing the exam. The sample within the restricted range will show a lower correlation than a sample covering the full range of scores from the whole class.
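A quick simulation makes the effect concrete. The sketch below is illustrative: the true correlation of 0.7, the restriction band (0, 1], and the attendance/exam framing are assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw paired scores from a bivariate normal with true correlation 0.7;
# think of x as an attendance score and y as an exam score (both standardised).
n = 100_000
true_r = 0.7
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, true_r], [true_r, 1.0]], size=n).T

# Correlation in the full sample.
full_r = np.corrcoef(x, y)[0, 1]

# Correlation when we keep only observations whose x falls in a narrow band.
mask = (x > 0.0) & (x <= 1.0)
restricted_r = np.corrcoef(x[mask], y[mask])[0, 1]

print(f"full-range correlation:       {full_r:.3f}")
print(f"restricted-range correlation: {restricted_r:.3f}")
```

The restricted-range estimate comes out well below the full-range one, even though the two variables have exactly the same relationship in both samples.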


In a recent statistical note in this journal, Dr. Bland and Dr.
Altman cautioned against using the correlation of two random variables, X
and Y, in a restricted range of data (ref. 1). They concluded that when we
restrict the range of one of the random variables, say X, the correlation
coefficient between X and Y will naturally be reduced; therefore, a
smaller correlation observed in a restricted range of one variable does
not necessarily imply a different relationship between the two variables
in that range. They further explained the reduction through the
coefficient of determination, which measures the proportion of the
variability in Y explained by the variability in X: if we restrict the
range of one of the random variables, X, we reduce the variation in X. It
will therefore explain less of the variability in Y, and hence the
correlation between X and Y in the restricted range of X will naturally be
reduced.
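The coefficient-of-determination argument can be checked numerically. The sketch below (the sample size, the correlation of 0.8, and the restriction band are illustrative assumptions) fits a line to the full and restricted samples and compares the proportion of Var(Y) explained:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bivariate normal sample with correlation 0.8 between X and Y.
n = 200_000
r = 0.8
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n).T

def r_squared(x, y):
    """Proportion of the variability in y explained by a linear fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()

# Restricting X to (-0.5, 0.5) shrinks Var(X), so X explains less of Var(Y).
mask = np.abs(x) < 0.5
full_r2 = r_squared(x, y)
restricted_r2 = r_squared(x[mask], y[mask])
print(full_r2, restricted_r2)
```

The full-sample R² is close to r² = 0.64, while the restricted-sample R² is far smaller, exactly as the argument above predicts.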

We congratulate Dr. Bland and Dr. Altman for this important
observation on the natural reduction of the correlation coefficient when
one of the variables is restricted to a particular range. However, the
explanation of the reduction could be made clearer and more general. In
addition, a discussion of the magnitude of this reduction would be of
great interest. For example, we may want to know how the reduction is
related to the probability of X falling in the restricted range. Below, we
explicitly explain why the reduction occurs and how it depends on the
range of the restriction.

Suppose that we are interested in estimating the correlation between
X and Y based on a random sample of n paired observations from a bivariate
normal distribution. As any bivariate normal distribution can be
transformed to the standard bivariate normal distribution with the same
correlation coefficient, without loss of generality we assume the
bivariate normal distribution has mean (0, 0), variance (1, 1), and
correlation r. Now let us consider the correlation in the restricted
interval: X between a and b. Let f(.) and F(.) denote the standard
normal probability density function and cumulative distribution function.
Similar to what we have done before (ref. 2), it can be shown that, when n
is large, the correlation in this restricted interval of X converges to

r_ab = r * s_ab / sqrt(r^2 * s_ab^2 + 1 - r^2),

where s_ab^2 is the variance of the truncated standard normal variable X
within the range (a, b).
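The limiting value can be evaluated directly. The sketch below uses Python's statistics.NormalDist for f(.) and F(.) and the standard closed form for the variance of a truncated normal; the function names are ours:

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()  # standard normal: pdf is f(.), cdf is F(.)

def truncated_var(a: float, b: float) -> float:
    """Variance of a standard normal variable truncated to the interval (a, b)."""
    z = std.cdf(b) - std.cdf(a)                     # probability mass in (a, b)
    term1 = (a * std.pdf(a) - b * std.pdf(b)) / z
    term2 = (std.pdf(a) - std.pdf(b)) / z
    return 1.0 + term1 - term2 ** 2

def restricted_corr(r: float, a: float, b: float) -> float:
    """Large-sample correlation of (X, Y) when X is restricted to (a, b)."""
    s2 = truncated_var(a, b)
    return r * sqrt(s2) / sqrt(r ** 2 * s2 + 1.0 - r ** 2)

# With r = 0.7 and X restricted to (0, 1), the correlation drops to about 0.27.
print(restricted_corr(0.7, 0.0, 1.0))
```

For a very wide interval such as (-10, 10), truncated_var is essentially 1 and restricted_corr returns essentially r, as expected.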

Knowing that the variance of the truncated standard normal variable
within the range (a, b) is smaller than or equal to 1, the variance of
the unrestricted X, we see that the restricted correlation is less than or
equal to r in absolute value. Therefore, restricting X to this interval
attenuates the correlation between X and Y. Figure 1 illustrates
graphically how the level of attenuation depends on the range (a, b).
Specifically, it shows how the attenuation depends on the probability of
X <= a and the probability of a < X <= b.
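To mirror Figure 1's axes, one can parameterize the interval by the probabilities P(X <= a) and P(a < X <= b) rather than by (a, b) themselves. The helper below is a self-contained sketch (the function name and the example values r = 0.7, P(X <= a) = 0.1 are ours):

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()

def restricted_corr_by_prob(r: float, p_below: float, p_within: float) -> float:
    """Large-sample restricted correlation, with the interval (a, b) expressed
    through p_below = P(X <= a) and p_within = P(a < X <= b)."""
    a = std.inv_cdf(p_below)
    b = std.inv_cdf(p_below + p_within)
    # Variance of the standard normal truncated to (a, b); the mass in
    # (a, b) is p_within by construction.
    s2 = (1.0
          + (a * std.pdf(a) - b * std.pdf(b)) / p_within
          - ((std.pdf(a) - std.pdf(b)) / p_within) ** 2)
    return r * sqrt(s2) / sqrt(r ** 2 * s2 + 1.0 - r ** 2)

# The narrower the band (smaller p_within), the stronger the attenuation.
for p_within in (0.8, 0.5, 0.2):
    print(p_within, round(restricted_corr_by_prob(0.7, 0.1, p_within), 3))
```

Sweeping p_below and p_within over a grid and plotting the result reproduces the qualitative pattern described for Figure 1.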

Contributors: LN and HC equally initiated, designed, and drafted the
paper. All authors approved this version of the paper.

Views expressed in this paper are the author's professional opinions and do not necessarily represent the official positions of the U.S. Food and Drug Administration.

References

1. Bland JM, Altman DG. Correlation in restricted ranges of data. BMJ
2011;342:d556.

2. Chu H, Nie L, Cole SR. Sample size and statistical power assessing the
effect of interventions in the context of mixture distributions with
detection limits. Stat Med 2006;25(15):2647-57.

Figure 1: The relationship between the correlation coefficient r_ab
for the restricted interval of X and the probability of X <= a and the
probability of a < X <= b. The 19 lines presented in each plot
correspond to the (unrestricted) coefficient ranging from -0.9 to 0.9 in
steps of 0.1, from top to bottom.

How does restricting the range of a variable affect the correlation coefficient?

Typically, restriction of range reduces correlation coefficient magnitudes and other measures of effect size (e.g., R2) when a variable is directly or indirectly restricted.

How does restricted range affect correlation?

Whenever a sample has a restricted range of scores, the correlation will be reduced. To take the most extreme example, consider what the correlation between high-school GPA and college GPA would be in a sample where every student had the same high-school GPA. The correlation would necessarily be 0.0.

In which ways does restricting the range of a variable affect the correlation coefficient?

It reduces accuracy, producing a smaller coefficient than if the range were not restricted, and leads to an underestimate of the degree of association between the two variables.

What is the problem with restricted range?

Range restriction on a particular variable may lead to such negative effects as failing to observe or improperly characterizing a relationship between the variables of interest.