Geoffrey Gitau
Published Dec 19, 2016

Range restriction occurs when a sample of the population under investigation contains only a restricted range of scores. That is, the observations in the sample are limited by some criterion, so the researcher uses a subset of the data to determine whether two pieces of information are correlated. When a sample has a restricted range of scores, the correlation tends to be reduced, because the sample scores do not represent the full range of possible values. An example of a restricted range would be selecting only students with scores within a certain range, instead of taking a random sample, to infer the relationship between attending class and passing the exam. The sample within that range will typically show a lower correlation than a sample covering the full range of scores for the whole class.
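
The effect described above can be illustrated with a small simulation. This is a minimal sketch, not an analysis of real data: the standardized "attendance" and "exam" scores, the assumed full-sample correlation of 0.6, and the cut-offs 0 and 1 are all hypothetical choices made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_scores(n, rho, seed=0):
    """Draw n (attendance, exam) pairs of standardized scores.

    The pairs are bivariate normal with correlation rho (an assumption
    of this sketch, not something stated in the text).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs

pairs = simulate_scores(n=20000, rho=0.6)
full_r = pearson([x for x, _ in pairs], [y for _, y in pairs])

# Keep only students whose attendance score falls in a narrow band
# (hypothetical cut-offs), mimicking a range-restricted sample.
subset = [(x, y) for x, y in pairs if 0.0 <= x <= 1.0]
restricted_r = pearson([x for x, _ in subset], [y for _, y in subset])

print(f"full sample: r = {full_r:.2f} (n = {len(pairs)})")
print(f"restricted:  r = {restricted_r:.2f} (n = {len(subset)})")
```

With these settings the restricted subset should show a clearly smaller correlation than the full sample, even though both come from the same underlying relationship.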
In a recent statistical note in this journal, Dr. Bland and Dr. Altman
cautioned against interpreting the correlation of two random variables, X
and Y, within a restricted range of data (ref. 1). They concluded that when
we restrict the range of one of the random variables, say X, the
correlation coefficient between X and Y will naturally be reduced; a
smaller correlation observed in a restricted range of one variable
therefore does not necessarily imply a different relationship between the
two variables in that range. They further explained the reduction through
the meaning of the coefficient of determination, which measures the
proportion of the variability in Y explained by the variability in X: if
we restrict the range of X, we reduce the variation in X. X will therefore
explain less of the variability in Y, and hence the correlation between X
and Y in the restricted range of X will naturally be reduced.
We congratulate Dr. Bland and Dr. Altman for this important observation
on the natural reduction of the correlation coefficient when one of the
variables is restricted to a particular range. However, the explanation of
the reduction could be made clearer and more general. In addition, a
discussion of the magnitude of this reduction would be of great interest.
For example, we may want to know how this reduction is related to the
probability of X falling in the restricted range. Below, we explain
explicitly why the reduction occurs and how it depends on the range of the
restriction.
Suppose that we are interested in estimating the correlation between
X and Y based on a random sample of n paired observations from a bivariate
normal distribution. As a bivariate normal distribution can be
transformed to the standard bivariate normal distribution with the same
correlation coefficient, without loss of generality we assume the
bivariate normal distribution has means (0, 0), variances (1, 1), and
correlation r. Now let us consider the correlation in the restricted
interval in which X is between a and b. Let f(.) and F(.) denote the
standard normal probability density function and cumulative distribution
function.
Similar to what we have done before (ref. 2), it can be shown that, when n
is large, the correlation in this restricted interval of X converges to a
limit determined by r and by the variance of the truncated standard normal
variable X within the range (a, b).
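
One way to write this limit explicitly is given below. It is offered as a sketch under the bivariate normal assumptions above rather than as the authors' exact expression. It follows from decomposing Y = rX + \sqrt{1 - r^2} Z, with Z standard normal and independent of X, and conditioning on a < X < b. The symbol \sigma_{a,b}^2 for the truncated variance is a label introduced here; r_r follows the notation of the Figure 1 caption.

```latex
r_r = \frac{r\,\sigma_{a,b}}{\sqrt{1 - r^2 + r^2\,\sigma_{a,b}^2}},
\qquad
\sigma_{a,b}^2 = 1 + \frac{a f(a) - b f(b)}{F(b) - F(a)}
  - \left( \frac{f(a) - f(b)}{F(b) - F(a)} \right)^2 .
```

In this form, |r_r| \le |r| is equivalent to (1 - r^2)(1 - \sigma_{a,b}^2) \ge 0, which holds whenever \sigma_{a,b}^2 \le 1, with equality when X is unrestricted.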
Knowing that the variance of the truncated standard normal variable
within the range (a, b) is smaller than or equal to 1, the variance of
the unrestricted X, we derive that the restricted correlation is no larger
in absolute value than r. Therefore, restricting X to this interval
attenuates the correlation between X and Y. Figure 1 shows graphically how
the level of attenuation depends on the range (a, b). Specifically, it
shows how the attenuation depends on the probability of X <= a and the
probability of a < X <= b.
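
The dependence on the two probabilities can also be explored numerically. The sketch below is not the authors' code: it assumes the standard closed form for the variance of a truncated standard normal variable and the limit r_r = r s / sqrt(1 - r^2 + r^2 s^2), where s^2 is that variance, derived under the bivariate normal assumptions above. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf plays the role of f, cdf of F

def _pdf(x):
    """f(x), taken as 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else STD_NORMAL.pdf(x)

def _x_times_pdf(x):
    """x * f(x), using its limit 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else x * STD_NORMAL.pdf(x)

def truncated_variance(a, b):
    """Variance of a standard normal variable truncated to (a, b)."""
    mass = STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a)
    mean = (_pdf(a) - _pdf(b)) / mass
    return 1.0 + (_x_times_pdf(a) - _x_times_pdf(b)) / mass - mean ** 2

def restricted_correlation(r, a, b):
    """Large-sample correlation of (X, Y) when X is restricted to (a, b)."""
    s2 = truncated_variance(a, b)
    return r * math.sqrt(s2) / math.sqrt(1.0 - r ** 2 + r ** 2 * s2)

def bounds_from_probabilities(p_below, p_inside):
    """Map P(X <= a) and P(a < X <= b) to the interval end points (a, b)."""
    a = -math.inf if p_below <= 0.0 else STD_NORMAL.inv_cdf(p_below)
    upper = p_below + p_inside
    b = math.inf if upper >= 1.0 else STD_NORMAL.inv_cdf(upper)
    return a, b

if __name__ == "__main__":
    r = 0.6  # hypothetical unrestricted correlation
    for p_below, p_inside in [(0.0, 1.0), (0.25, 0.5), (0.5, 0.25)]:
        a, b = bounds_from_probabilities(p_below, p_inside)
        r_r = restricted_correlation(r, a, b)
        print(f"P(X<=a)={p_below:.2f}, P(a<X<=b)={p_inside:.2f}: r_r={r_r:.3f}")
```

Looping over a grid of the two probabilities and over r from -0.9 to 0.9 would give curves of the kind described for Figure 1, under the same assumptions.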
Contributors: LN and HC equally initiated, designed, and drafted the
paper. All authors approved this version of the paper.
Views expressed in this paper are the authors' professional opinions and do not necessarily represent the official positions of the U.S. Food and Drug Administration.
References
1. Bland JM, Altman DG. Correlation in restricted ranges of data. BMJ
2011;342:d556.
2. Chu H, Nie L, Cole SR. Sample size and statistical power assessing the
effect of interventions in the context of mixture distributions with
detection limits. Stat Med 2006;25(15):2647-57.
Figure 1: The relationship between the correlation coefficient r_r for
the restricted interval of X and the probability of X <= a and the
probability of a < X <= b. The 19 lines presented in each plot
correspond to the (unrestricted) coefficient being -0.9 to 0.9 by 0.1
from top to bottom.