How might a restricted range of variables affect a correlation of those variables?

Correlation

Correlation (co-relation) refers to the degree of relationship (or dependency) between two variables.

Linear correlation refers to straight-line relationships between two variables.

A correlation can range between -1 (perfect negative relationship) and +1 (perfect positive relationship), with 0 indicating no straight-line relationship.

The earliest known use of correlation was in the late 19th century[1].

Introduction

When we ask questions such as "Is X related to Y?", "Does X predict Y?", and "Does X account for Y?", we are interested in measuring and better understanding the relationship between two variables.

Correlation measures the extent to which variables:

  1. covary
  2. depend on one another
  3. predict one another

The extent of correlation between two variables is, by convention, denoted r, and the correlation between variable X and variable Y is written r_XY.

Correlations are standardised to vary between -1 and +1, with 0 representing no relationship, -1 a perfect negative relationship, and +1 a perfect positive relationship.
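As a minimal sketch of how such a coefficient can be computed, the following uses only the Python standard library. The function name pearson_r and the study-hours data are hypothetical, chosen for illustration; they do not come from the original article.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of study and test scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 78]
r_xy = pearson_r(hours, scores)

# A perfectly negative straight-line relationship
perfect_neg = pearson_r([1, 2, 3], [6, 4, 2])

print(round(r_xy, 3), perfect_neg)
```

Dividing by the square root of the product of the two sums of squares is what standardises the result to lie between -1 and +1.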

A variety of bivariate correlational statistics are available, the choice of which depends on the variables' level of measurement:

  • Nominal by nominal: Contingency table, Pearson's chi-square test, Phi/Cramer's V
  • Ordinal by ordinal: Spearman's rho, Kendall's tau-b
  • Dichotomous by interval/ratio: Point biserial correlation coefficient
  • Interval/ratio by interval/ratio: Pearson product-moment correlation coefficient
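To illustrate why the level of measurement matters, here is a rough sketch that computes Spearman's rho as a Pearson correlation on ranks (tied values receive the average of their ranks). The helper names average_ranks and spearman_rho and the data are hypothetical, introduced only for this example.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical monotonic but non-linear relationship (y = x cubed)
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(pearson_r(x, y))     # below 1: the relationship is not a straight line
print(spearman_rho(x, y))  # the rank order agrees perfectly
```

Because Spearman's rho uses only rank order, it suits ordinal data and captures monotonic relationships that a straight-line coefficient understates.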

Correlational analyses should be accompanied by appropriate bivariate graphs, such as:

  • Nominal by nominal: Clustered bar charts
  • Ordinal by ordinal: Scatterplot (with point bins)
  • Interval/ratio by interval/ratio: Scatterplot

The world is made of covariation

Responses which vary can be measured as a variable (i.e., responses are distributed across a range).

Responses to two or more variables may covary, meaning that the variables share some variation. When the value of one variable is high, the value of the other variable tends to be high (positive correlation) or low (negative correlation).

If you look around, you may notice that the world is made of covariation! For example:

  • pollen count is positively correlated with bee activity
  • rainfall is positively correlated with amount of vegetation
  • hours of study is positively correlated with test performance
  • number of fire trucks attending a fire is correlated with cost of repairs for the fire[2]
  • siblings' IQs are positively correlated
  • air temperature is negatively correlated with amount of clothing worn

The more you look, the more you'll see that there are many predictable patterns of co-occurrence between phenomena (i.e., things tend to occur together).

Scatterplots

The independent variable (IV), or predictor, is placed on the X axis and the dependent variable (DV) is placed on the Y axis. Each case is plotted according to its X and Y values.

Figure: scatterplot with r = .76

Visual inspection of scatterplots is essential

It is unwise to rely solely on a correlation coefficient to indicate the nature of the relationship between variables without also examining a visualisation of the data, such as a scatterplot.

For example, the linear (straight-line) correlation in each of these four scatterplots is approximately .82, yet what the data indicate about the relationship between the variables is very different in each case.

Four sets of data with the same correlation of 0.816
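The figure's data are not reproduced on this page. Assuming the four sets are Anscombe's quartet (a well-known example whose correlations match the 0.816 quoted here), the following sketch checks that the four coefficients agree to three decimal places even though the plotted patterns differ. The variable names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet (assumed to be the data behind the figure)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets I to III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {
    "I": (x123, y1),    # roughly linear with scatter
    "II": (x123, y2),   # smooth curve
    "III": (x123, y3),  # straight line plus one outlier
    "IV": (x4, y4),     # constant x apart from one influential point
}

correlations = {name: pearson_r(x, y) for name, (x, y) in quartet.items()}
for name, r in correlations.items():
    print(name, round(r, 3))
```

The coefficient alone cannot distinguish these cases; only the scatterplots reveal the curve, the outlier, and the single influential point.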

Correlation does not equal causation

Spurious correlations: http://www.burns.com/wcbspurcorl.htm

See Correlation does not imply causation (Wikipedia)

Range restriction

Pearson/Spearman correlation coefficients between X and Y are shown when the two variables' ranges are unrestricted, and when the range of X is restricted to the interval (0,1).
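The comparison in this figure can be approximated with a small simulation. This is a sketch under stated assumptions, not the figure's actual data: X is standard normal, Y is built to have a population correlation of 0.7 with X, and the seed and sample size are arbitrary. Only the Pearson coefficient is computed.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is repeatable
rho = 0.7                # assumed population correlation
n = 5000                 # assumed sample size

# Y = rho * X + noise, scaled so that Y also has unit variance
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]

r_full = pearson_r(xs, ys)

# Keep only the cases whose X value falls in the interval (0, 1)
restricted = [(x, y) for x, y in zip(xs, ys) if 0 < x < 1]
rx, ry = zip(*restricted)
r_restricted = pearson_r(rx, ry)

print("unrestricted r:", round(r_full, 3))
print("restricted r:  ", round(r_restricted, 3), "from", len(restricted), "cases")
```

The relationship between X and Y is the same in both samples; only the spread of X differs, and that alone pulls the coefficient toward zero.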

See outliers and restricted range

(Source: http://en.wikiversity.org/wiki/Correlation)

How does restriction of range affect correlation?

When a sample has a restricted range of scores, the correlation tends to be reduced. To take the most extreme example, consider what the correlation between high-school GPA and college GPA would be in a sample where every student had the same high-school GPA. With no variation in high-school GPA, it could not predict any differences in college GPA: the coefficient would be mathematically undefined and, in effect, zero.

What is restriction of range and what impact does it have on correlation coefficient?

Range restriction shrinks the standard deviation of variable x, which in turn reduces the correlation between x and another variable y. Analogously, range restriction can also affect the coefficients in multiple regression.
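The link between the shrinking standard deviation and the correlation can be made concrete. The sketch below assumes direct selection on x and a linear relationship with constant residual variance, so that the slope and the residual spread are unchanged by the restriction. Under those assumptions the restricted correlation depends only on the unrestricted correlation and the ratio u of restricted to unrestricted standard deviations. The function names are hypothetical, and the reverse step is the adjustment commonly known as Thorndike's Case II correction.

```python
import math

def restricted_r(r, u):
    """Correlation expected after the SD of x shrinks by the ratio u.

    r: correlation in the unrestricted group
    u: restricted SD of x divided by unrestricted SD of x (0 < u <= 1)
    Assumes direct selection on x, linearity, and constant residual variance.
    """
    return r * u / math.sqrt(1 + r ** 2 * (u ** 2 - 1))

def corrected_r(r_restricted_value, u):
    """Estimate the unrestricted correlation from a restricted one.

    This is the same formula applied with the inverse SD ratio (1 / u).
    """
    return restricted_r(r_restricted_value, 1 / u)

# Illustrative values: r = .70 in the full group, SD of x cut to 30%
r_unrestricted = 0.70
sd_ratio = 0.30
r_after = restricted_r(r_unrestricted, sd_ratio)
r_back = corrected_r(r_after, sd_ratio)

print(round(r_after, 3), round(r_back, 3))
```

When u equals 1 the correlation is unchanged, and as u approaches 0 the correlation approaches 0, which matches the extreme case in which everyone has the same x score.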

What are the consequences of having restricted range?

A restricted range can distort statistical inferences, typically by underestimating effect sizes and the validity coefficients that describe associations between predictors and outcomes.

What effect does restricted range have on the Pearson product moment correlation?

When a sample has a restricted range of scores, the Pearson product-moment correlation tends to be reduced, because the sample scores do not represent the full range of possible values.