Measures of Variability
Many courses require that students in undergraduate degree programs have a basic understanding of descriptive statistics. Descriptive statistics are statistics that collect, summarize, classify, and present data. This guide gives you an overview of one type of descriptive statistics: measures of variability.
Measures of variability allow you to determine the degree of variation within a population or sample, how representative a particular score is of a data set, and the scope and validity of any generalizations you wish to make based on your research observations. The measures of variability discussed in this handout are the range, the sum of squares, the standard deviation, and the variance.
Range
The range is the difference between the highest and lowest scores in a distribution. It is calculated by subtracting the lowest score from the highest score. The range is quick to calculate, but because it depends only on the two most extreme scores, a single outlier can change it dramatically.
Example: 14 40 42 47 49 51 71 81
The range for this distribution is (81 - 14) = 67. However, 14 is an outlier and inflates the range: if it were not in the distribution, the range would be (81 - 40) = 41.
Sum of Squares
The sum of squares (SS) is a measure of deviation from the mean. It is calculated by summing the squares of each score's difference from the mean:
$SS = \sum (X - \mu)^2$
Squaring each deviation removes negative signs, so deviations below the mean and deviations above the mean do not cancel each other out. The total sum of squares is a related quantity: it includes not only the variation due to the factors being studied but also the variation due to randomness or error. Finding the sum of squares is one of the steps in calculating the variance, as the variance example in this guide shows.
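To make the range and sum of squares calculations concrete, here is a minimal Python sketch (Python is our choice; the guide itself uses no software). The helper names `data_range` and `sum_of_squares` are hypothetical, and the sum of squares is taken around the mean of whatever scores are passed in, as described above.

```python
def data_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


def sum_of_squares(scores):
    """Sum of squared deviations from the mean (SS)."""
    mean = sum(scores) / len(scores)
    # Squaring each deviation keeps negative and positive deviations from cancelling.
    return sum((x - mean) ** 2 for x in scores)


scores = [14, 40, 42, 47, 49, 51, 71, 81]
print(data_range(scores))      # 67
print(data_range(scores[1:]))  # 41, with the outlier 14 removed
print(sum_of_squares([2, 4, 4, 4, 5, 5, 7, 9]))  # 32.0
```

Comparing the two range results shows how strongly one extreme score drives the range.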
Standard Deviation (SD)
The standard deviation is the square root of the variance. Unlike the variance, which is expressed in squared units, the standard deviation is measured in the same units as the raw scores themselves. This is what makes the standard deviation easier to interpret. For example, it makes more sense to discuss the variability of a set of IQ scores in IQ points than in squared IQ points, because squared IQ points do not correspond to anything on the original score scale.
EXAMPLE:
Data Set: 2, 4, 4, 4, 5, 5, 7, 9
Find the mean (n = 8):
$\mu = \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5$
Calculate the deviation of each data point from the mean, and square the result of each:
$(2-5)^2 = (-3)^2 = 9$
$(4-5)^2 = (-1)^2 = 1$
$(4-5)^2 = (-1)^2 = 1$
$(4-5)^2 = (-1)^2 = 1$
$(5-5)^2 = 0^2 = 0$
$(5-5)^2 = 0^2 = 0$
$(7-5)^2 = 2^2 = 4$
$(9-5)^2 = 4^2 = 16$
The variance is the mean of these values:
$\sigma^2 = \frac{9+1+1+1+0+0+4+16}{8} = \frac{32}{8} = 4$
The standard deviation is equal to the square root of the variance:
$\sigma = \sqrt{4} = 2$
Variance
Variance is the degree to which scores vary from their mean. The variance uses every score in the data set. It is calculated by taking the average of the squared deviations from the mean.
Variance Equation
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
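The standard deviation example and the variance equation can be checked with a short Python sketch. It assumes, as this guide does, that the scores form a population, so the squared deviations are divided by N. The function names are hypothetical; the cross-check uses the standard library's `statistics.pvariance` and `statistics.pstdev`, which also divide by N (the sample versions, `statistics.variance` and `statistics.stdev`, divide by n - 1 instead).

```python
import math
import statistics


def population_variance(scores):
    """sigma^2 = sum((X - mu)^2) / N: the mean of the squared deviations."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n


def population_sd(scores):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(population_variance(scores))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(population_sd(data))        # 2.0

# Cross-check against the standard library's population functions (divide by N).
assert math.isclose(statistics.pvariance(data), population_variance(data))
assert math.isclose(statistics.pstdev(data), population_sd(data))
```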
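To calculate the variance for a set of quiz scores:
Step 1: Find the mean.
Step 2: Find the deviation of each score from the mean.
Step 3: Square each deviation.
Step 4: Find the sum of the squared deviations (the sum of squares, SS).
Step 5: Divide the sum of squares by the number of scores (n) to get the variance.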
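The five steps for calculating the variance can be written as one small Python function, shown here as a sketch with a hypothetical name, `variance_by_steps`. It is applied to the variance example and the exercise data set from this guide. The example data lists 35 three times, which is what the example's n = 18, its total of 360, and the three 225 terms for the score 35 in its sum of squares all imply.

```python
def variance_by_steps(scores):
    """Population variance following the guide's five steps.

    Returns (mean, sum of squares, variance).
    """
    n = len(scores)
    # Step 1: find the mean
    mean = sum(scores) / n
    # Steps 2 and 3: deviation of each score from the mean, then square it
    deviations = [x - mean for x in scores]
    squared = [d ** 2 for d in deviations]
    # Step 4: sum the squared deviations (the sum of squares, SS)
    ss = sum(squared)
    # Step 5: divide SS by n to get the variance
    return mean, ss, ss / n


example = [5, 6, 7, 11, 12, 12, 13, 14, 18, 19, 21, 21, 22, 24, 35, 35, 35, 50]
exercise = [8, 11, 12, 14, 17, 17, 18, 19, 22, 29, 35, 38]

for data in (example, exercise):
    mean, ss, var = variance_by_steps(data)
    print(f"n={len(data)} mean={mean} SS={ss} variance={var:.2f}")
```

Running it should reproduce the guide's answers: SS = 2486 and a variance of 138.11 for the example, and SS = 982 and a variance of 81.83 for the exercise.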
Variance Example Problem
Example (population data):
Data Set: 5, 6, 7, 11, 12, 12, 13, 14, 18, 19, 21, 21, 22, 24, 35, 35, 35, 50
Step 1: Find the mean (n = 18):
$\mu = \frac{5+6+7+11+12+12+13+14+18+19+21+21+22+24+35+35+35+50}{18} = \frac{360}{18} = 20$
Steps 2 and 3: Find the deviation (D) of each score from the mean, and square each deviation:
$D = X - \mu$, $D^2 = (X - \mu)^2$
Step 4: Find the sum of the squared deviations (SS):
$SS = 225 + 196 + 169 + 81 + 64 + 64 + 49 + 36 + 4 + 1 + 1 + 1 + 4 + 16 + 225 + 225 + 225 + 900 = 2486$
Step 5: Divide the sum of squares by n:
$\sigma^2 = \frac{2486}{18} \approx 138.11$
Answer: The variance is 138.11.
Try this exercise
Using the steps in the variance example above, find the variance for the following data set:
8 11 12 14 17 17 18 19 22 29 35 38
Measures of Variability - Answer Key
Data Set: 8 11 12 14 17 17 18 19 22 29 35 38
n = 12; μ denotes the mean
Step 1: Find the mean:
$\mu = \frac{8+11+12+14+17+17+18+19+22+29+35+38}{12} = \frac{240}{12} = 20$
Steps 2 and 3: Find the deviation of each score from the mean and square it
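The deviations sum to zero, which is why they must be squared before they are added:
$\sum D = (-12) + (-9) + (-8) + (-6) + (-3) + (-3) + (-2) + (-1) + 2 + 9 + 15 + 18 = 0$
Step 4: Find the sum of the squared deviations:
$SS = 144 + 81 + 64 + 36 + 9 + 9 + 4 + 1 + 4 + 81 + 225 + 324 = 982$
Step 5: Find the variance:
$\sigma^2 = \frac{982}{12} \approx 81.83$
Measures of Central Tendency
Measures of central tendency are the methods of determining central values in a population. The three main measures of central tendency are the mean, the median, and the mode.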
Depending on the shape of a distribution, one of these measures may describe its center more accurately than the others. In symmetrical, unimodal data sets, the mean is the most accurate measure of central tendency. For asymmetrical (skewed), unimodal data sets, the median is likely to be more accurate. For bimodal distributions, the mode is the best choice, because reporting both modes captures the two peaks that a single mean or median would hide.
Mode
The mode is the most frequently occurring number within a data set. If two scores occur equally often, and more often than any other score, the data set is called bimodal because it has two modes. Any data set that has two or more modes is multimodal. There is no equation for finding the mode; you simply count the number of times each score occurs. If the data set is multimodal, report all of its modes.
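Counting how often each score occurs to find the mode is easy to automate. The sketch below uses Python's `collections.Counter`; the function name `modes` and the small bimodal data set are made up for illustration. It returns every score tied for the highest count, so a bimodal or multimodal data set yields all of its modes. (Python 3.8 and later also provide `statistics.multimode`, which returns the modes in the order they first appear.)

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs most often, smallest first.

    Note: if every score occurs exactly once, all scores are returned.
    """
    counts = Counter(scores)  # score -> number of occurrences
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)


# The mean example's data set from this guide: 5 occurs four times.
print(modes([4, 5, 5, 5, 5, 8, 8, 9, 11, 11, 11, 12, 12, 14, 15]))  # [5]
# A made-up bimodal data set: 2 and 4 each occur twice.
print(modes([1, 2, 2, 3, 4, 4, 5]))  # [2, 4]
```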
Median
The median is the middle score in a set of scores that have been ranked in numerical order. In sequences that have an even number of scores, there is no single middle score, so the median is the midpoint (average) of the two middle scores; if those two scores have the same value, the median is simply that value.
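The rank-then-take-the-middle rule for the median, including the even-count case, fits in a few lines of Python. This is a sketch with a hypothetical function name and made-up data; the standard library's `statistics.median` applies the same rule and is used here as a cross-check.

```python
import statistics


def median(scores):
    """Middle score after ranking; midpoint of the two middle scores if n is even."""
    ordered = sorted(scores)  # rank the scores from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle score
    # even n: halfway between the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [9, 3, 7, 1, 5]   # ranked: 1 3 5 7 9 -> 5
even_scores = [9, 3, 7, 1]     # ranked: 1 3 7 9 -> (3 + 7) / 2 = 5.0
tied_middle = [2, 4, 4, 9]     # both middle scores are 4 -> 4.0

for data in (odd_scores, even_scores, tied_middle):
    assert median(data) == statistics.median(data)
    print(sorted(data), median(data))
```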
When should you use the median to describe your statistics? Use the median when the distribution of scores is heavily skewed, because the median is resistant to outliers.
EXAMPLE: the following sequence of scores has been ranked to illustrate the skew within the distribution:
1 4 5 6 7 17 21 21 22 23 24 26 27 31 32 44 109
As you can see, the distribution is skewed, as indicated by the unusually large score (109) at the end of the sequence.
Sample Mean vs. Population Mean
When the scores come from a sample, the mean is called the sample mean and is denoted by M. When the scores come from an entire population, the mean is called the population mean and is denoted by μ (pronounced “mew”). Both are arithmetic means, so their equations are:
$M = \frac{\sum X}{N}$
$\mu = \frac{\sum X}{N}$
Notice that the equations are the same; the only difference is the symbol used to show which kind of mean you are looking at.
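To see why the median is preferred for skewed scores, the Python sketch below computes the mean and the median for the ranked example sequence, first with and then without its extreme score of 109. The mean is computed directly as ΣX / N; the median comes from the standard library's `statistics.median`. The variable names are our own.

```python
import statistics

scores = [1, 4, 5, 6, 7, 17, 21, 21, 22, 23, 24, 26, 27, 31, 32, 44, 109]

mean_all = sum(scores) / len(scores)     # M = sum(X) / N
median_all = statistics.median(scores)   # middle (9th) of the 17 ranked scores

# Drop the extreme score 109 to see how much it pulls each measure.
trimmed = scores[:-1]
mean_trim = sum(trimmed) / len(trimmed)
median_trim = statistics.median(trimmed)  # 16 scores: midpoint of 8th and 9th

print(f"with 109:    mean = {mean_all:.2f}, median = {median_all}")
print(f"without 109: mean = {mean_trim:.2f}, median = {median_trim}")
```

With 109 included, the mean is about 24.71 and the median is 22; without it, the mean drops to about 19.44 while the median moves only to 21.5.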
Mean
The mean is the average of a sequence of scores. It is calculated by summing the scores and dividing that sum by the total number of scores. Σ (capital sigma) is the Greek symbol for summing.
$M = \frac{\sum X}{N}$
EXAMPLE: this answer was found by summing the scores within a data set and then dividing by the number of scores.
4 + 5 + 5 + 5 + 5 + 8 + 8 + 9 + 11 + 11 + 11 + 12 + 12 + 14 + 15 = 135
135 / 15 = 9
When should you use the mean? The mean is most helpful when the distribution of scores is relatively normal. However, the mean can be misleading if the distribution of scores is heavily skewed.
Interactions of the Mean, Median, and Mode
As you can see in Pearson’s diagram below, the mean, median, and mode are equal in a normal (symmetrical) distribution. In a negatively skewed distribution, the mean lies to the left of the median and the mode; in a positively skewed distribution, the mean lies to the right of the median and the mode.
The Mean and Distribution Shape
The frequency distributions below show a normal distribution, a positively skewed distribution, and a negatively skewed distribution.
Symmetrical or Normal Distributions
In a normal (or symmetrical) distribution, the mean is in the center of the distribution.
Skewness
Skewness is a measure of the lack of symmetry in a distribution. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. A distribution is skewed when its mean is pulled to the left or right of the median and the mode. Skew can be negative or positive. If there are outliers on one side of the distribution, the distribution will be skewed and the mean will not be representative of the group. An outlier is a data point that is distinctly separate from the rest of the data.
EXAMPLE: In a distribution with an outlier or in a heavily skewed distribution (the data is not
normally distributed), the mean is pulled in the direction of the outlier or skew and is therefore not the most accurate measure of central tendency. Under these circumstances, the median will better describe the data set.
(2+2+2+3+3)/5 = 2.4 (without outlier)
(2+2+2+3+3+12)/6 = 4 (with outlier)
By contrast, the median of these scores changes only from 2 to 2.5 when the outlier is added.
Negatively Skewed Distributions (tail to the left)
In negatively skewed distributions, the mean is typically less than the median, and the median is less than the mode, so the mean is the lowest measure of central tendency. Why is the mean the lowest measure of central tendency in negatively skewed distributions? The few unusually low scores in the left tail pull the mean down, while the median and mode are much less affected by them.
Positively Skewed Distributions (tail to the right)
In positively skewed distributions, the mode is typically less than the median, and the median is less than the mean. Therefore, the mean is the highest measure of central tendency. Why is the mean the highest measure of central tendency in positively skewed distributions? The few unusually high scores in the right tail pull the mean up, while the median and mode are much less affected by them.
Kurtosis
Kurtosis describes the peak and tails of a distribution relative to a normal distribution: a distribution can be more sharply peaked or flatter than a normal distribution. Leptokurtic data sets (high kurtosis) have a distinct peak near the mean, decline rapidly, and have heavy tails. Platykurtic data sets (low kurtosis) tend to have a flat top near the mean rather than a sharp peak. Mesokurtic data sets have kurtosis similar to that of a normal distribution, with a moderate peak.
Definitions
Bimodal: a frequency distribution that has two modes.
Descriptive Statistics: statistics that collect, summarize, classify, and present data.
Mean: the average of a sample or a population of scores.
Measures of Central Tendency: the methods of determining central values in a population.
Measures of Variability: measures that allow you to determine the degree of variation within a population or sample, how representative a particular score is of a data set, and the scope and validity of any generalizations you wish to make based on your research observations.
Median: the middle score in a set of scores that have been ranked in numerical order.
Mode: the most frequently occurring number within a data set.
Multimodal: a frequency distribution that has two or more modes.
Outlier: a data point that is distinctly separate from the rest of the data.
Range: the difference between the highest and lowest scores in a distribution.
Skew: a lack of symmetry in which a distribution's mean is shifted to the left or right of the median and the mode. Skew can be negative or positive. In a negatively skewed distribution, the mean is less than the median because a few low scores pull the mean to the left. In a positively skewed distribution, the mode is typically less than both the median and the mean.
Standard Deviation: the square root of the variance.
Sum of Squares: a measure of deviation from the mean, calculated by summing the squares of each score's difference from the mean; that is, the sum of squared deviations.
Variability/Variance: the degree to which the scores vary from their mean.