Figure 2.5: Normal Distribution
Variability refers to how dispersed, or spread out, the data values are. Looked at another way, it is how wide the data distribution is when it is graphed. If all data values are the same, there is zero variability, and the graph of the distribution has zero width. If all the values lie very close to each other, there is little variability and the distribution's graph is quite narrow. If, on the other hand, the values are widely scattered, there is more variability and the graph is wider.
1.4.2 Standard Deviation
One way to measure the spread of data is the standard deviation. Roughly speaking, it is the average spread of the values around the mean (see the standard deviation formula below).
Standard Deviation Formula
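Written out from the steps described in this section, the formula can be sketched as follows. This assumes the population form, which divides by the number of values; the symbols $\sigma$, $x_i$, $\mu$, and $N$ are notation chosen here for illustration.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}},
\qquad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```

Here $x_i$ is each value, $\mu$ is the mean, and $N$ is the number of values. Many textbooks divide by $N - 1$ instead when the standard deviation is estimated from a sample.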
To get the standard deviation, as you can see in the formula, first find the distance of each value from the mean and square it. Then sum those squared differences. Then divide that sum by the number of differences. Finally, take the square root of that quotient.
The reason for subtracting and squaring is straightforward. Whether a value lies above or below the mean, its squared difference from the mean comes out positive, so the sign of the difference no longer matters. Without squaring, the positive and negative differences would cancel each other out; in fact, they always sum to zero. Dividing by the number of values gives the average of the squared differences, but that average is in squared units. Taking the square root brings it back to the original, more intuitive units of measurement, such as feet, cubic inches, or whatever else it might be. The standard deviation can therefore be thought of, loosely, as the typical distance of the values from the mean of the distribution (see the standard deviation formula above).
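The four steps above can be followed in a few lines of Python. This is a minimal sketch using made-up values and the population form (dividing by the number of values); the variable names are illustrative only.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data

mean = sum(values) / len(values)

# Signed distances from the mean cancel out, which is why they get squared.
deviations = [x - mean for x in values]
signed_sum = sum(deviations)

# Step 1: square each distance from the mean.
squared = [d ** 2 for d in deviations]
# Step 2: sum the squared differences.
total = sum(squared)
# Step 3: divide by the number of differences.
variance = total / len(values)
# Step 4: take the square root to return to the original units.
std_dev = math.sqrt(variance)

print(mean, signed_sum, variance, std_dev)  # prints: 5.0 0.0 4.0 2.0

# Cross-check against the standard library's population standard deviation.
assert math.isclose(std_dev, statistics.pstdev(values))
```

The signed differences sum to zero, while the squared route gives a standard deviation of 2 in the same units as the data.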
Of course, given the formula, to compute a standard deviation you must be able to compute a meaningful mean. Consequently, computing the standard deviation requires interval or ratio variables. Furthermore, in a distribution with a bell-shaped (normal) curve, knowing the standard deviation tells you that approximately 68% of the values lie within 1 standard deviation of the mean. It also tells you that approximately 95% lie within 2 standard deviations, which leaves roughly 2.3% in each tail beyond 2 standard deviations from the mean; the band between 2 and 3 standard deviations on each side holds approximately 2.1% (again see Figure 2.5).
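The proportions for a normal curve can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

within_1_sd = z.cdf(1) - z.cdf(-1)    # share within 1 SD of the mean
within_2_sd = z.cdf(2) - z.cdf(-2)    # share within 2 SDs of the mean
tail_beyond_2_sd = 1 - z.cdf(2)       # share in one tail beyond 2 SDs
band_2_to_3_sd = z.cdf(3) - z.cdf(2)  # share between 2 and 3 SDs, one side

print(f"within 1 SD:        {within_1_sd:.1%}")       # 68.3%
print(f"within 2 SDs:       {within_2_sd:.1%}")       # 95.4%
print(f"one tail beyond 2:  {tail_beyond_2_sd:.1%}")  # 2.3%
print(f"between 2 and 3:    {band_2_to_3_sd:.1%}")    # 2.1%
```

Because the normal curve is symmetric, the same tail and band figures apply on both sides of the mean.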
1.4.3 Interquartile Range
You should recall that the median is the point in the distribution that 50% of the sample falls below and 50% falls above. In other words, the median is at the 50th percentile. Quartiles can be defined in the same way. The 1st quartile is at the 25th percentile, the 2nd quartile is at the 50th percentile (the median), the 3rd quartile is at the 75th percentile, and the 4th quartile is at the 100th percentile. The interquartile range extends from the 25th percentile to the 75th percentile, so it contains the middle 50% of the values in the sample. As a measure of variability, it is the distance between the 25th percentile and the 75th percentile. Unlike the standard deviation, it can be appropriately applied to ordinal variables, so it is used especially in conjunction with nonparametric statistics (see the interquartile range in the figure below).
Ranges
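The quartiles and the interquartile range can be computed with Python's standard library. This is a minimal sketch on made-up scores, assuming Python 3.8 or later for `statistics.quantiles`. Different textbooks and software packages interpolate quartiles slightly differently, so other tools may return somewhat different cut points for the same data.

```python
import statistics

scores = [1, 3, 5, 7, 9, 11, 13]  # made-up example data

# quantiles(n=4) returns the three cut points: Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# The interquartile range is the distance from the 25th to the 75th percentile.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # prints: 3.0 7.0 11.0 8.0
```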
Another way to display data, proposed by exploratory data analysis, is to rank the data from low to high, find the median, and then find the quartile values, the values between which one half of the data lies. You can then draw a box plot whose box contains that middle half of the data (see the figure below). The rest of the data lies out in the wings. The box also shows the interquartile range, which spans the values between the lower and upper quartiles. You'll see more explicit clinical medicine examples of this in Lesson 1.5.
Exploratory Data Analysis
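The ranking procedure behind a box plot can be sketched by hand: sort the values, find the median, then find the median of each half. The function name `five_number_summary` and the data are hypothetical, and the sketch assumes one common convention in which the middle value is left out of both halves when the count is odd.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumes at least two values.
    """
    ranked = sorted(data)        # rank the data from low to high
    half = len(ranked) // 2
    lower_half = ranked[:half]   # values below the median position
    upper_half = ranked[-half:]  # values above the median position
    return (
        ranked[0],
        statistics.median(lower_half),
        statistics.median(ranked),
        statistics.median(upper_half),
        ranked[-1],
    )

readings = [7, 2, 5, 4, 9, 4, 5, 4]  # made-up example data
low, q1, median, q3, high = five_number_summary(readings)
print(low, q1, median, q3, high)  # prints: 2 4.0 4.5 6.0 9
```

The box of a box plot would run from the lower quartile to the upper quartile, with the wings reaching out toward the minimum and maximum.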
Is it appropriate to compute the standard deviation when the data consists of rankings?
1.4.4 Range
The range is simply the difference between the highest and lowest values in the sample (see the figure below). It's a simple measure to compute and to understand. Unfortunately, it is particularly sensitive to extreme scores and, at the same time, insensitive to how the values vary between those extremes. Still, you come across it fairly frequently in the literature.
Ranges
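Both weaknesses of the range show up in a small example. The data sets and the helper `value_range` below are made up for illustration.

```python
import statistics

def value_range(data):
    """Difference between the highest and lowest value."""
    return max(data) - min(data)

clustered = [1, 5, 5, 5, 9]       # most values sit in the middle
polarized = [1, 1, 5, 9, 9]       # most values sit at the extremes
with_outlier = clustered + [100]  # one extreme score added

# Same range, even though the values between the extremes differ.
print(value_range(clustered), value_range(polarized))  # prints: 8 8

# A single extreme score changes the range dramatically.
print(value_range(with_outlier))  # prints: 99

# The standard deviation does pick up the difference between the two sets.
print(statistics.pstdev(clustered) < statistics.pstdev(polarized))  # prints: True
```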